Organize
Support and improve data generation, access, and management in CGIAR
The Platform embraces the power of big data analytics, supporting CGIAR as it becomes a leader in generating actionable data-driven insights for stakeholders.
It builds capacity throughout CGIAR to generate and manage big data, assisting CGIAR and its partners’ efforts to comply with open access / open data principles to unlock important research and datasets. It empowers researchers to strengthen data analytical capacity, developing practical big data tools and services in a coordinated way. It also addresses critical gaps, both organizational and technical, expanding the horizon of CGIAR research.
Data quality at the source
The Data Management Strategy is based on three pillars: establishing a process, supporting compliance, and enabling a data culture in alignment with the CGIAR Open Access and Data Management (OADM) Policy. In addition, the Big Data Platform has invested in an online tool (GARDIAN) that enables users to easily search and discover open datasets and publications across databases at all CGIAR Centers, with the intention of making this a key mechanism for monitoring and measuring compliance with the CGIAR open access policy. Some key guiding principles of the plan include:
- In accordance with the CGIAR OADM Policy, the Big Data Platform is mandated to produce international public goods and to ensure that they are open in line with the FAIR principles: the data must be Findable, Accessible, Interoperable, and Reusable. This enables the data to be used to enhance innovation, impact, and uptake.
- The Big Data Platform also provides data managers at Centers with a Data Management Support Pack. This tool was designed to help the research community produce high-quality, reusable, and open data from research activities. It consists of documents, templates, and videos on data management and interoperability, from overarching concepts and strategies through to day-to-day activities.
- The Big Data Platform coordinates and supports a monthly webinar series and a number of cross-Center groups and Communities of Practice. These activities are designed to support the management and “FAIRification” of information resources. They include a Community of Practice on Ontologies that helps to classify agronomic and breeding concepts and knowledge.
To monitor and accelerate CGIAR Research Centers’ progress towards making their data FAIR, the Platform developed and launched a pan-CGIAR data search and discoverability portal. For the first time, datasets, publications, and crop varieties across all 15 centers and 11 gene banks of the CGIAR can be easily found.
This data harvesting tool, named the Global Agriculture Research Data Innovation and Acceleration Network (GARDIAN), provides easy access to thousands of datasets across all CGIAR centers to enable new analyses and enhance innovation. It will soon facilitate data discovery from other relevant data repositories, and allow users to upload data to be rendered FAIR. GARDIAN also enables data visualization and mapping, with pipelines for data analysis and modeling to be made available in 2019.
Today, there are nearly 100,000 publications and 3,000 datasets in GARDIAN. It continues to grow, with several new features planned for 2019.
Machine-readable, structured data are available under open licenses in the repositories of the different CGIAR centers. Starting from these data, achieving FAIRness at the metadata level is only the first step towards deriving greater, actionable knowledge and value from shared research data.
GARDIAN changes the way we search and find data:
- By using data mining for discovering the meaning (semantics) of data,
- By expressing them as Semantic Web resources, and
- By reusing established specifications (e.g. W3C RDF).
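As a rough illustration of what expressing metadata as Semantic Web resources can look like, the sketch below turns a flat metadata record into RDF statements in the W3C N-Triples syntax, using Dublin Core term URIs as predicates. It is not GARDIAN's implementation: the record, the `example.org` dataset URI, and the function names are assumptions made for the example.

```python
# Hypothetical sketch: the record and the example.org URI are illustrative
# and are not taken from GARDIAN.
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def escape_literal(value: str) -> str:
    """Escape a string so it can be written as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(subject_uri: str, metadata: dict) -> list:
    """Turn a flat metadata record into a list of N-Triples statements."""
    triples = []
    for field, value in sorted(metadata.items()):
        predicate = f"<{DCTERMS}{field}>"
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"  # treat web addresses as resources
        else:
            obj = f'"{escape_literal(value)}"'  # everything else is a literal
        triples.append(f"<{subject_uri}> {predicate} {obj} .")
    return triples


record = {
    "title": "Maize yield trial 2018",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
lines = to_ntriples("https://example.org/dataset/1", record)
for line in lines:
    print(line)
```

Reusing an established vocabulary for the predicates, rather than inventing field names, is what lets records from different repositories be queried together.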
Privacy & Ethics Guidelines
While the enthusiasm for data sharing grows, we have been working to ensure that data sharing and use comply with ethical standards that protect those who could be vulnerable to exploitation. An ongoing predicament is how to protect private farm and farmer data while being able to provide them with valuable personalised solutions. To mitigate this, the Platform has developed a high-level guidelines resource for the Platform and the CGIAR System as a whole.
In 2017 the Platform engaged a lawyer to survey the privacy and ethics frameworks of all Centers as well as external partners. In 2018 we completed surveys of the privacy and ethics standards of each of the 15 centers and released the first set of guidelines to help researchers navigate the evolving implications of technology, confidentiality, intellectual property, consent, access, and sharing of benefits.
AgroFIMS: Standardization of data collection and description
The Agronomy Field Information Management System (AgroFIMS), developed on CGIAR’s HiDAP (Highly-interactive Data Analysis Platform) created by CGIAR’s International Potato Center (CIP), draws fully on ontologies, particularly the Agronomy Ontology and the Crop Ontology. It consists of modules that represent the typical cycle of operations in agronomic trial management, and enables the creation of data collection sheets and digital data collection using the same ontology-based set of variables, terminology, units and protocols.
- Standardizes data collection and annotation
- Enables digital collection of agronomy trial data
- Allows data quality checks and statistical reports
- Aligns a priori with CGIAR’s CG Core metadata schema
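To make the idea of data quality checks against a shared set of variables more concrete, here is a minimal sketch, assuming each variable is defined once with a unit and a plausible range. The variable names, units, and ranges are hypothetical and are not taken from the Agronomy Ontology, the Crop Ontology, or AgroFIMS itself.

```python
# Hypothetical variable definitions. Real AgroFIMS variables come from the
# Agronomy Ontology and the Crop Ontology and are not reproduced here.
VARIABLES = {
    "plant_height": {"unit": "cm", "min": 0.0, "max": 500.0},
    "grain_yield": {"unit": "kg/ha", "min": 0.0, "max": 30000.0},
}


def check_row(row: dict) -> list:
    """Return a list of quality problems found in one observation row."""
    problems = []
    for name, value in row.items():
        spec = VARIABLES.get(name)
        if spec is None:
            problems.append(f"{name}: not a defined variable")
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{name}: {value!r} is not numeric")
            continue
        if not spec["min"] <= number <= spec["max"]:
            problems.append(
                f"{name}: {number} {spec['unit']} outside "
                f"[{spec['min']}, {spec['max']}]"
            )
    return problems


good = check_row({"plant_height": "182.5", "grain_yield": 4200})
bad = check_row({"plant_height": -3, "leaf_colour": "green"})
print(good)
print(bad)
```

Because the collection sheet and the check share one set of definitions, a value recorded under an unknown name or outside its expected range can be flagged at the point of collection.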
Funding for AgroFIMS was provided by the Bill and Melinda Gates Foundation’s Open Access, Open Data Initiative, and the CGIAR Big Data Platform.
A beta version will be released in mid-2019 for field testing and digital data collection using the KDSmart application, developed in collaboration with Diversity Arrays Technology.
Learning and Capacity Building
The Platform is investing in learning and capacity building initiatives to accelerate data sharing as well as key analytic capabilities across the CGIAR. These are being delivered through a combination of in-person and online channels to raise the awareness and capability of centers to share data and to use emerging data analysis techniques.
Since we allocated grants to each of our 15 partner centers, several encouraging trends have emerged as each center implements strategies to mobilise data:
- Increased investment in developing data repositories and software infrastructure to build open data sharing and storage capabilities
- Investment in new staff roles for data curation, collection, and analysis, while providing additional training for current staff
- Reallocation of staff resources to collect, store, and unlock data.
The Platform hosts monthly webinars for CGIAR members, discussing how to engage and improve upon capacity building initiatives. Contact firstname.lastname@example.org for more details.