Organize
Support and improve data generation, access, and management in CGIAR
The Platform embraces the power of big data analytics, supporting CGIAR as it becomes a leader in generating actionable data-driven insights for stakeholders.
It builds capacity throughout CGIAR to generate and manage big data, assisting CGIAR and its partners’ efforts to comply with open access / open data principles to unlock important research and datasets. It empowers researchers to strengthen data analytical capacity, developing practical big data tools and services in a coordinated way. It also addresses critical gaps, both organizational and technical, expanding the horizon of CGIAR research.
Data quality guidelines
The Data Management Strategy is based on three pillars: establishing a process, supporting compliance, and enabling a data culture in alignment with the CGIAR Open Access and Data Management (OADM) Policy. In addition, the Big Data Platform has invested in an online tool (GARDIAN) that enables users to easily search and discover open datasets and publications across databases at all CGIAR Centers, with the intention of making this a key mechanism for monitoring and measuring compliance with the CGIAR open access policy. Some key guiding principles of the plan include:
- In accordance with the CGIAR OADM Policy, the Big Data Platform is mandated to produce international public goods and ensure that these are open via FAIR principles – that is, the data are Findable, Accessible, Interoperable and Reusable. This enables the data to be used to enhance innovation, impact, and uptake.
- The Big Data Platform also provides data managers at Centers a Data Management Support Pack. This tool was designed to help the research community produce high quality, reusable, and open data from research activities. It consists of documents, templates, and videos covering a range of aspects related to data management and interoperability, ranging from overarching concepts and strategies through to day-to-day activities.
- The Big Data Platform coordinates and supports a monthly webinar series and a number of cross-Center groups and Communities of Practice. These activities are designed to support the management and “FAIRification” of information resources, and they include a Community of Practice on Ontologies that helps to classify agronomic and breeding concepts and knowledge.
To monitor and accelerate CGIAR Research Centers’ progress towards making their data Findable, Accessible, Interoperable and Reusable (FAIR), the Platform developed and launched a robust prototype of the first pan-CGIAR data search tool, which lets any user run keyword searches to discover CGIAR publications and datasets.
This tool has been named the Global Agriculture Research Data Innovation and Acceleration Network (GARDIAN).
As of August 2018, GARDIAN showcased more than 93,000 publications and 2,100 datasets; it continues to grow, with several new features planned for 2018.
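To illustrate the idea of keyword search across records gathered from many repositories, here is a minimal sketch of an in-memory inverted index. It is not GARDIAN's implementation: the record identifiers, the sample titles, and the function names `tokenize`, `build_index`, and `search` are all hypothetical, chosen only for this example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of record IDs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, text in records.items():
        for token in tokenize(text):
            index[token].add(record_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted IDs of records that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


# Hypothetical records standing in for harvested publication and dataset titles.
records = {
    "pub-1": "Drought tolerant maize varieties in East Africa",
    "data-1": "Maize yield trial dataset",
    "data-2": "Rice genome sequences",
}
index = build_index(records)
print(search(index, "maize"))          # records mentioning "maize"
print(search(index, "maize dataset"))  # records mentioning both words
```

A production search tool would add ranking, stemming, and metadata filters; the sketch only shows why a shared index lets one query reach records from several sources.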
How GARDIAN changes the way we search and find data
Starting from machine-readable, structured data that is available under an open licence in the various CGIAR Center repositories, achieving FAIRness at the metadata level is only the first step towards deriving broader, actionable knowledge and value from shared research data. That goal is pursued by:
- using data mining to discover the meaning (semantics) of the data,
- expressing that meaning as Semantic Web resources, and
- reusing established specifications (e.g. W3C RDF).
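As a rough illustration of expressing dataset metadata as Semantic Web resources, the sketch below turns a single record into RDF statements in the W3C N-Triples syntax, reusing Dublin Core terms as predicates. The dataset IRI, the sample record, and the helper names `escape_literal` and `record_to_ntriples` are assumptions made for this example; they do not describe how GARDIAN or any CGIAR repository models its metadata.

```python
from typing import Dict, List

# Dublin Core terms namespace, an established vocabulary for descriptive metadata.
DCT = "http://purl.org/dc/terms/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_iri: str, record: Dict[str, str]) -> List[str]:
    """Emit one N-Triples line per metadata field, in sorted field order."""
    triples = []
    for field, value in sorted(record.items()):
        predicate = f"<{DCT}{field}>"
        # Values that look like web addresses become IRIs; others become literals.
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            obj = f'"{escape_literal(value)}"'
        triples.append(f"<{subject_iri}> {predicate} {obj} .")
    return triples


# Hypothetical record; the keys are Dublin Core term names (title, license, subject).
record = {
    "title": "Maize yield trial (example)",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "subject": "maize",
}
lines = record_to_ntriples("https://example.org/dataset/1", record)
print("\n".join(lines))
```

Because every statement uses shared, resolvable identifiers, records from different Centers can in principle be merged and queried together. In practice a dedicated RDF library would handle parsing, validation, and serialization.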
A constant risk for the Platform is that a major privacy breach involving farm and farmer data could create a controversial environment for Big Data-related work. To mitigate this, the Platform monitored the external environment, aiming to protect its reputation through proactive and maximally transparent communications. In addition, in 2017 the Platform engaged a lawyer to survey the privacy and ethics frameworks of all Centers as well as external partners, and to develop high-level risk assessments and guidelines for the Platform and the CGIAR System as a whole.
The Platform will build on this work in 2018 to produce guidelines and actionable support for Centers and others in the sector.