FAIR is the new Open when it comes to big data
When we talk about big data, it’s important to remember that bigger is not always better and open is not always enough.
CGIAR recognizes that, to maximize its impact for smallholder farmers in developing countries, the decades of valuable research outputs produced by its Centers and Programs must be made widely available and open.
Over the past 10 years, CGIAR has demonstrated its commitment to open access and open data (OA/OD) in several ways: Centers signed CGIAR’s Open Access and Data Management Policy in 2013, CGIAR launched and actively engaged with the Open Access, Open Data (OA/OD) Initiative, and, in 2017, it launched the CGIAR Platform for Big Data in Agriculture.
2018 saw the launch of the Global Agricultural Research Data Innovation Acceleration Network (GARDIAN), a pan-CGIAR data search and discoverability portal, by the BIG DATA Platform. For the first time, GARDIAN allows users to discover and access datasets and publications across CGIAR’s 13 Centers, regardless of where they are archived. GARDIAN also enables data exploration: the mapping and visual querying of certain large datasets.
The ratification of the Open Access and Data Management Policy, together with activities under the OA/OD Initiative, accelerated CGIAR’s momentum toward open data in particular, producing a steep increase both in discoverable data across the system and in the number of open datasets.
Since 2019, GARDIAN has also made it possible to access data and publications from national, governmental, and non-governmental organizations such as USDA, the Government of India, DFID, USAID, and the World Bank. By the end of 2019, GARDIAN had facilitated the discovery of approximately 155,000 publications and 23,000 datasets from across the agricultural sector, the large majority of which are open.
The importance of this achievement, and of open data, cannot be overstated: you cannot have big data without open data.
However, when we talk about big data, it’s important to remember that bigger is not always better and open is not always enough.
Medha Devare, BIG DATA Module Lead and GARDIAN architect, explained:
“Today’s digital landscape and data science capabilities offer great power and potential to solve agricultural problems. But to harness the potential of big data technologies, our data assets must be easily discoverable, downloadable, interpretable and actionable by humans as well as machines. Data must be not only open but also FAIR.”
FAIR data refers to data that is Findable, Accessible, Interoperable and Reusable.
“Reproducibility is the core of good science,” said Brian King, Coordinator of the Platform for Big Data in Agriculture.
“There is no better way to accelerate the science the global food system needs than to ensure that research data is readily findable, accessible, interoperable, and reusable, as well as open,” he said.
GARDIAN’s true value lies in its capacity to help CGIAR reach this goal.
In a recent interview with Food Tank, Devare explained how GARDIAN addresses CGIAR Centers’ need to search information sources across different platforms.
“Typically, each Center has at least two repositories: one for data, and another for publications, and they are on different platforms that generally don’t speak to each other. So we needed a way for people to search across Centers and repositories using single or multiple keywords (soil, water, drought-tolerant maize, you name it) to identify the resources that exist for that topic across CGIAR,” said Devare.
GARDIAN is testing the use of ontologies to enable the cross-domain data querying and exploration needed to formulate and address complex research questions across CGIAR.
One such ontology is the Agronomy Ontology (AgrO), which powers the Agronomy Field Information Management System (AgroFIMS). AgroFIMS was officially launched in 2019 to enable a priori harmonization with metadata and data interoperability standards, that is, harmonization built in before data are collected. The ability to collect such interoperable, FAIR data is essential for data reuse and, increasingly, for compliance with funder mandates.
GARDIAN has several new features planned for 2020 that will further increase its capacity for impact, including: access to new data and publication sources; user-friendly workflows to make data and publications FAIRer; new high-value data products focused on climate adaptation and gender in agricultural research; more plug-and-play capabilities with the data discoverable via GARDIAN; a ‘GARDIAN Labs’ feature to allow communities to collaborate on tools to facilitate data processing and analysis; and workflows to help users more easily annotate metadata and datasets to render data FAIR.
Marking the 10th anniversary of Open Data Day, BIG DATA Platform Coordinator Brian King is convinced the best is yet to come: “We hope the next ten years will see vast amounts of data that is made not only open but FAIR, also.”
March 7, 2020
CGIAR Platform for Big Data in Agriculture