GARDIAN and facilitating data interoperability at CGIAR
Combinations of large datasets could enable scientific advances in areas such as genetic modeling, management optimization, and variety selection, and may reduce the need to collect additional field experimental data. The CGIAR research centers generate large amounts of data, which could gain value through the application of the FAIR (Findable, Accessible, Interoperable, Reusable) principles, particularly for data that are suitable for quantitative analyses.
The CGIAR online search engine, GARDIAN, is easy to navigate and supports simple queries to locate data and publications; however, making these data usable on a large scale remains a challenge. Currently, datasets are stored in many different formats, and the vocabularies used to describe dataset content are chosen on an ad hoc basis by each researcher.
Data interoperability tools and standards were developed by the community of agricultural modelers associated with the Agricultural Model Intercomparison and Improvement Project (AgMIP). These tools were developed to allow multiple crop models to access consistent input data regardless of source data formats and internal model requirements. The AgMIP data tools, methods, and standards have been implemented in diverse applications, including multi-model assessments, desktop data translation applications, data discovery and dissemination through Application Programming Interfaces (APIs), and large-scale modeling applications on high-performance computers.
When the ICASA Data Dictionary, adopted in the AgMIP project, is used as the definition of terms in a CGIAR dataset, existing AgMIP data translation tools allow rapid translation of data to crop model-specific formats for multiple crop models. Making data useful and combinable on a large scale using these AgMIP tools would require annotation of each dataset with terms and definitions that are in alignment with ICASA terms.
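The annotation step described above can be pictured as a lookup table from each researcher's column names to standard terms. The following is a minimal sketch, not the AgMIP tooling itself: the column names, the ICASA-style codes (`pdate`, `hwah`, `crid`), the conversion factor, and the function `annotate_record` are all assumptions made for illustration and should be checked against the ICASA Data Dictionary before any real use.

```python
# Hypothetical mapping from a dataset's ad hoc column names to ICASA-style
# codes. The codes and conversion factors are illustrative assumptions;
# real annotations must be verified against the ICASA Data Dictionary.
VARIABLE_MAP = {
    "Planting date": {"icasa": "pdate", "factor": None},
    "Grain yield (t/ha)": {"icasa": "hwah", "factor": 1000.0},  # t/ha -> kg/ha
    "Crop": {"icasa": "crid", "factor": None},
}


def annotate_record(record, variable_map):
    """Rename one row's fields to standard codes.

    Returns a tuple (translated, unmapped): a dict keyed by standard code,
    and a list of source column names that have no mapping yet.
    """
    translated = {}
    unmapped = []
    for name, value in record.items():
        rule = variable_map.get(name)
        if rule is None:
            # No standard term assigned: flag it instead of guessing.
            unmapped.append(name)
            continue
        if rule["factor"] is not None:
            # Convert numeric values to the units the standard term expects.
            value = float(value) * rule["factor"]
        translated[rule["icasa"]] = value
    return translated, unmapped


row = {
    "Planting date": "2015-11-20",
    "Grain yield (t/ha)": "4.5",
    "Crop": "wheat",
    "Plot notes": "lodging",
}
translated, unmapped = annotate_record(row, VARIABLE_MAP)
print(translated)
print(unmapped)
```

Columns without a mapping are reported rather than dropped silently, so the list of unmapped names shows how much annotation work a dataset still needs.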
A small pilot project funded for the Crop Modeling Community of Practice and carried out by Senthold Asseng, Cheryl Porter, Gerrit Hoogenboom and Chris Villalobos of the University of Florida demonstrated the effectiveness of AgMIP tools in enabling interoperability and reusability of CGIAR crop experimental data. The project highlighted three very different types of datasets, which were retrieved from GARDIAN and evaluated for usefulness in crop modeling applications. The steps needed to locate, access, and prepare the data for crop models using AgMIP data translation tools were documented. Datasets that were annotated using a standard vocabulary required the least effort and resources to prepare for modeling or other quantitative applications.
Two of the three field datasets resulting from this project have been made available to the international crop modeling community through the AgMIP Crop Site Database.
The model-ready datasets can be downloaded from the following links:
Building on this demonstration project, a process is being developed to expose data in additional CGIAR datasets using the Agronomy Ontology (AgrO) as the standard terminology reference. The goal of this effort is to facilitate data queries in GARDIAN that can include a measure of the appropriateness of each dataset for use in quantitative analyses. Each dataset will include metadata that fully describe the terminology used in that dataset with links to AgrO definitions and units. The University of Florida and AgMIP researchers are working with IFPRI to develop these annotations and variable mappings for selected datasets, with linkages to the AgMIP data translation tools in the GARDIAN APIs.
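One way to picture a "measure of appropriateness" is the share of a dataset's variables that carry both a link to a term definition and declared units. The sketch below rests on that assumption only; the article does not define the measure. The class `VariableAnnotation`, the function `annotation_coverage`, and the `example.org` links are hypothetical placeholders, not real AgrO identifiers or GARDIAN API objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableAnnotation:
    name: str                 # column name as it appears in the dataset
    term_iri: Optional[str]   # link to an ontology definition, if annotated
    units: Optional[str]      # declared units, if any


def annotation_coverage(variables):
    """Fraction of variables that have both a term link and units."""
    if not variables:
        return 0.0
    complete = sum(1 for v in variables if v.term_iri and v.units)
    return complete / len(variables)


# Placeholder links only; real annotations would point at AgrO definitions.
example = [
    VariableAnnotation("yield", "https://example.org/agro/TERM_A", "kg/ha"),
    VariableAnnotation("density", "https://example.org/agro/TERM_B", "plants/m2"),
    VariableAnnotation("trt", None, None),
    VariableAnnotation("N_rate", "https://example.org/agro/TERM_C", None),
]
score = annotation_coverage(example)
print(score)
```

A score like this could be returned alongside search results so that users can rank datasets by how ready they are for quantitative work. A production measure would likely weigh other factors too, such as whether the variables a given crop model requires are present.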
February 12, 2019