The Global Agricultural Research Data Innovation & Acceleration Network

GARDIAN is CGIAR’s flagship data harvester. It enables the discovery of publications and datasets across the thirty-odd institutional publication and data repositories of CGIAR Centers and beyond. It’s a key component of the Platform’s objective to establish the infrastructures, tools, and approaches for making CGIAR data Findable, Accessible, Interoperable, and Reusable (FAIR).

GARDIAN employs text-mining to enrich the associated metadata to enhance discovery, and will soon test data mining techniques with cleaned, well-annotated datasets to enhance interoperability. Plans for GARDIAN include further demonstration of the value of interoperable data via seamless interactivity of discovered data with key analytical/visualization tools, including models and maps.

 

FAIR metrics

One of the grand challenges of data-intensive science is to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of, task-appropriate scientific data and their associated algorithms and workflows.

The FAIR Data Principles are a set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable.

However, those principles are not orthogonal and were not designed for automated machine-based evaluation. To address this, we have adopted the Netherlands Institute for Permanent Access to Digital Research Resources (DANS) metrics for FAIR compliance.

The Birth and Rise of GARDIAN

A Timeline

2008-2009

The idea is born

New insights can be created by applying NCBI’s approach to agricultural data

Medha Devare, author and architect of GARDIAN, began using molecular data and tools curated by the National Center for Biotechnology Information (NCBI) as a post-doctoral associate at Cornell in the early 2000s.

NCBI was formed to share the data and tools that were crucial to creating transformative insights and innovations in the biomedical, genomics, genetics, and allied sectors, and began enabling this open data and tooling in the mid-1980s. Devare realized that a similar approach could transform the agricultural domain as well.

2011

First, standardizing data collection

While coordinating a CIMMYT project in Nepal to improve productivity and profitability in farming systems, Devare realized that most of the time, farmers had simple questions:

“Should I direct seed rice or set up a nursery and transplant rice this season?”

“If I direct seed, what variety should I use and how can I manage the crop for the highest returns?”

Working with smallholder farmers and knowing the NCBI model made clear to Devare that agricultural information could be better managed and combined with the power of new ICT and data science tools to provide actionable insights. In turn, better data management would enable greater utilization and aggregation of data assets from different experiments, projects, and disciplines.

Although Devare tried to standardize data collection within the team in Nepal, she often encountered poorly described data that took many days to format before meaningful analyses could be performed across study sites. It was clear that data needed to be standardized at the point of collection rather than further along the research project lifecycle.

May 2017

CGIAR launches its Platform for Big Data in Agriculture

The CGIAR Platform for Big Data in Agriculture was launched, with Devare leading the Organize Module to continue to foster best practices in data management towards open and FAIR data assets across CGIAR.

March 2017

CGIAR Core Metadata Schema v.1.0 is released

As a first step towards data harmonisation and interoperability, the CGIAR Core Metadata Schema v.1.0 was released in March 2017, following a consultative process with the DMTF and OAWG members (who continue to work with the Platform for Big Data in Agriculture within the Organize Module).

Sep-Oct 2017

Meet CeRes, GARDIAN’s precursor

In September 2017, work began on CGIAR e-Research (or “CeRes”), a data discovery tool to enable the discovery of publications and datasets across 30-odd CGIAR repositories. CeRes was a first step towards enhancing the impact and innovation potential of CGIAR’s agricultural data archives.

A small team of developers from AgroKnow (currently at SCiO) with expertise in agriculture and computational biology was brought on as a partner to develop CeRes.

CeRes was unveiled in October 2017 at the first annual Big Data in Agriculture Convention, held at the International Center for Tropical Agriculture in Cali, Colombia.

2017 – 2018

From CeRes to GARDIAN, CGIAR’s powerful data search engine

Nov 2017 – Sep 2018

The team worked to improve features and functionality, adding algorithms that look for related publications and data, ontologies to refine searches, and visual data.

The tool was renamed GARDIAN (Global Agricultural Research Data Innovation & Acceleration Network). GARDIAN is used as a source of current information for the CGIAR website.

Aug 2018

As the emerging field of big data gained traction around the world, GARDIAN attracted attention from researchers, data scientists, and journalists. Devare spoke with FoodTank about building a big data platform for agriculture.

Oct 2018

A GARDIAN demonstration was given at the 2018 Big Data in Agriculture Convention session in Nairobi, Kenya, at the World Agroforestry Center.

Nov 2018

The DMTF and the OAWG merge into the Data and Information Management Community of Practice during the annual meeting coordinated as part of Organize activities following the Convention, in Naivasha, Kenya.

The OAWG continues as an active working group within the Data and Information Management Community, and various aspects of data management are led by the Metadata, Ontology, and Dataverse Working Groups through regular online meetings.

The Community continues to enhance capacity around data management through a webinar series.

Early 2019

GARDIAN’s capabilities continue to expand

March 2019

Data from the United Kingdom’s Research for Development Outputs repository was integrated with GARDIAN, expanding the tool’s discovery capabilities beyond CGIAR data resources.

The GARDIAN team published guidelines on how to make data FAIR.

April 2019

Data exploration capabilities were put in place. Users can visually explore production data for 33 crops from 2005 with a variety of parameters such as rainfed or irrigated yield, harvested area, or production.

June 2019

GARDIAN and its new capabilities were launched by the Platform for Big Data in Agriculture. The data discovery tool points to nearly 100,000 publications and 3,000 datasets from the 15 CGIAR Centers and 11 Genebanks.

July 2019

The GARDIAN application programming interface (API) was made publicly available, along with an engine to detect the presence of Personally Identifiable Information (PII) in data before it is published. These features are part of a prototype of “GARDIAN Services,” which are being tested for integration into a workflow to enable users to upload data directly to a “GARDIAN repository.”

By the end of 2019, the workflow will also enable the upload of FAIR resources to other repositories.
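
As a rough illustration of what a PII check before publication might involve, the sketch below scans tabular records for values that look like email addresses or phone numbers. It is not GARDIAN’s engine: the patterns, the function name `find_pii`, and the sample records are all hypothetical, and a real detector would need to cover many more cases (names, addresses, GPS coordinates of households, and so on).

```python
import re

# Hypothetical patterns for illustration only; not taken from GARDIAN.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def find_pii(rows):
    """Return (row_index, column, kind) for each cell matching a pattern."""
    hits = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    hits.append((i, column, kind))
    return hits


# Made-up survey records, used only to exercise the function.
sample = [
    {"farmer": "A. Sharma", "contact": "a.sharma@example.org", "yield_t_ha": 4.2},
    {"farmer": "B. Thapa", "contact": "+977 1 5551234", "yield_t_ha": 3.8},
]
hits = find_pii(sample)
```

Flagging the column as well as the row lets a data curator decide whether to drop the whole column or redact individual values before upload.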

Late 2019

GARDIAN is on track to unlock innovation for agricultural development

Sep 2019

GARDIAN appeared in FoodTank’s list of 21 tools that democratize data for farmers, recognized for equipping experts across the world with the information needed to tackle agricultural challenges.

Oct 2019

The GARDIAN team continues to improve geospatial view capabilities, including the ability to obtain summary crop production data by country or by bounding box. This will be further enhanced with crop production datasets from the 2000 and 2010 Spatial Production Allocation Model (SPAM), with features to easily visualize temporal changes. A seven-terabyte dataset from the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS) will also be visualisable through GARDIAN’s data exploration capabilities.
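
To make the idea of a bounding-box summary concrete, here is a minimal sketch of how gridded production records could be totalled per crop within a longitude/latitude box. It does not reflect GARDIAN’s actual implementation or API: the `Cell` record, the `summarize_by_bbox` function, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One hypothetical grid cell of crop production (illustrative only)."""
    lon: float
    lat: float
    crop: str
    production_t: float


def summarize_by_bbox(cells, min_lon, min_lat, max_lon, max_lat):
    """Sum production per crop for cells falling inside the bounding box."""
    totals = {}
    for c in cells:
        if min_lon <= c.lon <= max_lon and min_lat <= c.lat <= max_lat:
            totals[c.crop] = totals.get(c.crop, 0.0) + c.production_t
    return totals


# Made-up values, not real production figures.
cells = [
    Cell(84.1, 27.6, "rice", 120.0),
    Cell(85.3, 27.7, "rice", 80.0),
    Cell(85.3, 27.7, "maize", 40.0),
    Cell(36.8, -1.3, "maize", 200.0),
]
totals = summarize_by_bbox(cells, 80.0, 26.0, 89.0, 31.0)
```

A summary by country would follow the same pattern, with the box test replaced by a lookup of the country each cell belongs to.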

Moving forward, GARDIAN will continue to enable wider discoverability of agricultural resources from partners and other non-CGIAR sources. Relevant data resources from other repositories such as the United States Department of Agriculture’s Ag Data Commons, PubMed Central, the European Nucleotide Archive, and the Indian Council of Agricultural Research will be discoverable through GARDIAN searches in the coming months.