The Global Agricultural Research Data Innovation & Acceleration Network

Visit GARDIAN

GARDIAN is CGIAR's flagship data harvester. It enables the discovery of publications and datasets from across the thirty-odd institutional publications and data repositories from CGIAR Centers and beyond. It's a key component of the Platform's objective to establish the infrastructures, tools, and approaches to making CGIAR data Findable, Accessible, Interoperable, Reusable (FAIR).

GARDIAN employs text-mining to enrich the associated metadata to enhance discovery, and will soon test data mining techniques with cleaned, well-annotated datasets to enhance interoperability. Plans for GARDIAN include further demonstration of the value of interoperable data via seamless interactivity of discovered data with key analytical/visualization tools, including models and maps.



GARDIAN FAIR metrics



Privacy and Ethics Guidelines



Ontologies Community of Practice

FAIR metrics

One of the grand challenges of data-intensive science is to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of, task-appropriate scientific data and their associated algorithms and workflows.

The FAIR Data Principles are a set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable.

However, those principles are not orthogonal and have not been designed for automated machine-based evaluation. To this end, we have adopted the Netherlands Institute for Permanent Access to Digital Research Resources (DANS) metrics for FAIR compliance.
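As a rough illustration of what machine-based evaluation can look like, the sketch below scores a metadata record with one simple check per FAIR letter. The `Record` fields and the individual checks are assumptions made up for this example; they are not the DANS metrics and not GARDIAN's implementation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""
    title: str = ""
    access_url: str = ""
    metadata_schema: str = ""
    license: str = ""


def fair_checks(rec: Record) -> dict[str, bool]:
    """Run one simplified, assumed check per FAIR principle."""
    return {
        # Findable: a persistent-looking identifier plus a title
        "findable": rec.identifier.startswith(("doi:", "hdl:")) and bool(rec.title),
        # Accessible: the record can be reached over a standard protocol
        "accessible": rec.access_url.startswith("https://"),
        # Interoperable: the record declares which metadata schema it follows
        "interoperable": bool(rec.metadata_schema),
        # Reusable: the record states a license
        "reusable": bool(rec.license),
    }


def fair_score(rec: Record) -> float:
    """Fraction of checks passed, between 0.0 and 1.0."""
    checks = fair_checks(rec)
    return sum(checks.values()) / len(checks)
```

A record with an identifier, title, HTTPS link, and declared schema but no license would pass three of the four checks, which makes the missing license easy to flag automatically.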

The Birth and Rise of GARDIAN

A Timeline

2008-2009

The idea is born

New insights can be created by applying NCBI’s approach to agricultural data

Medha Devare, author and architect of GARDIAN, began using molecular data and tools curated by the National Center for Biotechnology Information (NCBI) as a post-doctoral associate at Cornell in the early 2000s.

NCBI was formed to share data and tools that were crucial to create transformative insights and innovations in the biomedical, genomics, genetics, and allied sectors, and began enabling this open data and tooling starting in the mid-1980s. Devare realized that a similar approach could transform the agricultural domain as well.

2011

First, standardizing data collection

While coordinating a CIMMYT project in Nepal to improve productivity and profitability in farming systems, Devare realized that most of the time, farmers had simple questions:

“Should I direct seed rice or set up a nursery and transplant rice this season?”

“If I direct seed, what variety should I use and how can I manage the crop for the highest returns?”

Working with smallholder farmers and knowing the NCBI model made clear to Devare that agricultural information could be better managed and combined with the power of new ICT and data science tools to provide actionable insights. In turn, better data management would enable greater utilization and aggregation of data assets from different experiments, projects, and disciplines.

Although Devare tried to standardize data collection within the team in Nepal, she often encountered poorly described data that took many days to format before meaningful analyses could be performed across study sites. It was clear that data needed to be standardized at collection rather than farther along the research project lifecycle.

May 2017

CGIAR launches its Platform for Big Data in Agriculture

The CGIAR Platform for Big Data in Agriculture was launched, with Devare leading the Organize Module to continue to foster best practices in data management towards open and FAIR data assets across CGIAR.

March 2017

CGIAR Core Metadata Schema v.1.0 is released

As a first step towards data harmonisation and interoperability, the CGIAR Core Metadata Schema v.1.0 was released in March 2017, following a consultative process with the DMTF and OAWG members (who continue to work with the Platform for Big Data in Agriculture within the Organize Module).

Sep-Oct 2017

Meet CeRes, GARDIAN’s precursor

In September 2017, work began on CGIAR e-Research (or “CeRes”), a data discovery tool to enable the discovery of publications and datasets across 30-odd CGIAR repositories. CeRes was a first step towards enhancing the impact and innovation potential of CGIAR’s agricultural data archives.

A small team of developers from AgroKnow (currently at SCiO) with expertise in agriculture and computational biology was brought on as a partner to develop CeRes.

CeRes was unveiled in October 2017 at the first annual Big Data in Agriculture Convention held in Cali, Colombia at the International Center for Tropical Agriculture.

2017 – 2018

From CeRes to GARDIAN, CGIAR’s powerful data search engine

Nov 2017 – Sep 2018

The team worked to improve features and functionality, adding algorithms that find related publications and data, ontologies to refine searches, and data visualizations.

The tool was renamed GARDIAN (Global Agricultural Research Data Innovation & Acceleration Network). GARDIAN is used as a source of current information for the CGIAR website.

Aug 2018

As the emerging field of big data gained traction around the world, GARDIAN attracted attention from researchers, data scientists, and journalists. Devare spoke with FoodTank about building a big data platform for agriculture.

Oct 2018

A GARDIAN demonstration was given at the 2018 Big Data in Agriculture Convention in Nairobi, Kenya, at the World Agroforestry Center.

Nov 2018

The DMTF and the OAWG merged into the Data and Information Management Community of Practice during the annual meeting in Naivasha, Kenya, coordinated as part of Organize activities following the Convention.

The OAWG continues as an active working group within the Data and Information Management Community, and various aspects of data management are led by the Metadata, Ontology, and Dataverse Working Groups through regular online meetings.

The Community continues to enhance capacity around data management through a webinar series.

Early 2019

GARDIAN’s capabilities continue to expand

March 2019

Data from the United Kingdom’s Research for Development Outputs repository was integrated with GARDIAN, expanding the tool’s discovery capabilities beyond CGIAR data resources.

The GARDIAN team published guidelines on how to make data FAIR.

April 2019

Data exploration capabilities were put in place. Users can visually explore production data for 33 crops from 2005 with a variety of parameters such as rainfed or irrigated yield, harvested area, or production.

June 2019

GARDIAN and its new capabilities were launched by the Platform for Big Data in Agriculture. The data discovery tool points to nearly 100,000 publications and 3,000 data sets from the 15 CGIAR Centers and 11 Genebanks.

July 2019

The GARDIAN application program interface (API) was made publicly available, as well as an engine to detect the presence of Personally Identifiable Information (PII) in data before it is published. These features are part of a prototype of “GARDIAN Services,” which are being tested for integration into a workflow to enable users to upload data directly to a “GARDIAN repository.”

By the end of 2019, the workflow will also enable the upload of FAIR resources to other repositories.
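To give a sense of what pre-publication PII screening involves, here is a minimal pattern-based sketch. It is not the GARDIAN engine, whose internals are not described here; the patterns, the `scan_for_pii` name, and the row format are all assumptions for illustration, and real detectors need far broader coverage (names, national IDs, free text).

```python
import re

# Illustrative patterns only; each one will miss cases and raise false positives.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    # Latitude/longitude pair with at least four decimal places
    "gps": re.compile(r"-?\d{1,2}\.\d{4,},\s*-?\d{1,3}\.\d{4,}"),
}


def scan_for_pii(rows):
    """Return (row_index, column, kind) for every cell matching a pattern.

    `rows` is a list of dicts, one per record, mapping column name to value.
    """
    hits = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            text = str(value)
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(text):
                    hits.append((i, column, kind))
    return hits
```

A screening step like this would typically flag columns for human review rather than block publication outright, since pattern matches alone cannot tell whether consent covers the data.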

Late 2019

GARDIAN is on track to unlock innovation for agricultural development

Sep 2019

GARDIAN appeared in FoodTank’s list of 21 tools that democratize data for farmers, recognized for equipping experts across the world with the information needed to tackle agricultural challenges.

Oct 2019

The GARDIAN team continues to improve geospatial view capabilities, including the ability to obtain summary crop production data by country or by bounding box. This will be further enhanced with crop production datasets from the 2000 and 2010 Spatial Production Allocation Model (SPAM), with features to easily visualize temporal changes. A seven-terabyte dataset from the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS) will also be visualisable through GARDIAN’s data exploration capabilities.

Moving forward, GARDIAN will continue to widen the discoverability of agricultural resources from partners and other non-CGIAR sources. Relevant data resources from other repositories such as the United States Department of Agriculture’s Ag Data Commons, PubMed Central, the European Nucleotide Archive, and the Indian Council of Agricultural Research will be discoverable through GARDIAN searches in the coming months.
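The bounding-box summary mentioned above can be pictured with a small sketch: filter gridded production cells to those inside a latitude/longitude box and total them per crop. The `Cell` and `BBox` types and the `production_by_crop` function are hypothetical names for illustration and do not reflect GARDIAN's API or the SPAM data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One grid cell of production data (illustrative structure)."""
    lat: float
    lon: float
    crop: str
    production_t: float  # production in tonnes


@dataclass(frozen=True)
class BBox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple inclusive test; does not handle boxes crossing the antimeridian.
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def production_by_crop(cells, bbox: BBox) -> dict[str, float]:
    """Sum production per crop over the cells that fall inside the box."""
    totals: dict[str, float] = {}
    for cell in cells:
        if bbox.contains(cell.lat, cell.lon):
            totals[cell.crop] = totals.get(cell.crop, 0.0) + cell.production_t
    return totals
```

A country-level summary works the same way, with the box test replaced by a lookup of the country each cell belongs to.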

Explore GARDIAN now

AgroFIMS: Your new companion for easy standardization of data collection and description

The Agronomy Field Information Management System (AgroFIMS) allows users to create fieldbooks to collect agronomic data that is already tied to a metadata standard (the CG Core Metadata Schema, aligned with the standard Dublin Core) and to semantic standards like the Agronomy Ontology (AgrO), generating data that is Findable, Accessible, Interoperable, and Reusable (FAIR) at collection. AgroFIMS therefore standardizes data collection and description for easy aggregation and inter-linking across disparate datasets.

The fieldbooks you create can be exported to the Android-based KDSmart data collection application, and collected data imported back to AgroFIMS for statistical analysis and reports. In 2021 AgroFIMS will allow you to set up agronomic survey questionnaires for data collection via ODK. It will also allow easy upload of your “born FAIR” data to Dataverse repository platforms with Dublin Core-compliant metadata schemas.

Funding for AgroFIMS was provided by the Bill and Melinda Gates Foundation’s Open Access, Open Data Initiative, and the CGIAR Platform for Big Data in Agriculture. AgroFIMS is released under the GPL license.

Go to AGROFIMS →

Responsible Data Management Guidelines to protect privacy

The CGIAR Platform for Big Data in Agriculture advocates open data for agricultural research for development. It considers that opening up research data for scrutiny and reuse confers significant benefits to society.

However, the Platform appreciates that not all research data can be open and that a broad range of legitimate circumstances may require data to be restricted.

As an integral component of its advocacy for open data, the Platform promotes responsible data management through the entire research data lifecycle, from planning, collecting, and storing to disclosing or publishing, transferring, discovering, and archiving.

These guidelines draw on three sources: a review of best and emerging practices across various sectors in the fast-changing landscape of privacy and ethics (130 external resources); privacy and ethics materials sourced from seven CGIAR Centers; and input and feedback on a first draft circulated across CGIAR, which has been incorporated into this edition. This is an evolving document; the next stage is to consult externally for further input.

These Guidelines are intended to assist agricultural researchers in handling privacy and personally identifiable information (PII) in the research project data lifecycle.

Check the guidelines →

REUSE / TRANSFER

Ensure consistency with the DMP-PII and the purpose for which prior informed consent has been obtained
Re-evaluate likelihood of (re-)identification and risk of harm, particularly if it involves a public data-set containing PII (as above)
Ensure PII is stored securely to protect privacy (as above)
Minimize use of PII and risk of disclosure through pro-privacy access controls and analytical tools (as above)

Don’t transfer data containing PII unless you have explicit consent
Don’t transfer data containing PII in the absence of a data sharing agreement identifying aspects such as purpose and scope of use, privacy protection measures, confidentiality and any limitations
Don’t reuse or transfer PII until any inconsistencies with the DMP-PII and/or purpose compatibility have been resolved (e.g. through updated ethics review or consent from participant)

ARCHIVING / DISCARDING

Plan for archiving or data destruction early in the process. Destroying data can be more secure; however, archiving can be beneficial if the data has ongoing evidentiary, scientific or cultural value. If archiving, identify where and how, the budget require
Ensure DMP-PII and purpose compatibility (as above)
Ensure adequate security measures to protect privacy (as above)

Don’t wait until the end of the project to assess archiving needs when time and resources may be limited
Don’t assume the longevity of a particular format; future-proof your archived data
Don’t forget to budget for archiving data; this should be done as part of your Data Management Plan

PUBLISHING AND DISCOVERY

Ensure DMP-PII and purpose compatibility (as above)
Re-evaluate likelihood of (re-)identification and risk of harm, particularly if it involves a public data-set containing PII
Indicate in metadata the availability of raw data or minimized data containing PII, if available bilaterally
Minimize use of PII and risk of disclosure through pro-privacy access controls and analytical tools

Don’t include PII in public datasets unless absolutely necessary to preserve the data’s analytic potential, scientific utility or benefit to the participant (and subject to the participant’s informed consent and a rigorous risk assessment)

STORAGE AND ANALYSIS

Ensure compatibility with the DMP-PII (as above) and also the purpose for which prior informed consent has been obtained

Ensure PII is stored securely to protect privacy, through organizational or project specific safeguards to prevent unauthorized access, accidental disclosure or breach of data (physical & technical)

encryption for the storage and transmission of PII
access control measures to limit access to PII
two-factor or multifactor authentication
cloud services & back-end security

Don’t store data in unsecured locations or on unsecured devices or servers

Don’t store encrypted data and encryption keys in locations where they can be easily accessed simultaneously

Don’t underestimate the importance and value of administrative safeguards to standardize practices (i.e. organizational policies, procedures and maintenance of security measures that are designed to protect private information, data and access)

COLLECTION

Ensure compatibility with the DMP-PII
De-identify data to anonymize by default unless it will impair the data’s analytic potential, scientific utility or benefit to the participant
If you cannot anonymize, minimize the PII and pseudonymize to reduce the disclosure risk
Provide research participants sufficient information to use reasoned judgment to decide whether or not they wish to participate in the project
Ensure informed consent is designed to address the following elements:
- competence, comprehension, full disclosure, voluntariness
- legitimate scientific purpose for which the PII is collected and scope of use (e.g. stored, transferred, published and whether as anonymized, minimized or raw data)
- foreseeable risk of privacy loss and consequences
- meaningful alternatives including opt-in protection/anonymization
- safeguards to protect privacy, conditions on which PII may be shared and any limitations on reuse or third- party access and use of PII
- permission to follow-up or contact the participant and for what purpose (including by third- parties)
- participant’s right to withdraw and rights regarding their data (e.g. to be informed; to access; to rectify; to object; to erase)
- inclusion of physical, phone and/or electronic contact details (at least two forms of contact) that the participant can use to exercise her/his rights
- explicit consent and participant’s acknowledgement of understanding
- if written, provide the participant a copy of processed informed consent
Use plain language and adapt informed consent to meet the needs of vulnerable populations (e.g. obtain orally or in local language)
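Pseudonymization, as recommended above when full anonymization is not possible, is often done by replacing a direct identifier with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the `pseudonymize` function name and the 16-character truncation are choices made for this example, not something the guidelines prescribe.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a stable keyed pseudonym.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the mapping cannot be recomputed without the key.
    """
    # Normalize so that "Asha Rai " and "asha rai" map to the same pseudonym
    normalized = identifier.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions are a concern
    return digest.hexdigest()[:16]
```

The key is what makes this pseudonymization rather than anonymization: anyone holding it can recompute the mapping, so it needs the same protection as the PII itself and should be stored separately from the data.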

Don’t collect PII unless you have a Data Management Plan and any necessary approvals in place, including the recorded approval of the potential participant
Don’t collect PII unless you absolutely need it
Don’t assume that removal of direct identifiers is sufficient to anonymize data or that all de-identification techniques will result in anonymized data. Consider the risk of re-identification of a research participant, particularly if datasets are combined. If there is a reasonable risk of re-identification the information should be handled as PII (i.e. undertake risk analysis, evaluate stronger anonymization techniques, seek informed consent for the disclosure of data and explain its possible consequences)
Don’t include vulnerable participants or communities if their ability or capacity to provide voluntary informed consent is genuinely in question
Don’t underestimate the potential of quasi or indirect identifiers to identify an individual, particularly the inherent ability of location-based data to identify participants and their communities, and the increased risk of harm this may pose to potentially vulnerable individuals/communities
Avoid seeking overly broad consent that may call into question transparency or a research participant’s understanding regarding the use of their PII, be specific regarding the activities, purpose and limitations associated with PII so that the participant can make a genuinely informed decision and downstream users can evaluate purpose compatibility and seek fresh consent if needed
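One common way to gauge the re-identification risk posed by quasi-identifiers is a k-anonymity check: count how many records share each combination of quasi-identifier values and look at the smallest group. This is a swapped-in technique for illustration, not one the guidelines name; the function names and the choice of columns are assumptions.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values.

    `rows` is a list of dicts; `quasi_identifiers` is a list of column names.
    Returns 0 for an empty dataset.
    """
    groups = Counter(
        tuple(row[column] for column in quasi_identifiers) for row in rows
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears at least k times."""
    return smallest_group(rows, quasi_identifiers) >= k
```

A smallest group of size 1 means at least one participant is unique on those columns, for example the only farmer in a village within a given age band, and could be singled out when the dataset is combined with other sources.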

PLANNING AND APPROVAL

Develop a Data Management Plan which governs the handling of PII in the research project and beyond (DMP-PII). It should address:
- the type and nature of PII
- compliance requirements (including necessary forms for obtaining consent, and ethics clearance, if applicable)
- legitimate research objectives that will be advanced by the PII
- foreseeable risks and consequences if participants are identified from the data
- privacy protection measures (or lack thereof) for collection, storage, transfer and publishing
- process for obtaining informed consent
- timeframe or trigger for archiving or deletion of PII
Employ stricter standards for research involving vulnerable populations such as children or illiterate participants or sensitive data such as ethnicity or religious beliefs
Undertake due-diligence of datasets previously collected by you or third parties to ensure you are entitled/permitted to use for your research project
Consult the legal, IRB or ethics clearance committee or any other relevant institutional group for specific institutional, local, regional or national policies and regulatory frameworks that may apply to PII in the context of your work

Don’t leave the handling of PII and privacy protection as an after-thought, plan ahead!
Don’t forget to check local laws and donor or third-party requirements in addition to institutional policies governing research ethics and privacy protection (seek expert support if unsure!)
Don’t ignore ethical practices/standards. If your institution does not have an ethics framework or clearance process in place, self-assess!
In assessing whether information is capable of identifying someone (i.e. PII) don’t limit your focus to direct identifiers, also consider indirect/quasi identifiers. Appreciate this will depend on the context of the research project, the data in question and external data which is or may become otherwise available (i.e. there is no exhaustive list).
In assessing risk of harm don’t forget to consider potential harm to the participant’s community or groups of individuals that can otherwise be identified or associated with the participant
