2018 Winner

Machine learning for smarter seed selection

The Inspire Challenge is an initiative that challenges partners, universities, and others to use CGIAR data to create innovative pilot projects that will scale. We look for novel approaches that democratize data-driven insights to inform local, national, regional, and global policies and applications in agriculture and food security in real time, helping people, especially smallholder farmers and producers, to lead happier and healthier lives.

This proposal was selected as a 2018 winner, with the team receiving 100,000 USD to put their ideas into practice.

Machine learning for smarter seed selection

Each year, farmers around the world decide what to plant in their fields. The seeds they choose matter greatly: livelihoods and food security, for the farmer and for others linked by the global food system, are at stake. While some seed varieties have the potential to produce record-breaking yields, they also carry risk and depend on specific conditions. Other seed varieties have more stable but comparatively lower yields.

Using machine learning, researchers can predict both yields and risks associated with different seeds at a specific farm and select a mixture of varieties that represents the optimal trade-off. Using data from hundreds of on-farm and experimental International Maize and Wheat Improvement Center (CIMMYT) sites, as well as a network of seed companies producing varieties for diverse agro-ecologies, BioSense will develop machine learning models that predict the performance of seed varieties in particular conditions in order to advise maize farmers in Mexico on what to plant.

The project benefits seed companies and farmers alike. Seed companies can ensure that they sell the best seed variety possible for a specific region, which reduces the risk of marketing the wrong type of seed to the target population. In turn, this reduces farmers’ risk of low crop yield or crop failure. It also minimizes risk for government programs that subsidize maize seed to support smallholder farmers. As smallholder farmers’ production increases, so will Mexico’s maize self-sufficiency.

Finally, the project will serve as a decision support tool. Using climate prediction data, seed companies and farmers can plan for the use of seeds best suited to changing future climates. This service will be provided for free to the seed companies involved with the project, most of which are small and otherwise lack the capital needed to access decision support services.

Team

Dr Sanja Brdar
Head of Knowledge Technologies Group at BioSense Institute

Marko Panić
Machine Learning Expert at BioSense Institute

Oskar Marko
Data Analytics Expert at BioSense Institute

Gordan Mimić
Weather and Climate Expert at BioSense Institute

Dr Kai Sonder
Head of GIS Unit at CIMMYT

Dr Alberto Chassaigne
Maize seed systems specialist for Latin America at CIMMYT

Project Partners

CIMMYT provides maize growth and yield data from hundreds of on-farm and experimental sites.

BioSense is developing machine learning models that predict the performance of maize seed varieties in particular conditions through the use of deep learning, big data analytics, and satellite data processing.

Step by step



October 2018

US$100K grant

The project was one of five winners of the Inspire Challenge 2018 and was awarded US$100K at the second annual convention of the CGIAR Platform for Big Data in Agriculture, 3-5 October 2018.



February 2019

Database formed

The team created a database with all the necessary data about maize production in Mexico. It includes satellite images, performance data, and weather information.

Example of satellite image data with normalized difference vegetation index calculated.
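For reference, the normalized difference vegetation index (NDVI) is computed per pixel from the near-infrared and red reflectance bands as (NIR - Red) / (NIR + Red). The sketch below is a minimal Python illustration of that formula; the function names and the sample reflectance values are assumptions made for this example and are not the team's code or data.

```python
from typing import List, Optional


def ndvi(nir: float, red: float) -> Optional[float]:
    """Return NDVI for one pixel, or None when both bands are zero."""
    total = nir + red
    if total == 0:
        return None  # the index is undefined for a pixel with no signal
    return (nir - red) / total


def ndvi_grid(
    nir_band: List[List[float]], red_band: List[List[float]]
) -> List[List[Optional[float]]]:
    """Apply the NDVI formula pixel by pixel to two equally sized bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2x2 reflectance values, for illustration only.
nir_band = [[0.50, 0.40], [0.30, 0.0]]
red_band = [[0.10, 0.20], [0.30, 0.0]]
grid = ndvi_grid(nir_band, red_band)
```

Values close to 1 indicate dense green vegetation, while values near 0 or below indicate bare soil, water, or cloud.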



April 2019

Digitisation of field records

Hundreds of samples on maize growth and agricultural practices from CIMMYT’s test fields were integrated into the database.

Samples of new maize varieties were collected from 126 evaluation sites in 2017 and from 115 in 2018.

Project partner BioSense used machine learning techniques to determine which of the maize varieties would work best at any of these sites, comparing climate, soils, and other parameters. If the developed algorithm is successful, the team hopes to use it in other countries in order to give a variety of recommendations to farmers in specific locations.

CIMMYT’s maize testing sites from which data was collected for the team’s Mexico maize production database.



April 2019

Dataset enrichment

The team developed modules for automatic acquisition of soil data, historical weather data, and satellite vegetation indices based on the GPS location.



April 2019

Feature engineering

The team then engineered meteorological and satellite features that are good indicators of yield, based on domain knowledge.
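The page does not say which features the team built. As one hedged illustration of a domain-driven meteorological feature, the sketch below computes growing degree days (GDD), a widely used heat-accumulation measure for maize, together with total rainfall. The 10 °C base and 30 °C cap are a common convention assumed here, and all names and sample values are hypothetical.

```python
from typing import Iterable, Tuple


def growing_degree_days(
    daily_temps_c: Iterable[Tuple[float, float]],
    base_c: float = 10.0,  # assumed lower threshold for maize development
    cap_c: float = 30.0,   # assumed upper threshold
) -> float:
    """Sum growing degree days over (t_min, t_max) pairs in degrees Celsius."""
    total = 0.0
    for t_min, t_max in daily_temps_c:
        # Clamp both readings into [base_c, cap_c] before averaging.
        t_min = min(max(t_min, base_c), cap_c)
        t_max = min(max(t_max, base_c), cap_c)
        total += (t_min + t_max) / 2.0 - base_c
    return total


def total_rainfall(daily_rain_mm: Iterable[float]) -> float:
    """Accumulate daily rainfall in millimetres."""
    return sum(daily_rain_mm)


# Hypothetical three-day weather record, for illustration only.
temps = [(12.0, 28.0), (8.0, 34.0), (5.0, 9.0)]
rain = [0.0, 12.5, 3.0]
features = {
    "gdd": growing_degree_days(temps),
    "rain_mm": total_rainfall(rain),
}
```

Features of this kind condense a season of daily observations into a handful of numbers per site, which is the form most learning algorithms expect.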

April 2019

Data fusion

The team fused data coming from heterogeneous sources into a single database suitable for machine learning.
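A minimal sketch of what such a fusion step can look like is given below, assuming each source is keyed by a shared site identifier. The function name, field names, and values are hypothetical; the actual database schema is not described on this page.

```python
from typing import Dict, List


def fuse_sources(*sources: Dict[str, Dict[str, float]]) -> List[Dict[str, object]]:
    """Inner-join per-site records from several sources into flat rows."""
    if not sources:
        return []
    # Keep only sites that are present in every source.
    common = set(sources[0])
    for source in sources[1:]:
        common &= set(source)
    rows = []
    for site_id in sorted(common):
        row: Dict[str, object] = {"site_id": site_id}
        for source in sources:
            row.update(source[site_id])
        rows.append(row)
    return rows


# Hypothetical per-site records from three sources, for illustration only.
trials = {"site_a": {"yield_t_ha": 6.2}, "site_b": {"yield_t_ha": 4.8}}
weather = {
    "site_a": {"season_rain_mm": 540.0},
    "site_b": {"season_rain_mm": 410.0},
    "site_c": {"season_rain_mm": 300.0},  # no trial data, so it is dropped
}
satellite = {"site_a": {"peak_ndvi": 0.81}, "site_b": {"peak_ndvi": 0.67}}

table = fuse_sources(trials, weather, satellite)
```

An inner join is used here so that every row is complete; a real pipeline might instead keep partial rows and impute the missing values.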



May - August 2019

Machine learning

A data-driven yield prediction model based on soil and climate data will be developed, along with a model for assessing yield stability, that is, the hybrids’ response to drought and other factors.
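The team's models are not published on this page. To make the idea of a data-driven yield model concrete, the sketch below fits the simplest possible stand-in, an ordinary least-squares line relating one climate feature to yield. It is a deliberately swapped-in technique for illustration, and the data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x: float, slope: float, intercept: float) -> float:
    """Predict yield for a new value of the feature."""
    return slope * x + intercept


# Hypothetical seasonal rainfall (mm) and yield (t/ha), for illustration only.
rain_mm = [300.0, 400.0, 500.0, 600.0]
yield_t_ha = [3.0, 4.0, 5.0, 6.0]

slope, intercept = fit_line(rain_mm, yield_t_ha)
expected_yield = predict(450.0, slope, intercept)
```

A production model would use many features per site and a more flexible learner, but the workflow of fitting on observed trials and predicting for a new site is the same.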

August - September 2019

Development of seed selection algorithm

A module for finding the portfolio of hybrids with the optimal yield/risk trade-off at a particular farm will be developed. Fertiliser application will be optimised using machine learning and evolutionary algorithms.
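One common way to formalise a yield/risk trade-off is a mean-variance score: expected yield minus a penalty proportional to yield variance. The sketch below searches a coarse grid of planting shares for the mix that maximises that score. It is an assumption about how such a module could work, not the team's algorithm, and the hybrid names, yields, and `risk_aversion` parameter are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def portfolio_stats(
    weights: Dict[str, float], yields: Dict[str, List[float]]
) -> Tuple[float, float]:
    """Mean and variance of the blended yield across past seasons."""
    names = list(weights)
    seasons = len(yields[names[0]])
    blended = [
        sum(weights[h] * yields[h][t] for h in names) for t in range(seasons)
    ]
    return mean(blended), pvariance(blended)


def best_portfolio(
    yields: Dict[str, List[float]], risk_aversion: float = 1.0, steps: int = 10
) -> Tuple[Dict[str, float], float, float]:
    """Grid-search planting shares that maximise mean - risk_aversion * variance."""
    names = sorted(yields)
    best_score = None
    best = None
    for combo in product(range(steps + 1), repeat=len(names)):
        if sum(combo) != steps:
            continue  # shares must add up to 100 %
        weights = {h: c / steps for h, c in zip(names, combo)}
        avg, var = portfolio_stats(weights, yields)
        score = avg - risk_aversion * var
        if best_score is None or score > best_score:
            best_score, best = score, (weights, avg, var)
    return best


# Hypothetical yields (t/ha) over four seasons, for illustration only.
history = {
    "high_risk": [9.0, 3.0, 9.0, 3.0],  # higher mean, large swings
    "stable": [5.0, 5.0, 5.0, 5.0],     # lower mean, no swings
}
cautious_weights, cautious_mean, cautious_var = best_portfolio(history, risk_aversion=1.0)
bold_weights, _, _ = best_portfolio(history, risk_aversion=0.0)
```

With these numbers a risk-averse farmer is steered mostly toward the stable hybrid, while a risk-neutral one plants only the high-yielding hybrid. An exhaustive grid only scales to a handful of hybrids; larger problems call for a proper optimiser.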

October 2019

Evaluation

The system will be thoroughly tested and its performance will be evaluated. The team will then assess its potential for scaling up on a global level.



Stay tuned for more updates!

Project News and Resources

Machine learning for smarter seed selection to reduce risks for Mexican maize farmers

Published on maize.org. The International Maize and Wheat Improvement Center (CIMMYT) and BioSense Institute jointly won the CGIAR Platform for Big Data in Agriculture Inspire ...

Drones fly in to change the way we work in research fields

Published on https://www.icrisat.org The future of drones in agriculture is a subject undergoing intense study. In line with it a ...

Now, get critical seed data in one click

Published on https://www.icrisat.org A modern digital seed ‘catalog’ and seed ‘roadmap’ tool is now available for information about the quality and ...

Multinational training workshop on analytical tools builds momentum on crop improvement

Published on https://www.icrisat.org 24 Dec 2018 It is a really exciting time for crop improvement with new tools available and ...

Digital agriculture holds great promise for agricultural development

By Dr Anthony Whitbread Published on https://www.icrisat.org 30 Nov 2018 ICT4D is a one-of-its-kind conference that explores the use of ...

Meet all the Winners

Inspire Winner 2019

Gamifying weather forecasting: “Let it rain” campaign

Inspire Winner 2019

Hungry cities: Inclusive food markets in Africa

Inspire Winner 2019

Rapid genomic detection of aquaculture pathogens

Inspire Winner 2019

Real-time East Africa live groundwater use database

Revealing informal food flows through free WiFi

Inspire Winner 2018

Machine learning for smarter seed selection

Seeing is believing – Using smartphone camera data

Inspire Winner 2018

CubicA: The new farmer advisory app

An integrated data pipeline for smallscale fisheries

MARPLE: Real time diagnostics for devastating wheat rust

Farm.ink: Analysing livestock social media data for farmer chatbot

Using commercial microwave links (CMLs) to estimate rainfalls

PlantVillage Nuru: Pest and disease monitoring using artificial intelligence

Inspire Winner 2017

Using IVR to connect farmers to market

AgroFIMS: Your new companion for easy standardization of data collection and description

The Agronomy Field Information Management System (AgroFIMS) allows users to create fieldbooks for collecting agronomic data that is already tied to a metadata standard (the CG Core Metadata Schema, aligned with the Dublin Core standard) and to semantic standards such as the Agronomy Ontology (AgrO), generating data that is Findable, Accessible, Interoperable, and Reusable (FAIR) at collection. AgroFIMS therefore standardizes data collection and description for easy aggregation and inter-linking across disparate datasets. The fieldbooks you create can be exported to the Android-based KDSmart data collection application, and the collected data can be imported back into AgroFIMS for statistical analysis and reports. In 2021, AgroFIMS will allow you to set up agronomic survey questionnaires for data collection via ODK. It will also allow easy upload of your “born FAIR” data to Dataverse repository platforms with Dublin Core-compliant metadata schemas. Funding for AgroFIMS was provided by the Bill and Melinda Gates Foundation’s Open Access, Open Data Initiative, and the CGIAR Platform for Big Data in Agriculture. AgroFIMS is released under the GPL license.

Responsible Data Management Guidelines to protect privacy

The CGIAR Platform for Big Data in Agriculture advocates open data for agricultural research for development. It considers that opening up research data for scrutiny and reuse confers significant benefits to society.

However, the Platform appreciates that not all research data can be open and that a broad range of legitimate circumstances may require data to be restricted.

As an integral component of its advocacy for open data, the Platform promotes responsible data management throughout the entire research data lifecycle: planning, collecting, storing, disclosing or publishing, transferring, discovery, and archiving.

These guidelines draw on a review of best and emerging practices across various sectors in the fast-changing landscape of privacy and ethics (130 external resources) and on privacy and ethics materials sourced from seven CGIAR centers. A first draft was circulated across CGIAR, and the input and feedback received were incorporated into this edition. This is an evolving document; the next stage is to consult externally for further input.

These Guidelines are intended to help agricultural researchers handle privacy and personally identifiable information (PII) throughout the research project data lifecycle.


REUSE / TRANSFER

Ensure consistency with the DMP-PII and the purpose for which prior informed consent has been obtained
Re-evaluate the likelihood of (re-)identification and risk of harm, particularly if it involves a public dataset containing PII (as above)
Ensure PII is stored securely to protect privacy (as above)
Minimize use of PII and risk of disclosure through pro-privacy access controls and analytical tools (as above)

Don’t transfer data containing PII unless you have explicit consent
Don’t transfer data containing PII in the absence of a data sharing agreement (identifying aspects such as purpose and scope of use, privacy protection measures, confidentiality and any limitations)
Don’t reuse or transfer PII until any inconsistencies with the DMP-PII and/or purpose compatibility have been resolved (e.g. through updated ethics review or consent from participant)

ARCHIVING / DISCARDING

Plan for archiving or data destruction early in the process. Destroying data can be more secure; however, archiving can be beneficial if the data has ongoing evidentiary, scientific or cultural value. If archiving, identify where and how, and the budget required
Ensure DMP-PII and purpose compatibility (as above)
Ensure adequate security measures to protect privacy (as above)

Don’t wait until the end of the project to assess archiving needs when time and resources may be limited
Don’t assume the longevity of a particular format; future-proof your archived data
Don’t forget to budget for archiving data; this should be done as part of your Data Management Plan

PUBLISHING AND DISCOVERY

Ensure DMP-PII and purpose compatibility (as above)
Re-evaluate the likelihood of (re-)identification and risk of harm, particularly if it involves a public dataset containing PII
Indicate in metadata the availability of raw data or minimized data containing PII, if available bilaterally
Minimize use of PII and risk of disclosure through pro-privacy access controls and analytical tools

Don’t include PII in public datasets unless absolutely necessary to preserve the data’s analytic potential, scientific utility or benefit to the participant (and subject to participants’ informed consent and a rigorous risk assessment)

STORAGE AND ANALYSIS

Ensure compatibility with the DMP-PII (as above) and also the purpose for which prior informed consent has been obtained

Ensure PII is stored securely to protect privacy, through organizational or project specific safeguards to prevent unauthorized access, accidental disclosure or breach of data (physical & technical)

encryption for the storage and transmission of PII
access control measures to limit access to PII
two-factor or multifactor authentication
cloud services & back-end security

Don’t store data in unsecured locations or on unsecured devices or servers

Don’t store encrypted data and encryption keys in locations where they can be easily accessed simultaneously

Don’t underestimate the importance and value of administrative safeguards to standardize practices (i.e. organizational policies, procedures and maintenance of security measures that are designed to protect private information, data and access)

COLLECTION

Ensure compatibility with the DMP-PII
De-identify data to anonymize by default unless it will impair the data’s analytic potential, scientific utility or benefit to the participant
If you cannot anonymize, minimize the PII and pseudonymize to reduce the disclosure risk
Provide research participants sufficient information to use reasoned judgment to decide whether or not they wish to participate in the project
Ensure informed consent is designed to address the following elements:
- competence, comprehension, full disclosure, voluntariness
- legitimate scientific purpose for which the PII is collected and scope of use (e.g. stored, transferred, published and whether as anonymized, minimized or raw data)
- foreseeable risk of privacy loss and consequences
- meaningful alternatives including opt-in protection/anonymization
- safeguards to protect privacy, conditions on which PII may be shared and any limitations on reuse or third-party access and use of PII
- permission to follow up or contact the participant and for what purpose (including by third parties)
- participant’s right to withdraw and rights regarding their data (e.g. to be informed; to access; to rectify; to object; to erase)
- inclusion of physical, phone and/or electronic contact details (at least two forms of contact) that the participant can use to exercise his/her rights
- explicit consent and participant’s acknowledgement of understanding
- if written, provide the participant a copy of the processed informed consent
Use plain language and adapt informed consent to meet the needs of vulnerable populations (e.g. obtain orally or in local language)
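As an illustration of the advice above to minimize and pseudonymize PII, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256 from the Python standard library) and drops every field that is not explicitly needed. It is a minimal example under stated assumptions, not a prescribed CGIAR procedure: the field names, the record, and the key handling are hypothetical, and a real project would store the key separately from the data.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_record(
    record: Dict[str, object],
    secret_key: bytes,
    id_field: str,
    keep_fields: Iterable[str],
) -> Dict[str, object]:
    """Keep only the listed fields and swap the identifier for a pseudonym."""
    reduced = {k: record[k] for k in keep_fields if k in record}
    reduced["participant_id"] = pseudonymize(str(record[id_field]), secret_key)
    return reduced


# Hypothetical survey record and key, for illustration only.
key = b"store-this-key-separately-from-the-data"
raw = {
    "farmer_id": "F-0042",
    "name": "Example Farmer",
    "phone": "000-0000",
    "yield_t_ha": 5.1,
}
safe = minimize_record(raw, key, id_field="farmer_id", keep_fields=["yield_t_ha"])
```

The same identifier always maps to the same pseudonym under a given key, so records can still be linked across datasets. Pseudonymized data remains PII for as long as the key exists, because whoever holds the key can reverse the mapping by recomputing the hashes.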

Don’t collect PII unless you have a Data Management Plan and any necessary approvals in place, including the recorded approval of the potential participant
Don’t collect PII unless you absolutely need it
Don’t assume that removal of direct identifiers is sufficient to anonymize data or that all de-identification techniques will result in anonymized data. Consider the risk of re-identification of a research participant, particularly if datasets are combined. If there is a reasonable risk of re-identification the information should be handled as PII (i.e. undertake risk analysis, evaluate stronger anonymization techniques, seek informed consent for the disclosure of data and explain its possible consequences)
Don’t include vulnerable participants or communities if their ability or capacity to provide voluntary informed consent is genuinely in question
Don’t underestimate the potential of quasi or indirect identifiers to identify an individual, particularly the inherent ability of location-based data to identify participants and their communities, and the increased risk of harm this may pose to potentially vulnerable individuals/communities
Avoid seeking overly broad consent that may call into question transparency or a research participant’s understanding regarding the use of their PII; be specific regarding the activities, purpose and limitations associated with PII so that the participant can make a genuinely informed decision and downstream users can evaluate purpose compatibility and seek fresh consent if needed
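Because location data can identify participants, one simple mitigation is to reduce coordinate precision before data is shared. The sketch below rounds latitude and longitude to a chosen number of decimal places; two decimals correspond to roughly 1.1 km in latitude. The function name and sample coordinates are assumptions for this example, and rounding alone does not guarantee anonymity, especially in sparsely populated areas.

```python
from typing import Tuple


def coarsen_location(lat: float, lon: float, decimals: int = 2) -> Tuple[float, float]:
    """Round coordinates to reduce the precision of a recorded location."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    return round(lat, decimals), round(lon, decimals)


# Hypothetical coordinates, for illustration only.
precise = (19.432608, -99.133209)
shared = coarsen_location(*precise)
```

The appropriate precision depends on the risk assessment for the dataset; analyses that need field-level location may require restricted access rather than public release.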

PLANNING AND APPROVAL

Develop a Data Management Plan which governs the handling of PII in the research project and beyond (DMP-PII). It should address:
- the type and nature of PII
- compliance requirements (including necessary forms for obtaining consent, and ethics clearance, if applicable)
- legitimate research objectives that will be advanced by the PII
- foreseeable risks and consequences if participants are identified from the data
- privacy protection measures (or lack thereof) for collection, storage, transfer and publishing
- process for obtaining informed consent
- timeframe or trigger for archiving or deletion of PII
Employ stricter standards for research involving vulnerable populations such as children or illiterate participants or sensitive data such as ethnicity or religious beliefs
Undertake due diligence on datasets previously collected by you or third parties to ensure you are entitled/permitted to use them for your research project
Consult the legal, IRB or ethics clearance committee or any other relevant institutional group for specific institutional, local, regional or national policies and regulatory frameworks that may apply to PII in the context of your work

Don’t leave the handling of PII and privacy protection as an afterthought; plan ahead!
Don’t forget to check local laws and donor or third-party requirements in addition to institutional policies governing research ethics and privacy protection (seek expert support if unsure!)
Don’t ignore ethical practices/standards; if your institution does not have an ethics framework or clearance process in place, self-assess!
In assessing whether information is capable of identifying someone (i.e. PII), don’t limit your focus to direct identifiers; also consider indirect/quasi identifiers. Appreciate that this will depend on the context of the research project, the data in question and external data which is or may become otherwise available (i.e. there is no exhaustive list).
In assessing risk of harm don’t forget to consider potential harm to the participant’s community or groups of individuals that can otherwise be identified or associated with the participant
