Rapid genomic detection of aquaculture pathogens
The Inspire Challenge is an initiative that challenges partners, universities, and others to use CGIAR data to create innovative pilot projects that will scale. We look for novel approaches that democratize data-driven insights to inform local, national, regional, and global policies and applications in agriculture and food security in real time, helping people, especially smallholder farmers and producers, to lead happier and healthier lives.
This proposal was selected as a 2019 winner, with the team receiving 100,000 USD to put their ideas into practice.
Aquaculture is the world’s fastest-growing food sector and is increasingly recognized for its potential to alleviate poverty and hunger in small-scale systems. However, progress is limited by disease and by a lack of knowledge and tools to identify fish pathogens, track their origin and manage their spread. Whole genome sequencing shows how pathogens change and move through environments, permitting the implementation of evidence-based biosecurity to minimize disease impact. Offsite sequencing services, however, are expensive and cause prohibitive delays. The project proposes pairing offline supervised machine learning with the MinION portable sequencing device for low-cost diagnostics of fish pathogens in remote locations, allowing real-time disease investigation and data-driven management.
“Rapid genomic detection of aquaculture pathogens” will pilot a readily deployable “lab-in-a-backpack” for pond-side identification and quantitation of pathogens affecting tilapia. Equipped with a portable DNA-extraction system, a hand-held DNA sequencer (MinION), a battery-operated minicomputer (MinIT) and an intuitive purpose-built software package, users without experience in molecular biology or bioinformatics will be able to identify fish pathogens from both water samples and infected tissues, remotely and in real time, with limited electricity and internet connectivity. Causal agents of disease outbreaks will be identified at a fraction of the time and cost required for external laboratory analysis (hours vs. weeks/months; $40 vs. >$100/sample). Targeted end-users are tilapia breeding/quarantine/biosecurity centers, academics, vets and SME vaccine producers.
The primary drawback of the Oxford Nanopore Technologies (ONT) sequencing technology behind the MinION is its base-calling error rate (<5%), which limits its applicability for the accurate pathogen identification that veterinary epidemiology requires. To address this shortfall, we will develop a new bioinformatics software package that leverages supervised machine learning to identify fish pathogens using hidden Markov model (HMM) profiles. These models provide a position-specific scoring system that can account for base-calling inaccuracies, and they will be trained on sequences from isoclonal pathogens obtained using the MinION. This will enable pathovar classification of new sequences derived from target pathogens in spite of the high error rates associated with the MinION. The pipeline is already under development and will be made publicly available as an open-source R package and R Shiny GUI on GitHub and the Comprehensive R Archive Network (CRAN) once satisfactory reference benchmarking is complete.
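To illustrate why position-specific scoring tolerates base-calling errors, here is a minimal sketch in Python. It is not the project's R package, and it simplifies deliberately: it builds a gap-free position-specific log-odds profile from pre-aligned reads, whereas a full profile HMM would add insert and delete states to handle the insertions and deletions that are also common in nanopore reads. The sequences, the pathovar names and the function names (build_profile, score, classify) are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
BACKGROUND = 1.0 / len(ALPHABET)  # assumed uniform base composition


def build_profile(aligned_seqs, pseudocount=1.0):
    """Build per-position log-odds scores from equal-length aligned reads."""
    length = len(aligned_seqs[0])
    total = len(aligned_seqs) + pseudocount * len(ALPHABET)
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        # Pseudocounts keep unseen bases from scoring minus infinity,
        # so a single miscalled base only lowers the total score.
        column = {
            base: math.log(((counts[base] + pseudocount) / total) / BACKGROUND)
            for base in ALPHABET
        }
        profile.append(column)
    return profile


def score(profile, read):
    """Sum the position-specific log-odds of each base in the read."""
    return sum(column[base] for column, base in zip(profile, read))


def classify(profiles, read):
    """Return the name of the profile that gives the read the highest score."""
    return max(profiles, key=lambda name: score(profiles[name], read))


# Hypothetical training reads for two made-up pathovars (illustration only).
training = {
    "pathovar_A": ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACGTCC"],
    "pathovar_B": ["TTGCACCTGA", "TTGCACCTGA", "TTGCTCCTGA", "TTGCACCAGA"],
}
profiles = {name: build_profile(seqs) for name, seqs in training.items()}

# A read matching the pathovar_A consensus except for one substitution error.
noisy_read = "ACGTACCTAC"
best = classify(profiles, noisy_read)
print(best)
```

Because the score is accumulated over every position, one erroneous base costs only a small penalty at that position, and the read is still assigned to the profile it matches best overall.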
Step by step
The project was one of five winners of the Inspire Challenge 2019 and was awarded US$100K at the inaugural annual convention of the CGIAR Platform for Big Data in Agriculture, held 16-18 October.