Rapid genomic detection of aquaculture pathogens
The Inspire Challenge is an initiative that challenges partners, universities, and others to use CGIAR data to create innovative pilot projects that can scale. We look for novel approaches that democratize data-driven insights to inform local, national, regional, and global policies and applications in agriculture and food security in real time, helping people, especially smallholder farmers and producers, to lead happier and healthier lives.
This proposal was selected as a 2019 winner, with the team receiving 100,000 USD to put their ideas into practice.
Aquaculture, the farming of aquatic organisms in both coastal and inland areas, accounts for 50 percent of the fish used for food worldwide today. It is practiced both by some of the poorest farmers in developing countries and by multinational companies.
However, the development of aquaculture systems is often limited by fish diseases and by a lack of the knowledge and tools needed to identify fish pathogens, track their origin, and manage their spread.
Whole genome sequencing informs how pathogens change and move through environments, permitting implementation of evidence-based biosecurity to minimize disease impact.
Offsite sequencing services are expensive and cause prohibitive delays. The project therefore proposes combining offline supervised machine learning with the MinION portable sequencing device to provide low-cost diagnostics of fish pathogens in remote locations, enabling real-time disease investigation and data-driven management.
The project will pilot a readily deployable “lab-in-a-backpack” for pond-side identification and quantitation of pathogens affecting tilapia. Equipped with a portable DNA-extraction system, a hand-held DNA sequencer (MinION), a battery-operated minicomputer (MinIT), and an intuitive purpose-built software package, users without experience in molecular biology or bioinformatics will be able to identify fish pathogens from both water samples and infected tissues remotely and in real-time, with limited electricity and internet connectivity.
These tools will enable tilapia breeding, quarantine, and biosecurity centers, as well as academics and vets, to identify causal agents of disease outbreaks in a fraction of the time and cost required for external laboratory analysis; the project’s tests give results in hours rather than weeks or months and cost roughly 40 USD as opposed to more than 100 USD.
Step by step
The project was one of four winners of the Inspire Challenge 2019 and was awarded US$100,000 at the Convention of the CGIAR Platform for Big Data in Agriculture, held on 16–18 October 2019.
Bacterial genome sequencing and team expansion
The team has sequenced 30 bacterial genomes and welcomed a new PhD student from Bangladesh, Suvra Das.
Under the primary supervision of Associate Professor Andrew Barnes at the University of Queensland, and co-supervision of Dr Jerome Delamare-Deboutteville at WorldFish, and Dr Shaun Wilkinson from Wilderlab, Suvra will research processing methods for DNA extraction and library preparation to optimize cost and performance of field sequencing tests.
Generation of aquatic pathogen genomic typing data
The team will complete 50 bacterial genome sequences, generating two types of data:
- Highly accurate sequence data for all target aquatic pathogens, derived from long- and short-read sequencing, which will be used to build the reference training database for machine learning algorithms.
- Raw nanopore read data for model development. These data will be generated at the University of Queensland, Mahidol University / BIOTEC’s CENTEX Shrimp, and WorldFish.
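As a rough illustration of how high-accuracy reference sequences can be turned into a reference database, the sketch below keeps only k-mers that occur in exactly one reference genome (strain-specific markers) and lets reads vote for a strain. This is a hypothetical Python example, not the project's software: the function names, the k-mer approach, and the toy strain labels and sequences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Complement table for DNA; nanopore reads can come from either strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def canonical_kmers(seq: str, k: int):
    """Yield the canonical k-mer for every window of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def build_discriminating_index(references: dict[str, str], k: int) -> dict[str, str]:
    """Map each k-mer found in exactly one reference to that reference's label."""
    owners = defaultdict(set)
    for label, seq in references.items():
        for km in canonical_kmers(seq, k):
            owners[km].add(label)
    # Shared k-mers cannot tell strains apart, so they are dropped.
    return {km: next(iter(labels)) for km, labels in owners.items() if len(labels) == 1}


def classify(read: str, index: dict[str, str], k: int):
    """Return the strain whose marker k-mers the read hits most often (None if no hits)."""
    votes = Counter(index[km] for km in canonical_kmers(read, k) if km in index)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    # Toy "genomes" standing in for hypothetical high-accuracy assemblies.
    refs = {"strain_A": "ACGTACGTGG", "strain_B": "ACGTACGTCC"}
    index = build_discriminating_index(refs, k=4)
    print(classify("TACGTGG", index, 4))  # forward-strand fragment of strain_A
    print(classify("GGACG", index, 4))    # reverse-strand fragment of strain_B
```

Canonical k-mers make the index strand-independent. Exact k-mer matching is, however, sensitive to base-calling errors, which is one reason error-tolerant models are needed for raw nanopore reads.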
Optimisation of field data acquisition and upload methodology
The team will compare sample collection and processing methods to optimise cost and performance of the field sequencing workflows.
Methods for sample extraction, library preparation, and indexing will be compared to ensure that they can be completed in semi-remote locations.
Building a software environment for typing pathogens from fuzzy data
Additionally, to address the base-calling error rate of MinION sequencing (<5 percent), the team will develop a new bioinformatics software package that uses machine learning to identify fish pathogens.
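To give a sense of why an error-tolerant approach is needed, the short calculation below takes the quoted upper bound of 5 percent and assumes, for simplicity, that errors occur independently at each base. Real nanopore errors are not independent (they cluster in homopolymers and certain sequence motifs), and the 10 kb read length is purely illustrative.

```python
def expected_errors(read_length: int, error_rate: float) -> float:
    """Mean number of miscalled bases in a read, assuming independent per-base errors."""
    return read_length * error_rate


def p_error_free_kmer(k: int, error_rate: float) -> float:
    """Probability that all k bases of a k-mer are called correctly."""
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    error_rate = 0.05  # upper bound quoted in the text (<5 percent)
    print(f"expected errors in a 10 kb read: {expected_errors(10_000, error_rate):.0f}")
    for k in (11, 15, 21, 31):
        print(f"k={k:2d}: P(error-free k-mer) = {p_error_free_kmer(k, error_rate):.2f}")
```

At 5 percent, a 10 kb read carries around 500 errors on average, and only about a third of 21-mers (0.95^21 ≈ 0.34) would be read without error, so methods that demand exact matches discard much of the signal.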
Two approaches will be compared. In the first, hidden Markov models (HMMs) will be used to compare experimental data to a reference database of hierarchical regions of differentiation. The second assumes that all genomic regions carry information about strain type, so a rapid alignment method can be used to assign query samples probabilistically to the correct strain or type.
These models provide a position-specific scoring system that can account for base-calling inaccuracies and will be trained on sequences from isoclonal pathogens obtained using the MinION.
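The idea of an HMM that tolerates base-calling errors can be sketched with the forward algorithm over match, insertion, and deletion states, scoring a read against one reference region. This is a simplified, hypothetical example rather than the project's model: it builds the "profile" from a single reference sequence, and the function name and all parameter values (`sub_error`, `gap_open`, `gap_extend`) are untrained placeholders, whereas the project's models will be trained on MinION reads from isoclonal pathogens.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) over the arguments."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_log_likelihood(read: str, ref: str, sub_error: float = 0.05,
                           gap_open: float = 0.02, gap_extend: float = 0.3) -> float:
    """log P(read | ref), summed over all alignments with the forward algorithm.

    M consumes one read base and one reference position, I consumes a read
    base only (insertion error), D skips a reference position (deletion error).
    """
    n, m = len(read), len(ref)
    # Transition log-probabilities; each state's outgoing probabilities sum to 1.
    t_mm, t_mgap = math.log(1 - 2 * gap_open), math.log(gap_open)
    t_ext, t_close = math.log(gap_extend), math.log(1 - gap_extend)
    # Emissions: a wrong call is spread evenly over the 3 incorrect bases.
    e_match, e_mismatch = math.log(1 - sub_error), math.log(sub_error / 3)
    e_insert = math.log(0.25)

    fm = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    fi = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    fd = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    fm[0][0] = 0.0  # begin state acts as a match state with probability 1

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = e_match if read[i - 1] == ref[j - 1] else e_mismatch
                fm[i][j] = emit + logsumexp(fm[i - 1][j - 1] + t_mm,
                                            fi[i - 1][j - 1] + t_close,
                                            fd[i - 1][j - 1] + t_close)
            if i > 0:
                fi[i][j] = e_insert + logsumexp(fm[i - 1][j] + t_mgap,
                                                fi[i - 1][j] + t_ext)
            if j > 0:
                fd[i][j] = logsumexp(fm[i][j - 1] + t_mgap,
                                     fd[i][j - 1] + t_ext)
    return logsumexp(fm[n][m], fi[n][m], fd[n][m])


if __name__ == "__main__":
    ref_a = "ACGTTGCAAGTCCGATGCAT"
    ref_b = "TGCAACGTTCAGGATCCTAG"
    read = "ACGTTGCAGTCCGATGCAT"  # ref_a with one base deleted
    print(forward_log_likelihood(read, ref_a), forward_log_likelihood(read, ref_b))
```

Because every alignment path contributes, a read with a deletion still scores far higher against its true reference than against an unrelated one, rather than failing an exact match.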
The team's pipeline is already under development and will be made publicly available as an open-source R package and R Shiny GUI on GitHub and the Comprehensive R Archive Network (CRAN) once satisfactory reference benchmarking has been completed.
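The second approach, binning query samples probabilistically by rapid alignment, can be sketched as converting per-reference alignment distances into posterior probabilities with Bayes' rule. This hypothetical Python example is a stand-in, not the project's method: it uses plain Levenshtein distance as the "alignment", treats each edit as one independent base-calling error, assumes each read spans a single short marker region, and assumes all reads in a sample come from one strain. The function names and toy sequences are illustrative only.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def read_log_likelihood(read: str, ref: str, error_rate: float = 0.05) -> float:
    """Crude log P(read | ref): each edit counts as one independent error."""
    d = edit_distance(read, ref)
    n = max(len(read), len(ref))  # d <= n, so (n - d) is never negative
    return d * math.log(error_rate) + (n - d) * math.log(1 - error_rate)


def strain_posterior(reads, references, error_rate=0.05, prior=None):
    """Posterior probability of each reference strain given all reads."""
    labels = list(references)
    if prior is None:
        prior = {label: 1 / len(labels) for label in labels}  # uniform prior
    log_post = {
        label: math.log(prior[label])
        + sum(read_log_likelihood(r, references[label], error_rate) for r in reads)
        for label in labels
    }
    top = max(log_post.values())  # subtract the max to avoid underflow
    z = sum(math.exp(v - top) for v in log_post.values())
    return {label: math.exp(v - top) / z for label, v in log_post.items()}


if __name__ == "__main__":
    refs = {"strain_A": "ACGTTGCAAGTCCGATGCAT", "strain_B": "TGCAACGTTCAGGATCCTAG"}
    reads = ["ACGTTGCAGTCCGATGCAT", "ACGTTGCAAGTCCGTTGCAT"]  # noisy strain_A reads
    print(strain_posterior(reads, refs))
```

Reporting a posterior rather than a single hard call lets users see how confident an assignment is, and evidence accumulates as more reads arrive during a sequencing run.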
Developing a manual and training modules for field samplers
The team will build training modules for field-based samplers, composed of factsheets, short video tutorials, and easy-to-follow protocols for using the end-user software interface for data outputs.
The manual will cover the entire process from biological sample collection to performing a sequencing run for analysis.
Mid-scale field deployment and testing
Community engagement sessions will be organized with farmers and health specialists in Bangladesh and/or Malaysia to promote the benefits of the technology.
Field sampling kits and the data upload interface will be deployed on a farm experiencing an outbreak of a known bacterial disease, to showcase the technology and answer a defined epidemiological research question.