Application of Deep Learning in Microbiome
Keywords: Microbiome, Deep learning, Phylogeny
Abstract: With the rapid development of high-throughput sequencing technology, massive amounts of microbial data have accumulated. Understanding these data could help reveal the relationships between microbes and diseases. However, because the data are high-dimensional, sparse, and complex, traditional machine learning methods lack sufficient learning and representational capacity. Meanwhile, the rise of deep learning makes it possible to handle these complex problems effectively. In this survey, we introduce the application of machine learning to microbial data analysis, focusing on microbial classification and feature selection tasks. In particular, we discuss the current applications and challenges of deep learning in microbial studies. Based on these discussions, we recommend that, before using deep learning to conduct microbiome-wide association studies, researchers consider prior knowledge such as phylogeny, which would improve the accuracy and interpretability of the model.
Copyright (c) 2020 Qiang Zhu, Ban Huo, Han Sun, Bojing Li, Xingpeng Jiang
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).