ORIGINAL ARTICLE

Mutation prediction and phylogenetic analysis of SARS-CoV2 protein sequences using LSTM based encoder-decoder model

Pages 103-121 | Received 06 Oct 2022, Accepted 03 Mar 2023, Published online: 23 Mar 2023
