BERT2DAb: a pre-trained model for antibody representation based on amino acid sequences and 2D-structure

Article: 2285904 | Received 25 Jun 2023, Accepted 16 Nov 2023, Published online: 27 Nov 2023

References

  • Shan S, Luo S, Yang Z, Hong J, Su Y, Ding F, Fu L, Li C, Chen P, Ma J, et al. Deep learning guided optimization of human antibody against SARS-CoV-2 variants with broad neutralization. Proc Natl Acad Sci USA. 2022;119(11):e2122954119. doi:10.1073/pnas.2122954119.
  • de Assis RR, Jain A, Nakajima R, Jasinskas A, Felgner J, Obiero JM, Norris PJ, Stone M, Simmons G, Bagri A, et al. Analysis of SARS-CoV-2 antibodies in COVID-19 convalescent blood using a coronavirus antigen microarray. Nat Commun. 2021;12(1):6. doi:10.1038/s41467-020-20095-2.
  • Ge J, Wang R, Ju B, Zhang Q, Sun J, Chen P, Zhang S, Tian Y, Shan S, Cheng L, et al. Antibody neutralization of SARS-CoV-2 through ACE2 receptor mimicry. Nat Commun. 2021;12(1):250. doi:10.1038/s41467-020-20501-9.
  • Zhao J, Nussinov R, Wu WJ, Ma B. In silico methods in antibody design. Antibod. 2018;7(3):22. doi:10.3390/antib7030022.
  • Norman RA, Ambrosetti F, Bonvin AMJJ, Colwell LJ, Kelm S, Kumar S, Krawczyk K. Computational approaches to therapeutic antibody design: established methods and emerging trends. Brief Bioinform. 2020;21(5):1549–67. doi:10.1093/bib/bbz095.
  • Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel. 2012;25(10):507–22. doi:10.1093/protein/gzs024.
  • Liang T, Chen H, Yuan J, Jiang C, Hao Y, Wang Y, Feng Z, Xie X-Q. IsAb: a computational protocol for antibody design. Brief Bioinform. 2021;22(5):bbab143. doi:10.1093/bib/bbab143.
  • Mason DM, Friedensohn S, Weber CR, Jordi C, Wagner B, Meng SM, Ehling RA, Bonati L, Dahinden J, Gainza P, et al. Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat Biomed Eng. Published online 2021 April 15;5(6):600–12. doi:10.1038/s41551-021-00699-9.
  • Khurana S, Rawi R, Kunji K, Chuang GY, Bensmail H, Mall R, Valencia A. DeepSol: a deep learning framework for sequence-based protein solubility prediction. Bioinformat. 2018;34(15):2605–13. doi:10.1093/bioinformatics/bty166.
  • Rawi R, Mall R, Kunji K, Shen CH, Kwong PD, Chuang GY, Hancock J. PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformat. 2018;34(7):1092–98. doi:10.1093/bioinformatics/btx662.
  • Chen X, Dougherty T, Hong C, Schibler R, Zhao YC, Sadeghi R, Matasci N, Wu YC, Kerman I. Predicting antibody developability from sequence using machine learning. bioRxiv. Published online 2020 June 20. doi:10.1101/2020.06.18.159798.
  • Biswas S, Khimulya G, Alley EC, Esvelt KM, Church GM. Low-N protein engineering with data-efficient deep learning. Nat Methods. 2021;18(4):389–96. doi:10.1038/s41592-021-01100-y.
  • Alley EC, Khimulya G, Biswas S, AlQuraishi M, Church GM. Unified rational protein engineering with sequence-based deep representation learning. Nat Methods. 2019;16(12):1315–22. doi:10.1038/s41592-019-0598-1.
  • Brandes N, Ofer D, Peleg Y, Rappoport N, Linial M, Martelli PL. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformat. 2022;38(8):2102–10. doi:10.1093/bioinformatics/btac020.
  • Elnaggar A, Heinzinger M, Dallago C, Rihawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, et al. ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing. IEEE Trans Pattern Anal Mach Intell. Published online 2021:1–1. doi:10.1109/TPAMI.2021.3095381.
  • Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Sci. 2023;379(6637):1123–30. doi:10.1126/science.ade2574.
  • Murphy K, Weaver C. Janeway’s immunobiology. Garland Science. 2016.
  • Ruffolo JA, Gray JJ, Sulam J. Deciphering antibody affinity maturation with language models and weakly supervised learning. Published online 2021 December 14 [Accessed 2022 November 9]. http://arxiv.org/abs/2112.07782.
  • Olsen TH, Moal IH, Deane CM, Lengauer T. AbLang: an antibody language model for completing antibody sequences. Bioinformat Adv. 2022;2(1):vbac046. doi:10.1093/bioadv/vbac046.
  • Gao X, Cao C, Lai L. Pre-training with a rational approach for antibody. 2023. https://www.biorxiv.org/content/10.1101/2023.01.19.524683v2.abstract.
  • Ruffolo JA, Chu LS, Mahajan SP, Gray JJ. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat Commun. 2023;14(1):2389. doi:10.1038/s41467-023-38063-x.
  • Prytuliak R. Recognition of short functional motifs in protein sequences. 2018. https://edoc.ub.uni-muenchen.de/22474/.
  • Asgari E, McHardy AC, Mofrad MRK. Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecx). Sci Rep. 2019;9(1):3577. doi:10.1038/s41598-019-38746-w.
  • Totrov M, Dash C. Estimated secondary structure propensities within V1/V2 region of HIV gp120 are an important global antibody neutralization sensitivity determinant. PLoS ONE. 2014;9(4):e94002. doi:10.1371/journal.pone.0094002.
  • Roig X, Novella IS, Giralt E, Andreu D. Examining the relationship between secondary structure and antibody recognition in immunopeptides from foot-and-mouth disease virus. Lett Pept Sci. 1994;1(1):39–49. doi:10.1007/BF00132761.
  • Saini S, Agarwal M, Pradhan A, Pareek S, Singh AK, Dhawan G, Dhawan U, Kumar Y. Exploring the role of framework mutations in enabling breadth of a cross-reactive antibody (CR3022) against the SARS-CoV-2 RBD and its variants of concern. J Biomol Struct Dyn. 2023;41(6):2341–54. doi:10.1080/07391102.2022.2030800.
  • Zhang Y, Tiňo P, Leonardis A, Tang K. A survey on neural network interpretability. IEEE Trans Emerg Top Comput Intell. 2021;5(5):726–42. doi:10.1109/TETCI.2021.3100641.
  • Vig J, Madani A, Varshney LR, Xiong C, Socher R, Rajani NF. BERTology meets biology: interpreting attention in protein language models. Published online 2021 March 28 [Accessed 2022 November 10]. http://arxiv.org/abs/2006.15222.
  • Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(11):2579–2605.
  • Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, Chong L, Lee M, Lee T, Duan Y, Wang W, et al. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc Chem Res. 2000;33(12):889–97. doi:10.1021/ar000033j.
  • Gilson MK, Zhou HX. Calculation of protein-ligand binding affinities. Annu Rev Biophys Biomol Struct. 2007;36(1):21–42. doi:10.1146/annurev.biophys.36.040306.132550.
  • Wang M, Cang Z, Wei GW. A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation. Nat Mach Intell. 2020;2(2):116–23. doi:10.1038/s42256-020-0149-6.
  • Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–89. doi:10.1038/s41586-021-03819-2.
  • Al-Lazikani B, Lesk AM, Chothia C. Standard conformations for the canonical structures of immunoglobulins. J Mol Biol. 1997;273(4):927–48. doi:10.1006/jmbi.1997.1354.
  • Iuchi H, Matsutani T, Yamada K, Iwano N, Sumi S, Hosoda S, Zhao S, Fukunaga T, Hamada M. Representation learning applications in biological sequence analysis. Comput Struct Biotechnol J. 2021;19:3198–208. doi:10.1016/j.csbj.2021.05.039.
  • Leem J, Mitchell LS, Farmery JHR, Barton J, Galson JD. Deciphering the language of antibodies using self-supervised learning. Patterns. 2022;3(7):100513. doi:10.1016/j.patter.2022.100513.
  • Liberis E, Velickovic P, Sormanni P, Vendruscolo M, Lio P, Hancock J. Parapred: antibody paratope prediction using convolutional and recurrent neural networks. Bioinformat. 2018;34(17):2944–50. doi:10.1093/bioinformatics/bty305.
  • Olsen TH, Boyles F, Deane CM. Observed antibody space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. Published online 2021 October 29;31(1):141–46. doi:10.1002/pro.4205.
  • Raybould MIJ, Kovaltsuk A, Marks C, Deane CM, Wren J. CoV-AbDab: the coronavirus antibody database. Bioinformat. 2021;37(5):734–35. doi:10.1093/bioinformatics/btaa739.
  • Dong J, Yao ZJ, Zhang L, Luo F, Lin Q, Lu A-P, Chen AF, Cao D-S. PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions. J Cheminformat. 2018;10(1):16. doi:10.1186/s13321-018-0270-2.
  • Sirin S, Apgar JR, Bennett EM, Keating AE. AB‐bind: antibody binding mutational database for computational affinity predictions. Protein Sci. 2016;25(2):393–409. doi:10.1002/pro.2829.
  • Rao R, Bhattacharya N, Thomas N, Duan Y, Chen P, Canny J, Abbeel P, Song Y. Evaluating protein transfer learning with TAPE. Advances in Neural Information Processing Systems 32. 2019.
  • Ezkurdia I, Graña O, Izarzugaza JMG, Tress ML. Assessment of domain boundary predictions and the prediction of intramolecular contacts in CASP8: CASP8 domain and contact assessment. Proteins Struct Funct Bioinforma. 2009;77(S9):196–209. doi:10.1002/prot.22554.
  • Raybould MIJ, Marks C, Lewis AP, Shi J, Bujotzek A, Taddese B, Deane CM. Thera-SAbDab: the therapeutic structural antibody database. Nucleic Acids Res. 2020;48(D1):D383–D88. doi:10.1093/nar/gkz827.
  • Kotowski K, Smolarczyk T, Roterman‐Konieczna I, Stapor K. ProteinUnet—an efficient alternative to SPIDER3-single for sequence-based prediction of protein secondary structures. J Comput Chem. 2021;42(1):50–59. doi:10.1002/jcc.26432.
  • Shibata Y, Kida T, Fukamachi S, Takeda M, Shinohara A, Shinohara T, Arikawa S. Byte pair encoding: a text compression scheme that accelerates pattern matching. Published online 1999.
  • Wu Y, Schuster M, Chen Z. Google’s neural machine translation system: bridging the gap between human and machine translation. Published online 2016 October 8 [Accessed 2022 November 9]. http://arxiv.org/abs/1609.08144.
  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Advances in Neural Information Processing Systems 30. 2017.