Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features

Article: 2163584 | Received 16 Jun 2022, Accepted 26 Dec 2022, Published online: 22 Jan 2023
