A machine learning strategy for the identification of key in silico descriptors and prediction models for IgG monoclonal antibody developability properties

Article: 2248671 | Received 23 Feb 2023, Accepted 11 Aug 2023, Published online: 23 Aug 2023

References

  • Bailly M, Mieczkowski C, Juan V, Metwally E, Tomazela D, Baker J, Uchida M, Kofman E, Raoufi F, Motlagh S, et al. Predicting antibody developability profiles through early stage discovery screening. MAbs. 2020;12(1):1743053. doi:10.1080/19420862.2020.1743053.
  • Raybould MIJ, Marks C, Krawczyk K, Taddese B, Nowak J, Lewis AP, Bujotzek A, Shi J, Deane CM. Five computational developability guidelines for therapeutic antibody profiling. Proc Natl Acad Sci USA. 2019;116(10):4025–30. doi:10.1073/pnas.1810576116.
  • Fogel DB. Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: A review. Contemp Clin Trials Commun. 2018;11:156–64. doi:10.1016/j.conctc.2018.08.001.
  • Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov. 2004;3(8):711–16. doi:10.1038/nrd1470.
  • Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–89. doi:10.1038/s41586-021-03819-2.
  • Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, et al. Accurate prediction of protein structures and interactions using a 3-track neural network. Science. 2021;373(6557):871–76. doi:10.1126/science.abj8754.
  • Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379(6637):1123–30. doi:10.1126/science.ade2574.
  • Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L, Craig PA, Crichlow GV, Dalenberg K, Duarte JM, et al. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res. 2023;51(D1):D488–508. doi:10.1093/nar/gkac1077.
  • The UniProt Consortium, Bateman A, Martin M-J, Orchard S, Magrane M, Agivetova R, Ahmad S, Alpi E, Bowler-Barnett EH, Britto R, et al. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–9. doi:10.1093/nar/gkaa1100.
  • Olsen TH, Boyles F, Deane CM. Observed antibody space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. 2022;31(1):141–46. doi:10.1002/pro.4205.
  • Schneider C, Raybould MIJ, Deane CM. SAbDab in the age of biotherapeutics: updates including SAbDab-nano, the nanobody structure tracker. Nucleic Acids Res. 2022;50(D1):D1368–72. doi:10.1093/nar/gkab1050.
  • Leem J, Mitchell LS, Farmery JHR, Barton J, Galson JD. Deciphering the language of antibodies using self-supervised learning. Patterns. 2022;3(7):100513. doi:10.1016/j.patter.2022.100513.
  • Khass M, Vale AM, Burrows PD, Schroeder HW. The sequences encoded by immunoglobulin diversity (DH) gene segments play key roles in controlling B-cell development, antigen-binding site diversity, and antibody production. Immunol Rev. 2018;284(1):106–19. doi:10.1111/imr.12669.
  • Prihoda D, Maamary J, Waight A, Juan V, Fayadat-Dilman L, Svozil D, Bitton DA. BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. MAbs. 2022;14:2020203. doi:10.1080/19420862.2021.2020203.
  • Olsen TH, Moal IH, Deane CM. AbLang: an antibody language model for completing antibody sequences. Bioinform Adv. 2022;2(1):vbac046. doi:10.1093/bioadv/vbac046.
  • Ruffolo JA, Gray JJ. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Biophys J. 2022;121(3):155a–56. doi:10.1016/j.bpj.2021.11.1942.
  • Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022;19(6):679–82. doi:10.1038/s41592-022-01488-1.
  • Almagro JC, Teplyakov A, Luo J, Sweet RW, Kodangattil S, Hernandez-Guzman F, Gilliland GL. Second antibody modeling assessment (AMA-II). Proteins Struct Funct Bioinforma. 2014;82(8):1553–62. doi:10.1002/prot.24567.
  • Abanades B, Georges G, Bujotzek A, Deane CM. ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics. 2022;38(7):4.
  • Abanades B, Wong WK, Boyles F, Georges G, Bujotzek A, Deane CM. ImmuneBuilder: Deep-learning models for predicting the structures of immune proteins. Commun Biol. 2023;6:1–8. doi:10.1038/s42003-023-04927-7.
  • Marks C, Deane CM. Antibody H3 structure prediction. Comput Struct Biotechnol J. 2017;15:222–31. doi:10.1016/j.csbj.2017.01.010.
  • Fernández-Quintero ML, Kokot J, Waibl F, Fischer A-L, Quoika PK, Deane CM, Liedl KR. Challenges in antibody structure prediction. MAbs. 2023;15(1):2175319. doi:10.1080/19420862.2023.2175319.
  • Molecular Operating Environment (MOE), 2022.02. Chemical Computing Group ULC, 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2023.
  • Schrödinger Release 2022-3: BioLuminate, Schrödinger, LLC, New York, NY, 2021.
  • BIOVIA, Dassault Systèmes, Discovery Studio, 2021, San Diego: Dassault Systèmes, 2021.
  • Chen T, Guestrin C. XGBoost: a scalable tree boosting system [Internet]. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016 [cited 2023 Jan 4]. p. 785–94. http://arxiv.org/abs/1603.02754
  • Introduction to Boosted trees — xgboost 2.0.0-dev documentation [Internet]. [accessed 2023 Feb 13]: https://xgboost.readthedocs.io/en/latest/tutorials/model.html
  • Ali M. PyCaret: an open source, low-code machine learning library in Python [Internet]. 2020 [cited 2023 Jan 4]. https://www.pycaret.org
  • Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–23. doi:10.1093/bioinformatics/btp163.
  • Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7(1):539. doi:10.1038/msb.2011.75.
  • Black SD, Mould DR. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem. 1991;193(1):72–82. doi:10.1016/0003-2697(91)90045-U.
  • Ferri FJ, Pudil P, Hatef M, Kittler J. Comparative study of techniques for large-scale feature selection. In: Gelsema E, Kanal L, editors. Machine intelligence and pattern recognition. North-Holland; 1994. pp. 403–13. [accessed 2023 Jul 6]. https://www.sciencedirect.com/science/article/pii/B9780444818928500407.
  • Jetha A, Thorsteinson N, Jmeian Y, Jeganathan A, Giblin P, Fransson J. Homology modeling and structure-based design improve hydrophobic interaction chromatography behavior of integrin binding antibodies. MAbs. 2018;10(6):890–900. doi:10.1080/19420862.2018.1475871.
  • Salgado JC, Rapaport I, Asenjo JA. Predicting the behaviour of proteins in hydrophobic interaction chromatography. 1: Using the hydrophobic imbalance (HI) to describe their surface amino acid distribution. J Chromatogr A. 2006;1107(1–2):110–19. doi:10.1016/j.chroma.2005.12.032.
  • de Groot NS, Pallarés I, Avilés FX, Vendrell J, Ventura S. Prediction of “hot spots” of aggregation in disease-linked polypeptides. BMC Struct Biol. 2005;5(1):18. doi:10.1186/1472-6807-5-18.
  • Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S. AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in polypeptides. BMC Bioinform. 2007;8(1):65. doi:10.1186/1471-2105-8-65.
  • Zhang C, Vasmatzis G, Cornette JL, DeLisi C. Determination of atomic desolvation energies from the structures of crystallized proteins. J Mol Biol. 1997;267(3):707–26. doi:10.1006/jmbi.1996.0859.
  • Hopp TP, Woods KR. Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci U S A. 1981;78(6):3824–28. doi:10.1073/pnas.78.6.3824.
  • Vanommeslaeghe K, MacKerell AD. CHARMM additive and polarizable force fields for biophysics and computer-aided drug design. Biochim Biophys Acta. 2015;1850(5):861–71. doi:10.1016/j.bbagen.2014.08.004.
  • Chennamsetty N, Voynov V, Kayser V, Helk B, Trout BL. Prediction of aggregation prone regions of therapeutic proteins. J Phys Chem B. 2010;114(19):6614–24. doi:10.1021/jp911706q.
  • Chennamsetty N, Voynov V, Kayser V, Helk B, Trout BL. Design of therapeutic proteins with enhanced stability. Proc Natl Acad Sci USA. 2009;106(29):11937–42. doi:10.1073/pnas.0904191106.
  • Tjong H, Zhou H-X. Prediction of protein solubility from calculation of transfer free energy. Biophys J. 2008;95(6):2601–09. doi:10.1529/biophysj.107.127746.
  • Spassov VZ, Kemmish H, Yan L. Two physics-based models for pH-dependent calculations of protein solubility. Protein Sci. 2022;31(5):e4299. doi:10.1002/pro.4299.
  • Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984;179(1):125–42. doi:10.1016/0022-2836(84)90309-7.
  • Sankar K, Krystek SR Jr, Carl SM, Day T, Maier JKX. AggScore: Prediction of aggregation-prone regions in proteins based on the distribution of surface patches. Proteins Struct Funct Bioinforma. 2018;86(11):1147–56. doi:10.1002/prot.25594.
  • Negron C, Fang J, McPherson MJ, Stine WB, McCluskey AJ. Separating clinical antibodies from repertoire antibodies, a path to in silico developability assessment. MAbs. 2022;14(1):2080628. doi:10.1080/19420862.2022.2080628.
  • Sankar K, Trainor K, Blazer LL, Adams JJ, Sidhu SS, Day T, Meiering E, Maier JKX. A descriptor set for quantitative structure-property relationship prediction in biologics. Mol Inform. 2022;41(9):2100240. doi:10.1002/minf.202100240.
  • Trainor K, Gingras Z, Shillingford C, Malakian H, Gosselin M, Lipovšek D, Meiering EM. Ensemble modeling and intracellular aggregation of an engineered immunoglobulin-like domain. J Mol Biol. 2016;428(6):1365–74. doi:10.1016/j.jmb.2016.02.016.
  • Jain T, Sun T, Durand S, Hall A, Houston NR, Nett JH, Sharkey B, Bobrowicz B, Caffry I, Yu Y, et al. Biophysical properties of the clinical-stage antibody landscape. Proc Natl Acad Sci U S A. 2017;114(5):944–49. doi:10.1073/pnas.1616408114.
  • Shehata L, Maurer DP, Wec AZ, Lilov A, Champney E, Sun T, Archambault K, Burnina I, Lynaugh H, Zhi X, et al. Affinity maturation enhances antibody specificity but compromises conformational stability. Cell Rep. 2019;28(13):3300–8.e4. doi:10.1016/j.celrep.2019.08.056.
  • Cai Z, Zafferani M, Akande OM, Hargrove AE. Quantitative structure–activity relationship (QSAR) study predicts small-molecule binding to RNA structure. J Med Chem. 2022;65(10):7262–77. doi:10.1021/acs.jmedchem.2c00254.
  • Platts JA. Theoretical prediction of hydrogen bond donor capacity. Phys Chem Chem Phys. 2000;2(5):973–80. doi:10.1039/a908853i.
  • Spassov VZ, Yan L. A pH-dependent computational approach to the effect of mutations on protein stability. J Comput Chem. 2016;37(29):2573–87. doi:10.1002/jcc.24482.
  • Spassov VZ, Yan L. A fast and accurate computational approach to protein ionization. Protein Sci. 2008;17(11):1955–70. doi:10.1110/ps.036335.108.
  • Agrawal NJ, Helk B, Kumar S, Mody N, Sathish HA, Samra HS, Buck PM, Li L, Trout BL. Computational tool for the early screening of monoclonal antibodies for their viscosities. MAbs. 2016;8(1):43–48. doi:10.1080/19420862.2015.1099773.
  • Thorsteinson N, Gunn JR, Kelly K, Long W, Labute P. Structure-based charge calculations for predicting isoelectric point, viscosity, clearance, and profiling antibody therapeutics. MAbs. 2021;13(1):1981805. doi:10.1080/19420862.2021.1981805.
  • Gainza P, Sverrisson F, Monti F, Rodolà E, Boscaini D, Bronstein MM, Correia BE. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods. 2020;17(2):184–92. doi:10.1038/s41592-019-0666-6.
  • Gainza P, Wehrle S, Van Hall-Beauvais A, Marchand A, Scheck A, Harteveld Z, Buckley S, Ni D, Tan S, Sverrisson F, et al. De novo design of protein interactions with learned surface fingerprints. Nature. 2023;617(7959):176–84. doi:10.1038/s41586-023-05993-x.
  • Roney JP, Ovchinnikov S. State-of-the-art estimation of protein model accuracy using AlphaFold. Phys Rev Lett. 2022;129(23):238101. doi:10.1103/PhysRevLett.129.238101.
  • Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72. doi:10.1038/s41592-019-0686-2.
  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
  • Xu Y, Roach W, Sun T, Jain T, Prinz B, Yu T-Y, Torrey J, Thomas J, Bobrowicz P, Vasquez M, et al. Addressing polyspecificity of antibodies selected from an in vitro yeast presentation system: a FACS-based, high-throughput selection and analytical tool. Protein Eng Des Sel. 2013;26(10):663–70. doi:10.1093/protein/gzt047.