Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features

Ameya Harmalkar (a), Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, USA

https://orcid.org/0000-0001-6863-9634

Roshan Rao (b), Electrical Engineering and Computer Science, University of California, Berkeley, CA, USA

https://orcid.org/0000-0003-4412-3742

Yuxuan Richard Xie (c), Department of Bioengineering and Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA

https://orcid.org/0000-0003-1664-9114

Jonas Honer (d), Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany

Wibke Deisting (d), Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany

Jonas Anlahr (d), Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany

Anja Hoenig (d), Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany

Julia Czwikla (d), Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany

https://orcid.org/0000-0001-7856-789X

Eva Sienz-Widmann (d), Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany

Doris Rau (d), Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany

Austin J. Rice (e), Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA

https://orcid.org/0000-0002-4165-4241

Timothy P. Riley (e), Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA

Danqing Li (e), Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA

Hannah B. Catterall (e), Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA

Christine E. Tinberg (f), Therapeutic Discovery, Amgen Research, Amgen Inc, South San Francisco, CA, USA

https://orcid.org/0000-0002-6179-0435

Jeffrey J. Gray (a), Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, USA

https://orcid.org/0000-0001-6380-2324

Kathy Y. Wei (f), Therapeutic Discovery, Amgen Research, Amgen Inc, South San Francisco, CA, USA. Correspondence: [email protected]

https://orcid.org/0000-0002-8794-1385

ABSTRACT

Over the last three decades, the appeal of monoclonal antibodies (mAbs) as therapeutics has steadily increased, as is evident from the FDA's recent landmark approval of the 100th mAb. Unlike mAbs that bind to single targets, multispecific biologics (msAbs) have garnered particular interest owing to the advantage of engaging distinct targets. One important modular component of msAbs is the single-chain variable fragment (scFv). Despite the exquisite specificity and affinity of these scFv modules, their relatively poor thermostability often hampers their development as potential therapeutic drugs. In recent years, engineering antibody sequences to enhance their stability through mutations has gained considerable momentum. As experimental methods for antibody engineering are time-intensive, laborious, and expensive, computational methods serve as a fast and inexpensive alternative to conventional routes. In this work, we show two machine learning approaches to better classify thermostable scFv variants from sequence: first, pre-trained language models (PTLM) that capture the functional effects of sequence variation, and second, a supervised convolutional neural network (CNN) trained with Rosetta energetic features. Both of these models are trained on temperature-specific data (TS50 measurements) derived from multiple libraries of scFv sequences. On out-of-distribution sequences (that is, sequences the algorithm has not seen during training), we show that a sufficiently simple CNN model performs better than general pre-trained language models trained on diverse protein sequences (average Spearman correlation coefficient, $\rho$, of 0.4 as opposed to 0.15). On the other hand, an antibody-specific language model performs comparatively better than the CNN model on the same task ($\rho = 0.52$). Further, we demonstrate that for an independent mAb with available thermal melting temperatures for 20 experimentally characterized thermostable mutations, these models trained on TS50 data could identify 18 residue positions and 5 identical amino-acid mutations, showing remarkable generalizability. Our results suggest that such models can be broadly applicable for improving the biological characteristics of antibodies. Further, transferring such models to alternative physicochemical properties of scFvs can have potential applications in optimizing large-scale production and delivery of mAbs or bispecific antibodies (bsAbs).
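The abstract reports model quality as a Spearman correlation coefficient between model outputs and measured thermostability. As a minimal sketch of how such a rank correlation can be computed, the Python below ranks both lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names `average_ranks` and `spearman_rho` and the numbers in `measured_ts50` and `model_scores` are assumptions made up for illustration; they are not taken from the study or from the therML code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for illustration only; not data from the paper.
measured_ts50 = [58.2, 61.5, 55.0, 66.3, 63.1, 59.4]  # degrees Celsius
model_scores = [0.21, 0.47, 0.30, 0.88, 0.52, 0.12]   # arbitrary model output

rho = spearman_rho(model_scores, measured_ts50)
print(f"Spearman rho = {rho:.3f}")
```

Because only ranks enter the calculation, the score scale does not matter: a model output that increases monotonically with TS50 gives a coefficient of 1 even when the relationship is nonlinear.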

Acknowledgments

The authors thank Ai Ching Lim for her support of the project.

Disclosure statement

All authors except for AH, RR, TPR, and JJG are current employees of Amgen. AH and RR were interns at Amgen. TPR is a former employee of Amgen.

Data availability statement

The source code for TherML (zero-shot, fine-tuned and supervised models) is available at https://github.com/AmeyaHarmalkar/therML for non-commercial use only. The experimental thermostability data and sequences are from internal antibody engineering studies and cannot be made available because the sequences are the intellectual property of Amgen. Any additional information required to reanalyze the data reported in this paper is available from the lead author upon request.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/19420862.2022.2163584

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

AH and JJG were partially supported by NIH grant R35 GM141881.
