Report

A machine learning strategy for the identification of key in silico descriptors and prediction models for IgG monoclonal antibody developability properties

Andrew B. Waight (a) (corresponding author; https://orcid.org/0000-0002-4110-1452)
David Prihoda (b)
Rojan Shrestha (a)
Kevin Metcalf (a)
Marc Bailly (a)
Marco Ancona (b)
Talal Widatalla (c)
Zachary Rollins (c)
Alan C Cheng (c)
Danny A. Bitton (b)
Laurence Fayadat-Dilman (a)

(a) Discovery Biologics, Protein Sciences, Merck & Co., Inc, South San Francisco, CA, USA
(b) Discovery Informatics, MSD Czech Republic s.r.o, Prague, Czech Republic
(c) Computational and Structural Chemistry, Merck & Co., Inc, South San Francisco, CA, USA

ABSTRACT

Identifying favorable biophysical properties of protein therapeutics during developability assessment is a crucial part of the preclinical development process. Successful prediction of such properties and of bioassay results from calculated in silico features has the potential to reduce the time and cost of delivering clinical-grade material to patients, but it remains an ongoing challenge for the field. Here, we demonstrate an automated and flexible machine learning workflow designed to compare and identify the most powerful features from computationally derived physicochemical feature sets generated by popular commercial software packages. We apply this workflow to medium-sized datasets of human and humanized IgG molecules to generate predictive regression models for two key developability endpoints, hydrophobicity and poly-specificity. The most important features discovered through the automated workflow corroborate several previous literature reports, and newly discovered features suggest directions for further research and potential model improvement.
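The abstract does not describe the workflow's internals, so the following is only an illustrative sketch of the general idea of sequential feature selection (SFS, listed in the abbreviations): features are added greedily, one at a time, keeping whichever most reduces cross-validated error. Everything here is an assumption made for illustration. The data are synthetic rather than real in silico descriptors, a plain least-squares regressor stands in for XGBoost so that the example needs only the standard library, and the function names (`fit_least_squares`, `cv_mse`, `forward_sfs`) are hypothetical.

```python
import random

def fit_least_squares(X, y, lam=1e-6):
    """Fit a linear model with intercept via the normal equations.
    A tiny ridge term (lam) keeps the system numerically stable."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][k] * w[k] for k in range(i + 1, p))) / A[i][i]
    return w

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def cv_mse(X, y, cols, k=5):
    """Mean squared error of the model restricted to `cols`, by k-fold CV."""
    n = len(y)
    err = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        X_train = [[X[i][c] for c in cols] for i in train]
        X_test = [[X[i][c] for c in cols] for i in test]
        w = fit_least_squares(X_train, [y[i] for i in train])
        err += sum((p - y[i]) ** 2 for p, i in zip(predict(w, X_test), test))
    return err / n

def forward_sfs(X, y, n_select):
    """Greedy forward selection: repeatedly add the feature that gives
    the lowest cross-validated error alongside those already chosen."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < n_select:
        best = min(remaining, key=lambda c: cv_mse(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic stand-in data: 6 candidate features, only columns 1 and 4
# actually drive the (hypothetical) endpoint.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(120)]
y = [3.0 * r[1] - 2.0 * r[4] + rng.gauss(0, 0.1) for r in X]

chosen = forward_sfs(X, y, n_select=2)
print("selected feature indices:", sorted(chosen))
```

In practice one would more likely reach for library implementations of both the selector and the regressor; the hand-rolled version above is meant only to make the greedy selection loop explicit.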

Abbreviations

CDR	=	complementarity-determining region
Cryo-EM	=	cryogenic electron microscopy
XGBoost	=	extreme gradient boosting
HIC	=	hydrophobic interaction chromatography
HIC-RT	=	hydrophobic interaction chromatography retention time
MOE	=	Molecular Operating Environment
mAbs	=	monoclonal antibodies
PSR	=	poly-specificity reagent
SFS	=	sequential feature selection
Fv	=	fragment variable
SAP	=	surface aggregation propensity

Acknowledgments

The authors would like to thank members of the Protein Sciences department in Merck and Co. SSF, Discovery Biologics, for designing, generating, and characterizing the molecules used to build the models in this publication. We would also like to thank Will Long for assistance in writing and troubleshooting SVL scripts, Jodi Shalusky for providing Discovery Studio Perl scripts for processing sequences, and Galen Wo for assistance with Spotfire-assisted collection of assay data from disparate sources.

Disclosure statement

During the execution of this work, all authors were employees of subsidiaries of Merck & Co., Inc., Kenilworth, NJ, USA and stockholders of Merck & Co., Inc., Kenilworth, NJ, USA.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/19420862.2023.2248671

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.
