A machine learning strategy for the identification of key in silico descriptors and prediction models for IgG monoclonal antibody developability properties

Article: 2248671 | Received 23 Feb 2023, Accepted 11 Aug 2023, Published online: 23 Aug 2023
 

ABSTRACT

Identifying favorable biophysical properties of protein therapeutics during developability assessment is a crucial part of the preclinical development process. Successful prediction of such properties and of bioassay results from calculated in silico features has the potential to reduce the time and cost of delivering clinical-grade material to patients, yet it remains an ongoing challenge for the field. Here, we demonstrate an automated and flexible machine learning workflow designed to compare and identify the most powerful features from computationally derived physicochemical feature sets generated by popular commercial software packages. We apply this workflow to medium-sized datasets of human and humanized IgG molecules to generate predictive regression models for two key developability endpoints, hydrophobicity and poly-specificity. The most important features identified by the automated workflow corroborate several previous literature reports, and newly discovered features suggest directions for further research and potential model improvement.
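The feature-identification idea summarized above can be illustrated with a minimal sketch of forward sequential feature selection (SFS, see Abbreviations). This is not the authors' implementation: it assumes a plain least-squares regressor in place of XGBoost, synthetic random numbers in place of descriptors computed by commercial software, and hypothetical helper names (`fit_ols`, `cv_mse`, `forward_select`). It only shows how a greedy search can rank descriptors by cross-validated prediction error.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for i in range(p):
        A[i][i] += 1e-9  # tiny ridge term guards against a singular matrix
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def cv_mse(X, y, cols, k=5):
    """k-fold cross-validated mean squared error using only columns `cols`."""
    n = len(y)
    err = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        beta = fit_ols([[X[i][c] for c in cols] for i in train],
                       [y[i] for i in train])
        for i in test:
            pred = beta[0] + sum(b * X[i][c] for b, c in zip(beta[1:], cols))
            err += (y[i] - pred) ** 2
    return err / n

def forward_select(X, y, n_keep):
    """Greedily add the feature that most lowers cross-validated error."""
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < n_keep:
        best = min(remaining, key=lambda c: cv_mse(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic stand-in data (an assumption): 80 "molecules", 6 "descriptors";
# only descriptors 1 and 4 actually drive the simulated endpoint.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(80)]
y = [3.0 * r[1] - 2.0 * r[4] + rng.gauss(0, 0.1) for r in X]

selected = forward_select(X, y, n_keep=2)
print(selected)
```

On real data, the same loop would take computed descriptors as columns and a measured endpoint such as HIC-RT or PSR as the target, with a stronger regressor substituted for the least-squares fit.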

Abbreviations

CDR = complementarity-determining region

Cryo-EM = cryogenic electron microscopy

XGBoost = extreme gradient boosting

HIC = hydrophobic interaction chromatography

HIC-RT = hydrophobic interaction chromatography retention time

MOE = Molecular Operating Environment

mAbs = monoclonal antibodies

PSR = poly-specificity reagent

SFS = sequential feature selection

Fv = fragment variable

SAP = surface aggregation propensity

Acknowledgments

The authors would like to thank members of the Protein Sciences department in Merck and Co. SSF, Discovery Biologics, for designing, generating, and characterizing the molecules used to build the models in this publication. We would also like to thank Will Long for assistance in writing and troubleshooting SVL scripts, Jodi Shalusky for providing Discovery Studio Perl scripts for sequence processing, and Galen Wo for assistance with Spotfire-assisted collection of assay data from disparate sources.

Disclosure statement

During the execution of this work, all authors were employees of subsidiaries of Merck & Co., Inc., Kenilworth, NJ, USA, and stockholders of Merck & Co., Inc., Kenilworth, NJ, USA.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/19420862.2023.2248671

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.