ABSTRACT
Identifying favorable biophysical properties of protein therapeutics through developability assessment is a crucial step in preclinical development. Successfully predicting such properties and bioassay results from calculated in silico features could reduce the time and cost of delivering clinical-grade material to patients, but doing so remains an ongoing challenge for the field. Here, we demonstrate an automated, flexible machine learning workflow that compares computationally derived physicochemical feature sets, generated with popular commercial software packages, and identifies the most predictive features. We apply this workflow to medium-sized datasets of human and humanized IgG molecules to build predictive regression models for two key developability endpoints: hydrophobicity and poly-specificity. The most important features identified by the automated workflow corroborate several previous literature reports, and newly identified features suggest directions for further research and model improvement.
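The abstract does not spell out how the workflow ranks features, but the abbreviation list names sequential feature selection (SFS). As a rough illustration only, the sketch below implements generic greedy forward SFS: starting from an empty set, it repeatedly adds whichever remaining feature most improves a subset-scoring function. The scorer is an assumption chosen to keep the example dependency-free (in-sample R² of an ordinary least-squares fit), not the XGBoost models or validation scheme a real workflow would likely use. The helper names, the feature names `x1`, `x2`, `x3`, and the toy data are all hypothetical.

```python
from typing import Callable, Sequence


def forward_sfs(features: Sequence[str],
                score: Callable[[list[str]], float],
                k: int) -> list[str]:
    """Greedy forward sequential feature selection (hypothetical helper).

    At each step, add the remaining feature whose inclusion gives the highest
    score for the enlarged subset; stop once k features are selected.
    """
    selected: list[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def r2_ols(columns: list[list[float]], y: list[float]) -> float:
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    n = len(y)
    # Design matrix rows: [1, x_1, ..., x_p]
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting (assumes X^T X is invertible)
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    pred = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - yhat) ** 2 for yi, yhat in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: y depends on x1 and x3 only; x2 is a distractor.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 1, -1, 1, -1, -1],
    "x3": [1, -1, 1, -1, 1, -1],
}
y = [2 * a + c for a, c in zip(data["x1"], data["x3"])]


def score(subset: list[str]) -> float:
    # Assumed scorer: in-sample R^2 of OLS on the chosen feature columns.
    return r2_ols([data[f] for f in subset], y)


print(forward_sfs(list(data), score, k=2))  # ['x1', 'x3']
```

Because `forward_sfs` only needs a callable that scores a feature subset, substituting a cross-validated score from a different regressor would leave the selection loop unchanged; only the `score` function would differ.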
Abbreviations
CDR = complementarity-determining region
Cryo-EM = cryogenic electron microscopy
XGBoost = extreme gradient boosting
HIC = hydrophobic interaction chromatography
HIC-RT = hydrophobic interaction chromatography retention time
MOE = Molecular Operating Environment
mAbs = monoclonal antibodies
PSR = poly-specificity reagent
SFS = sequential feature selection
Fv = fragment variable
SAP = surface aggregation propensity
Acknowledgments
The authors would like to thank members of the Protein Sciences department in Merck & Co. SSF, Discovery Biologics, for designing, generating, and characterizing the molecules used to build the models in this publication. We also thank Will Long for help writing and troubleshooting SVL scripts, Jodi Shalusky for providing Discovery Studio Perl scripts for sequence processing, and Galen Wo for help with Spotfire-assisted collection of assay data from disparate sources.
Disclosure statement
During the execution of this work, all authors were employees of subsidiaries of Merck & Co., Inc., Kenilworth, NJ, USA and stockholders of Merck & Co., Inc., Kenilworth, NJ, USA.
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/19420862.2023.2248671
Correction statement
This article has been corrected with minor changes. These changes do not impact the academic content of the article.