ABSTRACT
Despite advances in surface-display systems for directed evolution, high-affinity variants are not always enriched, because undesirable biases during biopanning increase the proportion of target-unrelated variants. Here, our goal was to design a library containing improved variants using information from a “weakly enriched” library, in which functional variants were only weakly enriched. Deep sequencing of a previous biopanning experiment, in which no functional antibody mimetics were experimentally identified, revealed that the weak enrichment was partly due to undesirable biases during the phage infection and amplification steps. Clustering analysis of the deep sequencing data from appropriate steps revealed no distinct sequence patterns, but a Bayesian machine learning model trained on the selected deep sequencing data yielded nine clusters with distinct sequence patterns. Phage libraries were designed on the basis of the identified sequence patterns, and four improved variants with target-specific affinity (EC50 = 80–277 nM) were identified by biopanning. Selecting and using deep sequencing data free of undesirable bias enabled us to extract information on prospective variants. In summary, combining appropriately selected deep sequencing data with machine learning has the potential to identify regions of sequence space in which functional variants are enriched.
Abbreviations
CDR: Complementarity-determining region
CD: Circular dichroism
E. coli: Escherichia coli
ELISA: Enzyme-linked immunosorbent assay
ER: Enrichment ratio
Ig: Immunoglobulin
IMAC: Immobilized metal ion affinity chromatography
PCR: Polymerase chain reaction
SEC: Size exclusion chromatography
Acknowledgments
We thank Ms Miho Hosoya, Hiromi Ogata, and Yuri Ishigaki for experimental support. This work was partly supported by a Scientific Research Grant from the Japan Society for the Promotion of Science research fellowship (M.U., JP20H00315); by the project “Development of the Key Technologies for the Next-Generation Artificial Intelligence/Robots” of the New Energy and Industrial Technology Development Organization (M.U.); and by Support for Pioneering Research Initiated by the Next Generation (SPRING) from the Japan Science and Technology Agency (T.I., JPMJSP2114). The computations were performed using supercomputers in AI Bridging Cloud Infrastructure (ABCI), National Institute of Genetics (NIG), Academic Center for Computing and Media Studies (ACCMS, Kyoto University), Human Genome Center (SHIROKANE; The University of Tokyo), and Research Institute for Information Technology (RIIT, Kyushu University).
Disclosure statement
No potential conflict of interest was reported by the authors.
Author contributions
M.U. and T.K. conceived the study and directed the project; T.I., T.D.N., H.Nishi, Y.S. and M.U. developed the methodology; T.I., T.D.N., S.K. and H.Nakazawa conducted the investigation; M.U., Y.S., T.K. and K.T. designed the experimental strategy and supervised the analysis; T.I., T.D.N. and Y.K. visualized the experimental data; T.I., M.U. and H.Nakazawa wrote the original draft of the manuscript; T.D.N., Y.S., T.K. and K.T. reviewed and edited the manuscript.
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/19420862.2023.2168470