Data Note

Data Sets of Human and Mouse Protein Kinase Inhibitors With Curated Activity Data Including Covalent Inhibitors

Article: FSO892 | Received 27 May 2023, Accepted 31 Jul 2023, Published online: 16 Aug 2023

Abstract

Aim: Generation of high-quality data sets of protein kinase inhibitors (PKIs). Methodology: Publicly available PKIs with reliable activity data were curated. PKIs with very weak activity were classified as inactive. Analogue series and PKIs containing reactive groups (warheads) enabling covalent inhibition were systematically identified. Exemplary results & data: A total of 155,579 human and 3057 mouse PKIs were obtained. Human PKIs were active against 440 kinases and included 13,949 potential covalent PKIs. The collection of qualifying PKIs and corresponding inactive compounds is made available as an open access deposition. Limitations & next steps: Potential limitations include activity data incompleteness and assay variance. The data sets can be used to investigate PKIs with alternative modes of action and to calibrate computational methods.

Plain Language Summary

Protein kinases are proteins that play a role in how cells grow. In cancer cells, protein kinases are altered, which can cause abnormal growth. Protein kinase inhibitors (PKIs) specifically target protein kinases and are considered for treating different diseases, like cancer. In this study, we investigated a large number of PKIs that are available to the public to find ones with reliable activity data. We aim to understand how their structure affects their activity, including how these compounds bind to protein kinases. This helps us to identify different types of PKIs. Understanding PKIs is important for both basic research in the protein kinase field and drug discovery.

Graphical abstract

In drug discovery, inhibitors of human kinases are intensively investigated for a variety of therapeutic applications. Shown is a phylogenetic tree representation of the human kinome. Blue dots represent protein kinases for which chemical inhibitors are available. The dots are scaled in size according to the number of inhibitors per kinase. Currently available inhibitors cover more than 80% of the human kinome.

PKIs are intensely investigated as drug candidates in different therapeutic areas [Citation1–3]. The human kinome comprises 518 kinases [Citation4]. Currently, PKIs with high-confidence activity data are available for more than 400 kinases, as further discussed below, and most of these extensively or still preliminarily explored kinases are considered as drug targets for the treatment of a variety of acute or chronic diseases [Citation1–3]. Thus far, 71 human PKIs are approved as drugs by the US FDA [Citation5]. Most PKIs are directed against the adenosine triphosphate (ATP) cofactor binding site in the catalytic kinase domain that is essentially conserved across the human kinome, giving rise to frequent multi-kinase activity of PKIs and polypharmacology [Citation2,Citation3]. ATP site-directed PKIs mostly act by competitive binding including different inhibitory mechanisms [Citation6,Citation7]. These PKIs either occupy the ATP pocket or bind to the activation loop and stabilize an inactive conformation. Other PKIs elicit allosteric effects at various sites distributed across the catalytic kinase domain [Citation6,Citation8] or covalently inhibit kinases [Citation9,Citation10]. Competitive or allosteric covalent inhibition is facilitated by targeting the side chains of free cysteines or other residues with reactive groups conventionally termed warheads [Citation9–11]. Compared with competitive ATP site-directed PKIs, only limited numbers of covalent and especially allosteric PKIs have been reported thus far.

In recent years, there has been substantial growth in the number of PKIs that are publicly available [Citation12,Citation13]. To our knowledge, the last comprehensive survey of public domain PKIs was reported in 2018 [Citation13]. Publicly available PKIs provide a large knowledgebase for basic research and drug discovery. Five years after the last survey, we have systematically curated PKIs for which reliable activity data are available and generated large data sets for follow-up analysis. These data sets and an open access deposition making them freely available are described herein.

Methodology

Data curation

Protein kinase information was retrieved from UniProtKB/Swiss-Prot (release 3 August 2022) [Citation14]. A list of unique identifiers (Uniprot IDs) of human and mouse PKs was used to screen the ChEMBL (version 31) [Citation15] and BindingDB [Citation16] databases for PKI activity data (both databases were accessed in October 2022).

ChEMBL: Only PKI activity annotations falling into the SINGLE PROTEIN target category tested in direct interaction assays at the highest confidence level (ChEMBL confidence score 9) with standard activity measurements (IC50, Ki, Kd) in nanomolar (nM) units were considered. Activity values were recorded in negative decadic logarithmic form. Compounds with problematic activity comments including ‘uncertain’, ‘potential transcription error’, and ‘outside typical range’ or inconsistent ‘active’ / ‘inactive’ labels were omitted. Furthermore, a threshold of 10,000 nM was applied for classifying a PKI as active. Compounds with reported lower activity (that is, borderline active PKIs or PKIs with questionable activity) were classified as inactive.

BindingDB: PKIs were extracted on the basis of UniProt IDs for human and mouse protein kinases. Standard activity measurements including IC50, Ki, and Kd values with activity relationship “=” reported for single chain targets were considered, again applying the 10,000 nM threshold for inhibitory activity. Compounds with activity relationship “>” and standard values of at least 10,000 nM as well as PKIs with activity relationship “=” and standard values larger than 10,000 nM were classified as inactive.
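The activity threshold and the conversion to negative decadic logarithmic form can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of the deposited data or of any database API. The handling of an “=” value of exactly 10,000 nM (treated as active here) and of records that match neither rule (left unclassified) are assumptions, since the text does not specify them.

```python
import math
from typing import Optional

ACTIVITY_THRESHOLD_NM = 10_000.0  # threshold stated in the text


def to_negative_log(value_nm: float) -> float:
    """Convert a potency given in nM to its negative decadic logarithm (molar scale)."""
    return 9.0 - math.log10(value_nm)


def classify(relation: str, value_nm: float) -> Optional[str]:
    """Return 'active', 'inactive', or None for a record covered by neither rule."""
    if relation == "=":
        # Assumption: a value of exactly 10,000 nM still counts as active.
        return "active" if value_nm <= ACTIVITY_THRESHOLD_NM else "inactive"
    if relation == ">" and value_nm >= ACTIVITY_THRESHOLD_NM:
        return "inactive"
    # Assumption: all other records (e.g. ">" below the threshold) stay unclassified.
    return None


if __name__ == "__main__":
    print(classify("=", 250.0), round(to_negative_log(250.0), 2))
    print(classify(">", 10_000.0))
```

On this scale, the 10,000 nM threshold corresponds to a negative logarithmic potency of 5.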

SMILES strings [Citation17] representing qualifying PKIs from ChEMBL or BindingDB were subjected to a standardization protocol that included canonicalization, neutralization, removal of salts and stereochemical information. PKIs from ChEMBL and BindingDB were combined using the resulting unique non-stereo sensitive canonical SMILES strings.

If multiple qualifying potency measurements (standard activity relation “=”) of the same type were available for a kinase-inhibitor pair, their mean was calculated as the final potency annotation. If IC50, Ki, and/or Kd values were available, assay-independent Ki or Kd values were selected. Activity annotations calculated from multiple values were discarded if the standard deviation was greater than 1 (logarithmic units). Kinase-inhibitor pairs for which conflicting quantitative and qualitative measures (standard activity relation “>” or “>>”) were available in ChEMBL and/or BindingDB were also disregarded.
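The aggregation rules above can be sketched as follows for a single kinase-inhibitor pair. This is an illustration under stated assumptions, not the curation code used for the data sets: the function name is hypothetical, Ki is ranked before Kd arbitrarily, the sample standard deviation is used, and a pair is dropped outright when its preferred measurement type is inconsistent. None of these details is specified in the text.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple

# Assumed preference order: assay-independent Ki and Kd before IC50.
# The relative ranking of Ki and Kd is an arbitrary choice for this sketch.
TYPE_PRIORITY = ("Ki", "Kd", "IC50")
MAX_STDEV_LOG_UNITS = 1.0  # consistency criterion stated in the text


def aggregate(measurements: Dict[str, List[float]]) -> Optional[Tuple[str, float]]:
    """Select one measurement type and average its negative-log potency values.

    Returns (type, mean) or None if the pair is discarded or has no data.
    """
    for mtype in TYPE_PRIORITY:
        values = measurements.get(mtype)
        if not values:
            continue
        # Assumption: sample standard deviation; the pair is dropped if it exceeds 1.
        if len(values) > 1 and stdev(values) > MAX_STDEV_LOG_UNITS:
            return None
        return mtype, mean(values)
    return None


if __name__ == "__main__":
    print(aggregate({"IC50": [7.0, 7.4], "Ki": [8.0]}))  # Ki is preferred over IC50
    print(aggregate({"IC50": [5.0, 8.0]}))               # inconsistent values
```

Input values are expected in negative logarithmic form, so the standard deviation criterion is applied in logarithmic units as described.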

Finally, the selected compounds were organized in separate data sets of active human and mouse PKIs and corresponding compounds classified as inactive. Figure 1 summarizes the PKI data curation and aggregation process.

Figure 1. Data curation and aggregation.

The workflow diagram summarizes protein kinase inhibitor data curation and aggregation steps described in the text.

PKI: Protein kinase inhibitor.


Analogue series

An analogue series (AS) consists of several compounds that share the same core structure (also termed scaffold) and are distinguished by different substituents (R-groups) at one or more sites, as illustrated in Figure 2. From human and mouse PKIs, AS with single or multiple substitution sites were systematically extracted using the compound-core relationship (CCR) algorithm [Citation18]. The CCR approach systematically fragments compounds based on retrosynthetic rules and identifies all compound core structures after removal of substituents. All possible compound–core relationships are explored and compounds containing the same core are assigned to an AS. Hence, each AS represents a unique core structure.

Figure 2. Exemplary analogue series.

Shown is a core structure-based representation of an analogue series with activity against the Aurora kinase family. The core structure is shared by multiple analogues forming this series that are distinguished from each other by different combinations of R-groups at two substitution sites (designated R1 and R2, respectively).


Covalent inhibitors

To identify covalent PKIs, substructure searches were carried out in the human PKI data set with 14 commonly used warheads targeting the side chains of different amino acid residues [Citation19].

Exemplary results

Protein kinase inhibitors

A total of 155,579 qualifying unique human PKIs and a comparably small set of 3057 mouse PKIs were obtained. Human and mouse PKIs formed a total of 237,620 and 3685 unique compound-kinase interactions, respectively. Human PKIs were active against 440 kinases, providing ∼85% coverage of the human kinome, and mouse PKIs were active against 109 kinases (that is, each PKI inhibited one or more kinases). In addition, 14,240 compounds were classified as inactive (inhibitory activity >10,000 nM) against 343 human kinases and 481 compounds as inactive against 39 mouse kinases. Compared with a previous survey reported in 2018 [Citation13], the number of human PKIs with reliable activity data has further increased by ∼43,000 compounds. Figure 3 shows the logarithmic potency distribution of the human PKIs, with a median potency value of 7.1 (corresponding to ∼100 nM potency). Most PKIs were active in the logarithmic potency range of 6–8. We have used the new data set of human PKIs to analyze the promiscuity and potency distribution of different types of PKIs, among other properties, and identify activity cliffs formed by PKIs and potential covalent inhibitors, as reported recently [Citation20].
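The summary figures quoted above can be re-derived from the reported counts with a few lines of Python. The variable and function names are illustrative only, and the kinome size of 518 is taken from the introduction.

```python
def neg_log_to_nm(neg_log: float) -> float:
    """Convert a negative decadic logarithmic potency (molar scale) to nM."""
    return 10 ** (9 - neg_log)


HUMAN_KINOME_SIZE = 518      # number of human kinases cited in the introduction
KINASES_WITH_PKIS = 440      # human kinases with qualifying PKIs
HUMAN_PKIS = 155_579
HUMAN_INTERACTIONS = 237_620

coverage = KINASES_WITH_PKIS / HUMAN_KINOME_SIZE        # about 0.85
interactions_per_pki = HUMAN_INTERACTIONS / HUMAN_PKIS  # about 1.5
median_nm = neg_log_to_nm(7.1)                          # roughly 79 nM

if __name__ == "__main__":
    print(f"kinome coverage: {coverage:.1%}")
    print(f"mean interactions per human PKI: {interactions_per_pki:.2f}")
    print(f"median potency: {median_nm:.0f} nM")
```

A median of 7.1 thus corresponds to a potency slightly below 100 nM, and the logarithmic range of 6–8 spans 1000 nM to 10 nM.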

Figure 3. Potency of human protein kinase inhibitors.

The boxplot represents the potency value distribution of all qualifying human protein kinase inhibitors. In a boxplot, the value distribution is represented by its minimum (lower whisker), lower quartile (lower boundary of the box), median (horizontal line in box), upper quartile (upper boundary of the box) and maximum (upper whisker). Diamond symbols represent values classified as statistical outliers.

PKI: Protein kinase inhibitors.


Analogue series & core structures

From human PKIs, 29,298 AS were algorithmically extracted and 41,171 singletons were identified (that is, PKIs having no structural analogues), as reported [Citation20]. Mouse PKIs yielded 714 AS and 856 singletons. Because each AS and each singleton contained a unique core structure, human and mouse PKIs yielded a total of 70,469 and 1570 distinct cores, respectively. Although algorithmically defined core structures are often similar (yet distinct), core structure diversity among PKIs is generally high [Citation20].
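The relationship between AS, singletons, and distinct cores can be illustrated with a toy grouping step. This sketch does not implement the CCR algorithm: it assumes that each compound has already been assigned a single core (the compound and core labels below are invented), and it assumes that an AS requires at least two compounds sharing a core, with all remaining compounds counted as singletons.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_series(
    assignments: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Group (compound_id, core) pairs by core and separate series from singletons."""
    by_core: Dict[str, List[str]] = defaultdict(list)
    for compound_id, core in assignments:
        by_core[core].append(compound_id)
    # Assumption: a series needs at least two compounds with the same core.
    series = {core: ids for core, ids in by_core.items() if len(ids) >= 2}
    singletons = {core: ids for core, ids in by_core.items() if len(ids) == 1}
    return series, singletons


if __name__ == "__main__":
    toy = [("cpd_1", "core_A"), ("cpd_2", "core_A"), ("cpd_3", "core_B")]
    series, singletons = split_series(toy)
    # Distinct cores = number of series + number of singletons.
    print(len(series), len(singletons), len(series) + len(singletons))
    # The same bookkeeping applied to the reported counts:
    print(29_298 + 41_171, 714 + 856)
```

Under this bookkeeping, the reported numbers are consistent: 29,298 AS plus 41,171 singletons give 70,469 human cores, and 714 AS plus 856 singletons give 1570 mouse cores.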

Covalent inhibitors of human kinases

CovalentInDB, a database for covalent inhibitors [Citation21], currently contains ∼2000 PKIs. We searched our human PKI data set for compounds with 14 different warheads and identified 13,949 potential covalent PKIs. Figure 4 shows the distribution of compounds over these warheads. Acrylamide (targeting Cys, Lys, Ser and Thr residues) and heterocyclic urea (targeting Ser and Thr residues) were the most frequently detected warheads in PKIs, with 9861 and 2268 instances, respectively. The median potency values of all warhead-dependent subgroups of covalent PKIs for which at least 100 compounds were available fell into the logarithmic potency range of 6–8. Overall, an unexpectedly large number of potential covalent PKIs was identified, providing ample opportunities for follow-up analysis.

Figure 4. Covalent human protein kinase inhibitors.

The histogram shows the distribution of human protein kinase inhibitors containing 14 chemical warheads enabling covalent inhibition.

PKI: Protein kinase inhibitor.


Data

Newly curated human and mouse PKIs and corresponding compounds classified as inactive were organized in four ‘tab separated values’ (tsv) files containing the following information:

  • Compound_new_ID: internal ID assigned to each compound,

  • nonstereo_aromatic_smile: standardized compound SMILES,

  • Uniprot_ID: UniProtKB/Swiss-Prot human and mouse kinase IDs,

  • pref_name: preferred/full name of the kinase family,

  • activity_id: reference to the activity data from ChEMBL and/or BindingDB,

  • mean_log: mean negative logarithmic potency value (NaN: qualitative measurement),

  • selected_stvalue: standard value used,

  • ORGANISM: organism of the kinase,

  • Human PKIs (human_PKI_active.tsv) containing warheads were flagged: CPKI: True.

These data sets and a readme.txt document have been made freely available on the ZENODO open access platform [Citation22].
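As a starting point for working with the deposited files, the following sketch reads a tab-separated file with the column names listed above and selects the rows flagged as potential covalent PKIs. It assumes that the flag is stored in a column named CPKI with the text value True, as suggested by the list above; the exact layout should be checked against the readme.txt document. The helper name is hypothetical and the two sample rows are synthetic placeholders, not records from the deposition.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def iter_covalent(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield rows of a tab-separated PKI file whose CPKI flag reads 'True'."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Assumption: the flag column is called "CPKI" and holds "True"/"False".
        if (row.get("CPKI") or "").strip().lower() == "true":
            yield row


# Synthetic sample standing in for human_PKI_active.tsv (placeholder values only).
SAMPLE = (
    "Compound_new_ID\tnonstereo_aromatic_smile\tUniprot_ID\tmean_log\tCPKI\n"
    "cpd_1\tC=CC(=O)Nc1ccccc1\tUNIPROT_1\t7.2\tTrue\n"
    "cpd_2\tc1ccccc1\tUNIPROT_2\t6.1\tFalse\n"
)

covalent = list(iter_covalent(io.StringIO(SAMPLE)))

if __name__ == "__main__":
    for row in covalent:
        print(row["Compound_new_ID"], row["Uniprot_ID"], row["mean_log"])
```

For the actual data, the in-memory sample would be replaced by an opened file, for example `open("human_PKI_active.tsv", newline="", encoding="utf-8")`.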

Limitations & next steps

Importantly, for all human and mouse PKIs and corresponding compounds classified as inactive reported herein, high-confidence activity data are available. However, the resulting PKI activity profiles are generally affected by data incompleteness because PKIs have typically not been tested against the entire kinome (which especially applies to PKIs reported in older publications). Data incompleteness might lead, for example, to an underestimation of PKI promiscuity on the basis of currently available activity measurements. Furthermore, PKIs can be tested in many different assay formats, which might give rise to experimental heterogeneity and variance, especially for assay-dependent IC50 values. During data curation, we have addressed these issues by requiring highest assay confidence and by assessing the consistency of multiple activity measurements, if available. Data incompleteness and potential experimental heterogeneity are the only intrinsic PKI data limitations we are currently aware of.

The large number of newly curated human PKIs provides an extensive knowledgebase for the kinase field and a sound basis for follow-up investigations. For example, given the many AS formed by PKIs, these compound series yield a wealth of SAR information for medicinal chemistry. Furthermore, the significant number of putative covalent PKIs we identified enables the assessment of covalent inhibition and warhead characteristics on an unprecedentedly large scale.

Notably, from a medicinal chemistry point of view, PKI candidates with activity of 10,000 nM or weaker are essentially regarded as irrelevant for drug discovery. However, PKIs classified as either inactive or active provide immediate opportunities for computational investigations. For example, they can be used for evaluating or calibrating potency prediction methods including machine learning regression models. Hence, data sets of inactive PKIs should also be useful for various applications.

Summary points
  • Protein kinase drug discovery is introduced.

  • Differences between protein kinase inhibitors (PKIs) are discussed.

Methodology

  • Data curation is detailed.

  • Identification of analogue series and covalent PKIs is described.

Exemplary results

  • PKI statistics are provided.

  • The potency distribution is assessed.

  • PKI core structures and warheads are analyzed.

Data

  • Data sets are described.

  • The open access data deposition is detailed.

Limitations & next steps

  • Data incompleteness and assay variance are discussed.

  • Large-scale exploration of structure-activity relationships and covalent inhibition is enabled.

  • Opportunities for computational investigations exist.

Author contributions

J Bajorath conceived the study; E Xerxa carried out the analysis; E Xerxa and J Bajorath analyzed the results; E Xerxa and J Bajorath prepared the manuscript.

Acknowledgments

The authors thank F Miljković and M Vogt for support.

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

References

  • Ferguson FM, Gray NS. Kinase inhibitors: the road ahead. Nat. Rev. Drug Discov. 17, 353–376 (2018).
  • Cohen P, Cross D, Jänne PA. Kinase drug discovery 20 years after imatinib: progress and future directions. Nat. Rev. Drug Discov. 20, 551–569 (2021).
  • Attwood MM, Fabbro D, Sokolov AV, Knapp S, Schiöth HB. Trends in kinase drug discovery: targets, indications and inhibitor design. Nat. Rev. Drug Discov. 20, 839–861 (2021).
  • Manning G, Whyte D, Martinez R, Hunter T, Sudarsanam S. The protein kinase complement of the human genome. Science 298, 1912–1934 (2002).
  • Ayala-Aguilera CC, Valero T, Lorente-Macias A, Baillache DJ, Croke S, Unciti-Broceta A. Small molecule kinase inhibitor drugs (1995–2021): medical indication, pharmacology, and synthesis. J. Med. Chem. 65, 1047–1131 (2021).
  • Gavrin LK, Saiah E. Approaches to discover non-ATP site kinase inhibitors. Med. Chem. Comm. 4, 41–51 (2013).
  • Müller S, Chaikuad A, Gray NS, Knapp S. The ins and outs of selective kinase inhibitor development. Nat. Chem. Biol. 11, 818–821 (2015).
  • Laufkötter O, Hu H, Miljković F, Bajorath J. Structure- and similarity-based survey of allosteric kinase inhibitors, activators, and closely related compounds. J. Med. Chem. 65, 922–934 (2022).
  • Abdeldayem A, Raouf YS, Constantinescu SN, Moriggl R, Gunning PT. Advances in covalent kinase inhibition. Chem. Soc. Rev. 49, 2617–2687 (2020).
  • Chaikuad A, Koch P, Laufer SA, Knapp S. The cysteinome of protein kinases as a target in drug development. Angew. Chem. Int. Ed. 57, 4372–4385 (2018).
  • Gehringer M, Laufer SA. Emerging and re-emerging warheads for targeted covalent inhibitors. J. Med. Chem. 62, 5673–5724 (2019).
  • Hu Y, Furtmann N, Bajorath J. Current compound coverage of the kinome. J. Med. Chem. 58, 30–40 (2015).
  • Miljković F, Bajorath J. Computational analysis of kinase inhibitors identifies promiscuity cliffs across the human kinome. ACS Omega 3, 17295–17308 (2018).
  • The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
  • Gaulton A, Bellis LJ, Bento AP et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).
  • Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res. 35, D198–D201 (2007).
  • Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
  • Naveja JJ, Vogt M, Stumpfe D, Medina-Franco JL, Bajorath J. Systematic extraction of analogue series from large compound collections using a new computational compound-core relationship method. ACS Omega 4, 1027–1032 (2019).
  • McAulay K, Bilsland A, Bon M. Reactivity of covalent fragments and their role in fragment based drug discovery. Pharmaceuticals 15, 1366 (2022).
  • Xerxa E, Miljković F, Bajorath J. Data-driven global assessment of protein kinase inhibitors with emphasis on covalent compounds. J. Med. Chem. 66, 7657–7665 (2023).
  • Du H, Gao J, Weng G et al. CovalentInDB: a comprehensive database facilitating the discovery of covalent inhibitors. Nucleic Acids Res. 49, D1122–D1129 (2021).
  • https://doi.org/10.5281/zenodo.7970944