Research Article

A data-driven machine learning approach for discovering potent LasR inhibitors

Article: 2243416 | Received 10 Mar 2023, Accepted 28 Jul 2023, Published online: 08 Aug 2023

ABSTRACT

The rampant spread of multidrug-resistant Pseudomonas aeruginosa strains severely threatens global health. This severity is compounded against the backdrop of a stagnating antibiotics development pipeline. Moreover, with many promising therapeutics falling short of expectations in clinical trials, targeting the las quorum sensing (QS) system remains an attractive therapeutic strategy to combat P. aeruginosa infection. Thus, our primary goal was to develop a drug prediction algorithm using machine learning to identify potent LasR inhibitors. In this work, we demonstrated using a Multilayer Perceptron (MLP) algorithm boosted with AdaBoostM1 to discriminate between active and inactive LasR inhibitors. The optimal model performance was evaluated using 5-fold cross-validation and test sets. Our best model achieved a 90.7% accuracy in distinguishing active from inactive LasR inhibitors, an area under the Receiver Operating Characteristic Curve value of 0.95, and a Matthews correlation coefficient value of 0.81 when evaluated using test sets. Subsequently, we deployed the model against the Enamine database. The top-ranked compounds were further evaluated for their target engagement activity using molecular docking studies, Molecular Dynamics simulations, MM-GBSA analysis, and Free Energy Landscape analysis. Our data indicate that several of our chosen top hits showed better ligand-binding affinities than naringenin, a competitive LasR inhibitor. Among the six top hits, five of these compounds were predicted to be LasR inhibitors that could be used to treat P. aeruginosa-associated infections. To our knowledge, this study provides the first assessment of using an MLP-based QSAR model for discovering potent LasR inhibitors to attenuate P. aeruginosa infections.

Introduction

Antimicrobial resistance (AMR) has emerged as one of the leading public health threats of the 21st century [Citation1]. As of 2019, global deaths attributable to and associated with AMR had reached approximately 1.3 million and 3.6 million, respectively. In addition, the drastic change in the healthcare landscape due to the ongoing SARS-CoV-2 crisis has increased the use of antibiotics to protect hospitalized patients with COVID-19 against secondary bacterial infections [Citation2]. One of the primary pathogens involved in nosocomial infections is Pseudomonas aeruginosa. Combating P. aeruginosa is challenging due to its intrinsic resistance to most conventional antibiotics. In addition to its nearly impermeable double membrane [Citation3], P. aeruginosa constitutively expresses multidrug efflux pumps from the Resistance-Nodulation-Division superfamily, produces antibiotic-inactivating enzymes such as β-lactamases, and forms biofilms that decrease antibiotic efficacy [Citation4,Citation5].

One attractive strategy to treat P. aeruginosa infection is to suppress its pathogenicity. This can be achieved by inhibiting the secretion of virulence factors under the control of the quorum sensing (QS) system. The QS circuitry is a communication system between bacterial cells that regulates gene expression in response to population density [Citation6,Citation7]. P. aeruginosa has two major QS circuitries that are essential for its adaptation and survival, i.e. the LasR-LasI and RhlR-RhlI systems (Figure 1). These two circuitries act interdependently: the RhlR-RhlI circuitry can only be activated with a fully functional LasR-LasI system. In the las system, the autoinducer molecule N-(3-oxododecanoyl) homoserine lactone (3OC12-HSL; henceforth referred to as HSL) is synthesized by the synthase protein LasI and secreted to the cell exterior [Citation8]. Once a sufficient concentration of HSL has accumulated in the environment, HSL binds to the receptor LasR to form a complex. Subsequently, the HSL-bound LasR activates gene transcription with the RhlR-RhlI system to express genes such as lasA and lasB [Citation9]. Together, the two QS circuitries regulate various virulence factors contributing to P. aeruginosa pathogenesis. These virulence factors include various proteases, pyocyanin, rhamnolipids and biofilms [Citation10]. While some studies suggest that las mutants in P. aeruginosa can retain functional QS responses [Citation11,Citation12], LasR still holds promise as a drug target for treating infections caused by this bacterium due to its specificity [Citation13]. By targeting specific amino acid residues in the LasR protein, off-target effects can potentially be minimized [Citation13]. Furthermore, P. aeruginosa has a third major QS system, known as the Pseudomonas quinolone signal (PQS), that regulates various virulence factors, including biofilm formation, pyocyanin production, and elastase activity [Citation14]. However, PQS mutants have only shown reduced virulence in vitro [Citation15,Citation16] and in animal models [Citation17,Citation18], and the significance of PQS in human infections is still uncertain [Citation19]. Additionally, targeting PQS and its associated enzymes with small molecules or other interventions is more challenging due to their structural complexity [Citation19]. Hence, inhibiting the las QS system remains a promising approach to treat P. aeruginosa-associated infections [Citation20], as suppressing P. aeruginosa pathogenesis could give the host immune system time to counter the bacterial infection naturally [Citation21].

Figure 1. Schematic representation of QS circuitry in P. aeruginosa. This Gram-negative bacterium uses two main AHL QS systems, the las and rhl systems, to form a regulatory cascade. LasI directs the synthesis of HSL, which interacts with LasR and activates the target promoters in the las system. The rhlI gene product directs the synthesis of C4-HSL, which interacts with the cognate transcriptional regulator, RhlR, activating the target gene promoters in the rhl system. Once a critical threshold LasR-3OC12-HSL concentration is attained, the expression of rhlR and rhlI is activated, resulting in the secretion of multiple virulence factors, such as elastase, LasA protease, alkaline protease, exotoxin A, biofilms, pyocyanin, and rhamnolipid. These two QS systems are also linked through the regulatory actions of VqsR, GacA, RsmA, RsaL, and QscR proteins and another signalling molecule, PQS. The solid arrows indicate a stimulatory effect, while the dashed arrows indicate an inhibitory effect.


Structurally, the LasR receptor comprises two independently folded protein domains: the N-terminal ligand-binding domain (LBD) and a C-terminal DNA-binding domain (DBD) [Citation22,Citation23]. The binding of HSL stabilizes the LasR monomer and promotes dimerization [Citation24]. The resulting LasR homodimer complex binds to the target DNA to activate gene transcription subsequently [Citation25]. To date, multiple crystal structures of LasR have been reported, including the resolved LasR-LBD structures [Citation26] and a more recent structure (PDB ID: 6V7X) that captures both the LBD and DBD regions of LasR [Citation27]. The LasR-LBD adopts an α-β-α sandwich that fully encapsulates structurally distinct ligand classes such as HSL, tri-phenyl- and bicyclo-hexanal-based inhibitors [Citation22,Citation23,Citation28]. Key residues involved in ligand-binding are Tyr56, Trp60, Arg61, Asp73, Thr75, and Ser129 [Citation22]. Additionally, mutational studies showed interactions with the residues Tyr56, Trp60, and Ser129 could govern whether ligands would elicit an agonistic or antagonistic profile [Citation29].

Inhibition of LasR with ligands has been shown to significantly reduce virulence factor production, which could ameliorate disease progression in chronic and acute infections [Citation11]. Several groups have attempted to synthesize LasR inhibitors using varying approaches; however, most contain a 1,2,3-triazole-based moiety [Citation30–32]. Such inhibitors are not clinically applicable due to their instability in alkaline conditions [Citation33]. In addition, these triazole-based inhibitors are targeted for hydrolysis by mammalian lactonases [Citation34,Citation35]. Multiple studies investigating LasR inhibition have suggested that LasR inhibitors could inhibit biofilm formation [Citation36], reduce multidrug resistance [Citation37], and inhibit twitching, swimming, and swarming motilities in P. aeruginosa [Citation38,Citation39]. Therefore, there is an urgent need to discover LasR inhibitors with novel chemical scaffolds.

To search for new chemical entities, high-throughput screening (HTS) methods are often used to screen millions of compounds rapidly [Citation40]. However, this conventional drug discovery approach remains expensive for academic laboratories [Citation41]. More importantly, it cannot cover the chemical space in a time-efficient manner. While HTS can screen 100,000–1,000,000 compounds daily, the entire chemical space is estimated to contain 10 [Citation42] molecules [Citation43]. Fragment-based drug discovery (FBDD) and open innovation strategies are two alternative approaches that can accelerate drug discovery and reduce costs [Citation44,Citation45]. FBDD uses small molecule fragments for drug development, which are quicker and less expensive to synthesize and screen [Citation46]. Open innovation, on the other hand, involves collaborating with external partners to gain access to resources and expertise while also reducing costs [Citation47]. Nonetheless, both approaches have potential drawbacks. FBDD may require significant upfront investment in infrastructure and expertise, such as specialized equipment and the knowledge to design complicated fragment libraries [Citation48]. Furthermore, the risks of collaboration failure, such as wasted resources, missed opportunities, and damaged reputations, may outweigh the advantages of collaborating externally [Citation49]. Therefore, to search for new inhibitors in a cost-effective and time-efficient manner, researchers can use Quantitative Structure-Activity Relationship (QSAR) modeling. QSAR models attempt to derive the relationship between bioactivity and chemical features to predict potential inhibitors. Recently, incorporating machine learning (ML) methods, such as hyperparameter tuning [Citation50] and ensemble methods [Citation51,Citation52], into QSAR modeling has improved the hit rate of QSAR models.

In this study, we aimed to develop an ML algorithm that can screen and predict potent LasR inhibitors to inhibit P. aeruginosa virulence factor secretions. First, we downloaded all the existing LasR inhibitors from ChEMBL database, which provided an experimental dataset of 149 molecules for data cleaning. Next, we constructed a classification QSAR model using a boosted Multilayer Perceptron algorithm. This ML model was selected for discovering potent LasR inhibitors as it provided the overall best performance compared to other models. Due to the size limitation of the dataset, we rigorously evaluated our models using 5-fold cross-validation (CV), 10-fold CV, and test sets. Then, the model was deployed against a custom library containing compounds with drug-like properties to shortlist compounds as potent LasR inhibitors. With the rapid emergence of antibiotic-resistant strains, there is a need to characterize more novel potent inhibitors of LasR as therapeutics since sociomicrobiology studies have found that resistance tends to spread less rapidly to QS inhibitors that target virulence phenotypes than conventional antibiotics that target bacterial growth [Citation53,Citation54]. On this account, the inhibition of QS has emerged as an important anti-virulence approach [Citation55,Citation56]. To the best of our knowledge, this is the first assessment of using an MLP-based QSAR model for screening potent inhibitors of LasR to attenuate P. aeruginosa infections.

Methodology

Preparation of dataset

The LasR inhibitor dataset comprising 149 molecules (ChEMBL1075207) was downloaded from the ChEMBL database. The dataset was processed by removing inhibitors with incomplete SMILES, inhibitors with incomplete bioactivity data, and compounds with a percentage inhibition lower than −10%. After processing, a total of 71 compounds remained for further analysis. The compounds were encoded as molecular fingerprints (FP), which are binary codes of a molecular structure, using the PaDEL-Descriptor software. All the compounds were neutralized and energy-minimized using OpenBabel with the default settings.

QSAR modeling

WEKA is a workbench software package containing different machine learning algorithms for data mining tasks. In a point-and-click manner, machine learning models can be built without coding [Citation57]. The important features were first selected using the WEKA version 3.9.5 attribute selector ‘CfsSubsetEval’ with ‘BestFirst’ as the selection method to overcome model overfitting [Citation58]. Subsequently, the selected features were checked for multi-collinearity using a correlation matrix. The dataset containing the selected features was then used as input for eight widely used classification algorithms, namely Bayes Net, Multinomial Naive Bayes (MNB), Logistic Regression, Multilayer Perceptron (MLP), Support Vector Machine (SVM), IBk (k-Nearest Neighbor), KStar (K*), and Random Forest (RF) to identify which algorithm has the highest predictivity. The classifiers were evaluated using 5-fold cross-validation (CV), 10-fold CV, Leave-One-Out cross-validation (LOOCV) and 80/20 split test sets. The selected base learners were fine-tuned with hyperparameter tuning, ensemble method, and boosting technique. The metrics used to evaluate the QSAR models were prediction accuracy, ROC-AUC, and MCC.

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP, TN, FP, and FN denote the instances of true positives, true negatives, false positives, and false negatives, respectively.
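
As a quick illustration of the MCC formula, the following minimal Python sketch computes the coefficient from confusion-matrix counts. The function name and the example counts are hypothetical (the counts are invented for illustration and are not results from this study), and returning 0 when the denominator vanishes is an assumed convention rather than something stated in the text.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for illustration only (not taken from the study).
example_mcc = matthews_corrcoef(tp=20, tn=45, fp=3, fn=3)
print(f"MCC = {example_mcc:.3f}")
```

A perfect classifier (no false positives or false negatives) gives 1, and a classifier that gets every label wrong gives −1, matching the range described for this metric.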

To test the predictive capability of our model, we evaluated our model using an external test set. Specifically, we selected computationally and experimentally validated LasR inhibitors that met two criteria: (a) the compounds were sourced from published literature, and (b) the structures were not present in the training dataset. The predicted active compounds were tabulated as a result of this evaluation.

Model deployment

In preparation for model deployment, a ligand-based virtual screening (LBVS) was conducted using 598,411 compounds from the Enamine Advanced Collection database [Citation59] to identify compounds with structural similarities to a recently characterized active QS inhibitor against LasR, termed compound 6p (N-(1H-benzo[d]imidazol-2-yl)-2-(4-(4-nitrophenyl)-1H-1,2,3-triazol-1-yl)acetamide) [Citation31]. Before screening, all the compounds were prepared by washing (protonation at pH 7, adding explicit hydrogens, generating 3D coordinates), adding partial charges, and minimizing energy using the General Amber Force Field (GAFF) with Open Babel (version 2.4.0) default settings. Each compound was subsequently converted to a multi-conformer using OpenEye Scientific OMEGA [Citation42]. To prepare the query, the two-dimensional structure of compound 6p was drawn in MarvinSketch version 22.2.0, converted into canonical SMILES, and energy-minimized using Open Babel.

To construct a virtual compound library of 5,000 compounds, the targeted database was screened for compounds similar to the query using the ROCS and EON software by OpenEye. The software can rapidly identify compounds with similar chemical or pharmacophore features from a database with just a single query. ROCS would screen and rank compounds based on shape and color comparisons [Citation60], while EON would re-score ROCS hits based on their shape and electrostatics similarity [Citation61]. Thereafter, our ML model constructed in WEKA was deployed against the virtual compound library. All the compounds predicted as active were subsequently compiled and converted into Extended Fingerprint descriptors using the PaDEL-Descriptor software.

Molecular docking studies and principal component analysis

We obtained the LasR sequences from four distinct strains of P. aeruginosa, namely PAO1 [Citation62], PA14 [Citation63], PA7 [Citation64], and PAK [Citation65], from the Pseudomonas Genome Database [Citation66]. Using Clustal Omega, we conducted a multiple sequence alignment to compare the LasR sequences among these strains. Furthermore, we downloaded a recent 3D crystal structure of the LasR LBD:mBTL complex (PDB ID: 6MWL) from the PDB database for further analysis. The PDB structure of the protein was prepared using the Genetic Optimization for Ligand Docking (GOLD) 5.3.0 package developed by the Cambridge Crystallographic Data Center (CCDC). The 215 ML-predicted compounds were docked to the active site of the LasR protein with 30 genetic algorithm (GA) runs using GOLD. Using the default ChemPLP scoring function, the ligands were then ranked by binding affinity from high to low. The docking study was repeated with the training set as input. To assess how well docking classified the training set, the Area Under the Receiver Operating Characteristic curve (AUROC) and the Enrichment Factor at 5% were calculated in GraphPad Prism v9.0.0.

$$\mathrm{EF}_{n\%} = \frac{(\text{no. of active compounds in top } n\% \text{ of the ranked dataset}) \,/\, (\text{total no. of active compounds in the dataset})}{(\text{no. of compounds in top } n\% \text{ of the ranked dataset}) \,/\, (\text{total no. of compounds in the dataset})}$$
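
The enrichment factor can be illustrated with a short Python sketch. It assumes the compounds are already sorted from best to worst docking score and labelled 1 (active) or 0 (inactive); the function name, the rounding rule for the top-n% cutoff, and the toy ranking are assumptions made for this example, not details from the study (which computed the metric in GraphPad Prism).

```python
def enrichment_factor(ranked_labels, top_fraction=0.05):
    """Enrichment factor for labels sorted best score first (1 = active, 0 = inactive)."""
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and at least one active")
    # Assumption: the cutoff is rounded to a whole number of compounds, minimum one.
    n_top = max(1, round(n_total * top_fraction))
    actives_in_top = sum(ranked_labels[:n_top])
    return (actives_in_top / n_actives) / (n_top / n_total)

# Hypothetical ranking of 20 compounds containing 4 actives.
toy_ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(toy_ranking, top_fraction=0.10))
```

A value above 1 means actives are concentrated in the top of the ranked list more than random ordering would produce.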

Subsequently, the top 20 predicted compounds, represented by molecular descriptors, were subjected to Principal Component Analysis (PCA) using PUMA (Platform for Unified Molecular Analysis, Version 1.0) [Citation67] together with the docked results from the training set. Any outlier identified was removed from the prediction set and replaced with the next-best-ranked compound. Finally, the top 20 hits selected were re-docked with 100 GA runs to obtain the best pose prediction. The protein-inhibitor interactions of these top-scoring docked poses were then visualized using PyMOL 2.0. To verify the accuracy of the docking software, a control docking experiment was also performed using 100 GA runs to redock the cognate ligand OHN into LasR (PDB ID: 3IX3).

Molecular dynamics simulations and binding free energies

Among the top 20 hit compounds, six compounds were shortlisted as potential lead candidates [Citation68]. Molecular dynamics (MD) simulations were performed to analyze the stability of the inhibitor-protein complexes. We used the crystal structure of the LasR LBD:mBTL complex (PDB ID: 6MWL) to perform MD simulations on the binding complexes of the top six hits, as well as the native LasR signaling molecule, N-(3-oxo-dodecanoyl) homoserine lactone (OHN), retrieved from the LasR-OC12 HSL complex (PDB ID: 3IX3). The LasR-OHN complex was the positive control, while the unbound protein (apoprotein) served as a negative control. A recently identified competitive LasR inhibitor, naringenin, was also included in the study [Citation69]. A total of nine MD simulations, 200 ns each, were performed. The antechamber module in AmberTools21 [Citation70] was used to parameterize the inhibitors with the GAFF force field, with partial charges assigned using the AM1-BCC charge method. The tleap module in AMBER was used to solvate the system in a truncated octahedron periodic box with a 10 Å buffer distance from the complex along each side. Six Na+ counterions were added to neutralize the whole system in each case. The TIP3P model was employed to describe the water molecules. The temperature was kept at 300 K and controlled by the Langevin thermostat. A time-step of 2.0 fs was fixed for all simulations.

Initially, the apo structure and each complex underwent a restrained minimization (with a force constant of 300 kcal/mol/Å²) using 500 steps of the steepest descent algorithm followed by another 500 cycles of the conjugate gradient scheme. Subsequently, the complexes were optimized using the steepest descent and conjugate gradient methods with 1,000 steps each. After the minimization, the temperature of each solvated system was gradually increased from 0 K to 300 K over 20 ps in the NVT ensemble using a restraint force constant of 100 kcal/mol/Å² acting on the LasR residues. Following that, a 200 ns equilibration simulation was performed at a constant pressure of 1.0 atm and a fixed temperature of 300 K. All simulations were executed with the GPU-enabled pmemd.cuda MD engine in AMBER21. On completing the MD runs, the Root-Mean-Square Deviation (RMSD) and Root-Mean-Square Fluctuation (RMSF) were calculated with the AMBER module cpptraj to check the conformational changes and stability of LasR when bound to the top hits, relative to apo-LasR.

The post-docking binding affinity calculation was performed using the Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) method to understand the energetic contributions to ligand binding affinity. The MMPBSA.py module [Citation71] calculated the solvation free energies and interaction energies for LasR, the ligands, and the ligand-bound LasR complexes. A total of 5,000 frames per complex were processed from the trajectories, and the energies of the system were estimated through the following equation:

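$$\Delta G_{\mathrm{Binding}} = \Delta H - T\Delta S \approx \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{sol}} - T\Delta S$$

where

$$\Delta E_{\mathrm{MM}} = \Delta E_{\mathrm{ele}} + \Delta E_{\mathrm{vdw}}, \qquad \Delta G_{\mathrm{sol}} = \Delta E_{\mathrm{gb}} + \Delta E_{\mathrm{np}}$$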

The ΔEMM denotes the sum of the electrostatic and Van der Waals interaction energies in the gas phase. The ΔGsol represents the sum of the polar (calculated using the Generalized Born model) and nonpolar contributions to the solvation energy.

The equations for calculating binding energy are as follows:

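$$\Delta G_{\mathrm{bind}} = \Delta E + \Delta G_{\mathrm{solv}} + \Delta G_{\mathrm{SA}}$$

$$\Delta E = E_{\mathrm{complex}} - E_{\mathrm{protein}} - E_{\mathrm{ligand}}$$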

where Ecomplex, Eprotein, and Eligand indicate the minimized energies for protein-inhibitor complex, protein, and inhibitor, respectively.

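$$\Delta G_{\mathrm{solv}} = \Delta G_{\mathrm{solv}}(\mathrm{complex}) - \Delta G_{\mathrm{solv}}(\mathrm{protein}) - \Delta G_{\mathrm{solv}}(\mathrm{ligand})$$

$$\Delta G_{\mathrm{SA}} = \Delta G_{\mathrm{SA}}(\mathrm{complex}) - \Delta G_{\mathrm{SA}}(\mathrm{protein}) - \Delta G_{\mathrm{SA}}(\mathrm{ligand})$$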

where ∆GSA is the nonpolar contribution to the solvation energy due to the surface area. GSA(complex), GSA(protein), and GSA(ligand) are the surface energies of complex, protein, and ligand, respectively.

Principal component analysis & free energy landscape

In addition, PCA, also known as essential dynamics (ED) analysis, was used to study the broad concerted motions in LasR in their bound states through the eigenvectors of the mass-weighted covariance matrix (C) of the atomic positional fluctuations [Citation72–74]. To generate more snapshots for plotting the free energy landscape, the MD trajectories (each having a duration of 500 ns) were concatenated. PCA was calculated from the concatenated trajectory, containing approximately 125,000 snapshots. The pytraj library was employed for handling the protein coordinates in a Jupyter notebook. The first two principal components, PC1 and PC2, were used for landscape plotting based on the equation below:

$$\Delta G(\mathrm{PC1}, \mathrm{PC2}) = -K_{B} T \ln P(\mathrm{PC1}, \mathrm{PC2})$$

PC1 and PC2 represent the reaction coordinates, KB signifies the Boltzmann constant, and P (PC1, PC2) illustrates the probability distribution of the system along PC1 and PC2. The two-dimensional free energy contour maps were generated using the plotting package of PyEMMA 2.5.7. After that, the contoured wells projected on each landscape were further examined to identify the energy basins. Each landscape was constructed by combining three trajectories: OHN, naringenin, and compound. Color-coded triangles were used as markers to point out the energy minima. A representative structure of the complex was obtained by identifying the coordinates of the energy basin and deducing the frame number. The files of all the frames of the representative structures were then saved in PDB file format and subjected to the BIOVIA Discovery Studio Visualizer 4.5 software for protein-ligand interaction study.
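
To make the free energy landscape equation concrete, the sketch below bins PC1/PC2 projections into a two-dimensional histogram and converts the bin probabilities into relative free energies. This is a plain-Python stand-in under stated assumptions, not the PyEMMA routine used in the study: the function name, the fixed bin width, the kcal/mol units, and the shift that sets the deepest basin to zero are all choices made for illustration, with T read as the absolute temperature.

```python
import math
from collections import Counter

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def free_energy_landscape(pc1, pc2, bin_width=1.0, temperature=300.0):
    """Map each occupied (PC1, PC2) bin to dG = -K_B * T * ln(P), minimum shifted to 0."""
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(pc1, pc2)
    )
    total = sum(counts.values())
    energies = {
        bin_index: -K_B * temperature * math.log(count / total)
        for bin_index, count in counts.items()
    }
    # Assumption: report energies relative to the most populated bin (the deepest basin).
    g_min = min(energies.values())
    return {bin_index: g - g_min for bin_index, g in energies.items()}

# Hypothetical projections: three snapshots in one bin, one snapshot in another.
landscape = free_energy_landscape([0.1, 0.2, 0.3, 1.5], [0.1, 0.2, 0.3, 1.5])
for bin_index, dg in sorted(landscape.items()):
    print(bin_index, round(dg, 3))
```

The most populated bin corresponds to an energy basin; in this toy case the sparsely visited bin sits K_B T ln 3 above it because it is visited one third as often.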

Results and discussion

Preparation of dataset

We constructed our classification QSAR model using the workflow shown in Figure 2.

Figure 2. Workflow to identify potential LasR inhibitors. The workflow outlines the following steps: (a) collecting and pre-processing of LasR inhibitor dataset from the ChEMBL database, (b) model construction and validation for deployment, (c) preparing a custom library with the Enamine database, and (d) assessing model performance through various analyses such as molecular docking, MD simulations, MM-GBSA analysis, FEL analysis, and protein-ligand interaction studies.


First, we downloaded the compound dataset (ChEMBL1075207) containing 149 compounds. After removing compounds without SMILES or complete bioactivity data, 71 compounds with percentage inhibition ranging from −10% to 98% remained. Next, to focus on identifying highly active compounds, we designated any inhibitor with a percentage inhibition larger than 50% as active and the rest as inactive. This yielded a dataset consisting of 23 active and 48 inactive inhibitors (Supplementary Table ST1). Our dataset therefore exhibited a slight imbalance between active and inactive inhibitors. Imbalanced datasets, whether highly imbalanced (e.g. 80:20 or 90:10) or slightly imbalanced (e.g. 60:40 or 55:45), can lead to biased model performance and insufficient representation of the minority class [Citation75,Citation76]. Achieving a perfectly balanced dataset is often impractical in real-world scenarios, especially without controlled variables and appropriate data preprocessing [Citation77]. To address the mild imbalance in our dataset, we employed cross-validation and hold-out test set strategies. These techniques allow for improved evaluation and validation of our models. We then featurized these inhibitors with the molecular fingerprinting method ECFP4. The ECFP4 method encodes the substructures of chemical compounds and represents them as a binary string of 0s and 1s [Citation78]. This yielded a dataset of compounds with 1024 molecular fingerprints (FP). To reduce model complexity and prevent overfitting, we used the correlation-based feature selection subset evaluator (CfsSubsetEval) available in the Waikato Environment for Knowledge Analysis (WEKA) package to select a subset of informative features for training. After removing irrelevant and redundant attributes, the attribute evaluator selected 33 features in the FP dataset for further classification modeling.
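
The filtering and activity-labeling rule described in this paragraph can be sketched in a few lines of Python. The function name, the dictionary input format, and the compound identifiers are hypothetical; only the two thresholds (discard below −10%, active above 50%) come from the text.

```python
def label_compounds(inhibition_by_id, active_above=50.0, discard_below=-10.0):
    """Drop compounds with missing or too-low inhibition, then label the rest."""
    labels = {}
    for compound_id, percent_inhibition in inhibition_by_id.items():
        # Skip entries with incomplete bioactivity data or inhibition below the floor.
        if percent_inhibition is None or percent_inhibition < discard_below:
            continue
        labels[compound_id] = "active" if percent_inhibition > active_above else "inactive"
    return labels

# Hypothetical records for illustration only.
toy_records = {"cpd_a": 98.0, "cpd_b": 50.0, "cpd_c": -10.0, "cpd_d": -25.0, "cpd_e": None}
print(label_compounds(toy_records))
```

Note that a compound at exactly 50% is labelled inactive here, following the "larger than 50%" wording.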

QSAR modeling and deployment

Next, we constructed QSAR models using eight classification ML algorithms trained on the FP dataset (71 compounds). The models were then evaluated using a 5-fold cross-validation (CV) method. The metrics used to evaluate these models were prediction accuracy, the area under the Receiver Operating Characteristic curve (ROC-AUC), and the Matthews Correlation Coefficient (MCC). The ROC-AUC measures how well the models distinguish between active and inactive inhibitors. This metric takes values ranging from 0 to 1, with 1 indicating a perfect model [Citation79]. Meanwhile, the MCC metric takes values from −1 to 1, where 1 indicates a perfect prediction, 0 a random prediction, and −1 total disagreement between the prediction and the observations [Citation80]. The models exhibited prediction accuracies ranging from 77.46% to 90.14%, ROC-AUC values ranging from 0.806 to 0.954, and MCC values ranging from 0.55 to 0.78 when evaluated using 5-fold CV (Supplementary Table ST2). Model performance under 10-fold CV is shown in Supplementary Table ST3.

Next, the models were evaluated using test sets. These test sets were split from the input dataset by an 80:20 train-test split in WEKA. Each algorithm was evaluated ten times using different permutations of samples in the training and test sets. Then, the performance for each model was combined and averaged (Table 1) [Citation81,Citation82]. Using this protocol, the QSAR models showed accuracy values ranging from 75.71% to 90.71%, AUC values ranging from 0.842 to 0.973, and MCC values ranging from 0.530 to 0.809.

Table 1. Performance of machine learning models when evaluated using test sets. The top-ranked classifiers with the highest AUC values, each representing a different family of algorithms, are highlighted with red asterisks (*).

After evaluating the performance of the QSAR models based on 5-fold CV and the test sets, we determined that the top QSAR models were constructed from (a) Multinomial Naïve Bayes (MNB), (b) Multilayer Perceptron (MLP), and (c) K* classifiers. The hyperparameters of these QSAR models were further fine-tuned by adjusting the learning rates, and their performance is listed in Table 2. While simply adjusting parameters in an existing model can improve model performance, more specific algorithmic approaches such as ensemble techniques and boosting have been shown to provide even better performance in many cases [Citation83–85]. Thus, we also investigated using ensemble methods and boosting techniques to improve the performance of MNB, MLP, and K*. Among the three models, MLP boosted with AdaBoostM1 produced the best classification model for identifying LasR inhibitors. This model had the highest accuracy (92.96%), while its MCC and ROC area were 0.838 and 0.928, respectively.

Table 2. Performance of the fine-tuned and boosted machine learning models when evaluated using the training and test sets. The learning rates of 0.001 and 0.005 are highlighted with red asterisks (*).

Meanwhile, a prediction accuracy of 90.73%, an MCC of 0.81, and an AUC value of 0.95 were achieved when the model was evaluated using the test sets. The confusion matrices for both the training and test sets are shown in Figure 3. In contrast, the MNB and K* models did not improve consistently when boosted. The MNB model boosted with AdaBoostM1 slightly increased accuracy (from 90.14% to 91.55%) and MCC (from 0.786 to 0.807). However, its AUC deteriorated from the initial 0.951 to 0.924. The base learner K* boosted with AdaBoostM1 showed a decline in accuracy from 90.14% to 88.73%. Its MCC (from 0.778 to 0.760) and AUC (from 0.928 to 0.905) values also declined. Hence, we deduced that the QSAR model constructed with MLP boosted with AdaBoostM1 is the most suitable for predicting LasR inhibitors. Additionally, to obtain a less biased performance estimate, we implemented the Leave-One-Out Cross-Validation (LOOCV) approach. LOOCV involves iteratively withholding one data point as a test sample while training the model on the remaining data points. This process is repeated for every data point in the dataset, maximizing the utilization of all available data. LOOCV is particularly valuable for small datasets as it ensures that each sample serves as both a training and a test instance, enabling a comprehensive evaluation of model performance [Citation86]. Using LOOCV, our model achieved a prediction accuracy of 88.73%, an MCC of 0.74, and an AUC value of 0.91.
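
The LOOCV procedure described above can be sketched generically in Python. This is an illustration of the resampling scheme only: the helper names are hypothetical, and the majority-class "classifier" is a deliberately trivial stand-in, not the boosted MLP built in WEKA.

```python
from collections import Counter

def leave_one_out_splits(n_samples):
    """Yield (train_indices, held_out_index), holding out each sample exactly once."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out

def loocv_accuracy(features, labels, fit, predict):
    """Fraction of held-out samples predicted correctly across all LOOCV rounds."""
    correct = 0
    for train, held_out in leave_one_out_splits(len(labels)):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        if predict(model, features[held_out]) == labels[held_out]:
            correct += 1
    return correct / len(labels)

# Hypothetical stand-in classifier: always predicts the majority training class.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, sample):
    return model

toy_features = [[0], [0], [1], [1]]
toy_labels = ["inactive", "inactive", "inactive", "active"]
toy_accuracy = loocv_accuracy(toy_features, toy_labels, fit_majority, predict_majority)
print(toy_accuracy)
```

For a dataset of 71 compounds, this scheme trains 71 models, each on 70 compounds, and pools the 71 single-sample predictions into one set of metrics.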

Figure 3. The confusion matrices of the top three QSAR models. The confusion matrices listed here are for (a) boosted MNB, (b) boosted MLP, and (c) boosted K* models.

We were also interested in whether our QSAR model could identify LasR inhibitors that had been experimentally validated or computationally predicted by other laboratories. To do so, we collected a series of inhibitors from the literature and applied our QSAR model to these compounds (Table 3). In some cases, compounds reported as active fell within the applicability domain of our model but were predicted as inactive. One potential reason for this discrepancy is the unbalanced dataset used for model construction [Citation97,Citation98], which could have biased the model toward the majority class (inactive compounds) and led to poorer prediction of the minority class (active compounds). Nonetheless, the accuracy of these predictions remains uncertain without experimental validation.

Table 3. Predicted and experimentally validated LasR inhibitors classified as active by the ML model.

Next, we deployed our QSAR classification model against a virtual compound library of 5,000 compounds. To construct this library, we screened the Enamine Advanced Library (~590,000 compounds) with the ROCS and EON software by OpenEye for compounds similar to a potent inhibitor, N-(1H-benzo[d]imidazol-2-yl)-2-(4-(4-nitrophenyl)-1H-1,2,3-triazol-1-yl)acetamide [Citation31], which served as the query molecule. This yielded a library of 5,000 compounds. We then applied our QSAR model to this library, and 215 compounds were predicted to be active LasR inhibitors.
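The similarity-based library construction can be pictured with a simplified ranking sketch. ROCS and EON compare 3D shape and electrostatics, which is not reproduced here; as a stand-in, the example ranks compounds by a Tanimoto coefficient over sets of fingerprint bits. The compound names, bit sets, and the `top_k_similar` helper are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def top_k_similar(query_bits, library, k):
    """Return the k library entries most similar to the query."""
    scored = [(tanimoto(query_bits, bits), name)
              for name, bits in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]

# Hypothetical fingerprints; a real screen would use 3D overlays instead.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8},
    "cmpd_C": {2, 3},
}
hits = top_k_similar(query, library, k=2)
```

In the study the analogous cut-off retained 5,000 of roughly 590,000 compounds; here `k` plays that role on a toy scale.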

Hits selection

Multiple sequence alignment using Clustal Omega revealed that the LasR sequences from the four different strains of P. aeruginosa were conserved (Supplementary Figure S1). We further docked the 215 compounds onto the LasR receptor (PDB ID: 6MWL) using the Genetic Optimization of Ligand Docking (GOLD) software to select potential top hits. The top 20 compounds were selected and re-docked using a more extensive docking setting (100 GA runs per compound) with ChemPLP as the scoring function (Supplementary Table ST4). GOLD ChemPLP was employed for pose prediction and virtual screening because it has been reported to perform consistently better than the other scoring functions [Citation99]. Based on the control docking results, our docking protocol demonstrated close agreement with the crystallized pose (Supplementary Figure S2). We then imposed two criteria for selecting top hits from these 20 compounds. First, the compound must fall within the applicability domain of the QSAR model. We assessed this by mapping the chemical space of the training compounds and the predicted compounds onto a Principal Component Analysis (PCA) plot (Supplementary Figure S3). The features used for the PCA were the molecular weight, octanol-water partition coefficient, number of rotatable bonds, total polar surface area, number of hydrogen bond donors, and number of hydrogen bond acceptors. Second, the selected compounds must possess drug-like properties and adhere to Lipinski's Rule of Five (Ro5) criteria. These criteria include a molecular weight (MW) ≤ 500 Daltons, an octanol-water partition coefficient (cLogP or logP) ≤ 5, a maximum of five hydrogen bond donors (HBD), and a maximum of ten hydrogen bond acceptors (HBA). Our evaluation considered compounds that did not surpass the threshold of two violations of the Ro5 criteria [Citation68,Citation100,Citation101]. Supplementary Table ST5 provides detailed information on the drug-like properties of the selected compounds. Using these criteria, we shortlisted six compounds as potential LasR inhibitors for further analysis using Molecular Dynamics (MD) simulations (Figure 4 and Table 4).
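The Ro5 filter described above can be expressed as a short violation count. The sketch below is illustrative only: the descriptor values are hypothetical, and the default `max_violations=2` reflects one reading of "did not surpass the threshold of two violations" rather than a confirmed setting from the study.

```python
def ro5_violations(mw, logp, hbd, hba):
    """Count Lipinski Rule-of-Five violations for one compound."""
    return sum([
        mw > 500,    # molecular weight in Daltons
        logp > 5,    # octanol-water partition coefficient
        hbd > 5,     # hydrogen bond donors
        hba > 10,    # hydrogen bond acceptors
    ])

def passes_ro5(descriptors, max_violations=2):
    """True if the compound has at most max_violations (assumed cut-off)."""
    return ro5_violations(**descriptors) <= max_violations

# Hypothetical descriptor sets, not compounds from the study.
drug_like = {"mw": 350.4, "logp": 2.1, "hbd": 2, "hba": 5}
non_drug_like = {"mw": 650.0, "logp": 6.2, "hbd": 6, "hba": 12}
results = {
    "drug_like": passes_ro5(drug_like),
    "non_drug_like": passes_ro5(non_drug_like),
}
```

Keeping the cut-off as a parameter makes it easy to rerun the filter under the stricter convention of allowing at most one violation.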

Figure 4. 2D chemical structures generated using MarvinSketch version 22.2.0. The structures of (a) OHN, native ligand, and (b) naringenin, a competitive LasR inhibitor, are compared to the top six selected compounds: (c) compound 1, (d) compound 2, (e) compound 3, (f) compound 4, (g) compound 5, and (h) compound 6.

Table 4. The top six compounds from the Enamine database predicted as potential LasR inhibitors.

Molecular dynamics simulations

To investigate further, we conducted MD simulations of the apo and ligand-bound LasR to study the conformational changes of the complexes over a 200 ns period. The overall structural stability was analyzed by plotting the root-mean-square deviation (RMSD) of the complexes with respect to the apo structure. The protein RMSD for the LasR complexes bound to OHN, naringenin [Citation69], and compounds 1–4 consistently fluctuated within a range of 1.00 Å throughout the 200 ns simulations (Supplementary Figure S4). However, the LasR complexes bound to compounds 5 and 6 exhibited larger RMSD fluctuations and were still increasing slightly at the 200 ns mark.
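For readers less familiar with the metric, RMSD is the square root of the mean squared displacement of matched atoms between a frame and a reference. The sketch below assumes the frame has already been superimposed on the reference, a step that trajectory-analysis tools normally perform first; the coordinates are toy values, not simulation data.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy coordinates in angstroms: every atom is shifted by 1 A along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(reference, frame)
```

Applying this function to every frame against a fixed reference gives the kind of RMSD-versus-time trace discussed above.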

We also plotted the ligand RMSD to study the structural variability of these ligands (Figure 5). We observed that the ligands were stable within their respective binding pockets in all cases. Of particular note, OHN exhibited higher ligand RMSD fluctuations. Visual inspection of the simulations showed that these fluctuations were due to the highly flexible aliphatic chain of OHN. Despite these fluctuations, the ligand remained stably bound within the binding site throughout the simulation. Similar observations were made for compounds 2, 5, and 6.

Figure 5. Ligand RMSD obtained from 200 ns MD simulations. RMSD plots comparing the (a) LasR-OHN (black) and (b) LasR-Naringenin (purple) with respect to the potential lead compounds of (c) LasR-Compound 1 (green), (d) LasR-Compound 2 (cyan), (e) LasR-Compound 3 (magenta), (f) LasR-Compound 4 (yellow), (g) LasR-Compound 5 (grey), and (h) LasR-Compound 6 (orange).

To study how the ligands might affect protein dynamics, we also analyzed the entire simulation trajectory and plotted the root-mean-square fluctuation (RMSF) for each complex (Figure 6). In all cases except compound 5, the binding site residues showed reduced flexibility. Compound 5, in contrast, increased the flexibility of the region spanning residues Pro41–Asp46.
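RMSF differs from RMSD in that it is computed per atom across all frames, relative to that atom's mean position. The following sketch uses toy coordinates and assumes the frames were superimposed beforehand; it is not the analysis script used in the study.

```python
import math

def rmsf(frames):
    """Per-atom RMSF from a list of frames.

    Each frame is a list of (x, y, z) tuples, one per atom (for example
    the C-alpha atoms), already aligned to a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over the trajectory.
        mean = [sum(f[i][d] for f in frames) / n_frames for d in range(3)]
        # Mean squared fluctuation about that mean position.
        msf = sum(
            sum((f[i][d] - mean[d]) ** 2 for d in range(3)) for f in frames
        ) / n_frames
        values.append(math.sqrt(msf))
    return values

# Toy trajectory: atom 0 is rigid, atom 1 oscillates by +/- 1 A along x.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
per_atom = rmsf(frames)
```

A residue whose RMSF drops on ligand binding, relative to the apo trajectory, is the pattern described above for the binding site residues.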

Figure 6. RMSF plots of C-alpha atoms of LasR with and without ligands. Residues marked with red asterisks* are the active site residues of the LasR receptor. RMSF plots for apo-LasR (light blue) with respect to (a) LasR-OHN (black), (b) LasR-Naringenin (purple) and the potential lead compounds in their respective colours, (c) LasR-Compound 1, (d) LasR-Compound 2, (e) LasR-Compound 3, (f) LasR-Compound 4, (g) LasR-Compound 5, and (h) LasR-Compound 6.

Binding free energies calculation

Furthermore, we calculated the relative binding free energies (Gtotal) of the ligands using the Molecular Mechanics – Generalized Born Surface Area (MM-GBSA) method [Citation71]. The MM-GBSA calculations were performed using the last 10 ns of the MD simulation of each complex because this window represents a stable, equilibrated phase of the system and was sufficient to obtain accurate predictions [Citation72]. The Gtotal values of OHN and naringenin were −55.05 kcal/mol and −38.04 kcal/mol, respectively (Figure 7 and Supplementary Table ST6). Among the top hits, compound 5 exhibited the most negative Gtotal (−52.97 kcal/mol), followed by compound 3 (−45.73 kcal/mol), compound 6 (−44.43 kcal/mol), compound 4 (−42.98 kcal/mol), compound 1 (−42.48 kcal/mol), and compound 2 (−38.18 kcal/mol). Interestingly, all six compounds were predicted to bind more strongly than naringenin (only marginally so for compound 2) but more weakly than the native ligand OHN. Next, we decomposed Gtotal into its hydrophobic and electrostatic components to better understand the driving force behind the binding of each ligand. We noted that the native ligand OHN and naringenin had almost equal hydrophobic and electrostatic contributions to their binding affinity. This contrasts with all of our compounds (except compound 4), where the hydrophobic interactions far outweighed the electrostatic components. From this, we concluded that hydrophobic interactions, rather than electrostatic forces, drove the binding of the top hits to LasR. This could be because the amino acid residues of the LasR ligand-binding domain are mostly hydrophobic [Citation23].
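The ranking reported above can be checked directly from the Gtotal values quoted in the text. The short sketch below only sorts and compares those reported numbers; it does not perform any MM-GBSA calculation, and the variable names are illustrative.

```python
# Gtotal values in kcal/mol, as quoted in the text above.
g_total = {
    "OHN": -55.05,
    "naringenin": -38.04,
    "compound 1": -42.48,
    "compound 2": -38.18,
    "compound 3": -45.73,
    "compound 4": -42.98,
    "compound 5": -52.97,
    "compound 6": -44.43,
}

# Keep only the six top hits and rank them from most to least negative.
hits = {name: g for name, g in g_total.items() if name.startswith("compound")}
ranked = sorted(hits, key=hits.get)

# More negative Gtotal means a more favourable predicted binding energy.
better_than_naringenin = all(g < g_total["naringenin"] for g in hits.values())
weaker_than_ohn = all(g > g_total["OHN"] for g in hits.values())

# Margin between the weakest hit and naringenin (kcal/mol).
margin = g_total["naringenin"] - hits[ranked[-1]]
```

The margin for the weakest hit is small, so the comparison with naringenin for that compound should be read cautiously given the usual uncertainty of end-state free energy estimates.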

Figure 7. MM-GBSA results for different parameters of the LasR-ligand complexes.

Free energy landscape analysis

We extended each simulation to 500 ns and mapped out the free energy landscape (FEL) to explore the protein conformational shifts throughout the simulation. The width and depth of the energy minimum basin in each FEL plot indicate whether a conformational state is thermodynamically stable, semi-stable, or unstable. To plot the FELs, we concatenated the trajectories of the OHN-bound, naringenin-bound, and compound-bound LasR structures. In all six FEL plots, the OHN-bound LasR structure consistently formed a large energy basin; this most likely represents the most stable conformational state of an activated OHN-bound LasR complex (Figure 8). Next, we observed that the naringenin-bound LasR structures adopted a different energy minimum, away from the activated LasR state. We denoted this conformation as inactive since naringenin is an inhibitor of LasR. Based on the FEL plots, all compounds exhibited energy minima different from that of the OHN-bound LasR complex. Interestingly, the energy minima of compound 6 were similar to those of naringenin-bound LasR. These plots suggest that these six compounds could behave as inhibitors by biasing the LasR conformation away from the activated state.
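A free energy landscape of this kind is commonly obtained by histogramming the trajectory in the plane of the first two principal components and converting populations to free energies via G = −kT ln(P/Pmax). The sketch below follows that common recipe under stated assumptions: the 300 K temperature, the bin width, and the toy projections are illustrative and are not taken from the study.

```python
import math
from collections import Counter

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def free_energy_landscape(pc1, pc2, bin_width=1.0, temperature=300.0):
    """Map each occupied (PC1, PC2) bin to a relative free energy.

    The most populated bin is assigned 0 kcal/mol; sparser bins receive
    positive values. Temperature and bin width are assumed parameters.
    """
    bins = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(pc1, pc2)
    )
    p_max = max(bins.values())
    kt = R_KCAL * temperature
    return {b: -kt * math.log(n / p_max) for b, n in bins.items()}

# Toy projections: three frames in one basin, one frame elsewhere.
pc1 = [0.1, 0.2, 0.3, 2.5]
pc2 = [0.1, 0.4, 0.2, 2.2]
fel = free_energy_landscape(pc1, pc2)
```

When trajectories from several complexes are concatenated before the projection, as described above, the minima of different ligands can be compared on the same axes.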

Figure 8. Free energy landscape of holo-LasR is projected to the first two principal components. Each landscape was constructed by combining three trajectories: OHN, naringenin, and compound. The energy minima are indicated with different colour-coded triangles to depict the most stable protein conformation vividly. The reference triangles are OHN (black) and naringenin (purple). Other triangles representing (a) LasR-Compound 1, (b) LasR-Compound 2, (c) LasR-Compound 3, (d) LasR-Compound 4, (e) LasR-Compound 5, and (f) LasR-Compound 6 are colour-coded in green, cyan, magenta, yellow, grey and orange, respectively.

Hydrogen-bond interactions

As hydrogen bond interactions are important for ligand binding, we investigated the hydrogen bond interactions between the various ligands and the LasR binding site (Figure 9 and Table 5). We took the most stable conformation (deduced from the FEL) and analyzed the protein-ligand interactions. Based on the literature, Tyr56, Asp73, and Ser129 are essential in maintaining ligand stability [Citation87,Citation102]. In our study, OHN formed hydrogen bond interactions with the LasR active site residues Tyr56, Trp60, Asp73, and Ser129. Meanwhile, naringenin formed hydrogen bond interactions with Leu36, Asp65, and Asp73 (Supplementary Figure S5). Several studies have suggested that compounds forming hydrogen bond interactions with both Trp60 and Asp73 could serve as LasR activators. Consistent with this notion, we observed that OHN formed hydrogen bonds with Trp60 and Asp73 in our simulation. We also observed that compound 5 could form hydrogen bonds with both residues. Of these two residues, naringenin and all other top hits formed hydrogen bonds only with Asp73 (Supplementary Figures S5–S8). We therefore hypothesized that compound 5 could be a LasR activator while all other compounds could be LasR inhibitors. Nonetheless, further experimental studies should be conducted to validate the bioactivity of these compounds.
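The reasoning in this paragraph amounts to a simple rule: a ligand that hydrogen-bonds to both Trp60 and Asp73 is treated as activator-like, otherwise as inhibitor-like. The sketch below encodes that heuristic using only residues named in the text; the entry for compound 5 lists just the two residues mentioned above and is not its full interaction profile, and the labels are hypothetical.

```python
# Residue pair whose joint hydrogen bonding is associated with activation.
ACTIVATOR_PAIR = {"Trp60", "Asp73"}

def putative_role(hbond_residues):
    """Label a ligand from the residues it hydrogen-bonds with (heuristic)."""
    if ACTIVATOR_PAIR <= set(hbond_residues):
        return "activator-like"
    return "inhibitor-like"

# Hydrogen-bonding residues as stated in the text above.
hbonds = {
    "OHN": {"Tyr56", "Trp60", "Asp73", "Ser129"},
    "naringenin": {"Leu36", "Asp65", "Asp73"},
    "compound 5": {"Trp60", "Asp73"},  # partial list: only residues named in the text
}
roles = {ligand: putative_role(residues) for ligand, residues in hbonds.items()}
```

Such a rule is only a hypothesis-generating filter; as the text notes, the predicted roles still require experimental validation.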

Figure 9. Binding interactions of the LasR active site with the potential lead compounds. The simulated pose of (a) LasR-OHN, (b) LasR-Naringenin (c) LasR-Compound 1, (d) LasR-Compound 2, (e) LasR-Compound 3, (f) LasR-Compound 4, (g) LasR-Compound 5, and (h) LasR-Compound 6 are shown interacting with residues in the binding site. The residues of the active site are shown as grey lines. The residues that form hydrogen bonds are bolded.

Table 5. Hydrogen bond interaction residues of the top hits.

In the future, pharmacokinetic (PK) features could be incorporated into our model for classifying LasR inhibitors to enhance clinical translation. Undesirable PK characteristics (namely, absorption, distribution, metabolism, and excretion) are known contributors to drug development failures [Citation103]. Recent advancements, such as the availability of curated PK data in the PK-DB open database [Citation104], offer opportunities for comprehensive assessment and have the potential to improve the clinical prediction capabilities of our model. While we have taken a meticulous computational approach in deriving these potential lead compounds, it remains important that these compounds be experimentally validated.

Conclusion

P. aeruginosa infection remains one of the most challenging infections to treat against the backdrop of AMR. One way to circumvent the use of antibiotics is to suppress P. aeruginosa pathogenesis. In this work, we constructed an MLP-based QSAR model augmented with AdaBoostM1 to predict LasR inhibitors. Next, we shortlisted six top hits and performed MD simulations to assess their binding capability to LasR. MM-GBSA calculations showed that all six compounds could be better binders than naringenin, a known LasR inhibitor. However, none of these six top hits had a higher binding affinity than the native activator OHN. Following FEL and hydrogen bond network analysis, we deduced that all except one of the top hits could potentially behave as LasR inhibitors. We present this work as part of ongoing attempts to discover new, effective treatments for P. aeruginosa-associated infections.

Authorship contributions statement

X.C. - Conceptualization, Supervision, Writing – Review & Editing; C.K. - Investigation, Data Curation, Visualization, Writing – Original Draft, Review & Editing; L.S. - Investigation, Data Curation, Writing – Original Draft; C.H., L.B.T., H.S.S. and E.P. contributed ideas, assisted in data analysis, and participated in manuscript drafting and proofreading. All authors approved the submitted manuscript.

Acknowledgments

We gratefully acknowledge the use of the LasR inhibitor dataset (ChEMBL1075207) from the ChEMBL database, which was an invaluable resource for our virtual screening efforts. We are thankful to the Waikato Environment for Knowledge Analysis for their valuable free software for our data analysis and predictive modeling. We also thank Enamine Ltd. for granting us access to their Advanced Collection to support our research. We thank Open Babel (version 2.4.0) for its crucial contribution to the preparation and processing of molecular structures in our research. Additionally, we appreciate the Swinburne Supercomputing OzSTAR Facility for providing access to their facilities and the AMBER MD package (version 22) for our molecular dynamics simulations. We are grateful to OpenEye Scientific Software (Santa Fe, NM) for providing free access to their ROCS, EON, and OMEGA software for academic use, and to Chemaxon for providing a research license to use MarvinSketch in drawing and preparing chemical structures. Finally, we would like to express our gratitude to Dr Taufiq Rahman from the University of Cambridge for allowing the use of the GOLD docking software from his laboratory.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are available from the corresponding author, X.C., upon reasonable request.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/21655979.2023.2243416

Additional information

Funding

This research had no external funding.

References

  • Langendonk RF, Neill DR, Fothergill JL. The building blocks of antimicrobial resistance in Pseudomonas aeruginosa: Implications for current resistance-breaking therapies. Front Cell Infect Microbiol. 2021;11:665759. doi: 10.3389/fcimb.2021.665759
  • Hsu J. How COVID-19 is accelerating the threat of antimicrobial resistance. BMJ. 2020;369:m1983. doi: 10.1136/bmj.m1983
  • Strateva T, Yordanov D. Pseudomonas aeruginosa – a phenomenon of bacterial resistance. J Med Microbiol. 2009;58(9):1133–342. doi: 10.1099/jmm.0.009142-0
  • Hancock RE. Resistance mechanisms in Pseudomonas aeruginosa and other nonfermentative gram-negative bacteria. Clin Infect Dis. 1998;27(Suppl 1):S93–99. doi: 10.1086/514909
  • Lambert PA. Mechanisms of antibiotic resistance in Pseudomonas aeruginosa. J R Soc Med. 2002;95(Suppl 41):22–26.
  • Ding F, Oinuma K-I, Smalley NE, et al. The Pseudomonas aeruginosa orphan quorum sensing signal receptor QscR regulates global quorum sensing gene expression by activating a single linked operon. MBio. 2018;9(4). doi: 10.1128/mBio.01274-18
  • Rasmussen TB, Givskov M. Quorum sensing inhibitors: a bargain of effects. Microbiol (Reading). 2006;152(4):895–904. doi: 10.1099/mic.0.28601-0
  • Pearson JP, Gray KM, Passador L, et al. Structure of the autoinducer required for expression of Pseudomonas aeruginosa virulence genes. Proc Nat Acad Sci. 1994;91(1):197–201. doi: 10.1073/pnas.91.1.197
  • Gambello MJ, Kaye S, Iglewski BH. LasR of Pseudomonas aeruginosa is a transcriptional activator of the alkaline protease gene (apr) and an enhancer of exotoxin a expression. Infect Immun. 1993;61(4):1180–1184. doi: 10.1128/iai.61.4.1180-1184.1993
  • Elnegery AA, Mowafy WK, Zahra TA, et al. Study of quorum-sensing LasR and RhlR genes and their dependent virulence factors in Pseudomonas aeruginosa isolates from infected burn wounds. Access Microbiol. 2021;3(3):000211. doi: 10.1099/acmi.0.000211
  • Wang Y, Gao L, Rao X, et al. Characterization of lasR-deficient clinical isolates of Pseudomonas aeruginosa. Sci Rep. 2018;8(1):13344. doi: 10.1038/s41598-018-30813-y
  • Cocotl-Yañez M, Soto-Aceves MP, González-Valdez A, et al. Virulence factors regulation by the quorum-sensing and Rsm systems in the marine strain Pseudomonas aeruginosa ID4365, a natural mutant in lasR. FEMS Microbiol Lett. 2020;367(12). doi: 10.1093/femsle/fnaa092
  • Kalia VC, Purohit HJ. Quenching the quorum sensing system: potential antibacterial drug targets. Crit Rev Microbiol. 2011;37(2):121–140. doi: 10.3109/1040841X.2010.532479
  • Williams P, Cámara M. Quorum sensing and environmental adaptation in Pseudomonas aeruginosa: a tale of regulatory networks and multifunctional signal molecules. Curr Opin Microbiol. 2009;12(2):182–191. doi: 10.1016/j.mib.2009.01.005
  • Kim K, Kim YU, Koh BH, et al. HHQ and PQS, two Pseudomonas aeruginosa quorum‐sensing molecules, down‐regulate the innate immune responses through the nuclear factor‐κB pathway. Immunology. 2010;129(4):578–588. doi: 10.1111/j.1365-2567.2009.03160.x
  • Müsken M, Di Fiore S, Dötsch A, et al. Genetic determinants of Pseudomonas aeruginosa biofilm establishment. Microbiology. 2010;156(2):431–441. doi: 10.1099/mic.0.033290-0
  • Lesic B, Lépine F, Déziel E, et al. Inhibitors of pathogen intercellular signals as selective anti-infective compounds. PLOS Pathogens. 2007;3(9):e126. doi: 10.1371/journal.ppat.0030126
  • Komor U, Bielecki P, Loessner H, et al. Biofilm formation by Pseudomonas aeruginosa in solid murine tumors–a novel model system. Microbes Infect. 2012;14(11):951–958. doi: 10.1016/j.micinf.2012.04.002
  • Rampioni G, Falcone M, Heeb S, et al. Unravelling the genome-wide contributions of specific 2-Alkyl-4-quinolones and PqsE to quorum sensing in Pseudomonas aeruginosa. PLOS Pathogens. 2016;12(11):e1006029. doi: 10.1371/journal.ppat.1006029
  • O’Brien KT, Noto JG, Nichols-O’Neill L, et al. Potent irreversible inhibitors of LasR quorum sensing in Pseudomonas aeruginosa. ACS Med Chem Lett. 2015;6(2):162–167. doi: 10.1021/ml500459f
  • Rasko DA, Sperandio V. Anti-virulence strategies to combat bacteria-mediated disease. Nat Rev Drug Discov. 2010;9(2):117–128. doi: 10.1038/nrd3013
  • Bottomley MJ, Muraglia E, Bazzo R, et al. Molecular insights into quorum sensing in the human pathogen Pseudomonas aeruginosa from the structure of the virulence regulator LasR bound to its autoinducer. J Biol Chem. 2007;282(18):13592–13600. doi: 10.1074/jbc.M700556200
  • Zou Y, Nair SK. Molecular basis for the recognition of structurally distinct autoinducer mimics by the Pseudomonas aeruginosa LasR quorum-sensing signaling receptor. Chem Biol. 2009;16(9):961–970. doi: 10.1016/j.chembiol.2009.09.001
  • Kiratisin P, Tucker KD, Passador L. LasR, a transcriptional activator of Pseudomonas aeruginosa virulence genes, functions as a multimer. J Bacteriol. 2002;184(17):4912–4919. doi: 10.1128/jb.184.17.4912-4919.2002
  • Schuster M, Urbanowski ML, Greenberg EP. Promoter specificity in Pseudomonas aeruginosa quorum sensing revealed by DNA binding of purified LasR. Proc Natl Acad Sci U S A. 2004;101(45):15833–15839. doi: 10.1073/pnas.0407229101
  • O’Reilly MC, Dong S-H, Rossi FM, et al. Structural and biochemical studies of non-native agonists of the LasR quorum-sensing receptor reveal an L3 loop “out” conformation for LasR. Cell Chem Biol. 2018;25(9):1128–1139.e1123. doi: 10.1016/j.chembiol.2018.06.007
  • Shah M, Taylor VL, Bona D, et al. A phage-encoded anti-activator inhibits quorum sensing in Pseudomonas aeruginosa. Mol Cell. 2021;81(3):571–583.e576. doi: 10.1016/j.molcel.2020.12.011
  • Paczkowski JE, McCready AR, Cong J-P, et al. An autoinducer analogue reveals an alternative mode of ligand binding for the LasR quorum-sensing receptor. ACS Chem Biol. 2019;14(3):378–389. doi: 10.1021/acschembio.8b00971
  • Gerdt JP, McInnis CE, Schell TL, et al. Mutational analysis of the quorum-sensing receptor LasR reveals interactions that govern activation and inhibition by nonlactone ligands. Chem Biol. 2014;21(10):1361–1369. doi: 10.1016/j.chembiol.2014.08.008
  • Stacy DM, Le Quement ST, Hansen CL, et al. Synthesis and biological evaluation of triazole-containing N-acyl homoserine lactones as quorum sensing modulators. Org Biomol Chem. 2013;11(6):938–954. doi: 10.1039/C2OB27155A
  • Srinivasarao S, Nandikolla A, Nizalapur S, et al. Design, synthesis and biological evaluation of 1,2,3-triazole based 2-aminobenzimidazoles as novel inhibitors of LasR dependent quorum sensing in Pseudomonas aeruginosa. RSC Adv. 2019;9(50):29273–29292. doi: 10.1039/C9RA05059K
  • Srinivasarao S, Nizalapur S, Yu TT, et al. Design, synthesis and biological evaluation of triazole-containing 2-phenylindole and salicylic acid as quorum sensing inhibitors against Pseudomonas aeruginosa. ChemistrySelect. 2018;3(32):9170–9180. doi: 10.1002/slct.201801622
  • Jiang Q, Chen J, Yang C, et al. Quorum sensing: A prospective therapeutic target for bacterial diseases. Biomed Res Int. 2019;2019:2015978. doi: 10.1155/2019/2015978
  • Ozer EA, Pezzulo A, Shih DM, et al. Human and murine paraoxonase 1 are host modulators of Pseudomonas aeruginosa quorum-sensing. FEMS Microbiol Lett. 2005;253(1):29–37. doi: 10.1016/j.femsle.2005.09.023
  • Yang F, Wang L-H, Wang J, et al. Quorum quenching enzyme activity is widely conserved in the sera of mammalian species. FEBS Lett. 2005;579(17):3713–3717. doi: 10.1016/j.febslet.2005.05.060
  • Cady NC, McKean KA, Behnke J, et al. Inhibition of biofilm formation, quorum sensing and infection in Pseudomonas aeruginosa by natural products-inspired organosulfur compounds. Plos One. 2012;7(6):e38492. doi: 10.1371/journal.pone.0038492
  • Bortolotti D, Trapella C, Bragonzi A, et al. Conjugation of LasR quorum-sensing inhibitors with ciprofloxacin decreases the antibiotic tolerance of P. aeruginosa clinical strains. J Chem. 2019;2019:1–13. doi: 10.1155/2019/8143739
  • Abbas H, Shaldam M, Eldamasi D. Curtailing quorum sensing in Pseudomonas aeruginosa by sitagliptin. Curr Microbiol. 2020;77(6):1051–1060. doi: 10.1007/s00284-020-01909-4
  • Hançer Aydemir D, Çifci G, Aviyente V, et al. Quorum-sensing inhibitor potential of trans-anethole against Pseudomonas aeruginosa. J Appl Microbiol. 2018;125(3):731–739. doi: 10.1111/jam.13892
  • Aldewachi H, Al-Zidan RN, Conner MT, et al. High-throughput screening platforms in the discovery of novel drugs for neurodegenerative diseases. Bioeng (Basel). 2021;8(2):30. doi: 10.3390/bioengineering8020030
  • Waring MJ, Arrowsmith J, Leach AR, et al. An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat Rev Drug Discov. 2015;14(7):475–486. doi: 10.1038/nrd4609
  • OpenEye Scientific Software. OMEGA 4.1.2. 2021. Available from: https://docs.eyesopen.com/applications/omega/releasenotes/version4_1_2.html
  • Reymond J-L. The chemical space project. Acc Chem Res. 2015;48(3):722–730. doi: 10.1021/ar500432k
  • Yeung AWK, Atanasov AG, Sheridan H, et al. Open innovation in medical and pharmaceutical research: A literature landscape analysis. Front Pharmacol. 2021;11. doi: 10.3389/fphar.2020.587526
  • Erlanson DA, Fesik SW, Hubbard RE, et al. Twenty years on: the impact of fragments on drug discovery. Nat Rev Drug Discov. 2016;15(9):605–619. doi: 10.1038/nrd.2016.109
  • Doak BC, Norton RS, Scanlon MJ. The ways and means of fragment-based drug design. Pharmacol Ther. 2016;167:28–37. doi: 10.1016/j.pharmthera.2016.07.003
  • Rees S. The promise of open innovation in drug discovery: an industry perspective. Future Med Chem. 2015;7(14):1835–1838. doi: 10.4155/fmc.15.125
  • Keserű GM, Erlanson DA, Ferenczy GG, et al. Design principles for fragment libraries: maximizing the value of learnings from pharma Fragment-Based Drug Discovery (FBDD) programs for use in academia. J Med Chem. 2016;59(18):8189–8206. doi: 10.1021/acs.jmedchem.6b00197
  • Vivona R, Demircioglu MA, Audretsch DB. The costs of collaborative innovation. J Technol Transf. 2022;48(3):873–899. doi: 10.1007/s10961-022-09933-1
  • Drăgulescu B, Bucos M. Hyperparameter Tuning Using Automated Methods to Improve Models for Predicting Student Success. In: Lopata A, Butkienė R, Gudonienė D, and Sukackė V, editors. Information and software technologies. Vol. 1283.; Springer Nature Switzerland AG: Springer International Publishing; 2020. p. 309–320.
  • Wu Z, Zhu M, Kang Y, et al. Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief Bioinform. 2020;22(4). doi: 10.1093/bib/bbaa321
  • Zhao Y, Yin X, Fu Y, et al. A comparative mapping of plant species diversity using ensemble learning algorithms combined with high accuracy surface modelling. Environ Sci Pollut Res. 2022;29(12):17878–17891. doi: 10.1007/s11356-021-16973-x
  • Gerdt JP, Blackwell HE. Competition studies confirm two major barriers that can preclude the spread of resistance to quorum-sensing inhibitors in bacteria. ACS Chem Biol. 2014;9(10):2291–2299. doi: 10.1021/cb5004288
  • Mellbye B, Schuster M. The sociomicrobiology of antivirulence drug resistance: a proof of concept. MBio. 2011;2(5):e00131–00111. doi: 10.1128/mBio.00131-11
  • Allen RC, Popat R, Diggle SP, et al. Targeting virulence: can we make evolution-proof drugs? Nature Rev Microbiol. 2014;12(4):300–308. doi: 10.1038/nrmicro3232
  • Clatworthy AE, Pierson E, Hung DT. Targeting virulence: a new paradigm for antimicrobial therapy. Nat Chem Biol. 2007;3(9):541–548. doi: 10.1038/nchembio.2007.24
  • Hall M, Frank E, Holmes G, et al. The WEKA data mining software. ACM SIGKDD Explor Newsl. 2009;11(1):10–18. doi: 10.1145/1656274.1656278
  • Chauhan H, Kumar V, Pundir S, et al. Comparative analysis and research Issues in classification techniques for intrusion detection. Intelligent Computing, Networking, and Informatics. Springer. 2014;675–685. doi: 10.1007/978-81-322-1665-0_68
  • Enamine. Advanced Collection. 2021. Available from: https://enamine.net/compound-collections/screening-collection/advanced-collection
  • OpenEye Scientific Software. ROCS 3.4.3. 2021. Available from: https://docs.eyesopen.com/applications/rocs/releasenotes/version3_4_3.html
  • OpenEye Scientific Software. EON 2.3.6. 2021. Available from: https://docs.eyesopen.com/applications/eon/releasenotes/version2_3_6.html
  • The Brinkman Lab. Pseudomonas aeruginosa PA7, PSPA7_3898 (lasR). 2022. Available from: https://www.pseudomonas.com/feature/show/?id=1670586&view=sequence
  • The Brinkman Lab. Pseudomonas aeruginosa PAO1, PA1430 (lasR). 2022. Available from: https://www.pseudomonas.com/feature/show/?id=105628&view=sequence
  • The Brinkman Lab. Pseudomonas aeruginosa UCBPP-PA14, PA14_45960 (lasR). 2022. Available from: https://www.pseudomonas.com/feature/show/?id=1658248&view=sequence
  • The Brinkman Lab. Pseudomonas aeruginosa PAK, PAKAF_RS18560 (lasR). 2022. Available from: https://www.pseudomonas.com/feature/show/?id=237052075&view=sequence
  • Winsor GL, Griffiths EJ, Lo R, et al. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database. Nucleic Acids Res. 2016;44(D1):D646–653. doi: 10.1093/nar/gkv1227
  • González-Medina M, Medina-Franco JL. Platform for Unified Molecular Analysis: PUMA. J Chem Inf Model. 2017;57(8):1735–1740. doi: 10.1021/acs.jcim.7b00253
  • Hughes J, Rees S, Kalindjian S, et al. Principles of early drug discovery. Br J Pharmacol. 2011;162(6):1239–1249. doi: 10.1111/j.1476-5381.2010.01127.x
  • Hernando-Amado S, Alcalde-Rico M, Gil-Gil T, et al. Naringenin inhibition of the Pseudomonas aeruginosa quorum sensing response is based on its time-dependent competition with N-(3-Oxo-dodecanoyl)-L-homoserine lactone for LasR binding. Front Mol Biosci. 2020;7:25. doi: 10.3389/fmolb.2020.00025
  • Pearlman DA, Case DA, Caldwell JW, et al. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput Phys Commun. 1995;91(1–3):1–41. doi: 10.1016/0010-4655(95)00041-D
  • Miller BR, McGee TD, Swails JM, et al. MMPBSA.py: an efficient program for end-state free energy calculations. J Chem Theory Comput. 2012;8(9):3314–3321. doi: 10.1021/ct300418h
  • Kumari A, Mittal L, Srivastava M, et al. Binding mode characterization of 13b in the monomeric and dimeric states of SARS-CoV-2 main protease using molecular dynamics simulations. J Biomol Struct Dynamics. 2021;40(19):1–19. doi: 10.1080/07391102.2021.1927844
  • Mittal L, Srivastava M, Kumari A, et al. Interplay among structural stability, plasticity, and energetics determined by conformational attuning of flexible loops in PD-1. J Chem Inf Model. 2021;61(1):358–384. doi: 10.1021/acs.jcim.0c01080
  • Singh M, Srivastava M, Wakode SR, et al. Elucidation of structural determinants delineates the residues playing key roles in differential dynamics and selective inhibition of Sirt1–3. J Chem Inf Model. 2021;61(3):1105–1124. doi: 10.1021/acs.jcim.0c01193
  • Jeni LA, Cohn JF, De La Torre F. Facing imbalanced data: recommendations for the use of performance metrics. International Conference on Computational Intell Interact Workshops. 2013;245–251. doi: 10.1109/acii.2013.47
  • Thabtah F, Hammoud S, Kamalov F, et al. Data imbalance in classification: experimental evaluation. Inf Sci. 2019;513:429–441. doi: 10.1016/j.ins.2019.11.004
  • Hakim Abdul Hamid M, Yusoff M, Mohamed A. Survey on highly imbalanced multi-class data. Int J Adv Comput Sci Appl (IJACSA). 2022;13(6). doi: 10.14569/IJACSA.2022.0130627
  • Qiao Z, Li L, Li S, et al. Molecular fingerprint and machine learning to accelerate design of high‐performance homochiral metal–organic frameworks. AIChE J. 2021;67(10):e17352. doi: 10.1002/aic.17352
  • Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36. doi: 10.1148/radiology.143.1.7063747
  • Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta (BBA) - Protein Struct. 1975;405:442–451. doi: 10.1016/0005-2795(75)90109-9
  • Fourches D, Muratov E, Tropsha A. Curation of chemogenomics data. Nat Chem Biol. 2015;11(8):535. doi: 10.1038/nchembio.1881
  • Wong T, Yeh P. Reliable accuracy estimates from k-fold cross validation. IEEE Trans Knowledge Data Eng. 2020;32(8):1586–1594. doi: 10.1109/TKDE.2019.2912815
  • Afolabi LT, Saeed F, Hashim H, et al. Ensemble learning method for the prediction of new bioactive molecules. PLoS One. 2018;13(1):e0189538. doi: 10.1371/journal.pone.0189538
  • Yu T-H, Su B-H, Battalora LC, et al. Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power. Brief Bioinform. 2021;23(1). doi: 10.1093/bib/bbab377
  • Sarkar C, Das B, Rawat VS, et al. Artificial intelligence and machine learning technology driven modern drug discovery and development. Int J Mol Sci. 2023;24(3):2026. doi: 10.3390/ijms24032026
  • Geroldinger A, Lusa L, Nold M, et al. Leave-one-out cross-validation, penalization, and differential bias of some prediction model performance measures—a simulation study. Diagn Progn Res. 2023;7(1):9. doi: 10.1186/s41512-023-00146-0
  • Sadiq S, Rana NF, Zahid MA, et al. Virtual screening of FDA-approved drugs against LasR of Pseudomonas aeruginosa for antibiofilm potential. Molecules. 2020;25(16):3723. doi: 10.3390/molecules25163723
  • Vetrivel A, Ramasamy J, Natchimuthu S, et al. Combined machine learning and pharmacophore based virtual screening approaches to screen for antibiofilm inhibitors targeting LasR of Pseudomonas aeruginosa. J Biomol Struct Dynamics. 2022;41(9):1–19. doi: 10.1080/07391102.2022.2064331
  • Kalia M, Singh PK, Yadav VK, et al. Structure based virtual screening for identification of potential quorum sensing inhibitors against LasR master regulator in Pseudomonas aeruginosa. Microb Pathog. 2017;107:136–143. doi: 10.1016/j.micpath.2017.03.026
  • Yang L, Rybtke MT, Jakobsen TH, et al. Computer-aided identification of recognized drugs as Pseudomonas aeruginosa quorum-sensing inhibitors. Antimicrob Agents Chemother. 2009;53(6):2432–2443. doi: 10.1128/AAC.01283-08
  • Zhong L, Ravichandran V, Zhang N, et al. Attenuation of Pseudomonas aeruginosa quorum sensing by natural products: Virtual screening, evaluation and biomolecular interactions. Int J Mol Sci. 2020;21(6):2190. doi: 10.3390/ijms21062190
  • Shukla A, Shukla G, Parmar P, et al. Exemplifying the next generation of antibiotic susceptibility intensifiers of phytochemicals by LasR-mediated quorum sensing inhibition. Sci Rep. 2021;11(1):22421. doi: 10.1038/s41598-021-01845-8
  • Vadakkan K, Hemapriya J, Anbarasu A, et al. Quorum quenching by 2-Hydroxyanisole extracted from Solanum torvum on Pseudomonas aeruginosa and its inhibitory action upon LasR protein. Gene Rep. 2020;21:100802. doi: 10.1016/j.genrep.2020.100802
  • Luo J, Dong B, Wang K, et al. Baicalin inhibits biofilm formation, attenuates the quorum sensing-controlled virulence and enhances Pseudomonas aeruginosa clearance in a mouse peritoneal implant infection model. PLoS One. 2017;12(4):e0176883. doi: 10.1371/journal.pone.0176883
  • Almalki AJ, Ibrahim TS, Elhady SS, et al. Computational and biological evaluation of β-adrenoreceptor blockers as promising bacterial anti-virulence agents. Pharmaceuticals (Basel). 2022;15(2):110. doi: 10.3390/ph15020110
  • Elfaky MA, Elbaramawi SS, Eissa AG, et al. Drug repositioning: doxazosin attenuates the virulence factors and biofilm formation in Gram-negative bacteria. Appl Microbiol Biotechnol. 2023;107(11):3763–3778. doi: 10.1007/s00253-023-12522-3
  • Kaur H, Pannu HS, Malhi AK. A systematic review on imbalanced data challenges in machine learning: Applications and solutions. ACM Comput Surveys. 2019;52(4):1–36. doi: 10.1145/3343440
  • Ganganwar V. An overview of classification algorithms for imbalanced datasets. Int J Emerging Technol Adv Eng. 2012;2:42–47.
  • Sapundzhi F, Prodanova K, Lazarova M. Survey of the scoring functions for protein-ligand docking. AIP Conference Proceedings. 2019;2172:100008. doi: 10.1063/1.5133601
  • Pascolutti M, Quinn RJ. Natural products as lead structures: chemical transformations to create lead-like libraries. Drug Discovery Today. 2014;19:215–221. doi: 10.1016/j.drudis.2013.10.013
  • Teague SJ, Davis AM, Leeson PD, et al. The design of lead-like combinatorial libraries. Angew Chem Int Ed Engl. 1999;38:3743–3748. doi: 10.1002/(SICI)1521-3773(19991216)38:24<3743::AID-ANIE3743>3.0.CO;2-U
  • Magalhães RP, Vieira TF, Melo A, et al. Identification of novel candidates for inhibition of LasR, a quorum-sensing receptor of multidrug resistant Pseudomonas aeruginosa, through a specialized multi-level in silico approach. Mol Syst Des Eng. 2022;7(5):434–446. doi: 10.1039/D2ME00009A
  • Pereira LC, Fátima MAD, Santos VV, et al. Pharmacokinetic/Pharmacodynamic modeling and application in antibacterial and antifungal pharmacotherapy: A narrative review. Antibiotics. 2022;11(8):986. doi: 10.3390/antibiotics11080986
  • Grzegorzewski J, Brandhorst J, Green K, et al. PK-DB: pharmacokinetics database for individualized and stratified computational modeling. Nucleic Acids Res. 2021;49(D1):D1358–D1364. doi: 10.1093/nar/gkaa990