
Foundations for improved vaccine correlate of risk analysis using positive-unlabeled learning

Article: 2204020 | Received 05 Jan 2023, Accepted 11 Apr 2023, Published online: 03 May 2023

ABSTRACT

Insights into mechanisms of protection afforded by vaccine efficacy field trials can be complicated by both low rates of exposure and protection. However, these barriers do not preclude the discovery of correlates of reduced risk (CoR) of infection, which are a critical first step in defining correlates of protection (CoP). Given the significant investment in large-scale human vaccine efficacy trials and immunogenicity data collected to support CoR discovery, novel approaches for analyzing efficacy trials to optimally support discovery of CoP are critically needed. By simulating immunological data and evaluating several machine learning approaches, this study lays the groundwork for deploying Positive/Unlabeled (P/U) learning methods, which are designed to differentiate between two groups in cases where only one group has a definitive label and the other remains ambiguous. This description applies to case–control analysis designs for field trials of vaccine efficacy: infected subjects, or cases, are by definition unprotected, whereas uninfected subjects, or controls, may have been either protected or unprotected but simply never exposed. Here, we investigate the value of applying P/U learning to classify study subjects using model immunogenicity data based on predicted protection status in order to support new insights into mechanisms of vaccine-mediated protection from infection. We demonstrate that P/U learning methods can reliably infer protection status, supporting the discovery of simulated CoP that are not observed in conventional comparisons of infection status cases and controls, and we propose next steps necessary for the practical deployment of this novel approach to correlate discovery.

Introduction

The fundamental goal of late-stage vaccine clinical field trials is to determine efficacy. Such studies expand safety data to a broader population in support of widespread use. Beyond this primary motivation, for effective vaccines, such trials also offer the potential to yield biological insights into mechanisms of protection through the identification of correlates of infection risk (CoR). Whether directly responsible for protection or reliable surrogates for mechanisms that were not directly assessed, immune markers statistically associated with infection status endpoints provide key signposts for further research and development, and can also suggest novel direct interventions.Citation1,Citation2 However, the ability to confidently and completely identify CoR depends upon many factors, including the level of protection observed, the magnitude of differences in immunogenicity data between infected and uninfected subjects, and study power. Study power in turn depends on factors such as study group sample size and the dynamic range of immune response markers.

HIV-1 vaccine efficacy trials in particular have been complicated by low rates of exposure and low levels of protection.Citation3 In spite of these barriers, several CoR have been defined using case–control study designs, in which the profiles of infected (case) and uninfected (control) subjects are compared.Citation4,Citation5 However, as opposed to controlled human challenge studies, exposure and infection are not strictly associated in field trials. Subjects who are infected have definitively been exposed to the virus, while subjects who are uninfected may have been either exposed and protected or simply never exposed. Thus, the “control” class in case–control analysis is split between individuals who lack protective immunity and were not exposed to the virus and individuals who have protective immunity and either were or were not exposed to the virus. The resulting dilution of protected subjects with unprotected but unexposed subjects reduces the likelihood of identifying CoR. The ability to robustly infer the protection status of uninfected individuals would thus improve the ability to identify CoR and fill critical gaps in knowledge regarding immune responses that contribute to protection. Such inferences could provide substantial improvements in CoR discovery for trials performed with low efficacy and/or low rates of pathogen exposure.

Machine learning (ML) methods are statistical methods that seek to identify meaningful patterns in data.Citation6,Citation7 These approaches offer the potential to identify groups of subjects across which relevant immunological differences are detected and have been employed previously in CoR analysis studies. Examples of these applications span various infectious diseases, input data types, and experimental designs.Citation8–19 Case–control experimental designs seek to compare the response profiles for individual variables between case (infected) and control (uninfected) vaccine recipients. Such comparisons, however, are not well-suited to “wide” data, which is comprised of many immunogenicity parameters measured across relatively few subjects,Citation20–23 and instead typically rely on pre-selection of a small number of candidate measures selected to maintain power.Citation24 ML methods, with strong analytical negative controls such as cross-validation and permutation, can provide a sound approach to identify CoR in wide data: these approaches have identified consistent HIV vaccine CoR in animal model studies and human vaccine efficacy trials evaluating diverse vaccine regimens.Citation9,Citation10,Citation25

ML methods can be broadly classified into two categories: unsupervised and supervised. Unsupervised ML methods do not use class labels (e.g. “infected” and “uninfected”) to separate data. Instead, these methods rely on the internal structure of the data to identify subjects with shared response attributes. Supervised ML methods, however, make use of known class labels to identify characteristics that discriminate between classes.Citation26

In typical analysis of case–control studies, infection status is used as a proxy for protection status when performing CoR analysis. As outlined previously, however, being uninfected does not map one-to-one with being protected. Supervised methods work best when they are given data that is correctly and completely labeled, but this imperfect mapping of infection status to protection status leads to sub-optimal subject classification and poses a limit to CoR identification. In contrast, unsupervised methods are free to classify the data without being influenced by these imperfect infection status labels, but in the process they ignore what we do know about infected subjects: that they are unprotected.Citation27 Neither class of ML methods takes full advantage of the available information.

These limitations warrant a different approach, one that takes more complete advantage of the available data. To offer an analytical alternative to the discovery of CoR, our goal becomes to define a way to reclassify subjects as “was/would have been protected” and “was/would have been unprotected,” if all subjects had been exposed.

A potential solution to this problem is semi-supervised machine learning (SSL), which uses a limited number of labeled samples as a starting point for building a classifier, and then attempts to fit the remaining data.Citation28 SSL approaches thus resemble supervised ones initially, with a few examples of labeled data at the outset, before being turned loose, unsupervised, to “figure out” the rest of the samples. Most SSL methods require examples from both classes, which in the context of vaccine efficacy field trial data would mean examples of both protected and unprotected subjects. Unfortunately, these protection status labels, which would best enable correlate discovery, are precisely what vaccine efficacy field trial data lack.

A sub-class of SSL methods that is well-suited for this task is Positive/Unlabeled (P/U) learning. These methods are trained using data for a single class (the “positive” data) and then used to classify the remaining data (the “unlabeled” data).Citation29,Citation30 In contrast to traditional SSL methods, P/U methods center on learning what one class “looks like.” This approach maps directly to the infection status label problem encountered in efficacy trials: one set of subjects is known to have been unprotected (positive/infected), while the uninfected group is presumed to comprise a mixture of protected and unprotected subjects.

Here, using a simplified model data set, we explore the ability of positive unlabeled ML methods to contribute to CoR discovery.

Methodology

Data simulation

Immunogenicity data for two classes, unprotected and protected subjects, were simulated. Unprotected class subject data was generated by simulating an initial multivariate normal dataset of three independent variables and 5n (n = 1000) data points with standard deviation sd. Excess (5×) data points were simulated to ensure a large pool from which samples could be drawn with reduced likelihood of overlap. These variables serve as modeled immunogenicity “features.” Assuming that vaccine-induced responses would be greater for each response feature among protected subjects, protected class subject data was generated by simulating a multivariate normal dataset of three independent variables and 5n data points such that the centroid of the protected dataset lay at a distance d from the centroid of the unprotected dataset in each feature dimension. The distance d was varied from 0 to 5 sd in order to consider different effect sizes in terms of immunogenicity profile differences. Thus, each of the three simulated immunogenicity “feature” values correlated with protection status.
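The pool simulation described above can be sketched as follows. The study's analyses were performed in R; this is a minimal Python sketch, and the function and parameter names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def simulate_pools(n=1000, sd=1.0, d=2.0, n_features=3, seed=0):
    """Simulate 5n unprotected and 5n protected subject profiles with three
    independent, normally distributed immunogenicity features. The protected
    centroid is offset by d*sd in every feature dimension."""
    rng = np.random.default_rng(seed)
    unprotected = rng.normal(0.0, sd, size=(5 * n, n_features))
    protected = rng.normal(d * sd, sd, size=(5 * n, n_features))
    return unprotected, protected
```

Sweeping `d` from 0 to 5 and repeating over seeds reproduces the range of effect sizes considered in the text.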

By definition, the infected subject class is unprotected. Hence, infected subject data was generated by sampling without replacement from the unprotected dataset for a total of n data points. Uninfected subject class data were generated by sampling without replacement from the unprotected and protected datasets for a total of n data points. Multiple such datasets were generated by sampling different proportions from the unprotected and protected datasets. This proportion represents the simulated efficacy (e) of the vaccine and was varied from 0.05 to 0.5. The uninfected dataset thus contained n·e points from the protected dataset and n·(1−e) points from the unprotected dataset. For example, an efficacy of 0.3 resulted in 30% of the uninfected subject data being sampled from the protected subject group, with the remaining 70% of the n data points sampled from the unprotected subject group. Conditions with vaccine efficacy above 50% were not considered because at higher efficacy values most of the uninfected data points would belong to the uninfected protected dataset, and the value of excluding uninfected unprotected subjects to support identification of correlates of risk is expected to be limited. This data generation process was repeated across 10 different random seeds.
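The sampling scheme above can be sketched as follows (Python rather than the study's R; names are illustrative). One detail the text leaves open is whether the infected and uninfected-unprotected draws from the unprotected pool are disjoint; this sketch assumes they are, by permuting the pool once and splitting it.

```python
import numpy as np

def sample_trial(unprotected, protected, n=1000, e=0.3, seed=0):
    """Draw infected and uninfected groups at simulated efficacy e.
    Infected subjects come only from the unprotected pool; the uninfected
    group mixes n*e protected with n*(1-e) unprotected subjects, sampled
    without replacement."""
    rng = np.random.default_rng(seed)
    n_prot = int(round(n * e))
    u_idx = rng.permutation(len(unprotected))   # one permutation -> disjoint draws
    infected = unprotected[u_idx[:n]]
    uninf_u = unprotected[u_idx[n:2 * n - n_prot]]
    uninf_p = protected[rng.choice(len(protected), n_prot, replace=False)]
    uninfected = np.vstack([uninf_p, uninf_u])
    # ground-truth protection labels for the uninfected group (1 = protected)
    truth = np.concatenate([np.ones(n_prot), np.zeros(n - n_prot)])
    return infected, uninfected, truth
```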

To investigate the effect of sample size on correlate discovery, the ability of machine learning methods to define potential correlates of protection for smaller datasets was tested by simulating datasets of smaller sample size (n = 500, 250, 50). The lower bound represents numbers of infection cases similar to those reported in HIV vaccine efficacy trials.

Consistent with our goal to explore initial proof of concept, we have not considered analysis of a placebo group, longitudinal immunogenicity data, information regarding time of infection, impacts of censoring, or adjustments of individual subject risk based on clinical or other co-variates. The modeled data sets were complete and did not incorporate imputation of missing values or exclusion of subjects. Additionally, protection status was treated as a binary classification, as was risk, rather than the continuous variables they are expected to be in reality.

Training and testing using different classification methods

Training and testing for inference accuracy was performed using feature 1 and feature 2 data. The third feature was held out to support evaluation of the ability of protection status inferences to support CoP discovery. This choice was made to avoid bias associated with evaluating the model based on data that was used for classification method training.

K-means

The K-means clustering method was used to identify two clusters using feature 1 and feature 2 data of uninfected and infected subjects without any group labels (Supplemental Figure S1A). The R package ‘stats’ was used to carry out k-means clustering.Citation31 The third feature was held out for model evaluation. Because K-means clustering is an unsupervised ML method, it cannot independently identify which of the two clusters corresponds to the protected class and which to the unprotected class, leading to two possibilities when labeling the clusters. The final class labels were based on which assignment led to higher model accuracy, defined as the fraction of correct predictions, (TP + TN)/(TP + TN + FP + FN). After clustering was completed, inter-cluster variance (SSB) and intra-cluster variance (SSW) were calculated, and the goodness of fit of the model was evaluated as SSB/(SSB + SSW). Because the K-means algorithm will partition the input data points into two clusters regardless of structure, as a control, the goodness of fit was also evaluated when distributions of the same class were used as input. If the calculated goodness of fit for the tested model did not exceed the goodness-of-fit value of this control condition, the model’s cluster assignments were not accepted; in these cases, it was determined that the model did not characterize the data sufficiently better than a single cluster.
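The clustering and goodness-of-fit steps can be sketched as follows. The study used R's stats::kmeans; this numpy-only sketch seeds the two centers at the extremes of the first feature (a simplification that suits data separating along a diagonal, in place of kmeans' random restarts), and all names are illustrative.

```python
import numpy as np

def two_means(X, n_iter=100):
    """Minimal Lloyd's algorithm for k = 2."""
    centers = np.array([X[X[:, 0].argmin()], X[X[:, 0].argmax()]], dtype=float)
    for _ in range(n_iter):
        # assign each point to its nearest center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def goodness_of_fit(X, labels, centers):
    """SSB / (SSB + SSW): between-cluster variance over total variance."""
    grand = X.mean(axis=0)
    ssw = sum(((X[labels == k] - centers[k]) ** 2).sum() for k in (0, 1))
    ssb = sum((labels == k).sum() * ((centers[k] - grand) ** 2).sum()
              for k in (0, 1))
    return ssb / (ssb + ssw)
```

The label-assignment ambiguity described in the text can then be resolved by keeping whichever of the two cluster-to-class mappings yields higher accuracy, and the single-class control comparison can reuse `goodness_of_fit` on data drawn from one class only.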

Support Vector Machine (SVM)

An SVM classifier was trained using features 1 and 2 with the protection status labels to serve as a positive control. Given the nature of the data and its progression along the XYZ diagonal, the SVM was trained using a linear kernel (Supplemental Figure S1A). The R package ‘e1071’ was used to construct the SVM,Citation32 which was optimized for class separation by maximizing the margin between the classes’ closest points using C-type classification to set the hyperplane. To reduce the potential of overfitting while balancing the need to ensure a sufficient number of protected subjects were included among the uninfected, repeated 3-fold cross-validation was performed, and the mean values across test-set folds were taken as the value of each evaluation parameter for a given seed, distance, and efficacy across replicates.
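A linear SVM of the kind described above can be sketched as follows. The study used e1071's C-classification SVM in R; this is a minimal stochastic sub-gradient sketch of the L2-regularized hinge loss (a Pegasos-style trainer, not the authors' implementation), with illustrative names and hyperparameters.

```python
import numpy as np

def linear_svm_fit(X, y, C=1.0, lr=0.01, epochs=200, seed=0):
    """Minimal linear SVM trained by stochastic sub-gradient descent on the
    L2-regularized hinge loss. y must take values in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:   # inside margin: hinge sub-gradient
                w = (1 - lr) * w + lr * C * y[i] * X[i]
                b += lr * C * y[i]
            else:                           # outside margin: shrink w only
                w = (1 - lr) * w
    return w, b

def linear_svm_predict(X, w, b):
    """Sign of the decision function gives the predicted class."""
    return np.where(X @ w + b >= 0.0, 1, -1)
```

Repeated 3-fold cross-validation, as in the text, would wrap these calls in fold splits and average each evaluation metric over the test folds.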

Reliable Negative (RN) SVM

Infected subjects were considered the positive class and uninfected subjects were considered unlabeled. It was assumed that the unlabeled (uninfected) subjects most distinct from the positive (infected) class belonged to the protected class and thus represented “reliable negatives.” These reliable negatives were identified based on the Euclidean distance observed in feature 1 and feature 2 values. The number of reliable negatives identified was half the number of protected subjects in the unlabeled dataset; this number was set to balance the goal of maintaining a high likelihood that the identified reliable negatives truly belonged to the protected dataset while providing sufficient data points to train the SVM. The positive (unprotected) class and the identified reliable negative (protected) class together formed the training set. The SVM was trained using feature 1 and feature 2 data of the training set, again using a linear kernel (Supplemental Figure S1B).
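The reliable-negative selection step can be sketched as follows (Python rather than R; names illustrative). The text does not pin down the reference point for the Euclidean distance, so this sketch assumes distance to the infected-class centroid, one simple choice.

```python
import numpy as np

def select_reliable_negatives(infected, uninfected, n_rn):
    """Rank uninfected (unlabeled) subjects by Euclidean distance from the
    infected (positive) class and keep the n_rn most distant as 'reliable
    negatives' (presumed protected)."""
    centroid = infected.mean(axis=0)
    dist = np.linalg.norm(uninfected - centroid, axis=1)
    return np.argsort(dist)[-n_rn:]     # indices of the farthest subjects
```

The infected subjects (positive class) plus these reliable negatives then form the training set for the linear SVM, with `n_rn` set to half the expected number of protected subjects among the uninfected (n·e/2), matching the text.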

Model evaluation

The accuracy of each model described above was evaluated in terms of the Matthews correlation coefficient (MCC). Predicted protection status labels were compared with actual protection status labels to identify true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) (Supplemental Table S1). MCC was calculated as (TP × TN − FP × FN)/sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), as implemented in the package ‘caret.’Citation33 Negative predictive value, i.e., protection predictive value in this case, TN/(TN + FN); specificity, TN/(TN + FP); and sensitivity, TP/(TP + FN), were also calculated.Citation34 Reported values represent averages across replicates.
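The metrics above can be computed directly from the confusion counts; a minimal sketch (the study used R's caret, and the function name here is illustrative), with 1 denoting the positive (unprotected) class and 0 the protected class:

```python
import numpy as np

def confusion_metrics(pred, truth):
    """MCC, protection predictive value (NPV), specificity, and sensitivity
    from binary labels, where 1 = unprotected (positive) and 0 = protected."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    tp = int(np.sum((pred == 1) & (truth == 1)))
    tn = int(np.sum((pred == 0) & (truth == 0)))
    fp = int(np.sum((pred == 1) & (truth == 0)))
    fn = int(np.sum((pred == 0) & (truth == 1)))
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else float("nan"),
        "npv": tn / (tn + fn),          # protection predictive value
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }
```

Returning NaN when the denominator is zero mirrors the text's note that MCC could not be calculated when all samples were predicted to a single class.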

Correlate discovery

The ability of infection status labels, actual protection status labels, and inferred protection status predictions to define correlates of protection was evaluated for feature 3 using a two-sided Wilcoxon-Mann-Whitney U-test with the null hypothesis that the two groups being compared were identical. The R package ‘stats’ was used to perform the Wilcoxon-Mann-Whitney U-test.Citation31 Cliff’s delta values for feature 3 were likewise calculated to determine the effect size of group differences, using the R package ‘effsize.’Citation35 The reported values represent averages across replicates. To define the robustness of the RN-SVM model in terms of risk of false discovery, Cliff’s delta was also calculated after the values of feature 3 were scrambled.
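The effect-size calculation can be sketched as follows (the study used R's effsize::cliff.delta; this Python sketch is illustrative):

```python
import numpy as np

def cliffs_delta(a, b):
    """Cliff's delta effect size: the probability that a value from group a
    exceeds one from group b, minus the reverse, over all cross-group pairs."""
    diff = np.asarray(a, dtype=float)[:, None] - np.asarray(b, dtype=float)[None, :]
    return (np.sum(diff > 0) - np.sum(diff < 0)) / diff.size
```

The scramble control described in the text would permute the feature 3 values across subjects before recomputing, with the expectation that delta collapses toward zero.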

Results

Modeling immunogenicity data for variably effective and immunogenic vaccines

Given the nature of human vaccine efficacy field trials, exposure and infection are not strictly associated (). Case–control study designs aim to compare features of subjects with the disease or condition under study (cases) against those of a similar group without it (controls). In vaccine efficacy field trials, case–control designs typically compare immune response features of infected vaccine recipients with those observed among uninfected vaccine recipients (). However, unless all vaccine recipients are exposed or the vaccine is 100% effective, the uninfected class is composed of subjects that exhibit a protective response (UP-uninfected protected) and those that do not (UU-uninfected unprotected) but remained uninfected because they were not exposed (). The ability to resolve CoR depends upon both the level of protection observed (efficacy) and the magnitude of differences in immunogenicity data between protected and unprotected subjects (effect size/distance) ().

Figure 1. The labeling problem in correlates analysis. (a) In field trials of vaccine efficacy with pathogens that are not universally prevalent, the infected study group is comprised of vaccine recipients who were definitively exposed to the virus, while the uninfected study group consists of both vaccine recipients who were and others who were not exposed to the virus. (b) The ideal information needed to support discovery of Correlates of Protection (CoP) is the true protection status of exposed individuals. (c) Case-control analysis based on infection status defines Correlates of Risk (CoR) based on comparison of immune response features of infected vaccine recipients (gray) with those observed among uninfected vaccine recipients (purple). However, unless all vaccine recipients were exposed and/or the vaccine is completely effective, the uninfected class is comprised of subjects that exhibit a protective response (red, UP-uninfected protected) and those that do not (blue, UU-uninfected unprotected), but remained uninfected because they were not exposed. While CoR discovery power is improved by increasing the sample size of the uninfected class, it remains reduced by the inclusion of UU subjects.


Figure 2. Graphical illustration of simulated data. (a) The ability to resolve CoR depends critically on the overall efficacy (left) of the vaccine, which is the fraction of vaccine recipients that mounted a protective response, and the effect size (right), which is the magnitude and spread of response distributions between protected (red) and unprotected (blue) participants, visualized for two exemplary variables. (b) To model immunogenicity data for differently distinct and effective vaccines, two populations (U and P) were assigned three normally distributed, uncorrelated immune response parameters (μ) that varied in distance (d) from each other, defined in terms of standard deviations from the mean (SD). Parameter space ranged from distances of 0.0–5.0 SD and efficacy (e) of 5–50%.


To address the impact of each of these parameters, immunogenicity data consisting of three independent, normally distributed features were simulated for protected and unprotected populations whose response magnitudes differed by varying distances (d) (). To reflect the infection status labels available in real-world vaccine efficacy field trials, these protected and unprotected subject profiles were then sampled at varying degrees of efficacy (e) up to 50% to generate a dataset representing the uninfected class. By definition, the infected class was generated by sampling only from the unprotected subject class profiles. Collectively, uninfected and infected subject data profiles were generated for a set of 80 different data compositions, each simulated across ten different random seeds. This set comprised eight different efficacy values and ten different distance values, designed to cover the condition space in which case–control correlate analysis based on infection status alone has reduced power and P/U learning methods might add value to CoR discovery.

Inferences of protection status can identify CoR not detected using infection status

Simulated data was used to evaluate the effect of efficacy and distance on case–control CoR analysis based on infection status. Over a wide range of efficacy and distance values, variable distributions were compared between infected and uninfected subjects and between infected and uninfected protected (UP) subjects. Conditions reflecting 15% and 30% efficacy with classes separated by distances of 0.5 and 2 SD were modeled and graphed for a single seed (, biplots). Splitting the uninfected class into uninfected protected and uninfected unprotected subjects improved power and confidence in the correlates (, violin plots). This analysis represents the gain one would expect from perfect protection label inferences.

Figure 3. Disambiguation of UP and UU improves power and confidence in correlates. Variable (V) biplots of feature 1 and feature 2 for infected (gray) and uninfected (purple) vaccine recipients (top) and for uninfected unprotected (blue) and uninfected protected (red) vaccine recipients (bottom) for distributions with variable efficacy (15% and 30% efficacy) and effect size (0.5 SD and 2 SD). Violin plots show group distributions for V2 (p values by ANOVA with Tukey’s MHC; *p < .05; **p < .01; ****p < .0001).


Setting baseline for ability of ML methods to classify data based on protection status

Before assessing the performance of P/U ML techniques to supplement correlates discovery, it was essential to establish baseline expectations for case–control (infected-uninfected) analysis based on infection status and the accuracy of fully supervised and unsupervised machine learning classification. To accomplish this goal, K-means (unsupervised ML) clustering was performed and SVM (supervised ML) models were trained using two of the three immunogenicity features in the dataset and evaluated for classification performance. The third feature was left out to be used for model evaluation.

The performance of K-means sets a lower baseline for expectations of positive unlabeled ML models, as it represents the ability of ML to classify data points according to protection status without considering any outcome information. The K-means algorithm was applied to both infected and uninfected subjects for each dataset. As the distance between the underlying protected and unprotected subject immunogenicity profiles increased, so did the ability of this unsupervised approach to discriminate between protected and unprotected subjects, as defined by the Matthews correlation coefficient (). Similarly, performance improved as vaccine efficacy increased. Classification performance was also defined using protection predictive value, the proportion of subjects predicted by the algorithm as protected who are truly protected (); specificity, the probability of being predicted as protected, conditioned on truly being protected (); and sensitivity, the probability of being predicted as unprotected, conditioned on truly being unprotected (). Performance was also compared to the benchmark set by a case–control analysis based on infection status.

Figure 4. Accuracy of protection status label inferences. (a–d) Evaluation of ML algorithms to accurately predict protection status. Matthews correlation coefficient (MCC) (a), Protection Predictive Value (b), Specificity (c) and Sensitivity (d) calculated using data with protection status labels inferred from unsupervised analysis (K-means), supervised analysis (SVM) or P/U ML (RN-SVM) as compared to case-control analysis (infection status labels) or ground truth (actual protection status). Performance metrics are presented for input data sets with variable immunogenicity response variable distance (x-axis) and efficacy (columns). Error bars represent standard deviation across replicates. MCC could not be calculated when all samples were predicted to a single class.


Given the ability to infer protection class, K-means classifications were assessed for their capacity to support discovery of the held-out third feature as a CoR. Effect sizes assessed by Cliff’s delta () and the statistical significance of differences in this feature between predicted groups, evaluated by Mann-Whitney U-test (), likewise increased with distance and efficacy: it became easier for the model to produce clusters that closely resembled protection status. Again, confidence in CoR discovery using these approaches was compared to ground truth protection status labels and infection status labels for each simulated immunogenicity data condition.

Figure 5. Strength and confidence of correlate discovery from inferred protection status. (a,b) Evaluation of effect size and confidence in correlate identification. Cliff’s delta (a) and statistical significance (b) of the held-out immunogenicity feature calculated from protection status labels inferred from unsupervised analysis (K-means), supervised analysis (SVM) or P/U ML (RN-SVM), as compared to case-control analysis labels (infection status labels) or ground truth (actual protection status). Correlate metrics are presented for input data sets with variable immunogenicity response variable distance (x-axis) and efficacy (columns). Statistical significance was calculated using a two-sided Wilcoxon-Mann-Whitney U test with the null hypothesis that the two groups being compared were identical. Error bars represent standard deviation across replicates. Values could not be calculated when all samples were predicted to a single class.


The performance of SVM sets the upper bound expected from positive unlabeled ML models, as it represents the ability of ML to classify data points according to protection status when true labels are available for training. An SVM model was trained using protection status labels with three-fold cross-validation. As with K-means, SVM afforded increases in both classification accuracy and CoR discovery with increasing distance and efficacy (, Supplemental Figure S2). Under most distance and efficacy conditions, the performance of SVM exceeded that of K-means. The gap between the performance of K-means and SVM, as well as their improvements relative to the infection status labels commonly used in the analysis of case–control studies, supported the potential for P/U ML methods to offer a useful additional analytical approach.

Semi-supervised learning improves the ability to discover CoR

To directly assess the value of P/U ML for CoR discovery, a reliable negative (RN) method was employed using two of the three simulated response features. RN-SVM takes a two-step approach, first identifying the unlabeled (uninfected) samples most distinct from the positive (infected) class, under the hypothesis that these samples represent the protected class.Citation29,Citation36 These “reliable negative” samples are then used to classify the protection status of the remaining unlabeled (uninfected) samples.
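The study’s analyses were performed in R (e1071, caret); purely as an illustration, the two-step reliable-negative procedure described above can be sketched in Python with scikit-learn. The function name, linear kernel, scoring scheme, and 20% reliable-negative fraction are our own assumptions, not the authors’ implementation.

```python
import numpy as np
from sklearn.svm import SVC

def rn_svm(X_pos, X_unl, rn_frac=0.2):
    """Two-step reliable-negative P/U sketch.

    Step 1: score unlabeled samples by similarity to the positive
    (infected) class; the fraction least infected-like are taken as
    "reliable negatives" (putatively protected).
    Step 2: an SVM trained on positives vs. reliable negatives
    classifies all unlabeled samples.
    """
    # Step 1: one naive scoring scheme -- fit an SVM treating all
    # unlabeled points as provisional negatives, then rank unlabeled
    # points by their decision value toward the positive class.
    X = np.vstack([X_pos, X_unl])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_unl))]
    scorer = SVC(kernel="linear").fit(X, y)
    scores = scorer.decision_function(X_unl)   # high = infected-like
    n_rn = max(1, int(rn_frac * len(X_unl)))
    rn_idx = np.argsort(scores)[:n_rn]         # least infected-like

    # Step 2: retrain on positives vs. reliable negatives only.
    X2 = np.vstack([X_pos, X_unl[rn_idx]])
    y2 = np.r_[np.ones(len(X_pos)), np.zeros(n_rn)]
    clf = SVC(kernel="linear").fit(X2, y2)
    # 1 = predicted unprotected (infected-like), 0 = predicted protected
    return clf.predict(X_unl)
```

On well-separated synthetic data, the unlabeled points drawn from a shifted (protected-like) distribution are predominantly assigned to the putatively protected class, while unexposed-like points resembling the infected class are not.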

Protection status labels generated by RN-SVM were more accurate than labels based on infection status for most conditions tested, as demonstrated by Matthews Correlation Coefficient, Protection Predictive Value, Specificity, and Sensitivity evaluations. These results suggest that this approach can successfully identify uninfected subjects that were likely to have been infected had they been exposed. To test the resulting improvement in power to identify CoP, the statistical significance (two-sided Wilcoxon-Mann-Whitney U test) and effect size (Cliff’s delta) of differences between newly labeled groups were evaluated for the held-out feature and demonstrated a robust ability to improve identification of CoP.
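For concreteness, the two correlate metrics used throughout, Cliff’s delta and the two-sided Wilcoxon-Mann-Whitney U test, can be computed as below. This is a minimal Python sketch with illustrative data of our own; the study computed these in R (e.g., the effsize package).

```python
import numpy as np
from scipy.stats import mannwhitneyu

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs.

    Ranges from -1 to 1; 0 indicates complete overlap between groups.
    """
    a, b = np.asarray(a), np.asarray(b)
    greater = (a[:, None] > b[None, :]).sum()
    less = (a[:, None] < b[None, :]).sum()
    return (greater - less) / (len(a) * len(b))

# Hypothetical held-out immunogenicity feature for two inferred groups,
# separated by 2 SD (illustrative values only).
rng = np.random.default_rng(1)
protected = rng.normal(2.0, 1.0, 200)
unprotected = rng.normal(0.0, 1.0, 200)

delta = cliffs_delta(protected, unprotected)
# Two-sided Wilcoxon-Mann-Whitney U test, as in the paper.
stat, p = mannwhitneyu(protected, unprotected, alternative="two-sided")
```

With this degree of separation, the effect size is large and the group difference is highly significant, matching the qualitative behavior reported for high-distance conditions.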

In general, RN-SVM outperformed K-means at lower distance and efficacy values, and its performance was equivalent to that of SVM at higher values (Supplemental Figure S2). In almost all conditions, RN-SVM provided better identification of uninfected unprotected and uninfected protected data points than infection-status-based analysis of case–control data. Thus, use of RN-SVM could help identify correlates not identified through standard analysis.

Efficacy plays an important role in inference accuracy in datasets of smaller size

Beyond efficacy and distance, the number of samples available is also expected to affect the likelihood and confidence of CoR identification. While a useful upper bound, n = 1000 represents a greater number of infection cases than is often observed in clinical trials. Because considerably smaller numbers of infection cases have supported identification of CoR for vaccines with low efficacy using case–control designs based on infection status,Citation37,Citation38 we wished to begin to explore the limits to method utility posed by study size. Hence, we analyzed data simulated for studies in which 500, 250, or 50 infection cases were observed; this lower bound represents numbers of infection cases similar to those reported in HIV vaccine efficacy trials. In general, the performance of RN-SVM when classifying samples in smaller datasets was similar to that observed at n = 1000 (Supplemental Figures S3 and S4). While efficacy played a more dominant role in determining model performance as the number of available true protected points in the dataset became limiting, this analysis showed that even small studies may benefit from the application of P/U ML approaches for correlate discovery.

Figure 6. Effect of sample size on protection status inferences and correlate discovery. (a) Classification accuracy (MCC) of protection status labels predicted by RN-SVM for different sample sizes (n = 50, 250, 500, 1000). (b) Correlate effect size (Cliff’s delta) for the held-out variable. Metrics are presented for input data sets with variable immunogenicity response distance (x-axis), efficacy (columns), and sample sizes. Error bars represent standard deviation across replicates. Values could not be calculated when all samples were predicted to a single class.

Robustness of reliable negative SVM

To test the robustness of the modeling approach with respect to the risk of false discovery, Feature 3 was scrambled and Cliff’s delta re-assessed. This analysis demonstrated that scrambling of data points resulted in low Cliff’s delta values, suggesting a low risk of false discovery when protection status inferences are applied to discovery of correlates among features not considered in model training.
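The scrambling control can be illustrated as a simple permutation of the held-out feature relative to the inferred labels, which destroys any real association and should collapse the effect size toward zero. This is our own minimal sketch; the study’s exact scrambling procedure may differ.

```python
import numpy as np

def cliffs_delta(a, b):
    # Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs.
    a, b = np.asarray(a), np.asarray(b)
    n = len(a) * len(b)
    return ((a[:, None] > b[None, :]).sum() - (a[:, None] < b[None, :]).sum()) / n

rng = np.random.default_rng(2)
# Hypothetical held-out feature with a true 2-SD group difference.
feature = np.r_[rng.normal(2, 1, 150), rng.normal(0, 1, 150)]
labels = np.r_[np.ones(150), np.zeros(150)]

observed = cliffs_delta(feature[labels == 1], feature[labels == 0])

# Scramble: permute the feature relative to the labels; any apparent
# effect that survives would indicate a false-discovery risk.
scrambled = rng.permutation(feature)
null_delta = cliffs_delta(scrambled[labels == 1], scrambled[labels == 0])
```

The observed effect size remains large while the scrambled one sits near zero, mirroring the behavior reported in the figure.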

Figure 7. Robustness of Reliable Negative SVM algorithm evaluated using correlate effect size. Correlate effect size (Cliff’s delta) for the held-out and scrambled held out variable calculated based on inferred protection status label for RN-SVM algorithm as compared to ground truth protection status. Metrics are presented for input data sets with variable immunogenicity response distance (x-axis) and efficacy (columns). Each row represents the different sample sizes. Error bars represent standard deviation across replicates.

Discussion

Vaccine trials are essential to establish the ability of a vaccine to protect at-risk populations from infectious diseases. While more direct study designs, such as challenge, ring, household, or partner studies, can condition analysis on exposure, field trials, the most rigorous means of assessing vaccine efficacy, rely on natural exposure to the pathogen of interest, which may be a relatively infrequent event.Citation39 Given a low rate of exposure and limited sample size, limits to the infection status information available to support CoR analysis reduce power both to define vaccine efficacy and to identify CoR in comparative analyses of infection cases and controls, despite the potential for refinement by incorporating baseline prognostic or exposure risk variables. Imperfect efficacy reduces statistical power further.Citation40,Citation41 The combined effect of these constraints has sometimes resulted in trial designs in which relatively few immunological endpoints are prospectively pre-selected for correlates analysis.Citation4,Citation24–45 This strategic prioritization maximizes power by design but can leave untapped insights into mechanisms of protection.

Beyond these practical constraints, the real-world information from trials defines subjects by infection status. However, because the uninfected class is expected to comprise both protected and unexposed individuals, its ability to define correlates is reduced relative to ideal protection status labels. We hypothesized that this “labeling problem” warrants consideration of a different approach. Our goal, therefore, was to comparatively evaluate means of reclassifying subjects as “was/would have been protected” or “was/would have been unprotected” had all subjects been exposed, in order to improve CoR detection. To this end, we evaluated the potential of positive-unlabeled machine learning using simulated data across a wide range of effect size (0–5 SD) and efficacy (0.05–0.5) values. Efficacy rates higher than 50% were not considered, as under these circumstances most of the data points from the uninfected population belong to the uninfected protected class.
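A rough sketch of this kind of simulation follows. This is our own simplification for illustration: the function name, the fixed exposure probability, and the three-feature layout (two modeling features plus one held-out feature) are assumptions, and the paper’s actual generative model is specified in its Methods.

```python
import numpy as np

def simulate_trial(n=1000, distance=2.0, efficacy=0.3, exposure=0.5, seed=0):
    """Simplified trial simulator (illustrative only).

    Each subject is protected with probability `efficacy`; protected
    subjects' response features are shifted by `distance` SD from the
    unprotected distribution. Exposed, unprotected subjects become
    infection cases; all others are uninfected controls, mixing the
    protected with the unexposed unprotected.
    """
    rng = np.random.default_rng(seed)
    protected = rng.random(n) < efficacy
    exposed = rng.random(n) < exposure
    infected = exposed & ~protected
    # Three normal response features; in the study's design, two inform
    # the model and a third is held out for correlate testing.
    X = rng.normal(0, 1, size=(n, 3))
    X[protected] += distance
    return X, protected, infected
```

Sweeping `distance` and `efficacy` over grids like those in the figures then yields input data sets spanning the conditions evaluated.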

Among the methods evaluated, K-means clustering appears to offer some useful discriminative capability. As a fully unsupervised method, it sets a baseline expectation for how well a model can perform when it does not consider protection or infection outcomes at all. The fact that it can outperform the infection status-based comparisons of case–control study designs under a variety of input data conditions suggests its potential value for correlate discovery. In contrast, fully supervised classification based on fully labeled data represents the upper bound of performance that might be expected from P/U learning approaches. Here, under conditions with relatively higher efficacy and distance, one such method (SVM) approached ground truth.
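A minimal illustration of such an unsupervised baseline is shown below. The construction is ours, with assumed separations and a simple rule (the cluster farther from the infected-class mean is treated as putatively protected); the study’s K-means configuration is described in its Methods.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X_inf = rng.normal(0, 1, size=(100, 2))               # infected cases
X_unl = np.vstack([rng.normal(3, 1, size=(60, 2)),    # protected-like
                   rng.normal(0, 1, size=(40, 2))])   # unexposed-like

# Cluster uninfected subjects without using any outcome labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_unl)
# Label the cluster farther from the infected-class mean as protected.
d = np.linalg.norm(km.cluster_centers_ - X_inf.mean(axis=0), axis=1)
protected_cluster = int(np.argmax(d))
pred_protected = km.labels_ == protected_cluster
```

When the protected distribution is well separated, this label-free procedure recovers the protected group with high fidelity; its accuracy degrades as distance shrinks, consistent with the baseline behavior described above.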

RN-SVM P/U learning, one of many reported semi-supervised approaches, centers on learning what the definitively labeled (infected) class “looks like” and then classifying the remaining data as “infected-like” or not “infected-like,” the latter representing the putatively protected class. This approach typically exhibited performance falling between the fully unsupervised and fully supervised approaches on each of the performance metrics evaluated. Similarly, its ability to identify CoP with confidence fell between these comparators. Importantly, however, both K-means and RN-SVM outperformed the standard CoR identification approach of comparing responses between infected and uninfected subjects, and in some cases approached ground truth. Collectively, these results suggest the potential utility of both unsupervised and P/U learning approaches to generate hypotheses regarding candidate CoP. For small values of distance and efficacy, SVM often classified all samples as members of the same class. However, K-means and RN-SVM still yielded class predictions under these circumstances, raising concerns about possible overfitting and false discovery of CoR. Reassuringly, statistically significant differences in the held-out immunogenicity feature were not observed between groups too similar to be differentiated by supervised methods, though such differences might be expected among the features considered in K-means and RN-SVM classification. To safeguard against such false discoveries in the absence of held-out data, comparison to results observed under infection label permutation may be useful.

Limitations of the present analysis include the simplified data streams employed: variables followed normal distributions, and a set of only two features was used to support model inferences. Though not correlated with each other, both were distinct from the distribution among unprotected subjects. In real-world immunogenicity data, hundreds of response variables may be measured, many of which may not relate to protection at all, and those that do would be expected to exhibit different “distances” from their distributions in infected subjects. Selecting features to train the model may be important in such cases. Additionally, some fraction of subjects may be non-responsive to the vaccine; here, vaccine “take” was modeled as complete. Under what conditions these attributes might reduce or enhance modeling performance remains to be determined. To this end, further synthetic data sets that explore this complexity can be simulated. Beyond further efforts with synthetic data, inferences of protection status after selective blinding of vaccine efficacy studies with other designs, such as challenge experiments for which ground truth protection status is known, could provide further support for the potential utility of protection inferences in candidate CoP identification. Similarly, for disease contexts in which biomarkers of exposure exist, these markers could be explored as a means to validate the accuracy of, or better inform, protection status inferences. After validation on more complex datasets, this approach could be deployed in the context of case–control experimental designs as a complement to other analytical approaches for identifying candidate correlates in vaccine trials with limited efficacy. Application of these approaches could support extraction of additional insights from trials that fail to meet endpoint efficacy criteria, or prioritization of subjects among the controls for further analysis. While we have framed our interest in the context of prospective, interventional vaccine efficacy studies, should further work support utility, we envision that extension to other trial designs is feasible.

Overall, based on the simplified analysis presented here, P/U ML methods show potential to solve the protection status labeling problem inherent in vaccine efficacy field trials, motivating the follow-up proposed above. Further, alternative and more sophisticated P/U ML methods, such as iterative SVM, SPY-SVM, and bagging SVM, could be explored for their ability to solve this labeling problem. Given our results, we hypothesize that they could increase confidence in correlates identified from comparisons based on infection status alone, as well as identify new putative correlates of protection, particularly in the context of smaller sample sizes and low efficacy, and even in trials that have failed to meet endpoint efficacy criteria. Additionally, should further work support generalized utility, the proposed approach has the potential to allow studies of shorter duration or with fewer participants to yield the same candidate correlates as more resource-intensive designs. Correlates identified by protection status inferences rightly represent hypotheses regarding potential CoP, though this same limitation applies to traditional CoR. While only pathogen challenge can identify definitive CoP, the prospect of deriving greater benefit from experimental studies in humans motivates pushing the boundaries of trial analysis, particularly as immunological data sets become increasingly sophisticated and inference methods improve in accuracy, widespread utility, and scientific acceptance.

Author contribution

Investigation: N.K. and K.M. Coding: N.K. and K.M.; Data analysis and data visualization: N.K.; Writing-original draft: N.K. and K.M.; Writing-reviewing and editing: all authors. Supervision: M.E.A.; Conceptualization and funding: M.E.A.

Supplemental material

Supplemental Material

Download PDF (947.2 KB)

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplemental data

Supplemental data for this article can be accessed on the publisher’s website at https://doi.org/10.1080/21645515.2023.2204020.

Additional information

Funding

The study was supported by the National Institutes of Health NIAID under grants [R56AI165448 and P01AI162242].

References

  • World Health Organization. Correlates of vaccine- induced protection: method and implications. WHO/IVB/13.01. 2013.
  • Qin L, Gilbert PB, Corey L, McElrath MJ, Self SG. A framework for assessing immunological correlates of protection in vaccine trials. J Infect Dis. 2007;196(9):1–11. doi:10.1086/522428.
  • Garber DA, Silvestri G, Feinberg MB. Prospects for an AIDS vaccine: three big questions, no easy answers. Lancet Infect Dis. 2004;4(7):397–413. doi:10.1016/S1473-3099(04)01056-4.
  • Haynes BF, Gilbert PB, McElrath MJ, Zolla-Pazner S, Tomaras GD, Alam SM, Evans DT, Montefiori DC, Karnasuta C, Sutthent R, et al. Immune-correlates analysis of an HIV-1 vaccine efficacy trial. N Engl J Med. 2012;366(14):1275–86. doi:10.1056/NEJMoa1113425.
  • Corey L, Gilbert PB, Tomaras GD, Haynes BF, Pantaleo G, Fauci AS. Immune correlates of vaccine protection against HIV-1 acquisition. Sci Transl Med. 2015;7(310):310rv7. doi:10.1126/scitranslmed.aac7732.
  • Mehta P, Bukov M, Wang CH, Day AGR, Richardson C, Fisher CK, Schwab DJ. A high-bias, low-variance introduction to machine learning for physicists. Phys Rep. 2019;810:1–124. doi:10.1016/j.physrep.2019.03.001.
  • Chicco D. Ten quick tips for machine learning in computational biology. BioData Min. 2017;10(1):35. doi:10.1186/s13040-017-0155-3.
  • Chung AW, Kumar MP, Arnold KB, Yu WH, Schoen MK, Dunphy LJ, Suscovich T, Frahm N, Linde C, Mahan A, et al. Dissecting polyclonal vaccine-induced humoral immunity against HIV using systems serology. Cell. 2015;163(4):988–98. doi:10.1016/j.cell.2015.10.027.
  • Neidich SD, Fong Y, Li SS, Geraghty DE, Williamson BD, Young WC, Goodman D, Seaton KE, Shen X, Sawant S, et al. Antibody Fc effector functions and IgG3 associate with decreased HIV-1 risk. J Clin Invest. 2019;129(11):4838–49. doi:10.1172/JCI126391.
  • Ackerman ME, Das J, Pittala S, Broge T, Linde C, Suscovich TJ, Brown EP, Bradley T, Natarajan H, Lin S, et al. Route of immunization defines multiple mechanisms of vaccine-mediated protection against SIV. Nat Med. 2018;24(10):1590–8. doi:10.1038/s41591-018-0161-0.
  • Bradley T, Pollara J, Santra S, Vandergrift N, Pittala S, Bailey-Kellogg C, Shen X, Parks R, Goodman D, Eaton A, et al. Pentavalent HIV-1 vaccine protects against simian-human immunodeficiency virus challenge. Nat Commun. 2017;8(1):15711. doi:10.1038/ncomms15711.
  • Pittala S, Bagley K, Schwartz JA, Brown EP, Weiner JA, Prado IJ, Zhang W, Xu R, Ota‐setlik A, Pal R, et al. Antibody Fab-Fc properties outperform titer in predictive models of SIV vaccine-induced protection. Mol Syst Biol. 2019;15(5):e8747. doi:10.15252/msb.20188747.
  • Vaccari M, Gordon SN, Fourati S, Schifanella L, Liyanage NP, Cameron M, Keele BF, Shen X, Tomaras GD, Billings E, et al. Corrigendum: adjuvant-dependent innate and adaptive immune signatures of risk of SIVmac251 acquisition. Nat Med. 2016;22(10):1192. doi:10.1038/nm1016-1192a.
  • Kazmin D, Nakaya HI, Lee EK, Johnson MJ, van der Most R, van den Berg RA, Ballou WR, Jongert E, Wille-Reece U, Ockenhouse C, et al. Systems analysis of protective immune responses to RTS,S malaria vaccination in humans. Proc Natl Acad Sci U S A. 2017;114(9):2425–30. doi:10.1073/pnas.1621489114.
  • HIPC-CHI Signatures Project Team, HIPC-I Consortium. Multicohort analysis reveals baseline transcriptional predictors of influenza vaccination responses. Sci Immunol. 2017;2(14). doi:10.1126/sciimmunol.aal4656.
  • Fourati S, Ribeiro SP, Blasco Tavares Pereira Lopes F, Talla A, Lefebvre F, Cameron M, Kaewkungwal J, Pitisuttithum P, Nitayaphan S, Rerks-Ngarm S, et al. Integrated systems approach defines the antiviral pathways conferring protection by the RV144 HIV vaccine. Nat Commun. 2019;10(1):863. doi:10.1038/s41467-019-08854-2.
  • Lin L, Finak G, Ushey K, Seshadri C, Hawn TR, Frahm N, Scriba TJ, Mahomed H, Hanekom W, Bart P-A, et al. COMPASS identifies T-cell subsets correlated with clinical outcomes. Nat Biotechnol. 2015;33(6):610–6. doi:10.1038/nbt.3187.
  • Querec TD, Akondy RS, Lee EK, Cao W, Nakaya HI, Teuwen D, Pirani A, Gernert K, Deng J, Marzolf B, et al. Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans. Nat Immunol. 2009;10(1):116–25. doi:10.1038/ni.1688.
  • Arevalillo JM, Sztein MB, Kotloff KL, Levine MM, Simon JK. Identification of immune correlates of protection in Shigella infection by application of machine learning. J Biomed Inform. 2017;74:1–9. doi:10.1016/j.jbi.2017.08.005.
  • Vanwinckelen G, Blockeel H. On estimating model accuracy with repeated cross-validation. BeneLearn 2012: Proceedings of the 21st Belgian-Dutch Conference on Machine Learning; 2012.
  • Hawkins DM. The problem of overfitting. J Chem Inf Comput Sci. 2004;44(1):1–12. doi:10.1021/ci0342472.
  • Subramanian J, Simon R. Overfitting in prediction models – is it a problem only in high dimensions? Contemp Clin Trials. 2013;36(2):636–41. doi:10.1016/j.cct.2013.06.011.
  • Han H, Jiang X. Overcome support vector machine diagnosis overfitting. Cancer Inform. 2014;13:145–58. doi:10.4137/CIN.S13875.
  • Rolland M, Gilbert P. Evaluating immune correlates in HIV type 1 vaccine efficacy trials: what RV144 may provide. Aids Res Hum Retrov. 2012;28(4):400–4. doi:10.1089/aid.2011.0240.
  • Om K, Paquin-Proulx D, Montero M, Peachman K, Shen X, Wieczorek L, Beck Z, Weiner JA, Kim D, Li Y, et al. Adjuvanted HIV-1 vaccine promotes antibody-dependent phagocytic responses and protects against heterologous SHIV challenge. PLoS Pathog. 2020;16(9):e1008764. doi:10.1371/journal.ppat.1008764.
  • Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science. 2015;349(6245):255–60. doi:10.1126/science.aaa8415.
  • Batta M. Machine learning algorithms - a review. Int J Sci Res. 2020;9:381–6.
  • Camargo G, Bugatti PH, Saito PTM. Active semi-supervised learning for biological data classification. PLoS One. 2020;15(8):e0237428. doi:10.1371/journal.pone.0237428.
  • Bekker J, Davis J. Learning from positive and unlabeled data: a survey. Mach Learn. 2020;109(4):719–60. doi:10.1007/s10994-020-05877-5.
  • Liu B, Lee WS, Yu PS, Li X. Partially supervised classification of text documents. International Conference on Machine Learning; 2002;2:387–394.
  • R Core Team. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing; 2022.
  • Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. e1071: misc functions of the department of statistics, probability theory group (Formerly: e1071), TU Wien. R package version 1.7-11; 2022.
  • Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28(5):28. doi:10.18637/jss.v028.i05.
  • Chicco D, Totsch N, Jurman G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Min. 2021;14(1). doi:10.1186/s13040-021-00244-z.
  • Torchiano M. effsize: efficient effect size computation. 2020.
  • Zhang B, Zuo W. Reliable negative extracting based on kNN for learning from positive and unlabeled examples. J Comput. 2009;4:94–101.
  • Karasavvas N, Billings E, Rao M, Williams C, Zolla-Pazner S, Bailer RT, Koup RA, Madnote S, Arworn D, Shen X, et al. The Thai phase III HIV type 1 vaccine trial (RV144) regimen induces antibodies that target conserved regions within the V2 loop of gp120. Aids Res Hum Retrov. 2012;28(11):1444–57. doi:10.1089/aid.2012.0103.
  • Hammer SM, Sobieszczyk ME, Janes H, Karuna ST, Mulligan MJ, Grove D, Koblin BA, Buchbinder SP, Keefer MC, Tomaras GD, et al. Efficacy trial of a DNA/rAd5 HIV-1 preventive vaccine. N Engl J Med. 2013;369(22):2083–92. doi:10.1056/NEJMoa1310566.
  • Orenstein WA, Bernier RH, Dondero TJ, Hinman AR, Marks JS, Bart KJ, Sirotkin B. Field evaluation of vaccine efficacy. Bull World Health Organ. 1985;63:1055–68.
  • Othus M, Zhang MJ, Gale RP. Clinical trials: design, endpoints and interpretation of outcomes. Bone Marrow Transplant. 2022;57(3):338–42. doi:10.1038/s41409-021-01542-0.
  • Kublin JG, Morgan CA, Day TA, Gilbert PB, Self SG, McElrath MJ, Corey L. HIV vaccine trials network: activities and achievements of the first decade and beyond. Clin Investig (Lond). 2012;2(3):245–54. doi:10.4155/cli.12.8.
  • Follmann D, Duerr A, Tabet S, Gilbert P, Moodie Z, Fast P, Cardinali M, Self S. Endpoints and regulatory issues in HIV vaccine clinical trials: lessons from a workshop. J Acquir Immune Defic Syndr. 2007;44(1):49–60. doi:10.1097/01.qai.0000247227.22504.ce.
  • Gilbert PB, DeGruttola VG, Hudgens MG, Self SG, Hammer SM, Corey L. What constitutes efficacy for a human immunodeficiency virus vaccine that ameliorates viremia: issues involving surrogate end points in phase 3 trials. J Infect Dis. 2003;188(2):179–93. doi:10.1086/376449.
  • Rerks-Ngarm S, Paris RM, Chunsutthiwat S, Premsri N, Namwat C, Bowonwatanuwong C, Li SS, Kaewkungkal J, Trichavaroj R, Churikanont N, et al. Extended evaluation of the virologic, immunologic, and clinical course of volunteers who acquired HIV-1 infection in a phase III vaccine trial of ALVAC-HIV and AIDSVAX B/E. J Infect Dis. 2013;207(8):1195–205. doi:10.1093/infdis/jis478.
  • Rida W, Fast P, Hoff R, Fleming T. Intermediate-size trials for the evaluation of HIV vaccine candidates: a workshop summary. J Acq Immun Def Synd. 1997;16(3):195–203. doi:10.1097/00042560-199711010-00009.