Immunology

From bench to bedside via bytes: Multi-omic immunoprofiling and integration using machine learning and network approaches

Article: 2282803 | Received 15 Jul 2023, Accepted 09 Nov 2023, Published online: 15 Dec 2023

ABSTRACT

A significant surge in research leverages high-throughput omic technology platforms for broad profiling of biological responses to vaccines and to cutting-edge immunotherapies and stem-cell therapies under development. These profiles capture different aspects of core regulatory and functional processes at scales of resolution ranging from molecular and cellular to organismal. Systems approaches capture the complex interplay between these layers and scales. Here, we summarize experimental data modalities for characterizing the genome, epigenome, transcriptome, proteome, metabolome, and antibody-ome that enable us to generate large-scale immune profiles. We also discuss machine learning and network approaches commonly used to analyze and integrate these modalities to gain insights into correlates and mechanisms of natural and vaccine-mediated immunity, as well as therapy-induced immunomodulation.

Introduction

The abundance of deep profiles quantifying responses to vaccines and therapies has enabled data integration using principles from network biology, in which relationships within and across profiles are represented as complex graphs to probe the inherent relationships in different biological systems. To mount an effective response, the immune system must trigger signaling cascades that involve intricate biological networks and multi-modal information flow between its diverse cell types. The integration of complex, multi-scale immunoprofiles therefore presents an ideal opportunity to leverage systems approaches, which capture the interplay between these layers and scales. However, the analysis of these deep profiles has also brought significant statistical challenges (e.g., ultra-high dimensionality, multicollinearity, and differences between data modalities), along with opportunities to address them using novel machine learning and network techniques. Traditional methods for immune profiling often take reductionist approaches, investigating specific cellular and molecular mechanisms based on targeted hypotheses. Although valuable for studying a focused list of candidates and associated hypotheses, such analyses are often limited in uncovering complex multifactorial processes, because these processes involve interacting networks that extend beyond the mechanisms of the prioritized candidates and shape their overall systemic effect. Moreover, analyzing different types of omics data in isolation often leads to incomplete or inconsistent interpretations of biological phenomena.

A swath of new computational frameworks aims to integrate multi-modal data using a suite of approaches. Here, we summarize the latest advances on both the technological and computational fronts that enable the integration of various types of omics data, such as genomic, transcriptomic, epigenomic, proteomic, metabolomic, spatial omic, and antibody-omic datasets. We further discuss how the availability of these multi-omic techniques has dramatically altered immunoprofiling in terms of experimental design, the complexity of the questions being asked, and the novel machine learning and network approaches employed to analyze these complex, high-dimensional datasets.

Profiling the genome and the regulome

We first focus on genomic and epigenomic technologies for multi-modal and spatial profiling and detail their application to various studies and questions in immunoprofiling. Single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) have revolutionized the field by providing high-throughput, high-resolution means to describe the state of individual immune cells. However, each provides only a partial view of cellular state, because each examines a single modality: the transcriptome or chromatin accessibility, respectively. Consequently, researchers have begun to explore parallel profiling technologies that measure multiple modalities simultaneously in the same cell or sample, providing a gateway to a more holistic view.

Figure 1. Single-cell perspectives frequently combine genomic, epigenomic, transcriptomic and spatial modalities to develop parallel “snapshots” of cellular function. Established antibody-omic techniques and target-agnostic proteomics are valuable complements that provide a holistic systems immunology perspective.

While scRNA-seq and scATAC-seq alone provide transcriptomic and epigenomic profiles, respectively, platforms including 10X Multiome, SHARE-seq, and sci-CARCitation1,Citation2 have been used in various studies to profile the transcriptome and epigenome simultaneously in the same cells. Linking transcriptome and protein expression has also garnered attention. CITE-seq and REAP-seq, for example, measure the transcriptome together with surface protein expression, using oligonucleotide-barcoded antibodies against a panel of proteins chosen based on prior knowledge of specific cell states. Indeed, these coupled multi-modal profiling technologies have been used in a wide range of contexts to characterize both vaccine- and therapy-induced immune responses.Citation3–10 For example, two recent studies uncovered broad transcriptional signatures that predict antibody responses to 13 vaccines, using gene lists generated from CITE-seq data.Citation11,Citation12 This effort showed that pre-vaccination NF-κB and IRF-7 transcriptional programs, together with vaccine type, can dramatically influence an individual's post-vaccination response, and it identified immune endotypes (high, mixed, and low) that broadly predict strong responses to multiple vaccines. Overall, characterizing complex, emergent immune phenotypes may come to rely less on identifying a unique gene signature and more on defining the phenotype itself.

Immunologists are also increasingly interested in parallel profiling platforms that capture TCR/BCR sequences, as these methods directly target adaptive immune cells and can answer specific mechanistic questions. For example, a recent study used paired single-cell RNA and BCR sequencing of human PBMC samples from healthy controls and from asymptomatic and symptomatic COVID-19 patients to elucidate differences in the antibody repertoire across the three groups.Citation13 TCR sequencing technologies can answer a variety of questions, such as tracing the behavior of specific T-cell clones upon antigen stimulation or measuring the expansion rate of different clones. In vaccine design, for instance, such data can be combined with computational prediction tools to curate T- or B-cell-specific epitope selection and to modulate the type and strength of immune responses.Citation14,Citation15 ECCITE-seq, a platform that captures surface protein expression, the transcriptome, clonotypes, and CRISPR perturbations, has been employed to track longitudinal CD4 T-cell clonal expansion in HIV-infected patients receiving antiretroviral therapy.Citation16 This has enabled the investigation of which CD4 T-cell subsets continue to harbor HIV-1, resist cell death, and drive the proliferation of HIV-1-infected cells.Citation17

Spatial transcriptomics has emerged as the next frontier of sequencing technologies, as it provides spatially resolved gene expression information within complex tissues such as lymph nodes or the tumor microenvironment. By enabling characterization of the localization, cross-talk, and functional states of different immune cell types in environments such as diseased tissues and lymphoid organs, and in the presence versus absence of antigen, spatial transcriptomics offers valuable insight into the mechanisms underlying various immune responses. Recent studies have used spatial transcriptomics, in combination with other technologies, to identify potential therapeutic targets in many contexts. Widely used spatial transcriptomic and proteomic technologies include 10X Visium, Slide-seq, NanoString GeoMx, MERFISH, and CODEX.Citation18–20 Each has its own strengths and weaknesses. For example, among the transcriptomic methods listed, MERFISH is image-based, whereas 10X Visium and Slide-seq are sequencing-based. MERFISH can detect mRNA at much higher (single-molecule) resolution than sequencing-based technologies; one of its main disadvantages, however, is a limit on the physical size of the tissue that can be profiled. Certain sequencing-based methods tend to have lower resolution (e.g., 10X Visium has a capture area of 6.5 mm × 6.5 mm, with each capture spot 55 µm in diameter) but can profile larger tissue sections and capture the entire transcriptome.

Spatial transcriptomics platforms can be considered parallel profiling technologies, as they reveal spatial location and transcriptome profiles simultaneously. Recent advances have expanded these platforms to additional modalities such as TCR sequences; for example, Slide-TCR-seq provides high-resolution RNA profiling together with T-cell clonotypes.Citation21,Citation22 These approaches can be applied to design better therapeutics that account for the heterogeneity of molecular and cellular phenotypes across spatial niches.

Beyond the genome and regulome – the proteome, the metabolome and the antibody-ome

Advances in high-throughput genomic, transcriptomic, and epigenomic technologies have played a critical role in profiling cellular states associated with disease and dysregulation, but these measurements often correlate poorly with measurements at other scales of resolution (e.g., protein abundances or antibody profiles). Thus, they provide an incomplete view of a complex, multi-scale, multifactorial immune response to an external stimulus or perturbation. Technologies quantifying protein abundance have been widely used, but only recently have they become amenable to high throughput. Further, technologies for discovering protein–protein interactions have been used to deep-profile different complexes across contexts. Therefore, experiments need to be designed with consideration of protein expression and subsequent interactions, along with the statistical and computational caveats that arise in the analysis of biological big data.

Immunoproteomic profiling is often synonymous with cytokine profiling using Luminex (multiplexed, bead-based) or MSD (electrochemiluminescence-based) immunoassays, which detect specific cytokines via the corresponding detection antibodies. Clever multiplexing has significantly increased the throughput of these assays. Chip- and array-based technologies call for well-planned experimental design, as the number and choice of reporters and probes determine the appropriate downstream analysis. While these technologies are suited to targeted profiling, approaches such as mass spectrometry (MS) can be, and have been, used for untargeted proteomics and metabolomics. MS has been used for the early diagnosis of bacterial infection and has applications in viral protein characterization. For example, one study generated a full map of the post-translational modifications (PTMs) of the SARS-CoV-2 spike protein using liquid chromatography with tandem mass spectrometry (LC-MS/MS) to study the role of certain PTMs in modulating host factor binding.Citation23–25 MS also has limitations: spectra can be noisy, as many fragment ions fail to be observed and spectra contain unexplained peaks.Citation26 One solution to the resulting spectrum identification problem is the database search approach, in which candidate peptides are selected from a database if their mass falls within a tolerance window around the measured precursor mass. Because identification depends on the database, a major challenge is identifying peptides absent from it, which are generally polymorphic or have undergone post-translational modifications.
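
To make the database search idea concrete, here is a minimal Python sketch, assuming an in-silico tryptic digest with no missed cleavages, unmodified peptides, and a neutral precursor mass that has already been derived from the observed m/z and charge. The toy protein sequence, the 10 ppm tolerance, and the function names are illustrative assumptions; real search engines additionally score fragment-ion matches against the spectrum.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O contributed by the free peptide termini


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def candidate_peptides(precursor_mass, database, tol_ppm=10.0):
    """Database peptides whose mass lies within +/- tol_ppm of the precursor mass."""
    window = precursor_mass * tol_ppm * 1e-6
    hits = []
    for protein_id, sequence in database.items():
        for pep in tryptic_digest(sequence):
            mass = peptide_mass(pep)
            if abs(mass - precursor_mass) <= window:
                hits.append((protein_id, pep, round(mass, 3)))
    return hits


# Hypothetical toy database and precursor mass.
toy_db = {"toy_protein": "ACDKGVFRLLR"}
print(tryptic_digest(toy_db["toy_protein"]))  # ['ACDK', 'GVFR', 'LLR']
print(candidate_peptides(477.27, toy_db))     # [('toy_protein', 'GVFR', 477.27)]
```

A peptide carrying a PTM or a polymorphism that the database does not list has a shifted mass, falls outside the window, and is never proposed as a candidate, which is the limitation described above.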

Aptamer-based proteomic approaches are also particularly well suited to protein biomarker discovery; these use enormous libraries of chemically modified oligonucleotides optimized to bind specific protein epitopes.Citation27 Creating a library of aptamers does not require prior knowledge and is a high-throughput process. An unbiased library can be mixed with a small protein sample, after which the bound aptamers are separated and profiled by hybridization to a microarray. Affinity purification mass spectrometry (AP-MS) is a leading MS method for mapping protein–protein interactome networks on a proteomic scale, and it can also be used to detect protein co-complexes. Another widely used method, yeast two-hybrid (Y2H), detects binary physical protein–protein interactions: the interaction between a bait and a prey protein activates a reporter, producing a color reaction or growth on selective media.Citation28 These technologies have provided fundamental links between genotypes and phenotypes through the characterization of interaction-specific disruptions.Citation29 Single-cell resolution of B-cell receptor sequences can provide valuable information about the germinal center response; if paired with newer technologies such as SEC-seq, which allows isolation of secreted antibodies from single cells, these methods could potentially allow direct comparison of antibody avidity across receptor clonalities.Citation30,Citation31

Temporal immunoprofiling can reveal biological phenomena that are undetectable at individual multi-omic timepoints. For example, temporally resolved proteomic, metabolomic, and transcriptomic data from recipients of the Zostavax vaccine uncovered multifactorial cellular and humoral correlates of immunogenicity.Citation32 Cytokine and proteomic profiling have been widely used for immunoprofiling in COVID-19, capturing both natural and vaccine-induced immunity.Citation3,Citation33–38

The functional humoral immune response provides important context that can complement data from the modalities described above. Antibodies (Abs) are critical mediators of humoral adaptive immunity, produced by B cells in response to pathogen and tumor assaults. By recognizing specific antigens, antibodies can carry out a broad array of functions. Through antigen engagement by the antigen-binding fragment (Fab) region, they can directly prevent infection by blocking virus attachment to and entry into susceptible host cells, or prevent spread by blocking budding and release. Abs also act as molecular beacons that recruit innate immune cells and complement to unleash effector functions on pathogens, infected cells, and tumor cells, through a combination of antigen binding and interactions of the crystallizable fragment (Fc) region with Fc receptors (FcRs) on the surface of leukocytes and with complement component 1q (C1q). They can also indirectly stimulate cellular immunity through uptake of immune complexes by antigen-presenting cells, again via antigen and FcR engagement. However, Abs can also enhance infection of FcR-bearing cells, leading to increased pathogen replication and burden, and can enhance disease by stimulating unrestrained inflammation. The important roles these functions play in generating protection against natural infection serve as a blueprint, in vaccine development, for the design and characterization of antigens capable of eliciting antibodies with these functions. Similarly, in therapeutic development, synthetic antibodies are engineered for the functions needed to effectively treat inflammation, cancer, autoimmunity, infectious disease, and neurodegeneration. Thus, a comprehensive characterization of antibody functional features is needed to discover markers of disease and protection and to develop effective medical interventions.

Antigen-specific antibody functions are assessed using in vitro assays, each specialized for detecting a single function.Citation39 Binding and functional assays are used to characterize both Fab and Fc functions. Antigen-specific binding features of the Fab, including magnitude, affinity, and avidity, are measured using traditional enzyme-linked immunosorbent assays (ELISA) and variants thereof that use flow cytometry, as well as label-free, real-time surface plasmon resonance (SPR) and biolayer interferometry (BLI) technologies. Virus neutralization mediated by the Fab is conventionally measured using native or pseudotyped viruses and a host cell that expresses the virus receptor. Importantly, assay conditions such as host cell selection and, in assays using pseudotyped virus, target glycoprotein distribution, conformation, and density affect neutralizing activity.Citation40 Consequently, extrapolation from neutralization assays to in vivo protection requires caution. There is now considerable evidence that in vitro neutralization does not guarantee in vivo protection,Citation41–44 suggesting that Fc-dependent effector mechanisms contribute to antibody-mediated antiviral immunity. For example, broadly neutralizing antibodies against influenza hemagglutinin and neuraminidase fail to protect in vivo without Fc-FcγR interactions.Citation42 Moreover, the same study showed that non-neutralizing antibodies can be protective and that this protection requires Fc-FcγR interactions.
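
Neutralization assays are typically summarized as a titer. The sketch below assumes a reporter-based (e.g., luciferase) pseudovirus readout with virus-only and cells-only control wells: it converts raw signal into percent neutralization and estimates the 50% inhibitory dilution (ID50) by interpolating between the two dilutions that bracket 50% on a log scale. Many laboratories instead fit a four-parameter logistic curve; the interpolation rule, the function names, and the readout values here are illustrative assumptions.

```python
import math


def percent_neutralization(signal, virus_only, cells_only):
    """Percent reduction in reporter signal relative to virus-only control wells."""
    return 100.0 * (1.0 - (signal - cells_only) / (virus_only - cells_only))


def id50_by_interpolation(dilutions, neutralization):
    """Reciprocal dilution at which neutralization crosses 50%, interpolated on log10 dilution.

    dilutions: reciprocal serum dilutions in increasing order (e.g. 20, 60, 180, 540).
    neutralization: percent neutralization at each dilution.
    Returns None when 50% is not crossed within the tested range.
    """
    pairs = list(zip(dilutions, neutralization))
    for (d_lo, n_lo), (d_hi, n_hi) in zip(pairs, pairs[1:]):
        if n_lo >= 50.0 >= n_hi and n_lo != n_hi:
            frac = (n_lo - 50.0) / (n_lo - n_hi)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    return None


# Hypothetical luciferase readouts (RLU) for a 3-fold serum dilution series.
virus_only, cells_only = 100000.0, 100.0
rlu = [10090.0, 30070.0, 70030.0, 90010.0]
neut = [percent_neutralization(s, virus_only, cells_only) for s in rlu]
print([round(n, 1) for n in neut])                                 # [90.0, 70.0, 30.0, 10.0]
print(round(id50_by_interpolation([20, 60, 180, 540], neut), 1))  # 103.9
```

Given the assay dependencies noted above, a titer computed this way is only comparable across runs that share cells, virus preparation, and controls.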

To characterize antibody function beyond neutralization, many Fc biophysical and functional features are studied in infectious disease, as well as in other disease areas where antibodies are markers of disease progression or therapeutic efficacy. Biophysical properties of the Fc, including antibody isotype (IgM, IgG, IgA, and IgE), subclass (e.g., IgG1, IgG2, IgG3, and IgG4), and antibody binding to FcRs and C1q, are measured by ELISA, SPR, and BLI. In addition, antibody glycosylation profiles are assessed using mass spectrometry and capillary electrophoresis. Immune complex interactions with FcRs and C1q can stimulate a broad range of functions that are generally measured using flow cytometry: antibody-dependent cellular phagocytosis by monocytes, macrophages, dendritic cells, neutrophils, eosinophils, and basophils; antibody-dependent cellular cytotoxicity; antibody-dependent enhanced infection; antibody-dependent enhanced disease; cytokine and chemokine release; complement deposition; complement-dependent cell-mediated phagocytosis; complement-dependent cellular cytotoxicity; and complement-dependent cell-mediated cytotoxicity.

Unsupervised and supervised machine learning approaches that integrate multi-omic datasets without biological priors

Multi-omic data integration presents several pivotal statistical challenges. These include high dimensionality, multicollinearity, data heterogeneity, and sparsity. Integration techniques need to address these issues, often through a combination of data preprocessing and integrated computational steps.

Unsupervised methods for multi-omic data integration can use all available single-modality datasets, but they are limited by the fact that their inputs are neither measured simultaneously nor matched at the level of individual cells. To address this, these methods anchor the different modalities in a common feature space. Linked inference of genomic experimental relationships (LIGER) is a prominent example of group factor analysis methods, which provide a generative, probabilistic model that reduces the dimensionality of the original data and finds associations between different technologies.Citation45 Specifically, LIGER uses integrative non-negative matrix factorization, which produces interpretable, low-dimensional latent factors for each cell that combine information from the distinct data modalities. Seurat V3 represents a nonlinear statistical approach that seeks to identify the common subspace shared by multiple modalities.Citation46 To achieve this, Seurat V3 first reduces the dimensionality of the input datasets through diagonalized canonical correlation analysis and then uses a mutual nearest neighbor algorithm to identify pairs of cells across datasets, referred to as “anchors”. Despite strengths such as interpretability, LIGER's latent factors depend on initialization. Additionally, many factor analysis-based methods are inherently linear, which can limit the discovery of non-linear relationships in the data.
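
The anchor step can be sketched as follows, assuming both datasets have already been projected into a shared low-dimensional space (Seurat V3 uses a CCA-based projection for this; here the toy embeddings are simply simulated in a common space). The sketch only finds mutual nearest neighbor pairs; Seurat additionally filters and scores anchors and uses them to compute correction vectors, which are not shown. Function names and data are hypothetical.

```python
import numpy as np


def knn_indices(query, reference, k):
    """Indices of the k nearest reference rows (Euclidean) for each query row."""
    d2 = ((query[:, None, :] - reference[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def mutual_nearest_neighbors(emb_a, emb_b, k=5):
    """Anchor pairs (i, j): cell j of B is among cell i's k nearest B cells, and vice versa."""
    nn_ab = knn_indices(emb_a, emb_b, k)  # for each cell in A, its neighbors in B
    nn_ba = knn_indices(emb_b, emb_a, k)  # for each cell in B, its neighbors in A
    anchors = []
    for i in range(emb_a.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:
                anchors.append((i, int(j)))
    return anchors


# Two hypothetical datasets already embedded in a shared 2-D space,
# each containing the same two cell populations.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.repeat([0, 1], 20)
emb_a = centers[labels] + rng.normal(scale=0.5, size=(40, 2))
emb_b = centers[labels] + rng.normal(scale=0.5, size=(40, 2))
anchors = mutual_nearest_neighbors(emb_a, emb_b, k=5)
print(len(anchors) > 0, all(labels[i] == labels[j] for i, j in anchors))  # True True
```

Requiring the neighbor relation to hold in both directions is what makes anchors conservative: a cell from a population present in only one dataset tends not to find a reciprocal partner.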

Figure 2. High-level conceptual schematics of common computational multi-omics datasets integration methods including integrated visualization (top panel), factor analysis and matrix factorization (middle panel), and neural network-based methods (bottom panel).

In contrast to unsupervised methods, semi-supervised frameworks anchor the shared subspace across modalities using the shared sample space; that is, semi-supervision comes from the fact that the input data, or a subset of it, share the same set of cells. One type of semi-supervised framework integrates modalities measured simultaneously in the same cell, such as transcriptomic expression and chromatin accessibility in a 10X Multiome experiment. MOFA+, a Bayesian factor analysis framework that generalizes PCA to multiple data modalities, performs a factor analysis analogous to LIGER and is one of the most commonly used semi-supervised frameworks.Citation47 MOFA+ discovers latent factors that explain variance across data modalities to varying degrees. Like LIGER, one of the main limitations of MOFA+ is that it captures only linear relationships. Another related method is implemented in Seurat V4,Citation48 which uses the weighted nearest neighbor (WNN) method to weight each data modality according to its relative information content within each cell. Specifically, it generates a WNN graph in which each cell is represented as a weighted combination of the modalities. Seurat V4 has been applied across a wide range of paired multi-omic datasets, such as CITE-seq, ECCITE-seq, and 10X Multiome, including ECCITE-seq data from patients who received HIV vaccinations. WNN integration recovered 57 high-resolution cell clusters characterized by distinct differentially expressed genes and immunophenotypic markers. Importantly, these clusters included cell types such as regulatory T cells and MAIT cells, which often cannot be separated based on scRNA-seq alone.
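
A simplified sketch of the weighted nearest neighbor idea follows, assuming per-cell modality weights are already known. In Seurat V4 the weights are learned from how well each modality predicts a cell's profile from its neighbors; that step and Seurat's exact kernel are not reproduced here. The exponential kernel with a per-cell bandwidth, the function names, and the simulated data are illustrative assumptions.

```python
import numpy as np


def modality_affinity(emb, k):
    """Affinity exp(-d_ij / sigma_i), with sigma_i the distance from cell i to its k-th neighbor."""
    d = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1))
    sigma = np.maximum(np.sort(d, axis=1)[:, k], 1e-12)  # column 0 is the cell itself
    return np.exp(-d / sigma[:, None])


def weighted_nn(emb_rna, emb_protein, w_rna, k=5):
    """k nearest neighbors from a per-cell weighted sum of RNA and protein affinities.

    w_rna: per-cell RNA weights in [0, 1]; each cell's protein weight is 1 - w_rna.
    Seurat V4 learns such weights; in this sketch they are simply supplied as inputs.
    """
    theta = (w_rna[:, None] * modality_affinity(emb_rna, k)
             + (1.0 - w_rna)[:, None] * modality_affinity(emb_protein, k))
    np.fill_diagonal(theta, -np.inf)  # a cell is not its own neighbor
    return np.argsort(-theta, axis=1)[:, :k]


# Hypothetical cells: RNA separates groups 0/1, protein separates a crossing split.
rng = np.random.default_rng(1)
rna_lab = np.repeat([0, 1], 20)
prot_lab = np.tile([0, 1], 20)
emb_rna = np.array([[0.0, 0.0], [10.0, 0.0]])[rna_lab] + rng.normal(scale=0.5, size=(40, 2))
emb_prot = np.array([[0.0, 0.0], [0.0, 10.0]])[prot_lab] + rng.normal(scale=0.5, size=(40, 2))
nn_rna_only = weighted_nn(emb_rna, emb_prot, np.ones(40))    # neighborhoods follow RNA groups
nn_prot_only = weighted_nn(emb_rna, emb_prot, np.zeros(40))  # neighborhoods follow protein groups
```

With intermediate weights, each cell's neighborhood blends information from both modalities, which is the property that lets a joint graph separate populations that a single modality may not resolve.
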
BABEL, an autoencoder-based deep generative model, not only performs paired integration but also infers expression values in other modalities.Citation49 BABEL can reconstruct RNA expression profiles from scATAC-seq measurements, and vice versa, using models trained on similar cell types. However, a key limitation of BABEL is that the model must be trained on similar cell types and may not perform well on unfamiliar samples.

The above approaches are limited by the requirement that modalities be measured simultaneously in the same cells. To address this limitation, multiple frameworks have been developed to integrate single-cell datasets across different experiments. One such framework is bridge integration, in which a parallel (multi-modal) profile acts as the bridge.Citation50 The bridge is a dataset with matched modalities, such as 10X Multiome, that acts as a “gateway” to integrate scRNA-seq and scATAC-seq data produced in distinct experiments. This framework relies on dictionary learning concepts to integrate single-modality data by finding a shared subspace for the cells from each modality and defining shared features in that subspace. By applying bridge integration to human bone marrow mononuclear cells, rare cell types, including innate lymphoid cells and AXL+SIGLEC6+ dendritic cells, were discovered in the scATAC-seq data that had not previously been observed in the epigenome data. Similar to BABEL, variational autoencoders and related neural network approaches have also been proposed for integrating single-modality data. Cobolt is specifically designed to integrate scRNA-seq and scATAC-seq datasets.Citation51 Cobolt has been demonstrated on paired datasets such as SNARE-seq, and it can use SNARE-seq data as leverage to integrate single-modality datasets. However, because Cobolt is designed for scRNA-seq and scATAC-seq integration, it is not suited to other modalities.
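
The core idea of representing cells from one experiment in terms of bridge cells can be illustrated with a deliberately simplified sketch: each single-modality query cell is written as a ridge-regularized linear combination of bridge cells in the modality they share, and the same combination is then applied to the bridge cells' other modality. This is an assumption-laden stand-in, not the published bridge integration procedure, which involves additional dimensionality-reduction steps not shown here; the linear simulated data, the ridge penalty, and the function names are hypothetical.

```python
import numpy as np


def dictionary_coefficients(query, bridge_mod, lam=1e-2):
    """Ridge coefficients C minimizing ||query - C @ bridge_mod||^2 + lam * ||C||^2.

    query: (n_query, n_features) cells measured in one modality (e.g. scATAC-seq).
    bridge_mod: (n_bridge, n_features) the same modality measured on bridge cells.
    Returns C of shape (n_query, n_bridge): each query cell as a mix of bridge cells.
    """
    gram = bridge_mod @ bridge_mod.T + lam * np.eye(bridge_mod.shape[0])
    return np.linalg.solve(gram, bridge_mod @ query.T).T


def transfer_through_bridge(query, bridge_mod, bridge_other, lam=1e-2):
    """Express query cells in the bridge's other modality (e.g. RNA) via shared coefficients."""
    return dictionary_coefficients(query, bridge_mod, lam) @ bridge_other


# Hypothetical data: both modalities are linear read-outs of a 3-D latent cell state.
rng = np.random.default_rng(2)
w_atac, w_rna = rng.normal(size=(3, 50)), rng.normal(size=(3, 40))
z_bridge, z_query = rng.normal(size=(30, 3)), rng.normal(size=(5, 3))
bridge_atac, bridge_rna = z_bridge @ w_atac, z_bridge @ w_rna
query_atac = z_query @ w_atac  # scATAC-only cells from a separate experiment
pred_rna = transfer_through_bridge(query_atac, bridge_atac, bridge_rna, lam=1e-6)
print(np.allclose(pred_rna, z_query @ w_rna, atol=1e-4))  # True
```

Once the scATAC-only cells are expressed in RNA space through the bridge, they can be compared directly with scRNA-seq cells from yet another experiment.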

Spatial transcriptomic and proteomic platforms are unique parallel profiling technologies that provide spatially resolved expression data. Unlike the previously discussed technologies such as SHARE-seq or CITE-seq, spatial transcriptomic technologies do not require correction and integration of the spatial and expression modalities. However, the lack of single-cell resolution in many of these platforms motivated the development of deconvolution methods such as Cell2Location and RCTD.Citation52,Citation53 Both methods use biological priors, i.e., reference transcriptomic datasets, to deconvolve each captured region by estimating its cell-type composition. For example, Cell2Location applied to human gut tissue enabled the mapping of rare cell types such as FCRL4+ memory B cells. Spatial proteomics datasets such as CODEX can also be annotated using methods like STELLAR.Citation54 STELLAR constructs cell graphs from the reference and experimental spatial datasets and employs a graph neural network to capture cell neighborhoods, enabling the identification of differential and potentially novel cell types across tissues, donors, or biological conditions.
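
Cell2Location and RCTD fit probabilistic count models, but the basic idea of reference-based deconvolution can be sketched with a simpler substitute: non-negative least squares (NNLS), which explains a spot's expression as a non-negative mix of reference cell-type signatures. The sketch below is that simplified stand-in, solved by projected gradient descent and assuming a noise-free spot and a hypothetical random reference; it is not either published method.

```python
import numpy as np


def nnls_projected_gradient(S, y, n_iter=5000):
    """Minimize 0.5 * ||S @ w - y||^2 subject to w >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(S, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ w - y)
        w = np.maximum(w - step * grad, 0.0)  # gradient step, then project onto w >= 0
    return w


def deconvolve_spot(signatures, spot_expr):
    """Estimate cell-type proportions in one spot.

    signatures: (n_genes, n_cell_types) mean expression of each type in a reference atlas.
    spot_expr: (n_genes,) expression measured in the spot.
    """
    w = nnls_projected_gradient(signatures, spot_expr)
    return w / w.sum() if w.sum() > 0 else w


# Hypothetical reference of 4 cell types and a noise-free spot made of 10 cells.
rng = np.random.default_rng(3)
signatures = rng.uniform(0.0, 1.0, size=(100, 4))
true_props = np.array([0.5, 0.3, 0.2, 0.0])
spot = signatures @ (10 * true_props)
print(np.round(deconvolve_spot(signatures, spot), 3))
```

The published methods add what this sketch omits, notably count noise models and technical effects between the reference and the spatial platform, which matter for real, noisy spots.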

While multi-omic integration from a computational and statistical perspective primarily focuses on finding a shared subspace in which to embed and cross-predict across modalities, integration in large profiling and characterization studies can also be interpreted as pooling data across modalities from relevant experiments. Such analyses may or may not look for a shared representation, but they still integrate the profiles generated by these modalities. Integration of vaccine-induced cellular and humoral responses has been pursued using purely supervised approaches, including partial least squares, regularized regression (e.g., LASSO and Elastic Net), and tree-based approaches (e.g., random forest).Citation11,Citation12,Citation32,Citation55–62 These studies typically encompass deep (i.e., high-dimensional) profiles for one or more vaccination strategies and aim to elucidate multi-scale, multi-omic correlates of endpoints. These correlates include cellular and humoral signatures at different timepoints (e.g., before and after vaccination) that relate to endpoints of interest (e.g., immunogenicity and protection). Such multi-omic studies demonstrate the power of a systems view for deepening our understanding of immune responses to vaccination. Low-rank approximations (projection via partial least squares into a low-dimensional space), feature selection through L1 (LASSO) or L1+L2 (Elastic Net) regularization, and bootstrap aggregation (random forest) help navigate the high-dimensional space and converge on immune correlates and biomarkers of interest. However, most of these approaches focus on predictive markers; some recent techniques, such as Essential Regression, aim to move beyond prediction to inference of putative causal relationships.Citation63
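
Of the approaches above, L1 and L1+L2 regularized regression is the easiest to sketch. A minimal cyclic coordinate-descent implementation follows, assuming standardized features and a centered outcome. The objective uses the same (alpha, l1_ratio) parameterization as scikit-learn's ElasticNet, but the code is an illustrative sketch, and the simulated matrix stands in for a hypothetical multi-omic feature table; in practice one would use an established library and choose alpha by cross-validation.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net_cd(X, y, alpha, l1_ratio=1.0, n_iter=100):
    """Cyclic coordinate descent for
        (1 / 2n) ||y - X b||^2 + alpha * l1_ratio * ||b||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||b||_2^2.
    l1_ratio=1 gives the LASSO. Assumes standardized columns of X and centered y.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]  # partial residual without feature j
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1.0 - l1_ratio))
            resid -= X[:, j] * b[j]  # restore residual with the updated b[j]
    return b


# Hypothetical high-dimensional data: 100 samples, 200 features, 5 true correlates.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 200))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta = np.zeros(200)
beta[:5] = 3.0
y = X @ beta + rng.normal(scale=0.5, size=100)
y = y - y.mean()
coef = elastic_net_cd(X, y, alpha=0.3)
print(sorted(np.argsort(-np.abs(coef))[:5].tolist()))  # [0, 1, 2, 3, 4]
```

The soft-thresholding step is what sets most coefficients exactly to zero, which is why the L1 penalty doubles as feature selection when features far outnumber samples.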

Recently, we introduced a novel approach called SLIDE, aimed at solving the significant statistical challenges outlined earlier (https://www.biorxiv.org/content/10.1101/2022.11.25.518001v1). SLIDE is a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding the identifiability of the latent factors and the corresponding inference, and has rigorous FDR control. In terms of prediction, SLIDE outperforms or performs at least as well as a wide range of state-of-the-art approaches, including other latent factor approaches. More importantly, it provides biological inference beyond prediction that other methods do not afford.

Machine learning approaches that leverage biological priors

In order to mount an effective response, the immune system has to trigger signaling cascades consisting of intricate networks of genes and proteins that ensure robust cross-talk between its diverse cell types. Inherently complex network organizations are at play to combat external threats, regulate internal processes to maintain homeostasis, and prevent disease when homeostasis is disrupted. Many approaches in systems immunology leverage these complex priors, represented as networks at different scales, to probe individual and system-wide components and study immune regulation and dysregulation in infectious disease, chronic graft rejection, autoimmunity, and various cancers. These approaches rely strongly on biological priors. Integrating different modalities condenses prior knowledge into an interpretable data structure, such as a graph, from which a meaningful network can be constructed for a given biological system. Therefore, to successfully design experiments aimed at studying system-wide perturbations and organization of the immune system, knowledge of different biological networks, along with the tools required to analyze them, is of paramount importance.

Figure 3. The top panel illustrates the application of methods like network propagation on biological networks guided by priors like expression and epigenetic information to uncover disease/trait relevant subnetworks or modules. The bottom panel depicts the recent use of generative deep learning models to integrate different biological networks in a transformed (reduced dimension) space.

An important biological network is the protein interactome, the network of physically interacting proteins.Citation64 The protein–protein interaction (PPI) network is scale-free, meaning that proteins with few interactions greatly outnumber highly connected proteins.Citation65 Scale-free networks are robust to random errors: in the PPI network, only the removal of well-connected proteins led to a collapse of the underlying network topology.Citation66 The high-degree proteins are generally essential for maintaining organismal fitness, so the computational model of the proteome reflects biological “safety nets” that protect the regulation of essential processes from random mutations. Widespread efforts have been made to uncover local subnetworks, or modules, that define a disease system or a biological process. Early efforts integrated gene expression data with PPI networks to uncover relevant subnetworks that significantly explain changes in gene expression.Citation67 More recent methods innovatively use a class of network propagation algorithms that diffuse information from input data across connected nodes in a given network.
These approaches can leverage node-specific priors/signals and the local topology of the interactions among the encoded proteins to identify high-scoring subnetworks.Citation68 Network approaches have also been used in conjunction with gene expression data-identified molecular networks underlying rejection in pediatric liver transplant.Citation69 Similar modules have been discovered in different autoimmune diseases using a trait-based network propagation approach where six gene modules were found to be enriched with at least two groups of traits associated with inflammatory bowel disease (IBD), multiple sclerosis, and systemic lupus.Citation70 A concerted effort has been made to generate PPI networks between host and different viruses (Influenza, HIV, SARS-CoV-2, etc.), in order to characterize host factors essential in the successful transmission of the virus, to discover a new range of therapeutics targeted at disrupting essential processes in the viral life cycle and to better understand host-side dynamics resulting in a weak or robust response.Citation71–73
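Network propagation can be illustrated with a random walk with restart, one common member of this algorithm class. The sketch below is a minimal illustration, not the method of the cited studies: it assumes an undirected toy PPI network, a hypothetical seed vector (standing in for, e.g., absolute differential-expression scores) and an arbitrarily chosen restart probability of 0.5. At each step a walker either moves to a random neighbor or jumps back to the seeds, so the stationary scores are highest for proteins that lie close to the seeds in the network.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Diffuse node-level prior scores over an undirected network.

    adj     : (n, n) symmetric, non-negative adjacency matrix.
    seeds   : (n,) non-negative prior scores (hypothetical, e.g. |differential expression|).
    restart : probability of jumping back to the seed distribution at each step.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=0)
    W = adj / np.where(degree == 0, 1.0, degree)  # column-normalised transition matrix
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()                            # seeds as a probability distribution
    p = p0.copy()
    for _ in range(max_iter):
        # Either follow an edge (prob. 1 - restart) or restart at the seeds.
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical toy network: triangles {A, B, C} and {D, E, F} joined by the edge C-D.
names = ["A", "B", "C", "D", "E", "F"]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
seeds = np.array([1.0, 0, 0, 0, 0, 0])  # only protein A carries a prior signal
scores = random_walk_with_restart(adj, seeds, restart=0.5)
for name, s in sorted(zip(names, scores), key=lambda t: -t[1]):
    print(f"{name}: {s:.3f}")
```

Ranking proteins by these propagated scores is the first step toward extracting high-scoring subnetworks; the toy scores themselves carry no biological meaning.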

Gene regulatory networks (GRNs) provide insights into the biological circuitry involved in disease and in development or differentiation processes. The relationships in these networks are directed and indicate how different regulatory elements control the expression of genes at different timepoints in response to different stimuli. Inferring such networks is especially important for immune cells, which respond differentially to different stimuli by undergoing extensive epigenetic and transcriptomic remodeling.Citation74 To understand and model these processes, it is important to characterize the signaling networks that drive them. Unlike PPI networks, GRNs are local and context-specific, and they are inferred from the epigenetic and/or transcriptomic profiles of particular cell populations. Effective data integration has been employed to infer these networks. Methods like SCENIC use single-cell RNA sequencing data to discover co-expressed pairs of transcription factors (TFs) and genes, and then integrate cis-regulatory motif information to infer a gene regulatory network.Citation75 Similarly, EpiTensor integrates RNA-seq, histone modification, and chromatin accessibility data to infer three-dimensional contact maps.Citation76 The concept of network propagation has also been applied in this context: Taiji runs PageRank, a network propagation algorithm, on a gene regulatory network inferred from CUT&Tag, ATAC-seq, and RNA-seq data to rank the TFs predicted to be most relevant in a given context.Citation77 GRNs have been used to study the lineages and trajectories of both innate and adaptive immune cells.Citation78,Citation79 The study of B cell GRNs has advanced our understanding of how B cells diversify their immunoglobulin (Ig) repertoire via somatic hypermutation and/or class switch DNA recombination before differentiating into antibody-secreting plasma cells.
GRNs for different T cell subsets have elucidated subset-specific roles in host defense and in autoimmune and inflammatory disorders.
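The PageRank idea behind TF ranking can be sketched as personalized PageRank on a small directed GRN. This is a hypothetical illustration rather than Taiji's implementation: the network, the node weights (standing in for expression) and the damping factor of 0.85 are assumptions, and the edges are reversed here (target to TF) so that a TF accumulates importance from the targets it regulates.

```python
import numpy as np

def personalized_pagerank(edges, n, personalization, damping=0.85, tol=1e-12, max_iter=1000):
    """Personalised PageRank on a weighted, directed graph given as (src, dst, weight) edges."""
    M = np.zeros((n, n))
    for src, dst, w in edges:
        M[dst, src] += w                          # column = source, row = destination
    out_weight = M.sum(axis=0)
    dangling = out_weight == 0                    # nodes with no outgoing edges
    M = M / np.where(dangling, 1.0, out_weight)   # column-stochastic transition matrix
    v = np.asarray(personalization, dtype=float)
    v = v / v.sum()
    r = v.copy()
    for _ in range(max_iter):
        # Mass sitting on dangling nodes is redistributed according to the personalisation vector.
        r_new = damping * (M @ r + r[dangling].sum() * v) + (1 - damping) * v
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

# Hypothetical GRN: TF1 regulates g1-g3 and TF2 regulates g4 (edges TF -> target).
names = ["TF1", "TF2", "g1", "g2", "g3", "g4"]
regulation = [(0, 2, 1.0), (0, 3, 1.0), (0, 4, 1.0), (1, 5, 1.0)]
# Assumption: reverse the edges (target -> TF) so TFs collect importance from their targets.
reversed_edges = [(dst, src, w) for src, dst, w in regulation]
expression = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0]      # hypothetical node weights
ranks = personalized_pagerank(reversed_edges, len(names), expression)
tf_order = sorted(["TF1", "TF2"], key=lambda name: -ranks[names.index(name)])
print(tf_order)  # ['TF1', 'TF2']
```

TF1 ranks first because it regulates more, and more highly weighted, targets; with real data the edge weights would come from the regulatory evidence used to build the GRN.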

Because networks depend on large-scale data integration across different data modalities, recent advances in machine learning have been instrumental in driving biological insights by learning "hidden" data representations for inference and prediction. Although network and machine learning approaches are conceptually orthogonal, in specific use cases their interplay extracts the most information from the biological data of interest. One such application is the non-trivial integration of different biological networks. A trivial "join" of networks can dilute the information from different contexts and make the resulting network too noisy. To integrate networks meaningfully and move beyond such a join, methods like BIONIC, a tool for biological network integration, use graph convolutional neural networks to learn latent, neighborhood-based representations of the nodes in each network and then combine these representations in the transformed space.Citation80 BIONIC integrated a PPI network, a co-expression network and a genetic interaction network to predict chemical sensitivities for a wider set of 873 essential genes across a subset of 50 compounds. Of the 156 essential genes experimentally identified as sensitive to the 50 screened compounds, BIONIC successfully predicted 35. Machine learning approaches have also been used to structurally resolve the human–human and human–SARS-CoV-2 protein–protein interaction networks.Citation81,Citation82 Ensemble classifiers built on features such as amino acid biophysical properties, protein structure information, and the strength of coevolution between residues predicted with high confidence whether a residue lies at the interface of a specific PPI. The structurally resolved host–SARS-CoV-2 interactome was instrumental in elucidating the evolutionary advantage SARS-CoV-2 gained through stronger binding to host proteins compared with SARS-CoV-1.
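The neighborhood-based encoding described for BIONIC can be illustrated with a single graph convolution per network followed by a combination step. The sketch below is a simplification, not BIONIC's architecture: it assumes one-hot node features, one untrained graph convolutional layer per network with random weights, and plain averaging as the combination step, whereas BIONIC learns its parameters during training. It shows only the mechanics: each node's embedding mixes information from its neighbors in every input network, and all networks are projected into the same latent space.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 used by graph convolutional networks."""
    a_hat = adj + np.eye(adj.shape[0])            # self-loops: a node keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)

def integrate_networks(networks, dim=3, seed=0):
    """Encode each network separately, then average the node embeddings.

    The weights are random and untrained here (an illustrative assumption);
    a real integration method would learn them.
    """
    rng = np.random.default_rng(seed)
    n = networks[0].shape[0]
    features = np.eye(n)                          # one-hot node identities as input features
    per_network = []
    for adj in networks:
        weights = rng.normal(scale=1.0 / np.sqrt(n), size=(n, dim))
        per_network.append(gcn_layer(adj, features, weights))
    return np.mean(per_network, axis=0)           # naive combination in the shared latent space

# Two hypothetical 5-gene networks defined over the same node set.
ppi = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)
coexpr = np.array([[0, 1, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [0, 0, 0, 1, 1],
                   [0, 0, 1, 0, 1],
                   [0, 0, 1, 1, 0]], dtype=float)
Z = integrate_networks([ppi, coexpr], dim=3)
print(Z.round(3))                                 # one 3-d embedding per gene
```

Averaging is the simplest possible combination; a learned, per-network weighting would let an informative network dominate a noisy one instead of being diluted by it.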

Data audits and evaluation of model performance

The scale and variability of biological data necessitate the machine learning approaches described above. However, the quality of input data and the variability between studies and data modalities can be obstacles to fair model evaluation. A meaningful audit of the dataset can inform the best way to assess model performance. This is also linked to how the train–test split is generated: if the training, test and validation sets contain highly similar data points, the model's performance will be overestimated and its generalizability lost. Because of this leakage of biologically similar data, the performance metrics then reflect the model's ability to memorize specific data points in the training set rather than its ability to make accurate predictions from information learned from important features. For example, a machine learning model trained on protein sequences for a particular functional prediction will have homologous training points with high sequence similarity. An appropriate way to split the training and test sets is to apply a hard threshold on the percentage sequence identity between the two sets. This will still not catch all homologous pairs, however, because some homologous proteins have dissimilar sequences. One solution is to search the test data against the training data with a sensitive profile hidden Markov model comparison tool, and to exclude from the test set any proteins found to be similar to the training data.
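A homology-aware split can be sketched by clustering sequences at an identity threshold and assigning whole clusters to either the training or the test set, so that no test sequence exceeds the threshold against any training sequence. In this minimal sketch, Python's `difflib.SequenceMatcher` ratio is a crude stand-in for sequence identity (a real pipeline would use alignment-based clustering and, as described above, a profile hidden Markov model search), and the 0.4 threshold, the helper names and the toy sequences (built from disjoint amino-acid alphabets so the families are obvious) are all hypothetical.

```python
import random
from difflib import SequenceMatcher

def identity(a, b):
    # Crude similarity proxy in [0, 1]; not a true alignment-based identity.
    return SequenceMatcher(None, a, b).ratio()

def cluster_by_identity(seqs, threshold):
    """Single-linkage clustering: any pair at or above the threshold shares a cluster."""
    parent = list(range(len(seqs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path halving
            i = parent[i]
        return i
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def homology_aware_split(seqs, threshold=0.4, test_fraction=0.25, seed=0):
    """Assign whole clusters to the test set until it holds ~test_fraction of sequences."""
    clusters = cluster_by_identity(seqs, threshold)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(seqs)
    train, test = [], []
    for cluster in clusters:
        (test if len(test) < target else train).extend(cluster)
    return sorted(train), sorted(test)

# Four hypothetical "families", each with two near-identical members.
seqs = ["ACDEFACDEFACDEF", "ACDEFACDEFACDEA",
        "GHIKLGHIKLGHIKL", "GHIKLGHIKLGHIKG",
        "MNPQRMNPQRMNPQR", "MNPQRMNPQRMNPQM",
        "STVWYSTVWYSTVWY", "STVWYSTVWYSTVWS"]
train, test = homology_aware_split(seqs)
print("train:", train, "test:", test)
```

Because clustering is single-linkage, the guarantee holds by construction: every train–test pair falls below the threshold, and near-identical family members never straddle the split.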

Often, experiments are performed to test certain effects in very specific biological populations. This can skew the data so that one group is overrepresented (i.e., class imbalance), which can bias models toward the majority class and inhibit learning of the features that distinguish the groups. Up-sampling the minority class and down-sampling the majority class are often employed to overcome class imbalance. In deep learning models, the loss function can also be weighted to impose higher penalties for misclassifying the underrepresented class. Although predetermined baselines for evaluation metrics help in understanding model performance, the appropriate baseline may fluctuate depending on the inherent structure of the dataset. Therefore, to understand whether a model is making accurate and meaningful predictions, one can permute the dependent variable while keeping the covariance structure of the feature matrix intact. The resulting evaluation metrics in a cross-validation framework set an empirical baseline against which the model's performance can be evaluated. These strategies are crucial for avoiding overfitting, in which a model becomes overly dependent on the structure of the training dataset. There is also a need to be stringent in uncoupling the analysis from confounding variables.
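The permutation baseline can be sketched as follows: cross-validate a classifier on the real labels, then repeat the identical procedure many times with shuffled labels, which leaves every row of the feature matrix (and hence its covariance structure) untouched while breaking any link to the outcome. In this minimal sketch the nearest-centroid classifier, the stratified five-fold split, the simulated 80/20 dataset and the choice of balanced accuracy (mean per-class recall, which does not reward always predicting the majority class) are illustrative assumptions.

```python
import numpy as np

def stratified_folds(y, k, rng):
    """Assign fold labels round-robin within each class so every fold sees both classes."""
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def balanced_accuracy(y_true, y_pred):
    return np.mean([np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)])

def nearest_centroid_cv(X, y, k=5, seed=0):
    """Cross-validated balanced accuracy of a nearest-centroid classifier."""
    rng = np.random.default_rng(seed)
    folds = stratified_folds(y, k, rng)
    y_pred = np.empty_like(y)
    for f in range(k):
        train, test = folds != f, folds == f
        classes = np.unique(y[train])
        centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
        dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        y_pred[test] = classes[dists.argmin(axis=1)]
    return balanced_accuracy(y, y_pred)

def permutation_baseline(X, y, n_perm=200, seed=0):
    """Observed CV score, null scores from label permutations, and an empirical p-value."""
    rng = np.random.default_rng(seed)
    observed = nearest_centroid_cv(X, y, seed=seed)
    null = np.array([nearest_centroid_cv(X, rng.permutation(y), seed=seed)
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Simulated, imbalanced data: 80 vs 20 samples; the minority class is shifted in 2 of 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = np.array([0] * 80 + [1] * 20)
X[y == 1, :2] += 3.0
observed, null, p_value = permutation_baseline(X, y, n_perm=200)
print(f"balanced accuracy {observed:.2f}, null mean {null.mean():.2f}, p = {p_value:.3f}")
```

The empirical p-value counts how often a model fitted to label-permuted data scores at least as well as the real model; if the real score sits inside the null distribution, the model is probably not learning a meaningful signal.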

Clinical and translational applications

One of the ultimate goals of multi-omic integration is to generate translational and clinical insights at the level of individual patients. Current work focuses on uncovering disease-level cellular states through the joint inference of single-cell modalities, where individual variation is largely masked in the aggregate statistics of the experimental group in question. A bottom-up approach from cells to patients may be required for tasks such as predicting the strength and quality of vaccine response based on prior infections. Multi-omic integration enables the identification of functional subtypes within disease states. These disease subtypes, or endotypes, can improve scientific understanding and aid clinical decision-making, but they present statistical challenges: in particular, model overfitting may not be apparent without an adequate sample size and validation in independent cohorts. Virtually all multi-omic studies in the next phase of precision medicine will require the integration of biological priors and orthogonal data types to maintain statistical integrity. In general, approaches that extract priors from publicly available data and synthesize them with single-cell data will vastly improve inferential power; this idea has already been applied to multi-omic integration with "bridge" datasets. Prior knowledge could take the form of pharmacogenetic data used to contextualize scRNA-seq data during clinical trials, or could otherwise inform the drug development pipeline by accounting for individual genetic and molecular factors.

Discussion

Multi-omic technologies have been used for immunoprofiling in a range of contexts, including cell spatial partitioning and deconvolution, cell state discovery and characterization, humoral profiling and beyond. A range of computational approaches have been developed to analyze these datasets. However, there is still ample opportunity for advances on both the technological and computational fronts. Among these emerging frontiers, spatial methods have garnered considerable attention for their potential to unravel the intricate spatial organization of tissues. Despite efforts to improve the resolution and accuracy of cell-type deconvolution for spatial transcriptomic datasets, a bona fide single-cell spatial transcriptomics method remains elusive. Furthermore, while Slide-seq provides near-cellular-resolution measurements, it requires microscopic imaging and sequencing to be performed on different tissues.Citation18 In contrast, 10X Visium allows imaging and sequencing on the same tissue, but at much lower resolution. Also, most of the computational methods discussed here exploit the sample space, rather than the feature space, to find common subspaces between modalities. Methods that better leverage feature-space embeddings may deliver comparable performance while potentially offering more interpretable latent variables. Further, while numerous deep generative integration methods may demonstrate higher accuracy in cell–cell mapping and/or cross-modality data prediction, most neural network architectures lack biological interpretability. Factor analysis methods can suffer from this as well: the relationships within and between latent factors are often unclear and can be challenging to map back to biological pathways and functions.
Methods that go beyond prediction to offer inference could be followed up with downstream bench experiments such as gene knockouts and/or CRISPR editing, thereby expanding the range and depth of potential immunological insights.

Biological networks are powerful and often underutilized priors for multi-omic integration. Different biological networks provide a reliable framework for delineating additive or epistatic effects of the various components in a system. This overcomes a limitation of traditional univariate approaches, which pick up the small effects of individual components whose disruption alone may not even lead to presentation of the target phenotype. The integration of protein interaction networks with functional genomic datasets can lead to more interpretable and translational signatures.Citation69 Studies of how molecular networks are perturbed by missense mutationsCitation29 and population variantsCitation83 illustrate that protein interaction-specific disruptions, which can be immunomodulatory, drive disease pathogenesis and exist within a population. These properties make biological networks valuable hypothesis-generating resources.

However, biological networks rely heavily on data generated from high-throughput experiments and, in certain contexts, show biases toward certain genes or proteins. They also suffer from incompleteness, and networks inferred from transcriptomic and epigenomic data can contain high levels of noisy relationships. Meaningfully integrating different kinds of biological networks can address some of these limitations, and network integration approaches are an important step in this direction. These methods, such as BIONICCitation80 and SIMBACitation84, typically use approaches such as graph convolutional networks or cellular knowledge graphs to integrate multiple data modalities (co-expression, co-accessibility, protein–protein interaction networks, and gene regulatory networks) in a reduced-dimensional space, typically an embedding. The next generation of methods will likely improve not only the inference of networks from multi-modal biological datasets but also the way we integrate different biological networks and knowledge graphs with diverse biological readouts (mRNA expression, protein abundance, epigenetic information, etc.). Integration in a reduced-dimensional space relies on complex neural network architectures that are analogous to "black boxes," so the next generation of network integration methods may focus on making these methods interpretable. Such approaches would provide insight into the biologically important variables that are dominantly encoded in the embeddings, while noise is reduced or removed in the process. Another critical component will be the incorporation of temporal profiling to capture dynamic processes: while current multi-omic profiling is deep, it is often static and unable to fully capture the dynamics of complex biological processes.

Overall, given the explosion of multi-omic data modalities for immunoprofiling, it is critical to comprehensively analyze and integrate these datasets using appropriate computational approaches, including machine learning and network techniques. These approaches hold the key to moving beyond the characterization of correlates and biomarkers toward deeper insights into underlying biological mechanisms.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was partly supported by NIAID DP2 DP2AI164325, NIAID R01AI170108 and NHGRI U01HG012041 to JD.

References

  • Ma S, Zhang B, LaFave LM, Earl AS, Chiang Z, Hu Y, Ding J, Brack A, Kartha VK, Tay T, et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell. 2020;183(4):1103–16.e20. doi:10.1016/j.cell.2020.09.056.
  • Cao J, Cusanovich DA, Ramani V, Aghamirzaie D, Pliner HA, Hill AJ, Daza RM, McFaline-Figueroa JL, Packer JS, Christiansen L, et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science. 2018;361(6409):1380–12. doi:10.1126/science.aau0730.
  • Arunachalam PS, Wimmers F, Mok CKP, Perera RAPM, Scott M, Hagan T, Sigal N, Feng Y, Bristow L, Tak-Yin Tsang O, et al. Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans. Science. 2020;369(6508):1210–20. doi:10.1126/science.abc6261.
  • Kotliarov Y, Sparks R, Martins AJ, Mulè MP, Lu Y, Goswami M, Kardava L, Banchereau R, Pascual V, Biancotto A, et al. Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus. Nat Med. 2020;26(4):618–29. doi:10.1038/s41591-020-0769-8.
  • Arunachalam PS, Charles TP, Joag V, Bollimpelli VS, Scott MKD, Wimmers F, Burton SL, Labranche CC, Petitdemange C, Gangadhara S, et al. T cell-inducing vaccine durably prevents mucosal SHIV infection even with lower neutralizing antibody titers. Nat Med. 2020;26(6):932–40. doi:10.1038/s41591-020-0858-8.
  • Su Y, Chen D, Yuan D, Lausted C, Choi J, Dai CL, Voillet V, Duvvuri VR, Scherler K, Troisch P, et al. Multi-omics resolves a sharp disease-state shift between mild and moderate COVID-19. Cell. 2020;183(6):1479–95.e20. doi:10.1016/j.cell.2020.10.037.
  • Wu SZ, Al-Eryani G, Roden DL, Junankar S, Harvey K, Andersson A, Thennavan A, Wang C, Torpy JR, Bartonicek N, et al. A single-cell and spatially resolved atlas of human breast cancers. Nat Genet. 2021;53(9):1334–47. doi:10.1038/s41588-021-00911-1.
  • Bassez A, Vos H, Van Dyck L, Floris G, Arijs I, Desmedt C, Boeckx B, Vanden Bempt M, Nevelsteen I, Lambein K, et al. A single-cell map of intratumoral changes during anti-PD1 treatment of patients with breast cancer. Nat Med. 2021;27(5):820–32. doi:10.1038/s41591-021-01323-8.
  • Sacco K, Castagnoli R, Vakkilainen S, Liu C, Delmonte OM, Oguz C, Kaplan IM, Alehashemi S, Burbelo PD, Bhuyan F, et al. Immunopathological signatures in multisystem inflammatory syndrome in children and pediatric COVID-19. Nat Med. 2022;28(5):1050–62. doi:10.1038/s41591-022-01724-3.
  • Liu C, Martins AJ, Lau WW, Rachmaninoff N, Chen J, Imberti L, Mostaghimi D, Fink DL, Burbelo PD, Dobbs K, et al. Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19. Cell. 2021;184(7):1836–57.e22. doi:10.1016/j.cell.2021.02.018.
  • Fourati S, Tomalin LE, Mulè MP, Chawla DG, Gerritsen B, Rychkov D, Henrich E, Miller HER, Hagan T, Diray-Arce J, et al. Pan-vaccine analysis reveals innate immune endotypes predictive of antibody responses to vaccination. Nat Immunol. 2022;23(12):1777–87. doi:10.1038/s41590-022-01329-5.
  • Hagan T, Gerritsen B, Tomalin LE, Fourati S, Mulè MP, Chawla DG, Rychkov D, Henrich E, Miller HER, Diray-Arce J, et al. Transcriptional atlas of the human immune response to 13 vaccines reveals a common predictor of vaccine-induced antibody responses. Nat Immunol. 2022;23(12):1788–98. doi:10.1038/s41590-022-01328-6.
  • Ma J, Bai H, Gong T, Mao W, Nie Y, Zhang X, Da Y, Wang X, Qin H, Zeng Q, et al. Novel skewed usage of B-cell receptors in COVID-19 patients with various clinical presentations. Immunol Lett. 2022;249:23–32. doi:10.1016/j.imlet.2022.08.006.
  • Cai X, Li JJ, Liu T, Brian O, Li J. Infectious disease mRNA vaccines and a review on epitope prediction for vaccine design. Brief Funct Genomics. 2021;20(5):289–303. doi:10.1093/bfgp/elab027.
  • Lu L, Ma W, Johnson CH, Khan SA, Irwin ML, Pusztai L. In silico designed mRNA vaccines targeting CA-125 neoantigen in breast and ovarian cancer. Vaccine. 2023;41(12):2073–83. doi:10.1016/j.vaccine.2023.02.048.
  • Mimitou EP, Cheng A, Montalbano A, Hao S, Stoeckius M, Legut M, Roush T, Herrera A, Papalexi E, Ouyang Z, et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat Methods. 2019;16(5):409–12. doi:10.1038/s41592-019-0392-0.
  • Collora JA, Liu R, Pinto-Santini D, Ravindra N, Ganoza C, Lama JR, Alfaro R, Chiarella J, Spudich S, Mounzer K, et al. Single-cell multiomics reveals persistence of HIV-1 in expanded cytotoxic T cell clones. Immunity. 2022;55(6):1013–31.e7. doi:10.1016/j.immuni.2022.03.004.
  • Rodriques SG, Stickels RR, Goeva A, Martin CA, Murray E, Vanderburg CR, Welch J, Chen LM, Chen F, Macosko EZ. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019;363:1463–7. doi:10.1126/science.aaw1219.
  • Chen KH, Boettiger AN, Moffitt JR, Wang S, Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348(6233):aaa6090. doi:10.1126/science.aaa6090.
  • Black S, Phillips D, Hickey JW, Kennedy-Darling J, Venkataraaman VG, Samusik N, Goltsev Y, Schürch CM, Nolan GP. CODEX multiplexed tissue imaging with DNA-conjugated antibodies. Nat Protoc. 2021;16(8):3802–35. doi:10.1038/s41596-021-00556-8.
  • Liu S, Iorgulescu JB, Li S, Borji M, Barrera-Lopez IA, Shanmugam V, Lyu H, Morriss JW, Garcia ZN, Murray E, et al. Spatial maps of T cell receptors and transcriptomes reveal distinct immune niches and interactions in the adaptive immune response. Immunity. 2022;55(10):1940–52.e5. doi:10.1016/j.immuni.2022.09.002.
  • Sudmeier LJ, Hoang KB, Nduom EK, Wieland A, Neill SG, Schniederjan MJ, Ramalingam SS, Olson JJ, Ahmed R, Hudson WH. Distinct phenotypic states and spatial distribution of CD8+ T cell clonotypes in human brain metastases. Cell Rep Med. 2022;3:100620. doi:10.1016/j.xcrm.2022.100620.
  • Nomura F, Tsuchida S, Murata S, Satoh M, Matsushita K. Mass spectrometry-based microbiological testing for blood stream infection. Clin Proteomics. 2020;17(1):14. doi:10.1186/s12014-020-09278-7.
  • Seng P, Drancourt M, Gouriet F, La Scola B, Fournier P-E, Rolain JM, Raoult D. Ongoing revolution in bacteriology: routine identification of bacteria by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Clin Infect Dis. 2009;49(4):543–51. doi:10.1086/600885.
  • Liang B, Zhu Y, Shi W, Ni C, Tan B, Tang S. SARS-CoV-2 spike protein post-translational modification landscape and its impact on protein structure and function via computational prediction. Research. 2023;6:0078. doi:10.34133/research.0078.
  • Noble WS, MacCoss MJ, Bourne PE. Computational and statistical analysis of protein mass spectrometry data. PLoS Comput Biol. 2012;8(1):e1002296. doi:10.1371/journal.pcbi.1002296.
  • Kim CH, Tworoger SS, Stampfer MJ, Dillon ST, Gu X, Sawyer SJ, Chan AT, Libermann TA, Eliassen AH. Stability and reproducibility of proteomic profiles measured with an aptamer-based platform. Sci Rep. 2018;8(1):8382. doi:10.1038/s41598-018-26640-w.
  • Brückner A, Polge C, Lentze N, Auerbach D, Schlattner U. Yeast two-hybrid, a powerful tool for systems biology. Int J Mol Sci. 2009;10(6):2763–88. doi:10.3390/ijms10062763.
  • Sahni N, Yi S, Taipale M, Fuxman Bass JI, Coulombe-Huntington J, Yang F, Peng J, Weile J, Karras GI, Wang Y, et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell. 2015;161(3):647–60. doi:10.1016/j.cell.2015.04.013.
  • Ratnasiri K, Wilk AJ, Lee MJ, Khatri P, Blish CA. Single-cell RNA-seq methods to interrogate virus-host interactions. Semin Immunopathol. 2023;45(1):71–89. doi:10.1007/s00281-022-00972-2.
  • Udani S, Langerman J, Koo D, Baghdasarian S, Cheng B, Kang S, Soemardy C, de Rutte J, Plath K, Di Carlo D. Secretion encoded single-cell sequencing (SEC-seq) uncovers gene expression signatures associated with high VEGF-A secretion in mesenchymal stromal cells. bioRxiv. 2023. doi:10.1101/2023.01.07.523110.
  • Li S, Sullivan NL, Rouphael N, Yu T, Banton S, Maddur MS, McCausland M, Chiu C, Canniff J, Dubey S, et al. Metabolic phenotypes of response to vaccination in humans. Cell. 2017;169(5):862–77.e17. doi:10.1016/j.cell.2017.04.026.
  • Li C, Lee A, Grigoryan L, Arunachalam PS, Scott MKD, Trisal M, Wimmers F, Sanyal M, Weidenbacher PA, Feng Y, et al. Mechanisms of innate and adaptive immunity to the Pfizer-BioNTech BNT162b2 vaccine. Nat Immunol. 2022;23(4):543–55. doi:10.1038/s41590-022-01163-9.
  • Röltgen K, Nielsen SCA, Silva O, Younes SF, Zaslavsky M, Costales C, Yang F, Wirz OF, Solis D, Hoh RA, et al. Immune imprinting, breadth of variant recognition, and germinal center response in human SARS-CoV-2 infection and vaccination. Cell. 2022;185(6):1025–40.e14. doi:10.1016/j.cell.2022.01.018.
  • Arunachalam PS, Feng Y, Ashraf U, Hu M, Walls AC, Edara VV, Zarnitsyna VI, Aye PP, Golden N, Miranda MC, et al. Durable protection against the SARS-CoV-2 Omicron variant is induced by an adjuvanted subunit vaccine. Sci Transl Med. 2022;14:eabq4130. doi:10.1126/scitranslmed.abq4130.
  • Arunachalam PS, Scott MKD, Hagan T, Li C, Feng Y, Wimmers F, Grigoryan L, Trisal M, Edara VV, Lai L, et al. Systems vaccinology of the BNT162b2 mRNA vaccine in humans. Nature. 2021;596:410–6. doi:10.1038/s41586-021-03791-x.
  • Zhang S, Cooper-Knock J, Weimer AK, Shi M, Kozhaya L, Unutmaz D, Harvey C, Julian TH, Furini S, Frullanti E, et al. Multiomic analysis reveals cell-type-specific molecular determinants of COVID-19 severity. Cell Syst. 2022;13:598–614.e6. doi:10.1016/j.cels.2022.05.007.
  • Su Y, Yuan D, Chen DG, Ng RH, Wang K, Choi J, Li S, Hong S, Zhang R, Xie J, et al. Multiple early factors anticipate post-acute COVID-19 sequelae. Cell. 2022;185(5):881–95.e20. doi:10.1016/j.cell.2022.01.014.
  • Zhang A, Stacey HD, D’Agostino MR, Tugg Y, Marzok A, Miller MS. Beyond neutralization: Fc-dependent antibody effector functions in SARS-CoV-2 infection. Nat Rev Immunol. 2023;23:381–96. doi:10.1038/s41577-022-00813-1.
  • Li Q, Liu Q, Huang W, Li X, Wang Y. Current status on the development of pseudoviruses for enveloped viruses. Rev Med Virol. 2018;28. doi:10.1002/rmv.1963.
  • DiLillo DJ, Tan GS, Palese P, Ravetch JV. Broadly neutralizing hemagglutinin stalk-specific antibodies require FcγR interactions for protection against influenza virus in vivo. Nat Med. 2014;20:143–51. doi:10.1038/nm.3443.
  • DiLillo DJ, Palese P, Wilson PC, Ravetch JV. Broadly neutralizing anti-influenza antibodies require Fc receptor engagement for in vivo protection. J Clin Invest. 2016;126(2):605–10. doi:10.1172/JCI84428.
  • Gunn BM, Yu W-H, Karim MM, Brannan JM, Herbert AS, Wec AZ, Halfmann PJ, Fusco ML, Schendel SL, Gangavarapu K, et al. A role for Fc function in therapeutic monoclonal antibody-mediated protection against Ebola virus. Cell Host Microbe. 2018;24(2):221–33.e5. doi:10.1016/j.chom.2018.07.009.
  • Schäfer A, Muecksch F, Lorenzi JCC, Leist SR, Cipolla M, Bournazos S, Schmidt F, Maison RM, Gazumyan A, Martinez DR, et al. Antibody potency, effector function, and combinations in protection and therapy for SARS-CoV-2 infection in vivo. J Exp Med. 2021;218. doi:10.1084/jem.20201993.
  • Welch JD, Kozareva V, Ferreira A, Vanderburg C, Martin C, Macosko EZ. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell. 2019;177(7):1873–87.e17. doi:10.1016/j.cell.2019.05.006.
  • Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–902.e21. doi:10.1016/j.cell.2019.05.031.
  • Argelaguet R, Arnol D, Bredikhin D, Deloro Y, Velten B, Marioni JC, Stegle O. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 2020;21(1):111. doi:10.1186/s13059-020-02015-1.
  • Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–87.e29. doi:10.1016/j.cell.2021.04.048.
  • Wu KE, Yost KE, Chang HY, Zou J. BABEL enables cross-modality translation between multiomic profiles at single-cell resolution. Proc Natl Acad Sci USA. 2021;118. doi:10.1073/pnas.2023070118.
  • Hao Y, Stuart T, Kowalski MH, Choudhary S, Hoffman P, Hartman A, Srivastava A, Molla G, Madad S, Fernandez-Granda C, et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol. 2023. doi:10.1038/s41587-023-01767-y.
  • Gong B, Zhou Y, Purdom E. Cobolt: integrative analysis of multimodal single-cell sequencing data. Genome Biol. 2021;22(1):351. doi:10.1186/s13059-021-02556-z.
  • Kleshchevnikov V, Shmatko A, Dann E, Aivazidis A, King HW, Li T, Elmentaite R, Lomakin A, Kedlian V, Gayoso A, et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol. 2022;40(5):661–71. doi:10.1038/s41587-021-01139-4.
  • Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, Irizarry RA. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol. 2022;40(4):517–26. doi:10.1038/s41587-021-00830-w.
  • Brbić M, Cao K, Hickey JW, Tan Y, Snyder MP, Nolan GP, Leskovec J. Annotation of spatially resolved single-cell data with STELLAR. Nat Methods. 2022;19(11):1411–18. doi:10.1038/s41592-022-01651-8.
  • Ackerman ME, Das J, Pittala S, Broge T, Linde C, Suscovich TJ, Brown EP, Bradley T, Natarajan H, Lin S, et al. Route of immunization defines multiple mechanisms of vaccine-mediated protection against SIV. Nat Med. 2018;24(10):1590–8. doi:10.1038/s41591-018-0161-0.
  • Das J, Devadhasan A, Linde C, Broge T, Sassic J, Mangano M, O’Keefe S, Suscovich T, Streeck H, Irrinki A, et al. Mining for humoral correlates of HIV control and latent reservoir size. PLoS Pathog. 2020;16(10):e1008868. doi:10.1371/journal.ppat.1008868.
  • Suscovich TJ, Fallon JK, Das J, Demas AR, Crain J, Linde CH, Michell A, Natarajan H, Arevalo C, Broge T, et al. Mapping functional humoral correlates of protection against malaria challenge following RTS,S/AS01 vaccination. Sci Transl Med. 2020;12. doi:10.1126/scitranslmed.abb4757.
  • Das J, Fallon JK, Yu TC, Michell A, Suscovich TJ, Linde C, Natarajan H, Weiner J, Coccia M, Gregory S, et al. Delayed fractional dosing with RTS,S/AS01 improves humoral immunity to malaria via a balance of polyfunctional NANP6- and Pf16-specific antibodies. Med. 2021;2(11):1269–86.e9. doi:10.1016/j.medj.2021.10.003.
  • Lu LL, Chung AW, Rosebrock TR, Ghebremichael M, Yu WH, Grace PS, Schoen MK, Tafesse F, Martin C, Leung V, et al. A functional role for antibodies in tuberculosis. Cell. 2016;167(2):433–43.e14. doi:10.1016/j.cell.2016.08.072.
  • Lu LL, Das J, Grace PS, Fortune SM, Restrepo BI, Alter G. Antibody Fc glycosylation discriminates between latent and active tuberculosis. J Infect Dis. 2020;222(12):2093–102. doi:10.1093/infdis/jiz643.
  • Jennewein MF, Goldfarb I, Dolatshahi S, Cosgrove C, Noelette FJ, Krykbaeva M, Das J, Sarkar A, Gorman MJ, Fischinger S, et al. Fc glycan-mediated regulation of placental antibody transfer. Cell. 2019;178(1):202–15.e14. doi:10.1016/j.cell.2019.05.044.
  • Kaplonek P, Cizmeci D, Kwatra G, Izu A, Lee J-L, Bertera HL, Fischinger S, Mann C, Amanat F, Wang W, et al. ChAdOx1 nCoV-19 (AZD1222) vaccine-induced Fc receptor binding tracks with differential susceptibility to COVID-19. Nat Immunol. 2023;24:1161–72. doi:10.1038/s41590-023-01513-1.
  • Bing X, Lovelace T, Bunea F, Wegkamp M, Kasturi SP, Singh H, Benos PV, Das J. Essential regression: a generalizable framework for inferring causal latent factors from multi-omic datasets. Patterns (NY). 2022;3:100473. doi:10.1016/j.patter.2022.100473.
  • Rual J-F, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005;437:1173–8. doi:10.1038/nature04209.
  • Yook S-H, Oltvai ZN, Barabási A-L. Functional and topological characterization of protein interaction networks. Proteomics. 2004;4(4):928–42. doi:10.1002/pmic.200300636.
  • Jeong H, Mason SP, Barabási AL, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001;411(6833):41–2. doi:10.1038/35075138.
  • Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002;18(Suppl 1):S233–40. doi:10.1093/bioinformatics/18.suppl_1.S233.
  • Leiserson MDM, Vandin F, Wu H-T, Dobson JR, Eldridge JV, Thomas JL, Papoutsaki A, Kim Y, Niu B, McLellan M, et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet. 2015;47(2):106–14. doi:10.1038/ng.3168.
  • Ningappa M, Rahman SA, Higgs BW, Ashokkumar CS, Sahni N, Sindhi R, Das J. A network-based approach to identify expression modules underlying rejection in pediatric liver transplantation. Cell Rep Med. 2022;3:100605. doi:10.1016/j.xcrm.2022.100605.
  • Barrio-Hernandez I, Schwartzentruber J, Shrivastava A, Del-Toro N, Gonzalez A, Zhang Q, Mountjoy E, Suveges D, Ochoa D, Ghoussaini M, et al. Network expansion of genetic associations defines a pleiotropy map of human cell biology. Nat Genet. 2023;55(3):389–98. doi:10.1038/s41588-023-01327-9.
  • Gordon DE, Jang GM, Bouhaddou M, Xu J, Obernier K, White KM, O’Meara MJ, Rezelj VV, Guo JZ, Swaney DL, et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature. 2020;583:459–68. doi:10.1038/s41586-020-2286-9.
  • Shah PS, Link N, Jang GM, Sharp PP, Zhu T, Swaney DL, Johnson JR, Von Dollen J, Ramage HR, Satkamp L, et al. Comparative flavivirus-host protein interaction mapping reveals mechanisms of dengue and zika virus pathogenesis. Cell. 2018;175(7):1931–45.e18. doi:10.1016/j.cell.2018.11.028.
  • Hiatt J, Hultquist JF, McGregor MJ, Bouhaddou M, Leenay RT, Simons LM, Young JM, Haas P, Roth TL, Tobin V, et al. A functional map of HIV-host interactions in primary human T cells. Nat Commun. 2022;13(1):1752. doi:10.1038/s41467-022-29346-w.
  • Placek K, Schultze JL, Aschenbrenner AC. Epigenetic reprogramming of immune cells in injury, repair, and resolution. J Clin Invest. 2019;129(8):2994–3005. doi:10.1172/JCI124619.
  • Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, Rambow F, Marine J-C, Geurts P, Aerts J, et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods. 2017;14(11):1083–6. doi:10.1038/nmeth.4463.
  • Zhu Y, Chen Z, Zhang K, Wang M, Medovoy D, Whitaker JW, Ding B, Li N, Zheng L, Wang W. Constructing 3D interaction maps from 1D epigenomes. Nat Commun. 2016;7(1):10812. doi:10.1038/ncomms10812.
  • Zhang K, Wang M, Zhao Y, Wang W. Taiji: system-level identification of key transcription factors reveals transcriptional waves in mouse embryonic development. Sci Adv. 2019;5:eaav3262. doi:10.1126/sciadv.aav3262.
  • Sciammas R, Li Y, Warmflash A, Song Y, Dinner AR, Singh H. An incoherent regulatory network architecture that orchestrates B cell diversification in response to antigen signaling. Mol Syst Biol. 2011;7(1):495. doi:10.1038/msb.2011.25.
  • Ciofani M, Madar A, Galan C, Sellars M, Mace K, Pauli F, Agarwal A, Huang W, Parkhurst CN, Muratet M, et al. A validated regulatory network for Th17 cell specification. Cell. 2012;151:289–303. doi:10.1016/j.cell.2012.09.016.
  • Forster DT, Li SC, Yashiroda Y, Yoshimura M, Li Z, Isuhuaylas LAV, Itto-Nakama K, Yamanaka D, Ohya Y, Osada H, et al. BIONIC: biological network integration using convolutions. Nat Methods. 2022;19(10):1250–61. doi:10.1038/s41592-022-01616-x.
  • Meyer MJ, Beltrán JF, Liang S, Fragoza R, Rumack A, Liang J, Wei X, Yu H. Interactome INSIDER: a structural interactome browser for genomic studies. Nat Methods. 2018;15(2):107–14. doi:10.1038/nmeth.4540.
  • Wierbowski SD, Liang S, Liu Y, Chen Y, Gupta S, Andre NM, Lipkin SM, Whittaker GR, Yu H. A 3D structural SARS-CoV-2–human interactome to explore genetic and drug perturbations. Nat Methods. 2021;18(12):1477–88. doi:10.1038/s41592-021-01318-w.
  • Fragoza R, Das J, Wierbowski SD, Liang J, Tran TN, Liang S, Beltran JF, Rivera-Erick CA, Ye K, Wang T-Y, et al. Extensive disruption of protein interactions by genetic variants across the allele frequency spectrum in human populations. Nat Commun. 2019;10(1):4141. doi:10.1038/s41467-019-11959-3.
  • Chen H, Ryu J, Vinyard ME, Lerer A, Pinello L. SIMBA: single-cell embedding along with features. Nat Methods. 2023. doi:10.1038/s41592-023-01899-8.