2,083
Views
1
CrossRef citations to date
0
Altmetric
Technology

Unraveling the copper-death connection: Decoding COVID-19‘s immune landscape through advanced bioinformatics and machine learning approaches

, , , &
Article: 2310359 | Received 01 Sep 2023, Accepted 23 Jan 2024, Published online: 11 Mar 2024

ABSTRACT

This study aims to analyze Coronavirus Disease 2019 (COVID-19)-associated copper-death genes using the Gene Expression Omnibus (GEO) dataset and machine learning, exploring their immune microenvironment correlation and underlying mechanisms. Utilizing GEO, we analyzed the GSE217948 dataset with control samples. Differential expression analysis identified 16 differentially expressed copper-death genes, and Cell type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) quantified immune cell infiltration. Gene classification yielded two copper-death clusters, with Weighted Gene Co-expression Network Analysis (WGCNA) identifying key module genes. Machine learning models (random forest, Support Vector Machine (SVM), Generalized Linear Model (GLM), eXtreme Gradient Boosting (XGBoost)) selected 6 feature genes validated by the GSE213313 dataset. Ferredoxin 1 (FDX1) emerged as the top gene, corroborated by Area Under the Curve (AUC) analysis. Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) revealed enriched pathways in T cell receptor, natural killer cytotoxicity, and Peroxisome Proliferator-Activated Receptor (PPAR). We uncovered differentially expressed copper-death genes and immune infiltration differences, notably CD8 T cells and M0 macrophages. Clustering identified modules with potential implications for COVID-19. Machine learning models effectively predicted COVID-19 risk, with FDX1‘s pivotal role validated. FDX1‘s high expression was associated with immune pathways, suggesting its role in COVID-19 pathogenesis. This comprehensive approach elucidated COVID-19-related copper-death genes, their immune context, and risk prediction potential. FDX1‘s connection to immune pathways offers insights into COVID-19 mechanisms and therapy.

Introduction

With the rapid spread of Coronavirus Disease 2019 (COVID-19, novel coronavirus disease) worldwide, it has become essential to conduct in-depth research on its pathogenesis and related potential risk factors.Citation1 COVID-19 is not only a respiratory disease but also involves multiple biological processes and metabolic pathways, with particular attention to the metabolism related to metal ions. Copper is one of the essential trace elements in the human body, participating in various biological processes, including redox reactions, iron transport, and the formation of connective tissues and myelin sheaths in nerves.Citation2 However, excessive or insufficient copper could harm the human body and be associated with various diseases such as Wilson’s disease and cirrhosis.Citation3–5

In recent years, an increasing number of studies have begun to focus on the relationship between trace elements and immune response. The immune microenvironment is composed of various immune cells, cytokines, and chemical messengers, and it plays a critical role in infection, inflammation, and cancer.Citation6–8 For example, specific cytokines could regulate copper metabolism, while copper ions could influence the activity of leukocytes.Citation9 It suggests the potential interaction between copper and the immune microenvironment.Citation2 And for COVID-19, once the virus invades the human body, it triggers a series of complex immune responses in which copper may play a crucial regulatory role.Citation10

Gene expression technology has been widely employed in previous studies to research and diagnose diseases, including using high-throughput sequencing datasets to explore genes associated with specific diseases.Citation11 The (Gene Expression Omnibus) GEO database provides a wealth of gene expression data, allowing researchers better to understand various biological processes.Citation12 For example, by analyzing these data, researchers could identify the characteristic genes related to specific diseases, their relationships with each other, and their roles in developing the diseases. By utilizing this data and combining it with advanced machine learning techniques, more precise guidance could be provided for diagnosing, predicting, and treating diseases.

In this study, we investigate the relationship between copper death-related genes and the immune microenvironment of COVID-19. We employed bioinformatics methods based on GEO datasets, combined with machine learning techniques, to identify copper death-associated genes related to COVID-19 and the potential interactions between these genes and the immune microenvironment. In addition, we have also assessed the risk of copper mortality to explore its potential role in the development of COVID-19. We hope this study can provide new strategies for diagnosing and treating COVID-19 and contribute to revealing the critical role of copper in human health and disease.

Materials and methods

Accessing transcriptome sequencing data from public databases

I retrieved COVID-19-related high-throughput sequencing datasets from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The selected mRNA expression profiling analysis datasets are GSE217948 and GSE213313, which are from the human platforms GPL24676 and GPL21185, respectively. GSE217948 contains 71 standard control samples and 396 COVID-19 human patient whole blood tissue samples, while GSE213313 contains 11 standard control samples and 83 COVID-19 human patient whole blood tissue samples. As these data were obtained from public databases, no ethical approval or informed consent is required.

Differential analysis of genes related to copper-induced cell death

The latest literature shows 19 CRGs are available.Citation13 Based on the GEO dataset, using a threshold of p < .05, the R programming language’s built-in function Wilcox. The test is utilized to filter out differentially expressed genes. Then use the pheatmap package and ggplot2 package in R to plot the heatmap and boxplot. Our research’s data analysis was conducted under R version 4.2.1 (R Foundation for Statistical Computing).

Immunoinfiltration analysis

We use Cell type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) (https://cibersort.stanford.edu/), a deconvolution algorithm that describes the cellular composition of complex tissues based on gene expression. We performed immune cell infiltration analysis on samples from COVID-19 patients, comparing the disease and regular control groups, using the “e1071” and “preprocessor” packages in R software, combined with the CIBERSORT algorithm. We also quantified the content of immune cells in each sample. The number of simulation calculations is set to n, and the results with p < .05 are filtered and retained. Meanwhile, we used the “corrplot” package in R software to visualize the CIBERSORT calculation results, generating bar plots and correlation plots for immune cells. Afterward, we performed differential analysis on immune cells using the “vioplot” package in the R software. We also conducted correlation tests (Spearman) on immune cells using the “ggplot2,” “ggpubr,” and “ggExtra” packages in the R software, with the criteria of retaining results with p < .05.Citation14,Citation15

Unsupervised clustering of COVID-19 patients

We used unsupervised clustering analysis (ConsensusClusterPlus R package, version 2.60).Citation16 The k-means algorithm with 1000 iterations was used to cluster 396 COVID-19 samples into different clusters. We chose the maximum number of subtypes k as 6 and evaluated the optimal number of clusters based on the cumulative distribution function (CDF) curve, consensus matrix, and consensus clustering score (>0.9).

Gene functional enrichment analysis

Enrichment analysis of differentially expressed genes in the merged dataset was performed using the clusterProfiler package in R. The enriched results of Gene Ontology (GO) terms, including biological process (BP), cellular component (CC), and molecular function (MF), were visualized using the enrichplot package. A bar plot was also generated to display the enrichment results of the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis.Citation17

Weighted gene co-expression network analysis (WGCNA) analysis

Weighted Gene Co-expression Network Analysis (WGCNA) was performed using the WGCNA package in R software to analyze the merged dataset. Firstly, filter the differentially expressed gene (DEG) expression matrix using goodSamplesGenes to remove unqualified genes and construct a scale-free co-expression network. Then, we select an appropriate soft threshold β through the pickSoftThreshold function and perform an adjacency matrix transformation. Furthermore, we calculate the Topological Overlap Matrix (TOM) and construct a hierarchical clustering dendrogram to divide similar gene expressions into different modules, where the minimum number of genes in each module is set to 30. We define a threshold as the height cutoff to merge possible similar modules. Finally, the expression profiles of each module were summarized through the module eigengene (ME), and the correlation between ME and traits was calculated. The most correlated modules were selected for further analysis.Citation18–20

Building predictive models based on multiple machine learning methods

Based on two different types of fractal samples, we constructed machine learning models using the “caret” R package (version 6.0.91), including Random Forest (RF), Support Vector Machine (SVM), Generalized Linear Model (GLM), and Extreme Gradient Boosting (XGB) models. Random Forest is an ensemble machine learning method that uses multiple independent decision trees for classification or regression predictions.Citation21 SVM could generate a hyperplane in the feature space and has the maximum margin to distinguish positive and negative examples.Citation22 GLM is an extension of multiple linear regression models, which allows for flexible evaluation of the relationship between dependent features and categorical or continuous independent features. XGBoost is a gradient-boosted tree model that carefully compares classification error and model complexity. All these machine learning models are used with default parameters and evaluated through 5-fold cross-validation. Use the “DalEX” package (version 2.4.0) to interpret these four machine learning models and visualize the predictive distributions and feature importance of these models in graphical form.

ROC curve analysis

Using the “Proc” R package to visualize the area under the ROC curve to evaluate machine learning models and determine the top five critical variables associated with COVID-19-related essential feature genes. Finally, ROC curve analysis was performed on the GSE213313 dataset to validate the diagnostic value of the diagnostic model.Citation23

Gene set enrichment analysis (GSEA) analysis

Perform GSEA using the enrichR package in R language. We first perform pre-processing and normalization on the gene expression data. We define the top and bottom 25% of samples based on the expression levels as the high expression group and low expression group, respectively. Then, we use the GSEA algorithm to calculate the enrichment scores and p-values for each gene set.Citation24

GSVA analysis

We use the enrichR package for gene set variation analysis (GSVA) in R language. First, we preprocess and standardize gene expression data. Then, based on the expression levels of the target gene, the samples will be divided into a high-expression group and a low-expression group. Calculate enrichment scores and p-values for each gene set using the GSVA algorithm.Citation25

Statistical analysis

This study employed SPSS software (version 21.0, IBM, USA) for statistical analysis. For quantitative data, we express it as the mean ± standard deviation. First, conduct tests for normality and homogeneity of variances to ensure that the data follows a normal distribution and has equal variances. For between-group comparisons, we use independent t-tests. We use one-way ANOVA or repeated measures ANOVA for comparisons between multiple groups. We also use Pearson correlation analysis to observe the correlation between indicators. When the p-value is less than 0.05, it indicates that the difference is statistically significant.

Results

Differential expression of CRGs in COVID-19 patients and their potential influence on immune system dynamics and chromosomal distribution

We investigated high-throughput sequencing data of patients with COVID-19 and control samples retrieved from the GEO database. We selected the GSE217948 dataset, which includes normal control samples, for further analysis (). We have examined 19 previously reported Cuproptosis-related gene (CRGs) in the literature.Citation13 The expression levels in different samples were compared, and genes with a significance level of p < .05 were selected as differentially expressed. There are 16 differentially expressed genes: NFE2L2, NLRP3, ATP7B, ATP7A, Ferredoxin 1 (FDX1), LIAS, LIPT1, LIPT2, DLD, PDHB, MTF1, GLS, CDKN2A, DBT, GCSH, and DLST (). NFE2L2, NLRP3, ATP7B, DLD, MTF1, and CDKN2A are upregulated in patients’ whole blood tissues, while ATP7A, FDX1, LIAS, LIPT1, LIPT2, PDHB, GLS, DBT, GCSH, and DLST are downregulated in patients’ whole blood tissues. In addition, we further investigated the correlation of these CRGs in the expression samples and found that NFE2L2, ATP7A, and FDX1 were correlated with other factors ().

Figure 1. Flowchart outlining the study design for predicting the immune microenvironment characteristics of COVID-19 using a copper death-related risk score.

Figure 1. Flowchart outlining the study design for predicting the immune microenvironment characteristics of COVID-19 using a copper death-related risk score.

Figure 2. Identification of differentially expressed CRGs in COVID-19.

(a) Heatmap showing the expression of 16 differentially expressed CRGs in GSE217948 (Control group, n = 71; Disease group, n = 396). (b) Boxplot displaying the expression of 19 CRGs between COVID-19 and non-COVID-19 control groups (Disease group vs. Control group: p < .05, p < .01, p < .001). (c) Correlation analysis of the 16 differentially expressed CRGs. Red and green represent positive and negative correlations, respectively. The size of the pie chart sector represents correlation coefficients. (d) Bar graph representing the composition of immune cells in COVID-19 and non-COVID-19 control group samples. The x-axis represents the groups, and the y-axis represents the relative abundance of immune cells. Different colors in the bar graph indicate different types of immune cells, with a color legend provided on the right side. (e) Proportional differences in the composition of immune cells between COVID-19 and non-COVID-19 control group samples. Blue represents standard samples, and red represents COVID-19 samples (Disease group vs. Control group: p < .05, ** p < .01, p < .001). (F) Circular plot showing the chromosomal locations of the 19 CRGs. (g) Correlation analysis of the 16 CRGs with infiltrating immune cells. Red and blue represent positive and negative correlations, respectively.
Figure 2. Identification of differentially expressed CRGs in COVID-19.

To investigate the potential variations in the immune system between COVID-19 patients and healthy individuals, we conducted immune cell profiling using the CIBERSORT algorithm. Our analysis aimed to uncover changes in the composition of immune cell populations. The results revealed significant differences in proportions of various immune cell types between the disease group (COVID-19 patients) and the control group (). Compared to the control group, abnormal increases in the recruitment of plasma cells, monocytes, M0 macrophages, and activated mast cells were observed in the disease group, suggesting that immune system alterations may play a significant role in the occurrence of COVID-19. Additionally, the correlation analysis between different immune cell populations and CRGs revealed a significant association between most CRGs and immune cell components. Of particular note, NFE2L2 and MTF1 exhibited the strongest positive correlation with neutrophils, while a significant negative correlation was found between these CRGs and CD8 T cells (). The findings above suggest that CRGs might serve as crucial factors influencing the molecular and immune recruitment status of COVID-19 patients.

Subsequently, we examined the positions of 19 CRGs on human chromosomes (), and the results showed that chromosome 1, chromosome 2, and chromosome 11 contained the most CRGs. Chromosome 1 contains NLRP3, DBT, and MTF1. Chromosome 2 contains LIPT1, NFE2L2, and GLS. Chromosome 11 contains DLAT, FDX1, and LIPT2. In addition, DLAT and FDX1 are the closest in the distance of the chromosome among all the CRGs, and the genes in these adjacent specific regions on these chromosomes may be associated with specific functions or biological processes.

Identification of two distinct copper-related gene expression patterns in COVID-19 samples using cluster analysis

The cumulative distribution function (CDF) curve can identify the appropriate K value by detecting a significant increase in the area under the curve. Setting K value to 2 yields the most stable number of clusters, with minimal fluctuation within the CDF curve range. The consensus index ranges from 0.2 to 0.6 (). As K changes from 2 to 6, the area under the CDF curve shows the difference between two tCDF curves (K and K-1) (). In addition, only when k = 2 the consistency score of each subtype is > 0.9 (). Combining the clustering track chart, after excluding the ineligible samples, we ultimately divided 396 COVID-19 samples into two groups, including Cluster1 (n = 87) and Cluster2 (n = 116) (). The t-distributed Stochastic Neighbor Embedding (t-SNE) analysis showed differences between the two clusters ().

Figure 3. Identification of copper-death-associated molecular clusters in COVID-19.

(a) Consensus clustering matrix for k = 2. (b) Cumulative distribution function (CDF) curve. (c) CDF δ area curve. (d) Consensus clustering scores for various cluster numbers. The x-axis represents the number of clusters, and the y-axis represents the consensus clustering scores for each cluster number. (e) Cluster tracking plot displaying different clusters for each sample, with different colors indicating cluster membership. (f) t-SNE plot visualizing the distribution of two subtypes.
Figure 3. Identification of copper-death-associated molecular clusters in COVID-19.

The above results indicate that the expression patterns related to copper mortality in COVID-19 could be divided into two distinct clustering groups, which is essential for further understanding the pathogenesis of COVID-19 and subsequent research directions.

Differential immune profiles and CRG expression between copper-death clusters in COVID-19 patients

To explore the immune features between clusters, we first comprehensively evaluated the expression differences of 16 CRGs between Cluster1 and Cluster2. Different CRGs expression patterns were observed between the two types of copper death clusters (). Immunoinfiltration analysis results indicate distinct immune microenvironments between copper-apoptotic Cluster1 and Cluster2 (). In addition, Cuprotosis Cluster1 showed high expression levels of FDX1, LIAS, LIPT1, PDHB, GLS, GCSH, and DLST, while Cuproptosis Cluster2 was characterized by enhanced expression of NFE2L2, NLRP3, ATP7B, ATP7A, DLD, and MTF1 (). Cluster1 exhibits a higher proportion of immune cell infiltrates on T cells CD8, T cells CD4 memory resting, NK cells resting, and monocytes. In contrast, Cluster2 shows relatively higher immune cell infiltration on plasma cells, immature T cells CD4, macrophages M0, and neutrophils ().

Figure 4. Identification of immune features between the two copper-death clusters.

(a) Heatmaps display the expression patterns of 16 CRGs between the two copper death clusters. (b) The relative abundance of 22 immune cells between the two copper death clusters. (c) Boxplots show the expression of 16 CRGs between the two copper death clusters (p < .05, p < .01, p < .001). (d) Boxplots show the differences in immune response between the two copper death clusters (p < .05, ** p < .01, p < .001).
Figure 4. Identification of immune features between the two copper-death clusters.

The above results indicate differences in immune characteristics between different clusters of copper deaths in COVID-19.

Identification of key gene modules in COVID-19 using WGCNA: insights into copper-induced cell death mechanisms

To identify key gene modules associated with COVID-19, we employed the WGCNA algorithm to construct co-expression networks and modules for both COVID-19 and non-COVID-19 control groups. The expression variances of each gene in GSE217948 were computed, and the top 25% genes with the highest variances were chosen for further analysis. A soft threshold of β = 10(scale-free R = 0.9) was selected as the threshold to construct the scale-free network, and co-expressed gene modules were identified (). After the hierarchical samples’ hierarchical clustering, 12 co-expression modules with different colors were obtained using the dynamic cut algorithm, and a heatmap of the topological overlap matrix (TOM) was presented (). Subsequently, we analyzed the co-expression similarity and adjacency of these modules composed of genes with clinical features (control group and COVID-19 group). Finally, we found that the green module is most strongly associated with COVID-19, including 318 genes (). The level of overlap between 318 genes in the green module and CRGs was examined, resulting in the identification of 4 overlapping genes (NFE2L2, NLRP3, GCSH, and DLST) ().

Figure 5. WGCNA co-expression analysis of COVID-19 and control group-related genes.

(a) Scale-free fitting index (left) and average connectivity (right) for different soft-thresholding powers β. The red line represents a correlation coefficient of 0.9. (b) Hierarchical clustering dendrogram of co-expression modules, with different colors representing different modules. (c) Hierarchical clustering of feature gene modules, summarizing the modules identified in the clustering analysis. (d) Heatmap showing the correlations between the 12 modules. (e) Heatmap depicting the correlations between COVID-19-related gene modules and COVID-19, with each cell containing the correlation and p-value. (f) Venn diagram showing the overlap between genes in the green module and CRGs.
Figure 5. WGCNA co-expression analysis of COVID-19 and control group-related genes.

In addition, we also utilized the WGCNA algorithm to analyze key gene modules closely associated with copper-induced cell death clusters. We select β = 11(scale-free R = 0.9) as the soft threshold to construct a scale-free network (). The results showed that 13 genes co-expression modules were identified in the immune infiltration analysis. The heatmap illustrates the TOM (Topological Overlap Matrix) of all genes associated with each module (). Through the analysis of the relationship between modules – clinical features (Cluster1 and Cluster2), we found a strong correlation between the brown module (700 genes) and the COVID-19 cluster ().

Figure 6. WGCNA co-expression analysis of Cluster1 and Cluster2-related genes.

(a) Scale-free fitting index (left) and average connectivity (right) for different soft-thresholding powers β. The red line represents a correlation coefficient of 0.9. (b) Hierarchical clustering dendrogram of co-expression modules, with different colors representing different modules. (c) Hierarchical clustering of feature gene modules, summarizing the modules identified in the clustering analysis. (d) Heatmap depicting the correlations between the 12 modules. (e) Heatmap showing the correlations between Cluster1 and Cluster2-related gene modules and COVID-19, with each cell containing the corresponding correlation and p-value.
Figure 6. WGCNA co-expression analysis of Cluster1 and Cluster2-related genes.

The above results indicate that using the WGCNA algorithm analysis, we have identified vital gene modules closely associated with copper death clusters in COVID-19, providing important clues for further investigation into the mechanism of copper death factors in COVID-19.

Comparative analysis of gene modules in COVID-19 reveals distinct functional pathways before and after partitioning

To investigate the differences between crucial module genes before and after partitioning in COVID-19, we separately performed GO functional analysis and KEGG pathway analysis on differentially expressed genes related to the green module before partitioning and the brown module after partitioning. We also analyzed the differences between Cluster 1 and Cluster 2 in the up and downregulated pathways through GSVA analysis.

The functional analysis results of GO indicated that the differentially expressed genes in the green module were mainly enriched in biological processes (BP) such as chromosome segregation, nuclear division, mitotic cell cycle phase transition, and organelle fission. Cellular components (CC) were mainly enriched in nucleosomes, DNA packaging complex, protein-DNA complex, and chromosomal region. As for molecular functions (MF) was mainly enriched in a structural constituent of chromatin and protein heterodimerization activity (). The brown module-related differential genes are mainly enriched in biological processes (BP) such as leukocyte activation involved in immune response, cell activation involved in immune response, mononuclear cell differentiation, etc. Cellular components (CC) are mainly enriched in secretory granule membranes, specific granules, and tertiary granules. Molecular functions (MF) are mainly enriched in NAD+ nucleosidase activity, immune receptor activity, etc. ().

Figure 7. Functional and pathway analysis of essential module genes.

(a) GO functional analysis of the green module’s related differentially expressed genes in biological processes (BP), cellular components (CC), and molecular functions (MF). (b) GO functional analysis of the brown module’s related differentially expressed genes in biological processes (BP), cellular components (CC), and molecular functions (MF). (c) KEGG pathway enrichment analysis of differentially expressed genes in the green module. (d) KEGG pathway enrichment analysis of differentially expressed genes in the brown module. (e) GSVA pathway analysis of Cluster1 and Cluster2. (f) Venn diagram showing the intersection between death-related genes, differentially expressed genes, green module-related genes, and brown module-related genes.
Figure 7. Functional and pathway analysis of essential module genes.

KEGG pathway analysis results showed that the differentially expressed genes in the green module were mainly enriched in Systemic lupus erythematosus, Neutrophil extracellular trap formation, Alcoholism, and other terms (), while the differentially expressed genes in the brown module were mainly enriched in Lipoic acid metabolism, Osteoclast differentiation, Leishmaniasis, TNF signaling pathway, and other terms().

The GSVA pathway analysis showed that Cluster1 and Cluster2 were mainly enriched in upregulated differential pathways such as allograft rejection, T cell receptor signaling pathway, etc. Meanwhile, the downregulated differential pathways were mainly enriched in synthesizing glycolipids, NOD-like receptor signaling pathways, and so on ().

To facilitate subsequent analysis, we took the intersection of copper-related genes, differential genes, green module-related genes, and brown module-related genes to obtain 13 characteristic genes (NFE2L2, NLRP3, ATP7A, SLC31A1, FDX1, LIAS, DLD, DLAT, PDHA1, PDHB, MTF1, GLS, DLST) involved in COVID-19 ().

The above results indicate differences in the functions and pathways of essential module genes before and after partitioning in COVID-19. Differentially expressed genes related to the pre-fracture green module are mainly enriched in biological processes such as chromosome segregation, nuclear division, mitotic cell cycle phase transition, and organelle division. After subtyping, the differentially expressed genes related to the brown module are mainly enriched in processes involving immune response, leukocyte activation, cell activation, and monocyte differentiation.

Machine learning models identify copper-death-related feature genes with high diagnostic value for COVID-19

To further identify subtype-specific genes with high diagnostic value, we established four validated machine learning models based on the expression profiles of DEGs (differentially expressed genes) in GSE217948: Random Forest (RF) model, Support Vector Machine (SVM) model, Generalized Linear Model (GLM), and Extreme Gradient Boosting (XGB) model. Use the “DALEX” package to interpret four models and plot the residual distributions of each model in the test set. First, each model’s top 10 essential feature variables are ranked based on the root mean square error (RMSE) (). SVM, GLM, and XGB machine learning models exhibited relatively low residuals (). Subsequently, we validated the model results based on five-fold cross-validation and evaluated the discriminative performance of four machine learning algorithms on the test set using the ROC curve. The Area Under the Curve (AUC) of the four machine learning models is comparable (GLM, AUC = 0.925; SVM, AUC = 0.987; RF, AUC = 0.977; XGB, AUC = 0.981; ). Therefore, we chose the intersection of each model’s top 10 essential feature variables as the feature genes for further analysis ().

Figure 8. Construction and evaluation of RF, SVM, GLM, and XGB machine learning models.

(a) The features of RF, SVM, GLM, and XGB machine learning models are essential. (b) Cumulative residual distribution for each machine learning model. (c) Boxplot displaying the residuals for each machine learning model, with red dots indicating root mean square errors (RMSE). (d) ROC analysis of the four machine learning models based on the testing set. (e) Venn diagram showing the intersection of essential feature variables for the four machine learning models. (F) ROC analysis of the six feature genes based on the testing set. (g) ROC analysis of the machine learning models based on the external testing set.
Figure 8. Construction and evaluation of RF, SVM, GLM, and XGB machine learning models.

To further evaluate the predictive performance of the four machine learning models, we conducted cross-validation using five-fold cross-validation on the feature genes. Among the six candidate genes, FDX1 showed the highest area under the ROC curve (AUC = 0.848, ). Subsequently, we validated the prediction model using the external dataset GSE213313, including samples from normal controls and COVID-19 patients. The ROC curve demonstrated satisfactory performance of the prediction model for six feature genes (AUC = 0.944, ).

The above results indicate that by establishing machine learning models and selecting copper-death-related feature genes, we have successfully identified feature genes with high diagnostic values associated with copper death.

Correlation analysis of copper-death-related genes and immune cell infiltration in COVID-19 patients

We conducted Spearman correlation analysis to explore the relationship between the expression of candidate genes and immune cell infiltration, aiming to understand better the functional roles of these central copper death-related genes in immune infiltration. The results indicate a strong correlation between FDX1, DLST, GLS, SLC31A1 and various immune cells (). Specifically, neutrophils showed the most positive correlation with FDX1 expression, while monocytes showed the most negative correlation with FDX1 expression (Figure S1a, p < .05). T cells CD8 were positively correlated with DLST expression, while neutrophils were negatively correlated with DLST expression (Figure S1b, p < .05). CD4 memory quiescent T cells were positively correlated with GLS expression, while macrophages M0 were negatively correlated with GLS expression (Figure S1c, p < .05). Macrophages M0 were positively correlated with MTF1 expression, while CD4 memory quiescent T cells were negatively correlated with MTF1 expression (Figure S1d, p < .05). Neutrophils were positively correlated with SLC31A1 expression, while T cells CD8 were negatively correlated with SLC31A1 expression (Figure S1e, p < .05). Neutrophils were positively correlated with NFE2L2 expression, while T cells CD8 were negatively correlated with NFE2L2 expression (Figure S1f, p < .05).

Figure 9. Correlation between FDX1, NFE2L2, SLC31A1, DLST, MTF1, GLS, and immune infiltrating cells.

(a–f) Lollipop plots depict the correlation between model feature genes’ expression levels and immune cells.
Figure 9. Correlation between FDX1, NFE2L2, SLC31A1, DLST, MTF1, GLS, and immune infiltrating cells.

The above results indicate that feature genes such as FDX1 may be involved in the immune cell infiltration of the blood immune environment in COVID-19 patients, thus affecting the progression of the disease.

Enrichment analysis of characteristic genes reveals key immune signaling pathways in COVID-19 progression

To further understand the enrichment pathways related to the characteristic genes, we conducted single-gene GSEA analysis and GSVA analysis. We sorted the target genes from high to low according to their expression level differences and defined the top and bottom 25% of samples based on the expression levels of the target genes as the high- and low-expression groups. We assess the enrichment level of a target gene set by the cumulative scores in the ranked list.

The GSEA analysis results showed that in the high and low expression groups of feature gene samples, there was enrichment in signaling pathways such as T cell receptor signaling, natural killer cell-mediated cytotoxicity, Peroxisome Proliferator-Activated Receptor (PPAR), etc. (enrichment score > 0.5, p < .001, ). The T cell receptor signaling pathway is a critical step for T cells to recognize and respond to external antigens, and it plays a crucial role in maintaining immune tolerance, forming immune memory, and other critical immune mechanisms. It plays a vital role in the immune system.Citation26 Moreover, previous studies have demonstrated a correlation between TCR, T cells, and the adaptive immune response to COVID-19.Citation27,Citation28 Natural killer cells (NK cells) are a type of cell with cytotoxic activity that could directly kill infected tumor cells or virus-infected cells. They are crucial in antiviral defense and immune regulation.Citation29 The significance of the PPAR (Peroxisome Proliferator-Activated Receptor) signaling pathway is vast, involving various physiological processes such as lipid metabolism, pathway regulation, cell proliferation, and urinary system functions.Citation30

Figure 10. GSEA analysis of FDX1, NFE2L2, SLC31A1, DLST, MTF1, GLS.

(a–f) GSEA analysis of model feature genes.
Figure 10. GSEA analysis of FDX1, NFE2L2, SLC31A1, DLST, MTF1, GLS.

The GSVA analysis results confirmed the enrichment (enrichment_Score > 0.5, p < .001, ) of signaling pathways, including T cell receptor signaling, natural killer cell-mediated cytotoxicity, PPAR, etc.

Figure 11. GSVA analysis of FDX1, NFE2L2, SLC31A1, DLST, MTF1, GLS.

(a–f) GSVA analysis of model feature genes.
Figure 11. GSVA analysis of FDX1, NFE2L2, SLC31A1, DLST, MTF1, GLS.

The above results indicate an enrichment phenomenon between high and low expression groups of characteristic genes, especially in T cell receptor signaling, natural killer cell-mediated cytotoxicity, and PPAR signaling pathways. These signaling pathways play an essential role in the immune system and are involved in crucial immune mechanisms such as immune tolerance, formation of immune memory, and antiviral capability. These findings further validate the close association between genetic markers and the immune system, providing new insights into the functions of genetic markers and their role in disease progression.

Discussion

Based on the above results, we could preliminarily draw the following conclusions: Feature genes such as FDX1 could affect the immune microenvironment of COVID-19 patients through T cell receptor signaling, natural killer cell-mediated cytotoxicity, PPAR signaling pathway, resulting in increased proliferation and enhanced cytotoxicity of T cells, macrophages, and natural killer cells, thereby suppressing immune evasion of the COVID-19 virus ().

Figure 12. Molecular mechanism diagram predicting the immune microenvironment characteristics of COVID-19 using a copper death-related risk score.

Figure 12. Molecular mechanism diagram predicting the immune microenvironment characteristics of COVID-19 using a copper death-related risk score.

This study clearly revealed the connection between copper death-related genes and COVID-19. Through the sequencing data from the GEO database, we identified 19 copper death-related genes, among which 16 showed differences. These data suggest that copper death-related genes may be crucial in developing COVID-19. Previous studies have also focused on the relationship between trace elements and viral infections, but our research is groundbreaking regarding the specific correlation between copper death-related genes and COVID-19.Citation31,Citation32

Our data shows that T cell CD8 and macrophage M0 expression differences are the most between tumor and control samples and are correlated with COVID-19. It means that these immune cells may be the primary component infiltrating the blood environment of COVID-19 patients, thereby affecting the progression of COVID-19. Compared to previous studies, we not only focus on the overall state of the immune microenvironment but also delve into the specific influence of different types of immune cells on the disease condition, providing new research directions for future studies. In analyzing the COVID-19 samples based on copper death-related genes, we found that the 396 COVID-19 samples could be divided into two clusters related to copper death. This discovery gives us a whole new perspective on the disease classification of COVID-19 patients. Through this stratification, we can develop more personalized treatment plans for patients in different clusters in the future.

We applied four machine learning models and successfully identified six feature genes associated with the risk of disease occurrence. Compared to traditional research methods, machine learning technology provides us with an efficient and accurate tool for prediction.Citation33–35 Compared with previous studies, we not only focused on the selection of characteristic genes but also validated the role of these genes in the immune microenvironment. In our research, the feature gene FDX1 has shown its potential impact on the development of COVID-19. GSEA and GSVA analyses further revealed the involvement of feature genes such as FDX1 in T cell receptor signaling, natural killer cell-mediated cytotoxicity, PPAR signaling pathways, and more. This study provides us with new clues to understand how FDX1 is involved in the development of COVID-19.

The study identified copper death-related genes associated with COVID-19 and explored the interplay between these genes and the immune microenvironment. For example, the role of specific feature genes such as FDX1 mentioned in the study in T cell receptor signaling, natural killer cell-mediated cytotoxicity, and PPAR signaling pathways provides us with a new perspective for understanding the progression and pathological mechanisms of COVID-19. Understanding how these genetic characteristics impact the immune microenvironment and the progression of COVID-19 could provide new directions and targets for future COVID-19 treatment strategies. By utilizing four machine learning models, researchers have successfully predicted the risk of COVID-19 occurrence, providing a tool for early diagnosis and prevention, which holds implications for public health strategies.Citation36

The research is mainly based on the dataset from the GEO database, which may differ from the actual diversity of cases. Although the association between copper death-related genes and COVID-19 has been confirmed, it is only a correlation, and the causal relationship is unclear. Although the model has been validated on the GSE213313 dataset, further validation is still needed in a broader range of samples and various contexts. Due to time and budget constraints, the conclusions of this study are based solely on the analysis of bioinformatics techniques and have yet to be validated through animal experimentation. In the future, we will conduct further experiments on extracellular and intracellular models to determine the exact causal relationship between copper death-related genes and COVID-19. To increase the generalizability of the research, multiple center data could be utilized to validate and refine machine learning models. Understanding how these genes affect the development of COVID-19 could provide direction for drug development and treatment strategies. Considering the potential variation of the COVID-19 virus and its impact on copper-related genes associated with mortality, machine learning models need to be continuously updated and improved to ensure their effectiveness in future predictions.

This study provides a fresh perspective on understanding the relationship between COVID-19 and genes associated with copper death and reveals potential therapeutic strategies. However, this is only preliminary; further research is needed to confirm these findings and apply them to clinical practice.

Authors’ contributions

Qi Wan conceived and designed paper. Zhenzhong Su collected and analyzed data. Jing Zhang prepared figures. He Yan drafted paper. Jie Zhang edited and revised manuscript. All authors read and approved final version of manuscript.

Ethics approval and consent to participate

An ethics statement was not required for this study type, no human or animal subjects or materials were used.

Supplemental material

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that supports the findings of this study are available on request from the corresponding author upon reasonable request.

Supplementary material

Supplemental data for this article can be accessed on the publisher’s website at https://doi.org/10.1080/21645515.2024.2310359.

Additional information

Funding

The work was supported by the Key Project of Science and Technology Committee of Jilin [20220803].

References

  • Diagne ML, Rwezaura H, Tchoumi SY, Tchuenche JM. A mathematical model of COVID-19 with vaccination and treatment. Comput Math Method M. 2021 Sep 4;2021:1–17. doi:10.1155/2021/1250129.
  • Jiang Y, Huo Z, Qi X, Zuo T, Wu Z. Copper-induced tumor cell death mechanisms and antitumor theragnostic applications of copper complexes. Nanomedicine (Lond). 2022;17(5):303–324. doi:10.2217/nnm-2021-0374.
  • Cui X, Wang Y, Liu H, Shi M, Wang J, Wang Y. The molecular mechanisms of defective copper metabolism in diabetic cardiomyopathy. Oxid Med Cell Longev. 2022 Oct 4;2022:5418376. doi:10.1155/2022/5418376.
  • Shribman S, Poujois A, Bandmann O, Czlonkowska A, Warner TT. Wilson’s disease: update on pathogenesis, biomarkers and treatments. J Neurol Neurosurg Psychiatry. 2021;92(10):1053–1061. doi:10.1136/jnnp-2021-326123.
  • Ginès P, Krag A, Abraldes JG, Solà E, Fabrellas N, Kamath PS. Liver cirrhosis. Lancet. 2021;398(10308):1359–1376. doi:10.1016/S0140-6736(21)01374-X.
  • Varanasi SK, Kaech SM, Bui JD. SnapShot: cancer immunoediting. Cell. 2022;185(21):4038–4038.e1. doi:10.1016/j.cell.2022.09.027.
  • Zavros Y, Merchant JL. The immune microenvironment in gastric adenocarcinoma. Nat Rev Gastroenterol Hepatol. 2022;19(7):451–467. doi:10.1038/s41575-022-00591-0.
  • Lv B, Wang Y, Ma D, Cheng W, Liu J, Yong T, Chen H, Wang C. Immunotherapy: reshape the tumor immune microenvironment. Front Immunol. 2022 Jul 6;13:844142. doi:10.3389/fimmu.2022.844142.
  • Singh SK, Kumar D, Rathore AS. Understanding oxidation propensity in GCSF and assessment of its safety and efficacy. Pharm Res. 2020 Sep 29;37(10):207. doi:10.1007/s11095-020-02928-3.
  • Gusev E, Sarapultsev A, Solomatina L, Chereshnev V. SARS-CoV-2-specific immune response and the pathogenesis of COVID-19. Int J Mol Sci. 2022 Feb 2;23(3):1716. doi:10.3390/ijms23031716.
  • Wang Z, Mo S, Han P, Liu L, Liu Z, Fu X, Tian Y. The role of UXT in tumors and prospects for its application in hepatocellular carcinoma. Future Oncol. 2022;18(29):3335–48. doi:10.2217/fon-2022-0582.
  • Clough E, Barrett T. The gene expression omnibus database. Methods Mol Biol. 2016;1418:93–110. doi:10.1007/978-1-4939-3578-9_5.
  • Tsvetkov P, Coy S, Petrova B, Dreishpoon M, Verma A, Abdusamad M, Rossen J, Joesch-Cohen L, Humeidi R, Spangler RD. et al. Copper induces cell death by targeting lipoylated TCA cycle proteins [published correction appears in Science. 2022 Apr 22;376(6591): eabq4855]. Science. 2022;375(6586):1254–61. doi:10.1126/science.abf0529.
  • Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, Hoang CD, Diehn M, Alizadeh AA. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453–7. doi:10.1038/nmeth.3337.
  • Ali HR, Chlon L, Pharoah PD, Markowetz F, Caldas C, Ladanyi M. Patterns of immune infiltration in breast cancer and their clinical implications: a gene-expression-based retrospective study. PLoS Med. 2016 Dec 13;13(12):e1002194. doi:10.1371/journal.pmed.1002194.
  • Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26(12):1572–1573. doi:10.1093/bioinformatics/btq170.
  • Du Y, Miao W, Jiang X, Cao J, Wang B, Wang Y, Yu J, Wang X, Liu H. The epithelial to mesenchymal transition related gene calumenin is an adverse prognostic factor of bladder cancer correlated with tumor microenvironment remodeling, gene mutation, and ferroptosis [published correction appears in Front Oncol. 2023 May 11;13: 1185029]. Front Oncol. 2021 Jun 3;(11):683951. doi:10.3389/fonc.2021.683951.
  • Rezaei Z, Ranjbaran J, Safarpour H, Nomiri S, Salmani F, Chamani E, Larki P, Brunetti O, Silvestris N, Tavakoli T. et al. Identification of early diagnostic biomarkers via WGCNA in gastric cancer. Biomed Pharmacother. 2022;145:112477. doi:10.1016/j.biopha.2021.112477.
  • Wang Z, Liu J, Wang Y, Guo H, Li F, Cao Y, Zhao L, Chen H. Identification of key biomarkers associated with immunogenic cell death and their regulatory mechanisms in severe acute pancreatitis based on WGCNA and machine learning. Int J Mol Sci. 2023 Feb 3;24(3):3033. doi:10.3390/ijms24033033.
  • Gao XM, Zhou XH, Jia MW, Wang XZ, Liu D. Identification of key genes in sepsis by WGCNA. Prev Med. 2023;172:107540. doi:10.1016/j.ypmed.2023.107540.
  • Rigatti SJ. Random Forest. J Insur Med. 2017;47(1):31–39. doi:10.17849/insm-47-01-31-39.1.
  • Anand D, Pandey B, Pandey DK. A novel hybrid feature selection model for classification of neuromuscular dystrophies using Bhattacharyya coefficient, genetic algorithm and radial basis function based support vector machine. Interdiscip Sci Comput Life Sci. 2018;10(2):244–250. doi:10.1007/s12539-016-0183-6.
  • Ma C, Li F, Wang Z, Luo H. A novel immune-related gene signature predicts prognosis of Lung Adenocarcinoma. Biomed Res Int. 2022 Apr 9;2022:4995874. doi:10.1155/2022/4995874.
  • Powers RK, Goodspeed A, Pielke-Lombardo H, Tan AC, Costello JC. GSEA-InContext: identifying novel and common patterns in expression experiments. Bioinformatics. 2018;34(13):i555–i564. doi:10.1093/bioinformatics/bty271.
  • Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 2013 Jan 16;14:7. doi:10.1186/1471-2105-14-7.
  • Shah K, Al-Haidari A, Sun J, Kazi JU. T cell receptor (TCR) signaling in health and disease. Signal Transduct Target Ther. 2021 Dec 13;6(1):412. doi:10.1038/s41392-021-00823-w.
  • Park JJ, Lee KAV, Lam SZ, Moon KS, Fang Z, Chen S. Machine learning identifies T cell receptor repertoire signatures associated with COVID-19 severity. Commun Biol. 2023 Jan 20;6(1):76. doi:10.1038/s42003-023-04447-4.
  • Wang G, Wang Y, Jiang S, Fan W, Mo C, Gong W, Chen H, He D, Huang J, Ou M. et al. Comprehensive analysis of TCR repertoire of COVID-19 patients in different infected stage. Genes Genomics. 2022;44(7):813–822. doi:10.1007/s13258-022-01261-w.
  • Prager I, Watzl C. Mechanisms of natural killer cell-mediated cellular cytotoxicity. J Leukoc Biol. 2019;105(6):1319–1329. doi:10.1002/JLB.MR0718-269R.
  • Christofides A, Konstantinidou E, Jani C, Boussiotis VA. The role of peroxisome proliferator-activated receptors (PPAR) in immune responses. Metabolism. 2021;114:154338. doi:10.1016/j.metabol.2020.154338.
  • Read SA, Obeid S, Ahlenstiel C, Ahlenstiel G. The role of zinc in antiviral immunity. Adv Nutr. 2019;10(4):696–710. doi:10.1093/advances/nmz013.
  • de Jesus JR, Galazzi RM, Lopes Júnior CA, Arruda MAZ. Trace element homeostasis in the neurological system after SARS-CoV-2 infection: insight into potential biochemical mechanisms [published online ahead of print, 2022 Feb 26]. J Trace Elements Med Biol. 2022;71:126964. doi:10.1016/j.jtemb.2022.126964.
  • Gong T, Liu Y, Tian Z, Zhang M, Gao H, Peng Z, Yin S, Cheung CW, Liu Y. Identification of immune-related endoplasmic reticulum stress genes in sepsis using bioinformatics and machine learning. Front Immunol. 2022 Sep 20;13:995974. doi:10.3389/fimmu.2022.995974.
  • Lai Y, Lin P, Lin F, Chen M, Lin C, Lin X, Wu L, Zheng M, Chen J. Identification of immune microenvironment subtypes and signature genes for Alzheimer’s disease diagnosis and risk prediction based on explainable machine learning. Front Immunol. 2022 Dec 8;13:1046410. doi:10.3389/fimmu.2022.1046410.
  • Ko S, Choi J, Ahn J. GVES: machine learning model for identification of prognostic genes with a small dataset. Sci Rep. 2021 Jan 11;11(1):439. doi:10.1038/s41598-020-79889-5.
  • Shrock E, Fujimura E, Kula T, Timms RT, Lee I-H, Leng Y, Robinson ML, Sie BM, Li MZ, Chen Y. et al. Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity. Science. 2020;370(6520):eabd4250. doi:10.1126/science.abd4250.