Research Article

Predicting locus-specific DNA methylation levels in cancer and paracancer tissues

Received 03 Apr 2023, Accepted 20 Feb 2024, Published online: 18 Apr 2024
 

Abstract

Aim: To predict base-resolution DNA methylation in cancerous and paracancerous tissues. Materials & methods: We collected six cancer DNA methylation datasets from The Cancer Genome Atlas and five cancer datasets from the Gene Expression Omnibus and established machine learning models using paired cancerous and paracancerous tissues. Tenfold cross-validation and independent validation were performed to demonstrate the effectiveness of the proposed method. Results: The developed cross-tissue prediction models can substantially increase prediction accuracy at more than 68% of CpG sites and can enhance the statistical power of differential methylation analyses. An XGBoost model leveraging multiple correlated CpGs may further improve prediction accuracy. Conclusion: This study provides a powerful tool for DNA methylation analysis and has the potential to yield new epigenetic insights into cancer research.

Summary points
  • The authors employed machine learning models to predict genome-wide DNA methylation (DNAm) levels in cancerous tissues (CTs) and paracancerous tissues (PTs) when one of them is difficult to obtain.

  • The proposed single-CpG model improves the mean absolute error at more than 68% of CpGs.

  • A multiple-CpG-based XGBoost model can further improve the predictive performance when there is considerable variability between individuals.

  • Combining measured and predicted PTs to enlarge the sample size makes the CpG sites detected in differential methylation analysis statistically more significant.

  • The prediction models perform better when CTs, rather than PTs, are used as predictors.

  • The aggressiveness of cancers and patient outcome may be predictable using well-predicted DNAm profiles in CT/PT.

  • Functional enrichment analysis based on highly correlated CpG sites identified important pathways involved in cancer progression.

  • The cross-tumor DNAm prediction model has the potential to be applied to an external cancer dataset for a subset of probes with high correlation in both cancers.
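The single-CpG, cross-tissue idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation (their code is in the repository named in the data sharing statement). Several things here are assumptions made purely for illustration: a simple linear regression stands in for the single-CpG model, the baseline predicts the training-set mean of the target tissue, and the paired beta values are synthetic. The function names are hypothetical. The sketch predicts PT methylation from CT methylation at one CpG site and compares mean absolute error (MAE) against the baseline under tenfold cross-validation.

```python
import numpy as np


def kfold_indices(n_samples, n_folds, rng):
    """Shuffle sample indices and split them into n_folds roughly equal folds."""
    return np.array_split(rng.permutation(n_samples), n_folds)


def cv_mae_single_cpg(ct, pt, n_folds=10, seed=0):
    """Return (model MAE, baseline MAE) for predicting PT from CT at one CpG.

    Assumption: a one-variable linear regression stands in for the
    single-CpG model; the baseline predicts the training mean of PT.
    """
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(ct), n_folds, rng)
    model_err, base_err = [], []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        slope, intercept = np.polyfit(ct[train_idx], pt[train_idx], deg=1)
        # Beta values are proportions, so clip predictions to [0, 1].
        pred = np.clip(slope * ct[test_idx] + intercept, 0.0, 1.0)
        baseline = np.full(len(test_idx), pt[train_idx].mean())
        model_err.append(np.abs(pred - pt[test_idx]))
        base_err.append(np.abs(baseline - pt[test_idx]))
    return (float(np.concatenate(model_err).mean()),
            float(np.concatenate(base_err).mean()))


# Synthetic paired beta values for one CpG (illustrative data only).
rng = np.random.default_rng(42)
ct = rng.uniform(0.1, 0.9, size=200)
pt = np.clip(0.8 * ct + 0.05 + rng.normal(0.0, 0.03, size=200), 0.0, 1.0)

mae_model, mae_baseline = cv_mae_single_cpg(ct, pt)
print(f"model MAE: {mae_model:.4f}, baseline MAE: {mae_baseline:.4f}")
```

Repeating this comparison at every CpG site and counting the sites where the model MAE falls below the baseline MAE would give a proportion comparable in spirit to the figure quoted above, although the baseline used in the study may differ from the one assumed here.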

Author contributions

Conceptualization: B Ma, S Liu, F Song and S Zhang; methodology: B Ma and S Zhang; investigation: B Ma, F Song, S Zhang and Y Liu; visualization: S Zhang; supervision: B Ma, S Liu and F Song; writing – original draft: S Zhang; writing – review and editing: B Ma, S Liu, F Song, S Zhang, Y Liu, Y Shen and D Li. All authors read and approved the final manuscript.

Financial disclosure

This work was supported by the Chinese National Key Research and Development Project (no. 2021YFC2500400) and the National Natural Science Foundation of China (no. 61471078). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Competing interests disclosure

The authors have no competing interests or relevant affiliations with any organization or entity with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending or royalties.

Writing disclosure

No writing assistance was utilized in the production of this manuscript.

Data sharing statement

The source code and demo data have been deposited at: https://github.com/lab319/DNAm_prediction_CT_PT.

