An unsupervised embedding harmonization system for privacy-preserving data mining in healthcare

Pages 1-17 | Published online: 25 Jul 2023
 

Abstract

Sharing data across hospitals for disease modeling is challenging because of concerns over patient privacy and the lack of an efficient privacy-preserving data mining framework. Contextual embedding models, which encode medical events into vector representations while preserving the contextual dependencies between events, have shown promise for privacy-preserving data mining because they do not require disclosure of the original data. However, the medical event representations learned from multiple data sources lie in different embedding spaces and cannot be directly integrated. Existing embedding harmonization algorithms, known as supervised harmonization methods, require a list of medical events common to the data sources and use them as corresponding pairs to learn the transformation. In clinical practice, such common medical events can be difficult to collect. To promote data mining across hospitals, we developed a novel unsupervised embedding harmonization system that aligns contextual embeddings without the need for corresponding pairs. The proposed framework also considers different contextual embedding techniques, including Word2Vec and Med2Vec, to explore the robustness of the unsupervised harmonization algorithm. The framework was evaluated using medical events extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database. By integrating the embeddings from multiple sources, the framework achieves better disease prediction accuracy and medical event clustering than models built on a single data source. The unsupervised harmonization method achieves performance similar to that of the supervised harmonization model under different contextual embedding techniques, and it holds great promise for predictive modeling and event clustering in healthcare.
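To make the idea of supervised harmonization with corresponding pairs concrete, the sketch below aligns two embedding spaces using orthogonal Procrustes. The abstract does not name the alignment algorithm, so Procrustes is an assumption chosen for illustration, and the function name `procrustes_align`, the synthetic embeddings, and the dimensions are all hypothetical. It does not illustrate the unsupervised method proposed in the article.

```python
import numpy as np


def procrustes_align(source, target):
    """Return the orthogonal matrix W minimizing ||source @ W - target||_F.

    Row i of `source` and row i of `target` are assumed to embed the same
    medical event (a corresponding pair) in two different embedding spaces.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Synthetic stand-in data (an assumption, not data from the article):
# 50 shared medical events embedded in 8 dimensions at "hospital B".
rng = np.random.default_rng(0)
dim, n_events = 8, 50
target = rng.normal(size=(n_events, dim))

# Pretend "hospital A" learned the same geometry in a rotated space.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
source = target @ rotation.T

# Learn the mapping from the corresponding pairs, then harmonize.
W = procrustes_align(source, target)
aligned = source @ W

print("max alignment error:", np.abs(aligned - target).max())
```

In this idealized setting the two spaces differ only by a rotation, so the learned mapping recovers the target embeddings up to floating-point error. Real embeddings trained independently at two hospitals would align only approximately, and the approach depends on having the list of common events, which is the requirement the unsupervised system is designed to remove.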

Disclosure statement

No potential conflict of interest was reported by the authors.

Consent and approval statement

There are no human subjects involved in this study and informed consent is not applicable.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Additional information

Funding

This paper is partially supported by the NIH grant R01MH121394.
