ORIGINAL RESEARCH

Using Natural Language Processing and Machine Learning to Identify Opioids in Electronic Health Record Data

Sean P McDermott (corresponding author), Division of Pain Medicine, Department of Anesthesiology and Perioperative Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA

https://orcid.org/0000-0002-9899-9494

Ajay D Wasan, Division of Pain Medicine, Department of Anesthesiology and Perioperative Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA

https://orcid.org/0000-0002-6394-6077

Abstract

Purpose

This study evaluates the utility of machine learning (ML) and natural language processing (NLP) in the processing and initial analysis of data within the electronic health record (EHR). We present and evaluate a method to classify medication names as either opioids or non-opioids using ML and NLP.

Patients and Methods

A total of 4216 distinct medication entries were obtained from the EHR and were initially labeled by human reviewers as opioid or non-opioid medications. An approach incorporating bag-of-words NLP and supervised ML classification was implemented in MATLAB and used to automatically classify medications. The automated method was trained on 60% of the input data, evaluated on the remaining 40%, and compared to manual classification results.
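The workflow described above (tokenize each medication string into a bag of words, train a supervised classifier on 60% of the labeled data, and hold out 40% for evaluation) can be sketched as follows. This is a minimal illustration only, not the authors' MATLAB implementation: it is written in Python, it substitutes a multinomial naive Bayes classifier as an assumed stand-in for the study's classifier, and the medication strings, the `BagOfWordsNB` class, and the helper names are all hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase a medication string and keep its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_test_split(rows, train_fraction=0.6, seed=0):
    """Shuffle labeled rows, then split them (default 60% train, 40% test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


class BagOfWordsNB:
    """Multinomial naive Bayes over bag-of-words counts (add-one smoothing).

    Assumption: chosen for brevity; the study's own classifier may differ.
    """

    def fit(self, rows):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training strings
        self.vocab = set()
        for text, label in rows:
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written examples; not data from the study.
TOY_ROWS = [
    ("oxycodone 5 mg tablet", "opioid"),
    ("morphine sulfate 15 mg tablet", "opioid"),
    ("hydrocodone acetaminophen 5 325 mg tablet", "opioid"),
    ("lisinopril 10 mg tablet", "non-opioid"),
    ("metformin 500 mg tablet", "non-opioid"),
    ("ibuprofen 200 mg tablet", "non-opioid"),
    ("atorvastatin 20 mg tablet", "non-opioid"),
]

model = BagOfWordsNB().fit(TOY_ROWS)
print(model.predict("oxycodone 10 mg capsule"))
```

With real data, one would call `train_test_split` on the full set of human-labeled strings, fit on the training portion, and compare predictions on the held-out portion against the reviewers' labels.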

Results

Human reviewers classified 3991 medication strings as non-opioid medications (94.7%) and 225 as opioid medications (5.3%). The algorithm achieved 99.6% accuracy, 97.8% sensitivity, 94.6% positive predictive value (PPV), an F1 value of 0.96, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.998. A secondary analysis indicated that approximately 15–20 opioid examples (and 80–100 non-opioid examples) were needed in training to achieve accuracy, sensitivity, and AUC values above 90–95%.
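For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, PPV, and F1 follow from a binary confusion matrix, with "opioid" as the positive class. The function names are hypothetical, and the confusion-matrix counts passed in at the end are invented for illustration; they are not the study's counts. The one check against the reported results uses only the published PPV and sensitivity.

```python
def f1_score(ppv, sensitivity):
    """Harmonic mean of positive predictive value and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def classification_metrics(tp, fp, fn, tn):
    """Summary metrics from true/false positive and negative counts."""
    sensitivity = tp / (tp + fn)  # share of true opioids that were found
    ppv = tp / (tp + fp)          # share of predicted opioids that are correct
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f1": f1_score(ppv, sensitivity),
    }


# Consistency check using only the reported PPV (94.6%) and sensitivity (97.8%).
print(round(f1_score(0.946, 0.978), 2))  # 0.96

# Hypothetical counts for illustration only; not taken from the study.
print(classification_metrics(tp=45, fp=5, fn=5, tn=945))
```

Because opioids make up only about 5% of the entries, accuracy alone can look high even for a weak classifier, which is why sensitivity, PPV, and F1 are reported alongside it.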

Conclusion

The automated approach achieved excellent performance in classifying medications as opioids or non-opioids, even with a practical number of human-reviewed training examples. This will allow a significant reduction in manual chart review and improve data structuring for retrospective analyses in pain studies. The approach may also be adapted for further analysis and predictive analytics of EHR data and in other “big data” studies.

Abbreviations

AUC, area under the curve; CHOIR, Collaborative Health Outcomes Information Registry; ECOC, error-correcting output code; EHR, electronic health record; ML, machine learning; NLP, natural language processing; PPV, positive predictive value; ROC, receiver operating characteristic curve.

Acknowledgments

The authors would like to acknowledge Andrea G. Gillman, PhD (UPMC Pain Medicine, Pittsburgh, PA, USA), for her contributions to data collection and study design, including medication classification and data acquisition.

Disclosure

The authors report no conflicts of interest in this work. Internal departmental funding was utilized in support of this work.
