Editorial

Using data science to improve outcomes for persons with opioid use disorder


Abstract

Medication treatment for opioid use disorder (MOUD) is an effective evidence-based therapy for decreasing opioid-related adverse outcomes. Effective strategies for retaining persons on MOUD, an essential step to improving outcomes, are needed as roughly half of all persons initiating MOUD discontinue within a year. Data science may be valuable and promising for improving MOUD retention by using “big data” (e.g., electronic health record data, claims data, mobile/sensor data, social media data) and specific machine learning techniques (e.g., predictive modeling, natural language processing, reinforcement learning) to individualize patient care. Maximizing the utility of data science to improve MOUD retention requires a three-pronged approach: (1) increasing funding for data science research for OUD, (2) integrating data from multiple sources including treatment for OUD and general medical care as well as data not specific to medical care (e.g., mobile, sensor, and social media data), and (3) applying multiple data science approaches with integrated big data to provide insights and optimize advances in the OUD and overall addiction fields.

The opioid epidemic has taken a devastating toll on the US. Nearly 500,000 people died of an opioid overdose between 1999 and 2019, and initial estimates from 2020 suggest a 30% increase in overdose deaths during the COVID-19 pandemic.Citation1,Citation2 An estimated two million Americans have an opioid use disorder (OUD), and, unfortunately, a majority of those with OUD are not receiving evidence-based treatment.Citation3 Medication treatment for OUD (MOUD) consists of formulations of buprenorphine, methadone, and naltrexone and is the standard of care.Citation4–7 MOUD reduces illicit opioid use and associated mortality.Citation8–10 Even among persons who receive MOUD, 40%–55% discontinue within a year after initiation, and recent data show a six-fold increase in mortality in the four weeks immediately after MOUD discontinuation.Citation11–14 Therefore, MOUD retention is vital for reducing mortality and improving outcomes for persons with OUD.

In the era of “big data” (i.e., having large scale data in electronic format), data science approaches could facilitate major scientific advances in MOUD retention.Citation15 Data science is the systematic extraction of knowledge from data, and, as some have argued, is leading a new (fourth) paradigm of science.Citation16–19 Funding agencies and current, large studies are recognizing the importance of data science to moving the field of health sciences, including addiction, forward. The National Institutes of Health (NIH) has developed a Strategic Plan for Data Science “that provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem.”Citation20 Specifically, the National Institute on Drug Abuse (NIDA) and the National Institute on Alcohol Abuse and Alcoholism (NIAAA) have recently published a notice of special interest titled, “High-Priority Interest to Enhance Data Science Research Training in Addiction Research.” The purpose of this notice of special interest is to highlight NIDA/NIAAA’s “high-priority interest in receiving applications that will support training and career development in Big Data and Computational Science (i.e., data science) within the overall field of addiction research.”Citation21 Further, these efforts may increase the use of data science to improve care for persons with substance use disorders including OUD.

Mirroring funding agency priorities, large-scale addiction-focused studies are integrating data science practices at their core. For example, the Adolescent Brain Cognitive Development (ABCD) Study®, the largest long-term study of brain development and child health in the United States, will follow approximately 12,000 children aged 9–10 years through young adulthood and gather extensive amounts and various types of data (e.g., imaging and digital data) about each child as the child progresses through adolescence.Citation22 The current phase of the ABCD Study® can help us understand the effects of substance use on the developing adolescent brain.Citation23 The ABCD Study® has woven data science into the core of the study by creating a Data Analysis, Informatics, and Resource Center (DAIRC) as a study component. DAIRC is providing data harmonization, central capture, and rigorous data quality controls to publicly share all ABCD data and tools with data scientists from around the world to analyze.Citation24

Data science is a NIDA strategic priority and a cross-cutting research approach that holds promise for expanding scientific discovery and medical breakthroughs. Therefore, its use is vital to help address the opioid addiction epidemic. In this article, we highlight recent advances and emerging opportunities in data science to better identify persons at risk of MOUD discontinuation. We also discuss ways that data science may inform the development of interventions for improving MOUD retention. Recent advances and emerging opportunities are presented and discussed in relation to five data sources: electronic health records (EHRs); claims data; genomic and neuroimaging data; digital, non-medical data; and clinical trial data.

Electronic health records

EHRs are creating many opportunities for researchers to study healthcare systems in an effort to improve the effectiveness of care and patient outcomes. Although EHRs store a variety of patient medical information and are used to manage clinical workflows, the utility of EHR systems to improve the quality of addiction care across treatment settings (e.g., substance use disorder (SUD) and general medical care) is limited. Specifically, less than a third of SUD treatment programs in 2017 used EHR systems to store and maintain treatment records.Citation25,Citation26 In contrast, over 95% of hospitals in the US in 2017 used EHR systems for general medical care.Citation27,Citation28 Therefore, SUD treatment records may exist in paper form while the records for the person’s care received in other healthcare settings likely exist in EHR form. However, this limitation may be reduced given the 2018 SUPPORT Act, which aims to increase adoption of EHRs in clinics that treat patients with OUD. Additionally, SUD treatment records that are in the person’s EHR are required to be stored separately, per Title 42 of the Code of Federal Regulations (42CFR), as SUD treatment records cannot be integrated with the rest of the person’s EHR information.Citation29,Citation30 This makes it difficult, on a large scale, to aggregate diagnoses and other care received for persons obtaining SUD care. Furthermore, siloed EHR data serve as a key barrier to identifying important factors (e.g., care types and duration) that may affect MOUD retention.
In addition to siloed EHR data, a recent study found substantial fragmentation in EHR data standards for OUD-related clinical data elements.Citation31 Some state-level efforts are underway to improve EHR interoperability for substance abuse treatment programs in New Jersey, but efforts on a national level are lacking.Citation32 Solving these challenges with EHR data would create opportunities to tailor OUD treatment plans to improve MOUD retention.

The Veterans Health Administration (VHA), which is among the largest integrated health care systems in the US, contains an extensive repository of EHR data. VHA is an ideal setting for a learning healthcare system (i.e., care systems that use EHRs to improve care processes and outcomes), particularly in relation to treating OUD, where evidence generation and application could integrate seamlessly to improve healthcare processes and patient outcomes (e.g., MOUD retention).Citation33 VHA provides care to over 9 million enrolled Veterans at 1,293 health care facilities, including 171 VA medical centers and 1,112 outpatient sites.Citation34 The VHA Corporate Data Warehouse (CDW) is unique in that it includes integrated, historical EHR records of persons with SUD and other healthcare received by Veterans; 42CFR does not apply to VHA. VHA is currently using data analytics to provide insights into the care of Veterans with OUD. For example, VHA’s Strategic Analytics for Improvement and Learning (SAIL) metrics provide insight into how well a VHA hospital system is performing on various quality measures.Citation35 A MOUD-specific SAIL metric (SUD-16), which is updated on a quarterly basis, is the percentage of Veterans identified with OUD who are currently receiving MOUD.Citation36 The SUD-16 metric improves our understanding of access to MOUD, and MOUD retention improves the SUD-16 metric. VHA EHR data also contain some self-reported measures that could be leveraged in OUD research such as drinking patterns and pain scores. Outside of VHA, healthcare systems are also evaluating measures of care for patients with OUD. 
For example, healthcare systems are using Healthcare Effectiveness Data and Information Set (HEDIS) measures to understand health factors that affect treatment initiation and engagement for persons with alcohol and other drug use disorders.Citation37 Advances in data science can be leveraged to further our use of EHR data, including performance metrics, to improve MOUD retention, enhance OUD quality of care, and inform clinical and stakeholder decisions.
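As a rough illustration of a metric like SUD-16 described above, the percentage of persons identified with OUD who are currently receiving MOUD can be computed from patient-level flags. The sketch below is only an approximation under stated assumptions: the field names `has_oud` and `on_moud`, the function name, and the example cohort are hypothetical, and the official SUD-16 specification may define its numerator and denominator differently.

```python
def moud_receipt_rate(patients):
    """Percentage of patients with an OUD diagnosis who currently receive MOUD.

    `patients` is a list of dicts with hypothetical boolean keys
    "has_oud" and "on_moud". Returns None when no patient has OUD.
    """
    with_oud = [p for p in patients if p["has_oud"]]
    if not with_oud:
        return None  # metric is undefined without a denominator
    on_moud = sum(1 for p in with_oud if p["on_moud"])
    return 100.0 * on_moud / len(with_oud)


# Invented example cohort: four persons with OUD, two of them on MOUD.
cohort = [
    {"has_oud": True, "on_moud": True},
    {"has_oud": True, "on_moud": False},
    {"has_oud": True, "on_moud": True},
    {"has_oud": True, "on_moud": False},
    {"has_oud": False, "on_moud": False},
]
rate = moud_receipt_rate(cohort)
```

Because the denominator includes everyone identified with OUD, each person retained on MOUD raises the rate, which is why retention and this type of access metric move together.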

VHA and other large health systems (e.g., Kaiser Permanente) have also pioneered the use of EHR data to conduct population-level predictive modeling to improve care for patients. For example, predictive modeling using machine-learning techniques (e.g., random forest, deep neural networks) can be used to better understand outcomes of disease states. Predictive models provide an estimate of the probability that a particular outcome (e.g., disease state, or care outcome including treatment discontinuation) will occur.Citation38,Citation39 VHA developed the Stratification Tool for Opioid Risk Mitigation (STORM) using population-level EHR data on Veterans. STORM is a clinical decision support tool that uses logistic regression-based predictive modeling to identify Veterans at high risk for opioid overdose and suicide. For Veterans considered “at-risk,” the model suggests additional care options that providers might consider to help improve patient outcomes.Citation40 Unfortunately, to our knowledge, machine-learning approaches have yet to be applied to predict and identify individuals at risk of MOUD discontinuation.Citation41–43
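To make concrete how a logistic regression-based model turns patient characteristics into a predicted probability, consider the minimal sketch below. It is illustrative only: the feature names, coefficients, intercept, and flagging threshold are hypothetical and are not taken from STORM or any published model.

```python
import math


def predicted_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(coef * x))))."""
    score = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients for illustration only (not from STORM).
COEFFICIENTS = {"prior_overdose": 1.2, "high_dose_opioid": 0.8, "age_over_65": -0.3}
INTERCEPT = -3.0

# Hypothetical patient with two of the three risk indicators present.
patient = {"prior_overdose": 1, "high_dose_opioid": 1, "age_over_65": 0}
risk = predicted_probability(patient, COEFFICIENTS, INTERCEPT)

# Hypothetical cut-off for labeling a patient "at-risk".
flagged = risk >= 0.25
```

In practice the coefficients would be estimated from population-level data, and the threshold would be chosen to balance missed cases against unnecessary alerts.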

To address this gap, our team is currently developing a PREdictive Model for MOUD discontinuation (PREMMOUD), using EHR data from VHA’s CDW, to identify Veterans at risk for MOUD discontinuation. Developed and validated using machine-learning techniques (e.g., random forest), PREMMOUD will generate, for each Veteran, a real-time, validated predicted probability of MOUD discontinuation (i.e., gaps in MOUD greater than 30 days after treatment initiation). The model will estimate this probability at various time points in the future (e.g., 90- and 180-day intervals) and update it continuously after a Veteran initiates MOUD. PREMMOUD will also provide the patient-, provider-, and system-level characteristics driving each person’s discontinuation risk score. PREMMOUD will be integrated into a clinical decision support tool like STORM. Providers will use PREMMOUD in real time to continuously monitor risk of MOUD discontinuation and to receive care recommendations for addressing the risk factors associated with the risk score.
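The outcome definition above (a gap in MOUD greater than 30 days) can be sketched from dispensing records. This is not the PREMMOUD implementation. It assumes a hypothetical input of (fill date, days of supply) pairs, which suits pharmacy-dispensed medication better than daily clinic-administered doses, and the function name and `as_of` censoring date are likewise illustrative.

```python
from datetime import date, timedelta


def first_discontinuation(fills, max_gap_days=30, as_of=None):
    """Return the first uncovered day that starts a gap longer than
    max_gap_days, or None if no such gap is observed.

    fills: list of (fill_date, days_supply) tuples (hypothetical format).
    as_of: optional end of observation, used to detect a final open gap.
    """
    if not fills:
        return None
    fills = sorted(fills)
    # covered_until is the first day with no medication on hand.
    covered_until = fills[0][0] + timedelta(days=fills[0][1])
    for fill_date, days_supply in fills[1:]:
        gap = (fill_date - covered_until).days
        if gap > max_gap_days:
            return covered_until
        # Early refills extend coverage from the current end of supply.
        start = max(fill_date, covered_until)
        covered_until = start + timedelta(days=days_supply)
    if as_of is not None and (as_of - covered_until).days > max_gap_days:
        return covered_until
    return None


# Invented example: an early refill, then a 44-day gap before the third fill.
fills = [(date(2021, 1, 1), 30), (date(2021, 1, 29), 30), (date(2021, 4, 15), 30)]
stopped_on = first_discontinuation(fills)
```

Here `stopped_on` is the day supply ran out before the long gap. Passing `as_of` lets the same function flag a person whose last fill has lapsed by more than 30 days at the time of assessment.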

While predictive modeling has high potential to advance the care of persons with OUD by accurately identifying individuals at high risk of an outcome, natural language processing (NLP) can provide additional information that is unavailable in structured EHR data. NLP is a branch of machine learning that gives computers the ability to understand spoken and written text (i.e., unstructured data), and in this case, clinical notes in the EHR.Citation44 Data within EHR notes hold rich information, including information about social and behavioral determinants of health, clinical signs and symptoms, and treatment details, that may provide useful and valuable insights to advance addiction practice and MOUD retention. NLP can be used to provide real-time assessment of unstructured EHR notes, thus potentially providing awareness of the reasons for MOUD discontinuation documented in providers’ progress notes.Citation44,Citation45 After extensive work developing data dictionaries of relevant terms and validation of NLP systems, NLP could be used to process and provide insights on massive amounts of written text at a rapid pace, which could prove beneficial to providing actionable insights for providers trying to retain persons in treatment.Citation46,Citation47 Adding NLP data to PREMMOUD would be ideal as it would provide opportunities to predict discontinuation of MOUD with both structured and unstructured data.
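The data-dictionary step mentioned above can be illustrated with a very simple rule-based matcher. This is a sketch under stated assumptions, not a validated clinical NLP system: the categories, phrases, and example note are hypothetical, and the matcher does not handle negation (e.g., “denies nausea”), misspellings, or abbreviations, which real systems must address.

```python
import re

# Hypothetical data dictionary: reason category -> phrases to look for.
REASON_TERMS = {
    "transportation": ["no transportation", "missed bus", "lost ride"],
    "side_effects": ["nausea", "constipation", "sedation"],
    "cost": ["cannot afford", "lost insurance", "copay"],
}


def find_reasons(note):
    """Return the sorted categories whose phrases appear in a free-text note."""
    text = note.lower()
    found = set()
    for category, phrases in REASON_TERMS.items():
        for phrase in phrases:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.add(category)
                break
    return sorted(found)


# Invented example note.
note = "Pt reports nausea on buprenorphine and lost insurance last month."
reasons = find_reasons(note)
```

The resulting category flags could, in principle, be added as features alongside structured EHR variables in a predictive model.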

EHR data are not without limitations. EHR systems usually capture only the care delivered within one healthcare system rather than all of the care a patient receives. In addition, EHR data are not collected for research purposes but are primarily entered for visit documentation. Identifying patients with certain disease states and distinguishing incident from prevalent cases can also be difficult as diagnoses may not be documented in the EHR. For example, identifying OUD with EHR data alone can underestimate the prevalence of OUD.Citation48 Limitations can also arise with time-related biases, inconsistent data recording, missing data, and unmeasured confounding.Citation49 However, EHR data can be one avenue for insights in addiction through data science.

Claims data

Like EHR data, claims are another secondary source of population-level data. Claims data, like EHR data, allow for evaluations of rare events related to OUD given the large population samples of the cohorts. For example, Wakeman et al. evaluated the comparative effectiveness of different treatment pathways for OUD using the OptumLabs Data Warehouse, which is an umbrella repository of multiple datasets including claims for commercial and Medicare Advantage enrollees.Citation50 They found methadone and buprenorphine, as compared to opioid antagonist therapy, inpatient treatment, or intensive outpatient behavioral interventions, were associated with lower rates of opioid overdoses and opioid-related morbidity. Similarly, Lo-Ciganic et al. recently used claims from Medicare beneficiaries to develop machine-learning algorithms for predicting opioid overdose.Citation41 They concluded that machine-learning algorithms using claims data are valuable in more accurately identifying opioid overdose risk.

Claims data share some of the same limitations as noted for EHR data but offer the advantage of documenting care across health systems. Unlike EHR data, the primary purpose of claims data is billing. This can make it more challenging to accurately identify diagnoses and services if those diagnoses and services are not related to payment.

Genomics and neuroimaging

Another promising area of data science is using genomics and neuroimaging data to better understand pathways to and treatment of OUD. Studies are currently underway to understand the genetic factors that may influence treatment-related outcomes among patients receiving MOUD. For example, The Genomics of Opioid Addiction Longitudinal Study (GOALS) will be a prospective observational study of 400 patients receiving MOUD that will help determine the interplay between genetic and non-genetic (e.g., depression, anxiety, adverse childhood experiences) factors on key treatment outcomes (e.g., treatment retention rate, toxicology results, overdoses).Citation51 While conducted with typically smaller patient cohorts, neuroimaging studies can provide a vast amount of data per patient. For example, a recent systematic review of functional magnetic resonance imaging studies found that persons with OUD display heightened neural activation to heroin cues, and this pattern is attenuated by MOUD, predicts treatment response, and is reduced after extended abstinence.Citation52 Genomic and neuroimaging data provide great avenues for advanced data science techniques (e.g., deep learning) since these data sources generate extensive amounts of data.

Like other data sources, genomic and neuroimaging data also have their limitations. With genomic data, it can be difficult to have enough patients to test significance of multiple genes. Neuroimaging data are limited by the difficulty in reproducing neuroimaging findings due to both small effect sizes observed in brain-behavioral relationships as well as complex analytical pipelines. Neuroimaging also has few established clinical applications.Citation53

Digital, non-medical data

Data generated from digital technologies, such as smartphones, wearable sensors (e.g., smartwatches), and social media (e.g., Twitter, Reddit), can help the field better understand factors affecting health-related behavior and can inform the delivery of personalized therapies.Citation54,Citation55 While claims data capture information that can be used to predict health behavior and improve patient care and outcomes, they are limited in that they are not designed to collect data in real time. Digital technologies provide an avenue to capture real-time, granular data about individuals’ day-to-day activities and behaviors within the context of their daily lives. In particular, smartphone data collection has grown in popularity as mobile devices have become ubiquitous and contain powerful sensors that allow for data to be gathered objectively and unobtrusively.Citation56 Smartphones also provide a platform for the delivery of just-in-time adaptive interventions (JITAI), which are personalized digital interventions or treatments delivered in the exact moment that people need them.Citation57

The use of smartphones for OUD research and treatment, typically through mobile health (mHealth) applications (apps), has garnered interest in recent years. One reason is likely that digital data collection and therapies occur in everyday contexts, making them particularly advantageous for reaching patients with intermittent or inconsistent health care usage, such as patients with addiction who often experience multiple barriers to care. In addition to collecting data on participant opioid use, mHealth apps have been used to deliver opioid-related education, provide opioid conversion support for clinicians, monitor withdrawal symptoms, and expand the reach of evidence-based OUD treatment.Citation58 Several studies have applied mHealth apps to address challenges of OUD medication adherence and retention. For example, the Addiction-Comprehensive Health Enhancement Support System (A-CHESS) app – a well-known platform that provides recovery support services to patients with alcohol use disorders – has recently been adapted to improve long-term recovery for patients with OUD.Citation59,Citation60 The reSET-O app, which uses cognitive behavioral therapy to help increase retention in outpatient treatment for OUD, was cleared by the U.S. Food and Drug Administration in 2018.Citation61 Other lesser-known apps specifically targeting MOUD adherence and retention through educational material, video conferencing, text-messaging medication reminders, and motivational coaching are being developed and tested at present.Citation58,Citation62,Citation63 While novel mHealth apps hold potential for improving MOUD retention over the long term, rigorous testing has yet to be completed. More research on the quality and efficacy of digital OUD treatment platforms and recovery management apps is warranted, as there is little empirical evidence that any current apps meet basic quality standards or are effective for OUD management.Citation64

Because smartphones generate large quantities of various types of information, advanced analytic techniques and machine learning algorithms are necessary to process and extract clinical insights from data.Citation65 Many machine learning techniques, such as random forest, support vector machine, and deep learning approaches, have been applied to data collected passively from smartphones.Citation66 Deep learning uses multilayer artificial neural networks to progressively extract features from the input data. Mobile devices are also a good option for deploying deep learning and reinforcement learning models. Deep learning has been shown to improve a broad range of clinical applications such as detection of drug discontinuation events, improved phenotyping, making diagnoses based on clinical images, and prediction of clinical outcomes.Citation46,Citation67–74 In reinforcement learning, the computer uses real-time trial and error in an interactive environment to maximize the total cumulative reward (e.g., maximize the probability of obtaining the intended outcome). Mobile health applications are using reinforcement learning to provide JITAI to help users prevent future negative health outcomes. For example, a reinforcement learning algorithm can be applied to optimize physical activity among users by processing input from the user through their mobile device.Citation75 Over time, the reinforcement learning algorithm learns what time of day and in what setting to present messages about going for a walk that will actually increase the individual’s probability of increasing their step count. Since each individual has different routines and preferences, the algorithm has to learn how to maximize the probability of the desired outcome for each specific individual.
In some instances, reinforcement learning has led to decision-making that is more accurate than human decision-making.Citation76 By providing the decisions to maximize total cumulative reward, reinforcement learning could be used to improve MOUD retention through learning individuals’ routines and preferences to deliver interventions (e.g., coping strategies) at optimal times and settings to maximize the probability of MOUD retention.
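The trial-and-error idea described above can be sketched with an epsilon-greedy multi-armed bandit, one of the simplest forms of reinforcement learning. This is a toy illustration rather than the algorithm used in any cited study: the message contexts, the simulated response probabilities, and all parameter values are hypothetical, and a real JITAI would learn from actual user behavior and richer context.

```python
import random


def run_bandit(response_prob, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over message contexts.

    response_prob: hypothetical chance that a message sent in each context
    is followed by the desired behavior (this simulates the user).
    Returns the learned average reward for each context.
    """
    rng = random.Random(seed)
    arms = list(response_prob)
    counts = {arm: 0 for arm in arms}
    values = {arm: 0.0 for arm in arms}
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.choice(arms)  # explore: try a random context
        else:
            arm = max(arms, key=values.get)  # exploit: best estimate so far
        reward = 1.0 if rng.random() < response_prob[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this context.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values


# Simulated individual who responds best to evening messages.
simulated_user = {"morning": 0.1, "midday": 0.2, "evening": 0.8}
estimates = run_bandit(simulated_user)
best_context = max(estimates, key=estimates.get)
```

Running a separate learner for each person is one way to reflect the point that routines and preferences differ across individuals. The same structure could, in principle, select among coping-strategy messages intended to support MOUD retention.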

Social media apps, like mHealth apps, are a burgeoning area of OUD data science research. Social media is an avenue to better understand the opioid epidemic in real-time. For example, machine-learning and NLP have been used with social media data to identify opioid-related social media chatter.Citation77 This type of work is designed to predict geolocation-centric “hot spots” of future opioid overdose events to be able to better respond with appropriate regional-level treatment interventions (e.g., naloxone distribution). A study by Yang et al. worked to predict relapse in OUD treatment using social media data in hopes of keeping patients in treatment longer.Citation78

Challenges and limitations also exist with digital, non-medical cohort data. For example, more rigorous evaluation of the quality of digital health information for diagnostic, therapeutic, or prognostic purposes, and implications of poor information quality for patient safety, is needed.Citation79 Moreover, the “digital divide” may create gaps across the population where older generations may access mHealth apps or social media less frequently than younger generations. Disparities can also arise with digital, non-medical data with regard to race/ethnicity, socioeconomic status, and rural/urban status, particularly with social media data.Citation80–82 Therefore, the generalizability of digital, non-medical cohort data may be limited. However, digital data are another source that data scientists can use to provide new insights for persons with OUD.

Clinical trial data

Clinical trial data, particularly clinical trial data that can be combined across individual clinical trials or linked to other data sources such as claims or EHR data, can be an excellent source of data for advanced data science techniques. Since clinical trial data are collected specifically for research, these data may not face the issues of incompleteness and confounding that claims and EHR data face. NIDA established a Clinical Trials Network, which is an enterprise of treatment researchers and community-based service providers who work to discover and implement new treatment options, in part, for OUD.Citation83 With the ability to conduct multi-site, large-scale randomized controlled trials across 18 nodes, NIDA’s Clinical Trials Network may provide a valuable avenue for the application of advanced data science techniques.

Like all the other data sources, clinical trial data have their limitations. Clinical trials typically include more homogeneous participants, and data collection processes can be costly and time-consuming. Clinical trial data are also usually on a smaller scale than secondary data sources such as EHR or claims data. Therefore, clinical trial data can be limited in the generalizability of their findings.

Conclusion

Given the rapid increase in opioid-related morbidity in the US, applying novel data science techniques with “big data” to MOUD retention is imperative. Data science is a powerful tool that has the potential to inform and improve OUD care. Data science, through use of EHR, claims, genomic and neuroimaging, digital, non-medical, and clinical trial-derived data, may enable health care systems and providers to predict which persons with OUD are at high risk for relapse and/or poor care engagement, identify which persons may benefit from which interventions, and pair those persons at high risk of relapse with tailored, patient-specific interventions to improve their outcomes. The field of opioid addiction research must continually leverage data science in multiple ways and from multiple data sources to further advance the prevention of OUDs, enhance the effectiveness of their treatment, and improve outcomes. To do this will require a three-pronged approach: (1) increasing funding for data science research around OUD, (2) integrating data from multiple sources including treatment for OUD and general medical care as well as other digital, non-medical data (e.g., mobile, sensor, social media data), and (3) applying multiple data science approaches with integrated “big data” to provide insights and optimize advances in the OUD and overall addiction field.

Author contributions

CJH and AJG developed the Editorial concept. CJH drafted the Editorial. All authors edited and reviewed the final manuscript. All authors approved the final version for publication.

Acknowledgements

This work was supported by the Veterans Affairs Health Services Research and Development Service under Award Number IK2HX003358, the Translational Research Institute (TRI), grant UL1 TR003107 through the National Center for Advancing Translational Sciences of the National Institutes of Health (NIH), and R01DA050676 through NIH/National Institute on Drug Abuse (NIDA). Infrastructure support for author AJG was provided, in part, by the VA HSR&D Informatics, Decision-Enhancement, and Analytic Sciences (IDEAS) Center of Innovation (CIN 13-414) and the NIDA under the Greater Intermountain Node (GIN; NIH/NIDA 1UG1DA049444-01). The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

The authors are solely responsible for the content of this article, which does not represent the official views of the US Federal Government, including the Department of Veterans Affairs, Veterans Health Administration, the National Institutes of Health, or any of the academic affiliations of the authors.

Disclosure statement

Dr. Lo-Ciganic is named as an inventor in a preliminary patent filing from the University of Florida for use of a machine learning algorithm for opioid risk prediction in Medicare. Dr. Lo-Ciganic has received grant funding from Merck Sharp & Dohme Corp and Bristol Myers Squibb, unrelated to this project. Dr. Martin receives royalties from TestleTree LLC for the commercialization of an opioid risk prediction tool, which is unrelated to this project.

