Short Communication

Development and validation of algorithms for identifying lines of therapy in multiple myeloma using real-world data

Pages 981-995 | Received 14 Aug 2023, Accepted 23 Nov 2023, Published online: 17 Jan 2024

Abstract

Aim: To validate algorithms based on electronic health data to identify composition of lines of therapy (LOT) in multiple myeloma (MM). Materials & methods: This study used available electronic health data for selected adults within Henry Ford Health (Michigan, USA) newly diagnosed with MM in 2006–2017. Algorithm performance in this population was verified via chart review. As with prior oncology studies, good performance was defined as positive predictive value (PPV) ≥75%. Results: Accuracy for identifying LOT1 (N = 133) was 85.0%. For the most frequent regimens, accuracy was 92.5–97.7%, PPV 80.6–93.8%, sensitivity 88.2–89.3% and specificity 94.3–99.1%. Algorithm performance decreased in subsequent LOTs, with decreasing sample sizes. Only 19.5% of patients received maintenance therapy during LOT1. Accuracy for identifying maintenance therapy was 85.7%; PPV for the most common maintenance therapy was 73.3%. Conclusion: Algorithms performed well in identifying LOT1 – especially more commonly used regimens – and slightly less well in identifying maintenance therapy therein.

Plain language summary

Electronic health data helps us understand treatment in the ’real world’. The data has great value in cancer if we can identify the drugs patients get. Yet this is hard in multiple myeloma (MM), where treatment is complex. Algorithms (sets of decision rules) to identify drugs can help here. We tested an existing algorithm for identifying ’lines of therapy’ (LOT) given to patients with MM. Each LOT included one or more drugs for MM. We also developed and tested a new algorithm for ’maintenance therapy’. This is a treatment given to help maintain the response to the main MM treatment. We tested how well the algorithms identified MM treatments in electronic health data. This data came from Henry Ford Health, a healthcare system in Michigan, USA. Treatments were confirmed by cancer specialists who reviewed medical charts. The LOT algorithm was good at finding the first LOT that patients received. The maintenance algorithm did a fair job of identifying the most used therapy. Our algorithms could help researchers study the real-world treatment of MM.

Multiple myeloma (MM) is a hematologic malignancy in which normal plasma cells genetically transform into myeloma cells, causing tumors to form in bones and soft tissues. Treatment for newly diagnosed MM includes stem cell transplant (SCT; where appropriate, based on patient age and comorbidities); combination pharmacotherapy is recommended by the US National Comprehensive Cancer Network (NCCN) for all lines of therapy (LOT) [Citation1]. Several novel MM therapies approved in the past decade [Citation2–4] now contribute to most NCCN-recommended induction and maintenance regimens, irrespective of SCT eligibility [Citation5]. Treatment decisions often depend on disease status and drugs administered in previous lines [Citation6], and the same agent can be used as induction or maintenance therapy in many LOTs across the therapeutic journey.

Large electronic health record (EHR) databases provide invaluable information on ’real-world’ practice. EHR data are quickly accessible and often readily generalizable. However, they typically do not record treatment intent (induction vs maintenance) or response. This limits their usefulness for examining whether therapy was intended to induce or maintain response, and proxies are required to examine relevant outcomes.

While clinical trials are the gold standard for establishing treatment efficacy and safety, their findings are seldom generalizable [Citation7–9]. Real-world studies have consequently become increasingly important for evaluating treatment effectiveness [Citation10]. However, the increasing volume of real-world data (RWD) sources must be balanced against data integrity and usability. Importantly, validity of real-world evidence (RWE) studies is contingent on accurate identification of treatment indications. This is particularly challenging in MM, given the complex and evolving treatment landscape.

Validated case-selection algorithms can enable use of RWD (and RWE) for value-based contracting and for evaluating real-world treatment outcomes. They can be developed for use in selecting relevant populations, describing treatment patterns, and/or examining effectiveness when disease progression data are absent. In MM, such algorithms could identify composition and treatment intent of LOTs based on information available in EHR databases, thereby addressing questions relating to real-world treatment patterns and outcomes.

Recent US FDA RWE guidance advocates validation of algorithms developed to analyze real-world data [Citation11,Citation12]. This study used EHR data to validate algorithms for identifying LOTs in newly diagnosed MM patients and for identifying receipt of maintenance therapy as part of LOT1. The algorithms were developed using clinical input from practicing hematologists/oncologists, information from prior research [Citation13,Citation14], and NCCN guidelines [Citation15].

Materials & methods

Study design & data sources

This was a retrospective cohort study that used claims and EHR data from Henry Ford Health (HFH), an integrated US non-profit health maintenance organization that provides primary, specialist, and acute care to approximately 800,000 residents in the Detroit metropolitan area. HFH uses Epic, a multi-dimensional EHR system containing information on patients’ demographics, healthcare visits and hospital admissions, laboratory and imaging results and other clinical measures (e.g., disease progression). HFH also maintains an administrative database of all patient contacts with HFH healthcare providers and facilities (including claims for insured patients for services at non-HFH sites, inpatient and outpatient billing records, and outpatient prescription claims). Moreover, HFH data can be linked to the National Cancer Institute (NCI) Surveillance, Epidemiology and End Results (SEER) Program tumor registries.

To ensure complete data and enrollment information, this study was limited to patients insured by HFH’s Health Alliance Plan (HAP). During the period of HAP enrollment, all claims submitted for care rendered (including care received outside of HFH) were available. For care provided at HFH facilities, access to patients’ EHR data was available.

Claims and EHR data for 1 January 2006, through 31 December 2017, were obtained from HFH. To ensure patient confidentiality, per the Health Insurance Portability and Accountability Act, no personally identifiable information was extracted. Unique patient identifiers were used to link different data sources. The study was approved by the HFH Institutional Review Board (approval No: 12359).

Eligibility criteria

All adult HAP enrollees with ≥1 records in the HFH tumor registry with a diagnosis of MM (International Classification of Diseases for Oncology, Third Edition [ICD-O-3] morphology code 9732) and with evidence of ≥1 ’CRAB’ symptoms – elevated Calcium (hypercalcemia), Renal insufficiency, Anemia and Bone disease [Citation16] – were identified. Included patients were required to be enrolled in HAP within 30 days of starting LOT1.

The index date was the earliest date of diagnosis in the HFH tumor registry. CRAB symptoms were assessed from 6 months prior to the index date until the earlier of 6 months after the index date or initiation of LOT1. The presence of CRAB symptoms (see Supplementary Table 1 for relevant operational definitions) was ascertained based on the following criteria: 1) ≥2 outpatient claims on different days with relevant diagnoses; 2) ≥1 inpatient claims with a relevant diagnosis; 3) ≥1 encounters resulting in a relevant diagnosis; 4) ≥1 relevant procedures (claims or EHR); or 5) ≥1 instances of a qualifying laboratory test finding.
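The assessment window and the five ascertainment criteria can be illustrated with a minimal sketch. It is not the study's implementation: the record structure, the `kind` labels, the 183-day approximation of "6 months" and the inclusive window boundaries are all assumptions made for illustration, and the operational definitions in Supplementary Table 1 are not reproduced.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

# Assumption: "6 months" is approximated as 183 days (the text gives no day count).
SIX_MONTHS = timedelta(days=183)


@dataclass(frozen=True)
class Evidence:
    """One CRAB-relevant record. The `kind` labels are hypothetical."""
    day: date
    kind: str  # "outpatient_dx", "inpatient_dx", "encounter_dx", "procedure", "lab"


def crab_window(index_date: date, lot1_start: Optional[date]) -> Tuple[date, date]:
    """Window: 6 months before index to the earlier of 6 months after index or LOT1 start."""
    end = index_date + SIX_MONTHS
    if lot1_start is not None:
        end = min(end, lot1_start)
    return index_date - SIX_MONTHS, end


def has_crab_evidence(records: Iterable[Evidence], index_date: date,
                      lot1_start: Optional[date]) -> bool:
    """Return True if any of the five criteria is met inside the window."""
    start, end = crab_window(index_date, lot1_start)
    in_window = [r for r in records if start <= r.day <= end]  # inclusive: assumption
    # Criterion 1: at least two outpatient claims on different days.
    outpatient_days = {r.day for r in in_window if r.kind == "outpatient_dx"}
    if len(outpatient_days) >= 2:
        return True
    # Criteria 2 to 5: a single qualifying record is sufficient.
    single_hit_kinds = {"inpatient_dx", "encounter_dx", "procedure", "lab"}
    return any(r.kind in single_hit_kinds for r in in_window)
```

In this sketch, two outpatient claims on the same day do not satisfy criterion 1, and a laboratory finding dated after LOT1 initiation falls outside the window.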

Patients were stratified into two cohorts based on whether they underwent SCT within 300 days of the LOT1 start date (’SCT cohort’ and ’non-SCT cohort’).

Algorithms

Each patient’s use of MM drugs (per NCCN guidelines [Citation15]) from the index date onward was identified using all available secondary data. An algorithm for LOT identification, based on established guidance and prior studies [Citation13,Citation14,Citation17,Citation18], was implemented. LOTs comprised all observed MM drugs (i.e., bendamustine, bortezomib, carfilzomib, carmustine, cisplatin, cyclophosphamide, daratumumab, doxorubicin, elotuzumab, etoposide, interferon, ixazomib, lenalidomide, melphalan, panobinostat, pomalidomide, thalidomide, vincristine) used in the relevant LOT.

MM treatment may include within a single LOT separate therapies intended for induction and maintenance. Consequently, smaller and simpler treatment components within an MM LOT should be identified first. We designated these smaller treatment components ’mini-LOTs’. Briefly, the first MM-related drug identified after the index date marked the start of the first mini-LOT. All MM-related drugs received during the 30-day period beginning with the start date for that first drug were included in the first mini-LOT. Inclusion of drugs in the first mini-LOT continued until a ’gap’ of >60 days without evidence of receipt of a drug (assumed to represent discontinuation) or the end of the 30-day period, whichever was earliest. The next drug received was assumed to begin the next mini-LOT. This process was repeated for the entirety of follow-up until all use of MM-related drugs was allocated to a specific mini-LOT. Mini-LOTs were then reviewed and potentially combined into LOTs based on comparisons between each set of proximal mini-LOTs (between mini-LOT1 and mini-LOT2, between mini-LOT2 and mini-LOT3, etc.) for: 1) receipt of ’new’ MM-related drugs (one[s] not included in prior mini-LOTs); 2) an assessment of the interval (in days) between mini-LOTs; and 3) duration of prior mini-LOTs. Specific rules on whether to combine mini-LOTs to constitute LOT1 are illustrated in Supplementary Figure 1. Subsequent LOTs were constituted similarly.
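The mini-LOT segmentation step can be sketched as follows. This is one simplified reading of the description above, not the study's code: it assumes drug exposure is available as dated (day, drug) records, that the drugs seen in the first 30 days fix the composition of a mini-LOT, and that a mini-LOT ends either at a gap of more than 60 days or when a drug outside that composition appears after the 30-day window. The rules for combining mini-LOTs into LOTs (Supplementary Figure 1) are not reproduced, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set, Tuple

REGIMEN_WINDOW_DAYS = 30  # drugs first seen in this window define the mini-LOT
GAP_DAYS = 60             # a longer gap is assumed to represent discontinuation


@dataclass
class MiniLot:
    start: date
    end: date
    drugs: Set[str] = field(default_factory=set)


def build_mini_lots(records: List[Tuple[date, str]]) -> List[MiniLot]:
    """Split dated (day, drug) records into consecutive mini-LOTs."""
    mini_lots: List[MiniLot] = []
    current: Optional[MiniLot] = None
    for day, drug in sorted(records):
        if current is not None:
            gap = (day - current.end).days
            in_window = (day - current.start).days < REGIMEN_WINDOW_DAYS
            new_drug_late = drug not in current.drugs and not in_window
            if gap > GAP_DAYS or new_drug_late:
                # Close the current mini-LOT; this record starts the next one.
                mini_lots.append(current)
                current = None
        if current is None:
            current = MiniLot(start=day, end=day, drugs={drug})
        else:
            current.drugs.add(drug)
            current.end = max(current.end, day)
    if current is not None:
        mini_lots.append(current)
    return mini_lots


# Hypothetical exposure history for one patient.
example = [
    (date(2010, 1, 1), "bortezomib"),
    (date(2010, 1, 10), "lenalidomide"),
    (date(2010, 2, 1), "bortezomib"),
    (date(2010, 2, 15), "lenalidomide"),
    (date(2010, 6, 1), "lenalidomide"),   # more than 60 days after the last record
    (date(2010, 7, 15), "carfilzomib"),   # new drug outside the 30-day window
]
lots = build_mini_lots(example)
```

Treating each record as a point event ignores days' supply for oral agents; a fuller implementation would presumably measure gaps from the end of supply rather than from the last record date.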

Use of maintenance therapy within LOT1 was determined based on specific criteria, including specific MM drugs used within LOT1, durations of their use, and SCT status. Briefly, for the SCT cohort, the first mini-LOT after SCT incorporated into LOT1 (i.e., the second mini-LOT in LOT1) was assumed to denote maintenance therapy. For the non-SCT cohort, the second mini-LOT of LOT1 was assumed to be maintenance therapy if: 1) use of MM-related therapies within the first mini-LOT was ≥100 days in duration; 2) the second mini-LOT of LOT1 was ≥28 days in duration; and 3) medications that comprised the first and second mini-LOTs of LOT1 differed. Interferon monotherapy was not considered maintenance. Further details on the algorithm used to identify use of maintenance therapy within LOT1 are provided in the Supplementary Methods.
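The maintenance rules summarized above can be expressed as a short decision function. The sketch below covers only the criteria stated in this section; the full algorithm is in the Supplementary Methods and is not reproduced. The data structure and function names are hypothetical, and applying the interferon exclusion to both cohorts is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class MiniLotSummary:
    """Hypothetical summary of one mini-LOT within LOT1."""
    drugs: FrozenSet[str]
    duration_days: int


def maintenance_in_lot1(first: MiniLotSummary,
                        second: Optional[MiniLotSummary],
                        sct_cohort: bool) -> bool:
    """Return True if the second mini-LOT of LOT1 is classed as maintenance."""
    if second is None:
        return False  # LOT1 has a single mini-LOT, so no maintenance component
    if second.drugs == frozenset({"interferon"}):
        return False  # interferon monotherapy was not considered maintenance
    if sct_cohort:
        # Assumes `second` is the first mini-LOT after SCT that was merged into LOT1.
        return True
    # Non-SCT cohort: all three conditions must hold.
    return (first.duration_days >= 100
            and second.duration_days >= 28
            and first.drugs != second.drugs)


# Hypothetical non-SCT patient: 120 days of induction, then 60 days of lenalidomide.
induction = MiniLotSummary(frozenset({"bortezomib", "dexamethasone"}), 120)
follow_on = MiniLotSummary(frozenset({"lenalidomide"}), 60)
is_maintenance = maintenance_in_lot1(induction, follow_on, sct_cohort=False)
```

Keeping the cohort-specific branches separate makes it easier to see why the two cohorts can behave differently: the SCT branch relies on transplant timing alone, whereas the non-SCT branch depends on duration thresholds.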

Because the algorithms were developed for use with EHR databases, they were limited to drug start and stop dates and did not include clinical information typically found in unstructured data (e.g., treatment intent, disease progression/response, reason[s] for discontinuation).

EHR review

Data were extracted by trained HFH personnel using case-report forms to identify different MM drugs and regimens comprising LOTs, including start and stop dates for individual drugs, reason(s) for discontinuation and dose and frequency of administration (including change[s] in treatment administration and reason[s] therefor). Intended use of each identified therapy within the LOT (induction or maintenance) was determined by clinician review (see Supplementary Methods for LOT conceptual definitions). Two independent clinicians with expertise in MM (B Nguyen, S Ailawadhi) reviewed manually abstracted unstructured/uncurated EHR data. Disagreements between reviewers were adjudicated by a third reviewer (D Romanus). In all instances, the three reviewers reached a consensus regarding the agent(s) used within each LOT and, for LOT1, its intended use (induction or maintenance). A single LOT comprised all MM-related medications received until evidence of progressive disease. Modifications to an existing LOT (i.e., the addition of other agents alone or in combination, including a switch of ≥1 agents or addition of a ’new’ agent [other than steroids]) because of progressive disease, relapse, sub-optimal response/lack of response, or toxicity constituted the beginning of a new LOT. For the non-SCT cohort, use of a drug within LOT1 as maintenance was deemed to have occurred in any instance of: 1) a reduction in number, dose, and/or frequency of administration of relevant agents (relative to LOT1 initiation); or 2) a switch from a multidrug regimen to monotherapy with either a proteasome inhibitor or an immunomodulator. For the SCT cohort, maintenance was assumed to have occurred during LOT1 if, post-SCT: 1) duration of therapy with ≥1 MM-related medications was >4 months; or 2) the regimen comprised <3 agents.

Statistics

Demographic and clinical characteristics were summarized with descriptive statistics. Clinical characteristics included CRAB symptoms, comorbidities, level of risk based on a fluorescence in situ hybridization/cytogenetic analysis, and a modified Charlson Comorbidity Index (CCI) that did not include diagnosis of MM [Citation19]. Algorithm performance was analyzed for LOT1, LOT2 and LOT3. Medication composition and use of maintenance therapy in LOT1, as identified by algorithms, were validated against information obtained via chart review. LOTs and maintenance therapies identified by the algorithms but not by clinician review were considered false positives (FPs). Regimens identified by clinician review but not similarly identified by the algorithms were considered false negatives (FNs). Exact matches were considered true positives (TPs) and true negatives (TNs). For maintenance, a similar approach was taken.
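The classification of TPs, FPs, FNs and TNs can be illustrated with a small sketch. It assumes a one-vs-rest tabulation per regimen, in which each patient has one algorithm-assigned regimen and one chart-review regimen; the text does not spell out the tabulation, so this is an interpretation. The regimen labels and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def per_regimen_counts(algorithm: List[str], chart: List[str]) -> Dict[str, Counter]:
    """One-vs-rest TP/FP/FN/TN counts for every regimen seen in either source."""
    if len(algorithm) != len(chart):
        raise ValueError("algorithm and chart must cover the same patients")
    counts: Dict[str, Counter] = {}
    for regimen in set(algorithm) | set(chart):
        c = Counter(TP=0, FP=0, FN=0, TN=0)
        for assigned, reviewed in zip(algorithm, chart):
            if assigned == regimen and reviewed == regimen:
                c["TP"] += 1  # both sources agree the patient received the regimen
            elif assigned == regimen:
                c["FP"] += 1  # algorithm only
            elif reviewed == regimen:
                c["FN"] += 1  # chart review only
            else:
                c["TN"] += 1  # neither source assigns this regimen
        counts[regimen] = c
    return counts


def overall_accuracy(algorithm: List[str], chart: List[str]) -> float:
    """Share of patients whose algorithm regimen exactly matches chart review."""
    return sum(a == r for a, r in zip(algorithm, chart)) / len(chart)


# Hypothetical data for four patients (V = bortezomib, R = lenalidomide).
algo = ["V+steroids", "V+steroids", "VR+steroids", "R+steroids"]
review = ["V+steroids", "VR+steroids", "VR+steroids", "R+steroids"]
counts = per_regimen_counts(algo, review)
accuracy = overall_accuracy(algo, review)
```

In this toy example the second patient contributes one FP to bortezomib + steroids and one FN to bortezomib/lenalidomide + steroids, which shows how a single misassignment affects two regimens at once.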

The following algorithm performance measures were calculated:

  • Sensitivity = TPs / (TPs + FNs)

  • Specificity = TNs / (TNs + FPs)

  • Positive predictive value (PPV) = TPs / (TPs + FPs)

  • Negative predictive value (NPV) = TNs / (TNs + FNs)

  • Accuracy = (TPs + TNs) / (TPs + FPs + TNs + FNs)

PPV [Citation20,Citation21] was used as the primary indicator of algorithm performance, because it represents the proportion of TPs among all patients identified by algorithms as positive and is therefore better able to capture rare events (e.g. rarely used regimens) than accuracy (which also includes TNs). An algorithm with an accuracy of 99% might be unable to predict a regimen that is used in 1% of MM patients because of the high volume of TNs. Conversely, PPV determines how likely it is that patients identified by an algorithm as having received a certain regimen did indeed receive it.
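The five measures, and the contrast between accuracy and PPV for a rarely used regimen, can be checked with a short calculation. The function below follows the formulas listed above; the patient counts are hypothetical and chosen only to mirror the 1% example in the text.

```python
from typing import Dict, Optional


def safe_div(numerator: int, denominator: int) -> Optional[float]:
    """Return None when a measure is undefined (zero denominator)."""
    return numerator / denominator if denominator else None


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity, PPV, NPV and accuracy from confusion counts."""
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical: 100 patients, 1 of whom received a rare regimen.
# Case A: the algorithm never assigns the regimen to anyone.
never_assigned = performance(tp=0, fp=0, tn=99, fn=1)
# Case B: the algorithm assigns the regimen once, to the wrong patient.
wrongly_assigned = performance(tp=0, fp=1, tn=98, fn=1)
```

In case A accuracy is 99% even though the regimen is never found (sensitivity 0, PPV undefined); in case B accuracy is still 98% while PPV is 0. Both cases show why accuracy alone can mask failure on rare regimens.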

Results

Patients

Of 192 adult HAP enrollees with a diagnosis of MM between 2006 and 2017, 138 (71.9%) met all relevant selection criteria (Figure 1). For 133 of these patients (96.4%), unstructured EHR data were available for the same time period as structured claims and EHR data (i.e., the information needed to validate LOT and maintenance algorithms).

Figure 1. Sample attrition.

*Because MM treatment is complex and includes separate treatment components (e.g. induction and maintenance), the first step in defining an LOT is to identify smaller, simpler treatment components known as mini-LOTs. Patients with no valid mini-LOTs for MM were excluded.

CRAB: Hypercalcemia, renal failure, anemia, bone lesions; EHR: Electronic health record; HAP: Health Alliance Plan; HFH: Henry Ford Health; LOT: Line of therapy; MM: Multiple myeloma.


Mean age at MM diagnosis was 67.5 years, and 46.6% of patients were female (Table 1). Most patients were Black/African–American (60.9%) and non-Hispanic (71.4%). One-quarter of patients (24.8%) had hypercalcemia, 57.1% had renal failure, 84.2% had anemia and 39.1% had bone lesions. The modified CCI was 0 for 24.8% of patients, 1–3 for 47.4% of patients, and ≥4 for 27.8% of patients. Forty-four patients (33.1%) underwent SCT as part of LOT1. Median (interquartile range) duration between LOT1 start and end of follow-up was 641 (330–1124) days (i.e., nearly 2 years).

Table 1. Demographic and clinical characteristics of patients who contributed to LOT1.

LOT1

Treatments

In the aggregate population, the most frequent drugs within LOT1 regimens, as identified by the algorithm, were bortezomib (received by 73.7% of patients as part of LOT1), lenalidomide (39.1%) and thalidomide (9.8%; Table 2). For the SCT cohort, the most frequent drugs were bortezomib (75.0%), lenalidomide (59.1%), cyclophosphamide (13.6%) and thalidomide (13.6%); for the non-SCT cohort, they were bortezomib (73.0%), lenalidomide (29.2%), thalidomide (7.9%) and oral melphalan (7.9%).

Table 2. Drugs identified in LOT1 (any regimen) using the algorithm and by clinician review of unstructured information from electronic health records, by SCT status.

Algorithm performance

Overall algorithm accuracy at identifying LOT1 regimens was 85.0% (113/133). For the three regimens identified by the algorithm in >10% of patients, accuracy was 92.5% for bortezomib + steroids (identified in 39.1% of patients), 93.2% for bortezomib/lenalidomide + steroids (identified in 23.3% of patients), and 97.7% for lenalidomide + steroids (identified in 12.0% of patients; Table 3). For these same LOT1 regimens, PPV ranged from 80.6% for bortezomib/lenalidomide + steroids to 93.8% for lenalidomide + steroids, and sensitivity ranged from 88.2% for lenalidomide + steroids to 89.3% for bortezomib/lenalidomide + steroids (Table 3 and Supplementary Table 2). Specificity was 100.0% for 11 of the 22 LOT1 regimens and was lowest for bortezomib/lenalidomide + steroids (94.3%).

Table 3. Algorithm performance for regimens identified in LOT1, as verified by clinician review of unstructured information from electronic health records: aggregate population.

In the SCT cohort, overall algorithm accuracy at identifying LOT1 regimens was 77.3% (34/44). For the two regimens identified by the algorithm in >10% of patients, accuracy was 88.6% for bortezomib/lenalidomide + steroids (identified in 38.6% of patients) and 90.9% for bortezomib + steroids (identified in 18.2% of patients; Table 4). PPV was 70.6% for bortezomib/lenalidomide + steroids and 100.0% for bortezomib + steroids; sensitivity was 100.0% for bortezomib/lenalidomide + steroids and 66.7% for bortezomib + steroids (Table 4 and Supplementary Table 3). Specificity was 84.4% for bortezomib/lenalidomide + steroids and 97.7 to 100.0% for other regimens.

Table 4. Algorithm performance for regimens identified in LOT1, as verified by clinician review of unstructured information from electronic health records: SCT and non-SCT cohorts.

In the non-SCT cohort, overall algorithm accuracy at identifying LOT1 regimens was 88.8% (79/89). For the three regimens identified by the algorithm in >10% of patients, accuracy was 93.3% for bortezomib + steroids (identified in 49.4% of patients), 95.5% for bortezomib/lenalidomide + steroids (identified in 15.7% of patients), and 98.8% for lenalidomide + steroids (identified in 13.5% of patients; Table 4). For these regimens, PPV ranged from 90.9% for bortezomib + steroids to 92.9% for bortezomib/lenalidomide + steroids and sensitivity from 81.3% for bortezomib/lenalidomide + steroids to 100% for lenalidomide + steroids (Table 4 and Supplementary Table 4). Specificity was 91.5% for bortezomib + steroids and ranged from 97.7 to 100.0% for other regimens.

LOT2 & LOT3

A total of 59 patients (44.4% of the aggregate population) received LOT2, of whom 14 (23.7%) were in the SCT cohort and 45 (76.3%) in the non-SCT cohort. Overall algorithm accuracy at identifying LOT2 regimens was 52.5% (31/59) in the aggregate population, 50.0% (7/14) in the SCT cohort and 53.3% (24/45) in the non-SCT cohort. For the three regimens identified by the algorithm among >10% of patients in the aggregate population (i.e., bortezomib + steroids, lenalidomide + steroids, and bortezomib/lenalidomide + steroids), accuracy ranged from 81.4 to 89.8% and PPV from 86.1 to 95.7% (Supplementary Table 5).

Thirty-two patients (24.1% of the aggregate population) received LOT3, of whom 10 (31.3%) were in the SCT cohort and 22 (68.7%) in the non-SCT cohort. Overall algorithm accuracy at identifying LOT3 regimens was 34.4% (11/32) in the aggregate population, 30.0% (3/10) in the SCT cohort, and 36.4% (8/22) in the non-SCT cohort. Only two regimens identified by the algorithm were used by >10% of patients in the aggregate population: lenalidomide + steroids, for which accuracy was 75.0% and PPV 40.0%, and bortezomib/lenalidomide + steroids, for which accuracy was 90.6% and PPV 40.0% (Supplementary Table 6).

Maintenance therapy in LOT1

Treatments

The algorithm identified use of maintenance therapy as part of LOT1 among 19.5% of patients in the aggregate population. Maintenance comprised several monotherapies, including lenalidomide (13.5% of patients), dexamethasone (3.8%), bortezomib (1.5%) and thalidomide (0.8%). In the SCT cohort, maintenance therapy was identified in 40.9% of patients, including lenalidomide (31.8% of patients), dexamethasone (6.8%) and bortezomib (2.3%); in the non-SCT cohort, it was designated in 9.0% of patients, including lenalidomide (4.5% of patients), dexamethasone (2.2%), bortezomib (1.1%) and thalidomide (1.1%).

Algorithm performance

In the aggregate population, algorithm accuracy for maintenance was 85.7%. Accuracy was 88.7% for no maintenance therapy and ranged from 94.0 to 99.2% for individual therapies (Table 5). For lenalidomide monotherapy, the only maintenance therapy identified by the algorithm in >10% of patients (11.3%), PPV and sensitivity were both 73.3% (Table 5 and Supplementary Table 7). Specificity ranged from 96.6 to 100.0% for individual maintenance therapies. For no maintenance therapy, PPV was 94.4%, sensitivity was 91.8% and specificity was 73.9%.

Table 5. Algorithm performance at identifying maintenance therapies used as part of LOT1, as verified by clinician review of unstructured information from electronic health records.

Algorithm accuracy at identifying maintenance therapies in the SCT cohort was 81.8%. Accuracy was 86.4% for no maintenance therapy and ranged from 90.9 to 97.7% for individual therapies. For lenalidomide monotherapy, again the only maintenance therapy identified by the algorithm in >10% of patients (27.3%), PPV and sensitivity were both 83.3% (Supplementary Table 8). For no maintenance therapy, PPV was 100.0% and sensitivity 81.3%. Specificity was 100.0% for no maintenance therapy and ranged from 93.2 to 97.7% for individual therapies.

In the non-SCT cohort, overall algorithm accuracy at identifying maintenance therapies was 87.6%. Accuracy was 89.9% for no maintenance therapy and ranged from 95.5 to 98.9% for individual therapies. For no maintenance therapy, PPV was 92.6%, sensitivity 96.2% and specificity 45.5% (Supplementary Table 9). The maintenance therapy identified most frequently by the algorithm was lenalidomide monotherapy (3.4%). Across maintenance therapies, PPV ranged from 0.0 to 100.0%, sensitivity from 0.0 to 66.7% and specificity from 97.7 to 100.0%.

Discussion

The need for RWE to supplement clinical trial data and to answer clinical, economic and policy-related questions in routine clinical practice has increasingly gained the interest of regulatory agencies and policy makers [Citation22–24]. However, the ability to effectively use secondary databases is often limited by a lack of clinical details required to accurately identify study populations and limited information on patient characteristics and treatments [Citation25,Citation26]. For example, while claims data provide detail on the magnitude of healthcare resource use and costs, they are typically ’silent’ on reasons why care was sought or rendered. EHRs contain substantially greater clinical detail, but the intent and outcome of treatment are typically only identifiable in unstructured/uncurated data. In MM, these challenges are compounded by treatment complexity, which often requires sequencing of multiple drugs to induce and maintain clinical response. To obtain robust and useful RWE for important applications such as value-based contracting, treatment decision making, and regulatory policy, algorithms tailored to specific contexts are required to fully leverage the value of secondary RWD.

One way to enable robust RWE generation in MM is to develop and validate algorithms for reliably identifying LOTs, which can then be used with large electronic health databases (e.g., claims, EHR) to further ’unlock’ their usefulness. However, very few studies have evaluated algorithm performance for LOT1 [Citation27,Citation28], and – to the best of our knowledge – no study to date has validated algorithms for subsequent LOTs or for maintenance therapy. In our study, the algorithm for identifying LOT1 performed relatively well, as did the algorithm to identify maintenance therapy. Overall algorithm accuracy at identifying LOT1 was high (85.0%), albeit lower in SCT patients (77.3%) than in non-SCT patients (88.8%).

We used PPV to assess algorithm performance because it is directly correlated with the proportion of ’true’ cases identified (unlike accuracy, which incorporates both true positives and true negatives). The false positives included in the PPV calculation reflect suboptimal algorithm performance (i.e., instances where chart review indicated that the algorithm had incorrectly assigned a unique combination of medications as an LOT). For example, the LOT algorithm assigned bortezomib/lenalidomide + steroids as LOT1 for one patient in our study; however, a review of the patient’s medical chart indicated that LOT1 was actually bortezomib + steroids. Accordingly, instances where PPV was estimated to be 100% are limited to those where the algorithm performed perfectly (i.e., all individuals identified via the algorithm were true positives). Earlier studies that identified cancer cases from EHR data sources deemed PPV values ≥75% acceptable [Citation29,Citation30]. In our study, PPV estimated for the three most commonly used LOT1 regimens (collectively used by 74.4% of patients in the study sample) exceeded this threshold, with ranges between 80.6 and 93.8%. This range of values is comparable to published values for the same regimens from another study that evaluated algorithm performance using data for a randomly selected sample of MM patients in the Veterans Affairs Healthcare System (PPV 89 to 96%, N = 89) [Citation28]. For the less common LOT1 regimens in our study, PPV was often lower, although accuracy and specificity were consistently high. LOT1 regimens for which PPV was low were not NCCN recommended at the time this research was conducted [Citation5]. Since the algorithm was informed by NCCN guidelines, which themselves likely reflect ’typical’ patients, it might not perform as well at identifying atypical regimens not recommended by guidelines.

Algorithm performance decreased in LOTs beyond LOT1. This could be partly due to the smaller sample sizes – as sample size decreases, each incorrectly classified patient imparts greater weight on performance measures. In addition, regimen complexity increased from LOT1 to LOT2 (i.e., more drugs were involved, and the ways in which they were combined also varied), which may not have been fully appreciated or accounted for by the decision rules evaluated herein. Finally, we note that many regimens across all LOTs incorporated use of steroids. While our decision rules to identify and allocate steroid usage are complex, regimen selection is highly personalized (taking account of overall patient health and response to MM-related drugs [and potentially steroids] in earlier LOTs), and the current algorithm likely does not fully differentiate and identify these agents appropriately. Further work is needed to better understand how further refinement of the current decision rules could improve algorithm performance within and across LOTs.
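The effect of sample size on these measures can be made concrete with simple arithmetic. The sketch below uses the sample sizes reported in the Results (133, 59, 32 and 10 patients) and computes how many percentage points of accuracy a single misclassified patient represents; the function name is hypothetical.

```python
def accuracy_shift_per_patient(n_patients: int) -> float:
    """Percentage points of accuracy lost when one more patient is misclassified."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return 100.0 / n_patients


# Sample sizes taken from the Results section.
samples = [("LOT1, aggregate", 133), ("LOT2, aggregate", 59),
           ("LOT3, aggregate", 32), ("LOT3, SCT cohort", 10)]
for label, n in samples:
    print(f"{label}: n = {n}, one patient = {accuracy_shift_per_patient(n):.1f} points")
```

One misclassified patient moves accuracy by less than 1 percentage point in LOT1 but by 10 points in the LOT3 SCT cohort, so part of the apparent decline in later LOTs reflects imprecision rather than algorithm behavior alone.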

Most patients did not receive maintenance therapy as part of LOT1. Overall algorithm accuracy at identifying maintenance therapies in LOT1 was 85.7%, and PPV was high (94.4%) for no maintenance therapy, indicating that the algorithm was effective at determining whether maintenance therapy was part of LOT1. PPV was relatively good (73.3%) for lenalidomide monotherapy (the most common LOT1 maintenance therapy), although algorithm performance may have been impacted by infrequent use of maintenance therapy. We note that only 26 patients in the aggregate population received maintenance therapy; as noted above, each incorrectly classified patient has a greater impact on algorithm performance measures when the sample size is small. Overall, the algorithms performed well, although they were better at identifying all drugs comprising LOT1 regimens than uses therefor (i.e., induction vs maintenance). These results suggest that our algorithms can be used with EHR data to inform MM research for which LOT ascertainment is required.

Our study has limitations. First, EHR data often have quality issues, such as variable or incomplete data recording. We mitigated this issue by using patients with near complete EHR data capture. Second, as MM is relatively rare, the sample size was limited (especially for the SCT cohort [all LOTs] and for LOT2 and LOT3 in general). Third, attention was limited to a single health system (HAP enrollees of HFH). Although distributions of age and gender were similar to national data (e.g., 33.8 and 30.1% of study patients were aged 65–74 years and ≥75 years, respectively, vs 31.2 and 32.6% of Americans newly diagnosed with MM; 53.4% of study patients were male vs 55.8% of newly diagnosed Americans), race differed (60.9% of study patients were Black/African–American vs 20.7% of all newly diagnosed Americans) [Citation31,Citation32]. Algorithm performance should therefore also be examined using data from other sources, including alternative healthcare systems. Fourth, algorithm logic is complex and may be difficult to implement. Finally, the MM treatment paradigm is continuously evolving, and drugs/indications approved after 2017 (e.g., isatuximab [Sarclisa®] [Citation33], daratumumab [Darzalex®] [Citation34] for first-line SCT-eligible patients) were not included. Accounting for newer treatment regimens will require additional validation efforts, although our algorithms are based on literature and treatment guidelines [Citation13,Citation14,Citation15] and should be fairly generalizable.

Conclusion

Our findings suggest that the algorithms described herein to identify MM regimens used as LOT1 for induction and/or maintenance performed sufficiently well to inform future RWE generation. Performance was best for commonly used regimens and was relatively consistent regardless of SCT use. As with other instances of algorithm development to expand use of EHR data, future validation efforts should consider other data sources, and agents and regimens approved/adopted for use in MM subsequent to our study period. They should also consider modifications that account for potential patient-level heterogeneity/atypical combinations, which would likely enable the algorithms to better identify use of anti-MM therapies (induction and maintenance) in LOT2 and subsequent LOTs.

Summary points
  • New multiple myeloma (MM) therapies have been approved in the last decade and are now part of most National Comprehensive Cancer Network-recommended induction and maintenance regimens. The combination of drugs administered to patients with MM depends on disease status and previous lines of therapy (LOTs).

  • Studies using electronic health databases improve our understanding of real-world use and effectiveness of therapies. However, their utility in oncology research is often limited by a lack of information on treatment intent (induction vs maintenance) or outcomes.

  • This limitation is acutely felt in MM, where a given therapy may be used to either induce or maintain response.

  • Using linked claims and electronic health record (EHR) data from 2006–2017, we validated algorithms for identifying LOTs and maintenance therapies in adults with newly diagnosed symptomatic MM enrolled in a US regional non-profit insurance plan.

  • Algorithm performance for the first line of therapy (LOT1) was verified by review of manually abstracted unstructured/uncurated information from the EHR data.

  • The algorithms performed relatively well at identifying MM treatment. Accuracy at identifying LOT1 regimens was 85.0%, and positive predictive value was high (80.6–93.8%) for the most frequently identified LOT1 treatment regimens. Accuracy at identifying maintenance therapy in LOT1 was 85.7%.

  • Algorithm performance was similar in subgroups defined by stem cell transplant status.

  • Our findings suggest that the algorithms could be useful in informing future real-world evidence generation where identification of LOTs is required.
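
The performance measures reported above (accuracy, positive predictive value, sensitivity and specificity) all derive from a 2×2 comparison of algorithm output against chart review for a given regimen. The following is a minimal sketch of that arithmetic, not the study's own code. The class and function names and the counts are hypothetical and are not study data; the 75% threshold is the study's stated definition of good performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Algorithm result vs chart review (reference standard) for one regimen."""
    tp: int  # algorithm and chart review both identify the regimen
    fp: int  # algorithm identifies the regimen, chart review does not
    fn: int  # chart review identifies the regimen, algorithm does not
    tn: int  # neither identifies the regimen


def performance(c: ConfusionCounts) -> dict[str, float]:
    """Return standard validation measures as proportions (0 to 1).

    Assumes every denominator is non-zero; a real analysis would need
    to handle empty cells explicitly.
    """
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Good performance was defined in the study as PPV >= 75%.
PPV_THRESHOLD = 0.75

# Hypothetical counts for illustration only; NOT taken from the study.
example = ConfusionCounts(tp=40, fp=10, fn=5, tn=45)
metrics = performance(example)
meets_threshold = metrics["ppv"] >= PPV_THRESHOLD

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
print("meets PPV threshold:", meets_threshold)
```

With these illustrative counts, PPV is 40/50 = 80.0%, which would satisfy the 75% threshold. As in the study, sparse cells for rarely used regimens would make such estimates unstable.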

Author contributions

Concept and design: S Ailawadhi, D Romanus, S Shah, K Fraeman, D Saragoussi, R Morris Buus, L Lamerato, A Berger. Acquisition of data: D Romanus, K Fraeman, D Saragoussi, R Morris Buus, L Lamerato. Analysis and interpretation of data: S Ailawadhi, D Romanus, S Shah, K Fraeman, R Morris Buus, B Nguyen, D Cherepanov, L Lamerato, A Berger. Drafting of the manuscript: S Ailawadhi, D Romanus, S Shah, D Saragoussi, R Morris Buus, B Nguyen, D Cherepanov, A Berger. Critical revision of the paper for important intellectual content: S Ailawadhi, D Romanus, S Shah, D Saragoussi, R Morris Buus, B Nguyen, D Cherepanov, L Lamerato, A Berger. Statistical analysis: S Shah, K Fraeman, A Berger. Provision of study materials or patients: R Morris Buus. Obtaining funding: D Romanus, D Cherepanov, A Berger. Administrative, technical, or logistic support: D Romanus, R Morris Buus, D Cherepanov. Supervision: S Ailawadhi, D Romanus, D Cherepanov, L Lamerato, A Berger. Other (initiation, oversight, statistical analysis plan development): D Romanus.

Financial disclosure

This study was funded by Takeda. S Ailawadhi reported receiving grants from AbbVie, Amgen, Ascentage, Bristol Myers Squibb, Cellectar, GlaxoSmithKline, Janssen, Pharmacyclics, and Sanofi; and consulting fees from Beigene, Bristol Myers Squibb, Cellectar, GlaxoSmithKline, Janssen, Pfizer, Regeneron, Sanofi, and Takeda. D Romanus and D Cherepanov are employees of and own stock/stock options in Takeda, which funded this study. S Shah, K Fraeman, D Saragoussi, R Morris Buus, B Nguyen and A Berger are employees of Evidera/PPD, a contract research organization with previous and ongoing financial relationships with Takeda and other pharmaceutical, biotech and medical device companies. D Saragoussi owns stock options in Thermo Fisher Scientific, and A Berger owns stock and stock options in Thermo Fisher Scientific, of which Evidera/PPD is a part. Evidera/PPD was paid by Takeda to conduct this study. L Lamerato is an employee of Henry Ford Health System and received research funding from Takeda (via Evidera) for this study. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Competing interests disclosure

The authors have no competing interests or relevant affiliations with any organization or entity with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

Writing disclosure

The authors thank Stephen Gilliver, PhD of Evidera/PPD for providing medical writing support, which was funded by Takeda in accordance with Good Publication Practice guidelines (www.ismpp.org/gpp-2022).

Ethical conduct of research

The authors state that they obtained appropriate institutional review board approval prior to acquiring data for the research. To ensure confidentiality, and in accordance with the Health Insurance Portability and Accountability Act, no personally identifiable information was extracted.

Open access

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

Supplemental material

Supplementary data

To view the supplementary data that accompany this paper please visit the journal website at: www.tandfonline.com/doi/suppl/10.2217/fon-2023-0696

References

  • National Comprehensive Cancer Network. NCCN Guidelines: Multiple Myeloma, Version 1.2024 (2023). www.nccn.org/guidelines/guidelines-detail?category=1&id=1445
  • Kumar S. Emerging options in multiple myeloma: targeted, immune, and epigenetic therapies. Hematology Am Soc Hematol Educ Program. 2017(1), 518–524 (2017).
  • Leow CC, Low MSY. Targeted Therapies for Multiple Myeloma. J Pers Med. 11(5), 334 (2021).
  • Garfall AL. New Biological Therapies for Multiple Myeloma. Annu. Rev. Med. doi: 10.1146/annurev-med-050522-033815 (2023).
  • Kumar SK, Callander NS, Adekola K et al. Multiple Myeloma, Version 3.2021, NCCN Clinical Practice Guidelines in Oncology. J Natl Compr Canc Netw. 18(12), 1685–1717 (2020).
  • Pereira MP, Hoffmann V, Weisshaar E et al. Chronic nodular prurigo: clinical profile and burden. A European cross-sectional study. J Eur Acad Dermatol Venereol. 34(10), 2373–2383 (2020).
  • Duma N, Azam T, Riaz IB, Gonzalez-Velez M, Ailawadhi S, Go R. Representation of Minorities and Elderly Patients in Multiple Myeloma Clinical Trials. Oncologist. 23(9), 1076–1078 (2018).
  • Casey M, Islam M, Shoukier M, Odhiambo L, Cortes JE. Are pivotal trials for drugs approved for leukemia, myelodysplastic syndromes, and multiple myeloma representative of the population demographics affected by these diseases? Blood 138, 846 (2021).
  • Varma T, Wallach JD, Miller JE et al. Reporting of Study Participant Demographic Characteristics and Demographic Representation in Premarketing and Postmarketing Studies of Novel Cancer Therapeutics. JAMA Netw Open. 4(4), e217063 (2021).
  • Blonde L, Khunti K, Harris SB, Meizinger C, Skolnik NS. Interpretation and Impact of Real-World Clinical Data for the Practicing Clinician. Adv Ther. 35(11), 1763–1774 (2018).
  • U.S. Food and Drug Administration. Considerations for the Use of Real-World Data and Real-World Evidence to Support Regulatory Decision-Making for Drug and Biological Products. Guidance for Industry. FDA-2021-D-1214 (14 January 2022). www.fda.gov/media/154714/download
  • U.S. Food and Drug Administration. Real-World Data: Assessing Electronic Health Records and Medical Claims Data to Support Regulatory Decision-Making for Drug and Biological Products. Guidance for Industry. FDA-2020-D-2307 (14 January 2022). www.fda.gov/media/152503/download
  • Rajkumar SV, Harousseau JL, Durie B et al. Consensus recommendations for the uniform reporting of clinical trials: report of the International Myeloma Workshop Consensus Panel 1. Blood 117(18), 4691–4695 (2011).
  • Rajkumar SV, Richardson P, San MJF. Guidelines for determination of the number of prior lines of therapy in multiple myeloma. Blood 126(7), 921–922 (2015).
  • Kumar SK, Callander NS, Hillengass J et al. National Comprehensive Cancer Network. NCCN Guidelines Insights: Multiple Myeloma, Version 1.2020. J Natl Compr Canc Netw. 17(10), 1154–1165 (2019).
  • International Myeloma Working Group. Criteria for the classification of monoclonal gammopathies, multiple myeloma and related disorders: a report of the International Myeloma Working Group. Br. J. Haematol. 121(5), 749–757 (2003).
  • Davies F, Rifkin R, Costello C et al. Real-world comparative effectiveness of triplets containing bortezomib (B), carfilzomib (C), daratumumab (D), or ixazomib (I) in relapsed/refractory multiple myeloma (RRMM) in the US. Ann. Hematol. 100(9), 2325–2337 (2021).
  • Chari A, Richardson PG, Romanus D et al. Real-world outcomes and factors impacting treatment choice in relapsed and/or refractory multiple myeloma (RRMM): a comparison of VRd, KRd, and IRd. Expert Rev Hematol. 13(4), 421–433 (2020).
  • Quan H, Sundararajan V, Halfon P et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med. Care 43(11), 1130–1139 (2005).
  • Chubak J, Pocobelli G, Weiss NS. Tradeoffs between accuracy measures for electronic health care data algorithms. J. Clin. Epidemiol. 65(3), 343–349 e342 (2012).
  • Jiao Y, Du P. Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quant Biol. 4(4), 320–330 (2016).
  • O’Donnell JC, Le TK, Dobrin R et al. Evolving use of real-world evidence in the regulatory process: a focus on immuno-oncology treatment and outcomes. Future Oncol. 17(3), 333–347 (2021).
  • Harricharan S, Curran E, Lin HM et al. Real-world evidence in lung and hematologic oncology health technology appraisals: a review of six assessment agencies. Future Oncol. 19(8), 603–616 (2023).
  • Di Maio M, Perrone F, Conte P. Real-World Evidence in Oncology: Opportunities and Limitations. Oncologist. 25(5), e746–e752 (2020).
  • Brown JS, Maro JC, Nguyen M, Ball R. Using and improving distributed data networks to generate actionable evidence: the case of real-world outcomes in the Food and Drug Administration’s Sentinel system. J. Am. Med. Inform. Assoc. 27(5), 793–797 (2020).
  • Desai RJ, Matheny ME, Johnson K et al. Broadening the reach of the FDA Sentinel system: a roadmap for integrating electronic health record data in a causal analysis framework. NPJ Digit Med. 4(1), 170 (2021).
  • Parikh R, Clancy Z, Candrilli S, Parikh K. Administrative data algorithms to identify diagnosis and treatment-related measures in patients with multiple myeloma: a validation study. Poster presented at the 2018 AMCP Managed Care & Specialty Pharmacy Annual Meeting, Boston, MA (April 25, 2018). J Manag Care Pharm. 24(4-2), S29 (2018).
  • La J, Dumontier C, Hassan H et al. Validation of algorithms to select patients with multiple myeloma and patients initiating myeloma treatment in the national Veterans Affairs Healthcare System. Pharmacoepidemiol Drug Saf. 32(5), 558–566 (2023).
  • Nattinger AB, Laud PW, Bajorunaite R, Sparapani RA, Freeman JL. An algorithm for the use of Medicare claims data to identify women with incident breast cancer. Health Serv. Res. 39(6 Pt 1), 1733–1749 (2004).
  • Nordstrom BL, Whyte JL, Stolar M, Mercaldi C, Kallich JD. Identification of metastatic cancer in claims data. Pharmacoepidemiol Drug Saf. 21(Suppl. 2), 21–28 (2012).
  • National Cancer Institute. Cancer Stat Facts: myeloma. https://seer.cancer.gov/statfacts/html/mulmy.html (29 March 2022).
  • National Cancer Institute. Cancer Statistics Review 1975–2018: Table A.1 Number of Incidence Cases, 2014–2018 by Primary Cancer Site, Race and Sex (21 SEER Geographic Areas). https://seer.cancer.gov/csr/1975_2018/browse_csr.php (29 March 2022).
  • U.S. Food and Drug Administration. FDA approves new therapy for patients with previously treated multiple myeloma. www.fda.gov/news-events/press-announcements/fda-approves-new-therapy-patients-previously-treated-multiple-myeloma (18 November 2021).
  • U.S. Food and Drug Administration. FDA approves daratumumab for transplant-eligible multiple myeloma. www.fda.gov/drugs/resources-information-approved-drugs/fda-approves-daratumumab-transplant-eligible-multiple-myeloma (3 February 2022).