Original Articles

Natural language processing and machine learning to enable automatic extraction and classification of patients’ smoking status from electronic medical records

Pages 316-324 | Received 24 Nov 2019, Accepted 30 Jun 2020, Published online: 22 Jul 2020

References

  • Gray BH, Bowden T, Johansen I, Koch S. Electronic health records: an international perspective on “meaningful use”. Issue Brief (Commonw Fund). 2011;28:1–18.
  • Zeng QT, Goryachev S, Weiss S, Sordo M, Murphy SN, Lazarus R. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak. 2006;6:30. doi:10.1186/1472-6947-6-30
  • Reilly BM, Evans AT. Translating clinical research into clinical practice: impact of using prediction rules to make decisions. Ann Intern Med. 2006;144:201–9. doi:10.7326/0003-4819-144-3-200602070-00009
  • Kawamoto K, Houlihan CA, Balas EA, Lobach DF. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ. 2005;330:765. doi:10.1136/bmj.38398.500764.8F
  • Uzuner O, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc. 2008;15:14–24. doi:10.1197/jamia.M2408
  • Figueroa RL, Soto DA, Pino EJ. Identifying and extracting patient smoking status information from clinical narrative texts in Spanish. Conf Proc IEEE Eng Med Biol Soc. 2014;2014:2710–3. doi:10.1109/EMBC.2014.6944182
  • Savova GK, Ogren PV, Duffy PH, Buntrock JD, Chute CG. Mayo clinic NLP system for patient smoking status identification. J Am Med Inform Assoc. 2008;15:25–8. doi:10.1197/jamia.M2437
  • Patel J, Siddiqui Z, Krishnan A, Thyvalikakath T. Leveraging electronic dental record data to classify patients based on their smoking intensity. Methods Inf Med. 2018;57:253–60. doi:10.1055/s-0039-1681088
  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. Sigkdd Explor Newsl. 2009;11:10–8. doi:10.1145/1656274.1656278
  • Larsson K, Janson C, Lisspers K, Jørgensen L, Stratelis G, Telg G, et al. Combination of budesonide/formoterol more effective than fluticasone/salmeterol in preventing exacerbations in chronic obstructive pulmonary disease: the PATHOS study. J Intern Med. 2013;273:584–94. doi:10.1111/joim.12067
  • Rockberg J, Jørgensen L, Taylor B, Sobocki P, Johansson G. Risk of mortality and recurrent cardiovascular events in patients with acute coronary syndromes on high intensity statin treatment. Prev Med Rep. 2017;6:203–9. doi:10.1016/j.pmedr.2017.03.001
  • Schildt EB, Eriksson M, Hardell L, Magnuson A. Oral snuff, smoking habits and alcohol consumption in relation to oral cancer in a Swedish case-control study. Int J Cancer. 1998;77:341–6. doi:10.1002/(SICI)1097-0215(19980729)77:3<341::AID-IJC6>3.0.CO;2-O
  • Huhtasaari F, Asplund K, Lundberg V, Stegmayr B, Wester PO. Tobacco and myocardial infarction: is snuff less dangerous than cigarettes? BMJ. 1992;305:1252–6. doi:10.1136/bmj.305.6864.1252
  • Joachims T. Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C, editors. ECML. Berlin: Springer; 1998. p. 137–42.
  • Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106. doi:10.1007/BF00116251
  • WHO. Prevalence of smoking any tobacco products among adults aged greater than or equal to 15 years [Internet]. [cited 2018 Nov 20]. Available from: http://gamapserver.who.int/gho/interactive_charts/tobacco/use/atlas.html
  • Platt J. Sequential minimal optimization: a fast algorithm for training support vector machines. Adv Kernel Methods Support Vector Learn. 1998;208:1–21.
  • Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is “nearest neighbor” meaningful? Lect Notes Comput Sci. 1999;1540:217–35.
  • McCallum A, Nigam K. A comparison of event models for naive Bayes text classification. Proceedings of the International Conference on Machine Learning. AAAI Press; 1998. p. 41–8.
  • Patil TR, Sherekar SS. Performance analysis of naive Bayes and J48 classification algorithm for data classification. Int J Comput Sci Appl. 2013;6:251–61.
  • Pedersen T. Determining smoker status using supervised and unsupervised learning with lexical features. Proceedings of the i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data. 2006. Available as JAMIA on-line data supplement at www.jamia.org.
  • Aramaki E, Imai T, Miyo K, Ohe K. Patient status classification by using rule based sentence extraction and BM25-kNN based classifier. Proceedings of the i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data. 2006. Available as JAMIA on-line data supplement at www.jamia.org.
  • Zhang H, Li D. Naïve Bayes text classifier. Proceedings of the 2007 IEEE International Conference on Granular Computing, GrC. 2007.
  • Frank E, Bouckaert RR. Naive Bayes for text classification with unbalanced classes. In: Fürnkranz J, Scheffer T, Spiliopoulou M, editors. Knowledge discovery in databases: PKDD 2006. Vol. 4213. Berlin: Springer; 2006. p. 503–10.
  • Feldman R, Sanger J. The text mining handbook: advanced approaches in analyzing unstructured data. Imagine. 2007;34:410.
  • Bhosale D, Ade R, Deshmukh PR. Feature selection based classification using naive Bayes, J48 and support vector machine. IJCA. 2014;99:14–8. doi:10.5120/17456-8202
  • Park SH, Goo JM, Jo CH. Receiver operating characteristic (ROC) curve: practical review for radiologists. Korean J Radiol. 2004;5:11–8. doi:10.3348/kjr.2004.5.1.11
  • Szarvas G, Iván S, Bánhalmi A, Csirik J. Automatic extraction of semantic content from medical discharge records. Proceedings of the i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data. 2006. p. 1–5. Available as JAMIA on-line data supplement at www.jamia.org.
  • Cohen AM. Five-way smoking status classification using text hot-spot identification and error-correcting output codes. J Am Med Inform Assoc. 2008;15:32–5. doi:10.1197/jamia.M2434
  • Clark C, Good K, Jezierny L, Macpherson M, Wilson B, Chajewska U. Identifying smokers with a medical extraction system. J Am Med Inform Assoc. 2008;15:36–9. doi:10.1197/jamia.M2442
  • Heinze DT, Morsch ML, Potter BC, Sheffer RE. Medical i2b2 NLP smoking challenge: the A-Life system architecture and methodology. J Am Med Inform Assoc. 2008;15:40–3. doi:10.1197/jamia.M2438
  • Wicentowski R, Sydes MR. Using implicit information to identify smoking status in smoke-blind medical discharge summaries. J Am Med Inform Assoc. 2008;15:29–31. doi:10.1197/jamia.M2440
  • Sohn S, Savova GK. Mayo clinic smoking status classification system: extensions and improvements. AMIA Annu Symp Proc. 2009;2009:619–23.
  • Joachims T. Making large scale SVM learning practical. In: Schölkopf B, Burges CJC, Smola AJ, editors. Advances in kernel methods. Cambridge: MIT Press; 1999. p. 169–84.
  • Kotsiantis S. Supervised machine learning: a review of classification techniques. Informatica. 2007;31:249–68.
  • Chiticariu L, Reiss FR, Li Y. Rule-based information extraction is dead! Long live rule-based information extraction systems! Conference on empirical methods in natural language processing, 2013. p. 827–32.
  • Gustafson P. Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. Boca Raton: Chapman and Hall/CRC; 2004.
  • Corbin M, Haslett S, Pearce N, Maule M, Greenland S. A comparison of sensitivity-specificity imputation, direct imputation and fully Bayesian analysis to adjust for exposure misclassification when validation data are unavailable. Int J Epidemiol. 2017;46:1063–72. doi:10.1093/ije/dyx027
  • Gustafson P, Le ND, Saskin R. Case-control analysis with partial knowledge of exposure misclassification probabilities. Biometrics. 2001;57:598–609. doi:10.1111/j.0006-341x.2001.00598.x
  • Huang CL, Wang CJ. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst Appl. 2006;31:231–40. doi:10.1016/j.eswa.2005.09.024
  • Weiss SM, Indurkhya N, Zhang T. Emerging directions. In: Weiss SM, Indurkhya N, Zhang T, editors. Fundamentals of predictive text mining. 2nd ed. London: Springer-Verlag; 2015. p. 211–13.