Abstract
Objectives
To document the process of linking breathalyzer and motor vehicle crash (MVC) data for the State of Connecticut using a unique identifier in place of personal and private information.
Methods
Deterministic linkage methods were used in Microsoft SQL Server to join 5,634 (of 6,650) breathalyzer records to corresponding MVC driver records for the period of January 1, 2017, to December 31, 2022. Differences between the linked and original datasets were documented by comparing the frequency and proportion distributions of key variables.
Results
Proportions of annual records, alcohol breath tests, and refusals were nearly unchanged between the linked and original breathalyzer data. Compared with the original MVC driver records, the linked dataset differed in the within-group proportions for sex and age, with an overrepresentation of males and drivers aged 26 to 40 years. For crash and injury severity, the linked dataset had lower proportions of more severe injury records than the original MVC data. Additionally, 1,007 breathalyzer records were not matched with an associated MVC record.
Conclusions
The linkage methodology is sound and produced quality matches. The use of a unique identifier provided a strong match qualifier in the absence of personal and private data. Changes in proportions for age, sex, and crash and injury severity align with previous research. Potential missed matches may be attributed to several factors outside of the linkage process, including data discrepancies and varied reporting practices. Future studies will further explore these differences and incorporate additional toxicology data as part of a continued effort to fuse crash, citation, toxicology, and public health data. The end result will be a holistic, comprehensive, and multifaceted database for transportation research and education.
Acknowledgments
The authors would like to thank the Connecticut Traffic Records Coordinating Committee, the other state agencies participating in the cross-system data linkage effort, and everyone who was directly or indirectly involved in the success of this research.
Disclosure statement
Both authors are employed by the University of Connecticut, which houses the Connecticut Crash Data Repository through a grant-funded partnership with the state Department of Transportation (CTDOT). Beyond providing data, no other agency had any involvement in the study design, interpretation of results, writing of the report, or the decision to submit the article for publication.
Data availability statement
The MVC data used in this study are publicly available in the CT Crash Data Repository at https://www.ctcrash.uconn.edu/. The breathalyzer data used in this study were provided by the state toxicology lab and are not publicly available for privacy reasons.