ORIGINAL RESEARCH

Comparison of Machine and Human Expert Evaluation of Capsulorrhexis Creation Performance Through Analysis of Surgical Video Recordings

Pages 943-950 | Received 15 Oct 2023, Accepted 11 Mar 2024, Published online: 27 Mar 2024
 

Abstract

Purpose

Achieving competency in cataract surgery is an essential component of ophthalmology residency training. Video-based analysis of surgery can change training through its objective, reliable, and timely assessment of resident performance.

Methods

Using the Image Labeler application in MATLAB, the capsulorrhexis step of 208 surgical videos, recorded at the University of Michigan, was annotated for subjective and objective analysis. Two expert surgeons graded the creation of the capsulorrhexis based on the International Council of Ophthalmology’s Ophthalmology Surgical Competency Assessment Rubric:Phacoemulsification (ICO-OSCAR:phaco) rating scale and a custom rubric (eccentricity, roundness, size, centration) that focuses on the objective aspects of this step. The annotated rhexis frames were run through an automated analysis to obtain objective scores for these components. The subjective scores were compared using both intra- and inter-rater analyses to assess the consistency of the human-graded scale. The subjective and objective scores were compared using intraclass correlation methods to determine relative agreement.
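To illustrate the kind of agreement statistic described above, the following is a minimal sketch of one common intraclass correlation form, ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss). The abstract does not state which ICC form or software the authors used, so the choice of ICC(2,1), the function name `icc_2_1`, and the example scores are assumptions for illustration only.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per rated item (e.g. one rhexis),
    one column per rater (e.g. two surgeons, or one surgeon and the
    automated analysis). This ICC form is an assumption, not necessarily
    the one used in the study.
    """
    n = len(scores)          # number of rated items
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)             # between-item mean square
    ms_cols = ss_cols / (k - 1)             # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical roundness scores for five rhexes: [rater A, rater B].
example_scores = [[4, 4], [5, 4], [3, 3], [4, 5], [5, 5]]
print(round(icc_2_1(example_scores), 3))
```

The same function could be applied to a subjective-versus-objective comparison by treating the automated score as a second "rater", provided both are expressed on the same scale; because ICC(2,1) measures absolute agreement, a systematic offset between the two would lower the value.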

Results

All rhexes were graded as 4/5 or 5/5 by both raters for both items 4 and 5 of the ICO-OSCAR:phaco rating scale. Only roundness scores were statistically different between the subjective graders (mean difference = −0.149, p-value = 0.0023). Subjective scores were highly correlated for all components (>0.6). Correlations between objective and subjective scores were low (0.09 to 0.39).

Conclusion

Video-based analysis of cataract surgery presents significant opportunities, including the ability to asynchronously evaluate performance and provide longitudinal assessment. Subjective scoring between two raters was moderately correlated for each component.

Disclosure

Dr Nambi Nallasamy reports a patent 63/445,053 pending. The authors have no other conflicts of interest in this work.

Additional information

Funding

This work was supported in part by the Graduate Medical Education Innovations Fund (NN, BT), The Doctors Company Foundation (NN, BT), NIH K12EY022299 (NN), and Fogarty/NIH D43TW012027 (NN).