A rubric for evaluating ChatGPT in helping diagnose ophthalmology patients
Below, I explain what a rubric is and then present the rubric itself.
Rubric-Based Assessment of ChatGPT in Clinical Ophthalmic Diagnosis
To evaluate the performance of ChatGPT in supporting ophthalmology residents during clinical diagnostic reasoning, we developed and applied a structured assessment tool—a diagnostic performance rubric.
What is a rubric?
A rubric is an evaluative framework that defines specific criteria and performance levels. It is designed to support consistent, objective, and transparent assessment of complex tasks. In this context, our rubric systematically evaluates ChatGPT's ability to process clinical information, generate appropriate differentials, and justify final diagnoses in ophthalmic scenarios.
Rubric Structure
The rubric includes five performance domains:
Information Gathering—Accuracy and completeness of history and symptom analysis.
Diagnostic Reasoning—Logical consistency and relevance of clinical interpretation.
Differential Diagnosis—Appropriateness, breadth, and prioritization of differentials.
Final Diagnosis—Alignment with the correct diagnosis based on presented information.
Justification and Explanation—Clarity, clinical relevance, and use of supporting evidence.
Each domain is scored on a scale with clear performance-level descriptors, enabling both quantitative and qualitative analysis.
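To make the structure concrete, the five domains and their scores could be captured in a small record like the one below. This is only a sketch: the 1-5 scale, the class name RubricEvaluation, and the field names are assumptions made for illustration, since the rubric text above does not state the actual score range or any data format.

```python
from dataclasses import dataclass, field

# The five domains named in the rubric.
DOMAINS = (
    "Information Gathering",
    "Diagnostic Reasoning",
    "Differential Diagnosis",
    "Final Diagnosis",
    "Justification and Explanation",
)

# Assumption: a 1-5 scale. The rubric text does not specify the range.
MIN_SCORE, MAX_SCORE = 1, 5


@dataclass
class RubricEvaluation:
    """One reviewer's assessment of one ChatGPT response (hypothetical structure)."""

    case_id: str
    reviewer: str
    scores: dict  # domain name -> integer score
    comments: dict = field(default_factory=dict)  # domain name -> narrative feedback

    def __post_init__(self):
        # Every domain must be scored, and no unknown domains are allowed.
        missing = set(DOMAINS) - set(self.scores)
        unknown = set(self.scores) - set(DOMAINS)
        if missing or unknown:
            raise ValueError(
                f"missing domains: {sorted(missing)}, unknown domains: {sorted(unknown)}"
            )
        for domain, score in self.scores.items():
            if not MIN_SCORE <= score <= MAX_SCORE:
                raise ValueError(f"{domain}: score {score} is outside the assumed scale")

    def total(self) -> int:
        """Sum of the domain scores (the quantitative part of the assessment)."""
        return sum(self.scores[d] for d in DOMAINS)


# Illustrative, made-up evaluation: every domain scored 4.
example = RubricEvaluation(
    case_id="case-001",
    reviewer="reviewer-A",
    scores={d: 4 for d in DOMAINS},
    comments={"Differential Diagnosis": "Reasonable breadth, weak prioritization."},
)
print(example.total())  # 20
```

Keeping the narrative comments next to the numeric scores reflects the rubric's aim of supporting both quantitative and qualitative analysis.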
Recommended Use
We propose the following best practices when applying this rubric:
Standardize Clinical Case Inputs
Ensure ChatGPT receives clinical scenarios with comparable structure and depth to maintain scoring validity.
Use Multiple Evaluators
Involve more than one reviewer to improve scoring reliability and to identify inter-rater variability.
Incorporate Qualitative Commentary
Combine numerical scores with narrative feedback to capture nuances in reasoning quality and content gaps.
Iterate and Compare Across Models/Prompts
Use the rubric to track performance over different GPT versions or prompt strategies, informing optimal use in education.
Support Resident Learning
Share rubric-based evaluations with trainees to foster critical thinking about diagnostic processes and the limits of AI-generated responses.
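As a rough illustration of the multiple-evaluator practice, the sketch below summarizes several reviewers' scores for the same response and flags domains where they diverge. The function names, the example scores, and the disagreement threshold are all hypothetical, and the score range (max minus min) is only a crude screening measure; a formal study would more likely report an established agreement statistic such as Cohen's kappa or an intraclass correlation coefficient.

```python
from statistics import mean


def summarize_by_domain(evaluations):
    """Summarize reviewers' scores for a single case.

    evaluations: list of {domain name: score} dicts, one per reviewer.
    Returns {domain name: {"mean": ..., "range": ...}}.
    """
    if not evaluations:
        raise ValueError("at least one evaluation is required")
    summary = {}
    for domain in evaluations[0]:
        values = [e[domain] for e in evaluations]
        summary[domain] = {
            "mean": mean(values),
            "range": max(values) - min(values),  # spread between reviewers
        }
    return summary


def flag_disagreements(summary, max_range=1):
    """Return domains whose reviewer spread exceeds max_range (an assumed threshold)."""
    return [domain for domain, stats in summary.items() if stats["range"] > max_range]


# Made-up scores from two reviewers for the same ChatGPT response.
reviewer_a = {"Diagnostic Reasoning": 4, "Final Diagnosis": 5}
reviewer_b = {"Diagnostic Reasoning": 4, "Final Diagnosis": 3}

summary = summarize_by_domain([reviewer_a, reviewer_b])
print(flag_disagreements(summary))  # ['Final Diagnosis']
```

Flagged domains are natural candidates for discussion between reviewers, and the same summary can be computed per GPT version or prompt strategy to compare performance across them.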