Investigating Disparity between Global Grades and Checklist Scores in OSCEs.  Pell G et al.  Medical Teacher 2015; 37:1106-1113.

Reviewed by Gary Beck Dallaghan

What was the study question?
Can the misalignment that occurs between checklists and global ratings for some OSCE stations be quantified?

How was the study done?
The authors obtained OSCE data from multiple cohorts and different undergraduate year groups at a single medical school in the UK.  Performance on each OSCE station was documented with both a detailed checklist and a global rating scale.  The Borderline Regression Model was used for standard setting.  Within this framework, the authors used regression to compare checklist scores with the global rating, and a Chi-square model to identify misclassifications, that is, students whose checklist result and global grade disagreed.
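
To make the method concrete for readers less familiar with borderline regression, here is a minimal Python sketch of the general idea: checklist scores are regressed on global grades, the station pass mark is the predicted checklist score at the borderline grade, and students are then cross-tabulated by checklist and global outcome.  The 0-4 grade scale, the sample data, and the function names are illustrative assumptions, not taken from the article, and the sketch does not reproduce the authors' Chi-square model.

```python
from statistics import mean

# Assumed global grade scale (hypothetical): 0 = fail, 1 = borderline,
# 2 = pass, 3 = good, 4 = excellent.
BORDERLINE = 1

def borderline_regression_cut(checklist, grades, borderline_grade=BORDERLINE):
    """Fit checklist = intercept + slope * grade by least squares and
    return the predicted checklist score at the borderline grade."""
    g_bar, c_bar = mean(grades), mean(checklist)
    sxx = sum((g - g_bar) ** 2 for g in grades)
    sxy = sum((g - g_bar) * (c - c_bar) for g, c in zip(grades, checklist))
    slope = sxy / sxx
    intercept = c_bar - slope * g_bar
    return intercept + slope * borderline_grade

def misclassification_counts(checklist, grades, cut, borderline_grade=BORDERLINE):
    """Count students whose checklist outcome disagrees with a clear
    global pass or fail. Borderline students are left out (an assumption)."""
    pass_fail = 0  # passed the checklist, failed on global grade
    fail_pass = 0  # failed the checklist, passed on global grade
    for score, grade in zip(checklist, grades):
        if score >= cut and grade < borderline_grade:
            pass_fail += 1
        elif score < cut and grade > borderline_grade:
            fail_pass += 1
    return {"checklist_pass_global_fail": pass_fail,
            "checklist_fail_global_pass": fail_pass}

# Invented data for one station: eight students.
grades = [0, 1, 1, 2, 2, 3, 3, 4]
checklist = [8, 12, 10, 14, 9, 18, 17, 20]

cut = borderline_regression_cut(checklist, grades)
counts = misclassification_counts(checklist, grades, cut)
print(round(cut, 2), counts)
```

A marked imbalance between the two counts is the kind of asymmetry the review refers to; in a real analysis the counts would come from full cohorts rather than eight invented students.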


What were the results?
Once the models were developed, the authors applied them to final year OSCE stations that used a hybrid of traditional checklist and global rating.  This allowed them to conduct the post hoc analysis, in which each station was analyzed for discrepancies.  Focusing on the misclassifications, they found that stations with known metric concerns had a high degree of asymmetry for pass/fail ratings.  Even stations with sound historical metrics showed misalignment between checklist scores and the global rating.  Ultimately, their findings indicate that assessors had a hard time making a global judgment that accurately reflected student performance as determined by the “key features” checklist.


What are the implications of these findings?
With the emphasis on entrustable professional activities for medical students, accurate assessments of performance are vital for entrustment decisions.  As clerkship directors (and medical schools) consider their student performance evaluations, this study indicates that checklists and global rating scales occasionally disagree.


Editor’s note:  This article is statistically complex but has a compelling message: checklists and global rating scales measure different things (we already knew this!) and there may be a discrepancy between these assessment measures.  We should look for such disparities and attempt to understand them in order to improve the OSCE station quality (SLB).

