Sherilyn Smith, MD, Pediatrics, Douglas Brock, PhD, Family Medicine, Lynne S. Robins, PhD, Biomedical Informatics and Medical Education, University of Washington, Seattle, WA; Michael S. Dell, MD, Pediatrics, Case Western Reserve University, Cleveland, OH; Norman B. Berman, MD, Pediatrics, Geisel School of Medicine, Lebanon, NH; Jennifer R. Kogan, MD, Internal Medicine, University of Pennsylvania, Philadelphia, PA
Background: Assessing students’ clinical reasoning requires a multi-faceted approach, including evaluation of their skills in tasks such as formulating a summary statement. Assessment of students’ summary statements provides critical information about how well they interpret, synthesize, and prioritize patient information.
Objective: Describe the development and provide preliminary evidence for the validity of a framework to assess students’ summary statements.
Methods: From 155,272 examples, 85 summary statements from 8 different Virtual Patient (VP) cases (CLIPP, SIMPLE and fmCASES) were randomly sampled for analysis. Four investigators independently reviewed summary statements in groups of 10, describing their content and creating their own assessment frameworks. Coding structures were compared iteratively until consensus was achieved, which occurred after 50 summary statements. Investigators then applied the consensus assessment framework to 30 summary statements, provided a rationale for their ratings, and completed a survey about their general approach to rating. Rater concordance, within and across cases, was determined with kappa statistics. Non-parametric Spearman rank order correlations were used to explore agreement between rater pairs. Investigator comments describing their rating approach were analyzed using content analysis.
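As an illustration only (not the authors' analysis code, and the exact software used is not stated), pairwise rater concordance of the kind reported here can be computed with Cohen's kappa, which corrects observed agreement for agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical ratings of the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement from each rater's marginal category frequencies.
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and equal in length")
    n = len(rater1)
    # Observed proportion of items on which the two raters agree
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from marginal frequencies of each category
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary ratings (1 = element present, 0 = absent) for 4 statements
print(cohens_kappa([1, 1, 0, 1], [1, 1, 0, 0]))  # → 0.5
```

By the benchmarks the abstract cites (Viera, 2005), values of 0.41–0.60 are conventionally read as moderate agreement and 0.61–0.80 as substantial.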
Results: The consensus assessment framework included four key elements: factual accuracy, appropriate narrowing of the differential diagnosis, transformation of information, and use of semantic qualifiers. A holistic rating was also developed. Average inter-rater agreement on framework components was generally acceptable (Viera, 2005): accuracy (kappa = 0.68), appropriate narrowing of the differential diagnosis (kappa = 0.39), transformation of information (kappa = 0.40), use of semantic qualifiers (kappa = 0.50), and holistic score (kappa = 0.60). One of the four raters differed significantly from the other three, generally providing lower ratings (p < 0.05). Raters agreed on the general approach to rating but focused on different aspects of the consensus framework when rating student work.
Discussion: This framework holds promise as a tool for structuring feedback to learners about their summary statements. Further development is needed before it can be used for higher-stakes summative assessment. Next steps include applying the framework to different VP cases, exploring the factors contributing to inter-rater variability, gathering additional validity evidence, and identifying effective faculty training requirements.