Council on Medical Student Education in Pediatrics



Journal Club

Comparison of an Aggregate Scoring Method with a Consensus Scoring Method in a Measure of Clinical Reasoning Capacity. Charlin B, Desaulniers M, Gagnon R, Blouin D, van der Vleuten C. Teaching and Learning in Medicine 2002;14(3):150-156.

Reviewed by Leslie Fall, Dartmouth Medical School

Description: This article describes another test of clinical reasoning assessment, the Script Concordance Test (SCT). The article focuses not on student performance, but on the method used to develop the scoring key for this test. In the SCT, students are given a brief clinical scenario and then asked a question in which they must rate the likelihood of a given hypothesis on a Likert scale from -3 (ruled out) to +3 (ruled in). An example question: "If you were considering the diagnosis of otitis media (hypothesis), and the mother told you that the child has been pulling on his ears (new clinical information), this hypothesis becomes: ruled out (-3) through ruled in (+3)." In this study, the experts were given the examination (60 items) and two methods of scoring it were developed. In the first, the aggregate method (A method), any answer given by an expert was considered correct and a weighted scoring system was developed from the experts' individual responses. In the consensus method (C method), the experts were asked to reach a group consensus on which single answer was correct. The question asked by the study: "Do experts provide the same answer when they take the test individually and when they provide 'the good answer' in a group meeting?" Fifty-nine percent (59%) of the answers experts gave individually differed from those they gave in the consensus condition! It is well known that experts differ in the many intermediate decisions made during clinical reasoning, even though they usually converge toward a similar outcome. The authors argue that the aggregate scoring method better captures the kinds of judgments clinicians make in their daily reasoning, and that expertise lies in subtle differences in reasoning processes, which disappear when experts are required to talk with each other to reach a consensus.
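To make the contrast between the two scoring approaches concrete, the short sketch below shows one common way an aggregate (partial-credit) key can be built from experts' individual Likert answers and compared with an all-or-nothing consensus key. The proportional weighting shown (credit for each answer scaled to the count of the most popular answer) is an assumed convention for illustration, not necessarily the exact weights used by Charlin and colleagues, and the expert responses and function names are hypothetical.

# Illustrative sketch only: builds an aggregate (partial-credit) scoring key
# for one SCT item from experts' individual Likert answers (-3 "ruled out"
# through +3 "ruled in") and contrasts it with a single consensus key.
# The proportional weighting is an assumed convention, not necessarily the
# exact method reported in the article.

from collections import Counter


def aggregate_key(expert_answers):
    """Credit for each answer = (# experts choosing it) / (# choosing the modal answer)."""
    counts = Counter(expert_answers)
    modal = max(counts.values())
    return {answer: n / modal for answer, n in counts.items()}


def score_aggregate(student_answer, key):
    """Any answer at least one expert gave earns partial credit; others earn 0."""
    return key.get(student_answer, 0.0)


def score_consensus(student_answer, consensus_answer):
    """All-or-nothing: only the single agreed-upon answer earns credit."""
    return 1.0 if student_answer == consensus_answer else 0.0


# Example: ten (hypothetical) experts rate the otitis media item individually.
experts = [+2, +2, +1, +2, +3, +1, +2, 0, +1, +2]
key = aggregate_key(experts)      # {+2: 1.0, +1: 0.6, +3: 0.2, 0: 0.2}
print(score_aggregate(+1, key))   # 0.6 -- partial credit under the aggregate key
print(score_consensus(+1, +2))    # 0.0 -- no credit under the consensus key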

I found both the description of the SCT and the results of the scoring study fascinating. More than the Key Features Examination (KFE) described above, this test of clinical reasoning seems to me to better dissect and probe how clinicians truly reason through a given clinical problem. The differences between the aggregate and consensus scoring methods did not surprise me, and I agree with the authors that the aggregate method is far more in line with the task we are asking of students. However, I am not sure I agree with the decision to accept all of the experts' answers as correct (i.e., how would you answer the example question?). It seems to me that a range of acceptable answers would be a better method. I would suggest that anyone interested in clinical reasoning skills and in testing students' abilities in this area read this article.

(Do you formally assess diagnostic reasoning in any way other than by preceptors' global impressions? Steve Miller)

Comment: The previous two studies make some important points about clinical reasoning. First, it appears that clinical reasoning is context-specific: someone might be a terrific "reasoner" for one type of problem, like abdominal pain, and poor for another, like the workup of vertigo. So, in order to teach or assess clinical reasoning, you need multiple cases. Second, the assessment tools (the methods described in both studies above) rely on recognizing the key features that lead to specific decisions and the weight each feature carries in swaying the decision. It is worth looking at both tools to see the subtle differences. Which tool makes the most intuitive sense to you? Steve Miller
