June 2024

Hello COMSEP!

Yes, this is the June edition, dedicated to all of us who manage to get our work in just before the deadline.

 

A quick reminder in case you missed it: for every Journal Club review you agree to write between April 2024 and April 2025, your name will be entered into a raffle. At next year’s Annual Meeting, we will draw the names of one COMSEP member and one medical student--each of whom will be awarded a free meeting registration for the 2026 Annual Meeting.

There are still openings for reviewers for the coming year--so reach out to sign up now.

Enjoy!

Jon, Karen and Amit


Should ChatGPT Write Examination Questions for Medical Students?
Laupichler MC, Rother JF, Grunwald Kadow IC, Ahmadi S, Raupach T. Large Language Models in Medical Education: Comparing ChatGPT- to Human-Generated Exam Questions. Acad Med 2024; 99(5): 508-512. https://dx.doi.org/10.1097/ACM.0000000000005626

Reviewed by Gary L. Beck Dallaghan (with help from NotebookLM)

What was the study question?
Do differences exist in student performance on multiple-choice questions (MCQs) generated by ChatGPT 3.5, a large language model (LLM), compared with questions written by experienced medical educators? Can students differentiate between LLM- and human-generated questions?

How was this study done?
Two sets of 25 MCQs about neurophysiology were generated: one set by an experienced medical educator, the other by ChatGPT 3.5 guided by prompts specifying the topic, learning objective, and question construction.

A total of 161 second-year medical students participated in an ungraded preparatory exam prior to their final neurophysiology exam. Four of the initial 25 LLM-generated questions were excluded from analysis due to inaccuracies, leaving a final data set of 25 human-generated questions and 21 LLM-generated questions.

What were the results?
No significant difference was found in difficulty between the two question sets. However, the human-generated questions demonstrated significantly higher discriminatory power, meaning they were better at differentiating between high- and low-performing students. Students correctly identified the source of a question 57% of the time.

How can this be applied to my work in education?
Medical students are always asking for question banks. This study demonstrates that, with appropriate prompts, questions for formative assessments can readily be generated. Of note, the lower discriminatory power of the LLM-generated questions indicates that further refinement of LLM capabilities and prompt design is necessary. Since writing a good MCQ can take up to an hour, this tool could help faculty immensely. A final word of caution: some questions were removed from the experiment because the LLM produced completely or partially incorrect answer options. Because LLMs are designed to provide a response to every request, they continue to produce “hallucinations,” and any output they generate merits careful review.

Editor’s Note: This is the second article chosen in the last two months that focuses on ChatGPT. One thing is clear--for better or worse, artificial intelligence is part of the future of medical education. (JG)


Welcome to Medical Education Research and Scholarship
Bluemel AH, Gillespie H, Asif A, Samuriwo R. How to…Navigate Entry into the Field of Clinical Education Research and Scholarship. Clin Teach 2024; 21:e13686. https://dx.doi.org/10.1111/tct.13686

Reviewed by Gary L. Beck Dallaghan

What was the article about?
This article addresses a challenge faced by many faculty who are engaging in medical education research and scholarship for the first time. The authors sought to offer practical guidance drawn from their professional experiences and a review of the literature. As a result, they offer advice on establishing networks, managing time, learning about educational research methodologies, and generating research ideas.

What advice do they share?
The authors note that those pursuing educational research and scholarship may find themselves without mentors. They recommend finding mentors locally or, failing that, a group of like-minded peers to grow with. They list a variety of European medical education organizations, but a similar list could be curated for the U.S. The take-home from this suggestion is to broaden your network outside of your specific discipline.

Every clinician knows how challenging it is to schedule time to participate in research. The authors emphasize the importance of collaborating with others, taking it slow at first and then taking the lead as you gain confidence in your skills. As you begin your educational research journey, it is important to be transparent about the skills you feel you need to develop. The article contains detailed tables listing resources on a variety of educational research methods, ranging from articles to book chapters to podcasts. When it comes to generating research ideas, we all have questions that come to mind. Use your newly formed network to discuss them. As you gain more experience, ideas will pop up at the oddest times.

How can this be applied to my work in education?
This is a fantastic article with practical guidance for those wanting to begin conducting educational research and scholarship. The tables in the article read like an annotated bibliography and are useful even for the most accomplished researcher. Given COMSEP members’ desire to generate scholarly products from their innovations, this article provides useful guidance on how to get that done.

Editor’s Comments: This article is concise yet packed with high-yield suggestions and resources. The emphasis on developing a network and becoming involved as a collaborator in educational scholarship is at the very heart of COMSEP’s strategic plan. Do you have any junior colleagues you could invite into our network? (KFo)


Validity Evidence for Assessing Third-year Medical Students’ Clinical Reasoning with a Computer-based Mapping Exercise
Torre DM, Mamede S, Bernardes T, Castiglioni A, Hernandez C, Park YS. Promoting Longitudinal and Developmental Computer-Based Assessments of Clinical Reasoning: Validity Evidence for a Clinical Reasoning Mapping Exercise. Acad Med. 2024 Jun 1;99(6):628-634. https://dx.doi.org/10.1097/ACM.0000000000005632

Reviewed by Margaret Huntwork

What was the study question?
What is the validity evidence for using a computer-based clinical reasoning mapping exercise (CResME) to assess clinical reasoning in third-year medical students?

How was it done?
During an intersession, 120 third-year medical students completed three cases on a computer-based CResME. The exercise included three patient presentations, each with blocks containing the HPI, physical exam, and diagnostic studies. Students entered 4-5 differential diagnoses and then chose which elements of the HPI, physical exam, and diagnostic studies fit best with each diagnosis. Two faculty scored each student using a rubric developed by consensus among eight clinicians, and faculty were surveyed after scoring. The authors assessed content validity, response process validity, internal structure validity, and relationships to other variables (external validity).

What were the results?
Students scored highly on two of the three mapping exercises. In scoring the exercises, interrater reliability was high (0.92 overall; range 0.86-0.97). Internal consistency reliability (Cronbach alpha) was also 0.92 overall (range 0.75-0.91). The G and Phi coefficients were 0.45 and 0.56, respectively. Students accounted for 10% of the variance and case specificity (the interaction between the student and the case) for 8%; raters contributed no variance. Faculty felt the cases were easy to score, taking 5-10 minutes per student. Significant moderate correlations (0.32-0.36) were found with NBME subject exams, except for surgery. There was no correlation with overall OSCE score, but there was a moderate correlation with the patient note.

How can this be applied to my work in education?
Validity studies try to uncover whether an assessment tool produces meaningful and truthful results. The authors report that they found sufficient validity evidence that this assessment tool meaningfully and truthfully assesses clinical reasoning. They argue that the tool should be considered one of several assessments used to formulate a holistic conclusion about an individual’s clinical reasoning skills.

Editor's Note: This is a very nice study on how to determine validity evidence for an assessment. It also shows that we can assess clinical reasoning in learners even though clinical reasoning is a very complicated process. The computer-based entry makes the exercise easy to distribute, and while faculty felt it was easy to grade, it would be even better if it were computer-graded. (AKP)