The Written Examination
David K. Kalwinsky, M.D.
Omer Berger, M.D.
In a pediatric clerkship, students perceive the emphasis on their learning to be centered around development of interpersonal skills and accumulation of a body of factual pediatric knowledge. In medical education, subjective tests such as essay exams, and objective tests such as multiple-choice, matching, or true/false exams, are all utilized for assessment of student learning skills. In the majority of clerkships, either an objective in-house written examination or the NBME pediatric subject exam is utilized to evaluate factual knowledge.
There are a number of valid uses for a written examination in the pediatric clerkship. As a pretest, the written exam can determine the degree of background knowledge a student brings to a course. Periodic testing during a clerkship can help identify those students who have difficulty in comprehension and thus permit remediation. At the end of a clerkship, the written exam usually serves as an achievement test to help determine a student’s grade.
From the student perspective, an examination can motivate by providing a framework around which to study, can improve retention of material, and can complement the learning process by requiring interpretation of facts and application of data. To be maximally effective as a teaching tool, a pediatric written exam should be reviewed with the student, focusing on areas of weakness in performance.
The ideal written exam for a pediatric clerkship should be based on clearly defined course objectives and curricula, including lecture materials, readings, handouts, and essential clinical experiences. The written exam should test important, not trivial, facts and concepts, as well as application of knowledge to clinical situations and problem-solving skills. One way of determining whether the exam is balanced in its coverage of curricular objectives is first to outline the areas to be covered by the examination and the number of questions planned for each content area. Constructing a second outline of the content areas after the questions are developed checks whether the exam has covered the objectives as planned.
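The comparison of the planned outline with the outline built from the finished questions can be sketched in a few lines of Python. This is only an illustration: the function name `blueprint_gaps`, the content areas, and the question counts are hypothetical and are not taken from any particular clerkship.

```python
from collections import Counter


def blueprint_gaps(planned, question_areas):
    """Compare planned question counts with the areas of the drafted questions.

    planned: dict mapping content area -> number of questions planned.
    question_areas: list with one content-area label per drafted question.
    Returns {area: drafted - planned} for every area where the counts differ.
    """
    actual = Counter(question_areas)
    areas = set(planned) | set(actual)
    return {
        area: actual.get(area, 0) - planned.get(area, 0)
        for area in sorted(areas)
        if actual.get(area, 0) != planned.get(area, 0)
    }


# Hypothetical blueprint and draft exam (illustrative values only).
planned = {"growth and development": 3, "nutrition": 2, "adolescence": 2}
drafted = [
    "growth and development", "nutrition", "growth and development",
    "adolescence", "nutrition", "nutrition",
]
gaps = blueprint_gaps(planned, drafted)
print(gaps)  # negative = too few questions, positive = too many
```

An empty result would indicate that the drafted exam matches the outline area by area.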
The Multiple-Choice Exam
The multiple-choice exam is the preferred objective test to measure short supply type answers. In general, the multiple-choice type of examination provides the most adequate measure of learning outcome relative to subject content. It can cover a large spectrum of clinical knowledge and its application. It also can readily be varied from simple to complex for assessment of different learning outcomes. Several options for multiple-choice exams are discussed here.
The ideal multiple-choice exam should include questions from a representative sample of a course’s curriculum. Within a department, a number of instructors should submit exam questions based on their curriculum to an exam committee. An exam committee can standardize exam difficulty and clarify the narrative (editorial function), select subsets of questions that reflect a course’s priorities, and avoid the selection biases that are found when exams are prepared by individual instructors.
The typical multiple-choice question (type A) provides a stem that defines a problem or clinical situation, then provides several (usually three to five) alternatives, which are solutions to the problem. One of these alternatives should be the best correct answer, while the others are plausible but incorrect answers called distracters. Exam questions should attempt to assess problem solving and the application of basic data to common pediatric clinical situations. Questions can be made more complex by requiring analysis of data or review of graphs or photographs in the stem before choosing among alternative answers.
The multiple-choice exam should have clear instructions so that it can be self-administered. A favorable environment (i.e., a quiet room with ample space) is essential for good test taking. The following are some general rules for construction of an ideal multiple-choice exam:
- Clearly state the focus of the problem in the stem. Avoid extraneous material or long narratives.
- With the stem, include 3-5 feasible alternative answers, one of which is correct.
- Keep the main content of the stem and the answers as short as possible.
- Avoid negative stems.
- Avoid the use of K-type questions. K-type questions (e.g., A=1,2,3; B=1,3; C=2,4) are labor intensive, difficult to construct, and less discriminating.
- Avoid “all of the above” or “none of the above” as alternative answers since they decrease test score reliability.
- Do not put absolutes in the stem such as “always” or “never”.
- All alternatives should be compatible with the stem and equally plausible. A test taker should not be able to eliminate a distracter as being either irrelevant or trivial.
- Eliminate clues to the proper response that are independent of question content: make all responses of equal length, use uniform grammar throughout, and randomly distribute the correct response among the three to five alternatives.
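The last rule, random placement of the correct response, is easy to automate when questions are stored electronically. The following is a minimal sketch, assuming each question is kept as one correct answer plus a list of distracters; the function name `shuffle_alternatives` and the placeholder option text are hypothetical.

```python
import random


def shuffle_alternatives(correct, distracters, rng):
    """Randomly order the alternatives and return (labelled options, key).

    correct: text of the correct answer.
    distracters: list of two to four incorrect alternatives.
    rng: a random.Random instance, so the ordering can be reproduced.
    """
    options = [correct] + list(distracters)
    rng.shuffle(options)
    labels = "ABCDE"[: len(options)]
    labelled = dict(zip(labels, options))
    key = labels[options.index(correct)]
    return labelled, key


# Placeholder text only; a fixed seed makes the layout reproducible.
rng = random.Random(2024)
labelled, key = shuffle_alternatives(
    "correct answer", ["distracter 1", "distracter 2", "distracter 3"], rng
)
for label, text in labelled.items():
    print(label, text)
print("Key:", key)
```

Passing the random generator in explicitly is a design choice that lets an exam committee regenerate the same exam layout later from the same seed.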
In addition to the type A multiple-choice question, one can consider using matching multiple-choice questions. The matching format can include 4 or 5 responses (type B) or a longer list of options (extended matching or type R). The extended match is thought to be the most like real clinical problems, and thus to better evaluate students’ diagnostic and management skills. Because there are more answer options (10-20), it is possible to include diagnoses from diverse causes and not compartmentalize by subspecialty. When developing this type of question, one first develops a clinical theme for the problem set (e.g., failure to thrive). A common lead-in to all the stems is developed (e.g., for each patient with failure to thrive, select the most likely diagnosis). This is followed by brief typical scenarios, each of which has a matching diagnosis among the answer options. In practice, one starts by writing the brief clinical descriptions typical of each diagnosis within the overall theme, then generates the answer list. The bibliography contains a practical reference which is available to assist in developing this question type.
Grading of the multiple-choice exam is straightforward, often using a point system (i.e., number of questions correct), with all items having equal weight. To increase the relative weight of a subject area such as growth and development, one simply increases the number of questions pertaining to that area. Item analysis of individual items or subject areas allows the educator to evaluate how well a test discriminates between high and low scorers and whether the distracters were effective, and allows identification of those subject areas which require additional class review or further study. Overall student performance by subject area (e.g., adolescence, nutrition, fluid and electrolytes, or therapeutics) can be readily quantified.
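As an illustration of item analysis, the sketch below computes three quantities for each item: a difficulty index (proportion of students answering correctly), a discrimination index (proportion correct among the highest scorers minus the proportion correct among the lowest scorers), and a tally of how often each alternative was chosen. These are common classical indices and are assumed here; the text above does not prescribe a specific method. The function name, the upper/lower group fraction, and the response data are hypothetical.

```python
from collections import Counter


def item_analysis(responses, answer_key, fraction=0.27):
    """Classical item analysis for a multiple-choice exam.

    responses: one list of chosen options per student, in item order.
    answer_key: list of the correct option for each item.
    fraction: share of students placed in the upper and lower groups
              (an assumed convention, not taken from the source text).
    """
    n = len(responses)
    totals = [sum(c == k for c, k in zip(r, answer_key)) for r in responses]
    ranked = sorted(range(n), key=lambda i: totals[i], reverse=True)
    n_group = max(1, round(n * fraction))
    upper, lower = ranked[:n_group], ranked[-n_group:]

    results = []
    for j, correct in enumerate(answer_key):
        p_all = sum(r[j] == correct for r in responses) / n
        p_upper = sum(responses[i][j] == correct for i in upper) / n_group
        p_lower = sum(responses[i][j] == correct for i in lower) / n_group
        results.append({
            "difficulty": p_all,                    # proportion correct
            "discrimination": p_upper - p_lower,    # high minus low scorers
            "choices": Counter(r[j] for r in responses),  # distracter use
        })
    return results


# Hypothetical responses of four students to three items.
answer_key = ["A", "C", "B"]
responses = [
    ["A", "C", "B"],
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["B", "A", "D"],
]
results = item_analysis(responses, answer_key, fraction=0.5)
for number, item in enumerate(results, start=1):
    print(number, item["difficulty"], item["discrimination"], dict(item["choices"]))
```

A distracter that is never chosen, or an item whose discrimination index is near zero or negative, would be a candidate for review by the exam committee.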
The essay exam places emphasis on the ability of the student to integrate information, prioritize, and express his or her ideas. The student is asked a question or posed a problem and then asked to develop an answer. The essay exam is ideal for testing complex learning behaviors. As with the multiple-choice exam, the questions raised should be related to specific learning objectives in the pediatric curriculum. In more open-ended essays, multiple learning objectives can be tested. In the ideal essay exam, ample time is allowed to properly address each question, and page limitations are specified for open-ended questions. Students should be provided with guidelines as to how to budget their time on the exam.
The modified essay exam format has been developed as a more structured response to the open-ended nature of the essay exam. It involves providing a sequence of information with questions at each stage. Students move on to the next stage of a given scenario once the previous stage is completed.
Scoring of an essay exam remains a major problem and has limited the usefulness of this form of testing in many pediatric clerkships. Ideally an essay question should have a model or template answer, which is used in grading. The model can be divided into areas that are then assigned points, and the respective essays marked on a point system. For a more open-ended essay, grading is even more difficult and is best achieved by a rating method using preset criteria for each rating level. The modified essay exam usually has brief answers established in advance. For uniform grading of an essay exam, the same question should be scored on all examinations before proceeding to the next question, to avoid the “halo” effect where the grader is influenced by the remainder of the answers on an individual test. Ideally, essay exam responses should be graded with the student remaining anonymous and should include independent evaluation by more than one faculty member. Grammar, syntax, and legibility of writing should not be permitted to influence grading of essay exams.
The strength of a multiple-choice exam lies in its ability to measure a diverse spectrum of learning outcomes. A 150-question exam can readily cover 10-12 subject areas within a pediatric curriculum. Unlike true/false questions, in which a student can approximate 50% correct answers by guessing, multiple-choice questions discriminate more effectively because of the larger number of choices available. A carefully written multiple-choice exam can measure knowledge base, analysis of data, and comprehension. It cannot, however, readily measure complex synthesis of knowledge. While preparation of a good multiple-choice examination can be difficult and time consuming, scoring of this exam is quite straightforward and subset analysis can reliably be performed. Student performance, however, can be influenced by “test taking skills”.
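The effect of the number of alternatives on guessing can be made concrete with a short calculation. The sketch below assumes purely blind guessing with every alternative equally likely, which is a simplification; the function names and the 60% threshold are hypothetical and chosen only for illustration.

```python
from math import comb


def expected_guess_score(n_items, n_options):
    """Expected number correct when every item is answered by blind guessing."""
    return n_items / n_options


def prob_at_least(n_items, n_options, threshold):
    """Binomial probability of guessing at least `threshold` items correctly."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(threshold, n_items + 1)
    )


# A 150-item exam: true/false (2 options) versus five-alternative items.
print(expected_guess_score(150, 2))  # 75.0
print(expected_guess_score(150, 5))  # 30.0

# Chance of reaching 90 of 150 correct (60%) by guessing alone.
p_true_false = prob_at_least(150, 2, 90)
p_five_choice = prob_at_least(150, 5, 90)
print(p_true_false, p_five_choice)
```

Under these assumptions, the probability of reaching any given score by chance alone is far smaller with five alternatives than with two.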
The essay exam is excellent for evaluating comprehensive subject matter and encourages the student to organize, interpret, and synthesize information. By its nature, the essay exam can cover only a limited spectrum of a course curriculum. Scoring of this exam remains a problem and is time consuming. Because of instructor bias, independent evaluation by several scorers is ideal. Writing ability, grammar, and syntax can influence scoring.
The NBME pediatric subject exam is a standard multiple-choice format and is currently used in approximately one-third of pediatric clerkships as a learning assessment tool. While the NBME test is standardized to a large normative base of medical students nationwide, it does not measure what is taught in an individual pediatric clerkship. Student performance is influenced by cumulative clinical knowledge built through the student’s experience in medicine, surgery, and ob-gyn clerkships. The NBME functions best as a credentialing or competency exam of broad-based clinical skills. Since it does not directly measure an individual pediatric curriculum program, it is unclear how student performance on the NBME shelf exam should be factored into assigning a course grade in a pediatric clerkship. The NBME recommends that its subject exam be only one factor in a clerkship grade determination and further recommends that normative data and your school’s overall track record on such exams be factored into the final grade assessment. The NBME Pediatric Subject Exam has only limited utility as a teaching tool in a clerkship since detailed item analysis is seldom available. The limited access to item analysis and the restrictions which the NBME places on data review preclude the NBME subject exam from serving as a useful educational experience for most students.
A multiple-choice exam ideally should be created by a diverse spectrum of faculty who are familiar with a clerkship. A final test should be edited and selected by a test committee. Alternatively, 100 to 200 questions can be obtained from a test bank of 500 to 700 items available through organizations such as COMSEP. Multiple-choice selection can be made by computer (i.e., the COMSEP Clearing House). Write to Jennifer Johnson, M.D., Department of Pediatrics, University of California, Irvine, 101 The City Drive, Building 27, Route 81, Orange, CA 92668, (714) 456-6155, Fax (714) 456-7658.
The pediatric NBME subject exam can be ordered from the National Board of Medical Examiners, 3930 Chestnut St., Philadelphia, PA 19104, (215) 590-9500 or Fax (215) 590-9777. The NBME must receive an order form signed by the chief proctor three weeks before the exam is to be given.
An essay exam should consist of at least three representative questions covering a broad base of your curriculum. Students should not have a choice of essay exams; questions should be graded anonymously and more than one instructor should grade each answer. The essay format can supplement a multiple-choice exam or be used in lieu of an oral exam in a clerkship.
Creating a question bank of carefully crafted, reliable items for the multiple-choice exam is time intensive. Selecting representative items, grading, and analyzing results are relatively rapid and inexpensive. The NBME pediatric subject exam costs $25 per test plus a $25 administration fee for each exam day.
The essay exam is particularly costly in terms of time required for grading. The modified essay exam is more structured and easier to grade. For larger classes, administration time and time for careful grading and analysis are prohibitive and have limited the use of the essay exam in many clinical clerkships.
Gronlund NE. Construction of Achievement Tests. Prentice Hall, Englewood, NJ, pp. 1-89. 1968.
Haladyna TM, Downing SM. Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education 2:51-78. 1989.
National Board of Medical Examiners. Interpretation and use of NBME clinical science subject test scores. Newsletter for NBME August 1991.
Osterlind SJ. Constructing Test Items. Kluwer Academic Publishers, Boston, MA, pp. 173-217. 1989.
Patel VJ, Dauphinee WD. The clinical learning environment in medicine, pediatrics and surgery clerkships. Medical Education 19:54-60. 1988.
Rabinowitz HK. The modified essay question: an evaluation of its use in a family medicine clerkship. Med. Educ. 21:114-118. 1987.
Sahler OJ, Lysaught JP, Greenberg LW, et al. A survey of undergraduate pediatric education. Progress in the 1980’s? Am. J. Dis. Child. 142:519-523. 1988.
Case SM, Swanson DB. Extended matching items: A practical alternative to free-response questions. Teach. Learn. Med. 5:107-115. 1993.