Section F

Computer Technology for Education and Evaluation

Robert L. Janco, M.D.


Researchers in medical education find themselves at a unique juncture in the evolution of educational processes. Computing technology and its supporting software have outpaced our understanding of the acquisition and retention of cognitive skills necessary to a medical education. Nevertheless, high-speed, low-cost, multimedia-capable personal computers with increased memory and accompanying software enable course directors to apply advanced information technology to traditional modes of teaching/learning and evaluation of medical students. Whether application of advanced technologies truly enhances medical education remains an unproven but potentially fruitful area of research in medical education.

Multiple applications of such decentralized computing technology include rapid numerical scoring and analysis of written examinations, computer-based instruction such as computer-based testing, computer-assisted instruction, and ultimately, artificial intelligence. Computer-assisted instruction may include drills and practice, tutorials, patient case simulations, gaming, problem-based learning exercises, and model building and testing. All of these make use of hypertext or multi-media technologies. Potential applications beyond these are virtually limitless.

Definition of Terms

Computer-based testing (CBT), sometimes referred to as computer-assisted examination, is perhaps the simplest application of computing technology to the evaluation process. Basically, it consists of the transfer of conventional testing instruments such as multiple-choice examinations to the computer. Students refer to questions on a monitor and highlight their choices using a mouse or cursor, or by typing a letter or number. When programmed correctly, such CBT allows students to review responses, go back to change answers, and quit the exam when finished. Additional formats such as matching questions, completion tasks, and even essay questions may be programmed for computers or individual work stations on local area networks (LAN).

Computer-assisted instruction (CAI) refers to processes whereby computers or LAN work stations are pre-programmed to provide students with customized learning exercises that conform to the skill or knowledge level of the student, allowing him/her to proceed at an individual pace. Implicit in CAI are levels of help or assistance that allow students to retrace their steps in the learning process, going back to earlier phases of the program to reinforce or clarify concepts.

Problem-based learning (PBL) represents a teaching method primarily used in small groups supported by a facilitator who helps the group define study questions, identify learning issues, and assign learning tasks to group members. When confronted by a complex problem, groups using PBL are able to use the collective energy of the group to accomplish several learning tasks simultaneously and efficiently. Such a broad-based approach to problem solving represents adult learning strategies more closely than the traditional pedagogical approach with lectures and structured subject-based formal teaching. Several key aspects of PBL may be translated to computer-based exercises.

Multi-media, strictly speaking, refers to the ability of computer platforms to generate or display information in text, graphic, pictorial, audio, or video animation formats or media. Hypermedia refers to a programming concept and collection of capabilities that enable user-defined linkages of such multi-media, for example allowing an individual reading text to select items, words, graphic images, or phrases that expand to include additional text, visual information such as graphics, pictures, and animation, or audio data such as sounds, music, and the spoken word. Rather than presenting information linearly, as a traditional book with pages and chapters does, hypermedia adds dimensions to learning by enabling user-defined links. The recent explosion in CD-ROM publishing takes advantage of the storage capacity of this medium to enable hypermedia with sounds, pictures, animation, and text in a variety of commercial applications. When applied in network environments with easy access to multiple users and authors, interactive hypermedia as a learning tool assumes even greater capability.

Multi-media techniques may be applied to CBT by providing pictures and sound for the multiple-choice exam, or they may be used extensively in CAI, allowing students to study a subject in greater depth and breadth as they see fit. Linking visual images with text-based learning is a keystone of hypermedia applications, as such linkage, at least theoretically, strengthens the retention of key concepts.

Review of literature

The literature describing the application of computer-based learning and evaluation techniques in pediatric medical education is just now emerging. Programs described in the literature exist to teach problem solving in anemia and coronary artery disease, to give computerized feedback on diagnostic judgment, to teach cardiac auscultation and well-newborn care, to simulate cardiac or intensive care patients, to evaluate seminars on child abuse, and to evaluate problem-solving difficulties of students (2-9). Many other developers have created similar computerized learning or evaluation materials that are not yet described in the literature.

Criteria to evaluate new technology

While the relatively low cost and tremendous capability of individual computers using new multi-media technology portend their widespread application to newer or innovative learning and evaluation strategies, wide-scale adoption of such technology should await, or proceed concurrently with, the development of common technical standards and standardized criteria to evaluate its usefulness. Particular issues deserving attention include the traditional reliability and validity of any evaluation process (courseware content), as well as feasibility, cost, efficiency gains, student acceptance (use), retention of concepts (achievement of learning outcomes), and quality improvement.

    • Reliability

Reliability (see also Basic Principles of Evaluation: An Overview, Section K) in the strict measurement sense means the consistency of an evaluation process or test in correlating observed with true scores. As true scores cannot be obtained in most cases, reliability is often estimated by testing examinees with several instruments and correlating the resulting scores mathematically. Applying such numerical methods to computer-based materials in medical education is problematic. Nevertheless, one of the challenges inherent in such applications will be the demonstration of their reliability. At first glance, solutions including the use of expert systems to document the most correct approach may be employed. Then, the issue becomes how to measure consistently the deviation from the ‘most correct’ approach and correlate the extent of deviation with a predetermined true score. In simpler terms, one might choose a panel of experts to take a computer-based test or interactive learning module. The experts’ consensus represents the most correct approach. Students taking the same test or module may deviate from that approach by choosing incorrect responses, ordering too many tests, or formulating too few hypotheses, for example. One must then decide the limits of acceptable deviation based on the student’s educational level and the established norms for each chosen educational level.
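The expert-panel idea can be illustrated with a minimal Python sketch. The function names, the majority-vote rule for deriving the consensus, and the "fraction of items that differ" measure are all assumptions made for illustration; the text above does not prescribe a particular formula, and the acceptable limit would still have to be set for each educational level.

```python
from collections import Counter


def consensus_path(expert_responses):
    """Return the majority answer per item across a panel of experts.

    expert_responses is a list of dicts mapping item name to answer
    (a hypothetical data layout, not taken from any published system).
    """
    items = expert_responses[0].keys()
    return {
        item: Counter(r[item] for r in expert_responses).most_common(1)[0][0]
        for item in items
    }


def deviation(student, consensus):
    """Fraction of items on which the student departs from the consensus."""
    differing = sum(
        1 for item, answer in consensus.items() if student.get(item) != answer
    )
    return differing / len(consensus)


def within_limit(student, consensus, limit):
    """True if the student's deviation does not exceed the chosen limit."""
    return deviation(student, consensus) <= limit


# Hypothetical panel of three experts answering two items.
experts = [
    {"q1": "A", "q2": "C"},
    {"q1": "A", "q2": "B"},
    {"q1": "A", "q2": "C"},
]
consensus = consensus_path(experts)
student = {"q1": "A", "q2": "B"}
print(consensus, deviation(student, consensus))
```

A stricter limit could be applied to more senior students simply by passing a smaller value for `limit`.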

    • Validity

Validity in the learning environment means that an evaluation strategy or technology actually measures the skills or knowledge which are truly relevant. In other words, a valid multiple-choice exam truly measures a student’s acquisition and retention of certain prescribed factual information or learning concepts. Simply stated, a testing strategy is valid if it measures what it purports to measure.

The major types of validity are content validity, criterion-related validity, and construct validity (1). For rapidly expanding fields or disciplines such as clinical medicine, content validity may be difficult to establish as the information explosion alters the importance of certain facts or concepts. Newer data supplant old information and theories. Nevertheless, content validity is usually the first concern in the construction of testing instruments or strategies. In contrast, criterion-related validity attempts to predict outcome behavior based on test scores. For example, a strong correlation between a job test score and successful job performance, however defined, would indicate strong criterion-related validity. It implies predictive potential and the ability to define what constitutes successful or desirable outcomes.
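One common way to quantify the correlation mentioned above is the Pearson coefficient. The sketch below is a plain Python illustration; the choice of Pearson's r, the function name, and the sample numbers are assumptions for the example and are not drawn from the studies cited in this section.

```python
import math


def pearson_r(test_scores, outcome_scores):
    """Pearson correlation between test scores and a later outcome measure."""
    if len(test_scores) != len(outcome_scores) or len(test_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(test_scores) / len(test_scores)
    mean_y = sum(outcome_scores) / len(outcome_scores)
    cov = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(test_scores, outcome_scores)
    )
    var_x = sum((x - mean_x) ** 2 for x in test_scores)
    var_y = sum((y - mean_y) ** 2 for y in outcome_scores)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up data: CBT scores and later performance ratings for five students.
cbt_scores = [62, 70, 75, 81, 90]
ratings = [2.9, 3.1, 3.4, 3.6, 3.9]
print(round(pearson_r(cbt_scores, ratings), 3))
```

A coefficient near 1 would support criterion-related validity, provided the outcome measure itself is a defensible definition of success.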

Construct validity is the degree to which a process measures the theoretical construct or trait it was designed to measure. As a newer concept, construct validity attempts to verify predictions made about test scores. The methods to test construct validity are more complicated as they attempt to examine a pattern of correlations among tests measuring traits in different ways.

Most computer-based evaluation strategies appear attractively feasible at first; however, when examined in greater detail, they may be more complex. In most cases, however, feasibility will primarily depend on the commitment of medical educators at various levels to CAI. Such commitment will translate into monetary support for the hardware, software, and salaries of interested faculty to develop courseware.

Development costs for computer hardware, software, and courseware will exceed the costs of traditional lectures, textbooks, and other learning materials; however, when hardware is already available and authoring schemes are in place, the marginal cost of developing and implementing CAI modules will diminish. Moreover, the advantages of CAI may lead to a reduction in the need for traditional lectures, saving both time and money as well as allowing faculty to explore more personalized and innovative teaching strategies.

Efficiency in the traditional medical education context implies, but is not limited to, the concept of less time required to prepare for formal teaching tasks such as lectures and greater freedom to explore more intimate, less formal student-teacher interactions such as bedside rounds, case discussions, or laboratory meetings. For example, if a formal lecture taking 6 hours of preparation could be replaced by a hypermedia learning module, the lecturer would have more time to answer student questions, clarify difficult concepts, explore new ideas, and obtain feedback from students.

Retention of important basic principles or concepts useful in clinical medicine remains an important goal for medical educators. While emphasis must be placed on life-long learning and adult-learning strategies, certain basic ideas must be thoroughly integrated into a physician’s daily knowledge base. For example, cardiopulmonary resuscitation must be thoroughly familiar to practicing health care providers. CAI modules in CPR and Advanced Cardiac Life Support are in fact already commercially available. One should be able to demonstrate that these or similar products do in fact enhance retention of such basic concepts compared to more traditional modes.

Feedback for actual achievement of desired learning outcomes is an important goal in evaluating computer applications in learning. Those who incorporate computer-based evaluation or learning techniques should be able to demonstrate that the learning outcomes desired are in fact accomplished. This implies some form of retesting at a later time.

Continuous quality improvement (CQI) has become a paradigm for business management throughout the world. The same principles that underpin CQI in the for-profit sector, often called Total Quality Management, apply to medical education. Basically, the emphasis is on customer (student) satisfaction, creation of processes for continuous improvement in quality, and the design and application of metrics useful in measuring such improvements and satisfaction. These principles may also be broadly and constructively applied to the learning process in medical education. Specifically, when applied to the adoption of computer-based materials, they require both a demonstration of greater student (customer) acceptance and documentation of more successful achievement of educational objectives, however they might be defined. For both of these requirements, computing technology appears ideally suited as a process whose success can be measured and monitored over time.

Practical application of computer-assisted instruction and application has been successful in a number of settings (8-13). The most successful published work in pediatric education has been by Schwartz (8-10), who has developed a computer-assisted medical problem-solving (CAMPS) system to teach and evaluate medical students in a pediatric rotation. Dr. Schwartz is available for consultation to clerkship directors who wish to explore his system of teaching or evaluation.

Dr. William Schwartz
Children’s Hospital of Philadelphia
34th and Civic Center Boulevard
Philadelphia, PA 19104
Tel. (215) 662-6390

Additional Comments by David O. Link, M.D.

Any report on computer-based testing (CBT) remains a “work in progress” summary. While it might be attractive to attribute the fluid nature of this CBT enterprise to ever more powerful software, real progress stems from hard work developing new cases that ever more closely mimic clinical circumstances. While transferring a multiple-choice test from paper to screen represents a simple, rather straightforward piece of software effort, developing all the ingredients necessary to present a clinical case with the look and feel of a real child is a demanding task. Nonetheless, once the elements have been assembled – the pictures, x-rays, various blood and bacteriologic images, necessary audio components, physical findings, etc. – the case takes on a remarkable authenticity.

In addition to the marvelous array of realistic information one can make available, the computerized patient encounter also can be structured to reinforce good clinical habits. For example, history questions can be required before proceeding to the physical examination. Similarly, laboratory and x-ray findings can be sequenced, and students can be shown the primary study such as the x-ray film, EKG, blood smear, etc. rather than given the “answer”. By assigning relative “costs” to selected items, one can emphasize the need to develop a working diagnostic hypothesis from low-cost information – history and physical examination – and select tests that are likely to yield useful diagnostic information.
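The sequencing and costing described above might be enforced along the following lines. This Python sketch is hypothetical: the class and exception names, the three-stage ordering, the rule that at least one item from each earlier stage must be requested first, and the cost figures are all assumptions for illustration and do not describe the Harvard software.

```python
class SequenceError(Exception):
    """Raised when an item is requested before an earlier stage is started."""


class CaseEncounter:
    # Assumed ordering of stages within a simulated case.
    STAGES = ("history", "physical", "laboratory")

    def __init__(self, catalog):
        # catalog maps item name -> (stage, relative cost); values are made up.
        self.catalog = catalog
        self.requested = []

    def request(self, item):
        """Record a request, enforcing that earlier stages were visited."""
        stage, cost = self.catalog[item]
        for earlier in self.STAGES[: self.STAGES.index(stage)]:
            if not any(self.catalog[i][0] == earlier for i in self.requested):
                raise SequenceError(f"complete some {earlier} items first")
        self.requested.append(item)
        return cost

    def total_cost(self):
        """Sum of the relative costs of everything requested so far."""
        return sum(self.catalog[i][1] for i in self.requested)


catalog = {
    "fever duration": ("history", 1),
    "lung auscultation": ("physical", 1),
    "chest x-ray": ("laboratory", 20),
}
encounter = CaseEncounter(catalog)
for item in ("fever duration", "lung auscultation", "chest x-ray"):
    encounter.request(item)
print(encounter.total_cost())
```

Requesting "lung auscultation" on a fresh encounter, before any history item, would raise `SequenceError` under these assumed rules.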

Finally, and most importantly, one can probe clinical reasoning by students throughout the case exercise. By inquiring about the student’s diagnostic hypothesis several times throughout the development of the case, one can capture the gradual narrowing of clinical hypotheses into a more certain diagnosis. Similarly, one can ask students to identify the most important items of information which lead to the diagnosis. Likewise, one can score for efficiency of data gathering, yet ensure that the student gathers sufficient low-cost data along the way. We are currently attempting to include in our data abstract information which tests students’ recognition of the value of information in reaching a diagnosis and in characterizing a patient’s severity of illness. Without sophisticated software underlying a testing program, any attempt to capture such information from a written test would either be hopeless, or entail endless amounts of work for a clerkship director.

Perhaps the most difficult and challenging feature of computer-based testing is designing a scoring algorithm. Important educational problems arise in attempting to formalize the assignment of a score in a testing situation where so many options are available to the student. Several examples are worth noting: should efficiency of data gathering be strongly rewarded? How does one assign (inevitably arbitrary) costs to various data items requested? There is general agreement that history questions and most physical examination items are very low cost. Invasive or high-tech tests – such as various imaging modalities – are quite costly in reality and may be contraindicated, or even dangerous; how should these be assigned cost? A vexing issue arises when scoring a student who makes the wrong diagnosis yet works up the patient very well for that wrong diagnosis. Should the student be penalized heavily, or should this “competing diagnosis” and its workup be scored with a relatively high value despite the incorrect answer? (Put simply, what does one do with a “very good wrong answer”?)
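To make these design questions concrete, the following Python sketch shows one possible shape for such a scoring rule. It is not the algorithm used in any existing program: the function name, the weights, the cost budget, and the partial credit for a plausible competing diagnosis are hypothetical parameters, which is where the educational judgments discussed above would enter.

```python
def score_case(
    requested_costs,
    final_diagnosis,
    correct_diagnosis,
    competing_diagnoses,
    budget,
    efficiency_weight=0.3,
    competing_credit=0.5,
):
    """Combine diagnostic accuracy and data-gathering efficiency into 0..1.

    All weights are illustrative assumptions, not published values.
    """
    if final_diagnosis == correct_diagnosis:
        diagnostic = 1.0
    elif final_diagnosis in competing_diagnoses:
        # The "very good wrong answer": partial credit, set by the faculty.
        diagnostic = competing_credit
    else:
        diagnostic = 0.0

    spent = sum(requested_costs)
    # Efficiency falls linearly as spending approaches the budget.
    efficiency = max(0.0, 1.0 - spent / budget)

    return (1 - efficiency_weight) * diagnostic + efficiency_weight * efficiency


# Made-up case: three items costing 1, 1, and 20 against a budget of 44.
right = score_case([1, 1, 20], "pneumonia", "pneumonia", {"bronchiolitis"}, 44)
near = score_case([1, 1, 20], "bronchiolitis", "pneumonia", {"bronchiolitis"}, 44)
print(round(right, 2), round(near, 2))
```

Raising `efficiency_weight` rewards frugal data gathering more strongly, while raising `competing_credit` is one way to be lenient toward a well-worked-up competing diagnosis.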

At Harvard Medical School we are currently deploying a pilot program for student testing. The model is that of a classic pediatric case presenting in the emergency room or newborn nursery. Students are given three cases to work through, drawn from six large groups of cases. Their score is based on an algorithm which reflects decisions made about all the questions noted above.

As Dr. Janco notes, validation of such a test instrument remains an enormous challenge. Even when internally validated, one must compare the score from the computer-based assessment with additional clerkship evaluation methods. Finally, the clerkship faculty must determine what to do with the score once available from the software.

Despite the challenges enumerated, CBT remains a tremendously attractive approach to clerkship testing. The cases are real and students find them attractive. They simulate real clinical circumstances very well. The test can be self-administered and scoring is performed by the software, alleviating the burden ordinarily placed on the clerkship director. Finally, there is a consistency and objectivity to the test which helps to minimize differences arising from multiple clerkship sites, different times of year, and variation in student groupings.

Those wishing to discuss the program developed by the Department of Pediatrics at Harvard Medical School in conjunction with the Laboratory of Computer Sciences at Massachusetts General Hospital should call:

David Link, M.D.
Department of Pediatrics, Harvard Medical School
c/o 1493 Cambridge Street
Cambridge, MA 02139
(617) 498-1497.

Summary and Recommendations:

The theoretical advantages and disadvantages of CAI are summarized in the table below. Educators interested in CAI should pay careful attention to issues of development time and cost. Off-the-shelf CAI suitable to individual institutions might be a wise choice for those unwilling or unable to make the commitments in development.


Advantages of CAI
Individualized instruction
Quality instruction
Time efficiency
Learning outcomes well-defined
Learner controls the process
Remediation always available
Immediate feedback

Disadvantages of CAI
Development time
Development costs
Limited choice of strategies
Limited range of media
Shift in paradigm: roles of teacher and


The following recommendations list topics or issues that must be considered in choosing available CBT, CAI or development of courseware:

  • As institutions adopt CAI for medical education, opportunities to answer research questions abound. Carefully controlled methodology and the choice of valid research questions should be considered when course directors choose to compare CAI or CBT with more traditional learning and evaluation techniques.
  • Faculty development and promotion by recognition of education research should be encouraged. Innovation in application of new technology represents a meaningful criterion for promotion of faculty. Product developers on faculty should clarify their own institutional policies on promotion and tenure.
  • Peer review and publication criteria for manuscripts describing application of new learning technologies should be standardized across journals whenever possible. Editors and reviewers should be aware of relevant methodologies, learning theory, and appropriate statistical techniques.
  • Technical standardization of software, multi-media applications, and authoring strategies will allow many users to share courseware or purchase those commercially available. Moreover, cataloguing available courseware, CBT, or CAI would allow wide choices of products for those not interested in reinventing the wheel. Comparative evaluations of commercially available products or shareware might allow rational choices among available products.
  • Copyright/shareware issues must be addressed by software/courseware developers early in their product design. Software debugging and updates represent a significant ‘after-market’ task that may become daunting. Developers must be familiar with their own institutional policies, copyright laws, and responsibilities to those purchasing their software products.


  1. Allen MJ, Yen WM. Introduction to Measurement Theory. Brooks/Cole Publishing, Monterey, CA. 1979.
  2. Lyon HC, Healy JC, Bell JR et al. PlanAlyzer, an interactive computer-assisted program to teach clinical problem solving in diagnosing anemia and coronary artery disease. Acad. Med. 67:821-8. 1992.
  3. Poses RM, Cebul RD, Wigton RS et al. Controlled trial using computerized feedback to improve physicians’ diagnostic judgements. Acad. Med. 67:345-7. 1992.
  4. Mangione S, Nieman LZ, Greenspan LW and Margulies H. A comparison of computer-assisted instruction and small-group teaching of cardiac auscultation to medical students. Med. Educ. 25:389-395. 1991.
  5. Desch LW, Esquivel MT and Anderson SK. Comparison of a computer tutorial with other methods for teaching well-newborn care. Am. J. Dis. Child. 145:1255-1258. 1991.
  6. Saliterman SS. A computerized simulator for critical-care training: new technology for medical education. Mayo Clinic Proc. 65:968-978. 1990.
  7. Sajid AW, Ewy GA, Felner JM et al. Cardiology patient simulator and computer-assisted instruction technologies in bedside teaching. Med. Educ. 24:512-517. 1990.
  8. Kost S, Schwartz W. Use of a computer simulation to evaluate a seminar on child abuse. Pediatric Emergency Care 5:202-203. 1989.
  9. Schwartz W. Using the computer-assisted medical problem-solving (CAMPS) system to identify students’ problem-solving difficulties. Acad. Med. 67:568-571. 1992.
  10. Schwartz W. Documentation of students’ clinical reasoning using a computer simulation. Am. J. Dis. Child. 143:575. 1989.
  11. Legler JD, Realini JP. Computerized student testing as a learning tool in a family practice clerkship. Family Medicine 26:14. 1994.
  12. Smith WR, et al. Computer-assisted instruction in probabilistic reasoning during the inpatient medicine clerkship. Methods Information Medicine 32:309. 1993.
  13. Lincoln MJ, et al. Iliad training enhances medical students’ diagnostic skill. J. Medical Systems 15:93. 1991.