From: Frank Durso <fdurso@ou.edu>
To: "'Swisher, Bob'" <bswisher@ou.edu>,
Subject: RE: it-fyi: Computer Program Learns to Grade Essays (Chronicle of Higher Ed)
Date: Tue, 1 Sep 1998 13:53:30 -0500
Hey Bob,
Landauer was here about 18 months ago talking about this. The article
describing the LSA algorithm appears in _Psychological Review_. It's
interesting, but note that it doesn't care anything about word order. An A
essay completely scrambled within a paragraph would still be an A essay.
My students have been arguing that for years.
Ciao,
Frank Durso
405.325.2667 voice
405.325.4737 fax
405.410.3980 mobile
Department of Psychology
University of Oklahoma
Norman, OK 73019
http://www.ou.edu/cas/psychology/html/durso.htm
-----Original Message-----
From: Swisher, Bob [SMTP:bswisher@ou.edu]
Sent: Tuesday, September 01, 1998 9:25 AM
To: 'it-fyi@listserv.ou.edu'
Subject: it-fyi: Computer Program Learns to Grade Essays (Chronicle of
Higher Ed)
How a Computer Program Learns to Grade Essays
Developers say the technology saves time and improves the assessment of
students
By KELLY McCOLLUM
To a professor facing a stack of ungraded essays, this may sound like
the story of the shoemaker's elves -- a computer program that can scan
an essay and, in a few seconds, reliably identify what material the
student has learned and what he has not. The program is real, however --
created by researchers who have spent nearly a decade developing the
technology that makes it possible.
They call the result the Intelligent Essay Assessor. And when their
program and human graders have been given the same essays to grade, the
results have been remarkably consistent, they say -- except, of course,
that grading each essay takes the program almost no time at all.
The assessor joins a host of other devices and software packages --
from bubble sheets to computer-based tests -- that professors,
universities, and testing services have used for years to streamline
student assessment. And while every announcement of a new grading
technology prompts another round of alarm among those who worry that
computers will start putting professors out of work, faculty members who
have used the technologies say such fears are exaggerated. At most, they
say, grading programs and various other high-tech offerings allow them
to take a more "modular" approach to their jobs, concentrating on what
they do best and letting machines handle some of the most tedious
chores.
In fact, the same technologies that underlie automated grading are also
making increasingly sophisticated automated tutors possible. In regular
courses and especially in distance-education programs,
computer-administered assignments can provide valuable feedback to
students -- without taking up a lot of the instructor's time.
Many academics remain skeptical of grading technologies. Andrew
Feenberg, a professor of philosophy at San Diego State University, says
companies in the educational-technology business "have been promising us
a replacement for teachers for a long time." He points to instructional
television and computer-assisted learning as technologies that have
promised much -- and disappointed many.
"When you go to college, you're looking for a much more complex
experience" than what technological teachers can provide, he says.
"Nobody knows how to write a program smart enough to actually do what
even the most mediocre teacher does when he reads and grades a paper."
But the creators of the essay assessor, which is among the most
sophisticated grading programs, say there are clear advantages to having
a computer grade essays. In a large lecture class, for example, a
professor and several teaching assistants might have to split up a stack
of essays and spend hours grading them by hand. Not only would each
grader have a different grading style, but he or she might grade each
paper on a slightly different basis. The computerized essay assessor, say
its creators, can grade each essay using exactly the same standards in a
matter of seconds. The time saved can be spent giving personal help to
students.
"We think of it as giving the professors another tool," says Thomas
Landauer, a psychology professor at the University of Colorado at
Boulder, who is leading the development of the essay grader.
To use the program, a professor must first teach it to recognize a good
or bad essay by feeding it examples of both varieties, which have been
manually graded. The program can also be "trained" with what Mr.
Landauer calls a "gold standard" -- passages from textbooks or other
materials written by experts on the same subject as the essays to be
graded. In many cases, a professor who assigns similar topics each year
could use essays from past semesters to train the program, although the
material used to do so must be in electronic form.
Earlier digital essay graders -- most notably the Project Essay Grader,
a technology that has been in development for more than 30 years -- also
work by learning how humans grade and comparing new essays to those
examples. But while that grader analyzes the sample essays mechanically
-- looking at sentence structures and counting commas, periods, and word
lengths -- Mr. Landauer says his program can actually "understand"
student writing.
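The kind of mechanical analysis attributed here to earlier graders can be
sketched in a few lines. This is only an illustration of the surface traits
the article names (commas, periods, word lengths, sentences); the function
name and the exact feature set are assumptions, not the Project Essay
Grader's actual method.

```python
import re

def surface_features(essay: str) -> dict:
    """Count surface traits of an essay without looking at its meaning."""
    # Words are runs of letters and apostrophes; punctuation is ignored here.
    words = re.findall(r"[A-Za-z']+", essay)
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "commas": essay.count(","),
        "periods": essay.count("."),
        "sentences": len(sentences),
        "words": len(words),
        "avg_word_length": avg_len,
    }

features = surface_features("Cells divide, grow, and die. Mitosis drives growth.")
print(features)
```

Nothing in these counts depends on what the essay is about, which is the
contrast the article draws with the newer program.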
It does so using a sophisticated form of artificial intelligence that
the researchers call "latent semantic analysis" to look at how words in
the reference material and sample graded essays are used in relation to
one another. By examining the words in essays it is asked to grade, the
computer can tell what subject the writer is discussing even if the
writer hasn't used the very same words as the source material.
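A toy sketch of the idea, assuming the usual textbook formulation of latent
semantic analysis (a term-by-document count matrix reduced with a truncated
singular value decomposition). The five sample passages, the choice of two
dimensions, and the helper names are all made up for illustration; this is
not the developers' code, and a real system would train on far more text.

```python
import numpy as np

# Hypothetical reference passages: three on circulation, two on finance.
docs = [
    "the heart pumps blood through arteries",
    "blood flows through arteries into veins",
    "the cardiac muscle pumps blood",
    "stocks and bonds trade on markets",
    "markets price stocks and bonds daily",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def count_vector(text):
    """Bag-of-words counts over the training vocabulary (word order is lost)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1
    return v

# Term-by-document matrix, then keep only the k strongest latent dimensions.
A = np.column_stack([count_vector(d) for d in docs])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]

def to_latent(text):
    """Fold a passage into the k-dimensional latent space."""
    return (count_vector(text) @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

essay = "cardiac muscle"
# The essay shares no words at all with docs[1] ...
raw_overlap = float(count_vector(essay) @ count_vector(docs[1]))
# ... yet lands next to it in latent space, and far from the finance text.
sim_bio = cosine(to_latent(essay), to_latent(docs[1]))
sim_fin = cosine(to_latent(essay), to_latent(docs[3]))
print(raw_overlap, round(sim_bio, 3), round(sim_fin, 3))
```

Because the input is a bag of word counts, scrambling the words of a passage
leaves its latent vector unchanged.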
That process, says Mr. Landauer, allows the computer, "to a good
approximation, to understand the meanings of words and passages of
text." The researchers get into the nuts and bolts of the process on
their World-Wide Web site (http://lsa.colorado.edu), which also offers
interactive demonstrations of the essay grader.
The program compares the patterns of word usage in ungraded essays with
the usage patterns it has learned from the initial samples. If an essay
appears to convey the same knowledge as verifiably good essays, the
computer gives it a high score. If a student's work looks similar to a
poor essay, it gets a low score. The program can also point out what a
student has omitted.
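One way such a comparison could be turned into a score is sketched below,
assuming a similarity-weighted average over the most similar graded samples.
The sample essays, the 1-to-5 grades, the function names, and the choice of
k are hypothetical, and plain word-count cosine stands in for the program's
own similarity measure, which the article does not spell out.

```python
import math
from collections import Counter

def vectorize(text):
    """Word counts for a passage; word order is discarded."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_score(essay, graded, k=2):
    """Average the grades of the k most similar samples, weighted by similarity."""
    vec = vectorize(essay)
    ranked = sorted(
        ((cosine(vec, vectorize(text)), grade) for text, grade in graded),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in ranked)
    if total == 0:
        return None  # nothing comparable; leave it to a human grader
    return sum(sim * grade for sim, grade in ranked) / total

# Hypothetical manually graded samples (text, grade on a 1-5 scale).
graded = [
    ("the heart pumps blood through arteries to the body", 5),
    ("arteries carry blood away from the heart", 4),
    ("the heart is an organ", 2),
    ("blood is red", 1),
]

strong_score = predict_score("the heart pumps blood through arteries", graded)
weak_score = predict_score("blood is red sometimes", graded)
print(strong_score, weak_score)
```

An essay that resembles the well-graded samples is pulled toward their
grades, and one that resembles the poor samples is pulled toward theirs.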
For each essay, the professor is presented with a report that gives a
score and notes any irregularities that a human grader should check on.
For example, if a student attempted to beat the computer by packing an
essay with keywords relevant to the subject matter, the grading software
would flag the essay for the professor to read.
The grader can also tell if an essay has been plagiarized from material
it has already learned, even if the student has attempted to paraphrase
the source. "It's looking for content," says Mr. Landauer. "You can
change the words, but the content's still there."
But the program cannot recognize clever turns of phrase or creative
approaches to an assignment. Mr. Landauer says it is not intended to be
used for English-composition or creative-writing assignments, in which a
student is being graded more on writing skill than on knowledge of a
subject. The essay assessor works best on essays assigned to check
students' factual knowledge in such subjects as history, political
science, economics, and the sciences. The program tests knowledge "by
the student's ability to put it into words," he says.
So far, the program has been used in middle-school and high-school
courses, at the University of Colorado and New Mexico State University,
and on essay questions from the Graduate Management Admission Test.
Peter Foltz, one of the other developers of the essay assessor, used the
program in his psycholinguistics class at New Mexico State University.
For one essay assignment, he set up a Web site through which his
students submitted their essays to the assessor, which gave them scores
and pointed out information they had missed. Then, rather than take the
first grade the computer assigned, Mr. Foltz let the students edit their
work and resubmit it as many times as they liked. He told them he would
do the final grading himself for those who didn't like the computer's
evaluation, but none of the students took him up on the offer.
Mr. Foltz, an assistant professor of psychology, sees such tutoring as
the best use for the essay grader. "Students don't get enough writing in
their classes, and this is a way to incorporate more writing where
professors don't have to evaluate every essay they see," he says. "It
frees the professor up to spend more time with the students, because the
professor's not spending all the time grading essays."
Still, San Diego State's Mr. Feenberg is concerned about "the tail
wagging the dog." If professors come to rely on computers for testing
students, he says, the limits of the technology will control what the
students are taught. "The students are being taught only how to pass the
test," he says. If an essay grader cannot monitor creative
problem-solving or deeper analysis, students will be required to do no
more than remember facts to get good grades, he says.
While the creators of the essay grader have raised eyebrows by claiming
that it can evaluate writing reliably, programs that make such claims
are far less common than is software that simply administers and scores
quizzes and tests, and can tell students where they've made mistakes.
Malcolm Duncan, an instructional-computing expert at Purdue University,
says programs like these can be even more valuable when used to tutor
students. When students leave the classroom, he says, "they may not have
gotten it in the lecture today -- but if they can go to the computer lab
and go through a tutorial session on-line, they may." Mr. Duncan says
such on-line tutorials can be as useful -- and just as marketable -- as
textbooks.
That's exactly what worries some professors who have incorporated
lectures, class materials, quizzes, and exams into on-line courses. Once
a professor stores his or her knowledge on a Web site or a CD-ROM --
whether in lecture form or in a test format -- the material could take
on a life of its own as a stand-alone on-line course. And it would not
necessarily be under the control of the professor.
"A lot of professors are not going to be all that excited about buying
somebody else's course," says Mr. Duncan, but they might buy question
banks to aid in teaching specific topics. Such an approach would mean
that the professors "still have control over how their course is taught,
but they can have some good assistance in presenting that to their
students."
Roy Rada, a professor of information systems and computer science at
Pace University, already thinks of his job as being modular, at least to
some degree. For example, he notes, while he has taught some classes
using textbooks he has written himself, more often he uses the work of
other authors. He still has to lecture, create and grade tests, and run
the classroom, but instead of creating a textbook as well, he plugs in
an outside source. He could just as easily plug in someone else's tests,
and he has already plugged in other people's grading: He asks his
students to grade one another's papers on some essay assignments.
Mr. Rada, who says he has taught on line since long before there was a
World-Wide Web, says automated tools could have their greatest impact in
distance education, in which it is hard for professors to communicate
frequently with individual students. "You can put material on the Web
and people can read it, but that's not very helpful," he says. Tools
like self-grading quizzes and the essay grader, he says, can provide
feedback to students who would otherwise feel isolated in the absence of
a traditional classroom.
But in an automated course with automated feedback, does the professor
still play a role? "The pedagogy is in the people who put it in the
system," says Mr. Rada. He says professors will always be necessary,
although they may find themselves concentrating their skills more on
some aspects of teaching and letting computers handle others. And, he
says, there will always be classrooms. "There is an enormous need for
traditional education. I don't think automated grading will have any
impact on it."
"We're social animals," he says. "We go to class every Wednesday at 6
p.m., and it suits us."
Copyright (c) 1998 by The Chronicle of Higher Education