it-fyi: Searching for the Essence of the Web (NY Times on the Web

Swisher, Bob (bswisher@ou.edu)
Mon, 12 Apr 1999 09:54:13 -0500


From: "Swisher, Bob" <bswisher@ou.edu>
To: "'it-fyi@listserv.ou.edu'" <it-fyi@lists.ou.edu>

April 11, 1999

Searching for the Essence of the World Wide Web

By GEORGE JOHNSON

Grazing through a computer screen onto the vast expanse of the World
Wide Web, one feels like an explorer perched at the edge of an endless
wilderness. It's a bit of a letdown, then, to learn how very finite the
whole place really is.

Researchers at a company called Alexa Internet, using computers to
automatically plumb the depths of this ocean of information, recently
estimated that, as of last summer, the Web was three terabytes in size
-- three trillion bytes of information, about 5,000 CD-ROM's. Just about
the whole thing would fit onto Sun Microsystems' top-of-the-line
StorEdge A7000 Intelligent Storage Server, an array of speedy hard-disk
drives occupying less than 150 cubic feet. This cyberspace that people
have been romping around in could be squeezed inside a bedroom closet.
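
As a rough check on the CD-ROM comparison, the arithmetic can be sketched as follows. The capacity of 650 million bytes per disc is an assumption (the article does not state a disc size), and "three trillion bytes" is read as a decimal terabyte figure.

```python
# Rough check of the article's size comparison.
# Assumption: one CD-ROM holds about 650 million bytes (not stated in the article).
WEB_BYTES = 3 * 10**12        # "three trillion bytes"
CD_ROM_BYTES = 650 * 10**6    # assumed capacity of a single disc


def discs_needed(total_bytes: int, disc_bytes: int) -> int:
    """Return the number of whole discs needed to hold total_bytes."""
    return -(-total_bytes // disc_bytes)  # ceiling division on integers


if __name__ == "__main__":
    print(discs_needed(WEB_BYTES, CD_ROM_BYTES))
```

Under that assumption the result is a little over 4,600 discs, the same order as the article's "about 5,000"; a slightly smaller assumed capacity would bring the two closer.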

But it's not the size of the Web that matters. As the world is
increasingly coming to appreciate, physical space and cyberspace operate
according to different rules. In what they describe as a new science of
Webology, computer scientists at the Xerox Palo Alto Research Center in
Silicon Valley recently funneled a large portion of the Web, about 55
million pages (leaving out the pictures), onto 400 billion bytes of disk
space. Held in captivity in Palo Alto, this Web in a Box is poked and
prodded, studied like a great beast -- or, to use the metaphor the
researchers prefer, like an ecosystem.

With the help of this simulation, and by probing the real, living Web
with electronic signals, they seek laws by which the members of the
planetary community of Internet foragers compete and cooperate in the
constant search for information. The Internet has become a living
laboratory, a place to study mass human behavior with a precision and on
a scale never possible before.

"No central authority has cultivated the Web as a beautiful garden,"
said Dr. Bernardo Huberman, an Internet ecologist at Xerox PARC. "It
grows on its own like an ecosystem." Informavores hunting down an
interesting site link it to their own, and that site is soon linked to
others, forming a vast spider web of connections.

"The sheer reach and structural complexity of the Web makes it an
ecology of knowledge, with relationships, information 'food chains,' and
dynamic interactions that could soon become as rich as, if not richer
than, many natural ecosystems," Dr. Huberman wrote in a paper last year
with his colleagues Peter Pirolli, James Pitkow and Rajan Lukose.

But it is hard to find the right metaphor for something so strange.
Viewed in real time, with data seekers buzzing from site to site, the
Web can seem like a swarm of virtual insects, one whose flutterings (in
the form of mouse clicks) can be recorded and sifted for clues to
behavioral laws.

"We are not doing computer science," Dr. Huberman said, "but something
more akin to social science." What strategies do people use to hunt down
information? Why, for no apparent reason, do storms of activity suddenly
surge through the Internet, causing the whole thing to grind to a halt?
And why, just as mysteriously, do these information fronts suddenly
subside?

Ever since the Web began to burgeon, barely under human control, people
have been straining to relate it to something familiar -- an ecosystem,
the weather, an unruly crowd at a rock concert. The Web is a great ocean
on which you surf from site to site. It's a cyberspace with a topology
of its own: Two points distant in physical space can be adjacent in
cyberspace, a single mouse click away. But an E-mail message sent in an
instant to a neighbor next door might be routed through a maze of links
extending thousands of miles.

Lada Adamic, a Stanford University graduate student working on Xerox
PARC's Internet ecology project, recently found that cyberspace, like
the world described in the John Guare play "Six Degrees of Separation,"
is a small place indeed. Just as any two people on Earth are said to be
connected by a human chain of acquaintance with no more than a few
links, so can you pick two Web sites at random and get from one to the
other with about four clicks.
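
The idea of counting the clicks between two sites can be illustrated with a breadth-first search over a link graph. This is only a sketch on a made-up toy graph: the site names, the links, and the function names are hypothetical, and it is not the method or the data used in the Xerox PARC study.

```python
from collections import deque

# Hypothetical toy link graph: each site maps to the sites it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": ["a"],
}


def clicks_between(links, start, goal):
    """Return the fewest clicks from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        site, clicks = queue.popleft()
        if site == goal:
            return clicks
        for nxt in links.get(site, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, clicks + 1))
    return None


def average_clicks(links):
    """Average click distance over all ordered pairs of distinct, connected sites."""
    distances = []
    for source in links:
        for target in links:
            if source == target:
                continue
            d = clicks_between(links, source, target)
            if d is not None:
                distances.append(d)
    if not distances:
        return None  # no connected pairs to average over
    return sum(distances) / len(distances)


if __name__ == "__main__":
    print(clicks_between(LINKS, "a", "e"))
    print(average_clicks(LINKS))
```

Breadth-first search visits sites in order of distance, so the first time it reaches the goal it has found the shortest chain of links. On a graph of millions of pages one would sample pairs rather than try them all.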

The research quantifies what Web users intuitively know: Because of the
high density of connections, it can be surprisingly easy to find
information in what amounts to a library without a card catalog, filled
with unindexed books.

The thunderstorms of congestion on the Net, another study found, can be
analyzed in terms of crowd behavior. (Meteorology, sociology -- the
metaphors inevitably clash.) Sudden clots of congestion can sometimes be
traced to obvious causes, like the recent virtual lingerie show of
Victoria's Secret. More often they arise and quickly dissipate for
obscure reasons best understood using what social scientists call game
theory.

You log on to the Internet and find the playing field uncrowded. With
Web sites popping up as quickly as you touch their links, you click more
and more, downloading video files and sound tracks with little regard
for the capacity, or "bandwidth," you are consuming. Millions of other
players are selfishly doing the same. Inevitably the activity reaches a
threshold and connection speeds start to crawl.

Should you stay around, knowing that others will soon give up in
frustration, leaving you more room? Or will you gain in the long run if
you help relieve the congestion, logging off until the storm has
probably blown by? You must decide, in terms of game theory, whether to
defect from the common good or cooperate.

The result is a classic social dilemma, a vastly larger-scale version of
what happens when you are confronted with a steady busy signal at the
theater box office and must decide whether to call back later or set
your phone on constant redial. Short spikes of congestion are followed
by lulls -- a pattern that can be predicted statistically and verified
by "pinging" the Net, as the engineers say, bouncing thousands of
packets of information off a particular site and timing in milliseconds
how long they take to return.
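
Once such round-trip times have been collected, picking out the short spikes from the lulls is a small exercise. The sketch below flags samples that exceed a multiple of the median; the threshold rule, the function name, and the sample values are all assumptions made for illustration, not measurements or methods from the study.

```python
from statistics import median


def find_spikes(rtts_ms, factor=3.0):
    """Return the indexes of samples whose round-trip time exceeds factor * median."""
    if not rtts_ms:
        return []
    threshold = factor * median(rtts_ms)
    return [i for i, rtt in enumerate(rtts_ms) if rtt > threshold]


# Hypothetical round-trip times in milliseconds, made up for illustration.
SAMPLES = [40, 42, 41, 39, 400, 420, 43, 40, 41, 38]

if __name__ == "__main__":
    print(find_spikes(SAMPLES))
```

The median is used as the baseline because a few very slow packets barely move it, whereas they would pull an average upward and hide the spike.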

From measuring millions of mouse clicks, another study has derived a
mathematical "law of surfing" predicting how many pages one typically
visits within a single Web site -- about 1 1/2, a finding that has been
of keen interest to Internet entrepreneurs.

As the Web continues to grow exponentially (with everyone someday as
likely to have a Web page as a street address), it will become an ever
richer distillation of human behavior. Even the dead, discontinued pages
will be around for scholars to scrutinize. A group called the Internet
Archive in San Francisco has collected and stored on disks and tapes
over a billion Web pages, exceeding 13 terabytes. (The entire Library of
Congress has been estimated to contain 20 terabytes of text.) The plan
is to provide snapshots, year by year, of just what the great
terrestrial brain has been thinking.

Related Sites

Xerox PARC Internet Ecologies Area.
(http://www.parc.xerox.com/istl/groups/iea/)
The Internet Archive. (http://www.archive.org/)