A website by Jeffrey Veen

Brewster Kahle and the Internet Archive

26 May 2003

A while back, I attended a lecture given by Brewster Kahle of the Internet Archive project. Brewster has an interesting pedigree in the world of search and information retrieval. In the ’80s, he was behind both Thinking Machines and WAIS, though you’re probably more familiar with Alexa, the “see also” engine built into browsers. After selling that company to Amazon, he turned his attention full-time to archiving copies of the Web. To that end, he’s currently set up in the San Francisco Presidio with a bunch of terabytes of disk space, scraping the public net on a regular basis. In fact, you can browse it all through the Wayback Machine.

I was particularly struck by his opening comment: Recent studies show that children are turning to the Web not only as their primary source of information, but more and more as their only source. Yet lawmakers and enormous copyright holders are colluding to keep an alarming amount of content out of the public domain. In fact, every few years, the Disney Corporation pays millions of dollars to lobbyists to ensure its copyright protection is extended, and that extension takes with it tens of thousands of viable works that the public can no longer legally access. As Brewster put it, “If you don’t have access to the past, you live in a very Orwellian world.”

But how do we get knowledge online? If we can set aside the politics and current legislation for a moment, it turns out the task isn’t all that onerous. Brewster explained that there are an estimated 20 million individual books in the world today. Using current book scanning technology, along with a human “page turner”, it costs roughly $10 per book to get the data into digital form. Suddenly, we have a realistic project. Who would pay for it? Well, the budget for the library system in the US last year was $12 billion. That more than covers the $200 million it would take to scan every work ever published into a publicly accessible database. A terabyte of storage costs about $2,000 now. Storing the contents of the Library of Congress would take about $60,000 worth of disk, which would fit on a shelf. Brewster demoed a $5,000 kiosk that could print any book in a matter of minutes. He can drive this “Internet Bookmobile” to any neighborhood and give free books to kids. Except, of course, that it’s illegal. Mickey Mouse is still worth a few bucks, unfortunately.
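The arithmetic is easy to check for yourself. Here is a minimal Python sketch that uses only the figures quoted above. The function and variable names are mine, and the 30-terabyte figure is simply what the $60,000 and $2,000 numbers imply together, not something Brewster stated directly.

```python
# Figures as quoted in the talk (2003 dollars); names are illustrative only.
BOOKS_IN_WORLD = 20_000_000            # estimated individual books
SCAN_COST_PER_BOOK_USD = 10            # scanner plus human "page turner"
US_LIBRARY_BUDGET_USD = 12_000_000_000 # last year's US library system budget
COST_PER_TERABYTE_USD = 2_000          # disk price per terabyte
LOC_DISK_COST_USD = 60_000             # disk needed for the Library of Congress


def summarize():
    """Return the derived numbers behind the 'realistic project' claim."""
    total_scan_cost = BOOKS_IN_WORLD * SCAN_COST_PER_BOOK_USD
    return {
        # Cost to scan every book once.
        "total_scan_cost_usd": total_scan_cost,
        # Fraction of one year's library budget that the scan would consume.
        "share_of_library_budget": total_scan_cost / US_LIBRARY_BUDGET_USD,
        # Storage implied by the quoted disk cost (an inference, not a quote).
        "implied_loc_terabytes": LOC_DISK_COST_USD / COST_PER_TERABYTE_USD,
    }


if __name__ == "__main__":
    numbers = summarize()
    print(f"Scan everything: ${numbers['total_scan_cost_usd']:,}")
    print(f"Share of one year's library budget: {numbers['share_of_library_budget']:.1%}")
    print(f"Implied Library of Congress storage: {numbers['implied_loc_terabytes']:.0f} TB")
```

In other words, the whole scanning job comes to less than 2 percent of a single year's library budget.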

He went on, but my mind was starting to overflow. I did capture a few more statistics, though. There are about 2 million recorded audio works, 100,000 movies (half are from India!), and 3 million hours of television. All this stuff is even easier to digitize. And again, very little of it legally can be.

My recommendations: check out archive.org, pay a visit to EFF.org and make a donation, and follow the copyright fiasco at Lawrence Lessig’s weblog.