A Guardian Netbytes article on the Project Gutenberg archives

by Michael Cook on January 15, 2008

Peruse the world’s best public library

Jack Schofield wrote this nice little article for the Guardian website yesterday. He talks about the project archives and volunteering, and includes a quote from the site about preservation.

Here is the article in full:

If you fancy a good read, Project Gutenberg has more than 22,000 books available to download, and they are all free. You can’t grab the latest blockbusters, of course. Instead the database provides access to out-of-copyright texts, which generally means pre-1923, for the US site. You can download works by Jane Austen, Charlotte Bronte, Mark Twain, Sir Arthur Conan Doyle, Joseph Conrad, HG Wells, William Shakespeare and many others, going back through Beowulf to Plato.

And if this sort of thing doesn’t appeal to you, why not contribute something? Project Gutenberg is a volunteer effort. There’s no particular logic to the selection of works. They are all texts that someone somewhere felt were worth the effort to retype or, more likely, scan in.

Project Gutenberg is participating in Yahoo’s Content Acquisition Program, but has been going far longer. It was started in 1971 by Michael Hart, who had been given $100,000,000 worth of computer time on a mainframe, and according to the project’s FAQ it began to take its current form in 1991.

A book a day

In 1991, the target was to add one book each month. By 1996 it was a book a day. Today, about 400 books are added each month.

Volunteers do more than just retype and scan books. The organisation has a volunteer chief executive, production directors, a posting team and, via a companion project, hundreds of Distributed Proofreaders. It’s a real community effort.

The Project Gutenberg database also includes some audio books, photos, videos and music files, but it’s mostly e-texts, mostly in English. It tries to make everything available in plain ASCII text, though other formats are acceptable.


The ultimate aim is preservation. The site says: “The point of putting works in the PG archive is that they are copied to many, many public sites and individual computers all over the world. No single disaster can destroy them; no single government can suppress them. Long after we’re all dead and gone, when the very concept of an ISP is as quaint as gas streetlamps, when HTML reads like Middle English, those texts will still be safe, copied, and available to our descendants.”

The Project Gutenberg website is similarly plain, with very little marketing. A commercial organisation might have recommendations – a book of the day, an author of the week, and suchlike. Instead you are left to find books for yourself by searching the database, or you can skim charts of the Top 100 books and authors yesterday, over the past 7 days or the past 30 days.

Surprisingly, perhaps, the most-downloaded books over the past month are The Outline of Science, Vol 1, by J. Arthur Thomson; Manual of Surgery Volume First: General Surgery, by Miles and Thomson; and Manners, Custom and Dress During the Middle Ages and During the Renaissance Period, by Paul Lacroix.

But if Alice in Wonderland, Machiavelli’s The Prince, or Tolstoy’s War and Peace are more your bag, Project Gutenberg has those too.
