PG Monthly Newsletter: Error Correction of PG eBooks (2008-12-25)

by Michael Cook on December 25, 2008

Project Gutenberg Monthly Newsletter

Error Correction of Project Gutenberg eBooks

Your assistance is hereby requested:

As many of you know, I like to do something around this time every year to 
take a new step forward in Project Gutenberg.

As luck would have it, I recently received an email reminder from one of our 
volunteers who reads our eBooks out loud for those who need or want audio 
eBook versions of our library.

This volunteer was kind enough to keep a log of errors found while recording 
one of our classic eBooks out loud, sent us that list of errors, and was now 
following up.

Because we receive more error messages than we have volunteers to handle, 
these errors had not been corrected, which prompted me to write a request for 
help on this in a recent Project Gutenberg Newsletter.

The results were immediate, effective, and continuing.

The new edition, complete with ~23 corrections, is online and has been for a 
couple of days already, and we are still getting more volunteers for error 
correction.

This is a great and wonderful thing, because the one thing that sets Project 
Gutenberg apart in the history of eBooks is its everlasting, continuing 
process of improvement.

Hundreds of our eBooks are reissued each year with a variety of improvements, 
some technical, some in format and/or style of presentation, many with various 
error corrections.

How Good Can An eBook Get?

If we keep this process going for as many more years as it has been going on 
already, there is no reason average eBooks should not be as accurate as, or 
even more accurate than, books being published on paper.

Some people like to pretend Project Gutenberg eBooks that we run through 
certain processes are "perfect," but I think our own sensibilities tell us 
this is not the case.

The recent new edition mentioned above is a perfect example, as it had been 
through just about all the processes we have, and yet reading it out loud 
revealed ~23 more errors.

I would certainly hesitate to bet that our average 250-page book would not 
have ~23 errors still in it.

After all, 25 errors in 250 pages at only 1,000 characters a page would mean 
the book had 1 error per 10,000 characters, or that it was 99.99% accurate.
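
For anyone who wants to check that arithmetic, here is a minimal sketch in 
Python. The function names are made up for illustration, and the figures are 
the round numbers used above (25 errors, 250 pages, 1,000 characters a page), 
not measurements of any particular eBook.

```python
def chars_per_error(errors: int, pages: int, chars_per_page: int) -> float:
    """Average number of characters between errors."""
    return pages * chars_per_page / errors


def accuracy_percent(errors: int, pages: int, chars_per_page: int) -> float:
    """Share of characters that are correct, as a percentage."""
    total_chars = pages * chars_per_page
    return 100.0 * (1 - errors / total_chars)


# Round numbers from the paragraph above (illustrative, not measured).
print(chars_per_error(25, 250, 1000))              # 10000.0
print(round(accuracy_percent(25, 250, 1000), 4))   # 99.99
```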

I won't bore you all with numerical details, other than a quick mention that 
the earliest eBook standards were 99.9%, then The Library of Congress upped 
that to 99.95%, and a few years later Project Gutenberg raised it to 99.975%. 
I would certainly bet our average eBook that has completed all our standard 
processes is at least that good.
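
To put those percentages in terms of errors per book, the sketch below turns 
each standard into a rough error budget. It assumes the same hypothetical 
250,000-character book as before (250 pages at 1,000 characters a page); the 
labels and the function name are only for illustration. Exact fractions are 
used so the percentages do not pick up floating-point rounding.

```python
from fractions import Fraction

# Accuracy standards quoted in the text, kept as strings so that
# Fraction can parse them exactly.
STANDARDS = {
    "earliest eBook standards": "99.9",
    "Library of Congress": "99.95",
    "Project Gutenberg": "99.975",
}


def max_errors(accuracy_percent: str, total_chars: int = 250_000) -> Fraction:
    """Largest number of errors a book of total_chars may contain
    while still meeting the given accuracy percentage."""
    error_share = (100 - Fraction(accuracy_percent)) / 100
    return total_chars * error_share


for name, percent in STANDARDS.items():
    print(f"{name}: {percent}% allows up to {float(max_errors(percent))} errors")
```

By this reckoning, a 250-page book with ~23 remaining errors would already be 
inside the 99.975% standard, which allows about 62.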

However, there is always room for improvement, and that's an awfully touchy 
subject for some, but not for CEO Greg Newby, or for myself, or for a few 
others who are willing to create a new Project Gutenberg Error Correction 

Believe it or not, we have received perhaps 10,000 messages, over 37 years, 
encouraging us to check certain parts of book files for errors.

10,000 error messages!!!

We should expect to receive many more in the coming years as we will have many 
more readers.

What Makes A Project Gutenberg eBook?

As I said earlier, the greatest difference between Gutenberg eBooks and all 
others is in the proofreading.

No one spends as much time and effort on accuracy as we do.

In the end, after virtually all the easy-to-find eBooks have been created, 
there will only be error correction to do, and translations into other 
languages, with the rest grinding slowly but surely to a halt, unless 
copyright trends reverse.

There is a reason that Project Gutenberg is so widely used, particularly when 
compared to the millions of other eBooks, and that is because we work harder 
to make them better.

It takes an hour to work over the average book to correct an already existing 
list of errors. You have to get the book, then open it in a program that 
won't leave behind the various "artifacts" you often see when eBooks have 
been pumped through ill-mannered programs, and then make a final pass to make 
sure all the margins still fit, etc.

Even then, one of our "Whitewashers" has to go over the book with a final 
fine-tooth comb that pops out every character, every single character, even a 
comma, that changed from what was in the previous edition, and make sure each 
one of those changes was intentional.
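
As a rough illustration of that kind of character-by-character comparison, 
here is a minimal sketch using Python's standard difflib module. It is not 
the tool the Whitewashers actually use, which is not described here; the 
function name and the two sample strings are hypothetical.

```python
import difflib


def changed_spans(old: str, new: str):
    """List every stretch of characters that differs between two
    editions as (kind, position_in_old, old_text, new_text)."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":  # keep only inserts, deletes and replacements
            changes.append((kind, i1, old[i1:i2], new[j1:j2]))
    return changes


# Hypothetical lines from a previous and a corrected edition.
previous = "It was the best of times, it was teh worst of times"
corrected = "It was the best of times, it was the worst of times."

for kind, position, old_text, new_text in changed_spans(previous, corrected):
    print(f"{kind} at character {position}: {old_text!r} -> {new_text!r}")
```

Each reported change can then be checked against the list of corrections, so 
that anything not on the list stands out as unintentional.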

It's really not terribly easy to be the last person to work on an eBook, and 
to know that any errors you leave behind or accidentally create will be there 
for millions of readers in the world until, hopefully, the next error checker 
finds and corrects them.

It is a great responsibility, but it also carries the greatest sense of 
achievement, as you realize all the future readers, which could be billions, 
will benefit from your work.

So, I thank each and every one of our Error Checking Team in great sincerity 
for their efforts, and at the same time I am asking for new members for this 
team to step forward to make yet one more level of contribution towards 
creating the best library humanity has ever seen.

Please be encouraged to forward this message to everyone and anyone you know 
who might be interested.

Again my HUGE thanks to you all!!!!!!!

Michael S. Hart
Project Gutenberg

gmonthly mailing list

