Error Correction of Project Gutenberg eBooks

by Michael Hart on December 25, 2008

As many of you know, I like to do something around this time every year to take a new step forward in Project Gutenberg.

As luck would have it, I recently received an email reminder from one of our volunteers who reads our eBooks out loud for those who need or want audio eBook versions of our library.

This volunteer had been kind enough to keep a log of errors found while recording one of our classic eBooks out loud, had sent us that list of errors, and was now following up.

Because we receive more error messages than we have volunteers to handle, these errors had not been corrected, which prompted me to write a request for help on this in a recent Project Gutenberg Newsletter.

The results were immediate, effective, and continuing.

The new edition, complete with ~23 corrections, has been online for a couple of days already, and we are still getting more volunteers for error correction.

This is a great and wonderful thing, because the one thing that sets Project Gutenberg apart in the history of eBooks is an everlasting, continuing process of improvement.

Hundreds of our eBooks are reissued each year with a variety of improvements, some technical, some in format and/or style of presentation, many with various error corrections.

How Good Can An eBook Get?

If we keep this process going for as many more years as it has already been going on, there is no reason the average eBook should not be as accurate as, or even more accurate than, books published on paper.

Some people like to pretend that Project Gutenberg eBooks that have gone through certain processes are “perfect,” but I think our own sensibilities tell us this is not the case.

The recent new edition mentioned above is a perfect example, as it had been through just about all the processes we have, and yet reading it out loud revealed ~23 more errors.

I would certainly hesitate to bet that our average 250-page book does not still have ~23 errors in it.

After all, 25 errors in 250 pages at only 1,000 characters a page would mean the book had 1 error per 10,000 characters, or that it was 99.99% perfect.

I won’t bore you all with numerical details, other than a quick mention that the earliest eBook standards were 99.9%, that The Library of Congress then upped that to 99.95%, and that a few years later Project Gutenberg raised it to 99.975%. I would certainly bet that our average eBook that has completed all our standard processes is at least that good.
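For anyone who wants to check the arithmetic in the two paragraphs above, here is a small Python sketch. It assumes the post’s round figures of 250 pages at 1,000 characters a page and treats accuracy simply as the share of characters that are correct; the names accuracy_percent and errors_allowed are hypothetical, made up for this illustration.

```python
from fractions import Fraction

# Assumed round figures from the post: 250 pages at about 1,000 characters a page.
PAGES = 250
CHARS_PER_PAGE = 1_000
TOTAL_CHARS = PAGES * CHARS_PER_PAGE  # 250,000 characters


def accuracy_percent(errors: int, characters: int) -> Fraction:
    """Hypothetical helper: percentage of characters that are correct, kept exact."""
    return 100 * (1 - Fraction(errors, characters))


def errors_allowed(standard_percent: str, characters: int) -> Fraction:
    """Hypothetical helper: most errors a text of this length can hold and still meet a standard."""
    # Fraction parses decimal strings such as "99.975" exactly, avoiding float rounding.
    error_rate = 1 - Fraction(standard_percent) / 100
    return error_rate * characters


if __name__ == "__main__":
    # 25 errors in 250,000 characters is 1 error per 10,000 characters.
    print(f"25 errors -> {float(accuracy_percent(25, TOTAL_CHARS))}% accurate")
    for standard in ("99.9", "99.95", "99.975", "99.99"):
        allowed = errors_allowed(standard, TOTAL_CHARS)
        print(f"{standard}% standard -> at most {float(allowed)} errors")
```

Under those assumptions, the 99.9%, 99.95%, 99.975% and 99.99% levels leave room for about 250, 125, 62.5 and 25 errors respectively in a 250,000-character book. Fraction is used instead of plain floats so that values like 1 - 0.999 come out exact.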

However, there is always room for improvement, and that’s an awfully touchy subject for some, but not for CEO Greg Newby, or for myself, or for a few others who are willing to create a new Project Gutenberg Error Correction Team.

Believe it or not, we have received perhaps 10,000 messages over 37 years encouraging us to check certain parts of book files for errors.

10,000 error messages!!!

We should expect to receive many more in the coming years, as we will have many more readers.

What Makes A Project Gutenberg eBook?

As I said earlier, the greatest difference between Gutenberg eBooks and all others is in the proofreading.

No one spends as much time and effort on accuracy as we do.

In the end, after virtually all the easy-to-find eBooks have been created, only error correction and translation into other languages will remain; everything else will grind slowly but surely to a halt, unless copyright trends reverse.

There is a reason that Project Gutenberg is so widely used, particularly when compared to the millions of other eBooks: we work harder to make ours better.

It takes an hour to work over the average book and correct an already existing list of errors: you have to get the book, open it in a program that won’t leave any trace behind (such as the various “artifacts” you often see when eBooks have been pumped through ill-mannered programs), make the corrections, and then do a final pass to make sure all the margins still fit, etc.

Even then, one of our “Whitewashers” has to go over the book with a final fine-tooth comb that picks out every character (every single character, even a comma) that changed from the previous edition, and make sure each one of those changes was intentional.
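The post doesn’t say what tool the Whitewashers use for this. Purely as an illustration, a character-by-character comparison of two editions could be sketched with Python’s standard difflib module; the function changed_spans and the sample sentence are hypothetical, not Project Gutenberg’s actual tooling.

```python
import difflib
from typing import List, Tuple


def changed_spans(old: str, new: str) -> List[Tuple[str, int, str, str]]:
    """Hypothetical helper: list every span that differs between two editions.

    Each entry is (operation, position in the old text, old characters, new characters),
    where operation is 'replace', 'delete' or 'insert' as reported by difflib.
    """
    # autojunk=False so frequent characters such as spaces and commas are never
    # treated as junk, which difflib would otherwise do for long texts.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, i1, old[i1:i2], new[j1:j2]))
    return changes


if __name__ == "__main__":
    # Illustrative sample: the only change between editions is one added comma.
    previous = "It was the best of times it was the worst of times."
    corrected = "It was the best of times, it was the worst of times."
    for op, pos, before, after in changed_spans(previous, corrected):
        print(f"{op:7} at {pos}: {before!r} -> {after!r}")
```

Each reported span, even a single inserted comma, would then be checked by hand against the error list to confirm that the change was intentional.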

It’s really not terribly easy to be the last person to work on an eBook, knowing that any errors you leave behind or accidentally create will remain there for millions of readers around the world until, hopefully, the next error checker finds and corrects them.

It is a great responsibility, but it also carries a great sense of achievement, as you realize that all future readers, who could number in the billions, will benefit from your work.

So I sincerely thank each and every member of our Error Checking Team for their efforts, and at the same time I am asking new members to step forward and join this team, making yet one more level of contribution toward creating the best library humanity has ever seen.

Please feel free to forward this message to everyone and anyone you know who might be interested.

Again, my HUGE thanks to you all!!!!!!!

Michael S. Hart
Project Gutenberg
