eBooks Versus Digital Picture Books

by Michael Hart on December 23, 2007
News

eBooks as I invented them are what you would get if you sat at your computer and typed in a book the same way a person types in anything else. However, most of the news items that claim a million books are talking about pictures of books rather than actual computer characters.

Here is the difference:

When you hear about a megabyte, that is a bunch of one million computer characters…just what we would get if we typed in a million characters like these.

A gigabyte is a billion characters, or bytes.

A terabyte is a trillion characters…etc.

This month terabyte drives have been around $200, very cheap.

A million eBooks with a million characters each makes a library of eBooks that just fills up a terabyte as long as these are real computer characters, such as the ones you and I type in to create these eMails.

Footnote: if you use .zip compressed files you can fit 2.5 million eBooks per terabyte. This does not work on the digital picture books explained below.
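
The arithmetic above can be checked with a short Python sketch. It assumes one byte per character and a .zip ratio of 2.5 to 1, which is simply the ratio implied by the 2.5 million figure; the names used are only for illustration.

```python
# Back-of-the-envelope storage arithmetic, using the round numbers from the text.
BYTES_PER_CHARACTER = 1            # assumption: plain single-byte text
CHARACTERS_PER_EBOOK = 1_000_000   # a million characters per eBook
TERABYTE = 10**12                  # a trillion bytes

# Assumption: .zip shrinks plain text by about 2.5 to 1,
# the ratio implied by the footnote's 2.5 million figure.
ZIP_RATIO = 2.5


def ebooks_per_terabyte(compressed: bool = False) -> int:
    """Return how many million-character eBooks fit in one terabyte."""
    bytes_per_ebook = CHARACTERS_PER_EBOOK * BYTES_PER_CHARACTER
    if compressed:
        bytes_per_ebook = bytes_per_ebook / ZIP_RATIO
    return int(TERABYTE // bytes_per_ebook)


print(ebooks_per_terabyte())                  # 1000000
print(ebooks_per_terabyte(compressed=True))   # 2500000
```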

You can search every character of these eBooks, and put quotes from them into your eMail, your word processors, or any other text-oriented programs in the world. Your programs can do spell checking, indexing, concordancing and any number of other things usually associated with the way people use books.
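
As a rough illustration of what "searchable" means here, a few lines of Python can find every occurrence of a phrase in plain text and pull out the surrounding words as a quotation. This is only a sketch: the function name and the one-sentence sample are made up for the example, and a real eBook would be read from a file.

```python
import re


def find_quotes(text: str, phrase: str, context: int = 30) -> list[str]:
    """Return each case-insensitive match of phrase with some surrounding text."""
    snippets = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        snippets.append(text[start:end])
    return snippets


# A tiny stand-in for the full text of an eBook.
sample = (
    "But, soft! what light through yonder window breaks? "
    "It is the east, and Juliet is the sun."
)

for snippet in find_quotes(sample, "juliet", context=10):
    print(snippet)
```

The same loop works unchanged on a whole book, because the book is nothing more than a long string of characters.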

BUT!

Not so with digital picture books.

The Million Book Project, Google and all the others who tout having a million books are talking about pictures, pictures of books, but NOT characters, NOT what we type, here and now, creating our eMails, but what you would get if you were to take a digital photograph of characters, digital photographs of books, or of anything else.

A picture is harder for computers to use than letters.

It is just that simple.

Each page has to have its own picture in its own file.

Way too many files.

The better the file looks, the bigger it is.

Small picture files look “pixellated,” and it is obvious that they are computer pictures.

Big picture files look better but take so much longer to download and to process that “flipping pages” isn’t really in the cards…but with real eBook pages your pages can go by as fast as you like, and all the pages can easily be in one single file.

One file…versus…250 files…per book.

If you’ve ever opened a directory with many thousands of files in it, or even fewer, you know what is coming….

It takes a computer an enormous effort to handle a seriously large number of files.

The more files, the more of the computer is wasted.

Let’s take the 1.5 million books mentioned in the news being covered at this very moment.

Presuming 200 pages per book that’s 300 million pages!

Can you grasp what your computer would be like if your 1.5 million books took up 300 million files?!?!?!?!

Unless you had a much better computer than most people, your computer would be seriously dragging with the load!

Just try downloading a book from Project Gutenberg and a book from The Million Book Project and the difference will be obvious…be sure to use a stopwatch.

Project Gutenberg sends out over one eBook per second, and your eBook shouldn’t take any more time than your own network delay…if you have a very quick Internet connection you can pick a collection of eBooks and get them at just about one book per second…though most connections, sad to say, have so much time lag that you would not usually notice how fast the eBooks went out.

Then try downloading a Million Book Project entry.

See what I mean?

It’s only obvious if you try.

You could literally download 1.5 million eBooks of the Project Gutenberg variety to your terabyte drive.

No problem.

Plenty of space to spare for another million books, if your fancy should be tickled.

Now let’s consider The Million Book Project, Google or any other of the “digital pictures of books” variety.

Each file is going to be sent separately.

They don’t really go out of their way to make it easy, because they don’t really want you to OWN their books. What they really want is to get you to read over their shoulders, so to speak…they keep the books and you never get to actually own them.

Not so with Project Gutenberg.

You download 100,000 of our eBooks…you own them!!!

And it doesn’t take all your computer power to do it!

300 Million Pages!!!

Even at about 10 pages per second it takes a year!!!

Huh???

Think about it… .

Your computer network probably runs 30 million seconds per year, presuming a few percent downtime, for normal maintenance, upgrades, network outages, etc.

IT WOULD TAKE ONE ENTIRE YEAR TO GET 1.5 MILLION BOOKS

If these books were “digital picture books.”
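
For anyone who wants to check the figures, here is the same calculation as a small Python sketch. The numbers are the ones presumed above (1.5 million books, 200 pages each, 10 pages per second); the variable names are only for illustration.

```python
BOOKS = 1_500_000
PAGES_PER_BOOK = 200                     # presumed above
PAGES_PER_SECOND = 10                    # the download rate presumed above
SECONDS_PER_YEAR = 365 * 24 * 60 * 60    # 31,536,000 with no downtime at all

total_pages = BOOKS * PAGES_PER_BOOK                 # one picture file per page
download_seconds = total_pages / PAGES_PER_SECOND
years = download_seconds / SECONDS_PER_YEAR

print(total_pages)        # 300000000
print(download_seconds)   # 30000000.0
print(round(years, 2))    # 0.95
```

The 0.95 is the fraction of a full, uninterrupted year, which matches the 30 million seconds of a network that is down a few percent of the time.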

Get the “picture?”

Then think about how to store 300 million page files.

Get the picture?

Then think that you can’t cut and paste quotations.

Or even search for quotations.

Or eMail the quotations.

Or correct typos in the books.

Or use them in Microsoft Word when writing.

ALL you can do is READ them… .

Which is certainly nice… .

But it is no advantage over paper books other than in the sense of reading at a distance.

You still have to type in all your notes, quotations, and anything else.

Let’s say you were doing Romeo and Juliet in theater.

You want to make up scripts for your cast.

You want to individualize the scripts for your play– each player only needs certain portions.

Not all portions are usually included; editions differ about this.

You can tailor-make your script from a dozen editions simply by cutting and pasting the parts you want.

Then you can cut and paste up each player’s script.

Daily changes, corrections, stage instructions can be included in each printout.
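
With real characters, even the cutting and pasting can be handed to the computer. The Python sketch below pulls out one player's speeches from a plain-text script. It assumes a hypothetical layout in which each speech sits on one line beginning with the speaker's name in capitals and a period; real editions vary, so the matching rule would need adjusting for each one.

```python
def lines_for_player(script: str, player: str) -> list[str]:
    """Collect the speeches of one character from a plain-text script.

    Assumes a hypothetical layout: one speech per line, each starting
    with the speaker's name in capitals followed by a period.
    """
    prefix = player.upper() + "."
    speeches = []
    for line in script.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            speeches.append(line[len(prefix):].strip())
    return speeches


script = """\
ROMEO. But, soft! what light through yonder window breaks?
JULIET. O Romeo, Romeo! wherefore art thou Romeo?
ROMEO. Shall I hear more, or shall I speak at this?
"""

for speech in lines_for_player(script, "Juliet"):
    print(speech)
```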

Try THAT with a “digital picture book.”

Hopefully your imagination has been sufficient to see what I have been trying to explain here, but you will find me easily available to clarify, if needed.

Part II

We actually tried The Million Book Project’s eBooks a bit since the big announcement came out November 29– and have a few more things to report as a result.

First, we should report that this is a renamed effort started at Carnegie Mellon back in the heyday of what turned out to be the beginning of major interest in the eBook world by world-class universities.

Sad to say, this project never fulfilled the dreams of its creators in either the number or quality of an eBook collection, hence every few years announcements of a “new” version of this project came out trying to generate new interest, or financial support, etc. The result is that most of the books we looked at for our sample had missing pages, fuzzy pages [the computers and humans both have trouble reading these], or pages that were not photographed straight [which causes the same reading problems as above].

It would be an overstatement to say that 1.5 million books, or even close to it, were available to download, at least in the sense of what you would get from your local bookstores, libraries, etc.

It is one thing to say an eBook [computer characters] is 99.99% accurate to the original and that this will move to 99.999% and then 99.9999% in the future, but, it is totally something else to say that a collection of books has 99% of its pages in decent formats but a 1% portion of the pages is missing or unreadable.
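
To put rough numbers on that comparison, here is a small Python sketch. The book sizes (a million characters, 200 pages) are the round figures used earlier in this article, applied here only as an illustration; exact decimal arithmetic is used so the percentages do not pick up rounding noise.

```python
from decimal import Decimal


def items_lost(total_items: int, percent_good: str) -> int:
    """Return how many items are wrong or missing at a given percentage."""
    percent_bad = Decimal(100) - Decimal(percent_good)
    return int(total_items * percent_bad / 100)


CHARACTERS_PER_EBOOK = 1_000_000   # round figure used earlier in the article
PAGES_PER_BOOK = 200               # round figure used earlier in the article

# Wrong characters in a typed eBook at each accuracy level.
for accuracy in ("99.99", "99.999", "99.9999"):
    print(accuracy, items_lost(CHARACTERS_PER_EBOOK, accuracy))

# Missing or unreadable pages in a picture book with 99% of its pages good.
print("99", items_lost(PAGES_PER_BOOK, "99"))
```

A hundred wrong characters can still be read around and fixed one at a time; two missing pages cannot.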

Obviously these comments are only from small samples, but it appears from such samples that the process had been totally automated and that no human being looked to see that each page had actually been scanned in an accurate enough way to read, or scanned at all.

However, it is also obvious that LATER someone looked and marked the pages as missing.

Why this couldn’t have been arranged for at scanning, and then corrected while the book was at the scanner, is beyond me.

[Note, we are not experts in Chinese, though we quite literally are just starting a few eBook projects from Chinese books, so we are just guessing that the notes we found where pages are missing, were telling us the pages were missing.]

While no one expects perfection from new projects, the project in question is not new, a fact barely referenced in the press release, and then only by an old name for the project. It will be difficult in the extreme, too difficult to imagine being done well or soon, for someone to dig up the missing and unfocused pages. In fact, it might take as much effort as the scanning of the original books…a 100% reinvestment of time.

With eBooks it is trivial to correct most mistakes in the books: you just put the book in a word processor, correct the error, and save the new file.

Anyone on any computer can correct eBook errors.

But errors in “digital picture books” are hard to fix in that sense. You would actually have to “photoshop” the page the way they do for movies to fix an error, which is very time consuming and labor intensive, and not something most people could do on their own computers.

Thus we should not expect much improvement in digital picture books, while we should expect improvements of a significant nature in eBooks.

Just another of the differences that put these books, often confused with each other, into separate camps.

Summary

An eBook and a digital picture book are different.

eBooks are small files, needing only 2.5 megabytes to store a 2.5 million character book…or one megabyte, if you use compressed files such as a .zip file.

Digital picture books are large, one file per page to start with, and the files do not compress well with tools such as .zip. That many files are not easy for most computers to handle, which wastes hard drive space, clutters the directory structure, and adds to the time it takes to flip through the pages.
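
The compression point can be tried out with Python's zlib module, which uses the same DEFLATE method commonly found inside .zip files. This is only a sketch under loud assumptions: the "text" is one sentence repeated, which compresses far better than a real book would, and the "picture" is random bytes standing in for an already-compressed page image.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Stand-in for an eBook: a repeated sentence (much more repetitive than a real book).
text_bytes = ("A million eBooks with a million characters each. " * 2000).encode("ascii")

# Stand-in for a page image: random bytes, which behave like already-compressed data.
picture_like_bytes = os.urandom(len(text_bytes))

print(f"text:         {compression_ratio(text_bytes):.3f}")
print(f"picture-like: {compression_ratio(picture_like_bytes):.3f}")
```

The text shrinks to a small fraction of its size, while the picture-like data does not shrink at all.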

eBooks are trivial to correct. Just bring them up in your favorite word processor, any will do, and fix up whatever errors you find. You can even include notes about the errors, or about anything else, in seconds.

Digital picture books are impossible to correct, with reference to the average computer owner. It takes an expertise with “Photoshop” or similar programs and we do not have that expertise en masse to do corrections in the same way the average computer user can fix any error they find in a regular eBook.

The time it takes to download eBooks is short.

The time it takes to download picture books is long.

It takes one terabyte to store a million eBooks.
[2.5 million if you use .zip files] All for $200.

It would take more hard drives than anyone I know of has to store a million digital picture books of a variety such as The Million Book Project or Google.

We are talking about a seriously major investment.

Personal Computers As Personal Libraries

The average personal computer today is under $500.

Adding a terabyte might add an average of $200.

That easily gives you space for a million eBooks.

Want two million? Just add another terabyte.

They are out there, free for the downloading.

If you simply presume “digital picture books” are ten times the size in disk usage, it moves the picture to something only 1% of computer users might consider to be possible, given either their physical space or the limitations of their budgets.

If you presumed the digital picture books were larger by a factor of 100 times, then a personal library can be said to be the stuff only of the elite.

eBooks were designed to be for everyone to own.

Digital picture books were designed for major players to own, not for the rest of us to do more than read over their shoulders now and then.

“eBooks” are designed to be incorporated into eMails, school papers, research papers, or new editions. Just make your changes to the old editions, and you should be ready to publish your own edition of any one of an assortment of millions of free eBooks out there.

Try that with a “digital picture book!”

eBooks were designed to be searchable in seconds.

You can download The Complete Shakespeare in a minute and then search it as many times as you like, seconds or less for each search.

Try that with a “digital picture book.”

The difference should be clear… .

Thank You!!!

Give the world eBooks for 2008!!!

Michael S. Hart
Founder
Project Gutenberg

100,000 eBooks easy to download at:
http://www.gutenberg.org [already passed 25,500 eBooks]
http://www.gutenberg.cc [already passed 75,000 eBooks]
http://gutenberg.net.au Project Gutenberg of Australia 1570+
http://pge.rastko.net 65 languages PG of Europe ~500
http://gutenberg.ca Project Gutenberg of Canada

Blog at http://hart.pglaf.org
