EPUB books now available at Project Gutenberg

by Michael Cook on March 20, 2009
PG News

EPUB Books at PG

It was only a few months ago that Project Gutenberg announced an effort to make mobile editions of all their titles available. This was big news; however, in my eyes the latest eBook format to be released by PG is even bigger news.

Project Gutenberg has now made almost all their titles available in EPUB, the industry-standard eBook format, and all of them are DRM-free (free of Digital Rights Management)!

Although EPUB has only been adopted as an eBook standard within the last 12 months, it has been embraced by many big names, including Sony, Google, Penguin, HarperCollins and Adobe, to name but a few. There are also many EPUB readers, both software and hardware, that can read books in this format.

For all you gadget lovers, you can read EPUB-formatted books on:

  • iPhone and iPod Touch using the very popular Stanza Reader
  • Google Android and other Linux-based mobile devices using FBReader
  • Sony Reader PRS-505 and PRS-700
  • Apple iPad with many apps: iBooks, Bluefire, Kobo, etc.

Although the Amazon Kindle does not read EPUB files natively, several popular programs, such as Calibre, will convert our EPUB files so that they can be read on your Kindle.

There are also a number of desktop readers, such as the wonderful Calibre eBook management program and the Stanza Desktop reader.

Online readers such as the excellent ibisReader/Bookworm (an online reading application hosted by O’Reilly) allow you to upload your own EPUB books and read them from any computer or mobile device that has a web browser and an internet connection. This even includes the Amazon Kindle!

Project Gutenberg Experimental EPUB

It must be stated that, at this time, the PG EPUB books should be considered experimental. Converting the entire PG collection is a huge task, so many files may be buggy or may not work at all.

The EPUB files are generated automatically from the HTML version if there is one; otherwise the plain text file is used. In that case the conversion program must guess at the structure of the text, so the EPUB book is likely to contain some formatting errors, such as verse lines running together or paragraphs being marked as headers. Still, they are very readable.
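To show why guessing at structure from plain text produces exactly these kinds of errors, here is a minimal Python sketch of a naive plain-text converter. Its two rules (blocks are separated by blank lines; a short, all-capitals, single-line block is a heading) and the function name `guess_blocks` are hypothetical illustrations, not the rules PG’s converter actually applies.

```python
import re


def guess_blocks(text: str) -> list[tuple[str, str]]:
    """Split plain text into (kind, content) blocks using naive rules.

    Hypothetical rules for illustration only:
      * blocks are separated by one or more blank lines;
      * a single-line block under 60 characters in capitals is a heading;
      * anything else is a paragraph whose wrapped lines are joined.
    """
    blocks = []
    for raw in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
        if not lines:
            continue
        if len(lines) == 1 and len(lines[0]) < 60 and lines[0].isupper():
            blocks.append(("heading", lines[0]))
        else:
            # Joining wrapped lines repairs prose, but it also runs verse
            # lines together, because the rule cannot tell the two apart.
            blocks.append(("paragraph", " ".join(lines)))
    return blocks


sample = (
    "CHAPTER I\n\n"
    "It was a dark and\nstormy night.\n\n"
    "Roses are red,\nViolets are blue."
)
for kind, content in guess_blocks(sample):
    print(kind, "|", content)
```

On this sample the heading is found and the wrapped prose is rejoined correctly, but the two verse lines are merged into a single paragraph. A short line of prose typed in capitals would likewise be promoted to a heading.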

EPUB eBook Reading Software

There are a number of other readers out there, so you might want to search around to find your preferred software.

Thomas Krebs April 15, 2009 at 7:09 am

Many German EPUB documents are unfortunately not yet readable (in Adobe Digital Editions or on the Sony PRS-700) due to the bad representation of German special characters (“Umlaute” ä, ö, ü and “scharfes s” ß), for example in “Japanische Märchen” (EText-No. 23393). The HTML version works fine, but it has to be converted with Calibre before loading it onto the reader!

Mike Cook April 15, 2009 at 10:26 am

Hi Thomas,

That’s quite a big problem for German texts, isn’t it? Thanks for letting me know; I’ll pass this info along to Marcello. Hopefully it won’t be a big task to fix.

ronnie sahlberg April 16, 2009 at 10:43 pm

Great stuff.

I have tried a lot of books and most work well. There are some snags, though, so it would be useful if there were a mailing list where issues and improvements could be discussed.

For example, the converter could use better heuristics to detect what is a page number, so that it is filtered out or handled differently instead of just becoming part of the text.
Example: http://www.gutenberg.org/etext/28473
This book, as well as a few hundred other books at Gutenberg, uses:
<span class='pagenum pncolor'><a id='page_30' name='page_30'></a>30</span>
to mark the page number in a box at the far right of the HTML page.
The conversion heuristics cannot yet handle this, so the “30” appears in the actual paragraph text.
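The page-number heuristic suggested above can be sketched with Python’s standard-library `html.parser`: while extracting paragraph text, skip everything inside a `<span>` whose class list contains `pagenum`. The class `PagenumAwareText` and the function `paragraph_text` are hypothetical names for illustration; this is not how PG’s converter works.

```python
from html.parser import HTMLParser


class PagenumAwareText(HTMLParser):
    """Collect text, skipping the contents of <span class="pagenum ...">."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a page-number span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.skip_depth:
            self.skip_depth += 1  # a span nested inside the page number
        elif "pagenum" in (dict(attrs).get("class") or "").split():
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def paragraph_text(html: str) -> str:
    parser = PagenumAwareText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace left behind where the page number was removed.
    return " ".join("".join(parser.parts).split())


snippet = ("<p>and so the<span class='pagenum pncolor'>"
           "<a id='page_30' name='page_30'></a>30</span> story went on</p>")
print(paragraph_text(snippet))
```

Here the “30” is kept out of the sentence, which reads “and so the story went on”. A real converter might keep the `page_30` anchor rather than discard it; the sketch only shows how the number can be kept out of the running text.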

Page breaks:
The book in the example above uses
hr.ppg-pb {border:none;page-break-after: always;}
<hr class='ppg-pb' />
to enforce a page break after the horizontal rule.
Currently, the conversion does not pick up where page breaks are encoded in the HTML.
It would be great if the conversion could detect where “page-break-after: always” is active and enforce a hard page break at those spots.
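Picking up those page breaks could work in two steps: find the CSS classes whose rule contains `page-break-after: always`, then split the HTML after every `<hr>` carrying one of those classes, so that each piece can become its own file in the EPUB (most reading systems start each file on a new page). The sketch below uses plain regular expressions, which only cope with simple selectors such as the `hr.ppg-pb` rule quoted above; the function names are hypothetical, and a robust converter would use a real CSS and HTML parser.

```python
import re


def page_break_classes(css: str) -> set[str]:
    """Return class names whose rule sets 'page-break-after: always'.

    Naive: handles only simple selectors like 'hr.ppg-pb' or '.ppg-pb'.
    """
    classes = set()
    for selectors, body in re.findall(r"([^{}]+)\{([^}]*)\}", css):
        if re.search(r"page-break-after\s*:\s*always", body, re.I):
            for sel in selectors.split(","):
                m = re.search(r"\.([\w-]+)\s*$", sel.strip())
                if m:
                    classes.add(m.group(1))
    return classes


def split_at_page_breaks(html: str, classes: set[str]) -> list[str]:
    """Cut the HTML just after each <hr> carrying one of `classes`."""
    if not classes:
        return [html]
    names = "|".join(re.escape(c) for c in sorted(classes))
    # The lookarounds make sure 'ppg-pb' does not match inside 'ppg-pb-x'.
    hr = re.compile(
        r"<hr\b[^>]*\bclass\s*=\s*['\"][^'\"]*(?<![\w-])(?:" + names +
        r")(?![\w-])[^'\"]*['\"][^>]*>",
        re.I,
    )
    pieces, start = [], 0
    for m in hr.finditer(html):
        pieces.append(html[start:m.end()])
        start = m.end()
    pieces.append(html[start:])
    return [p for p in pieces if p.strip()]


css = "hr.ppg-pb {border:none;page-break-after: always;}\np {margin: 0}"
html = "<p>End of part one.</p><hr class='ppg-pb' /><p>Part two.</p>"
print(split_at_page_breaks(html, page_break_classes(css)))
```

With the rule from this book, the text is cut into two pieces right after the `<hr class='ppg-pb' />`, and each piece could then be written out as a separate file.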

Jonathan Moore January 16, 2010 at 3:38 am

The Barnes & Noble Nook also supports ePub format natively.

Craig April 23, 2010 at 12:32 pm

iBooks on the iPad (and on the iPhone/iPod Touch when the 4.0 OS is released this summer) also supports the ePub format; just drag the file into the upper-left area of iTunes.

David Hendrickson May 14, 2010 at 3:28 pm

I appreciate the availability of epub-formatted books on PG.

I do have one complaint, however. Earlier implementations of your converter produced flush-left, ragged-right output. Now the output is justified, which makes the books close to useless on small e-readers.

It is never necessary to justify e-texts. Justification looks nice on a printed page, but creates more problems than it solves in an electronic format. It is fast becoming an anachronism. It is silly for one medium to mimic another. You give the impression that you’re just trying to curry favor with sentimental book-lovers. I love my many, many books but have no need to have my e-texts mimic the printed page.

Mike Cook May 23, 2010 at 4:25 am

@David, I had a chat with Marcello about this and he says that if the source HTML file has justified text then the EPUB version will also have justified text.

Also, this comes down to personal taste, which could be debated endlessly. You either like it or you don’t. In the future, eReaders will hopefully allow this to be overridden.

a dude January 20, 2011 at 12:59 am

Has anybody tried reading these EPUBs with a Sony Reader? The main reason I didn’t buy a Kindle is that it doesn’t support EPUB, but when I try to download an EPUB from PG to my Sony Reader, I just get a “file format not supported” error.

Mike Cook January 20, 2011 at 10:16 am

There is a limitation with devices like the Sony Reader: they cannot handle chapters (the HTML files) larger than 300KB. The last time I checked, PG created these EPUBs without splitting the chapters into separate HTML files, so many are over this 300KB limit.
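Working around that limit means splitting a long chapter into several smaller HTML files. Here is a minimal Python sketch that groups a chapter’s paragraphs into chunks whose UTF-8 size stays under a byte budget. It assumes the limit is 300 × 1024 bytes per file and that chapters can be cut between paragraphs; the function name `split_chapter` and the `overhead` allowance for each file’s XHTML boilerplate are hypothetical, not part of any device specification.

```python
# Assumption: "300KB" is read as 300 KiB per HTML file; a given device
# may enforce a slightly different figure.
LIMIT_BYTES = 300 * 1024


def split_chapter(paragraphs: list[str], limit: int = LIMIT_BYTES,
                  overhead: int = 0) -> list[list[str]]:
    """Group paragraphs into chunks whose encoded size stays under `limit`.

    `overhead` reserves room for the XHTML header and footer each file
    needs. A single paragraph larger than the budget still gets a chunk
    of its own, because this sketch never splits inside a paragraph.
    """
    budget = limit - overhead
    chunks, current, size = [], [], 0
    for para in paragraphs:
        n = len(para.encode("utf-8"))
        if current and size + n > budget:
            chunks.append(current)  # start a new file before overflowing
            current, size = [], 0
        current.append(para)
        size += n
    if current:
        chunks.append(current)
    return chunks


# Hypothetical chapter: 10 paragraphs of 107 bytes each, with a tiny
# limit so that the grouping is easy to see.
chapter = ["<p>" + "x" * 100 + "</p>"] * 10
chunks = split_chapter(chapter, limit=250)
print([len(c) for c in chunks])
```

With a 250-byte budget each chunk holds two paragraphs, giving five chunks. Each chunk would then be written out as its own XHTML file and listed in order in the EPUB’s manifest and spine, so the reading order stays the same.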

I’m doing conversions of the PG eBooks on my epubBooks.com website. They conform to the EPUB specification but also have more advanced features, such as linkable footnotes. I like to think the formatting is also much better than PG’s. They work very well on any EPUB-enabled device.

Currently the catalogue is quite small, but I’ve concentrated on the more popular titles and the list is constantly growing.

Manuel J. Escobar March 31, 2011 at 2:43 am

Hi people out there, how can I get Stanza to keep page breaks when converting books from Word or PDF format? I tried inserting the appropriate commands in Word, but they all get stripped away. Should I use a different converter? Thnx
