For some time I’ve been working through the Newsletter Archives, doing a manual count of all the ‘posted’ listings and entering the results into a spreadsheet. I wanted to confirm that all the statistics added up correctly. There were some inconsistencies, but I believe the result is now a much more accurate week-by-week, year-by-year account.
As a result, some figures will differ from those previously recorded in the newsletters. I have therefore put together this article to show the revised statistics for each year since 2001.
Notes of interest include:
3,042 is the starting figure for 2001 (previously thought to be 3,100)
Almost all of the DP-EU books have been posted to the PG site. These duplicates are now subtracted from the totals. Both overall and unique totals will be shown.
PG U.S. total includes the PG-EU ‘non-unique’ figures as these are still a part of that count.
The Reposted total had to be adjusted at the end of 2006, as it was found that a number of reposts were not documented in any newsletter listings.
This is neither a fully accurate nor definitive count, but it certainly brings us another step closer to that goal.
The following is a listing of all the entries on the Project Gutenberg Reserved/Pending list. These numbers have been reserved for various reasons, and at present there is no date for when they will be posted.
Reserved / Pending
gbnewby 20000
Reserved "The Road Leads On, by Knut Hamsun" 7536
Reserved "The Arabian Nights Entertainments Volume 2, Anon." 5613
Reserved / Pending Unknown 3018
Reserved / Pending Unknown 2879
Reserved "Added Upon a Story, by Nephi Anderson" 2877
Reserved / Pending Unknown 2738
Reserved "The Home Book of Verse, by Burton E. Stevenson V8" 2626
Reserved "The Home Book of Verse, by Burton E. Stevenson V7" 2625
Reserved "The Home Book of Verse, by Burton E. Stevenson V6" 2624
Reserved "The Home Book of Verse, by Burton E. Stevenson V5" 2623
Reserved "Human Genome Project, About the Human Genome Files" 2200
Reserved "The Original Writings of Samuel Adams, Volume 1" 2091
Reserved "2001, by Arthur C. Clarke" 2001
Reserved "1984, by George Orwell (Did it come true?)" 1984
Reserved Pietro di Miceli (former PG Webmaster) 1964
Reserved Reserved for WWI 1914
Reserved "Twelfth-Night, by William Shakespeare" WL 1789
Reserved Reserved for Shakespeare WL 1767
Reserved Reserved for Shakespeare WL 1766
Reserved "I Have A Dream, Martin Luther King, Jr." 1691
Pending Unfilled 1648
Pending Unfilled 1647
Pending Unfilled 1255
Reserved The Project Gutenberg Encyclopedia 199
Reserved The Project Gutenberg Encyclopedia 198
Reserved The Project Gutenberg Encyclopedia 197
Reserved The Project Gutenberg Encyclopedia 196
Reserved The Project Gutenberg Encyclopedia 195
Reserved The Project Gutenberg Encyclopedia 194
Reserved The Project Gutenberg Encyclopedia 193
Reserved The Project Gutenberg Encyclopedia 192
Reserved The Project Gutenberg Encyclopedia 191
Reserved The Project Gutenberg Encyclopedia 190
Reserved The Project Gutenberg Encyclopedia 189
Reserved The Project Gutenberg Encyclopedia 188
Reserved The Project Gutenberg Encyclopedia 187
Reserved The Project Gutenberg Encyclopedia 186
Reserved The Project Gutenberg Encyclopedia 185
Reserved The Project Gutenberg Encyclopedia 184
Reserved The Project Gutenberg Encyclopedia 183
Reserved The Project Gutenberg Encyclopedia 182
Reserved The Project Gutenberg Encyclopedia 181
If you’re interested in converting the Project Gutenberg Plain-Vanilla ASCII texts into PDF, HTML or another format that allows a more versatile display, then you may find it useful to remove the mid-paragraph hard linebreaks that exist in these files.
A ‘How To’ was recently posted on the Project Gutenberg gutvol-d mailing list, which is a great guide on how to do this procedure.
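For readers who want a rough idea of what the procedure involves, here is a minimal sketch in Python. It is not the method from the gutvol-d post; it simply assumes that paragraphs in the file are separated by one or more blank lines and that every other linebreak is a hard wrap that can be replaced with a space. The function name unwrap_paragraphs is made up for this illustration.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by one or more blank lines,
    and any other linebreak is a mid-paragraph hard wrap.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split wherever a linebreak is followed by one or more blank
    # (empty or whitespace-only) lines.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text.strip("\n"))

    unwrapped = []
    for para in paragraphs:
        # Trim each line and drop empty ones, then join with single spaces.
        lines = [line.strip() for line in para.split("\n")]
        joined = " ".join(line for line in lines if line)
        if joined:
            unwrapped.append(joined)

    # Keep exactly one blank line between paragraphs.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    sample = (
        "It was the best\n"
        "of times, it was\n"
        "the worst of times.\n"
        "\n"
        "A second\n"
        "paragraph.\n"
    )
    print(unwrap_paragraphs(sample))
```

Note that a simple approach like this will also flatten anything that relies on its linebreaks, such as poetry, tables, title pages and indented quotations, so those parts of a text would need to be protected or handled by hand.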
The following article was posted by Jon Noring on the TeleRead blog in February 2007. This is an excellent discussion on why Digital Text Master files should be created along with ideas on how to implement it. — Ed
‘Digital Text Masters’ (Digitizing the classic public domain books)
by Jon Noring
The recent TeleBlog articles about the Project Gutenberg (PG) text Tarzan of the Apes (see 1, 2) suggest that not all is well in the existing corpus of public domain digital texts.
My personal experience over the last twelve years in digitizing several public domain books has helped me to see a number of problems, which I’ve mentioned in various forums, including the PG forums and The eBook Community. For the sake of not turning this already long article into a whole book, I won’t cover here the complete list of problems I found, plus those found by others.
To summarize what I believe should be done to resolve most of the known problems, when it comes to creating a digital text of any work in the public domain, we should first produce and make available what we call a “digital text master,” which meets a quite high degree of textual accuracy to an acceptable and known print source. From the “master,” various display formats, and derivative types of texts (e.g., modernized, corrected, composite, bowdlerized, parodied, etc.) can then be produced to meet a variety of user needs.
(Btw, what better example to illustrate the concept of a “digital text master” than to show the self-portrait of the great 17th century Dutch master painter, Rembrandt van Rijn, whose attention to detail and exactness is renowned.)