In this month's newsletter, Michael Hart announces some major projects from Project Gutenberg for this year.
Web Pages designed by and for our Project Gutenberg readers
Michael wants to make a push for web pages to be designed by, and for, people from various age groups and in any language. This can include the youngest through to the oldest with pages designed around favorite subjects, favorite authors, or even favorite books or characters.
Project Gutenberg Monthly Newsletter
The Project Gutenberg Monthly Newsletter, Mar. 21, 2009
eBooks Readable By Both Humans And Computers Since 1971
45 Months to The End of the World Via Mayan Calendaring
on December 21, 2012 [some now saying October 11, 2011]
Leaving 3 years 9 months, 15 seasons or 45 months.
Not to worry, I will still make long range predictions.
Erratum: Last month I reversed the labels on the month
before and the month before that in the statistics part
and I have two possible totals for the past month which
are indicated in the current statistical review.
It would be nice to have someone do spreadsheets of these,
hint, hint. . . .
My apologies, it was a tough month.
Headlines
PG Listed in 100 Best Websites for Free Adult Education
http://www.onlinedegreeworld.com/blog/2009/100-best-websites-for-free-adult-education/
In line with our major projects for the year listed below, here
is a cute little awk [mawk] script that you can use to convert
eBooks to formats for smaller screens. The default is 15 lines
but you can work your own preferences into the script.
Next month we should be announcing that pglaf.org will have the
tools online for you to convert eBooks to be read on cellphones.
If you can contribute any ideas, scripts, programs, etc., to the
effort to make eBooks available on more devices just let me know
and I will write your contribution up in a future Newsletter.
Script begins:
#!/usr/bin/mawk -f
# Written by Jon-Egil Korsvold on Friday the 13th of March 2009.
# Mare is short for Mawk Reformatter. The program can
# reformat text files to increase readability on small devices
# with dumb ebook readers. My mp3 player has a 14 characters
# wide display, and the ebook reader breaks the words in
# inappropriate places. This program doesn't split long words,
# but the line is broken after each long word, so they won't
# mess up the display for more than a few lines.
#
# This program can be freely distributed. You may give away
# copies of it, but you may not sell it or remove my name from it.
# Use at your own risk!! Run the program without arguments to get
# the manual _before_ you attempt anything else! You may
# need to edit the path to mawk above and modify some of the
# commands below. No warranty, have fun! This program has not been
# extensively tested. It should be considered beta software.
#
# Jon-Egil Korsvold 15th of March 2009
#
# [Warning from Michael Hart: I am not sure my cut and paste did
# everything exactly, so if you have trouble running this, email
# me at hart at pglaf.org and I will forward you my original copy.]
BEGIN {
  tempfile="/tmp/mare.txt"
  fc1="find -L "
  #The file name pattern is quoted so the shell does not treat | as a pipe
  fc2=" -noleaf | egrep 'txt$|htm$|html$' >> " tempfile
  rm="rm " tempfile
  md="mkdir -p "  #for directories
  sep="/"
  x=0       #Holds the current line position in characters
  y=0       #Holds the length of the current word
  val=0     #Holds the return value; if greater than 0, the help text is printed
  os="err"  #Dos or *nix
  #Exit if less than four arguments were used (width of display in
  #characters, -d/-u, output dir and source dir). ARGV[0] is the
  #interpreter itself, so four arguments means ARGC is 5.
  if (ARGC > 4)
  {
    #Get and set width in characters, exit with error message unless
    #the value is a number
    count=ARGV[1]
    ARGV[1]=""
    if (count !~ /^[0-9]+$/)
    {
      val=1
      exit
    }
    #The os value is initially "err". Set it to dos or nix if the
    #appropriate switch was used. Define line endings accordingly.
    #Exit with error message if os=err (no switch was used).
    if (ARGV[2] ~ /^-d$/)
    {
      os="dos"
      nl="\r\n"
    }
    else
    {
      if (ARGV[2] ~ /^-u$/)
      {
        os="nix"
        nl="\n"
      }
    }
    if (os ~ /^err$/)
    {
      print ("You have to use -d or -u as the second argument!")
      val=1
      exit
    }
    ARGV[2]=""
    #Get and set output directory. Add a trailing slash if necessary.
    #A name containing a dot is taken to be a file, not a directory.
    odir=ARGV[3]
    ARGV[3]=""
    if (odir ~ /\./)
    {
      print ("The third argument has to be a directory. A file won't do!")
      val=1
      exit
    }
    if (odir !~ sep"$")
    {
      odir=odir""sep
    }
    #Loop through the rest of the command line arguments. Call find
    #and grep to get the files in directories, but write files to
    #tempfile directly. Skip unsupported file types with a warning.
    fctr=4
    while (fctr < ARGC)
    {
      idir=ARGV[fctr]
      ARGV[fctr]=""
      if (idir ~ /\./)
      {
        if (idir ~ /\.(txt|htm|html|phtml|shtml)$/)
        {
          system ("echo " idir " >> " tempfile)
        }
        else
        {
          print ("The file type of " idir " isn't supported!")
        }
      }
      else
      {
        system(fc1 idir fc2)
      }
      fctr++
    }
    FS=sep
    fctr=0
    #Exit with error message if tempfile is empty or doesn't exist.
    if ((getline < tempfile) < 1)
    {
      print ("No files found!")
      val=1
      exit
    }
    close (tempfile)
    #Traverse tempfile line by line and use slash as field separator.
    #The whole line is stored in pa (path array), which holds the
    #input files. The last field holds the file name without the
    #path, and it is stored in fa (file array). The field before the
    #last field holds directory information. It is stored in da
    #(directory array). Directories are created as needed below.
    while ((getline < tempfile) > 0)
    {
      x=NF
      fa[fctr]=$x  #file array
      if (x > 1)
      {
        x--
        da[fctr]=$x  #directory array (odir/da[fctr]/)
        if (da[fctr] !~ sep"$")
        {
          da[fctr]=da[fctr]""sep
        }
      }
      else
      {
        da[fctr]=""
      }
      system (md odir""da[fctr])
      pa[fctr]=$0  #path array (for input files)
      fctr++
    }
    #Reduce by one to get the last element of the arrays. Reset field
    #separator to get words. Remove tempfile.
    fctr--
    FS=" "
    system (rm)
    #Loop through the arrays from the last to the first element (0).
    #Try to open the elements in pa as files and print a warning on
    #errors.
    while (fctr >= 0)
    {
      x=0  #Each output file starts at the beginning of a line
      if ((getline < pa[fctr]) < 1)
      {
        print ("Error processing "pa[fctr])
      }
      close(pa[fctr])
      #Loop through the words in each line.
      while ((getline < pa[fctr]) > 0)
      {
        gsub ("\r", "")  #Remove dos endings
        ctr=1            #Used to reference fields in the current record
        #Set output file, i.e. edit the path, add format information
        #and change the file type to txt.
        ofile=fa[fctr]
        gsub(/\..*/,"",ofile)
        ofile=odir""da[fctr]"fmt-"count"-"ofile".txt"
        #Keep track of the length of current word (y) and the position
        #on the line (x), break lines accordingly with the content of
        #nl (dos or nix endings).
        #Skip lines starting and ending with css or html commands.
        while (ctr <= NF && $0 !~ /^<.*>$/ && $0 !~ /^\{.*\}$/)
        {
          y=length($ctr)
          x=x+y
          if (x < count)  #Increment x to account for trailing space
          {
            x++
          }
          else
          {
            printf("%s",nl) > ofile
            x=y+1
          }
          #Remove some embedded html and css commands and superfluous
          #spaces
          gsub (/<.*>/, "")
          gsub (/\{.*\}/, "")
          gsub (/[ ][ ]+/, " ")
          printf("%s ",$ctr) > ofile
          ctr++  #Increment to reference next field (word) and loop
        }
        #Print a double newline to make a paragraph if the record was
        #empty
        if (NF == 0 && $0 !~ /^<.*>$/ && $0 !~ /^\{.*\}$/)
        {
          printf("%s%s", nl, nl) > ofile
          x=0
        }
      }
      printf("%s%s", nl, nl) > ofile
      print("Writing to "ofile)
      close(ofile)
      fctr--  #Next file in array
    }
    exit
  }
  else
  {
    #Exit with error message if less than four arguments were used
    val=1
    exit
  }
}
#Exit with the help text in case of errors
END {
  if (val == 1)
  {
    print ("\nMare (mawk reformatter) reformats ebooks for viewing on small displays.\n")
    print ("Width in characters, option, output directory, input directories or files")
    print ("Example: mare 20 -d ebooks /mnt/sda2/gutenberg /mnt/sda2/freeread")
    print ("Reformat all text and html files in the last two directories.")
    print ("Use 20 characters per line and dos style line endings.")
    print ("The resulting files are written to the last level of the original")
    print ("directory tree in the directory ebooks in the current directory.")
    print ("Run the program without arguments to get this help!\n")
    print ("Valid options:")
    print ("-d Use dos style line endings")
    print ("-u Use *nix style line endings\n")
    print ("Requirements:")
    print ("- mawk")
    print ("- a *nix version of find")
    print ("- a *nix version of mkdir")
    print ("- echo")
    print ("- egrep")
    print ("- rm\n")
    print ("The target os can be dos/win or *nix.")
    print ("The host os probably has to be *nix.\n")
    print ("Written in March 2009 by Jon-Egil Korsvold.")
    print ("Use at your own risk, no warranty!")
    print ("The program can be freely distributed with author information,")
    print ("but not sold. Happy reading!")
  }
}
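For readers who do not have mawk at hand, the core idea of Mare (fill each
line with whole words up to a fixed width, and never split a word) can be
sketched in a few lines of Python. This is only an illustration, not a port
of Jon-Egil's script: the function name `reflow`, the sample sentence, and
the width of 14 are assumptions chosen for the example, and the sketch
ignores the HTML/CSS stripping and the directory handling done above.

```python
def reflow(text, width):
    """Greedy word wrap: keep words whole, start a new line when the next word will not fit."""
    lines = []
    # Blank lines separate paragraphs, as in a plain-text eBook.
    for paragraph in text.split("\n\n"):
        current = ""
        for word in paragraph.split():
            if not current:
                # First word on the line is always accepted, even if it is too long.
                current = word
            elif len(current) + 1 + len(word) <= width:
                current += " " + word
            else:
                lines.append(current)
                current = word
        if current:
            lines.append(current)
        lines.append("")  # blank line between paragraphs
    return "\n".join(lines).rstrip("\n")


# Hypothetical sample text, wrapped for a 14-character display.
sample = "It was the best of times, it was the worst of times."
print(reflow(sample, 14))
```

At width 14 the sample comes out as five lines, none longer than 14
characters. A word longer than the width is simply placed on a line of its
own, which matches the behavior described in the script's header comment.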
A Few Major Projects To Start Out the New Year. . . .
1. Web Pages Designed By And For Our Project Gutenberg Readers.
Including kids. If you know of any kids or schools interested
in making eBooks, eBook pages, etc., please let me know.
In fact, I would LOVE to see kids write up their own versions of
our classics such as Alice In Wonderland, Looking Glass or Peter
Pan, Robin Hood, AEsop's Fables, etc., in their own words!!!
THAT would be a VERY interesting collection to read!!!
2. Textbooks Are Becoming A More And More Highly Requested Item.
3. Request To Help Complete Our Collection Of Andrew Lang Books.
4. eBooks On Cellphones: We Have Several Formats You Can Try.
And a new one coming next month!
1. Web Pages Designed By And For Our Project Gutenberg Readers.
This would include other languages, web pages designed by and for
people of various ages from the youngest to the oldest, and, even
web pages designed around favorite subjects, favorite authors, or
even favorite books or characters.
Personally, I would LOVE to see web pages designed for readers at
various grade levels and then translated into many languages.
2. Textbooks Are Becoming A More And More Highly Requested Item.
As more and more people spend more and more years homeschooling a
greater portion of modern kids, they are asking us for more books
to help teach any of the various subjects, from reading, writing,
and arithmetic, to geography and astronomy, to the dinosaurs, and
an enormous number of other subjects.
If you ever wanted to pass on your knowledge, now is the time and
the place, for books here last forever and cover the world.
3. Request To Help Complete Our Collection Of Andrew Lang Books.
Many of you are familiar with the various "Color" Fairy Books, such as
"The Red Fairy Book," by Andrew Lang, and a host of other colors,
but few of us have ever even seen a list of them all, including a
surprising number of books relating true events, etc.
If you find any Andrew Lang books, Fairy, Animal, True, etc., that we
don't have in our collection, please let me know, and we will help in
the process of completing this collection.
4. eBooks On Cellphones: We Have Several Formats You Can Try.
Let me know if you would like to help us set up our Cellphone pages
to bring more eBooks to more people in more of the world.
Our All Time Hottest Requests!!!!!!!
FLASH RAM
I am looking for the earliest flash RAM possible.
The very earliest were PCMCIA cards, such as used for the
Poqet computer, HP 95, etc.
The earliest USB flash drives were Disgo/Dizgo, M-Systems
and these were OEMed by IBM, HP, etc. They are easy to
recognize because their snap-on connectors resemble the
connectors of jigsaw puzzles.
We received two examples of RAM actually labeled "Flash,"
for the H-P 95 pocket DOS machine from 1991, and a sample
of Fairchild bubble memory, as well, from down under.
The PCMCIA cards were labeled series TWO; we still need series ONE.
Thank you, Mate!
POWERPOINT
We need someone who can do PowerPoint illustrations.
One in particular, building a 3-D box of 1,000 dominoes.
Additional Newsletter Services
In addition, we will provide the PG Canada Newsletter and
totals from PG of Australia, Europe, PrePrints, etc.
You should notice that we had a very good month, with 100
books done nearly every single week.
These totals do NOT include 75,000+ at
http://www.gutenberg.cc
Where there are eBooks representing over 100 languages.
The Project Gutenberg Statistical Report
[As of about noon Central Daylight Time]
These are the various totals from the ~30,000 at
http://www.gutenberg.org
and our other Project Gutenberg Sites
day | cnt
----------------+-----
Sat 2009-03-14 | 2
Sun 2009-03-15 | 11
Mon 2009-03-16 | 8
Tue 2009-03-17 | 4
Wed 2009-03-18 | 6
Thu 2009-03-19 | 7
Fri 2009-03-20 | 13
Total 51
Thanks to Marcello Perathoner!
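As a small step toward the spreadsheets requested above, the daily table can
be totalled with a few lines of Python. This is a sketch only: the counts are
copied from the table, and the variable names are assumptions made for the
example.

```python
# Daily eBook counts copied from the table above (ISO dates as keys).
daily_counts = {
    "2009-03-14": 2,
    "2009-03-15": 11,
    "2009-03-16": 8,
    "2009-03-17": 4,
    "2009-03-18": 6,
    "2009-03-19": 7,
    "2009-03-20": 13,
}

total = sum(daily_counts.values())
busiest = max(daily_counts, key=daily_counts.get)

print(f"Total for the week: {total}")
print(f"Busiest day: {busiest} with {daily_counts[busiest]} eBooks")
```

The total agrees with the 51 reported in the table, and the busiest day of
the week was Friday the 20th.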
Here are the current language totals
for languages with over 100 eBooks.
28272
23852 English en
1392 French fr
572 German de
493 Finnish fi
408 Dutch nl
399 Chinese zh
312 Portuguese pt
227 Spanish es
188 Italian it
Grand total for today: 28,272 [+ 243]
Compared to last month's 28,029
23669 English en
1374 French fr
567 German de
490 Finnish fi
402 Dutch nl
399 Chinese zh
302 Portuguese pt
225 Spanish es
178 Italian it
Thanks to Greg Newby!
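The two language lists above can be compared mechanically. The following
Python sketch computes the month-over-month change per language; the figures
are copied from the lists, while the dictionary and variable names are
assumptions made for the example.

```python
# Totals for languages with over 100 eBooks, copied from the two lists above.
this_month = {
    "English": 23852, "French": 1392, "German": 572, "Finnish": 493,
    "Dutch": 408, "Chinese": 399, "Portuguese": 312, "Spanish": 227,
    "Italian": 188,
}
last_month = {
    "English": 23669, "French": 1374, "German": 567, "Finnish": 490,
    "Dutch": 402, "Chinese": 399, "Portuguese": 302, "Spanish": 225,
    "Italian": 178,
}
GRAND_TOTAL_CHANGE = 28272 - 28029  # grand totals reported above

# Change per language, largest first.
growth = {lang: this_month[lang] - last_month[lang] for lang in this_month}
for lang, added in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:<12}{this_month[lang]:>7,}{added:>+6}")

listed_growth = sum(growth.values())
print(f"Listed languages: +{listed_growth} of +{GRAND_TOTAL_CHANGE} overall")
```

On these figures the nine listed languages account for 237 of the 243 new
eBooks, so the remainder came from languages with fewer than 100 titles.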
//////
And From Project Gutenberg Sites Worldwide
28,272 up 243 PG General Automated Count
1,749 up 21 PG of Australia
602 up 37 PG of Europe
2,020 up 2 PG PrePrints, Reserved [42],etc.
242 up 20 PG of Canada, Estimated.
======
32,814 up 367 or 323 Sorry, I reversed last month's totals
as below, my apologies, and can't find all
of the details to check between these two.
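One way to see which of the two candidate figures the rows as printed
support is to parse them and add up the changes. The Python sketch below does
that; the regular expression and the helper name `parse_row` are assumptions
made for this example, and the rows are copied from the table above.

```python
import re

# count, direction ("up" or "DOWN"), size of change, site name
ROW = re.compile(r"^\s*([\d,]+)\s+(up|DOWN)\s+([\d,]+)\s+(.*)$")


def parse_row(line):
    """Return (site, count, signed change) for one row of the worldwide table."""
    match = ROW.match(line)
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    count = int(match.group(1).replace(",", ""))
    change = int(match.group(3).replace(",", ""))
    if match.group(2) == "DOWN":
        change = -change
    return match.group(4).strip(), count, change


rows = """\
28,272 up 243 PG General Automated Count
1,749 up 21 PG of Australia
602 up 37 PG of Europe
2,020 up 2 PG PrePrints, Reserved [42],etc.
242 up 20 PG of Canada, Estimated.
""".splitlines()

parsed = [parse_row(row) for row in rows]
net_change = sum(change for _, _, change in parsed)
print(f"Net change across sites: {net_change:+}")
```

Summing the five printed changes gives 323, the second of the two figures
quoted. Rows marked DOWN are treated as negative changes.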
This was reported as last month but was really the month before.
27,755 up 280 PG General Automated Count
1,728 up 5 PG of Australia
565 up 12 PG of Europe
2,013 DOWN 481 PG PrePrints, Reserved [42],etc.
222 up 20 PG of Canada, Estimated.
======
32,283 DOWN 164 due to PrePrints and Reserved fixes
Reversed from what was reported as the month before below
Switch the months and it will make much more sense, sorry.
27,475 up 287 PG General Automated Count
1,723 up 6 PG Australia
553 up 13 PG Europe
2,494 up 33 PG PrePrints
202 up 12 PG Canada [Estimated]
======
32,447 up 349 by various automated counts and newsletters
Note: Without counting PrePrints, we are still about 30K,
and some of the new .lit collection will not make it under
our current rules of addition from PrePrints, and would be
deleted from PrePrints without moving to other listings.
The 307 Chinese eBooks in PrePrints will probably go, as a
team of our best Chinese workers says they are not worth a
lot more time to work on, etc.
Note: There are perhaps 100 eBooks not listed here
that are already in circulation from Project Gutenberg.
Note: PG Canada includes English, French, and Italian.
Here is how we ended 2008
27,616 PG General Automated Count
1,726 Project Gutenberg of Australia
554 Project Gutenberg of Europe
225 Project Gutenberg of Canada [Estimated]
[202 up to December, no current report]
2,431 PrePrints [Counting the 307 Chinese eBooks +111]
====== ======
32,552 Grand Total [Counting those PrePrints]
Here is how we ended 2007
The combined PG projects had produced a total of 26,161 titles.
The largest number of books posted...
...in one day was 65 on the 26th December
...in one week was 151 in Week 18 (week ending 9th May)
...in one month was 477 in November
We averaged
338 per month [Over 4,000 for the year]
78 per week
11.13 per day
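The three averages can be cross-checked against the "over 4,000 for the
year" remark by scaling each one back up to a full year. This Python sketch
assumes a 365-day, 52-week, 12-month year for 2007; the variable names are
made up for the example.

```python
# Averages as reported above.
per_month, per_week, per_day = 338, 78, 11.13

# Assumed lengths of the year 2007 in each unit.
MONTHS, WEEKS, DAYS = 12, 52, 365

implied_yearly = {
    "from monthly average": per_month * MONTHS,
    "from weekly average": per_week * WEEKS,
    "from daily average": round(per_day * DAYS),
}

for source, total in implied_yearly.items():
    print(f"{source}: about {total:,} eBooks")
```

Each average implies a yearly total a little above 4,000 (4,056 from the
monthly and weekly figures, 4,062 from the daily figure); the small spread is
what one would expect from rounding.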
99 titles were newly REposted to the new filing system, bringing us
almost to the 2,000 mark.
Here is a small selection of project milestones:
TOTAL Original Project Gutenberg eBooks equals about
the number of books in the average U.S. public library
32,500 on 20082121 [Counting the 307 Chinese Preprints]
[And presuming 3 after official count]
32,000 on Calculating
31,500 on 20081021 [not an error, 1,777 PrePrints]
30,000 on 20081021
29,500 on 20080919
29,000 ~~ Calculating
28,500 ~~ Calculating
28,000 ~~ 20080516
27,500 on 20080405
27,000 ~~ 20080229
26,500 on 20080126
26,000 on 20071224
25,000 on 20071012
24,000 on 20070710
23,000 on 20070415
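The milestone dates are written as YYYYMMDD, which makes it easy to work out
how long each thousand eBooks took. The Python sketch below does this for the
four whole-thousand milestones from 2007 that carry firm ("on") dates; the
helper name `days_between` is an assumption made for the example.

```python
from datetime import datetime

# (milestone, date as YYYYMMDD), oldest first, copied from the list above.
milestones = [
    (23000, "20070415"),
    (24000, "20070710"),
    (25000, "20071012"),
    (26000, "20071224"),
]


def days_between(start, end):
    """Number of days from one YYYYMMDD date string to another."""
    fmt = "%Y%m%d"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


intervals = []
for (n0, d0), (n1, d1) in zip(milestones, milestones[1:]):
    days = days_between(d0, d1)
    intervals.append(days)
    print(f"{n0:,} -> {n1:,}: {days} days, {(n1 - n0) / days:.1f} eBooks per day")
```

On these dates the three intervals are 86, 94 and 73 days, which is in line
with the average of about 11 eBooks per day reported for 2007.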
PG-AU
1,700 on 20081010
1,600 on 20080208
1,500 on 20070407
PG Canada
175 on 20080930
100 on 20080325
110 on 20080417
It was only a few months ago that Project Gutenberg announced an effort to make mobile editions of all their titles available. This was big news; in my eyes, however, the latest eBook format to be released by PG is even bigger news.
Project Gutenberg has now made almost all their titles available in EPUB, the industry-standard ebook format, and all are free of DRM (Digital Rights Management)!
Here is the news from PG Canada for January.
We published a total of 23 ebooks during the month: we have now published a
cumulative total of 251 ebooks.
The New Releases section at the top of the PGC main page always gives the
details of new releases for the most recent three months.
LANGUAGES:
- 17 in English
- 5 in French
- 1 in German
GENRES
- 9 novels
- 2 biographies
- 2 instructional manuals
- 2 books for children
- 2 books of drawings
- 1 book of music criticism
- 1 history book
- 1 technical monograph
- 1 personal journal
- 1 collection of essays
- 1 play
8 of this month's ebooks were by Canadians or had a connection to Canada.
14 of this month's titles were fiction, and 9 were non-fiction.
The month featured a very wide variety of ebooks. We published our first play,
Tieck's translation into German of Shakespeare's "The Tempest". We also
published the first of the two volumes of O. D. Skelton's massive biography of
Canadian prime minister Sir Wilfrid Laurier, one of Canada's most celebrated
political biographies.
Authors new to PGC this month included:
Brazil, Angela (1868-1947) [English novelist]
Féval, Paul (1816-1887) [French novelist]
Foster, Robert Frederick (1853-1945) [Scottish authority on card games]
Lambert, Leonard Constant (1905-1951) [English composer and conductor]
Legendre, Napoléon (1841-1907) [Canadian journalist]
Lyall, Edna [Bayly, Ada Ellen] (1857-1903) [English novelist]
McIlwraith, Jean Newton (1859-1938) [Canadian author and editor]
Melville, Frederick John (1882-1940) [British philatelist]
Shakespeare, William (1564-1616) [English playwright and poet]
Tieck, Johann Ludwig (1773-1853) [German novelist and poet / romancier et poète
allemand]
***************
Thanks as ever for your support!
Mark