Project Gutenberg and Languages

by Marie Lebert on October 8, 2010
PG News

Project Gutenberg is a visionary project launched by Michael Hart in July 1971 to create free electronic versions of literary works and disseminate them worldwide. The project got its first boost with the invention of the web in 1990, and its second boost with the creation of Distributed Proofreaders in 2000, to help digitize books from the public domain. In 2010, there are Project Gutenberg websites in the U.S., in Australia, in Europe, and in Canada, with more websites to come in other countries.

1990-94

Initially, the ebooks were mostly in English. As Project Gutenberg was launched from the University of Illinois with the help of English-speaking volunteers, its first goal was to provide ebooks to the English-language community, which accounted for 95% of internet users in the early 1990s. (Non-English-speaking users reached 50% in summer 2000, and their share has grown steadily ever since.)

Project Gutenberg also inspired other digital libraries elsewhere. Projekt Runeberg, for classic Nordic literature, and Projekt Gutenberg-DE, for classic German literature, started in 1992 and 1994 respectively.

Projekt Runeberg was the first Swedish digital library of public domain books, and a partner of Project Gutenberg. It was created by the students’ computer club Lysator, in cooperation with Linköping University, as a volunteer project to produce and organize free electronic editions of classic Nordic (Scandinavian) literature. Around 200 ebooks were available in 1998, along with a list of 6,000 Nordic authors as a tool for further collection development.

Projekt Gutenberg-DE was the first German digital library of public domain books, and also a partner of Project Gutenberg. A number of texts were available for online reading in 1998, with one web page for short texts and several web pages (one per chapter) for longer works. There was an alphabetical list of authors and titles, with a short biography and bibliography for each author.

In 1997

French was the second language of Project Gutenberg, and still is in 2010.

The first ebooks released in French were six works by Stendhal and two works by Jules Verne, all released in early 1997.

The six works by Stendhal were four short novels: L’Abbesse de Castro (published in 1837), Les Cenci (1837), Vittoria Accoramboni (1837) and La Duchesse de Palliano (1838), and two novels: Le Rouge et le Noir (1830) and La Chartreuse de Parme (1839).

The two novels by Jules Verne were: De la terre à la lune (1865) and Le tour du monde en quatre-vingts jours (1873).

Three novels by Jules Verne were already available in English: From the Earth to the Moon (original title: De la terre à la lune), released in September 1993; Around the World in 80 Days (original title: Le tour du monde en quatre-vingts jours), released in January 1994; and 20,000 Leagues Under the Seas (original title: Vingt mille lieues sous les mers, 1869-70), released in September 1994.

Since then, Jules Verne has consistently ranked among the most downloaded authors: 11th in the “Top 20” available on the Project Gutenberg website in December 1999, and 6th in the “Top 100 during the last 30 days” on October 7, 2010.

As a side note, the first pictures ever made available in Project Gutenberg appeared in French Cave Paintings, released in April 1995 as ebook #249, with an XHTML version available in November 2000. This ebook offers four copyrighted photographs of paleolithic paintings found in a cave in Ardèche, in south-eastern France. The photographs were sent to Project Gutenberg by Jean Clottes, a general curator for cultural heritage (conservateur général du patrimoine), for everyone to enjoy.

Released in August 1997, eBook #1000 was La Divina Commedia by Dante Alighieri (published in 1321), in Italian, its original language.

In 1998

In October 1997, Michael Hart expressed his intention to produce more ebooks in other languages. In early 1998, there were a few titles in French, German, Italian, Spanish, and Latin.

Stendhal and Jules Verne were followed by Edmond Rostand with Cyrano de Bergerac, published in 1897 and released in Project Gutenberg about a century later, in March 1998.

In 1999

Released in May 1999, eBook #2000 was Don Quijote (1605), by Cervantes, in Spanish, its original language.

In July 1999, Michael Hart wrote in an email interview: “I am publishing in one new language per month right now, and will continue as long as possible.”

In 2000

Released in December 2000, eBook #3000 was À l’ombre des jeunes filles en fleurs (In the Shadow of Young Girls in Flower), vol. 3 (1919), by Marcel Proust, in French, its original language.

In 2001

Project Gutenberg Australia was launched in July 2001, and began producing ebooks in English and in other languages, for example some volumes in French of À la recherche du temps perdu (In Search of Lost Time, 1913-27) by Marcel Proust. (The last volumes of Proust’s masterpiece are still copyrighted in the U.S., which is why they are missing from Project Gutenberg.) 1,750 ebooks were available in Project Gutenberg Australia in February 2009.

Released in October 2001, eBook #4000 was The French Immortals Series (1905), in English. This book is an anthology of short fiction by authors from the French Academy (Académie française): Emile Souvestre, Pierre Loti, Hector Malot, Charles de Bernard, Alphonse Daudet, and others.

In 2002

Released in April 2002, eBook #5000 was The Notebooks of Leonardo da Vinci (early 16th century), as an English translation from Italian, its original language. Since its release, this ebook has regularly stayed in the “Top 100” of downloaded ebooks.

In 2003

The Project Gutenberg Consortia Center (PGCC) was affiliated with Project Gutenberg in 2003, and became a sister project. Since 1997, PGCC has been working on gathering collections of ebooks from other sources, in a number of formats and languages, with 75,000 ebooks in 2010.

In 2004

In early 2004, there were works in 25 languages in Project Gutenberg.

In February 2004, Michael Hart went off to Europe, with stops in Paris, France; Brussels, Belgium; and Belgrade, Serbia.

He gave a lecture on February 12, 2004 at the headquarters of UNESCO (United Nations Educational, Scientific and Cultural Organization) in Paris. He chaired a discussion at the French National Assembly on February 13.

The following week, he addressed the European Parliament, in Brussels.

Then he went to Belgrade to meet with the team of Project Rastko and support the launch of Project Gutenberg Europe and Distributed Proofreaders Europe.

Project Rastko, a non-governmental cultural and educational project, was founded in 1997 in Belgrade as part of the Balkans Cultural Network Initiative, a regional network for the Balkan peninsula in south-eastern Europe.

Distributed Proofreaders Europe (DP Europe) uses the software of the original Distributed Proofreaders, created in 2000 by Charles Franks to share the correction of ebooks among many volunteers.

DP Europe offered a multilingual interface right from the start, with its main pages translated into 12 European languages by volunteer translators as early as April 2004. The long-term goal was 60 languages, so that the interface would be available in most European languages.

In 2005

In May 2005, DP Europe finished processing its 100th ebook. These ebooks were in several languages, reflecting European linguistic diversity. DP Europe supports Unicode so that ebooks can be proofread in many languages. 600 ebooks were available in February 2009.

In July 2005, there were works in 42 languages in Project Gutenberg, including Sanskrit and the Mayan languages. The seven main languages were English (with 14,548 ebooks on July 27, 2005), French (577 ebooks), German (349 ebooks), Finnish (218 ebooks), Dutch (130 ebooks), Spanish (103 ebooks), and Chinese (69 ebooks).

In 2006

Released in December 2006, eBook #20000 was the audiobook of Twenty Thousand Leagues Under the Sea (Vingt mille lieues sous les mers, 1869-70), by Jules Verne, in its English version.

In December 2006, there were ebooks in 50 languages. The ten main languages were English (with 17,377 ebooks on December 16, 2006), French (966 ebooks), German (412 ebooks), Finnish (344 ebooks), Dutch (244 ebooks), Spanish (140 ebooks), Italian (102 ebooks), Chinese (69 ebooks), Portuguese (68 ebooks), and Tagalog (51 ebooks).

In 2007

Project Gutenberg Canada (PGC) was launched on July 1, 2007, Canada Day. Distributed Proofreaders Canada (DPC) started production in December 2007, with 100 ebooks in English, French, and Italian in March 2008, and 250 ebooks in February 2009.

In 2010

On October 7, 2010, there were works in 59 languages in Project Gutenberg. The ten main languages were English (with 28,441 ebooks), French (1,659 ebooks), German (709 ebooks), Finnish (536 ebooks), Dutch (496 ebooks), Portuguese (473 ebooks), Chinese (405 ebooks), Spanish (295 ebooks), Italian (250 ebooks), and Greek (101 ebooks). They were followed by Latin, Esperanto, Swedish, and Tagalog.

In 2020, maybe

It is hoped that machine translation software will be able to convert ebooks from one language into many others.

Ten years from now, machine translation may be judged 99% satisfactory (research is active on that front, and software is quickly improving), allowing for the reading of literary classics in a choice of many languages.

Machine-translated ebooks won’t compete with the work of literary translators and their labor of love over days and months, if not years, but they will allow readers to get the gist of literary works that have never been translated so far, or that have been translated into only a few languages for commercial reasons.

The output of translation software could then be proofread by human translators, much as the output of OCR software is proofread by the volunteers of Distributed Proofreaders.

So maybe we will see Distributed Translators one day, as a partner or sister project of Distributed Proofreaders and Project Gutenberg?

[My mother tongue is French.]

Copyright © 2010 Marie Lebert
