If you’re interested in converting the Project Gutenberg Plain-Vanilla ASCII texts into PDF, HTML, or another format that allows flexible display, you may find it useful to remove the mid-paragraph hard line breaks that exist in these files.
A ‘How To’ recently posted on the Project Gutenberg gutvol-d mailing list is a great guide to this procedure.
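For readers who want a feel for what removing the hard line breaks involves, here is a minimal Python sketch. It is not the method from the mailing list post. It assumes that paragraphs in the file are separated by at least one blank line, and the function name `unwrap_paragraphs` is made up for illustration. Verse, tables, and other text where line breaks carry meaning would be joined incorrectly and need separate handling.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join the hard-wrapped lines of each paragraph into a single line.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalize line endings so CRLF and bare CR files are handled alike.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split wherever a line break is followed by an empty or
    # whitespace-only line.
    paragraphs = re.split(r"\n[ \t]*\n+", text.strip("\n"))

    unwrapped = []
    for para in paragraphs:
        # Trim each line, drop empty ones, and join the rest with spaces.
        lines = [line.strip() for line in para.split("\n")]
        joined = " ".join(line for line in lines if line)
        if joined:
            unwrapped.append(joined)

    # Keep one blank line between paragraphs in the output.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    sample = (
        "It was a hard-wrapped\n"
        "paragraph of text.\n"
        "\n"
        "And here is\n"
        "a second one.\n"
    )
    print(unwrap_paragraphs(sample))
```

The output keeps one blank line between paragraphs, so it can be fed to a converter that treats blank lines as paragraph boundaries.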
The following article was posted by Jon Noring on the TeleRead blog in February 2007. This is an excellent discussion on why Digital Text Master files should be created along with ideas on how to implement it. — Ed
‘Digital Text Masters’ (Digitizing the classic public domain books)
by Jon Noring
The recent TeleBlog articles about the Project Gutenberg (PG) text Tarzan of the Apes (see 1, 2) suggest that not all is well in the existing corpus of public domain digital texts.
My personal experience over the last twelve years in digitizing several public domain books has helped me to see a number of problems, which I’ve mentioned in various forums, including the PG forums and The eBook Community. For the sake of not turning this already long article into a whole book, I won’t cover here the complete list of problems I found, plus those found by others.
To summarize what I believe should be done to resolve most of the known problems: when creating a digital text of any work in the public domain, we should first produce and make available what we call a “digital text master,” which meets a quite high degree of textual accuracy relative to an acceptable and known print source. From the “master,” various display formats and derivative types of texts (e.g., modernized, corrected, composite, bowdlerized, parodied) can then be produced to meet a variety of user needs.
(Btw, what better example to illustrate the concept of a “digital text master” than to show the self-portrait of the great 17th century Dutch master painter, Rembrandt van Rijn, whose attention to detail and exactness is renowned.)
This is an extract from Jonathan Lethem's article on copyright in Harper's. It's a long article but well worth the read.
Literature has been in a plundered, fragmentary state for a long time. When I was thirteen I purchased an anthology of Beat writing. Immediately, and to my very great excitement, I discovered one William S. Burroughs, author of something called Naked Lunch, excerpted there in all its coruscating brilliance. Burroughs was then as radical a literary man as the world had to offer. Nothing, in all my experience of literature since, has ever had as strong an effect on my sense of the sheer possibilities of writing. Later, attempting to understand this impact, I discovered that Burroughs had incorporated snippets of other writers' texts into his work, an action I knew my teachers would have called plagiarism. Some of these borrowings had been lifted from American science fiction of the Forties and Fifties, adding a secondary shock of recognition for me. By then I knew that this “cut-up method,” as Burroughs called it, was central to whatever he thought he was doing, and that he quite literally believed it to be akin to magic. When he wrote about his process, the hairs on my neck stood up, so palpable was the excitement. Burroughs was interrogating the universe with scissors and a paste pot, and the least imitative of authors was no plagiarist at all.
Nelson W. Polsby, who marshaled intellectual rigor, lucid writing and a knack for drawing striking lessons from real-life observation in his enduring studies of Congress and the presidency, died on Tuesday at his home in Berkeley, Calif. He was 72.
The cause was complications of congestive heart failure, his daughter Emily Polsby said.
Mr. Polsby, a political scientist, wrote or edited at least 15 books and scores of articles and edited The American Political Science Review, the most prestigious political science journal. He was especially known for his studies of Congress, the presidency, political parties, policy making and the media.
The Library of Congress has received a $2-million grant from the Alfred P. Sloan Foundation to digitize public domain works. The grant emphasizes digitizing "at-risk" titles (books that are falling apart) and volumes about American history. Dubbed "Digitizing American Imprints at the Library of Congress," the project will also allow the LoC to invest in such technology as, according to a statement from the organization, "suitable page-turner display" along with a program dedicated to quickly indexing and capturing chapters and other sections of a work. (Publishers Weekly Daily, 5 February 2007)
Teachers have steered the Shakespeare curriculum for younger pupils in England away from Othello and Henry IV Part I in favour of lighter texts. After a poll, plays set for 13- and 14-year-olds in England could include Romeo and Juliet and As You Like It. Othello did not make the list because more than half of those questioned said the themes of sexual jealousy and racism were not suitable for that age. Teachers say the exam system impedes the enjoyment of Shakespeare anyway.
Molly Ivins, the liberal newspaper columnist who delighted in skewering politicians and interpreting, and mocking, her Texas culture, died yesterday in Austin. She was 62.
Ms. Ivins waged a public battle against breast cancer after her diagnosis in 1999. Betsy Moon, her personal assistant, confirmed her death last night. Ms. Ivins died at her home surrounded by family and friends.
In her syndicated column, which appeared in about 350 newspapers, Ms. Ivins cultivated the voice of a folksy populist who derided those who she thought acted too big for their britches. She was rowdy and profane, but she could filet her opponents with droll precision.
The last instalment of the Harry Potter saga will be published on 21 July, author JK Rowling has announced. She confirmed the date fans will be able to get their hands on Harry Potter and the Deathly Hallows on her website. Rowling has said two characters die in the final book and fans are wondering whether Harry is one of them. The Potter books have sold 325 million copies worldwide, have been translated into 64 languages and spawned five blockbuster movies.
Sidney Sheldon, best-selling US author of Rage of Angels and The Other Side of Midnight, has died at the age of 89. He died of complications from pneumonia at a hospital near Palm Springs, California, his publicist said. Before turning to novels at the age of 50, Sheldon had a successful career writing Broadway plays and films. He won an Academy Award in 1948 for The Bachelor and the Bobby-Soxer, starring Cary Grant, and created long-running TV series Hart to Hart. But it is his hugely popular novels – devoured by readers though scorned by critics – for which he will be remembered.