Cultural Genome created from eBooks

by Michael Hart on December 24, 2010

Ever since eBooks started, people have been using them to analyze a variety of concepts, ideas, subjects, etc., all bearing on how a language system works.

Believe it or not, back in the days when The Oxford Text Archive was still thinking it could dominate the world, or at least the virtual literary world, they spent several times an entire median income just counting up “and” as it was used by Shakespeare. This was only one of many different studies trying to determine if Shakespeare is really Shakespeare, or just some other bloke using that name to further his own interests.

Personally, I think they could have used something such as The Project Gutenberg Shakespeare, along with program features in WordStar, etc., to do the counting with the same kind of results, perhaps even more accurate ones, and with a lot less wasted time and money.
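To give a rough idea of how little it takes today, here is a minimal sketch of that kind of word count in Python. The function name `count_words` and the sample sentence are made up for illustration; a real run would read a plain-text edition of the plays from a file instead.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter of lowercase word frequencies in text.

    Hypothetical helper for illustration: a "word" here is any run
    of letters and apostrophes, which is a simplifying assumption.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample text, not a quotation from any edition.
sample = "Romeo and Juliet and the Montagues. And the Capulets."
counts = count_words(sample)
print(counts["and"])
```

Because the text is lowercased first, "And" at the start of a sentence is counted together with "and", and whole-word matching keeps words that merely contain "and" out of the tally.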

Today’s Cultural Genome studies hopefully are done in seconds rather than a year and hopefully cost much less, unless someone is getting drastically overpaid just for supervising what could be done by student assistants, as was also done back in the day.

Here are some of the highlights of this study of a collection of more than five million eBooks:

1. People are Getting Famous at Younger Ages.

Hmmm, that seems obvious since The Mickey Mouse Club, though we do have to consider The Little Rascals, and also Shirley Temple, Mickey Rooney, and Judy Garland. Still, the whole era since teens took over major portions of this economy has obviously created many more such stars.

It also becomes more obvious when we consider that the eras before radio, movies and television did not offer the same opportunities.

In those eras people really had to have both a talent and the ability to apply that talent to the world, with no “talent scout” industry to round them up and get an entire career going, sometimes to the detriment of the child stars, as we have seen so much of lately.

Of course, if you go back far enough, perhaps further than this Cultural Genome seems to, we are likely to find the ages once again getting younger, as the average lifespan wasn’t very long.

Think about the Montagues and Capulets of Verona in an interesting play by Shakespeare called Romeo & Juliet.
The four parents of Romeo & Juliet are probably around 30 years old, given that Romeo & Juliet are young teenage kids, and Juliet would likely have had kids less than a year later if her parents’ plans had worked out.

Given that the Montagues and Capulets were the leaders of civic Verona at that age and no mention was made of their parents, we must presume those parents have passed away.

This leaves only the young….

It isn’t that I don’t enjoy reading the results of the Cultural Genome study; I just think they probably will find out, just as with Shakespeare, that things are in many ways the way they appear.

After all, only those who were victims of censorship, and there were plenty, didn’t already think the central characters in Romeo & Juliet, both children and parents, were of any age considered serious today.

2. Things are Changing Faster and Faster

This was demonstrated in several different ways, but I wasn’t surprised or amazed by them, sad to say.

A. Total words in English approximately doubled from WWII to today, and I don’t think anyone would be unduly surprised to hear that, given the number of new words each and every year. The cool part is that this was a conclusion drawn independently of lexicographers doing their picking and choosing of what goes in dictionaries.
That part I like a lot, and I think it should be included in planning future dictionaries…to look at reality in terms of what words people are actually using.

B. “Half Life” of references.

The average number of references to “1880” fell by half in the 32 years up to 1912, but similar references to “1973” fell by half in only 10 years, up to 1983.
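To put those two half-lives side by side, here is a small sketch in Python. It assumes the references fade along a simple exponential decay curve, which is my own simplification and not something the study is quoted as saying here; the function names are made up for illustration.

```python
import math


def decay_rate(half_life_years):
    """Yearly decay constant for a given half-life.

    Assumes simple exponential decay, which is an illustrative
    simplification rather than a claim made by the study.
    """
    return math.log(2) / half_life_years


def fraction_remaining(years, half_life_years):
    """Fraction of references left after a number of years."""
    return 0.5 ** (years / half_life_years)


# Half-lives taken from the two figures quoted above.
HALF_LIFE_1880 = 32  # years for references to "1880" to halve
HALF_LIFE_1973 = 10  # years for references to "1973" to halve

# How many times faster "1973" fades than "1880" under this model.
speedup = decay_rate(HALF_LIFE_1973) / decay_rate(HALF_LIFE_1880)
print(round(speedup, 1))

# Share of "1973" references that would be left after 32 years.
print(round(fraction_remaining(32, HALF_LIFE_1973), 3))
```

Under that assumption, the speedup is simply the ratio of the two half-lives, 32 divided by 10.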

Again, in a world that is obviously addicted to using, and only using, the present tense, while using history for little more than a doorstop, this is not amazing.

While I must say I would like to expect something more from the journal Science, I am pleased to find that it gives some background support in terms of how history, present and past, is ignored more and more quickly, and thus explains the entire fixation our society has with the idea that things are getting more ADD and ADHD.

When it comes right down to it, this study indicates a lot of our society has been more and more ADD about remembering the past, not just when a current generation is measured, but also back to 1973, and probably a lot longer.

I believe that there is some truth to what some people say about there being too much information today, that no one can really get a handle on it, that they should be expected to have to depend on others, professionals of some nature, to pick and choose, and evaluate great amounts of information and then say what to look at.

However, that being said, I must repeat that I said there was “some truth” to it, and that an even larger worry will be the certainty that these choices will be biased, even if, as has been scientifically proven ad nauseam, the most honest person one can hope for is hired for such purposes.

Even those who intend to be impartial have biases, and cannot help but pass them along. Then, on top of that, we must consider that most media professionals are not this honest to begin with.

However, I really do like the idea of having databases of everything ever written, or whatever. What I don’t like, of course, is that there is limited access to them, even the public domain part.

Of course, by 2100 there won’t be enough public domain for that to be a consideration.

[Formatting, links and spelling corrections made–Ed.]
