Sunday, September 06, 2020

More Fun with Word Power

Have I mentioned I get obsessive when I'm stressed?

Last (half) week was the start of school. Always a busy time in my work life-- and this year, there's so much extra to do and we're making it up just as fast as we can.

Anyway...

For a long time, I've been wondering how my word usage compares to other poets'. Not to brag, but I have a really large effective vocabulary compared to most English users. (Pardon a brief trip down memory lane: When we got back from Tanzania, I was planning to apply for college. I studied for the SATs. My mother and I went to a bookstore and picked up a study guide. I chose the "intermediate vocabulary" study guide, assuming it would be the right place to start. My mother glanced through it, and without saying anything, put it back on the shelf and handed me the advanced guide.)

I know a lot of words, and I'm not afraid to use them. Especially, I'm not afraid to use them in poems.

Is that good or bad? See my earlier thoughts about Gene Wolfe, and "doing more with less." Short answer: I don't know. It's all in what you're trying to accomplish. I like words.

I found a couple of sites that will take a text document and give you some word usage stats: Wordlist Maker and Character Count. (Note: Character Count does not count the number of characters that appear in a piece of fiction. I was disappointed.) They give slightly different results, which doesn't surprise me: you can get slightly different word counts out of different releases of Microsoft Word. I suspect a lot has to do with how they interpret plurals, contractions, possessives and the like.

I ran four texts through both of the above: High-Voltage Lines, Country Well-Known as an Old Nightmare's Stable, the manuscript for The Day of My First Driving Lesson, and the current state of the manuscript of Dervish Lions. (Note: Dervish Lions went to the publisher a couple of weeks back: there will be an editing and book design phase starting shortly.)

Both Wordlist Maker (WM) and Character Count (CC) give a total word count and a count of distinct words. CC's counts are consistently higher, but the ratio of distinct words to total words, for each of my manuscripts, was very close to the same according to WM as according to CC. CC also gives a count of words that are used only once, and a count of "difficult" words (the site is meant to be used educationally).
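For anyone who wants to reproduce these three counts offline, here is a minimal sketch in Python. It is not how either site works. The tokenizing rule (lowercase everything, treat a run of letters with optional internal apostrophes as one word) is an assumption, and it is exactly the kind of choice that would make two tools disagree. The function name `word_stats` is made up for illustration, and dividing the used-once count by the number of distinct words, rather than by total words, is also an assumption.

```python
import re
from collections import Counter

def word_stats(text):
    """Return total, distinct, and used-once word counts for a text."""
    # Assumption: a word is a run of letters, possibly with internal
    # apostrophes, so "cat's" is one word and is distinct from "cat".
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(words)

    total = len(words)        # every word occurrence
    distinct = len(counts)    # each different word counted once
    used_once = sum(1 for c in counts.values() if c == 1)

    return {
        "total": total,
        "distinct": distinct,
        "used_once": used_once,
        # Ratio of distinct words to total words.
        "distinct_ratio": distinct / total if total else 0.0,
        # Assumption: share of distinct words that appear exactly once.
        "once_share": used_once / distinct if distinct else 0.0,
    }

if __name__ == "__main__":
    print(word_stats("The cat and the hat. The cat's hat!"))
```

Changing the regular expression (for example, splitting on apostrophes so that "cat's" becomes "cat" and "s") will shift all three counts slightly, which is one plausible source of small differences between tools.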

According to CC, the percent of words that are used only once is highest for Driving Lesson (66%) followed closely by High-Voltage (65%). Country Well-Known scored a little lower at 63%. Dervish Lions came in substantially lower, at 54%.

Which means what?

My guess is that this ratio will tend to drop as the piece of text gets longer. (I would try it on Drumheart, but it might take a long time to process such a big piece of text.) The longer a document is, the more likely any given word is to be repeated at least once. Dervish Lions is 68 pages (including title page, contents, etc.). That makes it twice as long as Country and High-Voltage, both at 34 pages. Country and High-Voltage also both feature formal poetry with a high degree of repetition-- villanelles, sestinas, and pantoums-- which would drive the percent used only once down. So it makes sense that Driving Lesson would have the highest percent used only once, out of these texts.
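The guess about length could be checked on a single manuscript by measuring the used-once share at growing cut-off points. The sketch below does that in Python. It rests on the same assumptions as any home-grown counter: the tokenizing rule is a guess, the share is taken over distinct words, and the name `once_share_by_length` and the `step` parameter are hypothetical.

```python
import re
from collections import Counter

def once_share_by_length(text, step=1000):
    """Return (words_read, share_of_distinct_words_used_once) pairs,
    sampled every `step` words and once more at the end of the text."""
    # Assumption: same simple tokenizer, letters plus internal apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter()
    rows = []
    for i, word in enumerate(words, start=1):
        counts[word] += 1
        if i % step == 0 or i == len(words):
            used_once = sum(1 for c in counts.values() if c == 1)
            rows.append((i, used_once / len(counts)))
    return rows

if __name__ == "__main__":
    # Tiny illustration: the second half repeats the first half exactly.
    for words_read, share in once_share_by_length("a b c d a b c d", step=4):
        print(words_read, share)
```

If the share falls steadily as more of the text is read, that would support the idea that a longer manuscript scores lower partly because of its length, not only because of its vocabulary.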

CC scored Country substantially higher for "difficult" words. The other three texts scored 26-27% difficult words; Country scored 30%. Alas, CC did not tell me which words were considered difficult.

This still doesn't tell me anything about how my work compares to other poets' work. Unless I'm willing to retype an entire chapbook (to say nothing of a book!) and dump it into Character Count just out of curiosity...

