https://onezero.medium.com/how-data-science-pinpointed-the-creepiest-word-in-macbeth-3150995d3808
It’s not the word you’d expect — and it appears in this very sentence
Macbeth is a creepy play. https://www.academia.edu/8108283/The_language_of_Macbeth
Actors have long been superstitious about acting in it. That’s partly because performances have been riddled with accidents and fatalities; indeed, actors consider it bad luck to even utter the name of the play. (They call it “The Scottish Play”.) And it’s partly because the basic substance of the plot is eldritch: You’ve got black magic, witches, a gore-flecked ghost and walking forests.
But fans of Macbeth often say its freaky qualities are deeper than just the plot devices and characters. For centuries, people have been unsettled by the very language of the play.
Actors and critics have long remarked that when you read Macbeth out loud, it feels like your voice and mouth and brain are doing something ever so slightly wrong. There’s something subconsciously off about the sound of the play, and it spooks people. It’s as if Shakespeare somehow wove a tiny bit of creepiness into every single line. The literary scholar George Walton Williams described the “continuous sense of menace” and “horror” that pervades even seemingly innocuous scenes.
For centuries, Shakespeare fans and theater folk have wondered about this, but could never quite explain it.
Then a clever bit of data analysis in 2014 uncovered the reason. (The paper is here.)
It turns out that Macbeth’s uncanny flavor springs from the unusual way that Shakespeare deploys one particular word, over and over again.
That word?
“The.”
How could this be? How could the most common word in the English language (it appears three times in this very sentence) be responsible for the skin-crawling affect of Macbeth?
Initially, the scholars who did this analysis — Jonathan Hope and Michael Witmore — didn’t think to look at something as mundane as “the”.
They began by reading through scholarly work on the play, looking for various previous hypotheses. They pondered some of the more obvious weird things about how Shakespeare uses language in Macbeth, such as how witches speak in trochaic tetrameter (“Bubb-le | bubb-le, | toil and | trou-ble”). That rhythm is jarringly different from the train-chugging-along iambic pentameter of everyone else (“So fair | and foul | a day | I have | not seen.”). But the witches don’t appear onstage often enough for their odd prosody to contaminate the feel of the entire play.
Then Hope and Witmore moved on to another point that scholars historically made about Macbeth, which is that the play has a lot of repetition. The witches talk about battles “lost and won”; Duncan uses these precise words when he enters, too. When Lady Macbeth first greets Macbeth, she uses phrasings very similar to when the witches first greet Macbeth.
“Repetition,” Hope and Witmore concluded, “is a characteristic trope of the play.”
So that made them wonder: Maybe they should do a word-frequency analysis of Macbeth. Perhaps it’d show the recurrence of certain words that would help identify the source of floating menace.
So they did an analysis of the “log-likelihood” of words in the play. “Log-likelihood” is a metric of whether a word is used more or less often than you’d expect, given a reference body of text. Here, they compared word usage in Macbeth to Shakespeare’s overall writing. What words did he use in Macbeth more frequently than in his other plays?
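To make the metric concrete, here is a minimal Python sketch of one common formulation: the two-term log-likelihood ratio (often called G2) used for "keyness" in corpus linguistics. This is an assumption about the general technique, not necessarily the exact calculation or tool Hope and Witmore ran. The function names (`tokenize`, `log_likelihood`, `keyness`) and the toy texts are hypothetical, and the sign flip for under-used words is a convenience added here for sorting.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood ratio (G2) for one word.

    a: count of the word in the target text (e.g. one play)
    b: count of the word in the reference text (e.g. the other plays)
    c: total tokens in the target text
    d: total tokens in the reference text
    """
    # Expected counts if the word were used at the same rate in both texts.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keyness(target_tokens, reference_tokens):
    """Rank target words by G2; under-used words get a negative score."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    rows = []
    for word, a in target.items():
        b = reference[word]
        score = log_likelihood(a, b, c, d)
        overused = a / c > b / d
        rows.append((word, score if overused else -score))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-ins for "one play" and "everything else".
    target = tokenize("the owl the serpent the hand")
    reference = tokenize("an owl a serpent my hand the end")
    for word, score in keyness(target, reference):
        print(f"{word:10s} {score:8.3f}")
```

A word used at the same rate in both texts scores zero. The further its rate in the target drifts from the reference, and the more often it occurs, the larger the score. That is why an extremely frequent word can rank highly even when its rate shifts only a little.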
The results are in the chart below. The higher a word’s “Log likelihood” number, the more Shakespeare over-used that word in Macbeth relative to his other plays …
Sure enough, you can see several words used with unusual frequency! This includes several we could justifiably call creepy, like “knock”, “cauldron”, “tyrant”, “weird”, “trouble”, “dagger”, “fear”, and “horror”.
Cool. But the thing is, that still can’t really explain what’s going on. Sure, Shakespeare used these words more often than normal, but they don’t occur so frequently that they’d change the entire mouthfeel of the play’s language. Nor is their overoccurrence terribly surprising. Of course “cauldron” would appear more often in a play with witches, and “thane” in a play about, well, a thane. “This result,” as Hope and Witmore noted drily, “is not very interesting … We hardly need computers and advanced statistics to tell us this.”
Ah, but then Hope and Witmore looked at the list again. And realized there was one word that was pretty odd to see …
“The” is a pretty strange word to over-use!
How exactly did Shakespeare wind up using “the” so frequently? To figure it out, Hope and Witmore began combing through the play, looking for uses of the word “the”.
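Combing a text for every use of a word is the kind of job a keyword-in-context (KWIC) listing handles well. The sketch below is one way to do it in Python, not a description of how Hope and Witmore actually worked. The `kwic` function and its `width` parameter are hypothetical names, and the sample is a phrase from the play that this article quotes.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each hit of keyword."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    sample = "It was the owl that shriek'd"
    for left, token, right in kwic(sample, "the"):
        print(f"{left:>20} [{token}] {right}")
```

Lining up every hit with a few words on either side makes it easier to spot where a definite article sits in a slot you would expect “a” or “my” to fill.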
They began to notice a pattern. Consider this example below; it’s Lady Macbeth speaking. The Macbeths are getting all jittery and nervous, and they’re startled by some noises in the night. Lady Macbeth explains the noises thusly …
Now, that’s a weird way to talk about that owl. Imagine you and I were walking through the woods and we suddenly heard a hoot. I’d probably say, “Oh, it’s an owl!” An owl. Not the owl. If you say “the owl,” you’re referring to a specific owl that you, and everyone around you, are already familiar with.
By saying “it was the owl that shriek’d”, Lady Macbeth is — in a quite deliciously creepy way — implying that everyone already knows what owl she’s talking about.
It is a collusively strange way for a character to talk. And it makes us, the readers, feel slightly alienated from our own sense of ourselves, and our own knowledge of the world. (Man, maybe I do know that owl? What the hell is going on???) It’s a very subtle effect, but it sends a little shiver down your spine.
Hope and Witmore have another, different way of looking at it: By saying “the owl”, Lady Macbeth makes the bird seem like “a generalised, mythical or proverbial owl … The owl becomes an idea, rather than a thing.”
This curious use of “the” is all over the play. Shakespeare just kept on doing it. Here’s Lady Macbeth again, when she’s counseling Macbeth on how to lie …
Same thing here with “serpent”! Normally you’d say, “be a serpent” — but “be the serpent” sounds so much more specific and freaky.
Here’s one last example, from when Macbeth is steeling himself to stab Duncan to death …
Again, interestingly awkward word choice! As Hope and Witmore note, you’d expect Macbeth to refer to “my hand” and “my eye”. By writing it as “the hand” and “the eye”, Shakespeare neatly evokes the way Macbeth is beginning to be tormented by his own decisions; he disassociates from his own body. In a few acts he’ll be a totally unravelled mess.
It is a very fun discovery about Macbeth. When you go back and reread the play, you now have a new type of x-ray vision, and you notice Shakespeare’s fascinating overuse of “the” everywhere.
It’s obviously not the only reason Macbeth is an unsettling play. Like all good art, it’s super complex and can’t be reduced to any single literary effect.
Still, this is one of my favorite examples of using data analysis to ponder literature. The field of the “digital humanities”, which often involves using data analysis to study books, can get a bad rap sometimes. I get it: Analyzing literature as a big old bag o’ words can risk glossing over the very things that make art art; worse, it can feel like yet another grim devaluing of the traditional humanities, a caustic symptom of an academe trying to STEMify everything. Learn to code, you liberal-arts weenies!
But what’s so delightful about Hope and Witmore’s work is how it’s genuinely a cyborg, centaur piece of literary analysis. They started by pondering a phenomenon that has puzzled Shakespeare fans for centuries. They did some data analysis that pointed to the word “the”. But to figure out why “the” was so key, they had to go back and reread the play closely, engaging in a very rich line-by-line literary analysis. The computation existed as a set of fresh alien eyes, telling the humans where to direct their attention. But it was up to the humans to find the meaning.
It also has a nifty little lesson built into it. Consider it a retelling of “The Purloined Letter”: When you’re faced with a mystery, the answer might be something so common and obvious that it’s already in front of you — but you’ve stopped noticing it. You need to find a new way to look at the world around you.
Clive Thompson is a contributing writer for the New York Times Magazine, a columnist for Wired and Smithsonian magazines, and a regular contributor to Mother Jones. He’s the author of Coders: The Making of a New Tribe and the Remaking of the World, and Smarter Than You Think: How Technology is Changing our Minds for the Better. He’s @pomeranian99 on Twitter and Instagram.