(All the files I used for this project, including text files of the speeches, can be found here on GitHub.)
After making collages for political speeches from American history, I had the itch to apply the same concept to current events. The idea is to see whether looking at the context of frequently used words in modern-day speeches can give us a new angle on a candidate’s message without wasting time sifting through rhetoric and fluff: a “CliffsNotes” version, but with a new twist.
For this project, I made a collage for each of the 2016 US Presidential candidates’ campaign kickoff speeches.
From this angle, we can look at the similarities and differences between candidates in their approach to the same prompt: a speech to declare their candidacy and rally citizens behind their cause. By assuming that the words used most often in a speech are significant, we can look closely at what each candidate chooses to highlight in their speeches, and how it differentiates each of them from their competition.
Can you quickly tell what separates the candidates? What is their message or unique value proposition? Do these images confirm what you already believed, or did you come away with a different impression of the candidate than you had previously? Do the candidates actually say anything substantial with their speeches, or is the rhetoric hopelessly vague and a giant waste of time?
There isn’t any (conscious) political agenda behind this. I don’t particularly like any of the candidates, though I appreciate the (relative) honesty of folks like Bernie Sanders and Rand Paul, and I’ve been critical of Hillary Clinton and Jeb Bush on this blog before. But in general, I think that we should force our politicians to be more precise in how they talk to us, in how they talk to each other, and in how they decide and justify policy. We live in the era of big data, where our government collects and analyzes massive amounts of data on its citizens. It’s only fair that we turn the tables on our politicians by creatively using data, math, and analytical methods to hold them accountable for what they say and do during their time in office.
Anyway, here is the distribution of speech lengths, measured in words (before filtering out stopwords):
Here are the top ten most common words across all speeches after filtering out low-content stopwords. Each word’s frequency is computed as a share of the words remaining in a speech and then averaged across candidates, which standardizes for the fact that some speeches are longer than others. For example, after filtering out stopwords, “america” made up, on average, 1.02% of the remaining words in each candidate’s speech.
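For anyone who wants to try this averaging on their own text files, here is a rough sketch of how it could be done in Python. It is not the exact code behind the chart: the tiny stopword list, the regex tokenizer, and the function names (`tokenize`, `word_shares`, `average_shares`, `top_words`) are all assumptions made for illustration, and a different stopword list will shift the numbers.

```python
import re
from collections import Counter

# Tiny illustrative stopword list. This is an assumption: swap in whatever
# list you prefer, since the choice affects the resulting percentages.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
             "we", "our", "i", "for", "that", "it"}


def tokenize(text):
    """Lowercase the text and pull out word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_shares(text, stopwords=STOPWORDS):
    """Return each non-stopword's share of the words left after filtering."""
    words = [w for w in tokenize(text) if w not in stopwords]
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def average_shares(speeches, stopwords=STOPWORDS):
    """Average each word's share across speeches; absent words count as zero."""
    if not speeches:
        return {}
    totals = Counter()
    for text in speeches.values():
        totals.update(word_shares(text, stopwords))
    return {word: total / len(speeches) for word, total in totals.items()}


def top_words(speeches, k=10, stopwords=STOPWORDS):
    """Return the k words with the highest average share, ties broken alphabetically."""
    averaged = average_shares(speeches, stopwords)
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Here `speeches` is assumed to be a dictionary mapping a candidate’s name to the full text of their speech. Because each speech is converted to shares before averaging, a long speech carries no more weight than a short one. The raw length of a speech, before stopword filtering, would simply be `len(tokenize(text))`.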
And here is the collection of collages (click on them to make them larger):
There’s probably room for more mathematical analysis here, as opposed to my visualization-heavy exploratory approach. But I do think the collages have the advantage of being interpretable to folks who might not understand the math-heavy world of machine learning and word2vec models.