Data Art Project: 2016 U.S. Presidential Candidate Campaign Kickoff Speeches

(All the files I used for this project, including text files of the speeches, can be found here on GitHub.)

After making collages for political speeches from American history, I had the itch to apply the same concept to current events. The idea here is to see whether looking at the context of frequently used words in modern-day speeches can give us a new angle on a candidate’s message without wasting time sifting through rhetoric and fluff: a “CliffsNotes” version, but with a new twist.

For this project, I made a collage for the campaign kickoff speech of each 2016 U.S. presidential candidate shown below.

From this angle, we can look at the similarities and differences between candidates in their approach to the same prompt: a speech to declare their candidacy and rally citizens behind their cause. By assuming that the words used most often in a speech are significant, we can look closely at what each candidate chooses to highlight in their speeches, and how it differentiates each of them from their competition.

Can you quickly tell what separates the candidates? What is their message or unique value proposition? Do these images confirm what you already believed, or did you come away with a different impression of the candidate than you had previously? Do the candidates actually say anything substantial with their speeches, or is the rhetoric hopelessly vague and a giant waste of time?

There isn’t any (conscious) political agenda behind this. I don’t particularly like any of the candidates, though I appreciate the (relative) honesty of folks like Bernie Sanders and Rand Paul, and I’ve been critical of Hillary Clinton and Jeb Bush on this blog before. But in general, I think that we should force our politicians to be more precise in how they talk to us, in how they talk to each other, and in how they decide and justify policy. We live in the era of big data, in which our government collects and analyzes massive amounts of data on its citizens, so it’s only fair that we turn the tables on them by creatively using data, math, and analytical methods to hold them accountable for what they say and do during their time in office.

Anyway, here is the distribution of speech lengths, measured in words (before filtering out stopwords):

[Figure: histogram of speech lengths, in words]
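
For readers who want to reproduce the length measurement, here is a minimal Python sketch of counting words before any stopword filtering. It is not the code used for this project: the regex tokenizer, the `count_words` helper, and the two sample texts are all assumptions made for illustration, and a different tokenizer would give slightly different counts.

```python
import re

def count_words(text: str) -> int:
    """Count word tokens: runs of letters, allowing internal apostrophes."""
    return len(re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text))

# Hypothetical stand-in texts; the real speeches are far longer.
speeches = {
    "candidate_a": "We will rebuild America, and we'll do it together.",
    "candidate_b": "Today I am announcing my candidacy.",
}

# Word count per speech, with no stopwords removed.
lengths = {name: count_words(text) for name, text in speeches.items()}
print(lengths)
```

The resulting counts could then be passed to any plotting library to draw a histogram.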

Here are the top ten most common words across all speeches after filtering out low-content stopwords. Because some speeches are longer than others, each word is measured by its share of a speech’s remaining (non-stopword) words, and that share is then averaged across speeches. For example, after filtering out stopwords, “america” made up, on average, 1.02% of the remaining words in each candidate’s speech.

[Figure: top ten most common words across all speeches, by average share of non-stopword words]
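
To make the averaging concrete, here is a small Python sketch of one way to compute it: take each word's share of the non-stopword tokens within a speech, then average those shares across speeches. This is an illustration under stated assumptions, not the project's actual code. The tiny `STOPWORDS` set, the regex tokenizer, the helper names, and the two sample sentences are all hypothetical, and treating a word that is absent from a speech as a 0% share is one reasonable reading of "average" among several.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; not the list used in the post).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
             "we", "i", "our", "for", "will"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def word_shares(text):
    """Return each non-stopword's share of the speech's non-stopword tokens."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}

def average_shares(speeches):
    """Average per-speech shares; a word absent from a speech counts as 0 there."""
    totals = Counter()
    for text in speeches:
        totals.update(word_shares(text))  # adds the float shares per word
    n = len(speeches)
    return {word: share / n for word, share in totals.items()}

def top_words(speeches, k=10):
    """Return the k words with the highest average share (ties broken A-Z)."""
    avg = average_shares(speeches)
    return sorted(avg.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Hypothetical sample "speeches" for illustration only.
sample = [
    "America is strong and America will lead",
    "We will create jobs for America",
]
ranking = top_words(sample, k=3)
for word, share in ranking:
    print(f"{word}: {share:.2%}")
```

Averaging shares rather than raw counts means a long speech cannot dominate the ranking simply by containing more words.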

And here is the collection of collages:

Ted Cruz [collage]

Rand Paul [collage]

Mike Huckabee [collage]

Martin O’Malley [collage]

Marco Rubio [collage]

Lindsey Graham [collage]

Jeb Bush [collage]

Hillary Clinton [collage]

George Pataki [collage]

Donald Trump [collage]

Bernie Sanders [collage]

There’s probably room for more mathematical analysis here, as opposed to my visualization-heavy exploratory approach. But I do think the collages have the advantage of being interpretable to folks who might not understand the math-heavy world of machine learning and word2vec models.
