A year and a half ago I Tried To Virtually Stalk Mark Zuckerberg. It was a failed attempt, so instead I analyzed comments on one of Mark’s posts.
A few days ago I came across a repo/archive of Donald Trump’s Tweets, and I thought it would be interesting to run a similar analysis on this data.
My methodology is the same as in the original post. In summary, I took all of the Tweets from 2017 and compared every Tweet with every other Tweet, scoring each pair by the number of words they have in common divided by the total length of the two Tweets. I then imported the resulting data into Gephi, an open-source graph visualization tool. Finally, I laid out the graph using the “ForceAtlas 2” algorithm.
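To make the scoring step concrete, here is a minimal Python sketch of the idea. It is not the code from my repo. The function names (`tokenize`, `similarity`, `build_edges`) and the `threshold` parameter are made up for illustration, and it assumes that "length" means the number of words and that shared words are counted once each.

```python
from itertools import combinations


def tokenize(tweet):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return tweet.lower().split()


def similarity(a, b):
    """Distinct shared words divided by the combined word count of both Tweets."""
    words_a, words_b = tokenize(a), tokenize(b)
    total = len(words_a) + len(words_b)
    if total == 0:
        return 0.0
    common = set(words_a) & set(words_b)
    return len(common) / total


def build_edges(tweets, threshold=0.0):
    """Compare every Tweet with every other Tweet once.

    Returns (index_a, index_b, score) tuples for pairs scoring above
    the hypothetical threshold; each tuple can become a weighted edge.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(tweets), 2):
        score = similarity(a, b)
        if score > threshold:
            edges.append((i, j, score))
    return edges
```

Each tuple maps naturally onto a weighted edge between two Tweets, which is the kind of data a graph tool expects. Note that comparing every pair is quadratic in the number of Tweets, so a full year of Tweets means millions of comparisons.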
I tried to follow advice from one of the comments and used cosine similarity between two Tweets instead of my simple approach, but it did not work nearly as well. It’s possible that I did something wrong, but in the end I went back to my original method.
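For reference, cosine similarity over plain word counts can be sketched like this. Again, this is an illustration and not necessarily what I ran: the function name `cosine_similarity` and the whitespace tokenization are assumptions, and real implementations often add weighting such as TF-IDF.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two Tweets."""
    counts_a = Counter(a.lower().split())
    counts_b = Counter(b.lower().split())
    # Only words present in both Tweets contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Unlike the simple score, this normalizes by the geometric mean of the vector lengths and not by their sum, so it yields different weights for the same pair of Tweets.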
You can find all of the source code, including both experiments in the Git history, here.
Note that the Tweets are simplified to help with the analysis, but they are still very readable.
Here are the top 30 words that Trump used in 2017:
Click here to see the full network. Depending on your screen size, you may need to zoom in a bit to make the text readable. You can also click on the image below.
You can also download it as a huge PNG for offline viewing.