My wife and I were watching Jack Ryan, in which John Krasinski plays a CIA analyst. I wanted to be like him, so I began to look for things to analyze.
In the past I’ve done text analysis of comments left on Mark Zuckerberg’s Facebook post, which I described in I Tried To Virtually Stalk Mark Zuckerberg. I used a similar technique to do a Network Analysis of Donald Trump’s Tweets for 2017 (which did not turn out to be that exciting at all). I decided to try my luck one more time and analyze “Russian troll” tweets, to see if I could find anything interesting.
I looked for a source and Google pointed me to this post: Twitter deleted 200,000 Russian troll tweets. Read them here.
I describe my methodology in my other posts, but here is an overview of what I did.
First I compared every tweet to every other tweet to see how similar they were. Turns out 200,000 multiplied by 200,000 is a pretty large number. In the past I was able to use just my laptop. This time I had to rent a few powerful computers from Amazon Web Services to get all the computation done. I slightly underestimated the cost, but a few days and $80 later, I had my scores… I was able to identify around four thousand tweets out of the original 200,000 that were closely related to one another, indicating that I might be able to group them further and get some insights.
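For readers who want to try the comparison step on a small scale, here is a minimal sketch. It assumes cosine similarity over simple word counts and a cutoff of 0.5; both are illustrative choices, not necessarily the scoring I used, and the function names and sample tweets are made up for the example.

```python
import re
from collections import Counter
from itertools import combinations
from math import sqrt

def tokenize(text):
    """Lowercase a tweet and split it into word-like tokens (hashtags and mentions kept)."""
    return re.findall(r"[#@]?\w+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_pairs(tweets, threshold=0.5):
    """Compare every tweet with every other tweet; keep pairs at or above the threshold.

    The threshold value is an assumption chosen for illustration.
    """
    vectors = [Counter(tokenize(t)) for t in tweets]
    pairs = []
    for i, j in combinations(range(len(tweets)), 2):
        score = cosine_similarity(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs

# Made-up sample tweets, not taken from the data set.
tweets = [
    "make america great again #maga",
    "we will make america great again #maga",
    "best soundcloud promotion click here",
]
pairs = similar_pairs(tweets)
```

A loop like this visits each unordered pair once, which is n(n-1)/2 comparisons. For 200,000 tweets that is roughly 20 billion comparisons, so a laptop-friendly sketch like this one does not scale to the full data set without spreading the work across machines.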
I loaded my findings into a network analysis tool called Gephi and performed cluster analysis on my results. This is a fancy way of saying that I was able to further group the four thousand tweets into a thousand smaller groups of closely related tweets. The largest group was mostly noise, and the smallest groups only had two tweets in them. I dropped the largest (noise) group and limited my results to the next 105 largest groups. The resulting 105 groups ranged in size from 56 down to 6 tweets per group. Larger groups are still a bit noisy, but things get better as groups get smaller.
Finally, I went through every group and found the tweet that was most connected to all the other tweets in the same group. This leader tweet provides good insight into what that particular group is about.
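The grouping itself was done in Gephi, so the following is only a rough stand-in for that step. It assumes the similar pairs are available as (tweet_a, tweet_b, score) tuples and groups them by plain connected components, which is a much simpler technique than Gephi's cluster analysis. The function names and sample edges are hypothetical.

```python
from collections import defaultdict

def connected_groups(edges):
    """Group tweet ids into connected components using union-find.

    This is a simplification; it is not the clustering Gephi performs.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b, _score in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    # Largest groups first.
    return sorted(groups.values(), key=len, reverse=True)

def keep_top_groups(groups, top_n, drop_largest=True):
    """Optionally drop the largest (noise) group, then keep the next top_n groups."""
    start = 1 if drop_largest else 0
    return groups[start:start + top_n]

# Made-up edges: (tweet id, tweet id, similarity score).
edges = [
    (0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.7),  # a 4-tweet group
    (4, 5, 0.9), (5, 6, 0.8),               # a 3-tweet group
    (7, 8, 0.95),                           # a 2-tweet group
]
groups = connected_groups(edges)
selected = keep_top_groups(groups, top_n=1)
```

With real data, top_n would be 105 to mirror the cut I made. Connected components tend to merge loosely related tweets into one big blob, which is one reason a proper community detection tool is the better choice for the real analysis.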
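One way to pick such a leader tweet is sketched below. It assumes that "most connected" means the highest total similarity to the other tweets in the same group (weighted degree); counting the number of connections instead would be an equally reasonable reading. The function name and sample data are made up for the example.

```python
from collections import defaultdict

def leader_of(group, edges):
    """Return the tweet id in `group` with the highest summed similarity
    to the other members of the same group."""
    members = set(group)
    strength = defaultdict(float)
    for a, b, score in edges:
        # Only count connections that stay inside the group.
        if a in members and b in members:
            strength[a] += score
            strength[b] += score
    return max(group, key=lambda node: strength[node])

# Made-up group and edges: (tweet id, tweet id, similarity score).
group = [0, 1, 2, 3]
edges = [(0, 1, 0.9), (1, 2, 0.8), (1, 3, 0.7), (2, 3, 0.6), (3, 9, 0.99)]
leader = leader_of(group, edges)
```

Here tweet 1 wins because it is linked to all three other members of the group, while the strong link from tweet 3 to tweet 9 is ignored because tweet 9 is outside the group.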
Below are my results, which can also be seen in Google Drive via this link.
A cluster ID from the “results” tab can be used to find all the other tweets in the same cluster, which are listed under the “clusters” tab.
To be honest, I was hoping to see more diversity in the outcomes. However, assuming that NBC News did not selectively release only some of the tweets, the data seems to show that the campaign was very much pro-Trump, with very few messages deviating from that line.
There were some that stood out, though. For example, there was a group of tweets that seemed to praise Angela Merkel in German…
Merkel ist die mächtigste Frau der Welt #Merkelmussbleiben #girlstalkselfies
Which translates to:
Merkel is the most powerful woman in the world #Merkelmussbleiben #girlstalkselfies
This is not what I would expect from a pro-Trump group…
There were also a good number of tweets promoting (spamming) various products, such as:
RT @jayceodpromoter: indie artist The best #soundcloud promotion for just $5.50 click here to start https://t.co/nzlO08sMFg #buy now… htt
RT @EsquirePromo: Forget unfollowers, I believe in growing. 12 new followers in the last day! Stats via https://t.co/hIJ9ZyBNk0
Not really sure what to make of this… Were the “trolls” also trying to get a product off the ground?
In any case, this was a fun, time-consuming, and a little costly experiment 🙂
If anybody is interested in more details or data, please let me know in the comment area below. I would be happy to share more info.