Virality on Social Media

Evolution of Tweet Diffusal Speed in US Presidential Debates 2016

Aman Abhishek

July 12, 2018

Dataset

The dataset for this analysis consists of all tweets from the first and third 2016 US Presidential Debates that exclusively mention either Trump or Clinton [1].

[1] Search query: ((clinton OR hillary) -(trump OR donald)) OR ((trump OR donald) -(clinton OR hillary)).
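As a rough illustration of what the query selects, the exclusive-mention rule can be sketched in Python. This is only an approximation: the function name `exclusive_mention` is hypothetical, and whole-word regex matching is an assumption that will not reproduce Twitter's own search semantics (for example, it ignores handles and hashtags such as `#Trump2016`).

```python
import re

# Assumption: case-insensitive whole-word matching stands in for the search operators.
CLINTON = re.compile(r"\b(clinton|hillary)\b", re.IGNORECASE)
TRUMP = re.compile(r"\b(trump|donald)\b", re.IGNORECASE)


def exclusive_mention(text):
    """Return 'clinton' or 'trump' if exactly one candidate is mentioned, else None."""
    has_clinton = bool(CLINTON.search(text))
    has_trump = bool(TRUMP.search(text))
    if has_clinton and not has_trump:
        return "clinton"
    if has_trump and not has_clinton:
        return "trump"
    # Tweets mentioning both candidates, or neither, fall outside the dataset.
    return None
```

A tweet is kept only when the function returns a candidate name; tweets that mention both candidates are dropped, which mirrors the two negated clauses in the query.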

Further, I extracted all “retweet chains” that had a length of more than 200. That is, the original tweet was retweeted at least 200 times within the time span of the debate [2]. This subset of the data is used in the rest of the analysis.

[2] There were in total 326,512 retweet chains for debate 1 and 211,007 for debate 3. Only 99.4% and 99.3% of them, respectively, got retweeted more than 200 times.
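The chain extraction can be sketched as a simple grouping step. The input layout here, pairs of `(original_tweet_id, retweet_timestamp)`, and the function name `retweet_chains` are assumptions for illustration, not the format of the actual dataset. The sketch uses the strict "more than 200" reading of the threshold.

```python
from collections import defaultdict


def retweet_chains(retweets, min_length=200):
    """Group retweet timestamps by original tweet; keep chains longer than min_length.

    `retweets` is an iterable of (original_tweet_id, timestamp) pairs (assumed layout).
    """
    chains = defaultdict(list)
    for original_id, timestamp in retweets:
        chains[original_id].append(timestamp)
    # Each surviving chain is sorted so that later steps can treat it as chronological.
    return {
        original_id: sorted(times)
        for original_id, times in chains.items()
        if len(times) > min_length
    }
```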

Definitions

The tweets in each retweet chain were first ordered chronologically. To calculate how the speed of retweeting (within each retweet chain) evolved as the retweets spread across Twitter, I computed the time gap between \(RT_{n}\) and \(RT_{n-50}\), where \(RT_{n}\) is the \(n^{th}\) retweet [3]. In other words, the speed was calculated from how long it took for sets of 50 consecutive retweets to occur [4].

[3] Say tweet \(T\) was created at \(t=0\), the first retweet \(RT_{1}\) at \(t=1\), \(RT_{2}\) at \(t=4\), …, \(RT_{51}\) at \(t=20\), and \(RT_{52}\) at \(t=27\). Then the speeds, measured in retweets per second, are \(s_1=\frac{50}{20-1}\approx 2.6\), \(s_2=\frac{50}{27-4}\approx 2.2\), and so on.

[4] The smallest time resolution in the dataset is one second, so calculating speed is a problem when multiple retweets occur within the same second: the time gap is zero and the speed becomes infinite. To resolve this, I add random millisecond noise to all timestamps, so that a speed calculated between two retweets within the same second is random but finite. In the dataset, the largest number of retweets within a second is about 100, which would produce many random speed values if speed were calculated from consecutive retweets. I chose a window of 50 to balance this, and the choice is somewhat arbitrary: a number larger than 50 would decrease the errors but also the resolution, and a smaller number would do the reverse.
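The windowed speed defined above can be sketched as follows. The function name `retweet_speeds`, its parameters, and the way the millisecond noise is drawn (a uniform whole number of milliseconds below one second) are assumptions made for illustration; the original implementation may differ.

```python
import random


def retweet_speeds(timestamps, window=50, jitter=False, seed=0):
    """Return speeds (retweets per second) over sliding sets of `window` retweets.

    `timestamps` holds the retweet times of one chain, in seconds.
    """
    rng = random.Random(seed)
    times = [float(t) for t in timestamps]
    if jitter:
        # Assumed noise model: add 0-999 ms so that same-second retweets separate.
        times = [t + rng.randrange(1000) / 1000.0 for t in times]
    times.sort()
    speeds = []
    for n in range(window, len(times)):
        gap = times[n] - times[n - window]
        # A zero gap can only remain if the whole window shares one timestamp.
        speeds.append(window / gap if gap > 0 else float("inf"))
    return speeds


# Worked example from the text: RT_1 at t=1, RT_2 at t=4, RT_51 at t=20, RT_52 at t=27.
# The times of RT_3..RT_50 are not given; t=5 is a placeholder that does not affect the result.
example = [1, 4] + [5] * 48 + [20, 27]
speeds = retweet_speeds(example)
```

With these inputs, `speeds[0]` equals 50/19 and `speeds[1]` equals 50/23, matching the two windows in the worked example.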

Analysis

Below is a figure reflecting how the retweeting speed evolves as users retweet the original tweet.

It can be seen above that three exceptionally viral tweets take off at very high speeds, from about 80 up to 160 retweets per second. All three tweets are from users at the higher end of the follower count (light blue). In comparison, there is a dark blue line (representing a tweet from a user with a lower number of followers) that is retweeted at a “slow” but steady speed and gets 20,000 retweets in a span of 90 minutes.

Elite versus Ordinary Users

The underlying mechanism here seems to be the following. Imagine a network where there is one “elite” (central) node, connected to a lot of other “ordinary” nodes that are all identical. Each ordinary node has a certain probability with which it decides to retweet a tweet. When the elite tweets, the retweets would occur really fast at first, but then slow down and fall to zero as the probabilities of ordinary nodes not retweeting compound. The retweets of Katy Perry’s tweets (first debate tweets 1, 2, 3; third debate tweets 1, 2, 3) reflect this behavior:

In a real-world network such as Twitter there are many elites, so if a tweet created by an ordinary user is retweeted by an elite, the aforementioned dynamic would occur again from that point. This would cause an otherwise falling speed profile to rise, and the size of the rise would depend on how “elite” the retweeting node was.
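The mechanism described above can be illustrated with a toy model. It is one possible reading, not the author's model: the discrete time steps, the fixed per-step retweet probability `p`, and the function name `expected_speed_profile` are all assumptions. Each pool of ordinary followers becomes exposed at some step, and in every step a fraction `p` of the users who have not yet retweeted do so, so the expected speed from a single pool decays geometrically.

```python
def expected_speed_profile(pools, p, steps):
    """Expected retweets per step when each exposed user retweets with probability p.

    `pools` maps a step index to the number of ordinary followers newly exposed
    at that step (for example, the followers of an elite who retweets then).
    """
    remaining = 0.0  # exposed users who have not retweeted yet
    speeds = []
    for step in range(steps):
        remaining += pools.get(step, 0)
        new_retweets = remaining * p
        speeds.append(new_retweets)
        remaining -= new_retweets
    return speeds


# One elite tweeting at step 0 to 1000 followers: the speed only falls.
single = expected_speed_profile({0: 1000}, p=0.1, steps=10)

# Same start, but a larger elite (5000 followers) retweets at step 5: the speed jumps.
boosted = expected_speed_profile({0: 1000, 5: 5000}, p=0.1, steps=10)
```

In this sketch the size of the jump at step 5 scales with the follower count of the retweeting node, which is the qualitative behavior the text describes.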

Below is such a case, where a tweet created by an ordinary user went viral. The user @danibucaro had 488 followers when she created the tweet, which has been retweeted 90K times to date.

Clinton versus Trump

It is also possible that a group of users (or bots) organize in order to retweet a tweet rapidly. This would not be distinguishable from an elite-driven speed boost in the current analysis. For instance, it can be seen below that there are some unique patterns in Trump’s retweets that do not show up in Clinton’s. At this point it is unclear what is causing the huge boosts in retweeting speed for Trump’s tweets once they have been retweeted by around 1,000 to 2,500 users.

It can also be seen that Clinton’s tweets start with speeds that are about 4-5 times higher than Trump’s tweets, and she also has many more viral tweets in general (see plot on the right). However, this comparison cannot be made with confidence because the dataset consists only of a subset of all the tweets made by the two candidates (those in which they explicitly mention either Trump or Clinton), as noted in the beginning.