When the Social Feed Manager (SFM) team approached me to work with them, as a subject matter expert, on exploring the idea of creating curated research collections for Middle East Studies, I was intrigued: after all, the use of social media in the Arab Spring has been one of the biggest research topics in the field in the past decade, and its analysis would have been much easier using a tool like SFM. I was also, however, a bit apprehensive. I am quite a new Twitter user, and only started my own account when I had to for work. I was still feeling my way around: who to follow, what a list was and how it was useful, and how to track what you’re looking for in such a vast field of information. But other members of the Global Resources Center had already been working with the team on projects like collecting tweets on Okinawa and anti-corruption campaign accounts in China, so I decided to give it a try!
Just as I got trained up on how to use SFM, a few convenient, bounded projects presented themselves: the Iranian Presidential Elections (May 19th) and the 50th anniversary of the Six-Day War (June 5th-10th). I started to search through Twitter for activity on these events, and encountered my first surprise: while there was an incredible amount of activity on the elections, the anniversary had almost no chatter on Twitter whatsoever. I initially assumed that this was because the election was happening the very week I started collecting, while the anniversary was several weeks away. But now that the anniversary has passed, it’s clear that it passed fairly quietly on Twitter, which is an interesting phenomenon in itself.
I was seeing a great deal of chatter about the elections on my own Twitter feed, so I started marking hashtags that seemed relevant. I ended up following those that showed up the most: the names of the candidates in both Persian and Latin script (with variations in transliteration: for example, the candidate قالیباف was sometimes transliterated as Ghalibaf, sometimes as Qalibaf), the generic hashtags for Iranian Elections 2017 (or the Iranian year 1396), and several others used by the opposition that came up again and again in the run-up to the election. As a result, the tweets we collected may be skewed toward the opposition in number, but the opposition generally tends to be louder than the establishment in any venue, so this did not concern me that much. The actual hashtags we tracked were: IranElections2017, IranElections, IranElection, انتخابات_96, انتخابات۹۶, MyVoteRegimeChange, Rouhani, روحانی, Raisi, رييسي, Ghalibaf, Qalibaf, قالیباف, HassanRouhani.
Over the course of five days (May 18th-23rd), SFM grabbed 558,799 tweets. I was shocked at the sheer amount we collected. When I took a closer look at the tweets, they did indeed all seem relevant, and they showed a great variety of opinions, languages, and points of view. But a few days later, I realized that people were still talking about the elections on my Twitter feed, so we decided to do one more bulk collection to capture the post-mortem conversations and analysis. Here we encountered our first technical issue: in creating a search collection instead of a filter collection, we could not mix Persian and English search terms. We had to split them up into two separate collections, which was slightly inconvenient but not a huge obstacle. What we didn’t realize at the time, however, was that the search went back further than we expected. When I saw that my single post-mortem search, using only the Persian hashtags, had over 100,000 more tweets than our filter had collected over the whole weekend, using both English and Persian hashtags, I was stunned and confused! But after analyzing the data, it became clear that there was a lot of duplication: the search had returned tweets up to a week old, some of which predated the start of our filter. Plus, it had continued for an extra two days after our filter ended. We are currently analyzing how many tweets were duplicated, which is important to keep in mind if we want to use this data for research.
A few lessons from this first round of collecting:
- Events may be important in the real world, but that importance may not be reflected on Twitter. Whether it is because most chatter was happening on other social media platforms, or whether the tough talk of last year was put on hold because of current political realities, the fiftieth anniversary of the Six-Day War was not well represented on the platform. The Iranian elections, however, showed a great deal of activity.
- For political events on Twitter, as in other media, the loudest voices and most frequent posters will have the most representation, and their hashtags will be the easiest to collect.
- When collecting terms in both Arabic-script languages and English, filters work smoothly, but searches have to be run separately for each script. I am not sure whether the reason for this is the right-to-left nature of Arabic script, the separate characters/alphabet, or whether it is specific to Arabic and Persian.
- Searches can go as far back as a week. If you attempt a post-mortem search soon after an event ends, its results will overlap with any filter you ran during the event, so deduplication needs to be done before the data can be really useful.
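The deduplication mentioned in the last point can be sketched in a few lines of Python. This is only an illustration, not how SFM itself handles duplicates: it assumes each collection has been exported as line-oriented JSON (one tweet per line) and that every tweet carries the `id_str` field found in Twitter's API JSON. The function names and the sample tweets are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def read_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Parse line-oriented JSON (one tweet per line), skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def merge_collections(*collections: Iterable[dict]) -> Tuple[List[dict], int]:
    """Merge tweet collections, keeping the first copy of each tweet ID.

    Returns the unique tweets and the number of duplicates dropped.
    Assumes every tweet dict has an "id_str" key.
    """
    unique: Dict[str, dict] = {}
    duplicates = 0
    for collection in collections:
        for tweet in collection:
            tweet_id = tweet["id_str"]
            if tweet_id in unique:
                duplicates += 1
            else:
                unique[tweet_id] = tweet
    return list(unique.values()), duplicates


# Hypothetical sample data standing in for the filter and search exports.
filter_lines = [
    '{"id_str": "101", "text": "#IranElections2017"}',
    '{"id_str": "102", "text": "#Rouhani"}',
]
search_lines = [
    '{"id_str": "100", "text": "#انتخابات۹۶"}',
    '{"id_str": "102", "text": "#Rouhani"}',
]

merged, dropped = merge_collections(
    read_tweets(filter_lines), read_tweets(search_lines)
)
print(len(merged), "unique tweets;", dropped, "duplicates dropped")
```

With real exports, the in-memory lists would be replaced by open file handles. Matching on the tweet ID rather than the text keeps retweets and identical-looking tweets from different accounts as separate records.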
I am playing around on an SFM test server with the idea of tracking the word انتخابات, which means “elections” in both Arabic and Persian, to see whether that will give me a cohesive collection on elections in the Arabic- and Persian-speaking world, or if it will be too messy. In the meantime, I hope the collection of Iranian election tweets will be useful for researchers. Let us know if you’d like to use them!
Note: GW researchers should contact the SFM team for access to the dataset. For other researchers, we’re happy to publicly post the tweet IDs (in accordance with Twitter policy).