Why are Tweets about the War between Russia and Ukraine relevant?
The war in Ukraine has affected the political climate, the economy and people's sense of security all over the world. This makes it relevant to explore public opinion on the Russia-Ukraine situation.

Twitter is used by 229 million users daily, which makes it an extensive source of insight into public opinion on trending matters. We use Twitter to access tweets containing hashtags and keywords related to the Russian-Ukrainian war, as well as a range of metadata about those tweets, for the purposes of text and network analysis.
Easy access to huge amounts of data
Quick identification of recent trends
Data format makes it easy to visualise network graphs
The Russian invasion of Ukraine has shaken many countries and has severely affected economies, families and political relationships across the world. In response to Russia's endless war crimes, its threat to Ukraine's independence and its strained relationship with NATO, many countries have imposed sanctions on Russia, resulting in price increases and a massive decrease in gas deliveries from Russia to the rest of the world. More importantly, more than 14.000 Ukrainians have lost their lives and many more have become refugees in Europe.
Political relationships are changing
Sanctions against Russia affect economies worldwide
Ukrainians are in need of help from the rest of the world
Breaking down Trends into
#Hashtags, Social Networks and Sentiments
#Hashtags
Has the usage of hashtags changed from February 21st 2022 to now?
Social Networks
Do strong communities within the different languages exist? Do the users gather in communities based on similar opinions?
Sentiments
Are there any shifts in sentiment and discourse from February 21st 2022 until now? Do people using different languages express different views on the war?
The quick and dirty on the dataset
More than a million tweets, lots of different languages and 54 days of war

The tweets were downloaded based on a dataset of tweet IDs. The dataset is available at the GitHub repo here, where the authors are also listed. The tweet IDs in the dataset were retrieved from Twitter using the Twitter streaming API and include tweets from each day in the period 21-02-2022 to 15-04-2022.
The tweet ids have been collected based on a range of relevant keywords (russia, ukraine, putin, zelensky, kyiv, etc.). This list was continuously updated by the authors during the period of the data collection, as more keywords became relevant for the crisis.
Below is a complete list of the keywords as well as their introduction dates. All keywords were translated to Russian as well as Ukrainian.
27th Feb: russia, ukraine, putin, zelensky, russian, ukranian, keiv, kyiv
1st Mar: kharkiv
3rd Mar: khorsan
4th Mar: zaporizhzhia, energodar
To ensure a uniform distribution of data across the whole period, we extracted 18.000 randomly chosen tweet IDs from the GitHub repo for each day.
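The per-day sampling can be sketched roughly as follows. This is a minimal illustration and not the original script: the `ids_by_day` mapping, the `sample_ids_per_day` helper and the fixed seed are all assumptions made for the example.

```python
import random

def sample_ids_per_day(ids_by_day, per_day=18000, seed=42):
    """Draw up to `per_day` tweet IDs uniformly at random for each day."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the sample reproducible
    sampled = {}
    for day, ids in sorted(ids_by_day.items()):
        # Sample without replacement; keep every ID if a day has fewer than `per_day`.
        k = min(per_day, len(ids))
        sampled[day] = rng.sample(list(ids), k)
    return sampled

# Toy input with hypothetical IDs; the real per-day ID lists are far larger.
ids_by_day = {
    "2022-02-21": list(range(1000, 1050)),
    "2022-02-22": list(range(2000, 2005)),
}
sample = sample_ids_per_day(ids_by_day, per_day=10)
```

Sampling the same number of IDs for every day keeps any single high-volume day from dominating the dataset, at the cost of discarding information about how tweet volume changed over time.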
Because we reduced the data, and because the keywords were chosen by the authors of the dataset, the following analysis is not a complete reflection of the sentiment surrounding the Ukraine/Russia war on Twitter. However, it is still a fair indication of it.
After retrieving the tweet IDs, we used the Twitter API v2 to collect the corresponding tweets from Twitter and saved them all in a .csv file. Our dataset consists of 18.000 tweets for each of the 54 days and includes tweets from 651.363 users in total. As the majority of users do not allow tracking of geolocation, we were not able to save the countries/locations of the users and tweets as first intended. Instead we tracked the language used in each tweet, and in total we found 32 languages.
We saved the following 8 attributes for each tweet:
- text of tweet (“full_text”)
- hashtags (“entities.hashtags”)
- id of tweet (“id”)
- the name of the user of the tweet (“user.screen_name”)
- the user that the tweet is replying to (“in_reply_to_screen_name”)
- date of tweet upload (“created_at”)
- the language the tweet is written in (“lang”)
- the location of the tweet (if any) (“location”)
The full text and hashtags were saved to be used in our text and discourse analysis later on. We used the creation date and written language of the tweet to filter the tweets on time and language, respectively, when performing the text and network analysis, and the tweet id and parent author were used to generate several social network graphs during the analysis.
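As a rough sketch of how these attributes could be combined, the snippet below filters tweets by language and date and then counts reply edges between users. It is an illustration under stated assumptions rather than the code used in the project: the helper names `filter_tweets` and `reply_edges` are hypothetical, the sample rows are invented, and `created_at` is assumed to be an ISO 8601 timestamp.

```python
from collections import Counter
from datetime import date

def filter_tweets(tweets, lang, start, end):
    """Keep tweets written in `lang` whose creation date lies in [start, end]."""
    kept = []
    for t in tweets:
        # Assumes ISO 8601 timestamps, so the first 10 characters are YYYY-MM-DD.
        created = date.fromisoformat(t["created_at"][:10])
        if t["lang"] == lang and start <= created <= end:
            kept.append(t)
    return kept

def reply_edges(tweets):
    """Count directed edges (replying user -> user being replied to)."""
    edges = Counter()
    for t in tweets:
        target = t.get("in_reply_to_screen_name")
        if target:  # tweets that are not replies contribute no edge
            edges[(t["user.screen_name"], target)] += 1
    return edges

# Hypothetical rows using the attribute names listed above.
tweets = [
    {"id": 1, "user.screen_name": "alice", "in_reply_to_screen_name": "bob",
     "created_at": "2022-02-24T08:15:00.000Z", "lang": "en"},
    {"id": 2, "user.screen_name": "carol", "in_reply_to_screen_name": "bob",
     "created_at": "2022-02-25T12:00:00.000Z", "lang": "uk"},
    {"id": 3, "user.screen_name": "alice", "in_reply_to_screen_name": "bob",
     "created_at": "2022-03-01T09:30:00.000Z", "lang": "en"},
    {"id": 4, "user.screen_name": "dave", "in_reply_to_screen_name": None,
     "created_at": "2022-03-02T10:00:00.000Z", "lang": "en"},
    {"id": 5, "user.screen_name": "erin", "in_reply_to_screen_name": "alice",
     "created_at": "2022-04-20T11:00:00.000Z", "lang": "en"},
]

english = filter_tweets(tweets, "en", date(2022, 2, 21), date(2022, 4, 15))
edges = reply_edges(english)
```

The weighted edge list in `edges` can then be handed to a graph library of choice to draw the network and look for communities.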
After extracting all tweets, we performed multiple steps of data cleaning before using the data for analysis. The following major steps were performed on all tweets in our data-cleaning process:
Translation to English
Lowercasing
Stemming
Stopword removal
Filtering out emojis and other non-alphanumeric content
Hashtag detection
Tokenization of tweet texts
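The cleaning steps above can be sketched with the Python standard library alone. This is a simplified illustration and not the project's actual pipeline: translation is left out, the stopword list is a tiny placeholder, `naive_stem` is a crude stand-in for a real stemmer, and the order in which the steps run is an assumption.

```python
import re

STOPWORDS = {"the", "is", "in", "a", "of", "and", "to", "are"}  # tiny illustrative list

HASHTAG_RE = re.compile(r"#(\w+)")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")

def naive_stem(token):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def clean_tweet(text):
    """Return (tokens, hashtags) for a tweet that is already in English."""
    text = text.lower()                    # lowercasing
    hashtags = HASHTAG_RE.findall(text)    # hashtag detection, before '#' is stripped
    text = NON_ALNUM_RE.sub(" ", text)     # drop emojis and other non-alphanumerics
    tokens = text.split()                  # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [naive_stem(t) for t in tokens], hashtags    # stemming

tokens, hashtags = clean_tweet(
    "Sending love to the people of Ukraine 🇺🇦 #StandWithUkraine #Kyiv"
)
```

In this sketch the hashtag words stay in the token list after the `#` is removed; whether they should be kept or dropped from the text depends on the analysis that follows.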
Tweet Volume for English, Ukrainian and Russian Tweets
It is easy to see that the volume of tweets written in English far exceeds the volume of tweets written in Ukrainian and Russian. This is to be expected, as English has a larger overall number of speakers. However, it means that English hashtag trends and keywords may dominate the parts of the analysis where we do not split by language.