
Twitter Data Analysis And Natural Language Processing

shilpsgohil

Understanding public perception of climate change, global warming, pollution and Greta Thunberg.



Millions of tweets are created every day across the world, in many different languages. While Twitter is far from a comprehensive record of public conversation, it can provide insight into popular trends and important cultural and political moments. It is useful as a measure of public opinion or dissent on important political or social topics. In the past, Twitter data has been used to analyze political polarization, public opinion of world leaders and the spread of protest movements. Twitter analysis can also be used as a tool for marketing or product analysis: collecting mentions of a product, identifying people who talk positively about it, examining the size of the retweet network that mentions it, and using sentiment analysis of the tweets to understand the public's perspective.


One of the most important social and political issues we face today is the gradual rise in average global temperatures and the resulting changes in climate. Greta Thunberg, known for her climate activism and for challenging world leaders to take immediate action on climate change mitigation, is a key figure in conversations around climate change.


To download tweets from the Twitter API, I requested a Twitter Developer Account. The keywords I used to download tweets around my topic of interest were:


  • Climate Change

  • Global Warming

  • Pollution

  • Greta Thunberg


I intended these keywords to capture the essence of my topic of interest without overcrowding the analysis. Due to limitations in computing power, I downloaded and analyzed 1,000 tweets matching these keywords.



I chose "Greta Thunberg" as one of my keywords because she is a central figure in the fight against climate change, and including her makes the analysis more interesting. I wanted to understand how often she is mentioned in the tweets relative to the other keywords, and the sentiment around her, as she can be a polarizing figure depending on whose perspective is taken.


Count of keywords


Counting how often each keyword appears in the tweet text, relative to the other keywords, lets us convert words into numbers. Counting words also tells us how many times a company, product or hashtag is mentioned, and hence gives an idea of whether one topic is talked about more than another.
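As a rough illustration of this counting step, here is a minimal sketch using pandas. The sample tweets, the "text" column name and the helper flag_keywords are all assumptions made for the example; the post does not show the code that was actually used.

```python
import pandas as pd

# Hypothetical sample; the real analysis used 1,000 downloaded tweets.
tweets = pd.DataFrame({
    "text": [
        "Climate change is real",
        "Air pollution is getting worse in the city",
        "Greta Thunberg speaks about climate change",
        "Global warming and pollution are linked",
    ]
})

KEYWORDS = ["climate change", "global warming", "pollution", "greta thunberg"]


def flag_keywords(df, keywords, text_col="text"):
    """Add one boolean column per keyword: True where the tweet text contains it."""
    out = df.copy()
    for kw in keywords:
        # Case-insensitive plain substring match (no regular expressions).
        out[kw] = out[text_col].str.contains(kw, case=False, regex=False)
    return out


flagged = flag_keywords(tweets, KEYWORDS)
counts = flagged[KEYWORDS].sum()        # number of tweets mentioning each keyword
shares = counts / counts.sum() * 100    # share of all keyword mentions, in percent
print(counts)
print(shares.round(1))
```

A tweet that mentions two keywords is counted once under each, so the shares are percentages of keyword mentions rather than of tweets.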



From the chart above, the most mentioned keyword was climate change (28%), followed by pollution, global warming and finally Greta Thunberg. Among the chosen keywords, climate change is therefore the most popular topic on Twitter, and Greta Thunberg the least talked about.


Time series analysis of the tweets


After counting the number of tweets mentioning each keyword or phrase, I wanted to understand how the mentions change over time. Tweets about companies, products, and social and political issues vary by the day, hour, minute and even second, and this variation can be shown on a line graph.


For this analysis, I examined the variation in tweets over 1-minute periods. To do this, I converted the time column to a datetime format (pd.to_datetime), set the index to the "created_at" column, and then used the pandas function "resample".
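The resampling step might look something like the sketch below. The timestamps and the boolean keyword columns are made-up sample data, and only two keywords are shown to keep it short; "created_at" is the column name mentioned in the post.

```python
import pandas as pd

# Hypothetical rows: one per tweet, with boolean flags for two of the keywords.
tweets = pd.DataFrame({
    "created_at": [
        "2020-01-01 12:00:05",
        "2020-01-01 12:00:40",
        "2020-01-01 12:01:10",
        "2020-01-01 12:03:30",
    ],
    "climate change": [True, False, True, True],
    "pollution": [False, True, True, False],
})

# Parse the timestamp strings and use them as the index, as resample requires.
tweets["created_at"] = pd.to_datetime(tweets["created_at"])
tweets = tweets.set_index("created_at")

# Sum the flags within each 1-minute bin to get mentions per minute.
# Minutes with no tweets appear as rows of zeros.
per_minute = (
    tweets[["climate change", "pollution"]]
    .astype(int)
    .resample("1min")
    .sum()
)
print(per_minute)
```

With matplotlib installed, calling per_minute.plot() on a frame like this would draw one line per keyword.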


The 1,000 tweets I collected span a window of approximately 22 minutes. From the graph above we can see the variation in the keyword mentions (climate change, global warming, pollution and Greta Thunberg). Interestingly, there is some overall synchrony between tweets about global warming and tweets about Greta Thunberg. There is also noticeably more variation in tweets about climate change and pollution.


This methodology is very useful for tracking variation in mentions around a new product launch or a changing political or social landscape (for example, the Arab Spring), as it shows changes in real time.


Sentiment Analysis - Understanding Reactions To Topics


After detecting the presence of keywords in tweets and plotting their relative prevalence across time, we can derive meaning from the text through sentiment analysis. Sentiment analysis is a natural language processing method that determines whether a word, sentence, paragraph or document is positive or negative. It is useful for gauging reactions to a company, product, policy, or political or social issue. To perform sentiment analysis, I used the VADER package from NLTK.


The idea behind sentiment analysis is that we count the words which are positive or negative as a proportion of the words in the rest of the document.

Sentiment analysis returns four score values:

  • Negative: Provides the negative sentiment value

  • Positive: Provides the positive sentiment value

  • Neutral: Measures words that do not contribute to the sentiment

  • Compound: A combination of the positive and negative scores (an overall assessment that ranges between -1 and +1).

A compound score below 0 is negative, and above 0 is positive.
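The sketch below shows one way the four scores could be obtained and turned into a label. SentimentIntensityAnalyzer and polarity_scores are NLTK's VADER interface; the helper names build_vader, label_from_compound and score_tweet are invented for this example, and treating a compound score of exactly 0 as "neutral" is an assumption, since the rule above only covers scores below and above 0.

```python
def build_vader():
    """Create NLTK's VADER analyzer (needs nltk and the 'vader_lexicon' resource)."""
    # Imported here so the helpers below can be used without nltk installed.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    return SentimentIntensityAnalyzer()


def label_from_compound(compound):
    """Below 0 is negative, above 0 is positive; exactly 0 is labelled neutral (assumption)."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"


def score_tweet(text, analyzer):
    """Return VADER's four scores plus a label derived from the compound score."""
    scores = analyzer.polarity_scores(text)  # keys: 'neg', 'neu', 'pos', 'compound'
    return {**scores, "label": label_from_compound(scores["compound"])}
```

Usage would be along the lines of score_tweet("some tweet text", build_vader()), which returns the negative, neutral, positive and compound values together with the derived label.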


Some examples of tweets and their corresponding sentiment analysis values (positive, negative, neutral and compound scores):


Example 1:

The tweet above is considered to have an overall positive sentiment due to having a compound score of 0.7414.


Example 2:

This tweet is considered to have relatively negative sentiment as we can see that the compound score is -0.4588.


Example 3:

The overall sentiment is slightly more negative than that of the tweet above, as indicated by the lower compound score of -0.4939.


Overall sentiment analysis


To understand the overall sentiment associated with each of my keywords, I separated the tweets by topic (climate change, global warming, pollution and Greta Thunberg). I ran sentiment analysis on each topic's tweets and took the mean compound score as the overall sentiment score for that topic.
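A minimal sketch of this averaging step, assuming the scored tweets sit in a DataFrame with one row per tweet and topic. The column names "topic" and "compound" and the values are illustrative, not the scores from the analysis.

```python
import pandas as pd

# Hypothetical compound scores; real values would come from VADER.
scored = pd.DataFrame({
    "topic": ["climate change", "climate change", "pollution", "pollution"],
    "compound": [0.5, -0.25, -0.5, 0.0],
})

# Mean compound score per topic: one overall sentiment value for each keyword.
overall = scored.groupby("topic")["compound"].mean()
print(overall)
```

A tweet that mentions several keywords would need one row per topic for this grouping to count it under each.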



From the gauge charts above, we see that the overall climate change sentiment and overall global warming sentiment are quite similar (+0.069 and +0.058, respectively) and slightly positive. This indicates that, on average, tweets on these topics lean slightly positive. Values closer to +1 would indicate stronger positive sentiment.


The gauge charts for overall pollution sentiment and Greta Thunberg sentiment are negative, indicating more negative sentiment on average when these topics are discussed.


Visualizing all the neutral, negative & positive sentiments per keyword.

From the bar plot above we see that climate change and pollution tweets are mostly neutral in sentiment but also have a fair share of negative sentiment, as indicated by the red segment of the bars. Greta Thunberg has slightly more positive than negative sentiment, but fewer tweets overall. Global warming also has fewer tweets overall, and most of them are either negative or neutral.
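The percentages behind a bar plot like this can be computed with a row-normalised cross-tabulation. The sketch below assumes each tweet already carries a "topic" and a sentiment "label" column; both names and all the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical labelled tweets: one row per tweet and topic.
labelled = pd.DataFrame({
    "topic": ["climate change"] * 4 + ["greta thunberg"] * 2,
    "label": ["neutral", "neutral", "negative", "positive",
              "positive", "negative"],
})

# Cross-tabulate topic against label and normalise each row,
# so every topic's shares add up to 100 percent.
shares = pd.crosstab(labelled["topic"], labelled["label"], normalize="index") * 100
print(shares)
```

With matplotlib installed, shares.plot(kind="bar", stacked=True) would draw one stacked bar per keyword from a table like this.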


Word cloud to show overall subject matter of tweets related to keywords


Word clouds are an effective visualization in which words vary in size and colour; the largest words are those used most frequently. Word clouds are used frequently in marketing: for example, when faced with 1,000 responses to a questionnaire, a word cloud can instantly summarize the words and phrases that appear most often.
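The word sizes in a word cloud come from simple word frequencies, which can be sketched with the standard library alone. The stop-word list, the tokenising rules and the sample tweets below are illustrative assumptions, not the preprocessing used for the cloud in this post.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "rt"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count how often each word appears across all texts, ignoring URLs and stop words."""
    counts = Counter()
    for text in texts:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop URLs
        words = re.findall(r"[a-z']+", text)               # keep alphabetic tokens
        counts.update(w for w in words if w not in stopwords and len(w) > 2)
    return counts


# Hypothetical tweets for illustration.
sample = [
    "Chevron shareholder meeting is today",
    "Pollution in the Amazon: Chevron must pay https://example.com/x",
]
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

If the third-party wordcloud package is used for the drawing (the post does not say which tool produced the image), a mapping like freqs can be passed to WordCloud().generate_from_frequencies.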


Further investigation showed that Steven Donziger is a central figure in the fight against climate change. He is an American attorney known for his legal battles with Chevron. He initially represented over 30,000 farmers and indigenous people in Ecuador in a case against Chevron related to environmental damage and health effects caused by oil drilling. The Ecuadorian courts awarded the plaintiffs $9.5 billion in damages, which led Chevron to withdraw its assets from Ecuador and launch legal action against Donziger in the US. In 2011, Chevron filed a RICO suit against Donziger in New York City.


Isabella Kaminski is a freelance environmental journalist who is very vocal about climate justice, environmental policy and nature.


May 26th, the day the tweets were downloaded, marked Chevron's 11th shareholder meeting since the company lost a historic $9.5 billion judgement for deliberate pollution of the Ecuadorian Amazon. Many people were tweeting about this event, which is reflected in the word cloud summarizing the 1,000 tweets downloaded on May 26th for "climate change", "global warming", "pollution" and "Greta Thunberg".


Some of the media headlines on this topic read:


"The 16 billion gallons of toxic pollution Chevron admitted to deliberately dumping remains in the Amazon rainforest, continuing to leech poison into rivers and streams every single day. During its annual general meeting (AGM), there is no doubt that Chevron CEO Mike Wirth will once again try to brush aside shareholders calling for justice in Ecuador, by hiding behind a deceptive 2014 RICO decision that Chevron orchestrated."

"The Chevron-Ecuador Case Is Critical to the Climate Justice Movement."

For more information about this please see the following link.


Some limitations and future improvements to consider


  • Due to limitations in computing power, I chose to download and analyze only 1,000 tweets. More tweets, and hence more data, would likely have provided better results and insights.

  • Twitter analysis yields the most insightful results when tweets are downloaded during known events or in anticipation of them, for example during political rallies (the George Floyd protests) or new product releases (Apple events).

  • Twitter analysis is more useful in a marketing context: analyzing whether a newly released product is well received, evaluating general interest in a product through mentions, or gathering feedback and suggested improvements from public tweets about it. An example is Peloton's release of new stationary bikes.


