What Is Sentiment Analysis?
Sentiment Analysis is a technique in natural language processing (often powered by deep learning) in which a machine is trained to extract sentiment from a given text. For example, if you have a blogging website like ours and want to know whether your readers love your posts or not (assume the comments run into the thousands), you could find out with the help of sentiment analysis of those comments. We shall first see how sentiment analysis works and later dig into some code to analyze the sentiment of tweets posted by Twitter users.
To get more insight into sentiment analysis, let us look at some tweets and analyze them on our own:-
We took some replies from one of Amazon's posts.
Reply 1 is clearly negative towards the company, while Reply 2 is very positive. But Amazon had hundreds of comments on that post; how much time would it take us to analyze them all on our own? A lot of time, right? In sentiment analysis, we give every text a score based on the sentiment it expresses. Let us see how.
Working of Sentiment Analysis
The main question behind sentiment analysis is: how could a machine predict the sentiment behind a text? The answer is, by scoring each text with a number: every positive text is awarded a positive score (+1), every neutral text a neutral score (0), and every negative text a negative score (-1). This way we can easily count the number of positive, neutral, and negative texts and figure out the most common review or opinion given by users.
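The counting step above can be sketched in a few lines of Python; the list of scores below is a made-up example standing in for comments that have already been scored:

```python
from collections import Counter

# Hypothetical scores already assigned to a batch of comments:
# +1 = positive, 0 = neutral, -1 = negative
scores = [1, 1, -1, 0, 1, -1, 1, 0, 1]

counts = Counter(scores)
print("positive:", counts[1])   # 5
print("neutral:", counts[0])    # 2
print("negative:", counts[-1])  # 2

# The most common score tells us the overall opinion of the users
overall = counts.most_common(1)[0][0]
print("overall sentiment score:", overall)  # 1 -> mostly positive
```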
Sentiment Analysis Procedure:-
To give appropriate weights to the text or document we need a method, and one of the best methods you could apply to assign weights is Bag of Words. Let us give you a quick tutorial on bag of words.
We are making a model that works well not only on the existing data but on unseen data too; in order to do so, we have to split our data into training and testing sets. Our next step is to remove stopwords to make our data more informative (we only want relevant information when extracting features from a text), and at last we assign a vector to each text so that the model can analyze it.
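The train/test split is a one-liner with scikit-learn; the tiny labelled dataset below is made up purely for illustration:

```python
from sklearn.model_selection import train_test_split

# Hypothetical labelled comments: 1 = positive, -1 = negative
texts = ["love this blog", "great post", "terrible article", "waste of time"]
labels = [1, 1, -1, -1]

# Hold out 25% of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)

print(len(X_train), "training examples")  # 3
print(len(X_test), "test examples")       # 1
```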
In order to remove punctuation and stopwords, we first have to identify them. We use tokenization to break the text into small tokens, which include stopwords, punctuation, and useful words.
We are going to use a Python library called NLTK to show you how tokenization works:
import nltk
from nltk.tokenize import word_tokenize

word_tokenize("My Email address is: taneshbalodi8@gmail.com")
Out: ['My', 'Email', 'address', 'is', ':', 'taneshbalodi8', '@', 'gmail.com']
from nltk.tokenize import sent_tokenize

sent_tokenize("My name is tanesh balodi. I am a machine learning engineer.")

Out: ['My name is tanesh balodi.', 'I am a machine learning engineer.']
You may have to download 'punkt' to run the code; use nltk.download('punkt') to download the package. word_tokenize() is used to tokenize text into words, and sent_tokenize() is used to tokenize text into sentences.
Removal of Stopwords and Punctuation
Once we have identified the stopwords and punctuation, we need to remove them from our dataset. Let us see through code:-
from nltk.corpus import stopwords
from string import punctuation

stop_words = stopwords.words('english')
punct = list(punctuation)

print(dataset['quote'])
tokens = word_tokenize(dataset['quote'])
len(tokens)
I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.
cleaned_tokens = [token for token in tokens if token not in stop_words and token not in punct]
len(cleaned_tokens)
You might need to download the stopwords package; do it by running nltk.download('stopwords'). As you can see in the above code, the stopwords and punctuation, which act as junk in our dataset and increase running time, are removed, and what is left is less than half of the document.
Stemming and lemmatization
Stemming and lemmatization are two other normalization techniques used in natural language processing to remove the inflection from words; by removing the inflection we can reduce the data size and get cleaner data. For example, 'running', 'runs', and 'ran' can all be reduced to the base form 'run'.
Let us see how to implement it via code:-
All we have to do now is assign a vector to the words. The simplest way to assign a weight or score to a word is to count its number of occurrences and its frequency: the higher the count or frequency, the larger the score, and vice versa.
Now that we know how sentiment analysis works, let us move forward and see a real-world example:-
Sentiment Analysis of Twitter tweets
There are two factors that determine the sentiment of a text or tweet, i.e. polarity and subjectivity.
Polarity -> Polarity ranges from -1 to 1. If the polarity of a text is 1 or near 1, the text is fairly positive; if it is near -1, the text is fairly negative; a polarity near 0 indicates a neutral text.
Subjectivity -> If the subjectivity is 1 or near 1, the given text is more subjective (an opinion), whereas if the subjectivity is near 0, the given text is more objective (factual).
Let us understand it via code example:-
TextBlob is a library that is used to check the sentiment of a given text through polarity and subjectivity.
For the text “Your blogs are awesome, Keep the good work” we get a polarity of about 0.85, i.e. near 1, which indicates that the text has a positive sentiment, whereas a text with a polarity of 0 indicates a neutral sentiment.
To see the polarity of live Twitter tweets, we are going to use the same library with the help of the Twitter API.
Our first step is to create an app on the Twitter developer portal; just visit developer.twitter.com and add your details.
Go to your dashboard and click on "Create App".
After creating the app, you will need your key information to go further; go to your app and look for your keys:-
Under the keys section you will find your API key & secret as well as your Access token & secret credentials:-
Now we have everything we need in order to find the sentiment of live tweets, so let's quickly see the code.
Run this code in your Colab notebook and see the output for yourself. We have taken live tweets about Barack Obama, but you could search for any person or entity you want. Remember, Tweepy is a Twitter library used to authorize your credentials and access the API.
Sentiment analysis has been used by various analysts to analyze user opinions; it is in use for reviewing products, movies, and blogs. Though we have accomplished a lot in natural language processing in the past few years, we know there are many more challenges to be met, as NLP has proved to be ever more useful. To know more about natural language processing algorithms, stay connected with us.