What is sentiment analysis in R?
Overview
Sentiment analysis examines the opinions expressed in a text, which lets us estimate how its author feels about the subject. Large platforms such as Twitter and Facebook use it to screen tweets and status updates for hate speech. The algorithm identifies sentiment by analyzing patterns of words across the lines of a text.
The algorithm checks the words against a set of positive and negative words. Using this algorithm, we can also check the magnitude of these sentiments, that is, how positive or negative these words are.
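The word-matching idea described above can be sketched in a few lines of base R. This is only an illustration: the lexicon, its weights, and the helper name score_text are made up for this example and are not taken from any package.

```r
# Hypothetical mini-lexicon: word -> weight (positive > 0, negative < 0).
# The words and weights are invented for illustration only.
lexicon <- c(good = 1, great = 2, bad = -1, terrible = -2)

# Score one text by summing the weights of the lexicon words it contains.
score_text <- function(text, lexicon) {
  # Lowercase, then split on anything that is not a letter
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Keep only words present in the lexicon; all others contribute nothing
  matched <- words[words %in% names(lexicon)]
  sum(lexicon[matched])
}

score_text("The food was great, but the service was bad.", lexicon)  # 2 + (-1) = 1
score_text("A terrible, terrible day", lexicon)                      # -2 + (-2) = -4
```

The sign of the result gives the direction of the sentiment, and its absolute value gives a rough magnitude.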
How to do sentiment analysis
The algorithm works on individual words, so the text must be cleaned before it is scored. The most commonly used data for sentiment analysis comes from tweets, which contain both words and punctuation. Punctuation contributes nothing to a word-based score, so we remove all punctuation and special characters from the data before analysis.
We use the tm package to clean the text.
library(tm)
Syntax
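We use the following command to create a character vector of tweets for preprocessing. Here, data is assumed to be a character vector of tweet text.
tweets <- iconv(data)
The tm_map() function operates on a corpus rather than a plain vector, so we first wrap the tweets in a corpus and then remove the unnecessary data.
corpus <- Corpus(VectorSource(tweets)) # builds a corpus from the vector
corpus <- tm_map(corpus, content_transformer(tolower)) # converts the text to lower case
corpus <- tm_map(corpus, removePunctuation) # removes punctuation
corpus <- tm_map(corpus, removeNumbers) # removes numbers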
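To see what the lowercasing, punctuation-removal, and number-removal steps do to a single tweet, here is a minimal base R sketch that needs no packages. The example tweet and the helper name clean_tweet are assumptions made for illustration, and the whitespace cleanup at the end is an extra step not listed above.

```r
# A made-up example tweet
tweet <- "Loving the new phone!!! 10/10 would buy again :) #happy"

clean_tweet <- function(x) {
  x <- tolower(x)                  # convert to lower case
  x <- gsub("[[:punct:]]", "", x)  # drop punctuation and symbols such as # and :
  x <- gsub("[[:digit:]]", "", x)  # drop digits
  x <- gsub("\\s+", " ", x)        # collapse runs of whitespace left behind
  trimws(x)                        # strip leading and trailing spaces
}

clean_tweet(tweet)
# [1] "loving the new phone would buy again happy"
```

Because gsub() is vectorized, the same function can be applied to a whole character vector of tweets in one call.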
Libraries
- We use the syuzhet package to classify emotions and their relative scores. It has a built-in classification algorithm for analyzing emotions.
Code
library(syuzhet)

# Reading the tweet data
data <- read.csv("file_path.csv", header = TRUE)
tweet_lines <- iconv(data$Sentence)

# Calculating the scores using the syuzhet library
scores <- get_nrc_sentiment(tweet_lines)

# Plotting the scores in the form of a bar plot
barplot(colSums(scores), las = 2, ylab = 'Count', main = 'Sentiment Analysis of Tweets')
Explanation
We use the code above to find out the sentiments of the tweets.
- Line 1: We import the syuzhet package.
- Line 4: We read the data from file_path.csv, which is the path to the file of tweets.
- Line 5: We convert the Sentence column of the .csv data into a vector of tweets.
- Line 8: We use syuzhet's get_nrc_sentiment() function to score the sentiments. The categories range from anger to positive.
- Line 11: We sum the scores found in line 8 for each category and draw the totals as a bar plot.
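Beyond the bar plot, the scores can also be summarized per tweet. The sketch below is one possible follow-up, not part of the original code. It uses a hand-built data frame with invented counts as a stand-in for the output of get_nrc_sentiment(), which returns one row per text and one column per NRC category, and the net polarity measure is an assumption chosen for illustration.

```r
# Stand-in for the output of get_nrc_sentiment(): one row per tweet,
# one column per NRC category. The counts below are invented.
scores <- data.frame(
  anger = c(0, 2, 0), anticipation = c(1, 0, 0), disgust = c(0, 1, 0),
  fear = c(0, 1, 0), joy = c(2, 0, 0), sadness = c(0, 1, 0),
  surprise = c(1, 0, 0), trust = c(1, 0, 1),
  negative = c(0, 3, 0), positive = c(3, 0, 1)
)

# Totals per category: the values that barplot(colSums(scores)) draws
totals <- colSums(scores)

# Per-tweet net polarity: positive count minus negative count
net <- scores$positive - scores$negative

# Label each tweet by the sign of its net polarity
label <- ifelse(net > 0, "positive", ifelse(net < 0, "negative", "neutral"))

print(totals)
print(data.frame(net, label))
```

With real data, replacing the hand-built data frame with the scores object from the code above should give the same kind of summary.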
Conclusion
We learned how to preprocess the data to make it suitable for sentiment analysis. We also learned about assigning numeric scores to a sentiment, which lets us know the strength of the emotion.