“To some people R is just the 18th letter of the alphabet. To others, it’s the rating on racy movies, a measure of an attic’s insulation or what pirates in movies say.” (The New York Times)
R is also a programming language intended for deep statistical analysis. It is open source and available across different platforms, e.g., Windows, Mac, Linux. It is now used in a variety of applications including visualizations and data mining.
This series will walk through a few basic examples showing how you can use R to extract and visualize Twitter data. In part one we’ll create an app to extract data from Twitter. Then we’ll dive into real-world examples, with word clouds in part two and a graph of positive and negative tweets in part three.
- This post assumes you have already installed R and are using RStudio.
- In order to extract tweets, you will need a Twitter application and hence a Twitter account. If you don’t have a Twitter account, please sign up.
- Use your Twitter login ID and password to sign in at Twitter Developers.
1. Steps to Create a Twitter Application
a. Navigate to My Applications in the upper right-hand corner.
b. Create a new application.
c. Fill out the new app form. The name must be unique, i.e., no one else can have used it for their Twitter app. Give a brief description of the app; you can change this later if needed. Enter your website or blog address. The callback URL can be left blank. Once you’ve done this, make sure you’ve read the “Developer Rules Of The Road” blurb, check the “Yes, I agree” box, fill in the CAPTCHA and click the “Create Your Twitter Application” button.
d. Scroll down and click the “Create my access token” button.
e. Note the values of the consumer key and consumer secret and keep them handy for future use. You should keep these secret. If anyone were to get these keys, they could effectively access your Twitter account.
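One way to keep the keys out of your scripts is to read them from environment variables. The following is a minimal sketch under that assumption; the helper name get_twitter_keys and the variable names TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: read the consumer key and secret from environment
# variables instead of typing them into a script.
# The variable names are assumptions, not something Twitter requires.
get_twitter_keys <- function() {
  key    <- Sys.getenv("TWITTER_CONSUMER_KEY")
  secret <- Sys.getenv("TWITTER_CONSUMER_SECRET")

  # Sys.getenv() returns "" for unset variables, so fail early with a clear message
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Set TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET first.")
  }

  list(consumerKey = key, consumerSecret = secret)
}
```

You could set the two variables in your .Renviron file, which R reads at startup, and then call get_twitter_keys() wherever the key and secret are needed.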
2. Install and Load R Packages
R comes with a standard set of packages. A number of other packages are available for download and installation. For the purpose of this post, we will need the following packages:
– ROAuth: Provides an interface to the OAuth 1.0 specification, allowing users to authenticate via OAuth to the server of their choice.
– twitteR: Provides an interface to the Twitter web API.
Let’s start by installing and loading all the required packages.
install.packages("twitteR")
install.packages("ROAuth")
library("twitteR")
library("ROAuth")
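If you rerun the script often, you may prefer to install only the packages that are not already present. This is a small sketch of that idea; missing_packages and install_if_missing are hypothetical helper names, not part of R or of either package.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed yet.
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only what is missing (needs a network connection)
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Calling install_if_missing(c("twitteR", "ROAuth")) would then download a package only when it is not yet installed; the library() calls are still needed afterwards.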
3. Create and Store Twitter Authenticated Credential Object
If you are a Windows user, you need the “cacert.pem” certificate file. Download it from the URL used in the code below and store it in your working directory. Then create an object “cred” that holds the authentication details, so that it can be saved for later sessions, and initiate the handshake. This is where you enter the consumer key and consumer secret from step 1. Starting the handshake prints a hyperlink in the console window.
# Download the "cacert.pem" file
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

# Create an object "cred" that will save the authenticated object that we can use for later sessions
cred <- OAuthFactory$new(consumerKey='XXXXXXXXXXXXXXXXXX',
                         consumerSecret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
                         requestURL='https://api.twitter.com/oauth/request_token',
                         accessURL='https://api.twitter.com/oauth/access_token',
                         authURL='https://api.twitter.com/oauth/authorize')

# Executing the next step generates an output --> To enable the connection, please direct your web browser to: <hyperlink>
# Note: You only need to do this part once
cred$handshake(cainfo="cacert.pem")
Navigate to the link shown in the console and click “Authorize App” to authorize the app.
Note the PIN that is generated.
In RStudio, type in the PIN. Then save the object “cred” on your local machine in a file named “twitter authentication.Rdata”.
# Save for later use (Windows)
save(cred, file="twitter authentication.Rdata")
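If save() and load() are new to you, the round trip can be tried without touching your real credentials. The sketch below uses a stand-in list in place of the real “cred” object and a temporary directory instead of your working directory; both are assumptions made purely for illustration.

```r
# Stand-in for the real credential object; any R object is saved the same way
cred <- list(consumerKey = "XXXX", handshakeComplete = TRUE)

# Save to a temporary directory so the sketch leaves your working directory alone
path <- file.path(tempdir(), "twitter authentication.Rdata")
save(cred, file = path)

# Simulate a fresh session by removing the object, then restore it from disk
rm(cred)
restored <- load(path)  # load() returns the names of the objects it restored

print(restored)         # "cred"
print(exists("cred"))   # TRUE
```

Note that load() brings the object back under the name it had when it was saved, which is why a later session can refer to “cred” directly after loading the file.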
4. Extract Tweets
Load the “twitter authentication.Rdata” file in your session and run registerTwitterOAuth. This should return “TRUE”, indicating that all is good and we can proceed. Then we set two variables: one for the search string, which could be a hashtag or a user mention, and one for the number of tweets we want to extract for analysis. Use searchTwitter to search Twitter for the supplied search string; it returns a list of tweets. The “lang” parameter is used below to restrict the results to English-language tweets.
load("twitter authentication.Rdata")
registerTwitterOAuth(cred)

search.string <- "#nba"
no.of.tweets <- 100

tweets <- searchTwitter(search.string, n=no.of.tweets, cainfo="cacert.pem", lang="en")
tweets
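Since the result is a list of tweet objects, a common next step is to pull out just the text of each tweet. The sketch below assumes each element exposes a getText() method, as twitteR status objects do; tweet_texts is a hypothetical helper name, and the stand-in tweets are invented so the example runs without a Twitter connection.

```r
# Hypothetical helper: collect the text of each tweet into a character vector.
# Assumes every element has a getText() method, as twitteR status objects do.
tweet_texts <- function(tweets) {
  vapply(tweets, function(tweet) tweet$getText(), character(1))
}

# Stand-in objects (invented text) so the sketch runs offline
fake_tweets <- list(
  list(getText = function() "Great game tonight #nba"),
  list(getText = function() "Who wins the finals? #nba")
)

texts <- tweet_texts(fake_tweets)
print(texts)
```

With real results you would call tweet_texts(tweets) instead. The twitteR package also provides twListToDF(), which converts the list into a data frame if you want the other fields as well.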
Extracting tweets from Twitter can be useful, but when coupled with visualizations it becomes that much more powerful. In the next two blog posts in this series we’ll walk through how to create a word cloud based on a Twitter extract and how to analyze tweets for positive and negative sentiments.