Twitter is a treasure chest. There are 330 million of us using Twitter each month to share our thoughts, appraisals and outcries with the world. The vast corpus of wild data this produces each day is useful to data scientists in myriad ways. From training sentiment classification models to understanding public opinion about businesses and individuals, Twitter data can open up research opportunities for many different purposes.
So how do we go about collecting it? At Graphext, we developed Tractor, a desktop application that scrapes data from popular platforms, including Facebook and Twitter, without any code. Since Tractor is only available to Graphext PRO users, we decided to create this guide, along with a Google Colab notebook, so that everyone can access all kinds of data from Twitter.
"Because words have deep meaning, tweets have power."
- Germany Kent
To collect data from Twitter, you need keys to the Twitter API. This guide walks through the process of obtaining the keys required to authenticate your requests for data from Twitter. Once you have a consumer key, a consumer secret, an access token and an access token secret, you can skip straight to the notebook. Simply enter a query using the Twitter query language and watch the tweets roll in. Finally, our notebook will save your data as a CSV file.
API documentation is generally horrible stuff. Stay close and follow the steps below!
To access the Twitter API, you need a Twitter account and a Twitter developer account. Sign up for a developer account on Twitter's developer site.
Once your developer account has been approved, you can create an 'app', which gives you access to the keys required to start making requests. Head over to the 'Developer Portal' to start creating an app.
Once you've created an app you have all of the keys you need to retrieve data from Twitter. If you need to regenerate these keys, navigate to your app's 'Keys and tokens' window.
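If you prefer not to paste the keys directly into a notebook, a minimal sketch like the one below reads them from environment variables instead. The variable names (`TWITTER_CONSUMER_KEY` and so on), the `TwitterKeys` class and the `load_keys` function are hypothetical conventions chosen for this example; they are not defined by the Twitter developer portal or by our notebook.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical environment-variable names; pick whatever suits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


@dataclass(frozen=True)
class TwitterKeys:
    """The four credentials shown in your app's 'Keys and tokens' window."""
    consumer_key: str
    consumer_secret: str
    access_token: str
    access_token_secret: str


def load_keys(env: Optional[Mapping[str, str]] = None) -> TwitterKeys:
    """Read the four keys from `env` (defaults to os.environ).

    Raises KeyError naming every variable that is missing or empty,
    so problems surface before any request is sent to Twitter.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return TwitterKeys(**{field: env[name] for field, name in ENV_NAMES.items()})
```

Keeping secrets out of the notebook itself means you can share or copy the notebook without leaking your keys.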
Now that you've set up your app in the Twitter developer portal, you can use the keys it provides to access data from Twitter. Our Google Colab notebook provides all of the code required to do this using a Python library called 'tweepy'. Simply add your key information to the relevant variable placeholders, set a search term using the Twitter query language and run the notebook to start collecting data.
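For reference, the authentication step with tweepy can be sketched as follows. This is not the notebook's exact code: it assumes tweepy 4.5 or later, where the OAuth 1.0a handler is named `OAuth1UserHandler` (older releases call it `OAuthHandler`). The `"YOUR_..."` placeholder strings and the `unfilled_placeholders` and `make_api` helpers are hypothetical.

```python
from typing import Dict, List

# Placeholders: overwrite these with the values from your app's 'Keys and tokens' window.
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"


def unfilled_placeholders(values: Dict[str, str]) -> List[str]:
    """Return the names whose value is empty or still a 'YOUR_...' placeholder."""
    return [name for name, value in values.items()
            if not value or value.startswith("YOUR_")]


def make_api(consumer_key: str, consumer_secret: str,
             access_token: str, access_token_secret: str):
    """Build an authenticated tweepy.API client (assumes tweepy >= 4.5)."""
    import tweepy  # imported here so the helper above works without tweepy installed

    auth = tweepy.OAuth1UserHandler(
        consumer_key, consumer_secret, access_token, access_token_secret
    )
    # wait_on_rate_limit makes tweepy sleep until the rate-limit window resets
    # instead of raising an error.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Once `unfilled_placeholders` returns an empty list for the four variables, `api = make_api(consumer_key, consumer_secret, access_token, access_token_secret)` gives you a client to run searches with.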
Any text you assign to the variable 'search_query' will be used to request data from Twitter. Tweets matching this query are retrieved and stored in the notebook before you download them as a CSV file.
Twitter's query language follows the same syntax as the search box on Twitter. Wrap a phrase in double quotation marks to match it exactly. To match more than one term, note that terms separated by a space are combined with AND by default, while OR matches tweets containing any of the terms. You can also use the query language to find tweets from specific users (from:username) or from users within a list (list:owner/list-name).
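These operators can also be combined programmatically. The `build_query` helper below is a hypothetical convenience, not part of tweepy or of our notebook. It wraps exact phrases in double quotes, joins alternatives with OR, adds `from:` and `list:` operators, and relies on space-separated parts being combined with an implicit AND. It also assumes parentheses are honoured for grouping the OR alternatives.

```python
from typing import Iterable, Optional


def build_query(
    all_of: Iterable[str] = (),
    exact: Iterable[str] = (),
    any_of: Iterable[str] = (),
    from_user: Optional[str] = None,
    in_list: Optional[str] = None,
) -> str:
    """Assemble a Twitter search query string from its parts (hypothetical helper)."""
    # Plain terms: every one of them must appear in the tweet.
    parts = list(all_of)
    # Exact phrases are wrapped in double quotation marks.
    parts += [f'"{phrase}"' for phrase in exact]
    alternatives = list(any_of)
    if alternatives:
        # Parentheses keep the OR group together when combined with other parts.
        parts.append("(" + " OR ".join(alternatives) + ")")
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))
    if in_list:
        # Lists are referenced as owner/list-name.
        parts.append("list:" + in_list)
    # Space-separated parts are combined with an implicit AND.
    return " ".join(parts)
```

For example, `build_query(exact=["machine learning"], any_of=["python", "rust"])` returns `'"machine learning" (python OR rust)'`, which you could assign to `search_query`.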
Finally, use the 'since' and 'until' parameters to set a date range for your query. The 'lang' parameter restricts your data to tweets posted in a specific language. Removing these parameters removes any date or language criteria from your query.
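Assuming the date and language criteria are written as `since:`, `until:` and `lang:` operators inside the query string (the same syntax Twitter's search box accepts), the sketch below shows how leaving one out simply drops that criterion. `add_filters` is a hypothetical helper, and dates are expected in the YYYY-MM-DD format these operators use.

```python
from datetime import date
from typing import Optional


def add_filters(
    search_query: str,
    since: Optional[str] = None,
    until: Optional[str] = None,
    lang: Optional[str] = None,
) -> str:
    """Append optional date and language operators to a search query."""
    parts = [search_query]
    for name, value in (("since", since), ("until", until)):
        if value is not None:
            date.fromisoformat(value)  # raises ValueError for malformed dates
            parts.append(f"{name}:{value}")
    if lang is not None:
        parts.append(f"lang:{lang}")  # a language code such as "en"
    # Parameters left as None add nothing, so no date or language limit applies.
    return " ".join(parts)
```

Validating the dates up front catches a typo such as `01/01/2021` locally instead of silently returning no tweets.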
Although our notebook is hosted on Google Colab, we have included a command at the bottom of the file to export your data as a CSV file to your local computer. Running this final command will open a file explorer on your computer so you can save the data locally.
To run the notebook from your own Google Drive, first make a copy of our notebook in your drive. Once you've copied the file into your own Google Drive, running the code will save your data inside your drive.
Alternatively, you can download the notebook and run it locally on your computer using tools such as Jupyter Notebook. This method also saves your retrieved data to your local folders.
Once you've got your hands on the CSV file full of tweets from your search query, you can analyze it any way you want, including uploading it to Graphext. With Graphext you can run prediction models to understand the relationships between tweets, cluster them based on the similarity of their topics or identify key community members.