Twitter is a treasure chest. There are 330 million of us using Twitter each month to share our thoughts, appraisals and outcries with the world. The vast corpus of wild data this produces each day is useful to data scientists in myriad ways. From training sentiment classification models to understanding public opinion about businesses and individuals, Twitter data can open up research opportunities for many different purposes.
So how do we go about collecting it? At Graphext, we developed Tractor, a desktop application that scrapes data from popular platforms including Facebook and Twitter without writing any code. Since Tractor is available only to Graphext PRO users, we created this guide, alongside a Google Colab notebook, to help everyone access all kinds of data from Twitter.
"Because words have deep meaning, tweets have power."
- Germany Kent
To collect data from Twitter, you need keys to the Twitter API. We created a guide showing you how to obtain these keys; if you haven't already read it or retrieved your keys, start there. Once you have an API key, an API secret key, an access token and an access token secret, you are ready to start collecting data from Twitter.
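Once retrieved, the four credentials are simply stored in variables at the top of the notebook. The sketch below loads them from environment variables rather than hard-coding them; the environment-variable names are our own assumption, not part of the notebook.

```python
import os

# Hypothetical environment-variable names -- rename to match wherever you store your keys.
API_KEY = os.environ.get("TWITTER_API_KEY", "")
API_SECRET_KEY = os.environ.get("TWITTER_API_SECRET_KEY", "")
ACCESS_TOKEN = os.environ.get("TWITTER_ACCESS_TOKEN", "")
ACCESS_TOKEN_SECRET = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")

# Grouping them in one dict makes it easy to pass them around later.
credentials = {
    "api_key": API_KEY,
    "api_secret_key": API_SECRET_KEY,
    "access_token": ACCESS_TOKEN,
    "access_token_secret": ACCESS_TOKEN_SECRET,
}
```

Keeping keys out of the notebook itself also means you can share the notebook without leaking your credentials.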
We will walk through the notebook step by step and explain each part of the process. To run the notebook yourself, you will need to enter a query using the Twitter query language. Then simply watch the tweets roll in; finally, the notebook will save your data as a CSV file.
Now that you've set up your app within your Twitter developer portal, you can use the keys provided to access data from Twitter. Our Google Colab notebook provides all of the code required to do this using a Python library called Tweepy. Simply add your key information into the relevant variable placeholders, set a search term using the Twitter query language and run the notebook to start collecting data.
Any text you save to the variable 'search_query' will be used to perform your request for data from Twitter. Tweets matched against this query will be retrieved and stored locally within the notebook before you download them as a CSV file.
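As a rough sketch of what happens under the hood, the helper below authenticates with Tweepy and pages through tweets matching the query. It assumes Tweepy 4.x is installed (`pip install tweepy`); the function name `fetch_tweets` and the selection of fields stored per tweet are our own illustration, not necessarily what the notebook does verbatim.

```python
def fetch_tweets(search_query, api_key, api_secret_key,
                 access_token, access_token_secret, limit=100):
    """Collect up to `limit` tweets matching `search_query` as a list of dicts."""
    import tweepy  # imported here so the sketch stays self-contained

    # OAuth 1.0a user-context authentication with the four credentials.
    auth = tweepy.OAuth1UserHandler(api_key, api_secret_key,
                                    access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    tweets = []
    # Cursor handles pagination over the standard-search endpoint for us;
    # tweet_mode="extended" returns the full (untruncated) tweet text.
    for status in tweepy.Cursor(api.search_tweets, q=search_query,
                                tweet_mode="extended").items(limit):
        tweets.append({
            "id": status.id,
            "created_at": status.created_at,
            "user": status.user.screen_name,
            "text": status.full_text,
        })
    return tweets
```

Note that the standard search endpoint only covers roughly the last seven days of tweets, so very old date ranges will return nothing on the free tier.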
Twitter's query language follows the same structure used to find tweets inside the search box on Twitter. Use double quotation marks to match an exact phrase, and use OR to match either of two terms; terms separated by a space are combined with an implicit AND. You can also use the query language to find tweets belonging to specific users or to users within a list.
Finally, use the 'since' and 'until' parameters to set a date range for your query. The 'lang' parameter restricts your data to tweets posted in a specific language. Removing these parameters removes any date or language criteria from your query.
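A few illustrative query strings follow. These use the standard search syntax, where date range and language can also be written inline as `since:`, `until:` and `lang:` operators; the exact operators available depend on your API access tier.

```python
# Exact phrase: double quotation marks inside the query string.
exact_phrase = '"machine learning"'

# Match either term with OR; a plain space acts as an implicit AND.
either_term = 'graphext OR tweepy'
both_terms = 'python pandas'

# Tweets from a specific account.
from_user = 'from:Graphext'

# Combining a phrase with language and date-range operators.
search_query = '"machine learning" lang:en since:2021-01-01 until:2021-06-30'
```

Any of these strings can be assigned to the notebook's 'search_query' variable as-is.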
Although our notebook is hosted on Google Colab, we have included a command at the bottom of the file to export your data as a CSV file to your local computer. Running the final cell will prompt your browser to download the file, allowing you to save the data locally.
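Exporting boils down to writing the collected rows to a CSV file and then, in Colab, calling `files.download` to push it to your browser. A minimal sketch, assuming the tweets are dicts with id, created_at, user and text keys (as in the earlier examples):

```python
import csv

def save_tweets_csv(tweets, path="tweets.csv"):
    """Write a list of tweet dicts to a CSV file with a header row."""
    fieldnames = ["id", "created_at", "user", "text"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(tweets)

# Example with a single dummy row.
save_tweets_csv([
    {"id": 1, "created_at": "2021-06-01", "user": "example", "text": "hello world"},
])

# In Colab only -- triggers the browser download prompt:
# from google.colab import files
# files.download("tweets.csv")
```

Outside Colab (for instance in a local Jupyter session), the file simply lands next to the notebook, so the download step is unnecessary.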
In order to run the notebook from your own Google Drive, make a copy of our notebook in your own drive. Once you've copied the file into your own Google Drive, running the code will save your data inside your own drive.
Alternatively you can download the notebook and run it locally on your computer using tools such as Jupyter Notebooks. Using this method will also save your retrieved data to your local folders.
After you've got your hands on the CSV file full of tweets from your search query, you can analyze it any way you want, including uploading it to Graphext. With Graphext you can run prediction models to understand the relationships between tweets, cluster them based on the similarity of their topics or identify key community members.