Analysis Types | Social Media

Social media analysis involves finding information in data gathered from social channels such as Twitter, Facebook, Instagram, LinkedIn and Reddit, among many others.

Social media channels are prized in the data science community for the vast volume of data published on them every second, and they are often treated as windows into public opinion on a topic. Studying social media data allows businesses to understand consumer opinion, track brand awareness and uncover new opportunities for growth.

Our Social Media analysis types take the heavy lifting out of analyzing social media data. We've built context-specific algorithms for community network analysis, topic detection and finding influencers. You can also use Graphext to extract key language features like sentiment, entities and part-of-speech tags from social media data.

Graphext PRO users can use Tractor, our custom-built tool, to gather information from popular digital channels in an analysis-friendly format, all without writing a line of code or setting up any API integrations.

What is Social Media Analysis?

Social media analysis is constantly evolving. It's about more than measuring followers and retweets or finding trends in geographic regions. Data from social media now informs decision making in boardrooms across the world and in industries of all shapes and sizes.

Natural Language Processing (NLP) is a big part of social media analysis today. It is a research field that covers any kind of computer manipulation of speech, text or other forms of natural language. Because a great deal of human activity on platforms like Twitter, LinkedIn and Facebook takes the form of written language, data science techniques like topic detection, sentiment analysis and part-of-speech tagging can be used to derive insights from what people post.

"I use social media as an idea generator, trend mapper and strategic compass for all of our online business ventures."

- Paul Barron | CEO, Foodable Network

As well as revealing what people are talking about, social media data can now be used to map the communities behind trending conversations. Businesses can find the influencers driving conversations and monitor how key messages move across geographies. Community network analysis is used to understand how and why information spreads between people online, and it can offer essential insights for marketing and PR teams.

Businesses also often segment data collected from social media platforms to analyze the demographics of the communities surrounding an online conversation. Finding detailed information about sub-communities is an especially useful way to understand the specific demographics of the people engaging with a topic.

Using Social Media Analysis Types

The technology behind the Social Media analysis types was either built by our team or integrated from open-source machine learning projects. There are several analysis types to choose from, including topic detection and community network analysis. While we support analysis of data gathered from all popular platforms, Graphext also offers a number of Twitter-specific analysis types.

Inside each Social Media analysis type, Graphext will ask you to answer a series of questions. These involve connecting key columns in your data to our pre-built scripts, choosing the language features you want to extract and setting the language of the posts that you will be analyzing.

Executing your project then starts the analysis. Once your project has been built, you'll see the output variables in your data.

Types of Social Media Analysis in Graphext

Topics

Social Media | Topics projects use NLP to detect the main themes of text posted on social media platforms. Built using machine learning technology that can extract the significant terms from your text values, topic projects will group all of your data according to the similarity of the thematic content of your chosen text field.

When setting up a Topics project, Graphext will automatically answer the questions needed to start your analysis. Editing this configuration involves customizing the keywords, hashtags, entities or other language features you want to extract from social posts as well as choosing the way that Graphext recognises the language of the posts in your data.

What is Social Media Topic Analysis?

A huge volume of the content posted on social media comes in the form of written language. Topics projects make sense of this largely unstructured data by grouping social posts that reference similar ideas. In this way, Topic analysis of social media data allows analysts to quickly grasp the main themes of conversations taking place on digital platforms.

Topic detection analysis in Graphext works by embedding (or vectorizing) the text in your dataset. These vectors represent each text value as a list of numbers, so that texts with similar meanings end up close to one another. The closest vectors are linked to one another, and these links form the basis of your clusters.

You can control the number of links each row in your data should have using the setup wizard. Using these links, Graphext assigns each row a position on your network visualisation, the Graph, so that groups of closely related text appear together.
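
The linking step can be pictured with a short Python sketch. This is a minimal illustration under stated assumptions, not Graphext's implementation: it assumes the embeddings already exist (the toy two-dimensional vectors stand in for real model output), that similarity is measured with cosine similarity, and that each row is linked to its `k` most similar rows, with `k` playing the role of the number-of-links setting. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_links(vectors, k):
    """Link every row to its k most similar rows.

    Returns a set of undirected edges (i, j) with i < j, so a link found
    from both ends is stored only once.
    """
    edges = set()
    for i, v in enumerate(vectors):
        # Score every other row by its similarity to row i, most similar first.
        scored = sorted(
            ((cosine_similarity(v, w), j) for j, w in enumerate(vectors) if j != i),
            reverse=True,
        )
        for _, j in scored[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical embeddings: rows 0 and 1 share one theme, rows 2 and 3 another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(sorted(knn_links(vectors, k=1)))  # [(0, 1), (2, 3)]
```

Comparing every row with every other row is quadratic in the number of rows, so at scale approximate nearest-neighbour search is the usual substitute.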

Next, we use our implementation of the Louvain algorithm to create clusters from the links between text values. These clusters are added to a separate variable in your project, Clusters, in which each cluster is labelled according to the most common significant terms in all of the text belonging to it.
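
Labelling a cluster by its most common significant terms comes down to counting terms per cluster. The sketch below assumes the cluster assignments and the per-row significant terms already exist (the sample rows are made up) and joins the `top_n` most frequent terms into a label; how many terms Graphext uses and how it formats its labels is not specified here, so `top_n` is a hypothetical parameter.

```python
from collections import Counter


def label_clusters(rows, top_n=2):
    """Build a label for each cluster from its most frequent significant terms.

    rows: iterable of (cluster_id, list_of_significant_terms) pairs.
    Returns {cluster_id: "term1, term2, ..."}.
    """
    counts = {}
    for cluster, terms in rows:
        counts.setdefault(cluster, Counter()).update(terms)
    # most_common() orders equal counts by first appearance, so labels are deterministic.
    return {
        cluster: ", ".join(term for term, _ in counter.most_common(top_n))
        for cluster, counter in counts.items()
    }


rows = [
    (0, ["coffee", "brunch"]),
    (0, ["coffee", "latte"]),
    (1, ["election", "vote"]),
    (1, ["vote", "debate"]),
]
print(label_clusters(rows))  # {0: 'coffee, brunch', 1: 'vote, election'}
```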

How to Build Social Media Topic Analysis Projects?

It's easy to build a Topic analysis of social media data in Graphext. All the NLP and heavy lifting takes place behind the scenes. Graphext will automatically complete the setup of Topic analysis projects using intelligent recognition of the correct values and language in your social media data.

Nonetheless, you can edit this default configuration if you wish to customize which language features are extracted.

Data Extraction

The first thing to pay attention to when setting up a Social Media | Topics project is the Data Extraction tab. This tab lets you specify or infer the language of the text you want to analyze and extract key information from it.

Each piece of information that you extract will be added to your project's dataset as a new categorical variable.

Setting the language of your text is a crucial part of starting your NLP analysis because all language models are trained using language-specific datasets. If you are confident that all of your text is in one language, specify this in the project setup wizard. Alternatively, you can use a column that contains the language of each text value, or use a pre-trained model that recognises the language of each social post in your dataset.

Text

The Text tab lets you further customize the setup of your Social Media | Topics projects by choosing the algorithm used to transform your data.

Choosing between speed and precision determines which algorithm is used. For the best results, choose precision, which deploys a Hugging Face transformer on the social posts in your data.

Automatic Insights

Choosing to generate automatic insights in your Social Media Topic analysis projects instructs Graphext to prepare a few insight cards in your Insights panel drawing attention to key patterns in your analysis.

These include insights about the clusters detected in your data as well as the key language features that Graphext was able to extract. Automatic insights can be a very useful way to kick-start your analysis and point out interesting areas for further investigation.

What to Expect from Social Media Topic Analysis Projects?

When you open a Topics project, the first thing you'll see is your Graph (network visualisation), where rows in your data are grouped together according to the thematic similarity of the text you analyzed. The Clusters variable represents the topics that Graphext was able to detect in your social posts.

Each cluster is labelled using the key significant terms that define text inside the cluster. Click on the clusters and inspect the significant terms variable to check this yourself!

Depending on the information that you choose to extract, Graphext will have added new variables containing this data to your project's dataset. Search for them using the search bar in the right sidebar.

Keywords

Keywords projects map the relationships between important terms that people post on social media. The underlying technology extracts keywords from the text in your data, measures the strength of association between pairs of keywords and plots these relationships in a network visualisation, where each node represents one keyword.

The key variables that emerge in Keywords projects are Keyword, Cluster and Count. Keyword contains the keywords extracted from social media posts, Count measures how many times each one appears and Cluster groups keywords together according to their associations.

What is Social Media Keyword Analysis?

Keywords analysis is a form of natural language processing that is intended to reduce your dataset to a collection of related keywords. These keywords are determined by Graphext to be the significant terms within the text you are analyzing and are grouped together to form clusters of closely related terms.

These projects work by first extracting the significant terms from the text in each row of your data. Graphext then discards any keyword that appears fewer than n times, where n is a threshold that you set (the default value is 3).
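
The threshold step behaves like a frequency filter. This sketch assumes keyword extraction has already happened (the lists below are hypothetical extractor output) and counts every occurrence across all posts; whether Graphext counts occurrences or distinct posts is not stated, so treat that choice as an assumption. The function name is also hypothetical.

```python
from collections import Counter


def frequent_keywords(posts_keywords, min_count=3):
    """Count keywords across all posts and keep those seen at least min_count times.

    posts_keywords: one list of extracted keywords per post.
    min_count mirrors the documented default threshold of 3.
    """
    counts = Counter()
    for keywords in posts_keywords:
        counts.update(keywords)
    return {keyword: n for keyword, n in counts.items() if n >= min_count}


posts = [
    ["coffee", "brunch"],
    ["coffee", "latte"],
    ["coffee", "brunch"],
    ["brunch"],
]
print(frequent_keywords(posts))  # {'coffee': 3, 'brunch': 3}
```

Raising `min_count` shrinks the surviving keyword set, which is why higher thresholds help keep large collections of posts free of noise.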

When setting up your Social Media Keyword analysis project, you can choose whether to perform a similarity analysis or a co-occurrence analysis. The underlying algorithms behind this choice will transform your data in different ways.

Choosing Similarity Analysis will instruct Graphext to embed - or vectorize - this collection of keywords in such a way that they can be linked according to their semantic similarity.

Choosing Co-occurrence Analysis will instruct Graphext to embed - or vectorize - this collection of keywords in such a way that links are calculated based on keywords that often occur together.
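
A co-occurrence analysis starts from how often two keywords appear in the same post. The sketch below only builds those raw pair counts; it is an assumption that Graphext works from counts like these, and its actual weighting and vectorizing step is not documented here. The `vocabulary` argument stands for the keywords that survived the threshold, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(posts_keywords, vocabulary, min_weight=1):
    """Weight each keyword pair by the number of posts in which both appear.

    Keywords outside `vocabulary` (e.g. those below the count threshold) are ignored.
    Returns {(keyword_a, keyword_b): weight} with each pair in alphabetical order.
    """
    weights = Counter()
    for keywords in posts_keywords:
        # set() so a keyword repeated within one post counts once for that post.
        present = sorted(set(keywords) & vocabulary)
        for pair in combinations(present, 2):
            weights[pair] += 1
    return {pair: w for pair, w in weights.items() if w >= min_weight}


posts = [
    ["coffee", "brunch"],
    ["coffee", "latte"],
    ["coffee", "brunch"],
    ["brunch"],
]
print(cooccurrence_links(posts, vocabulary={"coffee", "brunch"}))
# {('brunch', 'coffee'): 2}
```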

Using the links between keyword vectors, Graphext assigns each keyword a position on your network visualisation, the Graph. This places strongly linked keywords (by similar meaning or by frequent co-occurrence, depending on your choice) close to one another.

Next, we use our implementation of the Louvain algorithm to create clusters from the links between keywords. These clusters are added to a separate variable in your project, Clusters, in which each cluster is labelled according to the keywords it contains.

How to Build Social Media Keyword Analysis Projects?

Keyword analysis projects are quick and easy to deploy. When you build a Social Media Keywords project, Graphext will automatically set up your analysis. Nonetheless, you can edit the default setup to customize how the language of your data is recognised and to set the minimum number of times a keyword must appear to be taken into account.

It's also crucial to choose whether you want to study semantic similarity or co-occurrence. You can configure this inside the Analysis tab of the project setup wizard.

Because your data is transformed so that each row represents one keyword, these projects are not suitable for other forms of text analysis like part-of-speech tagging, sentiment analysis and entity recognition. Choose Keyword analysis only if you want to study the important words in your data.

Data Extraction

Inside the project setup wizard for Social Media Keywords analysis projects, you'll find the Data Extraction tab. This tab allows you to set the text column you want to analyze and choose how to configure the language of the social media posts in your data.

Setting the language of your content is an important part of starting your NLP analysis because all language models are trained using language-specific datasets. If you are confident that all of your social media data is in one language, specify this in the project setup wizard. Alternatively, you can use a column that contains the language of each text value, or use a pre-trained model that recognises the language of the text.

Analysis

The Analysis tab lets you further customize the setup of your Social Media Keywords projects. Here you can choose whether to conduct a similarity analysis or a co-occurrence analysis and set a threshold for the appearance of keywords in your text.

Setting a threshold for the appearance of keywords in your data is a crucial part of controlling what your final project will look like. Any keyword that appears fewer times than this threshold won't be included in your final project. If you are working with a large collection of social media posts, higher thresholds are recommended to avoid noise in your collection of keywords.

Similarity analysis groups keywords according to their semantic similarity whereas co-occurrence analysis will group keywords that occur frequently together.

Choosing between speed and precision changes the algorithm used to transform your data. For the best results choose precision, which will deploy a Hugging Face transformer on your data. This choice is only available for similarity analysis projects.

You can also choose between analyzing keywords or hashtags. Choosing hashtags will mean that your dataset will be reduced to a collection of hashtags as opposed to keywords.

What to Expect from Social Media Keyword Analysis Projects?

When you open a Keywords project, the first thing you'll see is your Graph (network visualisation), where each node represents one keyword. These are the significant terms that Graphext extracted from the social media posts in your data. Find and interact with the Keyword variable in your sidebar charts to inspect the keywords in more detail.

Your data will now include only three variables: Count, Keyword and Cluster. This is because Graphext reduced your data to a collection of keywords.

Each keyword is linked to others. You can inspect which keywords are linked to one another by clicking on one in the Graph and choosing Select Neighbours.

Graphext uses the Count variable to size the nodes in your Graph. Larger nodes represent keywords that appear more frequently in your text.

Your Cluster variable groups related keywords. The labels assigned to each cluster represent the keywords in that cluster.

Community Connections

Community Connections projects are focused on mapping online communities. They use the interactions between Twitter authors to build a network of people engaging with one another.

Some of the key variables that emerge in this kind of analysis are Degree, Cluster and Interactions per Tweet. Degree counts the number of connections a Twitter author has in the dataset. Cluster groups authors according to the interactions between them. Interactions per Tweet measures the average number of times a tweet was interacted with.

You can use any kind of Twitter dataset to build these projects, and although they aggregate your data by author, you can still explore the language features of posts, such as sentiment, nouns and entities.

What is Community Connection Analysis?

Community Connection projects are a form of network analysis that enable us to study the people involved in specific online conversations. By default, an interaction between authors in a community is defined as an engagement in the form of a mention, favorite or retweet. For this reason, Community Connection projects must be built using Twitter datasets.

These projects work by associating an author with the other authors in the dataset that engaged with them. In this way, links are created between the author of a tweet and the people that are interacting with it.

These links determine the position of nodes on your Graph, making it possible to explore who is connected to whom. Selecting the neighbours of a node will highlight all authors that interacted with a specific author. Filtering the dataset using the Cluster variable will highlight sub-communities of authors that interact with one another.
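
The link-building step can be sketched as follows, using a simplified, hypothetical data model. Each tweet is a dictionary with `author`, `mentions`, `retweet_of`, `retweet_count` and `favorite_count` keys (invented names; in Graphext you map your own columns in the setup tabs). Links come only from mentions and retweets, because tweet-level data usually does not record who favorited a post, and links are treated as undirected. Degree is the number of distinct authors each author is linked to, and Interactions per Tweet is assumed here to mean retweets plus favourites, averaged over an author's tweets.

```python
def build_interaction_graph(tweets):
    """Return {author: set of authors they are linked to} from mentions and retweets."""
    graph = {}
    for tweet in tweets:
        author = tweet["author"]
        graph.setdefault(author, set())  # keep authors with no links at all
        targets = list(tweet.get("mentions", []))
        if tweet.get("retweet_of"):
            targets.append(tweet["retweet_of"])
        for other in targets:
            if other != author:  # ignore self-mentions and self-retweets
                graph[author].add(other)
                graph.setdefault(other, set()).add(author)
    return graph


def degree(graph):
    """Number of distinct authors each author is connected to."""
    return {author: len(others) for author, others in graph.items()}


def interactions_per_tweet(tweets):
    """Average retweets + favourites received per tweet, grouped by author."""
    totals, tweet_counts = {}, {}
    for tweet in tweets:
        author = tweet["author"]
        received = tweet.get("retweet_count", 0) + tweet.get("favorite_count", 0)
        totals[author] = totals.get(author, 0) + received
        tweet_counts[author] = tweet_counts.get(author, 0) + 1
    return {author: totals[author] / tweet_counts[author] for author in tweet_counts}


tweets = [
    {"author": "ana", "mentions": ["ben"], "retweet_of": None, "retweet_count": 4, "favorite_count": 6},
    {"author": "ben", "mentions": [], "retweet_of": "ana", "retweet_count": 0, "favorite_count": 2},
    {"author": "cai", "mentions": ["ana"], "retweet_of": None, "retweet_count": 1, "favorite_count": 1},
    {"author": "dee", "mentions": [], "retweet_of": None, "retweet_count": 0, "favorite_count": 0},
]
graph = build_interaction_graph(tweets)
print(degree(graph))  # {'ana': 2, 'ben': 1, 'cai': 1, 'dee': 0}
print(interactions_per_tweet(tweets))  # {'ana': 10.0, 'ben': 2.0, 'cai': 2.0, 'dee': 0.0}
```

`dee` ends up with a degree of 0. An author with no links like this is the kind of node that tends to sit in the outer ring of the Graph.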

Clusters are created using our implementation of the Louvain algorithm, which groups authors according to the links between them and the other authors interacting with them. You can adjust the size of sub-communities by configuring the strength of these clusters.
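
Louvain works by moving nodes between communities to increase a score called modularity, which compares the share of links inside each community with the share expected if links were placed at random while keeping every node's degree. The sketch below only computes that score for a given partition of an undirected, unweighted graph; it is not a full Louvain implementation. It includes a `resolution` parameter (γ) like the one NetworkX's `louvain_communities` exposes, where lower values favour fewer, larger communities. Whether Graphext's cluster-strength setting maps onto a resolution parameter is an assumption, not something this article states.

```python
def modularity(edges, community_of, resolution=1.0):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each link listed once.
    community_of: {node: community_id}.
    Computes Q = sum over communities c of L_c / m - resolution * (d_c / 2m) ** 2,
    where L_c counts links inside c, d_c sums the degrees of c's nodes
    and m is the total number of links.
    """
    m = len(edges)
    if m == 0:
        return 0.0  # modularity is undefined without links; 0.0 is a convention here
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - resolution * (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

print(modularity(edges, split), modularity(edges, merged))  # split (about 0.357) beats merged (0.0)
# At a low resolution, one big community scores higher than two small ones.
print(modularity(edges, split, resolution=0.1) < modularity(edges, merged, resolution=0.1))  # True
```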

How to Build Community Connection Projects?

Community Connections projects are quick to set up, particularly if you are using a dataset collected with Tractor. The most crucial part of the analysis is matching your dataset with the variables Graphext requires to map connections, although in most cases this is configured automatically.

You can use the Data Enrichment tab to add useful variables to your project such as analyzing the sentiment of tweets or classifying them using predefined categories.

Interactions are an important part of Community Connections projects because they define how links are created. You can change how an interaction is defined inside the Tweet Data Extraction tab. Inside this tab, you can also select which language features of the posts you'd like to extract.

Tweet Text

The Tweet Text tab allows you to set the text column you want to analyze and choose how to configure the language of social media posts in your data.

Setting the language of your content is an important part of NLP analysis because all language models are trained using language-specific datasets. If you are confident that all of your social media data is in one language, specify this in the project setup wizard. Alternatively, you can use a column that contains the language of each text value, or use a pre-trained model that recognises the language of the text.

Tweet Author

The purpose of the Tweet Author tab is to match your dataset with the variables Graphext needs to recognize authors and mentions. If you are working with a dataset from Tractor, Graphext will be able to complete the Tweet Author tab automatically.

The variables you need to match here include Author Bio, Verified and Number of Followers.

Tweet Interactions: Mentions and Retweets

Similar to the Tweet Author tab, the Tweet Interactions: Mentions and Retweets tab is a space to match your dataset with the variables Graphext needs to recognize mentions and retweets. If you are working with a dataset from Tractor, Graphext will be able to complete this tab automatically.

The variables in this tab are key to creating the links between authors in your dataset. The variables you need to match here include List of Links, List of Mentioned Usernames and List of Mentioned User IDs.

Tweet Info

Tweet Info contains the final pieces of information required to match your dataset with the variables Graphext needs. If you are working with a dataset from Tractor, Graphext will be able to complete this tab automatically.

The variables in this tab refer to metadata about individual tweets. The variables you need to match here include Tweet Favourites, Tweet Date and Tweet ID.

Tweet Data Extraction

The Tweet Data Extraction tab lets you customize the information and language features that you want to extract from tweets in the dataset. You can use our built-in NLP algorithms to extract entities, significant terms and keywords from each post, and to create new variables containing nouns, adjectives and other parts of speech.

Another important aspect of this tab is choosing how to define an interaction. By default, retweets and favorites count as interactions, but there are a number of other options to pick from.

What to Expect from Community Connections Projects?

When you open a Community Connections project, the first thing you'll see is your Graph (network visualisation), where each node represents one tweet author. The original dataset you used to build the project will be aggregated by tweet author, but you can still explore the individual features of tweets using the variable sidebar charts.

Each tweet author is linked to others according to the interactions between them. Often, you will see a ring of nodes around the centre of your network. Generally, this represents tweet authors without a connection to other authors.

The centre of your network is where you will find people with the most interactions. These authors are key to the community in your dataset and often help to drive a conversation. Selecting an author and choosing Select Neighbours will highlight the other authors directly connected to that author.

Graphext uses the Degree variable to apply size mapping to nodes in your Graph. Larger nodes have more connections than smaller ones.

Your Cluster variable groups related authors. This is a good place to start inspecting who belongs to different sub-communities in your dataset.

Key Community Members

A Key Community Members project uses data from Twitter to analyze the people driving digital conversations. This analysis type is well suited to exposing influencers as well as identifying the demographics of people involved in specific conversations.

Some of the key variables that emerge in this kind of analysis are Degree, Cluster and Engagement Rate. Degree counts the number of connections a Twitter author has in the dataset. Cluster groups authors according to the similarities between them. Engagement Rate divides the average number of interactions an author receives by the number of followers they have.
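
Taking the definition above literally, Engagement Rate can be sketched as follows. The inputs are assumptions: a list of interaction counts, one per tweet by the author, and the author's follower count. Returning 0.0 for authors with no tweets or no followers is an arbitrary choice made here to avoid dividing by zero; the article does not say how Graphext handles that case.

```python
def engagement_rate(tweet_interactions, followers):
    """Average interactions per tweet divided by the author's follower count.

    tweet_interactions: interactions received by each of the author's tweets.
    Returns 0.0 when there are no tweets or no followers (an assumption).
    """
    if not tweet_interactions or followers <= 0:
        return 0.0
    average = sum(tweet_interactions) / len(tweet_interactions)
    return average / followers


# A small account can out-engage a much larger one.
print(engagement_rate([10, 30, 20], followers=1_000))       # about 0.02
print(engagement_rate([200, 100, 300], followers=100_000))  # about 0.002
```

Normalising by followers is what lets a niche author with a loyal audience stand out even without a large following.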

You can use any kind of Twitter dataset to build these projects, and although they focus on the authors behind tweets, you can still explore the language features of posts, such as sentiment, nouns and entities.

What is Key Community Members Analysis?

Key Community Members projects are a form of social media network analysis that enable us to study influencers and the people driving conversations on Twitter. This type of analysis focuses on grouping the people that participate in these conversations. It highlights their influence using key metrics such as the number of connections or interactions that each author has received within the context of the dataset. For this reason, Key Community Members projects must be built using Twitter datasets.

Graphext PRO users can collect data directly from Twitter using Tractor.

In a Key Community Members analysis, Graphext will transform your original dataset, aggregating it so that each row (and node) in your project's dataset represents just one author. Authors are then clustered together based on their similarities.

Similarity is defined using all of the variables in your Twitter dataset. That means that data on the author's bio, retweet and favourite interactions and whether they are verified or not are all used as factors to cluster your dataset.

Links are then created to connect similar authors together inside of your project's Graph (network visualisation). Selecting the neighbours of a node in your Graph will use the links to highlight other authors most similar to a specific author.
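
One way to picture this similarity step is to scale each author's numeric features to a common range and link each author to its nearest neighbour. The sketch below is an illustration under assumptions, not Graphext's model: it uses only numeric and boolean features (follower count, verified flag as 1 or 0, tweet count), min-max scaling and Euclidean distance, and the function names are hypothetical. Text features such as the author bio would need to be embedded first before they could be added.

```python
import math


def min_max_scale(rows):
    """Scale each column of equal-length numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for column in columns:
        low, high = min(column), max(column)
        span = high - low
        # A constant column carries no information, so map it to 0.0.
        scaled_columns.append([(x - low) / span if span else 0.0 for x in column])
    return [list(row) for row in zip(*scaled_columns)]


def nearest_author(authors, features):
    """Map each author to the other author with the closest scaled features."""
    scaled = min_max_scale(features)
    nearest = {}
    for i, name in enumerate(authors):
        others = (j for j in range(len(authors)) if j != i)
        best = min(others, key=lambda j: math.dist(scaled[i], scaled[j]))
        nearest[name] = authors[best]
    return nearest


authors = ["ana", "ben", "cai"]
# Hypothetical features: [followers, verified (1/0), tweets in the dataset]
features = [[50_000, 1, 120], [48_000, 1, 100], [300, 0, 5]]
print(nearest_author(authors, features))  # {'ana': 'ben', 'ben': 'ana', 'cai': 'ben'}
```

Without scaling, the follower column would dominate the distance, because its values are hundreds of times larger than the others.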

You can also use the Cluster variable to filter your data and explore the features of groups of similar authors. Reconfigure the strength of clusters to inspect smaller or larger communities of Twitter authors.

How to Build Key Community Members Projects?

Key Community Members projects are quick to set up, particularly if you are using a dataset collected with Tractor. It's important to match your dataset with the variables Graphext requires to calculate the similarity between authors, although in most cases this is configured automatically.

Use the Data Enrichment tab to add useful variables to your project such as detecting the emotion of tweets, performing sentiment analysis or classifying them using predefined categories.

You can configure the parameters of your project's clustering model inside the advanced settings dropdown. This involves setting a scaling ratio and a gravity measure for your network.

Tweet Text

The Tweet Text tab allows you to set the text column you want to analyze and choose how to configure the language of social media posts in your data.

Setting the language of your content is an important part of NLP analysis because all language models are trained using language-specific datasets. If you are confident that all of your social media data is in one language, specify this in the project setup wizard. Alternatively, you can use a column that contains the language of each text value, or use a pre-trained model that recognises the language of the text.

Tweet Author

The purpose of the Tweet Author tab is to match your dataset with the variables Graphext needs to recognize details about the authors of tweets in your data. If you are working with a dataset from Tractor, Graphext will be able to complete the Tweet Author tab automatically.

The variables you need to match here include Author Location, Following Count and Author ID.

Tweet Interactions: Mentions and Retweets

Similar to the Tweet Author tab, the Tweet Interactions: Mentions and Retweets tab is a space to match your dataset with the variables Graphext needs to recognize mentions and retweets. If you are working with a dataset from Tractor, Graphext will be able to complete this tab automatically.

The variables you need to match here include List of Links, List of Mentioned Usernames and List of Mentioned User IDs.

Tweet Info

Tweet Info relates to information on the Twitter posts in your data and contains the final pieces of information required to match your dataset with the variables Graphext needs. If you are working with a dataset from Tractor, Graphext will be able to complete this tab automatically.

Because this information is about posts rather than authors, Graphext will aggregate the number of tweets, favourites and other post-specific data by author. The variables you need to match here include Tweet Favourites, Tweet Date and Tweet ID.
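
The aggregation from tweets to authors can be sketched as a group-by. The field names (`author`, `favourites`, `date`) are hypothetical stand-ins for whichever columns you match in this tab, dates are assumed to be ISO 8601 strings (which sort correctly as text), and the summary columns chosen here are only examples of post-specific data being rolled up per author.

```python
def aggregate_by_author(tweets):
    """Collapse tweet-level rows into one summary row per author."""
    authors = {}
    for tweet in tweets:
        row = authors.setdefault(
            tweet["author"],
            {"tweets": 0, "favourites": 0, "first_tweet": tweet["date"], "last_tweet": tweet["date"]},
        )
        row["tweets"] += 1
        row["favourites"] += tweet["favourites"]
        # ISO 8601 dates compare correctly as plain strings.
        row["first_tweet"] = min(row["first_tweet"], tweet["date"])
        row["last_tweet"] = max(row["last_tweet"], tweet["date"])
    return authors


tweets = [
    {"id": "1", "author": "ana", "favourites": 5, "date": "2021-03-02"},
    {"id": "2", "author": "ana", "favourites": 7, "date": "2021-03-01"},
    {"id": "3", "author": "ben", "favourites": 1, "date": "2021-03-05"},
]
print(aggregate_by_author(tweets)["ana"])
# {'tweets': 2, 'favourites': 12, 'first_tweet': '2021-03-01', 'last_tweet': '2021-03-02'}
```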

Tweet Data Extraction

The Tweet Data Extraction tab lets you customize the information and language features that you want to extract from tweets in the dataset. This is an important part of any social media analysis and lets you use our built-in NLP algorithms to extract entities, significant terms and keywords from the post.

You can also create new variables containing nouns, adjectives and other parts of speech.

What to Expect from Key Community Members Projects?

When you open a Key Community Members project, the first thing you'll see is your Graph (network visualisation), where each node represents one tweet author. The original dataset you used to build the project will be aggregated by tweet author, but you can still explore the individual features of tweets using the variable sidebar charts.

Each tweet author is linked to others according to the similarity between them. This similarity is calculated using all possible features of your original dataset including the number of tweets posted by an author, the number of followers they have as well as the features of their posts.

Sizes of nodes in your network visualisation (Graph) will automatically be configured using the Degree variable, which shows the number of connections each author has. Larger nodes represent authors with more connections than smaller ones.

Degree and Engagement Rate are both very useful variables for spotting influencers straight away. If either contains extreme values (outliers), select a more suitable value range using the interactive chart and zoom in on that range.
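
If you want a rule of thumb for choosing that range, one common option (not something Graphext prescribes) is Tukey's fences: values more than 1.5 times the interquartile range below the first quartile or above the third quartile are treated as outliers. The degree values below are made up, and the function name is hypothetical.

```python
import statistics


def tukey_range(values, k=1.5):
    """Return (low, high) bounds outside which values count as outliers.

    Quartiles use the 'inclusive' method; the bounds are clipped to the observed data.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = max(min(values), q1 - k * iqr)
    high = min(max(values), q3 + k * iqr)
    return low, high


degrees = [1, 2, 2, 3, 3, 4, 5, 60]
low, high = tukey_range(degrees)
print(low, high)  # 1 7.625
print([d for d in degrees if d < low or d > high])  # [60]
```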

Your Cluster variable groups related authors. This is a good place to start inspecting who belongs to different sub-communities in your dataset and which features they share with one another.
