Analysis Types | Text

Text analysis types in Graphext are all about analyzing language of all shapes and sizes. Many of the algorithms behind the Text analysis projects you build belong to the field of Natural Language Processing (NLP). NLP is broadly defined as the automatic manipulation of speech and text by computer software.

We built our Text analysis types to give you quick and easy access to powerful NLP algorithms without having to use Python or R. Our analysis types can be used to detect the topics within text, analyze its sentiment, study keywords, extract language features and recognise named entities.

What is Natural Language Processing (NLP)?

The study of NLP has been around for more than 50 years and encompasses a myriad of use cases - now as diverse and wide-ranging as the uses of language itself.

It is a research field that covers any kind of computer manipulation of speech, text or other forms of natural language. From tasks like counting word frequencies to translation to building the software behind Amazon's Alexa, natural language processing involves computational understanding of human utterance and the mechanics required to do so.

"It is hard from the standpoint of the child, who must spend many years acquiring a language … it is hard for the adult language learner, it is hard for the scientist who attempts to model the relevant phenomena, and it is hard for the engineer who attempts to build systems that deal with natural language input or output.
These tasks are so hard that Turing could rightly make fluent conversation in natural language the centrepiece of his test for intelligence."


- Mathematical Linguistics | 2010

The past 20 years have witnessed dramatic advancements in the use of statistical methods to analyze speech and text. Computational linguistics - rule-based modelling of language - has been woven into machine learning models, allowing data scientists to derive insight from language much faster than humans can.

As these models have matured, they have made computational linguistics more accessible. The field is now a crucial part of many enterprise solutions and offers valuable intelligence to businesses across all industries.

Using Text Analysis Types

The technology behind the analysis types in Text has been built by our team or integrated with open-source machine learning projects. Our idea here is to give you quick access to powerful NLP algorithms without having to write code.

Inside each Text analysis type, Graphext will ask you to fill in a series of questions. These questions involve connecting our pre-built scripts to your dataset, setting the language of text and choosing the features that you want to extract or analyze in greater detail.

Then, executing your project will start the analysis. When your project has been built, you'll see the output variables in the data.

Types of Text Analysis in Graphext

Topics

Topics projects use NLP to detect the main themes of text in your data. Built using machine learning technology that can extract the significant terms from your text values, Topics projects will group all of your data according to the thematic similarity of your chosen text field.

When setting up a Topics project, Graphext will automatically answer the questions needed to start your analysis. Editing this configuration involves customizing the language features and keywords you want to extract from text as well as choosing the way that Graphext recognises the language of your text.

What is Topic Analysis?

Topic detection analysis in Graphext works by embedding - or vectorizing - text in your dataset. These vectors represent your text as lists of numbers that can be used to link words. The closest vectors are linked to one another forming the basis of your clusters.

You can control the number of links each row in your data should have using the setup wizard. Using the links between text, Graphext assigns each row a position on your network visualisation - Graph. This results in the representation of groups of closely related text.
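
The embedding and linking steps above can be sketched in a few lines of Python. This is only an illustration, not Graphext's implementation: the bag-of-words `embed` function is a toy stand-in for a learned embedding model, and the names `embed`, `cosine`, `knn_links` and the parameter `k` (the number of links per row) are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_links(texts, k=2):
    """Link each row to its k most similar rows as (source, target, similarity)."""
    vectors = [embed(t) for t in texts]
    links = []
    for i, vi in enumerate(vectors):
        scored = [(cosine(vi, vj), j) for j, vj in enumerate(vectors) if j != i]
        # Highest similarity first; ties broken by row index for determinism.
        scored.sort(key=lambda s: (-s[0], s[1]))
        links.extend((i, j, sim) for sim, j in scored[:k] if sim > 0)
    return links

reviews = [
    "the rides were great fun",
    "great rides and fun parades",
    "the food was expensive",
    "expensive food and long queues",
]
links = knn_links(reviews, k=1)
```

With `k=1`, the two reviews about rides link to each other and so do the two reviews about food. Raising `k` adds weaker links and tends to produce a more densely connected network.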

Next, we use the Louvain algorithm to create clusters using the links between text values. These get added to a separate variable in your project, Clusters, in which each cluster is labelled according to the most common significant terms in all of the text belonging to it.
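
The Louvain method is a community detection algorithm that greedily groups nodes so as to increase a score called modularity, which compares the number of links inside each cluster with the number expected by chance. The sketch below computes that score for a given grouping. It is not the full Louvain procedure and not Graphext's code, and the function name `modularity` and the example graph are illustrative assumptions.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community_of: dict mapping node -> community id.
    """
    m = len(edges)
    internal = defaultdict(int)    # links with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in a community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two tightly linked groups of three rows, joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_topics = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_topic = {node: "a" for node in range(6)}

q_split = modularity(edges, two_topics)
q_merged = modularity(edges, one_topic)
```

Splitting the graph into its two natural groups scores higher than lumping every row into one cluster, which is the kind of improvement Louvain searches for when it merges and moves nodes.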

How to Build Topic Analysis Projects?

Topic analysis projects are powerful but easy to build in Graphext. All the NLP and heavy lifting takes place behind the scenes, leaving you a few simple questions to answer before starting your analysis.

Graphext will automatically complete the setup of Topic analysis projects using intelligent recognition of the correct values and language in your text content. Nonetheless, you can edit this if you wish to configure which language features get extracted.

Data Extraction

Inside your project setup wizard for Topic analysis projects, you'll find the Data Extraction tab. This tab allows you to specify or infer the language of the text you want to analyze and to extract key information or language features from it.

All of the information or keywords that you extract will be added as a new categorical variable to your project's dataset.

Setting the language of your text is a crucial part of starting your NLP analysis because all language models are trained using language-specific datasets. If you are confident that all of your text is in one language - specify this in the project setup wizard. You can also choose to use a column that contains the language of each text value or make use of a pre-trained model that can recognise the language of text.
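
The three language options described above (one fixed language, a language column, or automatic detection) can be pictured as follows. This is a hedged sketch rather than Graphext's logic: `resolve_languages`, its `fixed` and `lang_col` parameters, and the stopword-counting `detect_language` are all hypothetical, and the last one only stands in for a real pre-trained detection model.

```python
def detect_language(text):
    """Hypothetical stand-in for a pre-trained detector: counts common stopwords."""
    stopwords = {
        "en": {"the", "and", "was", "is", "of"},
        "es": {"el", "la", "y", "es", "de"},
    }
    tokens = text.lower().split()
    scores = {lang: sum(t in words for t in tokens) for lang, words in stopwords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def resolve_languages(rows, text_col, fixed=None, lang_col=None):
    """Return one language code per row: fixed value, column value, or detection."""
    if fixed is not None:
        return [fixed for _ in rows]
    if lang_col is not None:
        return [row[lang_col] for row in rows]
    return [detect_language(row[text_col]) for row in rows]

rows = [
    {"review": "the castle was beautiful", "lang": "en"},
    {"review": "el castillo es precioso", "lang": "es"},
]
by_fixed = resolve_languages(rows, "review", fixed="en")
by_column = resolve_languages(rows, "review", lang_col="lang")
by_detection = resolve_languages(rows, "review")
```

A fixed language is the cheapest option but mislabels the Spanish review here, which is why a language column or detection is the safer choice for mixed-language data.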

Text

The Text tab lets you further customize the setup of your Topics projects. Here you can choose the algorithm used to transform your data and decide whether to generate automatic insights in your project.

Choosing between speed and precision changes the algorithm used to transform your data. For the best results choose precision, which will deploy a Hugging Face transformer on your text values.

Automatic insights that you generate in Topics projects can be a really useful way to kickstart your analysis.

What to Expect from Topic Analysis Projects?

Opening up a Topics project, the first thing you'll see is your Graph - network visualisation - where rows in your data are grouped together according to the thematic similarity of the text you analyzed. The clusters variable represents the topics that Graphext was able to detect in your text.

Each cluster is labelled using the key significant terms that define text inside the cluster. Click on the clusters and inspect the significant terms variable to check this yourself!
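
One simple way to picture this labelling is to count the significant terms of every row in a cluster and keep the most frequent ones. The sketch below does exactly that. The row layout, the `label_clusters` name and the `top_n` parameter are assumptions made for illustration, and Graphext's actual term weighting may differ.

```python
from collections import Counter, defaultdict

def label_clusters(rows, top_n=2):
    """Label each cluster with its top_n most frequent significant terms."""
    term_counts = defaultdict(Counter)
    for row in rows:
        term_counts[row["cluster"]].update(row["terms"])
    labels = {}
    for cluster, counts in term_counts.items():
        # Sort by frequency, then alphabetically so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        labels[cluster] = ", ".join(term for term, _ in ranked[:top_n])
    return labels

rows = [
    {"cluster": 0, "terms": ["rides", "queue"]},
    {"cluster": 0, "terms": ["rides", "fireworks"]},
    {"cluster": 0, "terms": ["queue", "rides"]},
    {"cluster": 1, "terms": ["food", "price"]},
    {"cluster": 1, "terms": ["food", "hotel"]},
]
labels = label_clusters(rows)
```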

Depending on the information and keywords that you choose to extract, Graphext will have added new variables containing this data to your project's dataset. Search for them using the search bar in your right sidebar.

Use Case Guide | Analyzing Disneyland Reviews

This guide is intended to walk you through the process of analyzing customer reviews with Graphext. We will analyze a dataset of 42,656 reviews about 3 Disneyland branches. To model the topics, we will choose Text → Topics as our analysis type and focus on analyzing the content of the reviews. We'll extract information from the reviews including significant terms, adjectives and nouns.

Keywords

Keywords projects are focused on mapping the relationships between important terms in your data. The underlying technology behind Keywords analysis will extract keywords from text in your data, measure the strength of association between each keyword and plot these relationships in a network visualisation, where each node in the network represents one keyword.

The key variables that emerge in Keywords projects are Keyword, Cluster and Count. Keyword contains the important keywords in your text, Count measures how many times they appear and Cluster groups keywords together according to their associations.

What is Keyword Analysis?

Keywords analysis is a form of natural language processing that is intended to reduce your dataset to a collection of related keywords. These keywords are determined by Graphext to be the significant terms within the text you are analyzing and are grouped together to form clusters of closely related terms.

These projects work by first extracting the significant terms from the text in each row of your data. Then Graphext discards any keyword that appears fewer than n times, where n is the threshold that you set - the default value is 3.
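
The thresholding step can be sketched as a simple count-and-filter. The function name `keyword_counts` and the parameter `min_count` are hypothetical, and the example assumes the significant terms have already been extracted for each row.

```python
from collections import Counter

def keyword_counts(rows_of_terms, min_count=3):
    """Count keywords across rows and keep those appearing at least min_count times."""
    counts = Counter(term for terms in rows_of_terms for term in terms)
    return {term: n for term, n in counts.items() if n >= min_count}

rows_of_terms = [
    ["rides", "queue", "food"],
    ["rides", "queue"],
    ["rides", "fireworks"],
    ["queue", "food"],
]
kept = keyword_counts(rows_of_terms)                 # default threshold of 3
kept_loose = keyword_counts(rows_of_terms, min_count=2)
```

With the default threshold only "rides" and "queue" survive. Lowering it to 2 also keeps "food", which shows how the threshold trades coverage against noise.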

Next, Graphext will embed - or vectorize - this collection of keywords. These vectors represent keywords as lists of numbers that can be used to link words according to their semantic similarity.

Using the links between keyword vectors, Graphext assigns each keyword a position on your network visualisation - Graph. This results in the representation of connectivity between keywords with similar meanings.

Next, we use the Louvain algorithm to create clusters using the links between keywords. These get added to a separate variable in your project, Clusters, in which each cluster is labelled according to the keywords contained within it.

How to Build Keyword Analysis Projects?

Keyword analysis projects are quick and easy to deploy. When you build a Keywords project, Graphext will automatically set up your analysis. Nonetheless, you can edit the default setup to customize the way language is recognised in your data and to set the number of times a keyword must appear to be taken into account.

Because your data is transformed to represent one keyword for each row, these projects are not suitable for other forms of text analysis like part-of-speech tagging, sentiment analysis and entity recognition. Instead, you should choose Keyword analysis only if you want to study the important words in your data.

Data Extraction

Inside of your project setup wizard for Keywords analysis projects, you'll find the Data Extraction tab. This tab allows you to set the text column you want to analyze and choose how to configure the language of text in your data.

Setting the language of your text is a crucial part of starting your NLP analysis because all language models are trained using language-specific datasets. If you are confident that all of your text is in one language - specify this in the project setup wizard. You can also choose to use a column that contains the language of each text value or make use of a pre-trained model that can recognise the language of text.

Analysis

The Analysis tab lets you further customize the setup of your Keywords projects. Here you can choose the algorithm used to transform your data and set a threshold for the appearance of keywords in your text.

Setting a threshold for the appearance of keywords in your data is a crucial part of controlling what your final project will look like. Any keyword that appears fewer times than this threshold won't be included in your final project. If you are working with a large dataset, higher thresholds are recommended to avoid noise in your collection of keywords.

Choosing between speed and precision changes the algorithm used to transform your data. For the best results choose precision, which will deploy a Hugging Face transformer on your text values.

You can also choose between analyzing keywords or hashtags. Unless you are working with social media data - we'd recommend sticking with keywords.

What to Expect from Keyword Analysis Projects?

Opening up a Keywords project, the first thing you'll see is your Graph - network visualisation - where each node in the network represents one keyword. These are the significant terms that Graphext extracted from text in your data. Find and interact with the Keywords variable in your sidebar charts to inspect the keywords in more detail.

Your data will now include only three variables: Count, Keyword and Cluster. This is because Graphext reduced your data to a collection of keywords.

Each keyword is linked to others. You can inspect which keywords are linked to one another by clicking on one in the Graph and choosing Select Neighbours.

Graphext uses the Count variable to apply size mapping to nodes in your Graph. Keywords shown as larger nodes appear more frequently in your text than those shown as smaller ones.
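
Size mapping of this kind can be pictured as rescaling each keyword's count into a range of node sizes. The sketch below assumes a linear scale. The source does not say which scale Graphext uses, so the `node_sizes` function and its `min_size` and `max_size` parameters are purely illustrative.

```python
def node_sizes(counts, min_size=4.0, max_size=20.0):
    """Linearly map keyword counts to node sizes between min_size and max_size."""
    low, high = min(counts.values()), max(counts.values())
    if high == low:
        # All keywords are equally frequent, so give them the same size.
        return {keyword: min_size for keyword in counts}
    scale = (max_size - min_size) / (high - low)
    return {keyword: min_size + (count - low) * scale for keyword, count in counts.items()}

sizes = node_sizes({"rides": 30, "queue": 17, "food": 4})
```

The most frequent keyword gets the largest node, the least frequent gets the smallest, and everything else falls proportionally in between.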

Your Cluster variable groups related keywords. The labels assigned to each cluster represent the keywords in that cluster.
