
Language Support

Updated

June 16, 2021

Advances in natural language processing mean that data science is increasingly capable of analyzing text written (or spoken) in the world's many languages. Text analysis is a powerful tool, but whenever you work with text data it is important to analyze the content of each text field in the appropriate language. Failing to do so can produce noisy or incorrect results.


"A different language is a different vision of life."

- Federico Fellini


Language Models

We use two types of language models at Graphext: spaCy models and Stanza models. Although the two are implemented differently, you incorporate them into your project in the same way. To add any language model, use the Data Extraction tab inside your project setup wizard.

The spaCy models we use are fast and robust. We use them for common languages, and our team of engineers and data scientists has built them into Graphext.


Languages Supported by spaCy Models

English

Spanish

French

Portuguese

German

Italian


Stanza models are slower and less thoroughly tested. We use them for less common languages, which lets us offer a wider range of language support on request. Because they are less thoroughly tested, you will be asked to confirm that you want to work with an experimental language when you choose one.


Languages Supported by Stanza Models

Arabic

Catalan

Basque

Turkish
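
The split between the two model families can be pictured as a simple lookup. The sketch below is only an illustration of the behaviour described on this page, not Graphext's actual implementation: the language lists are copied from above, while `ModelChoice`, `choose_model` and the `confirm_experimental` flag are hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Language lists copied from this page; every other name here is hypothetical.
SPACY_LANGUAGES = {"English", "Spanish", "French", "Portuguese", "German", "Italian"}
STANZA_LANGUAGES = {"Arabic", "Catalan", "Basque", "Turkish"}


@dataclass(frozen=True)
class ModelChoice:
    language: str
    backend: str        # "spacy" or "stanza"
    experimental: bool  # True when the language is served by a Stanza model


def choose_model(language: str, confirm_experimental: bool = False) -> ModelChoice:
    """Pick a model family for a language, asking for confirmation if experimental."""
    name = language.strip().title()
    if name in SPACY_LANGUAGES:
        return ModelChoice(name, "spacy", experimental=False)
    if name in STANZA_LANGUAGES:
        if not confirm_experimental:
            # Mirrors the confirmation prompt shown for experimental languages.
            raise ValueError(f"{name} is experimental; confirmation is required")
        return ModelChoice(name, "stanza", experimental=True)
    # Languages outside both lists may still be available on request.
    raise LookupError(f"{name} is not listed; it may be available on request")
```

For example, `choose_model("spanish")` selects the spaCy family, while `choose_model("Basque")` raises until it is called again with `confirm_experimental=True`.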


Don't see what you are looking for?

We are able to include more language models on request. Send us an email with your requirements and we'll get back to you.


Incorporating Language Support

When you build a text-based project in Graphext, you will be asked to specify which text fields you want to analyze and to set the language of those fields. Graphext can infer the language directly from the text itself. Alternatively, you can explicitly tell Graphext which language the text is written in.

Language support is incorporated into your projects as you are setting them up.
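
To make the difference between the two options concrete, here is a toy sketch of "explicit if given, otherwise inferred". It is not how Graphext infers language: the stopword lists are tiny and hand-picked, and `infer_language` and `resolve_language` are hypothetical helpers written only for this illustration.

```python
from typing import Optional

# Tiny, hand-picked stopword lists, for illustration only.
STOPWORDS = {
    "English": {"the", "and", "is", "of", "to", "in"},
    "Spanish": {"el", "la", "y", "es", "de", "que"},
    "French": {"le", "la", "et", "est", "de", "les"},
}


def infer_language(text: str) -> Optional[str]:
    """Return the language whose stopwords appear most often, or None if none match."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def resolve_language(text: str, explicit: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly set language; otherwise fall back to inference."""
    return explicit if explicit is not None else infer_language(text)
```

Calling `resolve_language(text)` relies on inference, whereas `resolve_language(text, explicit="Catalan")` always returns the language you set. An explicit setting is the safer choice when fields are short or mix several languages, since a frequency-based guess like this one has little to work with in those cases.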


How to Incorporate Language Support

Choose a dataset with at least one text field to start analyzing it.
From inside the project setup wizard, choose 'Text' or 'Social Media' as your analysis type.
Inside the 'Data Extraction' tab, choose how you would like to set the language of your text.
You can set the language manually or have it inferred directly from the text itself.
That's it. Now execute your project.
Done. Setting the language of your text makes your analysis more accurate.

