What's New?

July 27, 2021

New Features

Graphext is now more powerful at text analysis. We've added support for the wide range of NLP models available from Hugging Face, including models for intent detection and sentiment analysis. On top of this, we've built new enrichments to extract the components of URLs and to group location values that are spelt differently.

01. Support for Hugging Face Models

We've integrated Hugging Face models with Graphext. You can now build, train and deploy state-of-the-art models for common NLP tasks including intent detection, sentiment analysis and token classification. Hugging Face also has models for translation, image classification and speech recognition.

Check the Hugging Face model documentation to browse the models you can now use in Graphext, see how to use them and try them out! We're so convinced of the usefulness of these models that we've updated the default embeddings in Text - Topics projects to use the Hugging Face SBERT transformer.

We'll soon be adding an easier way to deploy Hugging Face models on your text, but for now, open up the code editor and paste in a code snippet from our docs.

How can I start using it?
  • Start building a project with data containing some text.
  • Open up the code editor.
  • Copy and paste the code snippet from our docs somewhere towards the end of the script.
  • Replace the name of the model with the name of your chosen model.
  • Don't forget to add parameters if you need them!
  • Execute the project.

02. New Enrichment: Extract URL Components

When working with URLs in your data, it is often useful to extract new variables containing the domain, path and schema of the URL. Using this enrichment you can parse the URL values in your data and use the components of a URL to filter your data.

After you've built a project that extracts URL components, look for the new variables in your data: path, domain, query, schema and more.
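To get a feel for what these components look like, here is a minimal sketch in plain Python, outside Graphext. It uses the standard library's `urllib.parse` and is not the enrichment's own implementation; the function name `extract_url_components` and the example URL are hypothetical, and the dictionary keys simply mirror the variable names mentioned above.

```python
from urllib.parse import urlsplit, parse_qs


def extract_url_components(url: str) -> dict:
    """Split a URL into a few named parts using the standard library.

    Illustrative only: the keys mirror the variable names in the text,
    not a documented Graphext schema.
    """
    parts = urlsplit(url)
    return {
        "schema": parts.scheme,          # protocol, such as "https"
        "domain": parts.hostname,        # host name without any port
        "path": parts.path,              # everything between host and "?"
        "query": parse_qs(parts.query),  # query string as a dict of lists
    }


components = extract_url_components(
    "https://www.example.com/blog/post?id=42&ref=news"
)
print(components)
```

Once a URL is broken down like this, filtering by domain or by path becomes a simple comparison on a single field.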

Check the docs on enriching data to start extracting URL components.

How can I start using it?
  • Select Extract URL Components as an enrichment using the Data Enrichment tab during your project setup.
  • Tell Graphext which column contains the URL values in your data.
  • That's it. Open the project and look out for the new variables containing the components of your URL values.

03. New Enrichment: Standardize Locations

Variation in the way that people write and record location data can make for a messy analysis.

Similar to the way that our Group Similar Spellings enrichment works, standardizing location data means grouping variations that refer to the same place but are spelt differently.

For instance, without this enrichment, 'Manchester' and 'Manchester, UK' would be treated as two separate places. Our enrichment has been designed to let you merge these two values into one and filter your data more accurately by location.

Check the docs on enriching data to start standardizing locations.

How can I start using it?
  • Select Standardize Locations as an enrichment using the Data Enrichment tab during your project setup.
  • Tell Graphext which column contains the location values you want to group.
  • Set a threshold to control the strength of your merges.
  • That's it. Open the project and look out for the new merged variable.
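The idea of a merge threshold can be sketched in plain Python. This is only an illustration under stated assumptions: it uses character-level similarity from the standard library's `difflib` as a stand-in, which is not necessarily how the Graphext enrichment compares locations, and the function name `standardize` and the sample values are hypothetical.

```python
from difflib import SequenceMatcher


def standardize(values, threshold=0.8):
    """Map each value to the first earlier value it closely resembles.

    Assumption: similarity is a simple character ratio between 0 and 1.
    A higher threshold means fewer, stricter merges.
    """
    canonical = []  # representative spellings seen so far
    mapping = {}
    for value in values:
        key = value.strip().lower()
        for rep in canonical:
            score = SequenceMatcher(None, key, rep.strip().lower()).ratio()
            if score >= threshold:
                mapping[value] = rep  # merge into the existing group
                break
        else:
            # No close match found: this value starts its own group.
            canonical.append(value)
            mapping[value] = value
    return mapping


locations = ["Manchester", "Manchester, UK", "manchester", "London"]
merged = standardize(locations, threshold=0.8)
print(merged)
```

In this sketch, raising the threshold to 0.9 keeps 'Manchester, UK' as its own value, while 0.8 merges it into 'Manchester'. That is the kind of trade-off the threshold setting controls.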


Fixes and Improvements

- Fixed a bug preventing the saving of new team names.

- Fixed a bug causing quantitative filter ranges to jump unexpectedly.

- Fixed a bug allowing incorrectly formatted data sources (values that are not a URL) to be referenced.


Stories Worth Sharing

01. Using Mutual Information to Cluster Variables and Discover the Associations Between Survey Questions

Our team set out to build a type of analysis that could be used to measure the strength of association between variables in a dataset. Read more.

02. Mapping New York's Airbnb Listings

Read more.