What's New?

November 25, 2021

New Features

We've made significant improvements to Correlations! With the addition of relative mode, Correlation charts can be simpler & easier to read because they show the percentage distribution of values belonging to a variable.

Correlations: Relative Mode

Across Graphext, relative mode presents data as a proportional representation. In practice, this means you see data as a percentage distribution rather than an absolute count.

This is especially useful in Correlations charts because these use size and color to visualize the correlation between lots of values belonging to pairs of variables. With relative mode, the size and color range of bubbles in Correlations charts are restricted to a percentage distribution (either on the x or y axis). This makes it easier to spot patterns.
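To make the idea concrete, here is a minimal Python sketch of the difference between absolute and relative mode. It is an illustration only, not Graphext's implementation: the sample data and the function names `absolute_counts` and `relative_by_row` are made up for this example, and the normalization runs along one axis so that each value of the first variable sums to 100%.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (plan, churned) pairs. The "free" group is much larger.
rows = (
    [("free", "yes")] * 6
    + [("free", "no")] * 2
    + [("pro", "yes")]
    + [("pro", "no")]
)

def absolute_counts(pairs):
    """Count how many data points fall on each (y, x) intersection."""
    return Counter(pairs)

def relative_by_row(pairs):
    """Express each intersection as a percentage of its y-axis value's total."""
    counts = absolute_counts(pairs)
    totals = defaultdict(int)
    for (y, _x), n in counts.items():
        totals[y] += n
    return {(y, x): 100.0 * n / totals[y] for (y, x), n in counts.items()}

counts = absolute_counts(rows)
shares = relative_by_row(rows)
for key in sorted(counts):
    print(key, counts[key], f"{shares[key]:.0f}%")
```

In absolute terms the ("free", "yes") cell dominates simply because the free group is bigger. In relative terms each plan is scaled to its own total, so the two plans can be compared directly.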

How can I start using it?

  • Open a project and head to the Correlations panel
  • Choose a variable to map your correlations charts or choose two to map a specific correlation.
  • Using the Relative vs Absolute dropdown, switch to Relative mode.
  • Notice how the size and color of your bubbles adjust to present a percentage distribution.

Fixes and Improvements

Stories Worth Sharing

November 4, 2021

New Features

Sentiment analysis in Graphext just became much more powerful with our new enrichment - integrated with an industry-leading model from Cardiff NLP & hosted by Hugging Face.

We've also added detailed documentation about our analysis types on our website and in the app!

01. New Enrichment: Sentiment Analysis Integration with Cardiff NLP & Hugging Face

Our new sentiment analysis enrichment is built using an industry-leading model from the team at Cardiff NLP and hosted by Hugging Face. Sentiment analysis models predict whether text is positive, negative or neutral. Check out the documentation describing the mechanics of the model here.

Choose this enrichment using the Data Enrichment tab in your project setup wizard to start classifying the sentiment of news headlines, song lyrics, tweets and more text of all shapes and sizes.

How can I start using it?

- Open up a dataset that contains a text variable.

- Choose any type of analysis to perform.

- Inside your Data Enrichment tab, choose Analyze Text Sentiment - Cardiff NLP.

- Select the text variable you want to analyze the sentiment of.

- Complete your project setup.

- Once Graphext has built your project, open it up and explore your new Category of Sentiment (Cardiff NLP) variable.

02. Analysis Type Documentation

We've started adding documentation to help you make the most of our analysis types. You can find these in the app using the information icon inside the card for each analysis type or in the docs on our website.

We've written this to help you understand the best way to approach each of our analysis types. Expect walkthroughs, use case examples and exact directions on getting started.

How can I start using it?

- Choose a dataset to work with.

- Click on one of the information icons inside of your project setup wizard.

- Click the link at the bottom of the documentation to read more.

Fixes and Improvements

- Fixed a bug causing Graphext to freeze after a user saves a manual segmentation.

- Fixed a bug stopping Trends | Segmented Overview charts from presenting text variables.

- Fixed a design issue making it impossible to download datasets from Details when the dataset has an especially long name.

- Fixed an issue stopping relative mode from being available in published projects.

- Fixed a problem with the transfer of information between Insights and Compare.

- Fixed a design issue with the alignment of chart legends inside Insights.

Stories Worth Sharing

01. An Introduction To NLP For Text Analysis | Data Academy

Our latest Data Academy instalment looks at how NLP can be used by businesses to analyze text. Starting from text analysis fundamentals and moving on to look at more complex recent developments in the field of NLP, this article is intended to introduce and equip business and data analysts with knowledge, techniques and tools to take forward in their text analysis projects. Read more.

02. Make or Break: After 5 Years Couples Are Less Likely To Break Up 

What's the most important milestone in a relationship? According to data from a Stanford study, it's a day like any other that occurs somewhere between the 4th and 5th anniversary of a relationship. We built a simple project with the intention of finding the moment in a relationship where breaking up is less likely than staying together. Read more.

October 6, 2021

New Features

You can now customize the values shown in Compare & Correlations charts. We've added a search bar to help you add important categories to these visualisations.

We've also added a new color palette to your projects that uses a dynamic scale that updates depending on the number of values belonging to a variable.

01. Add Custom Categories in Compare & Correlations

Up until now, charts in Compare and Correlations presented only the most frequently occurring values from a variable. You can now choose which values to present in these charts using the search bar at the top right of each chart card.

Open up Compare or Correlations and choose a chart with hidden values. Click the search bar from the top of the chart and add in your new value.

How can I start using it?
  • Choose a Compare or Correlations chart featuring some hidden values.
  • Click the search icon at the top of your chart.
  • Start typing the name of the value you want to add into the chart.
  • Select the value and click the tick icon to add it into your chart.

02. New Dynamic Color Palette: Re

We've added a new color palette to your projects. Re is slightly different to Horus and Osiris in that it offers a dynamic scale of colors that will update depending on the number of values belonging to a variable. 

Re is particularly useful when exploring a small to medium range of categorical values. Its colors move from light blue through orange and red to purple on a scale that is calculated according to the number of values in a category.
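As a rough illustration of what a dynamic scale means, the Python sketch below spreads a requested number of colors evenly along a gradient. Everything in it is an assumption made for the example: the RGB stops are placeholders picked to match the description above (the real Re values are not given in these notes), and `dynamic_palette` is a hypothetical helper, not a Graphext function.

```python
# Placeholder color stops (RGB); the real Re palette values are not given here.
STOPS = [
    (173, 216, 230),  # light blue
    (255, 165, 0),    # orange
    (255, 0, 0),      # red
    (128, 0, 128),    # purple
]

def lerp(a, b, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

def dynamic_palette(n):
    """Return n hex colors spread evenly along the gradient defined by STOPS."""
    if n < 1:
        return []
    if n == 1:
        return ["#%02x%02x%02x" % STOPS[0]]
    colors = []
    segments = len(STOPS) - 1
    for i in range(n):
        pos = i / (n - 1) * segments       # position along the whole gradient
        seg = min(int(pos), segments - 1)  # which pair of stops we are between
        rgb = lerp(STOPS[seg], STOPS[seg + 1], pos - seg)
        colors.append("#%02x%02x%02x" % rgb)
    return colors

print(dynamic_palette(4))
print(dynamic_palette(7))
```

With few values the colors sit far apart on the gradient. As the number of values grows, neighbouring colors move closer together, which is why a scale like this suits a small to medium range of categories.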

How can I start using it?
  • Open up your project settings.
  • Navigate to the Appearance tab.
  • Choose Re from the dropdown list of color palettes.
  • Clicking Save will update the color palette in your project settings.

Fixes and Improvements

- When you upload a new dataset, Graphext can now tell the difference between Categorical and Text values with greater precision.

- Your color palette will now be saved to your project settings. This means that closing then returning to the project will not affect your choice of color palette.

- We've improved the way URL variables are presented throughout your datasets and projects. URL variables will now be presented in the same way that categorical variables are presented.

Stories Worth Sharing

01. What is Exploratory Data Analysis?

For our first Data Academy release - we've gone back to basics with Exploratory Data Analysis. This article covers what cleaning, transforming and enriching data means as well as explaining why different visualisation types can be useful for studying different types of variable relationships. Read more.

September 15, 2021

New Features

You can now copy datasets and projects between Graphext teams. We've also made it easier to inspect text or quantitative values in greater detail with new text tooltips in data tables and the ability to save variables that capture zoom-ins on quantitative ranges.

We're also pretty excited to announce that you can now customize the thumbnails inside your project card - using uploads or new Graph captures.

01. Move + Copy Datasets | Copy Projects

You can now move or copy datasets across workspaces as well as make copies of key projects. Click the menu icon from your Graphext team workspace and choose 'Move to' or 'Make a copy in' to give other teams access to your data and analysis.

We've added this feature to make it easier for you to collaborate on and share important analyses that you create. Making changes in a copied project won't affect the state of your original project.

How can I start using it?
  • Find a dataset or project in one of your Graphext teams.
  • Click the 3 dots to bring up the menu options.
  • For datasets - choose either Move to or Make a copy in. Then, choose a team destination.
  • For projects - choose Make a copy in. Then, choose a team destination.
  • Click Accept and head over to your destination team to inspect your relocated resources.

02. Changing Project Thumbnails

You can now upload, regenerate or enlarge your project thumbnail images! Head to your project settings, click on the project image and choose how to set your new one!

The size of project thumbnails is set to optimal dimensions - meaning that any image you set is guaranteed to look snazzy!

How can I start using it?
  • Open your project info.
  • Click the current image associated with your project.
  • Choose to either upload, enlarge or regenerate your project thumbnail.
  • Save your changes and head to your workspace to check your changes.

03. Save Zoomed In Quantitative Ranges

Zooming in on specific value ranges isn't a new Graphext feature. But up until this point, any zoom-ins you made on quantitative variables would disappear as soon as you reloaded a project. Now ... they won't!

Zooming in on quantitative ranges helps you account for extreme values in your data. Zoom in on specific ranges to explore data distribution between two points.

How can I start using it?
  • Choose a quantitative variable in your project.
  • Set a filter range by clicking and dragging on the variable sidebar chart.
  • Click the 3 dots and choose Zoom In from the menu list.
  • That's it ... your new zoomed-in variable will be saved to your project.

04. Inspect Text with Tooltips

We've added tooltips to the table in your Details panel - helping you inspect the full content of text in your data. Hover over a text value to reveal its full content.

You can also copy the content of a text value by right-clicking on it and selecting Copy!

How can I start using it?
  • Head over to the Details panel of your project.
  • Find a text variable and hover over it.
  • Check out the full content of that value inside the tooltip.

05. Remove Any Variable

Now you can remove any variable from your project. Click the right menu next to the variable card in your project sidebar and choose Remove from the menu list.

Cleaning up your analysis is a useful habit to get into. Removing a variable from a project will delete any reference to it in all of your project panels.

Fixes and Improvements

Core Improvement

When filtering data in your projects, your sidebar charts will now jump to Relative mode by default. Relative mode means that data in your selection is shown in proportion to the distribution of values in your whole dataset.

- Added the ability to view and edit the project recipe from the project settings window.

- Removed automatic filtering on datasets of any size so that - by default - projects will be built using the full dataset.

- Fixed a bug stopping labels from appearing when users hover above nodes in the Graph.

- Fixed a bug stopping users from sending data to the trash from panels outside of the Graph.

- Fixed a bug causing mixed JSON data to crash on upload.

- Corrected a problem causing tagged variables to appear in the wrong variable collections inside Cluster projects.

Stories Worth Sharing

01. Sentiment Analysis & The Billboard Top 100: The Changing Mood of Popular Music

We used sentiment analysis to model 5100 Billboard chart-toppers between 1964 and 2015. Our analysis predicted whether song lyrics were positive, negative or neutral as well as detecting the topic and intent behind the most popular tunes in music history. Read more.

August 31, 2021

New Features

You can refresh and recreate Graphext projects created with data from Google Sheets or database integrations. On top of this, it's now simple to change the color of values from anywhere in your projects and you can switch color palettes inside of your project settings!

01. Refresh Data Integrations & Recreate Projects

We've added the ability to refresh and recreate projects built with integrated datasets from Google Sheets, SQL databases and more remotely hosted sources.

Find a project you've created with integrated data and choose to Refresh and Recreate the project. Graphext will then retrieve a new - up to date - dataset from your source and automatically create a new project using the data.

How can I start using it?
  • Make sure you've created a project using data that you've integrated with Graphext from Google Sheets or a remotely hosted database.
  • Find the project inside your Graphext workspace and click the 3 dots on the project card.
  • Choose 'Refresh and Recreate' from the menu list.
  • Graphext will recreate your project using up to date data from your integrated source.
  • Graphext will store a new dataset in your Datasets panel containing up to date data from your integrated source.

02. Changing Colors Across Graphext: Trends & Compare

Your Compare and Trends charts now support the full spectrum of variable colors. Not only this, but you can change the color of any categorical value across the interface.

Recently, we extended the number of automatically generated variable colors but - up until now - these weren't available in Compare or Trends charts. Now, you can see the full range of colors across all interface panels as well as changing these colors directly in either Compare or Trends charts.

How can I start using it?
  • Inside a project, open up Compare or Trends and add values into your charts.
  • Click on a color dot associated with a value in your charts.
  • Choose a new color using the color picker and click OK.

03. Switching Color Palette

You can now switch the color palettes used to represent data in your projects. Color is crucial to grouping and spotting connections between data. Head over to the Appearance tab inside your project settings to change color palettes.

Choose Horus for the standard Graphext color palette. Choose Osiris for a more vivid color palette. We'll be adding more color palettes to this list very soon!

How can I start using it?
  • Navigate to the Appearance tab inside your project settings.
  • Select a color palette from the dropdown list.
  • Click Save.
  • That's it. Check out the new colors representing data in your project.

04. Expand Charts in Compare and Correlations

You can now expand charts in Compare & Correlations. Because expanded charts are BIGGER, they let you inspect more values at the same time.

To expand Compare or Correlations charts, click on the 3 dots from the top right of your chart card and choose Expand chart. Insights that you save from expanded charts will also be bigger and contain more values than standard-sized charts.

How can I start using it?
  • Inside a project, generate charts in either Compare or Correlations.
  • Click on the 3 dots inside a chart card.
  • Choose Expand chart from the menu list.
  • Check out your chart in all of its glory!

Fixes and Improvements

Core Improvement

We've improved the way that data is presented inside Trends charts. You can now represent values in time-series charts using a Cumulative Sum - which works like a running total. Choosing Cumulative Sum - instead of a count or an average - means that the y-axis in your Trends charts can now represent the total sum of data as it grows over time.
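For anyone unfamiliar with running totals, here is a tiny Python sketch of the calculation. The monthly figures are invented for the example, and this shows the arithmetic only, not how Graphext computes its charts.

```python
from itertools import accumulate

# Hypothetical monthly sign-up counts.
monthly = {"Jan": 120, "Feb": 95, "Mar": 140, "Apr": 80}

# Each point on a cumulative chart is the sum of everything up to that date.
running_total = dict(zip(monthly, accumulate(monthly.values())))
print(running_total)
# {'Jan': 120, 'Feb': 215, 'Mar': 355, 'Apr': 435}
```

A count chart of this data would dip in February and April, while the cumulative version only ever climbs.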

- Added the ability for users to set up more than one SQL integration.

- Fixed an issue with Amazon S3 Data Integrations.

- Fixed a bug in the Text - Keywords analysis type.

- Fixed a bug with the Social Media - Analyze Author Bios analysis type.

- Fixed an issue with dataset names containing a long sequence of characters.

Stories Worth Sharing

01. A Beginners Guide to Market Segmentation

Market Segmentation means splitting your customer base into distinct communities based on the similarity of their features. This guide walks through the fundamental techniques, tools & types of market segmentation and shows you how to perform advanced market segmentation with Graphext. Read more.

02. A Guide to Clustering Supermarket Transactions

This guide is intended to walk you through the process of creating a clustering model to group your data. We'll build a project using a dataset of 1000 supermarket transactions from stores in Myanmar and expose the supermarket's most valuable market segment. Read More.

03. How to Study Brand Conversations with Advanced Text Analysis?

How can we use text analysis of data from Twitter to improve our understanding of markets? This is the question prompting Paul, a strategist in our business team, to scrape tweets about Lloyds bank and conduct a Twitter topic analysis using advanced NLP and network creation. Read More.

July 27, 2021

New Features

Graphext is now more powerful at text analysis. We've added support for the incredible range of NLP models at Hugging Face including intent detection and sentiment analysis. On top of this, we've built a new enrichment to group location values that are spelt differently.

01. Support for Hugging Face Models

We've integrated Hugging Face models with Graphext. You can now build, train and deploy state of the art models for common NLP tasks including intent detection, sentiment analysis and token classification. Hugging Face also has models for translation, image classification and speech recognition.

Check the Hugging Face model documentation to browse the models you can now use in Graphext, check how to use them and try them out! We're so convinced about the usefulness of these models that we've updated the default embeddings in Text - Topics projects to use the Hugging Face SBERT transformer.

We'll soon be adding an easier way to deploy Hugging Face models on your text but for now - open up the code editor and paste in a code snippet from our docs.

How can I start using it?
  • Start building a project with data containing some text.
  • Open up the code editor.
  • Somewhere towards the end of the script - copy and paste the code snippet from our docs.
  • Replace the name of the model with the name of your chosen model.
  • Don't forget to add parameters if you need them!
  • Execute the project.

02. New Enrichment: Extract URL Components

When working with URLs in your data, it is often useful to extract new variables containing the domain, path and schema of the URL. Using this enrichment you can parse the URL values in your data and use the components of a URL to filter your data.

After you've built a project that extracts URL components, look for the new variables in your data: path, domain, query, schema and more ...

Check the docs on enriching data to start extracting URL components.
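If you're curious what these components look like, the Python sketch below splits a made-up URL with the standard library's `urllib.parse`. It illustrates the general idea and is not the enrichment's own code. The dictionary keys are chosen for this example, and Python calls the leading part of a URL the "scheme".

```python
from urllib.parse import urlsplit, parse_qs

# Example URL invented for illustration.
url = "https://www.example.com/blog/post?id=42&ref=newsletter#comments"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,      # https
    "domain": parts.netloc,      # www.example.com
    "path": parts.path,          # /blog/post
    "query": parts.query,        # id=42&ref=newsletter
    "fragment": parts.fragment,  # comments
}
print(components)

# The query string can be broken down further into individual parameters.
print(parse_qs(parts.query))
```

Once each component lives in its own column, filtering by domain or by path becomes a simple categorical filter.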

How can I start using it?
  • Select Extract URL Components as an enrichment using the Data Enrichment tab during your project setup.
  • Tell Graphext which column contains the URL values in your data.
  • That's it. Open the project and look out for the new variables containing the components of your URL values.

03. New Enrichment: Standardize Locations

Variation in the way that people write and record location data can make for a messy analysis.

Similar to the way that our Group Similar Spellings enrichment works, standardizing location data means grouping variations that refer to the same place but are spelt differently.

For instance, without deploying this enrichment, 'Manchester' and 'Manchester, UK' would be considered as two separate places. Our enrichment has been designed to let you collect these two values and filter your data more accurately with locations.

Check the docs on enriching data to start standardizing locations.

How can I start using it?
  • Select Standardize Locations as an enrichment using the Data Enrichment tab during your project setup.
  • Tell Graphext which column contains the location values you want to group.
  • Set a threshold to control the strength of your merges.
  • That's it. Open the project and look out for the new merged variable.
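
To get a feel for what the threshold does, here is a small Python sketch that groups similar strings. It is a stand-in for illustration only: it uses the standard library's `difflib.SequenceMatcher` as the similarity measure, which is an assumption and not the method behind the enrichment, and `group_values` is a hypothetical helper.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def group_values(values, threshold):
    """Greedily put each value into the first group whose representative is
    at least `threshold` similar; otherwise start a new group."""
    groups = {}  # representative -> list of members
    for value in values:
        for rep in groups:
            if similarity(value, rep) >= threshold:
                groups[rep].append(value)
                break
        else:
            groups[value] = [value]
    return groups

locations = ["Manchester", "Manchester, UK", "manchester", "Madrid"]
print(group_values(locations, threshold=0.8))
print(group_values(locations, threshold=0.95))
```

At 0.8 all three Manchester spellings land in one group. At 0.95 only the case variant is merged and 'Manchester, UK' stays separate, so a higher threshold means stricter merges.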

Fixes and Improvements

- Fixed a bug preventing the saving of new team names.

- Fixed a bug causing quantitative filter ranges to jump unexpectedly.

- Fixed a bug allowing incorrectly formatted data sources to be referenced (not a URL).

Stories Worth Sharing

01. Using Mutual Information to Cluster Variables and Discover the Associations Between Survey Questions

Our team set out to build a type of analysis that could be used to measure the strength of association between variables in a dataset. Read more.

02. Mapping New York's Airbnb Listings

Read more.

June 30, 2021

New Features

It's been a colorful month ... we've added the ability to change the color of any categorical variable and extended the spectrum of colors automatically generated for your values. We've also added a new enrichment letting you fill missing values in your data!

01. Extended Variable Colors

We've increased the scale of our default color palette to include 30 colors!

On top of this - clicking to show more categorical values will add appropriate color to nodes in your Graph that would previously have been grey.

Color is a powerful analytical tool and lets you quickly identify the features of your data points inside visualisations. Up until now, we've used grey to color any categorical value beyond the 10 most frequently occurring.

We believe that more color means more clarity. Clicking to see more categorical values will extend the colors presented in your Graph.

How can I start using it?
  • Open a project with a categorical variable that contains many values.
  • Apply color mapping to this variable in your project's Graph.
  • Click to show more values.
  • Notice how your color palette extends inside the Graph and variable sidebar chart.

02. Changing Colors For Any Categorical Value

You can now change the color of any categorical value!

Although every value in your categorical variable will be automatically assigned a color - you can change these by editing the variable and selecting a new one using the color picker.

How can I start using it?
  • Open up a project and find a categorical variable.
  • Click the edit button at the top of your variable card.
  • Now, click the color picker icon.
  • Choose a new color and save your changes!

03. New Enrichment: Fill Missing Values

We've built an enrichment to fill missing values in your data. Missing values can be annoying, misleading and disruptive. Replacing them with specific values can help to clean up and prepare your dataset for analysis.

Choose Fill Missing Values from the data enrichment tab inside of the project setup wizard to start replacing missing values. Then, select a variable with missing values and tell Graphext how you would like to fill these values. You can choose from options like using a constant value, using the most or least frequently occurring value and using the column's minimum or maximum value. Look for the replaced variable in your transformed dataset.
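
The fill options above can be sketched in a few lines of Python. This is only an illustration of the logic: the strategy names and the `fill_missing` function are hypothetical labels chosen for the example, not Graphext parameters.

```python
from collections import Counter

def fill_missing(values, strategy, constant=None):
    """Return a copy of `values` with None replaced according to `strategy`."""
    present = [v for v in values if v is not None]
    if strategy == "constant":
        fill = constant
    elif strategy == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    elif strategy == "least_frequent":
        # Ties are broken by order of first appearance in the data.
        fill = Counter(present).most_common()[-1][0]
    elif strategy == "min":
        fill = min(present)
    elif strategy == "max":
        fill = max(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

ages = [34, None, 29, 34, None, 51]
print(fill_missing(ages, "most_frequent"))         # [34, 34, 29, 34, 34, 51]
print(fill_missing(ages, "max"))                   # [34, 51, 29, 34, 51, 51]
print(fill_missing(ages, "constant", constant=0))  # [34, 0, 29, 34, 0, 51]
```

Which option is right depends on the variable. A constant makes the gaps easy to spot later, while the most frequent value keeps the distribution closer to what was observed.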

If you'd prefer, you can always use a different enrichment to predict missing values!

How can I start using it?
  • Start building a project using a dataset with missing values.
  • Choose Fill Missing Values from the data enrichment tab inside your project setup wizard.
  • Tell Graphext which column contains your missing values.
  • Choose how you want to replace your missing values.
  • That's it. Look for the replaced variable in your transformed dataset.

04. Dataset Info: Sources & Descriptions

We've added space to describe your dataset and reference its original source inside Graphext.

Context is always important but when dealing with data - it is essential. Referencing your data leaves behind a trail that other team members or researchers can trace to validate or continue your analysis.

To start describing and referencing a dataset, find it from inside of your team's Graphext workspace. Then open the dataset info menu using the 3 dots on the far right of your dataset card. Enter the source URL and write a description then click on the dataset to see this information listed above your data.

How can I start using it?
  • Find a dataset inside your Graphext workspace and select the three dots from the far right of the dataset card.
  • Choose Dataset Info from the menu list.
  • Enter a source URL and write a short description.
  • Save your changes.

Fixes and Improvements

- Added the ability to change the name of a team.

- Fixed an issue with info cards not appearing after clicking on a node in the Graph.

- Fixed a bug causing the creation of a new project to fail after moving some data to the trash.

- Added a menu button to instantly open a variable in the Correlations panel.

- Added a legend to list variables. A white circle indicates that white-coloured nodes refer to data belonging to more than one list category.

Stories Worth Sharing

01. Segmenting 1000 Supermarket Customers Using Sales Data

Our team clustered 1000 supermarket sales in order to segment customers according to their buying habits.

02. Graphext | Graphtex | Graphnext: Grouping Similar Spellings Using Chars2Vec and Agglomerative Clustering

'España' and 'Españha' are just spelling variations. We built a way of grouping words spelt differently but referring to the same concept and made it available alongside any type of analysis you perform with Graphext.

03. Getting Started Videos

We've created a collection of Getting Started videos to help guide you in using Graphext's interface panels and core features.

June 21, 2021

New Features

Our new Correlations panel lets you study the relationships between variables. Find it inside of any new Graphext project you create and start discovering the associations in your data.

What is Correlation?

Correlation is a statistical concept referring to the relationship between two variables. We can use correlation to understand whether observing a change in variable A will also mean observing a change in variable B.

Positive correlations refer to a relationship between two variables in which both variables move in the same direction. Negative correlations refer to a relationship between two variables in which an increase in one variable is associated with a decrease in the other.
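
For numeric variables, one common way to put a number on this is the Pearson correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The Python sketch below computes it by hand on invented data. Note the assumption: these notes don't say which measure Graphext uses, so treat this as background on the concept and not as a description of the Correlations panel.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 65, 71, 80]  # rises with hours: positive correlation
errors_made = [9, 7, 6, 4, 2]      # falls with hours: negative correlation

print(round(pearson(hours_studied, exam_score), 3))
print(round(pearson(hours_studied, errors_made), 3))
```

The first value comes out close to +1 and the second close to -1, matching the two directions described above.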

“Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.”

- Randall Munroe

The Correlations Panel

Inside your project's Correlations panel you'll find a series of charts as you would inside the Compare panel. Choose a variable to study using the search bar and Graphext will generate charts showing the correlation between this variable and other variables in your data.

Use correlation charts to understand how the values of one variable are associated with the values of another. You can export charts from the Correlations panel or save them as insights.

Reading Correlation Charts

Charts in your Correlations panel reveal the number of data points where values from two variables meet. Your y-axis represents values from the variable in your search bar and the x-axis represents values from the correlated variable - labelled in the top right of each card.

The blue circles in your correlation charts represent the number of data points at each value intersection. Bigger and brighter circles represent a higher number of data points at an intersection whereas smaller and duller circles represent fewer data points at an intersection.

A strong positive correlation would be signified by a trend of big & bright circles moving diagonally upwards from left to right 📈

A strong negative correlation would be signified by a trend of big & bright circles moving diagonally downwards from left to right 📉

The Docs

Correlation is a powerful tool but its key concepts aren't always self-explanatory. Here are a couple of articles to help you use and understand Correlations.

How To | Correlations

Start here. This article walks you around the new Correlations panel - pointing out the different features and showing you how to use them.

Technical Docs | Understanding Correlation

Read about the concepts key to understanding correlation. In this article, we explain how correlation works, describe the different types of correlation and point out how to measure degrees of correlation in Graphext.

How can I start using it?

  • Build a Graphext project.
  • Once your project is ready, navigate to the Correlations panel.
  • Choose a variable using the search bar.
  • Study the charts to inspect the correlation between this variable and other variables in your data.
  • Change the collection of variable charts presented using the dropdown menu at the top left of your Correlations panel.

Fixes and Improvements

Stories Worth Sharing

June 14, 2021

New Features

We're excited to say that we've added a new panel to your projects! Find Models after you've built a prediction model using the Models | Train & Predict analysis type.

About Models

Models is designed to help you understand more about the creation and performance of the prediction models you build. You'll find three tabs (General, Training and Result), each containing distinct information about your model.

This information helps you unpick how your model was built, its strengths and weaknesses as well as how it performed in specific areas.

"As the statistician George E. P. Box wrote, 'All models are wrong, but some models are useful.' What he meant by that is that all models are simplifications of the universe, as they must necessarily be."

- Nate Silver, The Signal and the Noise

Why We Built Models

Examining the mechanics behind a model is a crucial aspect of ensuring that its application is appropriate. We've included references to the technology used to develop the model, descriptions of its primary use cases as well as notes from our team on its performance.

Evaluating a prediction model helps us to understand how good the model is. Using accuracy scores and other performance metrics, we get a sense of whether the model was able to make correct or inaccurate predictions. Not only this, but through evaluation of a model, we can understand how to improve it by changing its factors or parameters.
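
As a simple illustration of what an accuracy score measures, the Python sketch below compares predicted labels with actual ones. The labels and function names are invented for the example, and the metrics shown in your Models panel may be calculated differently.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Share of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair to see where the model goes wrong."""
    return Counter(zip(actual, predicted))

actual    = ["churn", "stay", "stay", "churn", "stay", "stay"]
predicted = ["churn", "stay", "churn", "stay", "stay", "stay"]

print(f"accuracy: {accuracy(actual, predicted):.2f}")
for pair, n in sorted(confusion_counts(actual, predicted).items()):
    print(pair, n)
```

Accuracy alone says four of six predictions were right. The pair counts show which kinds of mistake were made, which is usually the more useful starting point for improving a model.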

The Docs

Prediction can be pretty complex at the best of times. We've written a few articles to help you get your head around the concepts key to Models.

How To | Models

Start here. This article walks you around the new Models panel - pointing out the different features and showing you how to use them.

Technical Docs | Building Models

Learn why and how to build models. Here, we explicate the process of building a prediction model in Graphext - considering the reasons for doing so alongside some of the key concepts involved.

Technical Docs | Interpreting Models

Use this as a dictionary for your Models panel. As well as explaining the usefulness of evaluating models, this article offers explanations for all of the technical terms used inside your project's Models panel.

How can I start using it?

  • Build a project using Models → Train & Predict as your analysis type.
  • Once your project is ready, navigate to the Models panel.
  • Check the mechanics of your model using the General tab.
  • Inspect how it was created using the Training tab.
  • Inspect its performance using the Result tab.

Fixes and Improvements

Stories Worth Sharing

May 31, 2021

New Features

Throughout May, we've been pouring our efforts into fixing bugs and making improvements in Graphext's UX. We've redesigned the project setup wizard, cleaning up the types of analysis you can conduct as well as improving key flows. We've also added a new enrichment option that groups similar spellings in text or categorical variables.

01. Cluster Variables Flow

We've made substantial improvements to our old Network of Columns analysis type. This flow lets you study the links between variables in your dataset.

Choose Models → Cluster Variables to build a project that maps the relationships between variables in your data.

Clustering your variables can be a useful way to understand which factors to feed into a model or to simply grasp which variables are strongly linked to one another.

When setting up a Cluster Variables project, choosing a target variable means that your project will focus on mapping other variables' relationships to that target variable.
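
To give a feel for what "relationships to a target variable" can mean, here is a minimal sketch that scores each variable by its Pearson correlation with a chosen target. Pearson correlation is our stand-in for this example only; the measure Graphext actually uses is not described here, and the column names and values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical dataset: each key is a variable, each list is a column.
columns = {
    "age": [23, 35, 41, 52, 60],
    "income": [21, 30, 38, 50, 58],
    "visits": [9, 7, 6, 3, 2],
}
target = "income"

# Strength of the link between every other variable and the target.
links = {
    name: pearson(values, columns[target])
    for name, values in columns.items()
    if name != target
}

for name, r in links.items():
    print(f"{name} vs {target}: {r:+.3f}")
```

Values near +1 or -1 suggest a strong link to the target, while values near 0 suggest a weak one.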

How can I start using it?

- Choose a dataset with at least two variables.

- Select Models → Cluster Variables as your analysis type.

- Pick which variables you want to cluster in your project.

- Refine your configuration using the questions in the setup wizard.

- Execute your project and study the relationships between your variables.

02. New Enrichment: Group Similar Spellings

Aimed at improving the way you conduct Text Analysis in Graphext, our latest data enrichment groups words with similar spellings. Simply put - the idea is to stop Graphext and Graphex from being considered as two separate entities.

Whether it be typos, misplaced punctuation or a missing letter or two, unintended variation in data is a common - and annoying - occurrence in text analysis. Motivated to overcome this common shortcoming, our team of data scientists and engineers built this algorithm to merge words with similar spellings and made it instantly deployable in Graphext using any type of analysis.

Choose Group similar spellings from the list of enrichment options in your data enrichment tab to start grouping similar text or categorical values. Then, set a threshold to configure the strength of the merges taking place.
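
To illustrate the general idea of merging spellings above a similarity threshold, here is a minimal sketch using Python's standard difflib module. This is not the algorithm our team built; the function name, the 0.85 default and the sample values are assumptions made for the example.

```python
from difflib import SequenceMatcher


def group_similar(values, threshold=0.85):
    """Map each value to the first earlier value it closely resembles.

    A higher threshold means only near-identical spellings are merged.
    """
    canonical = []  # representative spellings seen so far
    mapping = {}
    for value in values:
        match = next(
            (
                c
                for c in canonical
                if SequenceMatcher(None, value.lower(), c.lower()).ratio()
                >= threshold
            ),
            None,
        )
        if match is None:
            # Nothing similar enough yet, so this value starts a new group.
            canonical.append(value)
            match = value
        mapping[value] = match
    return mapping


# Hypothetical values from a text or categorical column.
mapping = group_similar(["Graphext", "Graphex", "graphext", "Tableau"])
print(mapping)
```

Lowering the threshold merges more aggressively, which catches more typos but also risks joining words that are genuinely different.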

How can I start using it?

- Start building a project using a dataset with a text column.

- Choose an analysis type and open the data enrichment tab.

- Select Group similar spellings from the list of enrichment options.

- Set a threshold to control the strength of your word merges.

- Continue building your project.

- Open your project and check out the new merged variable.

Fixes and Improvements

Core Improvement

Redesign of the project setup wizard. Without removing any of our capabilities, we've tidied up the way that flows are presented. We've removed Employees and Survey analysis types and renamed Google Analytics to Marketing Attribution. You can still build the same projects using the Models analysis type. We've done this to make it simpler to find the right kind of analysis for your project.

- Fixed a bug preventing users from segmenting data using a direct selection of nodes in the Graph.

- Fixed a bug stopping users on some versions of macOS from extracting CSV files downloaded from Graphext.

- Fixed an issue causing some minor Graph UI features to overlap on Safari browsers.

- Disabled users' ability to create insights inside of projects embedded on external websites.

- Fixed a bug stopping users from changing the color of a segmentation whilst - at the same time - renaming the segmentation.

Stories Worth Sharing

01. Good Risk vs Bad Risk: Deconstructing the Feature of 1000 German Loans

Attempting to discover the most influential features of a loan application when considering risk, our team built a model using the features of a loan application to predict whether an applicant would have a good or bad risk rating.

02. Predicting Stroke Probability

In this guide, María and Paul walk you through the process of building a prediction model that analyzes a dataset of 5110 healthcare patients. The model we help you to build will use factors detailing the lifestyle and existing health conditions of a person in order to predict the likelihood of that person suffering a stroke.

April 30, 2021

New Features

We've been improving the flow of key Graphext features to make them more instinctive to work with. You can now save insights to your recipe so that you don't lose them when you recreate a project. It's also easier to save and edit new segmentations. We're also getting ready to introduce you to some substantial - and colorful - new features next month!

01. Saving Insights to a Recipe

Insights are key. They help store your discoveries and build data-driven narratives. You don't want to lose them if you recreate a project using a different flow. We've added the ability to save insights to your recipe. Choose key aspects of your analysis, toggle the save switch and move your analysis forward without losing your findings.

Insights that you save will appear in the Insights panel of your recreated project. You can change the configuration of your recreated project as much as you like - this won't affect the insights that you save.

How can I start using it?

- Start by saving some insights in a project you have built.

- Then, click the green recipe icon at the bottom of the insights that you want to save.

- Toggle the switch to save the insight to your recipe.

- Recreate the project using the settings menu located on the top left of your screen.

- Edit your project's configuration until you are happy with the changes. Execute it.

- Head over to the new project's insights panel to inspect your saved insights.

02. Editing Segmentation Properties

Creating manual or automatic segmentations is a powerful way to discover sub-communities in your data. We've made it easier to customize the properties of segmentations by improving the logic with which you edit a segmentation. We've also added a button to help you cancel the changes that you make and made it simpler to undo steps like renaming or coloring segments.

Segmentations act like a new variable in your dataset, dividing your values along lines that let you see your data from different perspectives. Segmentations are communities in your data and it is important to control how they appear.

Changes we've made to the flow of editing a segmentation are designed to give you greater freedom in renaming, coloring or removing segmentations. Make use of the undo and cancel buttons to control your changes precisely and quickly.

How can I start using it?

- Create a segmentation in one of your Graphext projects.

- Click on the Edit Segmentation icon from inside the segmentations sidebar card.

- Use the icons to rename, color and remove segments.

- Undo and cancel any changes you make using the icons next to the Save button.

Fixes and Improvements

- Fixed a bug preventing sub-cluster region labels from appearing after sub-clustering segments.

- Fixed an issue with the product basket analysis flow.

- Fixed an issue with the network of columns flow.

- Fixed a bug in the keywords flow when using datasets containing URL variables.

- Improved the display of categorical values in your sidebar variable charts. Now - no matter the number of values - categorical variables are always presented in bar charts, not lists.

Stories Worth Sharing

01. Jake's Project: Investigating the Data Behind a Good Day

Andy and María met with Jake to talk about a dataset he's building about himself. From skating to people he sees to whether he eats meat or not - Jake's data diary offers a unique and deeply personal insight into his life. But what makes the difference between good and bad days?

02. Analyzing Reviews

Businesses fighting to understand feedback from hundreds or thousands of customer reviews can analyze what people are saying using NLP techniques but it takes time and resources to do so. This guide is intended to walk you through the process of analyzing customer reviews with Graphext. We will analyze a dataset of 42,656 reviews about 3 Disneyland branches using the Text → Topics flow.

March 26, 2021

New Features

This month, we've been working on making Graphext easier and more intuitive to use. We've added features making it easier to spot relationships between variables using compare charts and inspect data on your Graph quickly by controlling size mapping with greater precision.

We've also added a new type of analysis making it possible to analyze the relationships between recurring items in your data. On top of this, we've extended the list of language support options we have so that you can analyze text written in Turkish and Arabic.

01. Automatic Grouping in Compare Charts with Prediction Models

It just got much easier to start analyzing the variables highlighting relationships in your data when you build a prediction model in Graphext.

To make it faster for you to spot key variables, we've re-configured the way that the Compare panel generates charts explaining differences or similarities between values in your dataset.

Now, when you generate compare charts after building a model using the train and predict analysis type, Graphext will automatically display only important variable charts, hiding the variable charts that aren't as relevant to your analysis. You can change this by switching between the new categories of charts that we've created.

Use the dropdown menus at the top of your Compare panel to start toggling between these variable collections.

Compare chart collections available in train and predict projects:

All - All variables

Target - Modelled variable

Factors - Variables used to create the model

Other Variables - Variables not used as factors or target

Important - Target(s) and Factors

Internal Variables - Variables created by Graphext

None - Select a variable individually
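
One way to read the definitions above is as simple operations on lists of variable names. The sketch below follows the wording of the list literally; the variable names are invented, and the Internal Variables and None collections are left out because they depend on details not covered here.

```python
# Hypothetical variables in a train and predict project.
all_variables = ["age", "bmi", "smoker", "stroke", "patient_id"]
target = ["stroke"]
factors = ["age", "bmi", "smoker"]

collections = {
    "All": list(all_variables),
    "Target": list(target),
    "Factors": list(factors),
    # Important = Target(s) and Factors
    "Important": target + factors,
    # Other Variables = everything not used as a factor or target
    "Other Variables": [
        v for v in all_variables if v not in target and v not in factors
    ],
}

for name, variables in collections.items():
    print(f"{name}: {variables}")
```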

How can I start using it?

- Start by building a project using the train and predict analysis type.

- Then, from the Compare panel of your project, choose some variables to compare.

- Graphext will automatically display the Important collection of variable charts.

- Toggle between collections using the dropdown menu at the top of your Compare panel.

02. Node Size Ranges

Node sizes are a great way of exploring quantitative values in your Graph at a glance. You can now control the range of sizes that nodes are given using a sliding scale.

Select the node size icon from the icon collection at the top of your project's Graph to start customizing the range of node sizes presented in your project. You can control the top and bottom of the range using the slider.

You can also control your node sizes from your project settings. If you want to save your node size configuration, you can do this using the node size slider inside of the project settings window.
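
Conceptually, a size range works by rescaling a quantitative variable so that its smallest value gets the bottom of the range and its largest value gets the top. The sketch below assumes a simple linear mapping, which may differ from the scale Graphext applies; the function name and the numbers are made up.

```python
def scale_sizes(values, min_size, max_size):
    """Linearly rescale values into the range [min_size, max_size]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so give every node the middle of the range.
        middle = (min_size + max_size) / 2
        return [middle for _ in values]
    span = max_size - min_size
    return [min_size + (v - lo) / (hi - lo) * span for v in values]


# Hypothetical quantitative variable mapped to node size.
followers = [10, 55, 100]
sizes = scale_sizes(followers, min_size=2, max_size=20)
print(sizes)
```

Narrowing the range makes nodes look more alike, while widening it exaggerates the difference between small and large values.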

How can I start using it?

- Open your project and navigate to the Graph.

- Select the node size icon from the icon list at the top of your Graph.

- Click the three dots from the top right of the variable card.

- Move the range presented in the slider to change the size of your nodes.

03. Co-occurrence Flow in Models

We've added a new type of analysis called Co-occurrence. You can find it within the Models section of the project setup wizard.

Co-occurrence analysis lets you find relationships between recurring items in your dataset. It helps you identify which items are most associated with other items.

Whilst it's already possible to conduct co-occurrence analysis with Text and Product Basket analysis types in Graphext, we built this flow to work with any kind of data. Use Models → Co-occurrence analysis to discover the associations between a range of entities which might be people, products or places.
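
At its simplest, co-occurrence analysis counts how often two items appear together in the same group. Here is a minimal sketch of that counting step in plain Python. It illustrates the concept only, not how the Co-occurrence flow is implemented, and the groups are invented.

```python
from collections import Counter
from itertools import combinations


def count_pairs(groups):
    """Count how many groups each unordered pair of items shares."""
    pair_counts = Counter()
    for items in groups:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as one pair.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical groups, e.g. one list of items per order or per person.
groups = [
    ["coffee", "bread", "jam"],
    ["coffee", "bread"],
    ["tea", "bread"],
]

pair_counts = count_pairs(groups)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs with the highest counts are the items most often found together, which is the kind of association this analysis type is designed to surface.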

How can I start using it?

- Upload a dataset containing recurring items.

- Start building your project using the setup wizard.

- Choose Models as your type of analysis.

- Choose Co-occurrence as your sub-analysis type.

- Tell Graphext how you want to aggregate your data and which column contains your items.

- Execute the project and start discovering how items are related.

04. Support for New Languages

Our team of engineers and data scientists have been busy improving Graphext's Natural Language Processing capacity so that it is possible to analyze content written in more languages using our text analysis flows.

You can now analyze text written in Arabic and Turkish on top of the existing language support options we already have built-in. We've also made it much easier to extend the list of languages we support!

Read more about language support at Graphext here.

How can I start using it?

- Start with a dataset containing at least one text field and choose Text or Social Media as your analysis type using the project setup wizard.

- Inside of the Data Extraction tab, choose how you would like to set the language of your text.

- You can set the language manually or have it inferred directly from the text itself.

- That's it. Execute your project and delve into your analysis.

Fixes and Improvements

- Fixed a bug causing the building of projects to fail after a user decided to delete data points and recreate the project without these points.

- Solved issues surrounding an inability to save new manual segmentations.

- We've made it easier to build projects using datasets that match our built-in analysis types. When you build a project matching an analysis type, Graphext will make assumptions about the way you want to set up your project. You can still edit the configuration of projects matching your dataset using the new edit button that we've added - instead of moving backwards through the wizard.

Stories Worth Sharing

01. The Moneyball Method: Using Data to Build a Football Dream Team (On a Budget)

Our team set out to build an exceptional football team for less than 100M Euros. Using data provided in the FIFA 2020/2021 dataset - the video game - we built a prediction model in order to find the key performance attributes for each position. Then, we used this to pick out a team of excellent but undervalued players.

02. Market Basket Analysis

Maria and Paul analyzed a dataset of products from a bakery in Edinburgh to discover the associations between menu items. In this video, they walk through the process of conducting a simple product association analysis that could be used for any e-commerce or retail business.

March 1, 2021

New Features

We've been focusing on improving our data exploration capabilities and have added some features making it easier to build projects with big datasets and dive straight into important aspects of your analysis. On top of this, we are working on making Graphext a more powerful data cleaning and preprocessing tool.

01. Bigger Projects

Projects in Graphext just got bigger. Now, you can create projects using datasets with hundreds of thousands of rows like this one that Victoriano created using 215 thousand rows of data about salary structures in Spain.

To achieve this, we hide the links between nodes when building larger network visualizations. For the technically minded among us - we moved the storage of network links from JSON into our own database and only draw them for local neighbourhoods.

This means that you can still show connections between a node and its neighbours on larger Graphs. We are really excited about the possibilities that this feature opens up.
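
The idea of keeping links in a database and fetching only the ones around a given node can be sketched with Python's built-in sqlite3 module. SQLite, the table layout and the node ids are all assumptions for this example; they do not describe Graphext's own database.

```python
import sqlite3

# In-memory database standing in for a link store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (source INTEGER, target INTEGER)")
# Indexes keep neighbour lookups fast even with very many links.
conn.execute("CREATE INDEX idx_links_source ON links (source)")
conn.execute("CREATE INDEX idx_links_target ON links (target)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 4), (5, 1)],
)


def neighbours(conn, node):
    """Return the ids of all nodes directly linked to the given node."""
    rows = conn.execute(
        "SELECT target FROM links WHERE source = ? "
        "UNION SELECT source FROM links WHERE target = ?",
        (node, node),
    ).fetchall()
    return sorted(row[0] for row in rows)


print(neighbours(conn, 1))
```

Because only one node's links are read at a time, the cost of drawing a neighbourhood depends on how many neighbours that node has rather than on the size of the whole network.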

How does it work?

- Start from your team's Dataset panel.

- Upload a large dataset.

- Build any type of project using it.

- Start discovering communities inside of your enormous network!

02. Shortcut to Compare

Using the dropdown menu inside of your sidebar variable cards, you can now jump straight into the Compare panel to discern which other variables best explain the difference between values belonging to this variable. Select Open in Compare from the menu list to start understanding your data using compare charts.

We added this feature to make it quicker and simpler for you to jump into a more intricate investigation of the distinguishing features of values in your data.

How does it work?

- Start from your project's Graph, Details or Trends panel.

- Find the variable you want to inspect.

- Click the three dots from the top right of the variable card.

- Choose Open in Compare from the menu list.

- Use the compare charts to pick out the defining features of your values.

03. New Variable Types

We've added the ability to set the type of your variables in more detail. Boolean, Sex and Currency are among the new variable types that you can now make use of in Graphext. From inside your team's Dataset panel, inspect a dataset and use the dropdown under a variable name to set its type to one of the nine options now available.

How does it work?

- Start from your team's Dataset panel.

- Inspect a dataset.

- Click on the dropdown menu underneath a variable name to change its type.

- Choose a new type from the menu list.

- The type of this variable will now update.

04. More Projects for Public Users

We've been delighted with the number of new people using Graphext recently. As a result, we've decided to open up the limit of projects that users can create with a free account. Graphext Public users can now create up to 4 projects.

How does it work?

- Sign up for a Graphext Public account.

- Check out our guides on Getting Started.

- Start analysing your data using Graphext.

Fixes and Improvements

- Corrected a problem with clustering configuration in Text → Keyword Co-Occurrence projects.

- Fixed an issue with segment names when performing intersection operations.

- Solved a query text error that was occurring when users searched inside the Graph.

- Added functionality so that longer variable names appear in full in the Compare panel.

- Fixed an issue with dataset vectorization (the layout_datset step), which was occasionally failing on some datasets.

Stories Worth Sharing

01. Super Bowl Ads

Inspired by an analysis by Ryan Best at FiveThirtyEight, Victoriano and Andy clustered 20 years of Super Bowl commercials. They were interested in which popular brands used characteristics like comedy, sex, patriotism and animals to sell their products. Read More.

02. Predicting Employee Behaviour

Our team have been working on a guide to explain how Graphext can be used to interpret the characteristics, attitudes and preferences of employees. This guide looks at how a prediction model built-in to Graphext might be used to understand why sub-communities of people left their jobs. Read More.