May 25, 2021

May Update

🎁 New Features


Throughout May, we've been pouring our efforts into fixing bugs and improving Graphext's UX. We've redesigned the project setup wizard, cleaning up the types of analysis you can conduct and improving key flows.

We've also added the ability to source and describe datasets, along with a new enrichment that groups similar spellings.


01. Cluster Variables Flow

We've made substantial improvements to our old Network of Columns analysis type. This flow lets you study the links between variables in your dataset. 

Choose Models → Cluster Variables to build a project that maps the relationships between variables in your data. 



This can be a useful way to understand which factors to feed into a model or to simply grasp which variables are strongly linked to one another.

When setting up a Cluster Variables project, choosing a target variable means that your project will focus on mapping the other variables' relationships to that target variable.
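
To make the idea of linking variables concrete, here is a minimal sketch in plain Python. It is not how Graphext computes a Cluster Variables project. It assumes that the link between two variables is their absolute Pearson correlation and that variables are grouped whenever they are connected by links above a cutoff. The function names, the `min_strength` parameter and the sample data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to anything
    return cov / sqrt(var_x * var_y)


def cluster_variables(columns, min_strength=0.7):
    """Group column names that are linked by |correlation| >= min_strength.

    Hypothetical helper: clusters are the connected components of the graph
    whose edges are the sufficiently strong pairwise correlations.
    """
    neighbours = {name: set() for name in columns}
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= min_strength:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for name in columns:
        if name in seen:
            continue
        # Depth-first walk to collect every variable reachable from `name`.
        stack, group = [name], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        clusters.append(sorted(group))
    return clusters


# Made-up sample data, for illustration only.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [52, 58, 69, 77, 90],
    "shoe_size": [36, 38, 41, 43, 46],
    "lucky_number": [7, 3, 9, 1, 5],
}
clusters = cluster_variables(data)
print(clusters)
```

In this toy dataset the three body measurements end up in one group and `lucky_number` stays on its own. Focusing on a target variable would, under the same assumptions, amount to ranking the other columns by the strength of their link to that one column.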


How Can I Start Using It?

  • Choose a dataset with at least two variables.
  • Select Models → Cluster Variables as your analysis type.
  • Pick which variables you want to cluster in your project.
  • Refine your configuration using the questions in the setup wizard.
  • Execute your project and study the relationships between your variables.


02. New Enrichment: Group Similar Spellings

Aimed at improving the way you conduct Text Analysis in Graphext, our latest data enrichment groups words with similar spellings. Simply put, the idea is to stop Graphext and Graphex from being treated as two separate entities.



Whether it be typos, misplaced punctuation or a missing letter or two, unintended variation in data is a common - and annoying - occurrence in text analysis. To overcome this problem, our team of data scientists and engineers built an algorithm that merges words with similar spellings and made it instantly deployable in Graphext with any type of analysis.

Choose Group similar spellings from the list of enrichment options in your data enrichment tab to start grouping similar text or categorical values. Then, set a threshold to configure the strength of the merges taking place.
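
As a rough illustration of what a merge threshold does, the sketch below groups similar spellings using Python's standard `difflib.SequenceMatcher`. It is not Graphext's algorithm, which is not described here. It assumes that the most frequent spelling of a group is the one to keep and that two spellings are merged when their similarity ratio reaches the threshold. The helper name and sample values are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def group_similar_spellings(values, threshold=0.85):
    """Map each distinct value to a canonical spelling.

    Hypothetical helper: values are visited from most to least frequent.
    A value is merged into the first canonical spelling whose
    case-insensitive similarity ratio is >= threshold; otherwise it
    becomes a new canonical spelling itself.
    """
    counts = Counter(values)
    canonical = []  # spellings kept as group representatives
    mapping = {}
    for word, _ in counts.most_common():
        for rep in canonical:
            ratio = SequenceMatcher(None, word.lower(), rep.lower()).ratio()
            if ratio >= threshold:
                mapping[word] = rep
                break
        else:
            # No existing group is close enough, so start a new one.
            canonical.append(word)
            mapping[word] = word
    return mapping


# Made-up sample values, for illustration only.
values = ["Graphext", "Graphext", "Graphex", "graphext", "Graphtext", "Tableau"]
mapping = group_similar_spellings(values, threshold=0.85)
merged = [mapping[v] for v in values]
print(merged)
```

In this sketch, a higher threshold merges only near-identical spellings, while a lower one merges more aggressively and risks joining words that are genuinely different.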


How Can I Start Using It?

  • Start building a project using a dataset with a text column.
  • Choose an analysis type and open the data enrichment tab.
  • Select Group similar spellings from the list of enrichment options.
  • Set a threshold to control the strength of your word merges.
  • Continue building your project.
  • Open your project and check out the new merged variable.  


🐞 Bug Fixes & Improvements


  • Improved the design of the project setup wizard. Without removing any of our capabilities, we've tidied up the way that flows are presented to make it simpler to find the right kind of analysis for your project. We've removed the Employees and Survey analysis types and renamed Google Analytics to Marketing Attribution. You can still build the same projects using the Models analysis type.
  • Fixed a bug preventing users from segmenting data using a direct selection of nodes in the Graph.
  • Fixed a bug stopping users on some versions of macOS from extracting CSV files downloaded from Graphext.
  • Fixed an issue causing some minor Graph UI features to overlap on Safari browsers.
  • Disabled users' ability to create insights inside projects embedded on external websites.
  • Fixed a bug stopping users from changing the color of a segmentation while renaming it at the same time.


📖 Stories worth Sharing


Good Risk vs Bad Risk: Deconstructing the Feature of 1000 German Loans

To discover which features of a loan application are most influential when considering risk, our team built a model that uses those features to predict whether an applicant would have a good or bad risk rating.


Predicting Stroke Probability

In this guide, Maria and Paul walk you through the process of building a prediction model that analyzes a dataset of 5110 healthcare patients. The model they help you to build uses factors detailing a person's lifestyle and existing health conditions to predict the likelihood of that person suffering a stroke.