Prepare Data

As part of building a recipe to analyze your data, you can enrich it with information and functions built into Graphext. Enriching your data increases the information that Graphext can use to build your recipe.

This can improve the accuracy of your predictions and refine your clusters, as well as create useful additional variables based on the existing content of your data.

There are two ways of enriching your data in Graphext: using our built-in enrichments or using enrichments that we've integrated with the platform.

Built-in Enrichments are created by our team of data scientists and engineers and don't require any API integrations. You can simply add the enrichment to your project setup, customize it and execute it as part of building the project.

Integrated Enrichments use technology created by organizations other than Graphext and may require you to supply an API key, which you obtain outside Graphext. They contain powerful algorithms that can be useful in data science projects and, with a few additional steps, can add value to your analysis.

Adding Data Enrichment

To extract additional information from your dataset using data enrichment, start building a recipe using the project setup wizard.

Integrated enrichment options require you to provide a unique API key.

How to Enrich Your Data?

  1. Start building a recipe using the project setup wizard.
  2. Pick a type of analysis.
  3. Choose an option from the sidebar to further indicate the type of analysis you want to conduct.
  4. In the recipe wizard for the type of analysis you have chosen, open the 'Data Enrichment' form.
  5. Select the 'Enrich your data' dropdown.
  6. Choose an item or items from the dropdown.
  7. Complete the forms that appear below to configure your data enrichment.
  8. Done. Your enrichment selections will be processed once you execute your project.

Built-in Enrichments

Extract Date Components

Use a date variable to create new columns containing precise information about the month, week and day of a given date value.

How to Extract Date Components?
  1. Select Extract Date Components as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains date values.
  3. Choose which date components you want to extract.
  4. That's it. Executing your project will extract these date components as new variables.
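
To make the result concrete, here is a minimal Python sketch of what extracting date components amounts to. It is not Graphext's implementation: the function name `extract_date_components` and the output keys are hypothetical, and it assumes dates arrive as ISO `YYYY-MM-DD` strings.

```python
from datetime import date

# Weekday names are listed explicitly so the output does not depend on locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def extract_date_components(value: str) -> dict:
    """Split an ISO date string into month, week and day components."""
    d = date.fromisoformat(value)
    _, iso_week, _ = d.isocalendar()  # ISO 8601 week number
    return {
        "month": d.month,
        "week": iso_week,
        "day": d.day,
        "weekday": WEEKDAYS[d.weekday()],  # weekday() returns 0 for Monday
    }


print(extract_date_components("2024-03-15"))
```

Each key in the returned dictionary corresponds to one new column you would expect to see alongside the original date variable.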

Analyze Text Sentiment

Use a sentiment analysis model developed by the team at Cardiff NLP and hosted by Hugging Face. Sentiment analysis predicts whether a piece of text has a positive, neutral or negative sentiment. Explore the model's documentation here.

How to Analyze Text Sentiment?
  1. Select Analyze Text Sentiment as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains the text you want to analyze.
  3. That's it. Executing your project will add a new variable with the sentiment of your text.

Detect Emotion in Text

Use a model hosted by Hugging Face to detect emotion in text from a range of 28 emotions as defined by the GoEmotions dataset from Google. Explore the model's documentation here.

How to Detect Emotion in Text?
  1. Select Detect Emotion in Text as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains the text you want to analyze.
  3. That's it. Executing your project will add a new variable with the emotions detected in your text.

Fill Missing Values

Missing values can be annoying, misleading and disruptive. Replacing them with specific values can help to clean up and prepare your dataset for analysis.

To fill missing values, select a variable with missing values and tell Graphext how you would like to fill these values. You can choose from options like using a constant value, using the most or least frequently occurring value and using the column's minimum or maximum value. Look for the replaced variable in your transformed dataset.

How to Fill Missing Values?
  1. Select Fill Missing Values as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains your missing values.
  3. Choose how you want to replace your missing values.
  4. That's it. Look for the replaced variables in your transformed dataset.
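
The fill strategies described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Graphext's code: the function name `fill_missing` and the strategy labels are hypothetical, and missing values are assumed to be represented as `None`.

```python
from collections import Counter


def fill_missing(values, strategy, constant=None):
    """Return a copy of `values` with None replaced according to `strategy`."""
    present = [v for v in values if v is not None]
    if strategy == "constant":
        fill = constant
    elif strategy == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    elif strategy == "least_frequent":
        fill = Counter(present).most_common()[-1][0]
    elif strategy == "min":
        fill = min(present)
    elif strategy == "max":
        fill = max(present)
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


column = [3, None, 1, 1, 3, None, 2, 3]
print(fill_missing(column, "most_frequent"))
print(fill_missing(column, "constant", constant=0))
```

Every strategy except the constant one derives the replacement from the values that are present, so a column with no values at all can only be filled with a constant.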

Predict Missing Values

As well as filling missing values, you can train a model to predict missing values in your data. Once your project has been built, you will have a new variable containing the model's predictions.

How to Predict Missing Values?
  1. Select Predict Missing Values as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains your missing values.
  3. Choose how you want to replace your missing values.
  4. That's it. Look for the replaced variables in your transformed dataset.
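
To illustrate the difference between filling and predicting, the sketch below fits a model on the complete rows and uses it to estimate the gaps. Which model Graphext trains is not stated here, so this example assumes the simplest possible case: a least-squares line relating one numeric column `xs` to the column with gaps, `ys`. The function name `predict_missing` is hypothetical.

```python
def predict_missing(xs, ys):
    """Fit y = slope * x + intercept on rows where y is known,
    then predict y for the rows where it is None."""
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(known)
    mean_x = sum(x for x, _ in known) / n
    mean_y = sum(y for _, y in known) / n
    # Ordinary least-squares estimates for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x, _ in known)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in known)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [slope * x + intercept if y is None else y
            for x, y in zip(xs, ys)]


print(predict_missing([1, 2, 3, 4], [2, 4, None, 8]))
```

Unlike a constant or most-frequent fill, the predicted value differs from row to row because it depends on the other information in that row.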

Group Similar Spellings

This enrichment will group words with similar spellings. Simply put, the idea is to stop 'Graphext' and 'Graphex' from being treated as two separate entities.

Whether it comes from typos, misplaced punctuation or a missing letter or two, unintended variation in data is a common (and annoying) occurrence in text analysis. Setting a threshold for joining words will control the strength of the merges taking place.

Read more about how this enrichment works and the method we used to build it here.

How to Group Similar Spellings?
  1. Select Group Similar Spellings as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains the values you want to group.
  3. Set a threshold to control the strength of your word merges.
  4. That's it. Open the project and look out for the new merged variable.
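
The role of the threshold is easier to see with a small example. The sketch below is not the method Graphext uses; it assumes a simple character-level similarity from Python's standard `difflib` module, and the function name `group_similar_spellings` and the default threshold are hypothetical.

```python
from difflib import SequenceMatcher


def group_similar_spellings(values, threshold=0.85):
    """Map each value to the first earlier spelling it closely resembles."""
    canonical = []  # representative spellings seen so far
    mapping = {}
    for value in values:
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, value.lower(), c.lower()).ratio()
             >= threshold),
            None,
        )
        if match is None:
            # No close match: this value starts a new group.
            canonical.append(value)
            match = value
        mapping[value] = match
    return mapping


names = ["Graphext", "Graphex", "Tableau"]
print(group_similar_spellings(names, threshold=0.85))
print(group_similar_spellings(names, threshold=0.95))
```

In this sketch a higher threshold demands a closer match, so 'Graphex' merges into 'Graphext' at 0.85 but stays separate at 0.95.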

Group Similar Semantics

This enrichment will group words with similar meanings. This is a useful way of transforming your dataset so that words like 'table' and 'desk' can be associated and used collectively to filter your data.

Setting a threshold for grouping similar semantics will control the strength of the merges taking place.

How to Group Similar Semantics?
  1. Select Group Similar Semantics as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains the values you want to group.
  3. Specify the language of your text values.
  4. Set a threshold to control the strength of your word merges.
  5. That's it. Open the project and look out for the new merged variable.

Extract URL Components

When working with URLs in your data, it is often useful to extract new variables containing the domain, path and scheme (for example, https) of the URL. Using this enrichment, you can parse the URL values in your data and use their components to filter your data.

How to Extract URL Components?
  1. Select Extract URL Components as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains the URL values in your data.
  3. That's it. Open the project and look out for the new variables containing the components of your URL values.
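
As a rough picture of what the new variables contain, the sketch below parses a URL with Python's standard `urllib.parse` module. It is an illustration rather than Graphext's implementation; the function name `extract_url_components`, the output keys and the example URL are all hypothetical.

```python
from urllib.parse import urlsplit


def extract_url_components(url: str) -> dict:
    """Split a URL into the parts most often used for filtering."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # e.g. "https"
        "domain": parts.hostname,  # host name without any port
        "path": parts.path,        # everything between host and query string
    }


print(extract_url_components("https://www.example.com/docs/page?ref=1"))
```

Once the parts are separated, filtering by domain or path becomes a simple equality check instead of a text search across full URLs.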

Standardize Locations

Variation in the way that people write and record location data can make for a messy analysis. Much like our Group Similar Spellings enrichment, standardizing location data means grouping variations that refer to the same place but are written differently.

For instance, without this enrichment, 'Manchester' and 'Manchester, UK' would be treated as two separate places. This enrichment lets you merge the two values so that you can filter your data by location more accurately.

How to Standardize Locations?
  1. Select Standardize Locations as an enrichment using the Data Enrichment tab during your project setup.
  2. Tell Graphext which column contains the location values you want to group.
  3. Set a threshold to control the strength of your merges.
  4. That's it. Open the project and look out for the new merged variable.

Integrated Enrichments

Add Demographic Data for Spain Using Coordinates

Requires Google Geocoding API Key

Use latitude and longitude variables to enrich your data with census information. Geographies identifiable by coordinates in your dataset will be associated with Spanish location-based demographic data such as age, marital status and education level.

Analyze Text Sentiment - Google

Requires Google NLP API Key

Analyze the sentiment of text fields using the Google NLP API. This assigns positive or negative ratings to text in your data.

Analyze Text Sentiment - Meaningcloud

Requires Meaningcloud NLP API Key

Analyze the sentiment of text fields using the Meaningcloud NLP API. This assigns positive or negative ratings to text in your data.

Extract Text Topics

Requires Google NLP API Key

Identify the topic of text fields using the Google NLP API. This enrichment identifies the core theme of text values in your data.

Add Google Places Info

Requires Google Places API Key

Extract information about the most relevant places around a location using the Google Places API.
