As part of building a recipe to analyse your data, you can enrich your data with information and functions built into Graphext. Enriching your data gives Graphext more information to work with when it builds your recipe.
This can improve the accuracy of your predictions, refine your clusters and create useful additional variables based on the existing content of your data.
There are two ways of enriching your data in Graphext: using our built-in enrichments or using enrichments that we've integrated with the platform.
Built-in Enrichments are created by our team of data scientists and engineers and don't require any API integrations. You can simply add the enrichment to your project setup, customize it and execute it as part of building the project.
Integrated Enrichments use technology created by organisations other than Graphext and may require you to supply an API key, which you obtain outside Graphext. They contain powerful algorithms that can be very useful in data science projects and, with a few additional steps, can add extra value to your analysis.
To extract additional information from your dataset using data enrichment, start building a recipe using the project setup wizard.
Integrated enrichment options require you to provide a unique API key.
Use a date variable to create new columns containing precise information about the month, week and day of a given date value.
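As a rough illustration of what this kind of enrichment produces, the sketch below derives month, week and day columns from a date using only the Python standard library. The function name `date_parts`, the `signup` column and the choice of ISO week numbering are assumptions made for the example, not a description of how Graphext computes these columns.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def date_parts(value: date) -> dict:
    """Split a date into month, ISO week number, day and weekday name."""
    _, iso_week, iso_weekday = value.isocalendar()
    return {
        "month": value.month,
        "week": iso_week,                      # ISO 8601 week number (assumed convention)
        "day": value.day,
        "weekday": WEEKDAYS[iso_weekday - 1],  # ISO weekday: 1 = Monday
    }

# Hypothetical rows with a single date column called "signup".
rows = [{"signup": date(2021, 3, 15)}, {"signup": date(2021, 12, 31)}]

# Add one new column per extracted component, prefixed with the source column name.
enriched = [
    {**row, **{f"signup_{name}": part for name, part in date_parts(row["signup"]).items()}}
    for row in rows
]
```

Each row keeps its original `signup` value and gains `signup_month`, `signup_week`, `signup_day` and `signup_weekday`.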
Use the first names of people in your data to make predictions about their gender. This enrichment uses Graphext's own prediction algorithm.
Missing values can be annoying, misleading and disruptive. Replacing them with specific values can help to clean up and prepare your dataset for analysis.
To fill missing values, select a variable with missing values and tell Graphext how you would like to fill these values. You can choose from options like using a constant value, using the most or least frequently occurring value and using the column's minimum or maximum value. Look for the replaced variable in your transformed dataset.
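The fill options described above can be sketched in plain Python as follows. This is a minimal illustration, assuming missing values are represented as `None`; the function name `fill_missing` and the strategy names are hypothetical and do not correspond to Graphext's own option names.

```python
from collections import Counter

def fill_missing(values, strategy="constant", constant=None):
    """Return a copy of `values` with None entries replaced according to `strategy`."""
    present = [v for v in values if v is not None]
    if strategy == "constant":
        fill = constant
    elif not present:
        raise ValueError("cannot derive a fill value from a column with no values")
    elif strategy == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    elif strategy == "least_frequent":
        fill = Counter(present).most_common()[-1][0]
    elif strategy == "min":
        fill = min(present)
    elif strategy == "max":
        fill = max(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [30, None, 25, 30, 30, 41, 41, None]
filled_with_mode = fill_missing(ages, "most_frequent")
filled_with_zero = fill_missing(ages, "constant", constant=0)
```

The original list is left unchanged, which mirrors the idea of looking for a separate, replaced variable in the transformed dataset.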
As well as filling missing values, you can train a model to predict missing values in your data. Once your project has been built, you will have a new variable containing the model's predictions.
This enrichment will group words with similar spellings. Simply put, the idea is to stop 'Graphext' and 'Graphex' from being treated as two separate entities.
Whether it be typos, misplaced punctuation or a missing letter or two, unintended variation in data is a common - and annoying - occurrence in text analysis. Setting a threshold for joining words will control the strength of the merges taking place.
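To make the role of the threshold concrete, here is a small sketch that groups near-identical spellings using `difflib.SequenceMatcher` from the Python standard library. The similarity measure, the greedy grouping and the function name `group_similar_spellings` are assumptions chosen for illustration; Graphext's own matching method is not specified here. In this sketch, a lower threshold merges more aggressively.

```python
from difflib import SequenceMatcher

def group_similar_spellings(words, threshold=0.9):
    """Map each word to the first earlier spelling whose similarity is >= threshold."""
    canonical = []  # representative spellings, in order of first appearance
    mapping = {}
    for word in words:
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, word.lower(), c.lower()).ratio() >= threshold),
            None,
        )
        if match is None:
            # No close enough spelling seen yet: this word starts its own group.
            canonical.append(word)
            match = word
        mapping[word] = match
    return mapping

names = ["Graphext", "Graphex", "Graphite"]
strict = group_similar_spellings(names, threshold=0.9)
loose = group_similar_spellings(names, threshold=0.7)
```

With the stricter threshold only 'Graphex' is merged into 'Graphext'; with the looser one 'Graphite' is merged as well, which shows why the threshold needs tuning for each dataset.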
This enrichment will group words with similar meanings. This is a useful way of transforming your dataset so that words like 'table' and 'desk' can be associated and used collectively to filter your data.
Setting a threshold for grouping similar semantics will control the strength of the merges taking place.
When working with URLs in your data, it is often useful to extract new variables containing the domain, path and scheme of the URL. Using this enrichment you can parse the URL values in your data and use the components of a URL to filter your data.
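For reference, the same kind of decomposition can be sketched with `urllib.parse.urlsplit` from the Python standard library. The function name `url_components`, the output keys and the example URL are illustrative assumptions, not the column names Graphext generates.

```python
from urllib.parse import urlsplit

def url_components(url: str) -> dict:
    """Split a URL into its scheme (e.g. 'https'), domain and path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "domain": parts.hostname,  # host name without port or credentials
        "path": parts.path,
    }

# Illustrative URL; the query string is ignored by this sketch.
components = url_components("https://www.example.com/docs/enrichments?ref=home")
```

Once split this way, each component can be used as its own variable, for example to filter rows by domain.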
Variation in the way that people write and record location data can make for a messy analysis. Similar to the way that our Group Similar Spellings enrichment works, standardizing location data means grouping variations that refer to the same place but are spelt differently.
For instance, without this enrichment, 'Manchester' and 'Manchester, UK' would be treated as two separate places. The enrichment is designed to merge these two values so that you can filter your data by location more accurately.
Use latitude and longitude variables to enrich your data with census information. Geographies identifiable by coordinates in your dataset will be associated with Spanish location-based demographic data such as age, marital status and education level.
Analyze the sentiment of text fields using the Google NLP API. This assigns positive or negative ratings to text in your data.
Analyze the sentiment of text fields using the Meaningcloud NLP API. This assigns positive or negative ratings to text in your data.
Identify the topic of text fields using the Google NLP API. This enrichment identifies the core theme of text values in your data.
Extract information about the most relevant places around a location using the Google Places API.