Technical Docs | Models

You can build prediction models in Graphext. A model uses existing features (factors) of your data to predict the value of a target variable. It does this by analyzing the existing relationships between the factors and the target variable; what it learns about these relationships forms the basis of its predictions. Given the factor values of a data point, the model predicts a value for the target variable.


"Prediction, not narration, is the real test of our understanding of the world."

- Nassim Nicholas Taleb


Factors are the features of your data you want to use to calculate relationships to your target variable.

Targets are the variables that your model will make predictions about. Their relationships to factors define how a model calculates a prediction.



What Happens When I Build a Model?

When you build a model in Graphext, you will be asked to specify which variables in your data you wish to use as factors and which variables you wish to set as targets. When you execute your project, Graphext will begin training the model to make predictions on your target variable. It does this by analyzing the existing relationships between your factors and your target variable.

By analyzing these relationships the model is able to calculate the probability of a target value changing if a factor value changes. In this way it forms an understanding of the importance of each factor when considering target values.

When making predictions, the model uses its understanding of how factors affect target values to predict what the target value will be for a given set of factor values. A predicted target value can only be a product of the relationships between factors and targets that already exist in your data. This is why both the quantity and the quality of your data matter for predictive models.
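To make this concrete, here is a minimal sketch in plain Python. It is not how Graphext trains its models; it simply counts how often each target value occurs for each factor value and predicts the most frequent one. The data, the column names ("plan", "churned") and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one factor ("plan") and one target ("churned").
rows = [
    {"plan": "free", "churned": "yes"},
    {"plan": "free", "churned": "yes"},
    {"plan": "free", "churned": "no"},
    {"plan": "pro", "churned": "no"},
    {"plan": "pro", "churned": "no"},
    {"plan": "pro", "churned": "yes"},
    {"plan": "pro", "churned": "no"},
]

def learn_relationships(rows, factor, target):
    """Count how often each target value occurs for each factor value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[factor]][row[target]] += 1
    return counts

def target_probabilities(counts, factor_value):
    """Turn the counts for one factor value into estimated probabilities."""
    seen = counts.get(factor_value)
    if not seen:
        return {}  # the data holds no relationship for this factor value
    total = sum(seen.values())
    return {value: n / total for value, n in seen.items()}

def predict(counts, factor_value):
    """Predict the most probable target value, or None if nothing was learned."""
    probs = target_probabilities(counts, factor_value)
    return max(probs, key=probs.get) if probs else None

counts = learn_relationships(rows, "plan", "churned")
free_probs = target_probabilities(counts, "free")
pro_prediction = predict(counts, "pro")
unseen_prediction = predict(counts, "enterprise")
```

The last line shows the limitation described above: for a factor value that never appears in the data ("enterprise"), this toy model has nothing to base a prediction on.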



Train | Test

Training here means using data to learn which values of the model's parameters generate predictions that best match known labels (i.e. the values of the target variable).

During training, the model analyzes the value of each factor in relation to the target value for each data point. Moving through the data, it attempts to learn how a change in the value of a factor affects the value of the target.

In Graphext, we use all labelled rows in your dataset to train the model. The whole story is somewhat more complex, however. If we did indeed use all data in the training process, we would create a model that is very good at predicting known data, but we wouldn't have any way to ensure it can also predict data it hasn't seen before.

For that reason, the training procedure can be configured to use parts of the labelled data to train the model and to reserve other parts to evaluate the model's performance on new data. These are usually referred to as train and test splits of the data. The performance of the model on the test split(s) provides an estimate of the performance you can expect when using the model to predict new data later on.
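The idea of a train and test split can be sketched in a few lines of plain Python. This is an illustration only, assuming a single numeric factor, a simple least-squares line as the model and a 25% test split; none of these choices reflect Graphext's actual configuration, and all names are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the labelled rows and reserve a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(train):
    """Least-squares fit of target = slope * factor + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, rows):
    """Average absolute difference between predicted and known target values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)

# Hypothetical labelled data: (factor, target) pairs, target = 2 * factor + 1 plus noise.
noise = random.Random(42)
labelled = [(x, 2 * x + 1 + noise.uniform(-0.5, 0.5)) for x in range(40)]

train, test = train_test_split(labelled)
model = fit_line(train)                         # parameters learned from the train split only
test_error = mean_absolute_error(model, test)   # estimate of performance on unseen data
```

Because the test rows play no part in fitting, `test_error` is an estimate of how the model would behave on data it has not seen before.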



Why Create a Model?

Predicting Missing Values

You can train models on target columns that have missing values. A model will use the relationships between factors and existing target values to make predictions about unknown target values.
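As a rough illustration of the idea, the sketch below fills in missing target values with a 1-nearest-neighbour rule: each row without a target takes the target of the most similar row that has one. This technique is chosen here only for brevity and is not a description of the model Graphext trains; the listing data and column names are hypothetical.

```python
def fill_missing_targets(rows, factors, target):
    """Return copies of rows where missing targets are predicted from the nearest known row."""
    known = [r for r in rows if r[target] is not None]
    if not known:
        raise ValueError("need at least one row with a known target value")

    def distance(a, b):
        # Squared Euclidean distance over the numeric factor columns.
        return sum((a[f] - b[f]) ** 2 for f in factors)

    filled = []
    for row in rows:
        new_row = dict(row)
        if row[target] is None:
            nearest = min(known, key=lambda k: distance(k, row))
            new_row[target] = nearest[target]
            new_row["predicted"] = True
        else:
            new_row["predicted"] = False
        filled.append(new_row)
    return filled

# Hypothetical dataset: the last two rows have no target value yet.
listings = [
    {"rooms": 1, "size_m2": 35, "price_band": "low"},
    {"rooms": 2, "size_m2": 60, "price_band": "medium"},
    {"rooms": 4, "size_m2": 120, "price_band": "high"},
    {"rooms": 1, "size_m2": 40, "price_band": None},
    {"rooms": 3, "size_m2": 110, "price_band": None},
]
completed = fill_missing_targets(listings, ["rooms", "size_m2"], "price_band")
```

Rows that already had a target value are left unchanged and only the unknown values are predicted.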


Use Case Example

You have two datasets with identical column names and variable types, but only one of them contains values for your target variable. Zip these files together and upload them to Graphext. You can then predict the unknown values of the target field.
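The text does not spell out how the two files end up as one table. As one hedged illustration, assuming you want to stack the rows of both files yourself before uploading, the following sketch checks that the column names match and concatenates two CSV texts. The sample data and the function name `stack_csv` are hypothetical.

```python
import csv
import io

# Hypothetical inputs: the second file has the target column but no values in it.
labelled_csv = "rooms,size_m2,price_band\n1,35,low\n2,60,medium\n"
unlabelled_csv = "rooms,size_m2,price_band\n1,40,\n3,110,\n"

def stack_csv(first_text, second_text):
    """Concatenate the rows of two CSV texts that share identical column names."""
    first = csv.DictReader(io.StringIO(first_text))
    second = csv.DictReader(io.StringIO(second_text))
    if first.fieldnames != second.fieldnames:
        raise ValueError("column names differ between the two files")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=first.fieldnames, lineterminator="\n")
    writer.writeheader()
    for reader in (first, second):
        writer.writerows(reader)
    return out.getvalue()

combined = stack_csv(labelled_csv, unlabelled_csv)
```

In the combined table, the rows from the second file keep an empty target cell, which is what a model would then be asked to predict.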


Train a Model to Reuse Later

Models you train will be stored within your Graphext workspace. After training a model on a dataset with some target values, you can reuse this model to make new predictions on other datasets with identical column names and variable types.
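Since reuse depends on identical column names and variable types, it can help to compare the two datasets before relying on an old model. The sketch below is a hypothetical pre-check written in plain Python, not a Graphext feature; it uses Python type names as a stand-in for Graphext's variable types and ignores empty (None) cells.

```python
def column_schema(rows):
    """Map each column name to the set of Python type names seen in it (None ignored)."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            types = schema.setdefault(name, set())
            if value is not None:
                types.add(type(value).__name__)
    return schema

def schema_problems(training_rows, new_rows):
    """List the differences that would prevent reusing a model on new_rows."""
    trained = column_schema(training_rows)
    new = column_schema(new_rows)
    problems = []
    for name in sorted(trained.keys() | new.keys()):
        if name not in new:
            problems.append(f"missing column: {name}")
        elif name not in trained:
            problems.append(f"unexpected column: {name}")
        elif trained[name] and new[name] and trained[name] != new[name]:
            problems.append(f"type mismatch: {name}")
    return problems

# Hypothetical datasets: the training data and two candidate datasets for reuse.
march_reviews = [
    {"text": "Great pool", "stars": 5, "category": "facilities"},
    {"text": "Rude staff", "stars": 1, "category": "service"},
]
april_reviews = [{"text": "Tiny room", "stars": 2, "category": None}]
april_broken = [{"review": "Tiny room", "stars": "2", "category": None}]

ok = schema_problems(march_reviews, april_reviews)
bad = schema_problems(march_reviews, april_broken)
```

An empty list means the new dataset has the same columns and compatible types; any entry in the list points at a column to fix before reusing the model.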


Use Case Example

You have a dataset of accommodation reviews from last month and took the time to annotate it with a 'category' field whose values capture the main focus of each review. After training a model on this dataset, you can reuse it in future months to predict the 'category' of new reviews.


Understand Relationships in Your Data

Even if your target field already has complete values, you can use a model to better understand how each factor is related to the target variable. The model's predictions tell you which features of your dataset are most important in determining the value of your target variable.


Use Case Example

You have a dataset featuring employees alongside their performance level and a set of characteristics. To find out which characteristics are most strongly related to good performance, you can use a prediction model to analyse the strength of the relationships between your employees' characteristics and their performance.
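A very simple stand-in for "strength of relationship" is the Pearson correlation between each factor and the target. The sketch below ranks factors this way; it only captures linear association and is not the importance measure Graphext's models report. The employee data and column names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / (sd_x * sd_y)

def rank_factors(rows, factors, target):
    """Sort factors by the absolute correlation with the target, strongest first."""
    ys = [r[target] for r in rows]
    scores = {f: abs(pearson([r[f] for r in rows], ys)) for f in factors}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical employee data with two characteristics and a performance score.
employees = [
    {"training_hours": 5, "commute_km": 12, "performance": 2},
    {"training_hours": 10, "commute_km": 3, "performance": 3},
    {"training_hours": 20, "commute_km": 25, "performance": 4},
    {"training_hours": 30, "commute_km": 8, "performance": 5},
    {"training_hours": 40, "commute_km": 15, "performance": 5},
]
ranking = rank_factors(employees, ["training_hours", "commute_km"], "performance")
```

In this toy data, "training_hours" ranks well above "commute_km", which is the kind of comparison between factors that the use case above is after.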



Creating a Model

Start building a model using the project setup wizard. First, choose a dataset to work with and decide which variables to use as factors and which to set as a target. Then select 'Models' as your type of analysis.


How to Create a Model?

  1. Start from the 'Datasets' panel of your Graphext workspace.
  2. Select a dataset to start working with it.
  3. Pick 'Models' as your type of analysis from the left sidebar.
  4. Select the option to 'Train and predict' a model.
  5. Inside the 'Clusters and Network Creation' card that appears, start by adding a target.
  6. To add a target, select the variable from the list on the right side of the 'Clusters and Network Creation' card.
  7. Click 'Send Here' on the first box under the 'Target' column.
  8. Next, add a factor by selecting a variable from the list on the right side of the 'Clusters and Network Creation' card.
  9. Click 'Send Here' on the first box under the 'Factors' column.
  10. Add more factors by repeating steps 8 and 9.
  11. Review the other cards in the project setup sidebar before selecting 'Continue'.
  12. Done ... Your model will make predictions on your target variable once you have executed your project.



Reusing a Model

You can reuse a model you have trained on new datasets with identical column names and variable types. When you build a model, it is stored in your Graphext workspace, which makes it accessible in other projects that you create within the same team.

To reuse a model, you need to set the name of the model you want to use in the advanced editor. You can see which models are available to you in the autofill menu that appears when you start typing in the "model" parameter field.


How to Reuse a Model?

  1. Start from the 'Datasets' panel of your Graphext workspace.
  2. Select a dataset to start working with it. This must have identical column names and variable types to the dataset you used to train your model.
  3. Pick 'Models' as your type of analysis from the left sidebar.
  4. Select the option to 'Train and predict' a model.
  5. Inside the 'Clusters and Network Creation' card that appears, add your variables as factors and targets. These should be the same as the factors and targets you used to train your model.
  6. Select 'Open Advanced Editor'.
  7. Inside the advanced editor, find the 'train' step.
  8. Directly underneath, start typing 'predict'.
  9. Click the 'predict' step from the autofill menu.
  10. The first parameter currently reads 'data'. Copy and paste the first parameter from the 'train' step in place of the first parameter of the 'predict' step. Both should now include a reference to your targets and factors.
  11. Inside the second parameter of the 'predict' step, enter the name of your model.
  12. Rename the output of your model so that it reads 'ds.gx_prediction' rather than 'ds.predicted'.
  13. Delete the 'train' step.
  14. Done ... Click continue to make predictions on your new dataset using your old model.


Alternatively, you can reach the same result by deleting the 'train' step first:

  1. Start from the 'Datasets' panel of your Graphext workspace.
  2. Select a dataset to start working with it. This must have identical column names and variable types to the dataset you used to train your model.
  3. Pick 'Models' as your type of analysis from the left sidebar.
  4. Select the option to 'Train and predict' a model.
  5. Inside the 'Clusters and Network Creation' card that appears, add your variables as factors and targets. These should be the same as the factors and targets you used to train your model.
  6. Select 'Open Advanced Editor'.
  7. Inside the advanced editor, delete the 'train' step.
  8. Start typing 'predict' in the place where you deleted the 'train' step and select the 'predict' option from the autofill menu that appears.
  9. Inside the 'predict' step, replace 'data' with 'ds'.
  10. Inside the 'predict' step, delete the 'model' parameter.
  11. Type an opening " in place of the 'model' parameter.
  12. Choose your model from the autofill list. Make sure to add another " to close the name of the model.
  13. Inside the 'predict' step, replace 'ds.predicted' with 'ds.gx_prediction'.
  14. Done ... Click continue to make predictions on your new dataset using your old model.

