You can build prediction models in Graphext. A model uses existing features (factors) of your data to predict the value of a target variable. It does this by analyzing the relationships between the factors and the target variable in your data. What the model learns about these relationships forms the basis of its predictions: given the factor values of a data point, it predicts a value for the target variable.
"Prediction, not narration, is the real test of our understanding of the world."
- Nassim Nicholas Taleb
Factors are the features of your data you want to use to calculate relationships to your target variable.
Targets are the variables that your model will make predictions about. Their relationships to factors define how a model calculates a prediction.
When you build a model in Graphext, you will be asked to specify which variables in your data you wish to use as factors and which variables you wish to set as targets. When you execute your project, Graphext will begin training the model to make predictions on your target variable. It does this by analyzing the existing relationships between your factors and your target variable.
By analyzing these relationships, the model estimates how the target value is likely to change when a factor value changes. In this way it forms an understanding of how important each factor is in determining target values.
When making predictions on new target values, the model uses its understanding of the way that factors affect target values to predict what the target value will be given a set of factor values. The target value given by a prediction can only be a product of the existing relationships between factors and targets in your data. This is why both the quantity and the quality of your data matter for predictive models.
Training here means using data to learn which values of the model's parameters generate predictions that best match known labels (i.e. the values of the target variable).
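To make the idea of training concrete, here is a minimal sketch in plain Python. It is not how Graphext trains its models; it assumes a single numeric factor and a straight-line model, and the names fit_line and predict as well as the sample values are hypothetical. The two parameters (slope and intercept) are chosen so that predictions match the known target values as closely as possible.

```python
def fit_line(factor_values, target_values):
    """Learn a slope and intercept by ordinary least squares."""
    n = len(factor_values)
    mean_x = sum(factor_values) / n
    mean_y = sum(target_values) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(factor_values, target_values))
    variance = sum((x - mean_x) ** 2 for x in factor_values)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(params, factor_value):
    """Apply the learned parameters to a new factor value."""
    slope, intercept = params
    return slope * factor_value + intercept


# Hypothetical labelled rows: one factor (size) and one target (price).
sizes = [30.0, 45.0, 60.0, 80.0]
prices = [90.0, 135.0, 180.0, 240.0]

params = fit_line(sizes, prices)    # "training": parameters learned from the labels
estimate = predict(params, 50.0)    # prediction for a row the model has not seen
```

Real models have many more parameters and factors, but the principle is the same: parameters are adjusted until predictions agree with the labelled rows.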
During training, the model analyzes the value of each factor in relation to the target value for each data point. Moving through the data, it attempts to learn how a change in the value of a factor affects the value of the target.
In Graphext, we use all labelled rows in your dataset to train the model. The whole story is somewhat more complex, however. If we did indeed use all data in the training process, we would create a model that is very good at predicting known data, but we would have no way to check whether it can also predict data it hasn't seen before.
For that reason, the training procedure can be configured to use part of the labelled data to train the model and to reserve the rest for evaluating the model's performance on new data. These parts are usually referred to as the train and test splits of the data. The performance of the model on the test split(s) provides an estimate of the performance you can expect when using the model to predict new data later on.
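The following sketch illustrates a single train/test split in plain Python. It is an illustration under stated assumptions, not Graphext's implementation: the function names, the 25% test fraction, the fixed random seed and the synthetic data are all hypothetical. The model is fitted on the train split only, and its error is then measured on the held-back test split.

```python
import random


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold back a fraction as the test split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Least-squares slope and intercept for (factor, target) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return slope, mean_y - slope * mean_x


def mean_absolute_error(params, rows):
    """Average absolute gap between predicted and known target values."""
    slope, intercept = params
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


# Hypothetical labelled data: target is roughly 2 * factor + 1, with small noise.
rows = [(float(x), 2.0 * x + 1.0 + ((x % 3) - 1) * 0.2) for x in range(20)]

train_rows, test_rows = split_rows(rows)
params = fit_line(train_rows)                         # learn from the train split only
test_error = mean_absolute_error(params, test_rows)   # estimate of error on new data
```

Because the test rows played no part in fitting, test_error is a fairer estimate of how the model will behave on new data than the error measured on the rows it was trained on.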
You can train models on target columns that have missing values. A model will use the relationships between factors and existing target values to make predictions about unknown target values.
Use Case Example
You have two datasets with identical column names and variable types, but only one of them contains values for your target variable. Zip these files together and upload them to Graphext. You can then predict the unknown values of the target field.
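As a rough illustration of filling in missing target values, the sketch below trains on the rows that have a target and predicts the rest. It assumes a very simple nearest-neighbour rule rather than whatever model Graphext would train, and the row layout, function name and values are hypothetical.

```python
def nearest_neighbour_predict(labelled_rows, factors):
    """Return the target of the labelled row whose factors are closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    closest = min(labelled_rows,
                  key=lambda row: squared_distance(row["factors"], factors))
    return closest["target"]


# Hypothetical combined dataset: the last two rows have no target value.
rows = [
    {"factors": (1.0, 1.0), "target": "low"},
    {"factors": (1.2, 0.8), "target": "low"},
    {"factors": (5.0, 5.1), "target": "high"},
    {"factors": (4.8, 5.3), "target": "high"},
    {"factors": (1.1, 0.9), "target": None},
    {"factors": (5.2, 4.9), "target": None},
]

# Only rows with a known target are used to make predictions.
labelled = [row for row in rows if row["target"] is not None]

# Keep known targets as they are and fill in the unknown ones.
filled = [
    row if row["target"] is not None
    else dict(row, target=nearest_neighbour_predict(labelled, row["factors"]))
    for row in rows
]
```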
Models you train will be stored within your Graphext workspace. After training a model on a dataset with some target values, you can reuse this model to make new predictions on other datasets with identical column names and variable types.
Use Case Example
You have a dataset of accommodation reviews from last month and took the time to annotate it with a 'category' field whose values capture the main focus of each review. After training a model on this annotated data, you can reuse it in future months to predict the 'category' of new reviews.
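Since reuse depends on the new dataset having the same column names and variable types, a quick compatibility check before reusing a model can save a failed run. The sketch below is an assumption about how such a check might look in plain Python; it is not part of Graphext, and schema_of, is_compatible and the sample rows are hypothetical.

```python
def schema_of(rows):
    """Map each column name to the type name of its value in the first row."""
    return {name: type(value).__name__ for name, value in rows[0].items()}


def is_compatible(training_schema, new_rows, target):
    """True if every factor column of the training data reappears with the same type."""
    new_schema = schema_of(new_rows)
    return all(
        new_schema.get(name) == kind
        for name, kind in training_schema.items()
        if name != target   # the target may be absent in data to be predicted
    )


# Hypothetical annotated reviews from last month (used for training).
last_month = [{"review": "Great pool", "stars": 5, "category": "facilities"}]
training_schema = schema_of(last_month)

# New reviews without a category, and a dataset whose 'stars' column changed type.
this_month = [{"review": "Rude staff", "stars": 1}]
mismatched = [{"review": "Rude staff", "stars": "one"}]

can_reuse = is_compatible(training_schema, this_month, target="category")
cannot_reuse = is_compatible(training_schema, mismatched, target="category")
```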
Even if you already have complete values in your target field, you can use a model to better understand how each factor is related to the target variable. The model will tell you more about which features of your dataset matter most in determining the value of your target variable.
Use Case Example
You have a dataset featuring employees alongside their performance level and a set of characteristics. To find out which characteristics are most strongly related to good performance, you can use a prediction model to analyze the strength of the relationships between your employees' characteristics and their performance.
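One simple way to get a feel for the strength of such relationships is to rank factors by the absolute value of their correlation with the target. The sketch below does this in plain Python. It is a simplified stand-in, not the importance measure a Graphext model reports, and the column names and values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical employee characteristics (factors) and performance scores (target).
characteristics = {
    "training_hours": [5, 10, 15, 20, 25, 30],
    "commute_minutes": [40, 10, 35, 20, 15, 30],
}
performance = [52, 58, 67, 71, 80, 85]

# Strength of the linear relationship between each factor and the target.
strength = {
    name: abs(pearson(values, performance))
    for name, values in characteristics.items()
}

# Factors ordered from most to least strongly related to performance.
ranked = sorted(strength, key=strength.get, reverse=True)
```

Correlation only captures linear, one-factor-at-a-time relationships; a trained model can also pick up non-linear effects and interactions between factors.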
Start building a model using the project setup wizard. First, choose a dataset to work with and decide which variables to use as factors and which to set as a target. Then select 'Models' as your type of analysis.
You can reuse a model you have trained on new datasets with identical column names and variable types. When you build a model, it is stored in your Graphext workspace, making it accessible in other projects that you create within the same team.
To reuse a model, set the name of the model you want to use in the advanced editor. You can find out which models are available to you by inspecting the autofill menu that appears when you start typing in the "model" parameter field.