Setting Up

This tutorial will walk you through the process of creating your first Graphext project. Projects form the basis of analyzing your datasets in Graphext. We will be working with a dataset called 'Fictional-Employees.csv', which contains information about the performance and characteristics of a set of 100 imaginary employees. Our project will attempt to group similar employees based on the scores they have for the set of 10 characteristics in the dataset. Download the dataset here.

"The goal is to turn data into information, and information into insight."

- Carly Fiorina

The aim of a project is to turn your raw data into useful insights. When you create a project in Graphext, you choose a set of steps to process your data. Picking these steps using the setup wizard builds an algorithm that ultimately determines how your data is transformed and represented within your project.

To build a project, you first need to choose a type of analysis to perform. There are 12 types of analysis in Graphext, each intended to transform different types of data in different ways. For instance, if you are working with news reports and want to analyse their text, 'Text' analysis would be a better choice than 'Geospatial' analysis.

Once you've chosen a type of analysis, you will begin to build steps to transform your data. Within each type of analysis is a set of options prompting you to be more specific about the kind of project you want to build. Every choice you make from this point on will process or transform your data in some way.

Adding a step customises template code built into Graphext, allowing you to perform complex analytics on your dataset without doing the heavy lifting.

Step 1. Selecting a Dataset

Pick a Dataset

First, make sure you have access to the dataset Fictional-Employees.csv. Upload it to your personal team in Graphext if you haven't done so already and select it from the datasets panel of your workspace. This will bring up your project setup wizard on the right-hand side of your screen where you can see the available types of analysis.

Filtering and Sampling

In the top right of the project setup sidebar, you should see a summary of the number of rows in your dataset alongside an icon featuring 4 horizontal bars. This icon gives you the ability to filter or sample your data before you start analyzing it. For bigger datasets, this is a useful way of making your analysis more efficient. Since 'Fictional Employees' only contains 100 rows of data, we don't need to sample it. You should be working with 100% of the dataset.
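To make the idea of sampling concrete, the sketch below keeps a random fraction of a dataset's rows. It is an illustration only and says nothing about how Graphext samples internally; the helper name `sample_rows` and the generated stand-in rows are assumptions made for this example.

```python
import random


def sample_rows(rows, fraction, seed=0):
    """Return a random sample holding `fraction` of the rows (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if fraction == 1:
        return list(rows)  # 100% of the dataset: nothing to sample
    size = max(1, round(len(rows) * fraction))
    # A seeded generator keeps the sample reproducible between runs.
    return random.Random(seed).sample(rows, size)


# Stand-in for the 100 rows of the tutorial dataset (made-up values).
employees = [{"id": i} for i in range(100)]

full = sample_rows(employees, 1.0)
half = sample_rows(employees, 0.5)
print(len(full), len(half))
```

With only 100 rows, working with the full dataset costs nothing, which is why the tutorial keeps 100% of it.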

Step 2. Choosing a Type of Analysis

Take a look through the types of analysis that are in the setup wizard. It's important to choose one that matches the kind of dataset you are working with. We are working with data on employees, so it makes sense to choose the 'Employees' option.

Remember the aim of our project is to group employees based on the similarity of their characteristic scores.

After selecting 'Employees' you will be presented with two options prompting you to be more specific about what kind of project you want to build. We can either choose to 'Cluster' our employees or 'Aggregate and cluster' them. Since 'Fictional Employees' only contains data on 100 employees it doesn't make sense to aggregate them. We could aggregate them by their gender but then we'd only have two rows in our data ... let's avoid that for now.
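The effect of aggregating can be sketched in a few lines of Python. The example below is a rough illustration, not Graphext's aggregation step: the column names `gender` and `score`, the sample rows and the helper `aggregate_by` are all made up for the purpose.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows; 'gender' and 'score' are illustrative column names.
employees = [
    {"name": "A", "gender": "F", "score": 7},
    {"name": "B", "gender": "M", "score": 5},
    {"name": "C", "gender": "F", "score": 9},
    {"name": "D", "gender": "M", "score": 6},
]


def aggregate_by(rows, key, value):
    """Collapse rows into one row per distinct `key`, averaging `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [{key: k, "mean_" + value: mean(v)} for k, v in groups.items()]


aggregated = aggregate_by(employees, "gender", "score")
print(len(employees), "rows before,", len(aggregated), "rows after")
```

However many employees go in, only one row per distinct value comes out, which leaves too little to cluster.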

Choose 'Cluster'.

Choose 'Employees' - 'Cluster' as your type of analysis.

Step 3. Customizing Steps

By choosing to cluster your dataset of employees, you've started to build some steps towards transforming your data. Next, you'll customize the steps involved in creating your project. This will help Graphext to transform aspects of 'Fictional Employees', understand how to calculate your clusters and represent them in your Graph.

Inside the project setup wizard, you should see three sections: 'Data Enrichment', 'Cluster and Network Creation' and 'Network Visualization'. Completing these forms tells Graphext how to customize the steps involved in building your project. Let's walk through them.

Data Enrichment

Enriching your data means adding more information to your dataset based on the values that are already present. For more information on how you can do this, see our article on data enrichment.

'Fictional Employees' is quite a simple dataset and our text variables contain a relatively small amount of text. For this reason, we won't use any enrichment, so leave the 'Data Enrichment' dropdown closed.

Cluster and Network Creation

Now it's time to customize how Graphext will group your employees. Inside of the 'Cluster and Network Creation' section, you should add the variables you want to use as factors in calculating your network alongside a target variable.

Factors are variables that will be considered when creating links in your data.

Targets are variables that are key performance indicators. They are the variables that you want to gain a deeper understanding of.

Adding Factors

For our imaginary employees, we have a set of scores for 10 characteristic variables. These will be our factors used to create links between our employees. Add them to the list of factors by checking their boxes and sending them to the 'Factor' column.
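One common way to turn factor scores into links is to connect each data point to its nearest neighbours. The sketch below shows that idea under stated assumptions: Euclidean distance, a made-up helper called `nearest_neighbour_links`, and invented names and scores. It is not a description of the algorithm Graphext actually runs.

```python
import math


def nearest_neighbour_links(scores, k):
    """Link each employee to the k employees with the closest factor scores.

    `scores` maps a name to a list of factor scores. Euclidean distance is
    an assumption of this sketch, not a statement about Graphext.
    """
    links = set()
    for name, vec in scores.items():
        others = sorted(
            (math.dist(vec, other_vec), other)
            for other, other_vec in scores.items()
            if other != name
        )
        for _, other in others[:k]:
            links.add(tuple(sorted((name, other))))  # store as undirected link
    return links


# Made-up scores on three factors (the tutorial dataset has 10).
scores = {
    "Ana": [9, 8, 7],
    "Ben": [9, 7, 7],
    "Cas": [2, 3, 1],
    "Dee": [1, 3, 2],
}
links = nearest_neighbour_links(scores, k=1)
print(sorted(links))
```

Employees with similar scores end up linked, and groups of densely linked employees are what a clustering step can then pick out.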

Adding a Target

We can choose to set a target variable or leave it blank. Is there a variable in 'Fictional Employees' that it might be useful to gain a deeper understanding of in relation to our employees' characteristic scores? Potentially the 'Performance level' variable. Add this variable as a target. This way, we might be able to gain an understanding of how the performance level of our employees varies between groups that share similar strengths and weaknesses.
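As a rough picture of what comparing a target across groups means, the sketch below counts target values within each group. The group labels, the values 'High', 'Medium' and 'Low', and the helper `target_by_group` are assumptions for illustration; the real dataset's values and Graphext's own summaries may differ.

```python
from collections import Counter, defaultdict

# Made-up rows: a group label plus a target value for each employee.
rows = [
    {"group": 0, "performance_level": "High"},
    {"group": 0, "performance_level": "High"},
    {"group": 0, "performance_level": "Medium"},
    {"group": 1, "performance_level": "Low"},
    {"group": 1, "performance_level": "Medium"},
]


def target_by_group(rows, group_key, target_key):
    """Count how often each target value occurs within each group."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_key]][row[target_key]] += 1
    return counts


summary = target_by_group(rows, "group", "performance_level")
for group, counter in sorted(summary.items()):
    print(group, dict(counter))
```

A group whose counts lean heavily towards one value suggests that the shared characteristics of that group go hand in hand with that level of the target.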

Network and Clusters

Now that you've added factors and a target to your network, a new section will appear in the setup wizard sidebar called 'Networks and Clusters'. This section holds the settings of your network. You can customize the number of links each data point will have and control the more complex aspects of your project. We won't do this in this tutorial, so leave this section closed.

Network Visualization

Open up the 'Network Visualization' section. Here you can customize the way your data is represented inside of your project's Graph and sidebar panels.

To start with, inside the first dropdown menu, set the 'Name' of employees as the variable you want to identify your employees by. This way the names of employees will appear as labels above the nodes on your Graph, making it easy to identify which data point relates to which employee.

Next, add some more variables to view as pinned in your sidebars. These variables will appear in your project's left sidebar, making them more easily accessible than other variables. Choose the most important variables in your dataset. On top of 'Performance level', which is already pinned, 'Gender' could be important, so add this to the pinned list using the dropdown menu.

Step 4. Executing the Project

When you've finished pinning your variables, you are ready to execute your project. By completing the forms in step 3, you've told Graphext how to calculate your clusters and also added a few customized touches to your network visualisation. Executing the project will put these steps into action.

Click 'Continue' from the bottom of your project setup wizard. Now all that's left to do is name your project and set the wheels in motion. Name your project something like 'Clustering Fictional Employees' and select 'Execute'.

Graphext will now build your project. Your project setup wizard will close and you will be able to see the project being created inside the workspace of your personal team. This should only take a minute or so, but check the text inside the project card to see how much time is left until your project is ready.

When Graphext has built your project, the text inside the project card will read 'Created by you' alongside information about your team and how long ago you created the project. You can click on the card to explore your first project in Graphext!
