In this tutorial, we will explore the features of the Graph, the panel inside projects that displays your data in a network visualization. To start working with the features of the Graph we need a dataset and we need to create a project using it. We will be working with the same Fictional-Employees.csv dataset we used in our Setting Up tutorials. We'll use this to create a project focused on clustering employees based on their characteristic scores.
Once we have our Graph ready to explore, you will learn how to select data inside of it using different methods. Additionally, you will learn how to change the appearance of the Graph to highlight features of the dataset using color mapping, size mapping and labels.
"The time to build a network is always before you need one."
The position of nodes on your Graph represents their relationship to one another. Strongly related nodes are pulled towards one another. The strength of this pull depends on the strength of the similarity. You define how nodes are linked when you set up a project.
Inside of the Graph, you can start selecting elements using the sidebars, the direct selection tool or by clicking on the node itself and then selecting its neighbours. The labels appearing over nodes and regions in the Graph can be changed to represent different variables in your data. Additionally, you can use color mapping or size mapping to change the appearance of your network depending on the values within a variable.
If you followed our Setting Up introductions and have already set up a project clustering the Fictional Employees dataset using their characteristics as factors, move on to step 2.
Otherwise, make sure you have the Fictional-Employees.csv file downloaded to your computer. Upload this to the workspace of your Personal team in Graphext and select it to bring up the project setup wizard on the right-hand side of the screen. For a more detailed tutorial on datasets, have a look at our Intro to Datasets tutorial.
Choose Employees as your analysis type and then Clustering. Then, inside of the Clusters and Network Creation menu, set the Performance level variable as your target and set the 10 characteristic variables as your factors.
Finally, inside the Network Visualisation menu, set Name as the variable to identify your nodes by and add Gender to the list of pinned variables. That's it. Continue to name your project something like Fictional Employees: Clustering and execute it. Head back to the Projects panel of your Graphext workspace and hold on for a couple of minutes whilst Graphext builds your project.
For a more detailed tutorial on setting up a project, follow our tutorial Intro to Projects.
Starting from the Projects panel of your Graphext workspace, find the Fictional Employees: Clustering project and open it. Your project will open up to display the Graph panel.
The first glimpse of your Graph is always an exciting moment. There is a lot to take in so take a moment to digest what you see and zoom in and out to get a feel for the way that nodes are grouped and related to one another.
You should see a collection of nodes linked to one another inside a network. Each node represents one employee and is related to 15 other employees based on the similarity of the scores they have been given for the 10 characteristics you used as factors.
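To make the idea of linking each employee to their most similar colleagues concrete, here is a minimal sketch of nearest-neighbour linking. It assumes similarity is measured as Euclidean distance between score vectors; the tutorial does not say which measure Graphext uses, and the function name `k_nearest_neighbours` and the toy scores are purely illustrative.

```python
import math


def k_nearest_neighbours(scores, k):
    """Link each node to the k nodes whose score vectors are closest to its own."""
    links = {}
    for name, vector in scores.items():
        # Distance to every other node (smaller distance = more similar).
        distances = [
            (math.dist(vector, other_vector), other)
            for other, other_vector in scores.items()
            if other != name
        ]
        distances.sort()
        links[name] = [other for _, other in distances[:k]]
    return links


# Toy data (made up): three characteristic scores per employee.
toy_scores = {
    "A": (1, 2, 1),
    "B": (2, 2, 1),
    "C": (9, 8, 9),
    "D": (8, 9, 9),
}

# The project in this tutorial links each node to 15 others; k=1 keeps the toy example readable.
links = k_nearest_neighbours(toy_scores, k=1)
print(links)
```

In this toy example A and B link to each other, as do C and D, which is the kind of grouping you see as clusters in the Graph.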
There are 5 clusters inside your Graph, which are currently represented by the color of the nodes. Clusters are groups of nodes that have been closely linked by the similarity of the values they have for each of your factor variables. Use the search bar at the top of your right sidebar to search for a variable called 'umap_cluster'. Here you can see that the colors in the chart correspond with the colors in your Graph. Click the three dots from the top right of the variable card and choose Pin column. This moves the variable chart to your left sidebar, where it is more accessible.
Let's start selecting some nodes inside the Graph to inspect them a little closer. There are a few different ways of selecting nodes. One of them is to use the variable filters inside your sidebars to select nodes that have specific features.
Select the green bar representing Cluster 3 from the Umap_Cluster card that you just moved to the left sidebar. You'll see that the Graph has zoomed in to focus on Cluster 3, leaving nodes in other clusters blurred.
Take a moment to browse the changes that have occurred in your sidebar variable charts. The size of the blue bars inside each chart has shortened. This is because these blue bars are now only representing the values inside the filter you just applied, whereas the grey bars represent the distribution of that value in your entire dataset.
From the top of your left sidebar, change the representation of data in your variable charts from Absolute to Relative. Relative representation displays your values proportionally to the selection you've made. The way that your variable sidebar charts vary between different groups of your data makes it easy to spot the characteristics of these groups.
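The difference between the two representations can be sketched in a few lines. This is an illustration only, assuming that Absolute shows raw counts and Relative shows each value's share of the group being charted; the function names and the toy ratings are hypothetical and not part of Graphext.

```python
from collections import Counter


def absolute_counts(values):
    """Raw number of rows for each value."""
    return dict(Counter(values))


def relative_shares(values):
    """Share of rows for each value, as a fraction of the group's own size."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Toy ratings (made up): the whole dataset and a filtered selection of it.
all_ratings = [1, 2, 2, 3, 3, 3, 9, 10]
selection = [1, 2, 2, 3]

print(absolute_counts(selection))    # counts within the selection
print(relative_shares(selection))    # shares within the selection
print(relative_shares(all_ratings))  # shares within the whole dataset
```

Comparing shares rather than counts is what makes a small selection directly comparable with the full dataset.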
Scroll through the characteristic variable charts inside your left sidebar. Notice that the values inside the Intelligence chart are heavily weighted on the lower end of the range. Not many employees in this cluster have a high intelligence rating. This could well be a significant reason why they were clustered together.
It seems like there is only one node belonging to Cluster 3 with an Intelligence rating of 10. Check this out by clicking on the bar inside the Intelligence chart representing the value of 10. Yep, it looks like Sarah is the only employee in this cluster with an Intelligence rating of 10. You can check this by referring to the count of nodes at the top of your right sidebar. The number should be 1, indicating that only one employee has been selected.
Let's try to investigate the employees that have been linked to Sarah and why. Click on the Clear button to reset your active filter. Now enter 'Sarah' into the search bar at the top of your Graph to find Sarah again. This looks up any nodes with a label matching the query you enter.
Now, click on Sarah's node directly in the Graph, then clear the search query. From the window that appears, click Select Neighbors. This should highlight the nodes that are connected to Sarah. If it doesn't, make sure you have no other filters active.
Use the sidebars to look for patterns in Sarah's community. Spot any trends? It looks like employees in Sarah's community have high ratings for Passion but relatively low ratings for Competitiveness.
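Conceptually, selecting a node's neighbours means collecting every node that shares a link with it. The sketch below assumes links are stored as a dictionary of outgoing connections and that the selection includes the clicked node itself; both are assumptions made for illustration, and `select_neighbours` is a hypothetical helper rather than a Graphext function.

```python
def select_neighbours(links, node):
    """Return the node plus every node linked to it in either direction."""
    selected = {node}
    # Nodes this node points to.
    selected.update(links.get(node, []))
    # Nodes that point to this node.
    for source, targets in links.items():
        if node in targets:
            selected.add(source)
    return selected


# Toy network (made up): each key lists the nodes it is linked to.
toy_links = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["D"],
    "D": ["C"],
    "E": ["A"],
}

community = select_neighbours(toy_links, "A")
print(sorted(community))
```

Here D is left out because it is only linked to C, not to A.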
The toolbar at the top of your Graph offers a range of tools to help you work with the Graph. The one at the bottom is the direct selection tool, which lets you drag your cursor over nodes in the Graph to select them. This can be a great way to isolate smaller clusters in your Graph and discover patterns in these sub-communities.
Clear all of your active filters so the full dataset is in focus. Now, click on the direct selection tool and find the small group of nodes on the far right of the network containing Lucy, Alex and Celia. Use the direct selection tool to select these three nodes at the same time. Got it?
Now, scroll up to the top of your right sidebar. Pay attention to the bars inside the Passion variable chart. Notice that the employees inside our direct selection all have very low ratings for the Passion characteristic. This kind of discovery is pretty easy: when you have a hunch about a small cluster, directly select the nodes inside of it and then inspect your variable charts.
Changing the appearance of your Graph using color and size mapping is a useful way to recognize data points that have specific features. Currently, our nodes are all the same size and colored by the Umap_Cluster variable. Let's play around with the color of them to see how this affects our ability to recognize the characteristics of groups within the Graph.
You can color map your Graph by clicking on the raindrop icon from the top right of a sidebar variable card. Find the Performance Level variable card from the top of your left sidebar. It could be interesting to discover whether we have some high or low performers clustered together in a sub-community.
Click the Performance Level raindrop icon and notice that the nodes inside your Graph have immediately changed color to represent their value for this variable. High performers are blue, low performers are orange and medium performers are green. You can immediately see some smaller clusters of low-performing employees on the fringes of your Graph. Mohammed and Sarah belong to a low-performing sub-community on the bottom left side of the Graph, while Maria and Cesar belong to a low-performing sub-community on the upper left side.
Performance Level is a categorical variable, meaning that the colors used in color mapping fall neatly into categories. Quantitative variables behave a little differently and present colors as a range. Let's explore this. Find Creativity in the right sidebar and click the raindrop icon. Notice that the colors used form a spectrum going from deep purple to bright yellow. It seems as though our least creative employees are grouped at the top left of the Graph.
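The two behaviours can be sketched as follows. This is an illustration under stated assumptions: categories are given one palette entry each, and quantitative values are placed on a straight-line blend between two RGB endpoints. The endpoint values are guesses at "deep purple" and "bright yellow", and none of the function names come from Graphext.

```python
def categorical_colours(values, palette):
    """Assign one palette colour to each distinct category."""
    categories = sorted(set(values))
    return {
        category: palette[index % len(palette)]
        for index, category in enumerate(categories)
    }


def gradient_colour(value, low, high, start=(68, 1, 84), end=(253, 231, 37)):
    """Blend linearly between two RGB colours according to where value sits in [low, high]."""
    # Assumed endpoints: a deep purple and a bright yellow.
    t = (value - low) / (high - low) if high != low else 0.0
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


# Categorical: each performance level gets its own colour.
levels = ["High", "Low", "Medium", "Low"]
level_colours = categorical_colours(levels, ["blue", "orange", "green"])
print(level_colours)

# Quantitative: a rating between 1 and 10 lands somewhere on the gradient.
print(gradient_colour(1, low=1, high=10))
print(gradient_colour(10, low=1, high=10))
```

The key difference is that a categorical mapping is a lookup, whereas a quantitative mapping is a calculation, which is why the second produces a continuous range of colors.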
Next to the raindrop icon representing color mapping on each of the variable charts is an icon representing size mapping. This is only available for quantitative variables and looks like 2 circles inside a larger circle.
Resizing nodes according to their values for a quantitative variable is a useful way to get a grasp of the values on either end of the quantitative range.
Let's use size mapping to get an idea of the employees with the highest and lowest values for Competitiveness. First, for the sake of clarity, reassign Umap_Cluster as the variable used to map colors on your Graph. Then, find the Competitiveness variable chart from your right sidebar and click on the size mapping icon.
Notice that the size of the nodes inside your Graph has changed to reflect their value for the Competitiveness variable. Larger nodes are employees with a high Competitiveness score, whilst smaller nodes represent employees with low Competitiveness scores.
You can see that both Cluster 3 and Cluster 5 have lots of smaller nodes in comparison to the other clusters in your project. Using size mapping we managed to get an immediate picture of how Competitiveness scores were used to cluster groups of employees.
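Size mapping can be thought of as rescaling a value into a range of node sizes. The sketch below assumes a simple linear scale between a minimum and maximum radius; the radius values and the `node_radius` name are hypothetical, and Graphext may scale sizes differently.

```python
def node_radius(value, low, high, min_radius=4.0, max_radius=20.0):
    """Scale a quantitative value linearly into a node radius."""
    if high == low:
        # Every node has the same value, so use a middle size for all of them.
        return (min_radius + max_radius) / 2
    t = (value - low) / (high - low)
    return min_radius + t * (max_radius - min_radius)


# Toy Competitiveness scores (made up) on a 1 to 10 scale.
toy_scores = {"A": 1, "B": 5.5, "C": 10}
radii = {name: node_radius(score, low=1, high=10) for name, score in toy_scores.items()}
print(radii)
```

Because the lowest and highest values map to the smallest and largest radii, the extremes of the range stand out first, which is what makes size mapping useful for spotting them.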
Right now, the labels in your Graph represent the name of employees. Whilst this is useful, by this stage we aren't very interested in who the employees are and are more interested in their characteristics.
There are two types of labels that you can place on your Graph: region labels and node labels. Region labels, as you might have guessed, mark areas of the Graph, whilst node labels mark individual nodes.
Let's change the labels that appear above nodes to represent the Description of employees. Select the project settings icon from the top of your Graph. Then under the Labels heading select the dropdown representing node labels. Here, you can choose another variable to use as your node labels. Scroll down the menu and select Description. Then, save your settings and return to the Graph.
As you can see, rather than the Name of employees, their Description is now being used to create labels.
There aren't any region labels in your Graph right now. Let's change that.
Region labels that you add to your Graph should be based on categorical variables. This is because regions visually cover groups of nodes and categorical variables are a way of defining those groups. In the 'Fictional Employees' dataset, there are now 3 categorical variables: Gender, Performance Level and Umap_Cluster.
Since the Graph's clusters are nicely defined, it makes sense to use Umap_Cluster to define our region labels. Select the project settings icon from the top of your Graph. Then, under the Labels heading you should see a dropdown menu for region labels. Click inside the dropdown and choose Umap_Cluster from the menu list. Save the settings and return to the Graph.
Sure, the region labels have appeared but the Graph looks a little confusing now. There are lots of node labels alongside region labels. Let's hide the node labels to make things simpler. Select the Tags icon from the top of your Graph. Then, drag the slider all the way down so that all of your node labels disappear. Zoom out and admire the clarity of your region labels.