Setting Up

This tutorial will show you how to upload your first dataset to Graphext, inspect it and then manage it within your Graphext workspace. We will be working with a dataset called 'Fictional-Employees', which contains information about the performance and characteristics of a set of 100 imaginary employees. Download the 'Fictional-Employees' dataset here.

"In God we trust. All others must bring data."

-W. Edwards Deming

Datasets provide the raw materials for you to analyze in Graphext. They are organised into rows and columns. Each row contains one data point that typically has a value for each column in your dataset. Columns are also known as variables and hold a specific type of information on each data point.

To work with datasets in Graphext, upload them to the workspace of a specific team. This can be your personal team or a workspace you share with other team members. Datasets that you upload will only be visible to Graphext users within that team.

About the Data

'Fictional-Employees' is a CSV file of 101 lines, with the values on each line separated by commas. The first line contains the column (variable) names. Each subsequent line represents one employee, so the file holds 100 employee rows. Every row has a value for each of the variables listed in the first line of the file.
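If you want to confirm this structure before uploading, the Python sketch below reads a CSV with the standard `csv` module. It treats the first line as the header and checks that every data row has exactly one value per column. The function name `check_csv_shape` is our own illustration and is not part of Graphext.

```python
import csv


def check_csv_shape(path):
    """Return (data_rows, columns) for a CSV file whose first line is a header.

    Raises ValueError if a data row has a different number of fields than the
    header, i.e. if some row is missing a value for a column (or has extras).
    """
    # "utf-8-sig" also strips a byte-order mark that some spreadsheet
    # programs add to the start of exported CSV files.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # first line: column (variable) names
        if header is None:
            raise ValueError("file is empty")
        n_rows = 0
        for record_no, row in enumerate(reader, start=2):
            if not row:
                continue  # ignore completely blank lines
            if len(row) != len(header):
                raise ValueError(
                    f"record {record_no}: expected {len(header)} fields, got {len(row)}"
                )
            n_rows += 1
    return n_rows, len(header)
```

Assuming the downloaded file is in your working directory, `check_csv_shape("Fictional-Employees.csv")` should return `(100, 14)`, matching the row and column counts Graphext reports for this dataset.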

There are different types of variable in this dataset. When you upload a dataset to Graphext, variable types are recognised automatically, which lets you perform different actions on different types of variable. For more on this, see our article on variable types.

Categorical Variables: 'Gender', 'Performance Level'

Quantitative Variables: 'Coachability', 'Curiosity', 'Work Ethic', 'Intelligence', 'Prior Success', 'Passion', 'Preparation', 'Adaptability to Change', 'Competitiveness', 'Creativity'

Text Variables: 'Name', 'Description'

Download the dataset here.

CODE: https://gist.github.com/andyclarkemedia/0ef95e5536beb80e309cd9a6d4ea3c88.js

Step 1. Upload

Make sure you have downloaded Fictional-Employees.csv or have another dataset ready to upload. Although we are working with a CSV file here, you can upload a range of different dataset file types to Graphext. For more information on this, see our article on supported file types.

Start from the datasets panel of your personal team's workspace and select new dataset from the top right of the screen.

This will open a window prompting you to add a dataset. Since you are uploading 'Fictional-Employees.csv' from your computer, select 'Browse File'.

Next, locate the file inside your computer's folders and click 'Open'. This will automatically close the window and the dataset will be uploaded to your personal team's workspace.

Things to Consider

  • Graphext sets a limit of 300MB for datasets. Before you upload a dataset, check its file size. If it's larger than this limit, remove some variables to reduce the size of the file.
  • You can combine multiple datasets as you upload them to Graphext. For more information on this, see our article on loading datasets.
  • Larger datasets will take longer to upload and process. 'Fictional-Employees.csv' is a small dataset and will upload very quickly.
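Before uploading a large file, you could check its size locally and, if needed, drop variables you don't need. The Python sketch below uses only the standard library. It rests on two assumptions: the 300MB limit is read as 300 × 1,000,000 bytes (the stricter of the two common meanings of MB; Graphext's exact accounting may differ), and the helper names `within_size_limit` and `drop_columns` are hypothetical.

```python
import csv
import os

# Assumption: "300MB" read as 300 * 1,000,000 bytes, the stricter reading.
SIZE_LIMIT_BYTES = 300 * 1_000_000


def within_size_limit(path, limit=SIZE_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


def drop_columns(src, dst, columns_to_drop):
    """Copy the headed CSV `src` to `dst`, leaving out the named columns."""
    with open(src, newline="", encoding="utf-8-sig") as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        keep = [c for c in reader.fieldnames if c not in columns_to_drop]
        # extrasaction="ignore" skips the dropped keys in each row dict.
        writer = csv.DictWriter(fout, fieldnames=keep, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

For example, if `within_size_limit("large-export.csv")` returns False, `drop_columns("large-export.csv", "trimmed.csv", {"Description"})` writes a copy without that column. Both filenames here are placeholders.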

Step 2. Inspect

Now that you've uploaded your first dataset to your personal team in Graphext, it is available for you to inspect and explore from the datasets panel of your workspace. Select the name of the dataset, 'Fictional Employees', from inside the datasets panel.

This opens a table with your dataset's column names across the top and each row's values listed underneath them. Scroll up and down the table to explore every value in your dataset.

Variable Types

Notice that the table includes a colored bar underneath each column (variable) name. This bar represents the likelihood that the values in that column match the variable type assigned to it.

Underneath the colored bar, a menu displays the type of variable represented in the column. Graphext configures your variable types automatically when you upload a new dataset.

Row and Column Count

Lastly, before you move on to work with your data, it is important to check the number of rows and columns present. This information is displayed at the top of the datasets panel. 'Fictional Employees' has 100 rows and 14 columns.

Step 3. Manage

Changing Types

Still working in 'Fictional Employees' within the datasets panel of your personal team, inspect the type of each variable. Spot anything strange? Description is listed as a categorical variable. This is incorrect, since the Description variable holds text values.

Graphext has calculated that this variable is categorical because our dataset is relatively small and the employee descriptions do not contain much information. But descriptions aren't categorical; they are text values! We can change this.

Select the variable type dropdown for the Description variable. From the menu list, click Text. Graphext will now treat the values in the Description column as text rather than categorical values. The bar will now turn red, but only because the text values in our Description column are short. Making sure a variable's type correctly represents its values is important, because the type of a variable controls how you can filter it inside a project.

Reordering

When you're inspecting a dataset with lots of variables, it helps to keep them in a logical order. The most important variables should be on the far left of the dataset table so that you can see them! In 'Fictional Employees', 'Performance Level' is arguably the most important column, but it is difficult to see in our table because it is the last column.

You can move columns to different positions in the table by clicking and dragging the 8 dots to the left of the column name. Move 'Performance Level' so that it sits next to the 'Name' variable. This gives us a better idea of exactly who is performing well.
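If you would rather fix the column order in the file itself before uploading, the Python sketch below rewrites a CSV so that one column sits immediately after another. The function name `move_column_after` is hypothetical. The sketch assumes the header spells the columns exactly as written in this tutorial ('Name', 'Performance Level'); adjust the names to match your file.

```python
import csv


def move_column_after(src, dst, column, anchor):
    """Rewrite the headed CSV `src` to `dst` with `column` placed right after `anchor`."""
    with open(src, newline="", encoding="utf-8-sig") as fin:
        reader = csv.DictReader(fin)
        fields = list(reader.fieldnames or [])
        for name in (column, anchor):
            if name not in fields:
                raise ValueError(f"column not found: {name!r}")
        # Remove the column, then re-insert it directly after the anchor.
        order = [c for c in fields if c != column]
        order.insert(order.index(anchor) + 1, column)
        rows = list(reader)
    with open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=order)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `move_column_after("Fictional-Employees.csv", "Fictional-Employees-reordered.csv", "Performance Level", "Name")` would write a reordered copy; the output filename is a placeholder.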

Need Something Different?

We know that data isn't always clean and simple.
Have a look through these topics if you can't see what you are looking for.