This tutorial will show you how to upload your first dataset to Graphext, inspect it and then manage it within your Graphext workspace. We will be working with a dataset called 'Fictional-Employees', which contains information about the performance and characteristics of a set of 100 imaginary employees. Download the 'Fictional-Employees' dataset here.
"In God we trust. All others must bring data."
-W. Edwards Deming
Datasets provide the raw materials for you to analyze in Graphext. They are organised into rows and columns. Each row contains one data point that typically has a value for each column in your dataset. Columns are also known as variables and hold a specific type of information on each data point.
To work with datasets in Graphext, upload them to the workspace of a specific team. This can be your personal team or a workspace you share with other team members. Datasets that you upload will only be visible to Graphext users within that team.
'Fictional-Employees' is a CSV (comma-separated values) file of 101 lines. The first line contains the column, or variable, names. Each subsequent line represents one employee and has a value for each of the variables listed in the first line.
This dataset contains three types of variable. When you upload a dataset to Graphext, variable types are recognised automatically, which lets you perform different actions on different types of variable. For more on this, see our article on variable types.
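If you want to confirm this structure on your own computer before uploading, the sketch below reads a CSV with Python's standard `csv` module, treats the first line as the header and checks that every data row has exactly one value per column. The `read_table` helper and the two sample rows are hypothetical; they are not taken from the real file.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_table(stream: TextIO) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read a CSV whose first line holds the column names.

    Raises ValueError if a data row has a different number of values
    than the header (a missing or an extra value).
    """
    reader = csv.reader(stream)
    header = next(reader)
    rows = []
    for index, values in enumerate(reader, start=1):
        if len(values) != len(header):
            raise ValueError(
                f"data row {index}: expected {len(header)} values, got {len(values)}"
            )
        rows.append(dict(zip(header, values)))
    return header, rows


# Hypothetical two-employee sample using three of the columns.
sample = io.StringIO(
    "Name,Gender,Passion\n"
    "Ana Ruiz,Female,8\n"
    "Ben Cole,Male,5\n"
)
header, rows = read_table(sample)
print(len(rows), "rows x", len(header), "columns")  # 2 rows x 3 columns
```

With the real file you would pass `open("Fictional-Employees.csv", newline="")` instead of the sample, and you would expect 100 rows and 14 columns.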
Categorical Variables: 'Gender', 'Performance Level'
Quantitative Variables: 'Coachability', 'Curiosity', 'Work Ethic', 'Intelligence', 'Prior Success', 'Passion', 'Preparation', 'Adaptability to Change', 'Competitiveness', 'Creativity'
Text Variables: 'Name', 'Description'
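This article does not describe the rules Graphext uses to detect variable types. As a rough illustration only, here is one simple heuristic, under assumed rules: a column is quantitative if every value parses as a number, categorical if it has relatively few distinct values, and text otherwise. The `infer_type` function and its 0.5 threshold are hypothetical.

```python
from typing import List


def infer_type(values: List[str], max_unique_ratio: float = 0.5) -> str:
    """Guess a variable type from a column's raw string values.

    Hypothetical rule, not Graphext's algorithm:
      - quantitative: every non-empty value parses as a float
      - categorical:  distinct values are at most max_unique_ratio of the values
      - text:         anything else
    """
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "text"
    try:
        for v in non_empty:
            float(v)
        return "quantitative"
    except ValueError:
        pass
    unique_ratio = len(set(non_empty)) / len(non_empty)
    return "categorical" if unique_ratio <= max_unique_ratio else "text"


print(infer_type(["7", "9", "4", "8"]))                           # quantitative
print(infer_type(["Male", "Female", "Female", "Male"]))            # categorical
print(infer_type(["Ana Ruiz", "Ben Cole", "Cy Lee", "Di Park"]))   # text
```

A distinct-value rule like this can misfire: in a small dataset, short free-text values that repeat often look categorical, so it is always worth checking the detected types by hand.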
Make sure you have downloaded Fictional-Employees.csv or have another dataset ready to upload. Although we are working with a CSV file here, you can upload a range of different dataset file types to Graphext. For more information on this see our article on supported file types.
Start from the datasets panel of your personal team's workspace and select 'New Dataset' from the top right of the screen.
This will open a window prompting you to add a dataset. Since you are uploading 'Fictional-Employees.csv' from your computer, select 'Browse File'.
Next, locate the file inside your computer's folders and click 'Open'. This will automatically close the window and the dataset will be uploaded to your personal team's workspace.
Now that you've uploaded your first dataset to your personal team in Graphext, it will be available for you to inspect and explore from the datasets panel of your workspace. Select the name of the dataset, 'Fictional Employees', from the datasets panel.
This brings up a table with the columns of your dataset displayed at the top of the table and the values for each row displayed underneath each column name. Scroll up and down this table to explore every value in your dataset.
Notice that the table also includes a colored bar underneath your column or variable names. This bar represents the likelihood that all of the values in a column are listed with the correct variable type.
Additionally, underneath the colored bar for each variable there is a menu displaying the type of variable represented in that column. Graphext automatically configures your variable types when you upload a new dataset.
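The article does not say how Graphext calculates the likelihood shown by the colored bar under each column name. As a hypothetical stand-in, one simple measure is the share of a column's values that are valid for its assigned type, so that a full bar would mean every value fits. The `type_fit` helper and the per-type checks below are assumptions for illustration.

```python
from typing import Callable, Dict, List


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


# Hypothetical validity checks per type; only "quantitative" is strict here.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "quantitative": _is_number,
    "categorical": lambda v: v.strip() != "",
    "text": lambda v: v.strip() != "",
}


def type_fit(values: List[str], variable_type: str) -> float:
    """Fraction of values (0.0 to 1.0) that are valid for variable_type."""
    if not values:
        return 1.0
    check = VALIDATORS[variable_type]
    return sum(check(v) for v in values) / len(values)


print(type_fit(["7", "9", "n/a", "8"], "quantitative"))  # 0.75
```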
Lastly, before you move on to work with your data, it is important to check the number of rows and columns that are present. This information is displayed at the top of the datasets panel. 'Fictional Employees' has 100 rows and 14 columns.
Still working inside of 'Fictional Employees' within the datasets panel of your personal team, inspect the type of each variable. Spot anything strange? Description is listed as a categorical variable. This is incorrect since the Description variable holds text values.
Graphext has classified this variable as categorical because our dataset is relatively small and the employee descriptions do not contain much information. But descriptions aren't categorical; they are text values! We can change this.
Select the variable type dropdown for the Description variable. From the menu, click 'Text'. Graphext will now treat values in the Description column as text values rather than categorical values. Ensuring that a variable's type correctly represents the values it holds is important, because the type of a variable controls the way you can filter it inside a project.
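To see why the type matters for filtering, here is a hypothetical sketch in which the filter applied to a column depends on its type: categorical columns are matched by exact value, text columns by a word contained in the value, and quantitative columns by a numeric range. These rules and the `filter_rows` name are assumptions for illustration, not Graphext's actual filter behaviour.

```python
from typing import Dict, List, Tuple, Union

Row = Dict[str, str]


def filter_rows(
    rows: List[Row],
    column: str,
    variable_type: str,
    criterion: Union[str, Tuple[float, float]],
) -> List[Row]:
    """Filter rows with a rule chosen by the column's type (hypothetical rules)."""
    if variable_type == "categorical":
        # Exact match against one category.
        return [r for r in rows if r[column] == criterion]
    if variable_type == "text":
        # Case-insensitive search for a whole word inside the free text.
        word = str(criterion).lower()
        return [r for r in rows if word in r[column].lower().split()]
    if variable_type == "quantitative":
        # Inclusive numeric range (low, high).
        low, high = criterion
        return [r for r in rows if low <= float(r[column]) <= high]
    raise ValueError(f"unknown type: {variable_type}")


# Hypothetical rows; Description is set to "text", as after changing its type.
types = {"Description": "text", "Gender": "categorical", "Passion": "quantitative"}
rows = [
    {"Description": "Creative and curious analyst", "Gender": "Female", "Passion": "8"},
    {"Description": "Reliable team lead", "Gender": "Male", "Passion": "5"},
]
creative = filter_rows(rows, "Description", types["Description"], "creative")
print(len(creative))  # 1
```

Treated as categorical, the same search for `"creative"` would return no rows, because no description is exactly that one word.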
When you're inspecting a dataset with lots of variables, you want to keep things in a logical order. The most important variables should be placed where you can see them easily.
In 'Fictional Employees', the 'Performance Level' variable is arguably the most important column, but in our table it's difficult to see because it is the last column.
You can move columns to different positions inside of the table by clicking and dragging the 8 dots on the left of the column name.
Move 'Performance Level' so it's next to the 'Name' variable. This gives us a better idea of exactly who is performing well.
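Outside Graphext, the same reordering is a simple list operation. The sketch below moves one column name so that it sits directly after another; `move_after` is a hypothetical helper, and the header is a shortened, assumed version of the 'Fictional-Employees' columns with 'Performance Level' last.

```python
from typing import List


def move_after(columns: List[str], column: str, anchor: str) -> List[str]:
    """Return a new column order with `column` placed right after `anchor`."""
    if column not in columns:
        raise ValueError(f"unknown column: {column}")
    if column == anchor:
        return list(columns)
    reordered = [c for c in columns if c != column]
    # index() raises ValueError if the anchor column does not exist.
    reordered.insert(reordered.index(anchor) + 1, column)
    return reordered


# Hypothetical, shortened header with 'Performance Level' last.
header = ["Name", "Gender", "Passion", "Performance Level"]
print(move_after(header, "Performance Level", "Name"))
# ['Name', 'Performance Level', 'Gender', 'Passion']
```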
The three dots next to each column name let you sort your data by the values in a variable. This changes the order in which rows are displayed. It's often useful to be aware of the extreme values in a dataset so that we can account for them during our analysis and quickly spot or correct any errors.
Find out who the most passionate employees in our dataset are. First, drag the 'Passion' column to the left of the table so you can see it at the same time as the 'Name' column.
Next, click the three dots on the right of the variable name and select 'Desc'. Now your data is ordered with the most passionate employees at the top of your table and the least passionate at the bottom.
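Sorting with 'Desc' orders rows from the largest value to the smallest. A minimal equivalent in Python, assuming the rows were loaded as dictionaries of strings (the names and scores below are hypothetical), looks like this:

```python
from typing import Dict, List

Row = Dict[str, str]


def sort_desc(rows: List[Row], column: str) -> List[Row]:
    """Return rows ordered from highest to lowest numeric value in `column`."""
    # Values read from a CSV are strings, so convert them before comparing;
    # comparing the raw strings would rank "8" above "10".
    return sorted(rows, key=lambda r: float(r[column]), reverse=True)


rows = [
    {"Name": "Ana Ruiz", "Passion": "8"},
    {"Name": "Ben Cole", "Passion": "10"},
    {"Name": "Cy Lee", "Passion": "3"},
]
for row in sort_desc(rows, "Passion"):
    print(row["Name"], row["Passion"])
# Ben Cole 10
# Ana Ruiz 8
# Cy Lee 3
```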
We know that data isn't always clean and simple.