Start filtering your data by interacting with the sidebar charts that represent your variables.
Filters affect what data is shown in your Graph, Trends and Details panels. Filtering is a useful way of zooming in on aspects of your data and offers a free-flowing way to investigate details behind specific segmentations.
Depending on the type of analysis you are doing, Graphext will often extract text from your dataset. Extracted terms appear as a variable in the left or right sidebar with a list of the words and phrases presented in order of frequency under the variable title.
Text filters can be used to create segmentations of your data featuring specific words or phrases. This might be useful if you are working with a dataset of customer reviews and want to focus on reviews that mention your infamous "pumpkin spiced latte".
You can reorder extracted terms in your variable list using the up-down arrow icon. Search for a specific term if a variable list is really long.
Quantitative variables represent data expressing a certain numerical quantity, amount or range. Height and weight are good examples of quantitative variables. Applying a quantitative filter will restrict the data shown to the data points whose value for that variable falls within your filter range.
Drag your mouse inside the variable filter chart to create a filter range. Pull the boundaries left or right to adjust them. Click on the number above and below a filter range to explicitly set the filter's upper and lower boundaries respectively.
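As a rough illustration of what a quantitative filter does to your data, here is a minimal Python sketch. It is not Graphext's implementation; the rows, the `height_cm` variable and the `range_filter` helper are hypothetical, and treating both boundaries as inclusive is an assumption.

```python
# Hypothetical rows; "height_cm" stands in for any quantitative variable.
rows = [
    {"name": "Ana", "height_cm": 158},
    {"name": "Ben", "height_cm": 172},
    {"name": "Cho", "height_cm": 181},
    {"name": "Dev", "height_cm": 195},
]


def range_filter(rows, variable, lower, upper):
    """Keep rows whose value lies within [lower, upper].

    Both boundaries are treated as inclusive (an assumption).
    """
    return [row for row in rows if lower <= row[variable] <= upper]


# Equivalent to dragging a range from 170 to 185 on the chart.
selection = range_filter(rows, "height_cm", 170, 185)
print([row["name"] for row in selection])  # ['Ben', 'Cho']
```

Adjusting `lower` or `upper` corresponds to pulling a boundary of the filter range left or right.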
Categorical variables represent types of data that are divided into groups. Nationality is a good example of categorical data. Filtering categorical data will select all data points belonging to a certain group. This allows for a closer examination of the data in that group.
Categorical variables are represented in bar charts within your project's sidebars. You can switch this display to view them as a list. To include more than one category in your filter, hold down shift whilst selecting.
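Conceptually, a categorical filter with several categories selected keeps every data point that belongs to any of the chosen groups. The Python sketch below illustrates that idea only; the data and the `category_filter` helper are hypothetical and do not reflect how Graphext works internally.

```python
# Hypothetical rows; "nationality" stands in for any categorical variable.
people = [
    {"name": "Ana", "nationality": "Spanish"},
    {"name": "Ben", "nationality": "British"},
    {"name": "Cho", "nationality": "Korean"},
    {"name": "Dev", "nationality": "British"},
]


def category_filter(rows, variable, chosen):
    """Keep rows whose category is one of the chosen groups."""
    chosen = set(chosen)
    return [row for row in rows if row[variable] in chosen]


# Selecting one category, then holding shift to add a second one.
one_group = category_filter(people, "nationality", ["British"])
two_groups = category_filter(people, "nationality", ["British", "Korean"])
print([row["name"] for row in two_groups])  # ['Ben', 'Cho', 'Dev']
```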
So you have created a filter and now you want to save the data inside of it as a segmentation?
Saving filters creates new segmentations of your data. These groups become a new variable which you can use to recreate your filter, find trends within its groups or to create new, more detailed segmentations.
Segmentations are a useful way of organising your data into important groups.
You can delete segmentations that you've created using the more options dropdown menu belonging to each variable chart.
So you have created a filter but the data inside is useless?
Deleting selections will remove unwanted data points from your project. It's a two-step process which recreates your project without the data you have deleted.
After making a selection using filtering, you can automatically cluster the data that lies within the selection. This involves breaking your selection up into smaller and more precise groups. This can be useful in identifying specific patterns within sub-communities and is a good way of inspecting your selection in greater detail.
When editing your segmentation, use the magic wand icon to reconfigure the strength of connections between data points in your newly clustered selection.
When you are working with quantitative filters, you control the range of data displayed in your project. Sometimes the values in a quantitative variable are bunched together, or you might want to pick out a smaller range and examine it in greater detail. In either case, you can zoom in on small value ranges using the dropdown menu on the quantitative variable's sidebar chart.
Clicking Zoom In whilst you have an active quantitative selection means that Graphext will create a new variable containing only the range of values inside your selection. Inside your zoomed-in variable, the values are spread across more bars, so you can inspect them with greater precision.
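To see why zooming in gives more precision, consider a chart with a fixed number of bars. The Python sketch below is an illustration under assumptions: the `histogram` helper, the equal-width bars and the sample values are all hypothetical, and Graphext's actual binning may differ.

```python
def histogram(values, bars):
    """Count values in `bars` equal-width bins spanning min..max of `values`."""
    low, high = min(values), max(values)
    width = (high - low) / bars or 1  # fall back to 1 if all values are equal
    counts = [0] * bars
    for value in values:
        # Clamp so the maximum value lands in the last bar.
        index = min(int((value - low) / width), bars - 1)
        counts[index] += 1
    return counts


# Hypothetical variable: most values are bunched between 1 and 5.
values = [1, 2, 2, 3, 3, 3, 4, 5, 100]

full = histogram(values, 5)
zoomed = histogram([v for v in values if 1 <= v <= 5], 5)

print(full)    # [8, 0, 0, 0, 1]
print(zoomed)  # [1, 2, 3, 1, 1]
```

In the full chart, one outlier forces eight values into a single bar. After zooming in on the range 1 to 5, the same five bars cover a much narrower range, so the shape of the bunched values becomes visible.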
When you have lots of options in your variable sidebar charts, filtering can become confusing. If there are hundreds or thousands of text or categorical filters to choose from, it can be difficult to find the value you need.
You can sort the filter options in your variable sidebar card so that more relevant options are presented first. There are four ways of sorting the categories or text filter values belonging to a variable: by everything, by selection, by uplift and by TF-IDF. To sort filter options, select the up and down arrow icon from the top right of a variable card.
Sorting by everything means that the categories or text values appearing at the top of your list or chart will be the ones most frequently appearing within your entire dataset. This order will remain consistent despite any selections that you make.
Sorting by selection means that the categories or text values appearing at the top of your list or chart will be the ones most frequently appearing within your selection. The order of the list will update dynamically for any new selections that you make.
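The difference between these two orderings can be sketched in a few lines of Python. This is an illustration only; the data and the `sort_by_frequency` helper are hypothetical, and breaking ties alphabetically is an assumption.

```python
from collections import Counter

# Hypothetical data: one category value per row of the whole dataset.
everything = ["coffee"] * 4 + ["tea"] * 3 + ["latte"] * 2 + ["mocha"]
# Hypothetical selection: the rows left after applying some filter.
selection = ["latte", "latte", "tea", "coffee"]


def sort_by_frequency(categories, counted_rows):
    """Order categories by how often they occur in `counted_rows`.

    Ties are broken alphabetically (an assumption).
    """
    counts = Counter(counted_rows)
    return sorted(categories, key=lambda c: (-counts[c], c))


categories = sorted(set(everything))
by_everything = sort_by_frequency(categories, everything)
by_selection = sort_by_frequency(categories, selection)

print(by_everything)  # ['coffee', 'tea', 'latte', 'mocha']
print(by_selection)   # ['latte', 'coffee', 'tea', 'mocha']
```

The first ordering depends only on the whole dataset, so it never changes. The second is recomputed from whichever rows are currently selected.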
TF-IDF, or term frequency-inverse document frequency, is a method of sorting your values that is intended to reflect the importance of a single category or text value in relation to the entire set of values belonging to a variable. Sorting by TF-IDF means that categories will appear at the top of your list if they are found more often in your selection than would be expected from their occurrence in the whole dataset. The more over-represented a category is in your selection, the higher up it will appear in your list.
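The exact weighting Graphext uses is not described here, so the following Python sketch uses one common textbook formulation as an assumption: term frequency is a category's share of the selection, and inverse document frequency is the logarithm of the dataset size divided by the category's count in the whole dataset. The data and the `tf_idf_scores` helper are hypothetical.

```python
import math
from collections import Counter

# Hypothetical data: one category value per row.
everything = ["coffee"] * 4 + ["tea"] * 3 + ["latte"] * 2 + ["mocha"]
# The selection must be a subset of the rows in `everything`.
selection = ["latte", "latte", "tea", "coffee"]


def tf_idf_scores(everything, selection):
    """Score each selected category with a textbook TF-IDF variant."""
    total = Counter(everything)
    selected = Counter(selection)
    scores = {}
    for category, count in selected.items():
        tf = count / len(selection)                        # share of the selection
        idf = math.log(len(everything) / total[category])  # rarer overall, larger weight
        scores[category] = tf * idf
    return scores


scores = tf_idf_scores(everything, selection)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['latte', 'tea', 'coffee']
```

Here "latte" ranks first because it makes up half of the selection but only a fifth of the whole dataset.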
Uplift measures the percentage change in frequency of a category in your selection relative to its frequency in the whole dataset. Sorting by uplift means that the values in your variable list or chart will be presented in order of that change, with the largest increase first. Values that don't appear often in the entire dataset but are frequent in your selection will be presented first.
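As a worked example of the definition above, the Python sketch below computes uplift as the percentage change between a category's share of the selection and its share of the whole dataset. The data and the `uplift` helper are hypothetical, and details such as rounding or the handling of very rare categories are assumptions rather than Graphext's documented behaviour.

```python
from collections import Counter

# Hypothetical data: one category value per row.
everything = ["coffee"] * 4 + ["tea"] * 3 + ["latte"] * 2 + ["mocha"]
selection = ["latte", "latte", "tea", "coffee"]


def uplift(everything, selection):
    """Percentage change of each category's share, selection vs. whole dataset."""
    total = Counter(everything)
    selected = Counter(selection)
    result = {}
    for category, count in total.items():
        share_all = count / len(everything)
        share_selected = selected[category] / len(selection)
        result[category] = (share_selected / share_all - 1) * 100
    return result


scores = uplift(everything, selection)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['latte', 'tea', 'coffee', 'mocha']
```

In this example "latte" has an uplift of 150%: it accounts for 20% of the whole dataset but 50% of the selection. "mocha" is absent from the selection, so its uplift is -100%.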
You can discard categories from your variable to reduce the number of filter options available to you. This can be particularly useful when you are sorting filter options using TF-IDF or uplift.
When you discard filter options, Graphext will ask you to set a threshold to limit the number of filters presented in your variable sidebar chart or list. Filters that appear in fewer nodes than your threshold will not be presented. For instance, you can specify that your filter should only contain values that appear in 10% or more of your data points. This results in your filter list being reduced to present only the values that meet this criterion.
Discarding from everything means that the threshold you set will match up against all values in your dataset regardless of any filters that you apply.
Discarding from your selection means that the threshold you set will update dynamically depending on the number of values in any new selection you make.
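The two discard modes can be sketched as the same threshold check applied to different sets of rows. The Python below is only an illustration; the data and the `keep_above_threshold` helper are hypothetical, and expressing the threshold as a share of rows follows the 10% example above.

```python
from collections import Counter

# Hypothetical data: one category value per row.
everything = ["coffee"] * 4 + ["tea"] * 3 + ["latte"] * 2 + ["mocha"]
selection = ["latte", "latte", "tea", "coffee"]


def keep_above_threshold(values, rows, min_share):
    """Return the values that appear in at least `min_share` of `rows`."""
    counts = Counter(rows)
    return sorted(v for v in values if counts[v] / len(rows) >= min_share)


values = set(everything)

# Discarding from everything: shares are measured against the whole dataset.
from_everything = keep_above_threshold(values, everything, 0.10)
# Discarding from a selection: shares are measured against the selected rows.
from_selection = keep_above_threshold(values, selection, 0.10)

print(from_everything)  # ['coffee', 'latte', 'mocha', 'tea']
print(from_selection)   # ['coffee', 'latte', 'tea']
```

With a 10% threshold, "mocha" survives when measured against everything but is discarded when measured against the selection, where it does not appear at all.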
There are two ways to clear filters. Use the 'Clear' button at the top of your left sidebar to remove all filters, or clear a specific filter using the 'Clear filter' icon within the sidebar variable card.