Market basket analysis is behind most of our favourite shopping sites. It's a key technique used by e-commerce businesses, combining data science and mathematics to focus customers' attention on products that they might actually be interested in. Retailers use market basket analysis to uncover the associations between products.
This is done by looking for combinations of products that are often bought together. For instance, if customers frequently purchase milk and coffee beans in a single transaction, this would constitute an association between milk and coffee beans. Knowledge of product associations can enrich the way that businesses conduct internal analytics, offer deals or discounts, and improve the product recommendations presented to customers.
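To make the idea of "often bought together" concrete, here is a minimal sketch that counts how many transactions contain each pair of products. The transactions are invented for illustration, and `count_pairs` is a hypothetical helper, not something Graphext exposes.

```python
from collections import Counter
from itertools import combinations

# Toy transactions, invented for illustration (not taken from a real dataset).
transactions = [
    {"milk", "coffee beans"},
    {"milk", "coffee beans", "bread"},
    {"bread", "butter"},
    {"milk", "bread"},
]

def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of products."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

pair_counts = count_pairs(transactions)
print(pair_counts.most_common(2))
```

Pairs with the highest counts are the strongest candidates for an association. In practice these raw counts are usually normalised by how popular each product is, which this sketch leaves out.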
Market basket analysis typically involves creating a set of rules to define what an association looks like. It takes time and technical resources to develop these rules into an algorithm that can process datasets of products. This is why we built a market basket analysis flow into Graphext. To start analyzing the associations between products in a dataset with Graphext, choose Commerce → Basket Analysis as your analysis type, answer 4 questions about the structure of your dataset, execute the project and let the associations roll in.
This guide is intended to walk through the building of a market basket analysis project using Graphext. We will be working with a dataset of purchases made at a bakery in Edinburgh. The aim of the project will be to identify communities of products that are related to one another and to transform this information into useful business insights.
First, we need to upload the bakery transaction dataset to Graphext and inspect it. Make sure you have access to the dataset on your local computer. Then, start from the Datasets panel of your Personal team in Graphext. Select New Dataset and either browse for the file on your computer or drag and drop it into the Adding Dataset window.
Graphext should process your upload within a couple of minutes. Once it's ready, select it from the Datasets panel to start inspecting it. You should see 20,507 rows and the following 5 columns: Transaction, Item, date_time, period_day and weekday_weekend.
For market basket analysis, the variables we are most interested in are Transaction, a number grouping purchased items together (like a receipt), and Item, the name of the item.
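As a rough illustration of how these two columns work together, the sketch below groups rows into baskets keyed by Transaction. The sample rows are invented and `group_into_baskets` is a hypothetical helper; only the column names come from the dataset.

```python
from collections import defaultdict

# Invented sample rows; only the column names match the bakery dataset.
rows = [
    {"Transaction": 1, "Item": "Bread"},
    {"Transaction": 2, "Item": "Coffee"},
    {"Transaction": 2, "Item": "Cake"},
    {"Transaction": 3, "Item": "Coffee"},
    {"Transaction": 3, "Item": "Coffee"},
    {"Transaction": 3, "Item": "Sandwich"},
]

def group_into_baskets(rows):
    """Collect the distinct items bought in each transaction, like a receipt."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row["Transaction"]].add(row["Item"])
    return dict(baskets)

baskets = group_into_baskets(rows)
for transaction_id, items in sorted(baskets.items()):
    print(transaction_id, sorted(items))
```

Using a set per transaction means that buying the same item twice on one receipt counts as a single appearance in that basket. That is a design choice for this sketch, not a statement about how Graphext treats duplicates.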
To start building the project, choose Commerce as your analysis type from the project setup wizard in your right sidebar. Then, choose Basket Analysis. You should now see two tabs: Data Enrichment and Settings. We don't need to enrich the data in this project, so turn your attention to the Settings tab.
Using the dropdown menu under the first question inside of the Settings tab, set Transaction as the variable representing your order ID. As you answer questions using the project setup wizard, Graphext begins to calculate the steps needed to perform your analysis and will bring up the next set of questions to complete.
Next, make sure date_time represents your order date and choose Item as the variable you want to represent your product name or product id.
That is pretty much all there is to setting up the project. The method is deceptively simple, as most of the processing goes on behind the scenes. Before selecting Next, you can change the configuration of association rules. By default, Graphext will calculate 5 related products for each product and only consider a product if it appears 10 or more times in the dataset. Feel free to change this configuration, but for the purposes of this guide, we will stick with the default options.
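To give a feel for what these two settings control, here is a minimal sketch that drops rare products and then keeps the most frequently co-purchased products for each remaining one. It is an assumption-laden illustration, not Graphext's actual algorithm: the function `related_products`, its parameters `top_k` and `min_count`, the ranking by raw co-occurrence count and the demo baskets are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def related_products(baskets, top_k=5, min_count=10):
    """For each sufficiently frequent product, list its top_k co-purchased products."""
    # Assumption: "appears" means the number of baskets containing the product.
    product_counts = Counter(item for basket in baskets for item in set(basket))
    frequent = {item for item, n in product_counts.items() if n >= min_count}

    # Count co-occurrences between frequent products only.
    co_counts = defaultdict(Counter)
    for basket in baskets:
        kept = sorted(set(basket) & frequent)
        for a, b in combinations(kept, 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1

    return {
        item: [other for other, _ in co_counts[item].most_common(top_k)]
        for item in sorted(frequent)
    }

# Tiny invented example with lowered thresholds so the effect is visible.
demo_baskets = [
    ["Coffee", "Bread"],
    ["Coffee", "Bread", "Cake"],
    ["Coffee", "Cake"],
    ["Coffee", "Bread", "Jam"],
]
demo = related_products(demo_baskets, top_k=1, min_count=2)
print(demo)
```

Raising `min_count` removes rarely bought products before any associations are computed, while `top_k` caps how many links each product can have, which keeps the resulting network readable.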
Click Next and name your project something like Market Basket Analysis. Executing your project will tell Graphext to build the model according to your instructions. It should take around 5 minutes to build your project.
Time to open up your project! From the Projects panel of your Personal team, find the Market Basket Analysis project and open it.
The first thing you will see will be your network visualisation or Graph. Just by scanning the Graph, it's possible to discern some interesting product associations in this dataset. The color of nodes on the Graph represents the cluster that they are grouped in. These clusters represent communities of products that are frequently bought together.
One thing to note immediately is the size of the dataset. The dataset we uploaded had 20,507 rows of product purchases, but if you check the stats at the top of your left sidebar you will see that there are only 58 data points in the project. This is because Graphext has aggregated your dataset. Instead of keeping each product purchase as a data point, Graphext added a variable called Number of Purchases of Product, which contains the total number of times a product was purchased.
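The aggregation described above can be sketched in a few lines. The rows here are invented and `purchases_per_product` is a hypothetical helper; the point is simply that many purchase rows collapse into one count per product.

```python
from collections import Counter

# Invented sample rows; only the column names match the bakery dataset.
rows = [
    {"Transaction": 1, "Item": "Bread"},
    {"Transaction": 2, "Item": "Coffee"},
    {"Transaction": 2, "Item": "Bread"},
    {"Transaction": 3, "Item": "Coffee"},
    {"Transaction": 4, "Item": "Coffee"},
]

def purchases_per_product(rows):
    """Collapse one-row-per-purchase data into one count per product."""
    return Counter(row["Item"] for row in rows)

number_of_purchases = purchases_per_product(rows)
print(len(rows), "rows ->", len(number_of_purchases), "products")
```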
The Number of Purchases of Product variable currently controls the size of nodes in the Graph. This means that bigger nodes represent more popular products. It's clear that Coffee and Bread are the most frequently bought items in the dataset.
Bigger nodes are easy to spot but the difference between smaller and medium-sized nodes is not as clear. We can improve this by zooming in on a range of values using the Number of Purchases of Product sidebar variable chart, then applying size mapping to this smaller range for comparison.
Find the variable sidebar chart representing Number of Purchases of Product. Drag your cursor over the chart so that the active filter ranges from 0 to 1000. This should exclude and blur out the 4 most popular products in the dataset: Bread, Cake, Tea and Coffee.
With the filter active, click the 3 dots representing More Options from the top right of the variable chart. Then, click Zoom In. This will create a new variable chart called Number of Purchases of Products between 0 and 1000K. Now, use the size mapping icon to apply size mapping to this variable. The outliers will be excluded, making it much easier to spot the difference in popularity between the bulk of products in the dataset.
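To see why excluding outliers makes the remaining nodes easier to compare, consider this small sketch of size mapping over a restricted range. Everything in it is an assumption for illustration: the purchase counts are invented, and `size_map` with its linear scaling and pixel bounds is a hypothetical stand-in for whatever Graphext does internally.

```python
def size_map(counts, low, high, min_size=4.0, max_size=40.0):
    """Linearly map counts inside [low, high] to node sizes; others are left out."""
    in_range = {k: v for k, v in counts.items() if low <= v <= high}
    if not in_range:
        return {}
    smallest, largest = min(in_range.values()), max(in_range.values())
    span = (largest - smallest) or 1  # avoid dividing by zero when all counts match
    return {
        k: min_size + (v - smallest) / span * (max_size - min_size)
        for k, v in in_range.items()
    }

# Invented purchase counts, with two outliers far above the rest.
counts = {"Coffee": 5000, "Bread": 3000, "Scone": 300, "Muffin": 200, "Jam": 100}

print(size_map(counts, 0, 5000))  # outliers included: small products look alike
print(size_map(counts, 0, 1000))  # outliers excluded: differences become visible
```

With the full range, the three smaller products are squeezed into the bottom of the size scale. Restricting the range to 0 to 1000 spreads them across the whole scale instead.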
Let's dive a little deeper into some of the associations Graphext created whilst calculating clusters. Clear all of your filters and reapply size mapping to the Number of Purchases of Product variable so that your Graph looks the way it did when you first opened the project.
Now, find the Cluster variable chart in your left sidebar and observe that there are 5 clusters inside of the project, each with its own representative color.
Cluster 5 - colored purple - represents the smallest community of associated products in your dataset. To look closer at this cluster, select the bar inside of the Cluster variable chart representing Cluster 5.
Graphext will automatically zoom in on the three nodes inside of this cluster but we can't see the names of the products yet! Use the labels icon from the top of your Graph to bring up a slider to control the number of labels in your Graph. Increase the number of labels shown so that you can see that Cluster 5 contains Postcard, Tshirt and Valentine's card.
We can already start to form some kind of hypothesis with the associations between these three items but let's see if we can work anything else out from the characteristics of this cluster.
With the filter applied so that your Graph is still focused exclusively on Cluster 5, navigate to the right sidebar and select the Absolute / Relative dropdown menu from the top of the sidebar. Select Relative from the dropdown to change the values inside of your variable charts so that they represent values in proportion to the selection that you've made.
Now, scroll through the variable charts inside of your right sidebar until you get to the Weekdays chart. Pause here and observe that the values inside of the chart reveal that the majority of Cluster 5 product purchases were made on Saturday.
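One plausible reading of relative values is the share of each category within the current selection. The sketch below computes those shares for an invented list of purchase days; `relative_shares` is a hypothetical helper and the numbers are not taken from the bakery dataset.

```python
from collections import Counter

def relative_shares(values):
    """Return each category's share of the selection, as a fraction of the total."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Invented purchase days for the selected products.
selected_days = ["Saturday", "Saturday", "Sunday", "Saturday", "Friday"]

shares = relative_shares(selected_days)
for day, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{day}: {share:.0%}")
```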
This is an interesting discovery! Considering the nature of the products and the fact that they were mostly purchased together on a Saturday, it's possible that tourists were responsible for purchasing the majority of these products. Let's save the chart as an insight for later. Click the three dots to bring up the More Options dropdown menu. Then, click Save as Insight to store this chart inside of the Insights panel of your project.
Selecting the neighbours of nodes inside of your Graph is a powerful way of recognising the products directly associated with a specific item.
Before you start selecting neighbours, clear all of your active filters so that your Graph represents 100% of your dataset.
Focusing on the Valentine's card product we can find out exactly which products are directly associated with this item. Select the Valentine's card node from inside your Graph. This should mean that only this data point is active inside your project.
Now that you've selected the item, a card should appear above the node giving you the option to select the neighbours of the Valentine's card product. Select this option.
After selecting the neighbours of this item, you can see that Graphext has highlighted 4 items that are directly related to this product. Postcard and Tshirt are not surprising, as we already know they are grouped inside of Cluster 5. Cake and Salad are more surprising - especially Salad!
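Selecting neighbours amounts to looking up the nodes that share a link with the chosen node. Here is a minimal sketch using an adjacency map. The four Valentine's card links mirror the ones found above, while the extra edge and the `build_adjacency` helper are invented for illustration.

```python
from collections import defaultdict

edges = [
    ("Valentine's card", "Postcard"),
    ("Valentine's card", "Tshirt"),
    ("Valentine's card", "Cake"),
    ("Valentine's card", "Salad"),
    ("Coffee", "Cake"),  # invented extra link, not taken from the project
]

def build_adjacency(edges):
    """Build an undirected adjacency map: product -> set of linked products."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

adjacency = build_adjacency(edges)
neighbours = sorted(adjacency["Valentine's card"])
print(neighbours)
```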
Using these associations we might start to draw up a plan such as offering discounts on Cake when customers buy Valentine's cards. Save the associations of Valentine's card as an insight so that you can revisit this analysis later. To do this, click the insights icon from the top of your Graph and name your insight something like Valentines Card - Direct Associations.
The aim of market basket analysis is always to translate insights found in your data into actions that can help improve your business operations. Let's take advantage of the product communities that have been calculated in this project to create a deal that might help this bakery encourage customers to buy related products that they might be interested in.
Sandwiches could well be one of the most expensive products that the bakery sells. In addition, customers will often want to buy sandwiches alongside other items at lunchtime making it a suitable candidate for product association.
Start by searching for 'Sandwich' using the search bar at the top of your Graph. This should highlight the node representing the Sandwich product. Then clear your search so that 100% of your data is visible again, and select the Sandwich node.
Select the neighbours of Sandwich and take a look through the associated items. Observing the collection of related items, there seems to be a clear indication that sandwiches are often bought alongside drinks like Tea, Coffee, Juice, Smoothies and Mineral Water.
It would make sense to use these insights to create an offer that gives customers a discount on drinks when they purchase a sandwich. The data highlights that these products are frequently bought together, so offering drinks at a lower price alongside sandwiches might be all it takes to drive up drink sales.
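Before committing to an offer like this, it can help to quantify how often a drink already accompanies a sandwich. A common measure for that is the confidence of a rule: the share of baskets containing one product that also contain another. The sketch below uses invented baskets and a hypothetical `confidence` helper; it is not part of the Graphext project.

```python
def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)

# Invented baskets for illustration only.
baskets = [
    {"Sandwich", "Coffee"},
    {"Sandwich", "Tea"},
    {"Sandwich", "Coffee", "Cake"},
    {"Bread"},
    {"Sandwich"},
]

for drink in ("Coffee", "Tea"):
    print(f"Sandwich -> {drink}: {confidence(baskets, 'Sandwich', drink):.2f}")
```

A drink with low confidence but a clear link to Sandwich in the Graph may have the most room to grow from a discount, though that is a hypothesis to test rather than a guaranteed outcome.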