July 5, 2021
Using Mutual Information to Cluster Variables and Discover the Associations Between Survey Questions
Our team set out to build a type of analysis that could be used to measure the strength of association between variables in a dataset. Here's how we did it ...
June 29, 2021
A Market Segmentation of 1000 Supermarket Customers Using Data on Sales, Income and Demographics
Our team clustered 1000 supermarket sales in order to segment customers according to their buying habits. Our market segmentation analysis uses data on the demographics, income and geography of customers to identify key buyer personas and inform marketing strategies and campaigns.
June 11, 2021
Graphext | Graphtex | Graphnext: Grouping Similar Spellings Using Chars2Vec and Agglomerative Clustering
'España' and 'Españha' are just spelling variations. We built a way of grouping words spelt differently but referring to the same concept.
June 8, 2021
The Method Behind Our Investigation of Reports of Adverse COVID-19 Vaccine Events
Taking on an investigation into the adverse reactions associated with the COVID-19 vaccination rollout in the USA, our team were aware of the increased need for transparency whilst conducting our analysis. This article documents the methodology behind our study of Vaccine Adverse Event Reporting System (VAERS) data.
June 8, 2021
Conspiracies, Complexity and Clustering: Investigating Reports of Adverse COVID-19 Vaccine Effects
Modelling data from the Vaccine Adverse Event Reporting System (VAERS) - a US government-sponsored vaccine reaction monitoring service - our team set out to investigate reports of adverse health effects related to the seismic rollout of the COVID-19 vaccination programme in the USA.
May 6, 2021
Good Risk vs Bad Risk: Deconstructing the Features of 1000 German Loans
To discover which features of a loan application most influence risk, our team built a model that uses those features to predict whether an applicant would have a good or bad risk rating.
April 26, 2021
Jake's Project: Investigating the Data Behind a Good Day
Andy and María meet with Jake to talk about a dataset he's building about himself. From skating, to the people he sees, to whether he flosses or not, Jake's data offers a unique and deeply personal insight into his life. But what makes the difference between good and bad days?
April 16, 2021
Simple Solutions to Prevent Customer Churn
Our team analyzed 7043 current and former customers of a telecoms provider in order to better understand what types of people are most likely to cancel their contracts.
April 7, 2021
How Data Can Help You Keep Your Workers
To showcase how a company could reduce employee turnover, our team clustered a dataset containing information about IBM employees to discover the reasons why employees left their jobs.
March 29, 2021
Menhir & Graphext: Analyzing the Intangible Value of Financial Assets
Working at the intersection of data science and finance, Menhir is using Graphext to understand the composition of financial portfolios, completing in just two days analysis that typically takes analysts two to three weeks.
March 24, 2021
The Moneyball Method: Using Data to Build a Football Dream Team (On a Budget)
Our team set out to build an exceptional football team for less than 100M euros. Using data from the FIFA 2020/2021 video game dataset, we built a prediction model to find the key performance attributes for each position. Then, we used this to pick out a team of excellent but undervalued players.