Andy and María meet with Jake to talk about a dataset he's building about himself. From skating to people he sees to whether he flosses or not - Jake's data offers a unique and deeply personal insight into his life. But what makes the difference between good and bad days?
After building a prediction model to analyze Jake's year in data, the team dives deep into the factors most likely to distinguish Jake's good days from his bad ones. They investigate the suspicions that Jake himself had about the data and bust a few myths along the way.
Did working out help Jake have a good day? Why was he spending so much time at the skatepark? What actually is his favourite TV show? Who did he see on good days? Why were there so many bad days in winter?
About the Data
During our conversation, Jake spoke about why he began to create a data diary cataloguing the minutiae of his life. After finishing a course in data journalism and visualisation, he found himself without the luxury of a steady stream of datasets on which to practise his newly acquired analysis skills.
Instead of throwing in the towel - or mining his way through Kaggle - Jake opted for a more personal approach and started to record information about his life using an Excel sheet. With each row in the table representing one day, Jake documented who he saw, the weather, the season, the activities he did and whether he worked out or not. As the year grew older, so did the number of variables he diligently monitored each day.
What's left is a fascinating insight into the life of a young Canadian before, during and - as he continues to extend the dataset - after the pandemic.
About the Project
Seeing that Jake was already ploughing his way through pivot tables on TikTok, María spotted the opportunity to conduct a powerful prediction project using Jake's data. The team started the investigation with the hope of understanding more about which features of Jake's day were most strongly related to the rating variable - whether he had a good or bad day.
In an attempt to reverse engineer the dataset, we built a prediction model that treats each of Jake's observations about his own experience as a factor and the rating variable as the target. In this way, we hoped our model would be able to identify the strongest relationships between the target and the factors of the dataset, and perhaps even point out to Jake the people or activities that he should avoid if he wants to have a good day.
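To make the idea of relating factors to a target more concrete, here is a minimal sketch in Python. It is not the model the team built in Graphext. It is a much simpler stand-in that ranks each factor value by how far the share of good days on matching days sits above or below the overall share of good days. The rows, the column names (`worked_out`, `skatepark`, `season`, `rating`) and the helper `good_day_lift` are all invented for illustration and are not taken from Jake's dataset.

```python
from collections import defaultdict

# Hypothetical diary rows, one dict per day. The column names and values are
# assumptions made up for this sketch, not Jake's real data.
days = [
    {"worked_out": True,  "skatepark": True,  "season": "summer", "rating": "good"},
    {"worked_out": False, "skatepark": True,  "season": "summer", "rating": "good"},
    {"worked_out": True,  "skatepark": False, "season": "winter", "rating": "bad"},
    {"worked_out": False, "skatepark": False, "season": "winter", "rating": "bad"},
    {"worked_out": True,  "skatepark": True,  "season": "winter", "rating": "good"},
    {"worked_out": False, "skatepark": False, "season": "summer", "rating": "bad"},
]


def good_day_lift(rows, target="rating", positive="good"):
    """Map (factor, value) to P(good | factor == value) minus P(good overall)."""
    baseline = sum(r[target] == positive for r in rows) / len(rows)
    # (factor, value) -> [number of good days, total number of days]
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        for factor, value in row.items():
            if factor == target:
                continue  # the target is what we explain, not a factor
            counts[(factor, value)][1] += 1
            if row[target] == positive:
                counts[(factor, value)][0] += 1
    return {key: good / total - baseline for key, (good, total) in counts.items()}


lifts = good_day_lift(days)

# Rank factor values by the strength of their association with the target,
# regardless of direction.
ranked = sorted(lifts.items(), key=lambda item: abs(item[1]), reverse=True)
for (factor, value), lift in ranked:
    print(f"{factor}={value}: {lift:+.2f}")
```

A positive number means good days were more common than usual when the factor took that value, and a negative number means they were less common. A comparison like this looks at one factor at a time and only shows association, not cause, which is part of why a full prediction model that weighs all factors together can be more informative.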
Then, arranging a blind date between Jake and the Graphext project we built, the team interviewed Jake about his project. Check out our conversation to see how Jake's year evolved along with his reasons for a good day or bad day.