How We Hacked Private Data for a Social Experiment With Ballantines

December 17, 2020
Data News
Andy Clarke

There's a gap between what I do and what I think I do. Contradictions between how we perceive ourselves and the realities that are represented by our digital footprints have been laid bare by the release of "Me and My Other Me", a social experiment from Ballantines. A team from Graphext worked on the project with Juan Ramos-Cejudo, Associate Professor in Psychology, to excavate and interpret data concerning the behaviour of people on digital platforms, the internet and social media.

The experiment culminated in a film documenting face-to-virtual-face confrontations between four people and their digital selves, during which both are interviewed about their characteristics and habits. The result shows how the digital data gathered on us every day can describe us in a very different light from the way we would describe ourselves.

Interviewer: “Do you like showing off?”

Man: “Not really. I’d prefer people to see me as I really am.”

Avatar: “In 92% of the pictures I upload with someone, I’m next to a celebrity.”


Our team approached the experiment as any data analyst might: with scepticism about whether we could gain real insight into people's psychology from their digital data. Reaching credible conclusions required a significant investment of time, money and effort, and there was a real risk that the data would reveal nothing. These concerns were shared by Maxi Itzkoff of Slap Global, the creative company that came up with the idea, who also feared that there might be "no contradiction between the person and their data". Worse still, he worried that participants would feel embarrassed and leave. The stage was set, and everything could go wrong.

"I thought about the episode of Black Mirror in which the husband is resurrected using his WhatsApp conversations. After watching it, I used Graphext to explore my personal data and learned things that I didn't know about myself."

Taking full advantage of the data-access rights introduced by the GDPR, our team began collecting historical data on each participant from Google, Amazon, Netflix and Spotify, among other digital platforms. This was then combined with detailed private data from sources including Google search history, private WhatsApp conversations and job searches on LinkedIn. The aim was to create a detailed and comprehensive digital portrait of participants.

The digital sources we used to gather data on participants.
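Under the GDPR's right of access (Article 15) and right to data portability (Article 20), people can obtain a copy of the data platforms hold about them, typically as machine-readable files such as JSON or CSV. As a minimal, hypothetical sketch of a first processing step, the snippet below totals listening time per artist from a streaming-history export. It assumes a JSON array of plays with `artistName` and `msPlayed` fields; real schemas differ between services and change over time, so the field names and the sample data are assumptions, not a documented format.

```python
from __future__ import annotations

import json
from collections import defaultdict

# Invented sample in the shape assumed here: a JSON array of plays with
# "artistName", "trackName" and "msPlayed" (milliseconds played) fields.
SAMPLE_EXPORT = """[
  {"endTime": "2020-03-01 22:10", "artistName": "Artist A", "trackName": "Song 1", "msPlayed": 180000},
  {"endTime": "2020-03-01 22:14", "artistName": "Artist B", "trackName": "Song 2", "msPlayed": 240000},
  {"endTime": "2020-03-02 08:05", "artistName": "Artist A", "trackName": "Song 3", "msPlayed": 120000}
]"""


def minutes_per_artist(raw_json: str) -> dict[str, float]:
    """Total listening time per artist in minutes, most-played artist first."""
    plays = json.loads(raw_json)
    totals: defaultdict[str, int] = defaultdict(int)
    for play in plays:
        totals[play["artistName"]] += play["msPlayed"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {artist: ms / 60_000 for artist, ms in ranked}


print(minutes_per_artist(SAMPLE_EXPORT))  # {'Artist A': 5.0, 'Artist B': 4.0}
```

Aggregates like this turn raw event logs into one row of features per participant, which is the form most clustering and modelling steps expect.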

To discover patterns in participants' behaviour, the information extracted from the services in the table above was transformed, processed and analysed in Graphext. Leaving no stone unturned, our team combined several machine learning techniques, including clustering, natural language processing and predictive modelling, to dig out the characteristics driving participants' online behaviour.
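Graphext runs techniques like these inside the tool, and the exact pipeline is not described here. As a rough illustration of what clustering does with behavioural data, the following is a minimal k-means (Lloyd's algorithm) in plain Python. The per-day features (share of messages sent after midnight, hours of streaming) and their values are invented for the example; they are assumptions, not participant data.

```python
from __future__ import annotations

import math


def kmeans(points: list[tuple[float, ...]], k: int, iterations: int = 20) -> list[int]:
    """Lloyd's k-means: return a cluster label (0..k-1) for each point.

    Centroids are seeded with the first k points so results are deterministic;
    production code would use a smarter seeding such as k-means++.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Invented per-day features: (share of messages sent after midnight, hours of streaming).
days = [
    (0.05, 1.0), (0.10, 1.5), (0.08, 0.5),  # quieter days
    (0.60, 4.0), (0.70, 5.0), (0.65, 4.5),  # late-night, heavy-use days
]
print(kmeans(days, k=2))  # [0, 0, 0, 1, 1, 1]
```

The algorithm only finds groups; deciding what a group means (here, "late-night, heavy-use days") is the interpretive step that needs domain expertise.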



"Interpreting the data was another major challenge", said Ramos-Cejudo, whose knowledge of social desirability as a psychological phenomenon was crucial in understanding the behaviour patterns that emerged from the data. He explains that social desirability refers to the tendency of some people to seek approval and acceptance more than others. Both he and Victoriano were interviewed for a second film offering a behind-the-scenes look at the experiment.



The unexpected and striking revelations made in the experiment have prompted efforts to investigate how tools like Graphext, and the processes our team used to analyse participants' data, might support new studies in psychometric testing and diagnosis. Using the data we scatter across the internet to examine our digital behaviour and understand people's personalities is a relatively new field of research that has already produced promising studies. For instance, a paper by Michal Kosinski, a computational psychologist at Stanford University, demonstrated that Facebook likes can be used to accurately predict sensitive personal attributes, including religious and political views, intelligence, happiness and use of addictive substances.
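Studies of this kind typically represent each person as a binary vector of likes and fit a model that maps that vector to a trait. The sketch below is a deliberately simplified, hypothetical version: logistic regression trained by batch gradient descent on six invented users and three invented pages. Kosinski and colleagues first compressed the user-like matrix with singular value decomposition before fitting their regression models; that step is skipped here, and nothing below reproduces their data or results.

```python
from __future__ import annotations

import math


def predict_proba(x: list[float], w: list[float], b: float) -> float:
    """Probability of the positive class under a logistic model."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit logistic-regression weights and bias with batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented data. Columns: three hypothetical pages (1 = liked); labels: a hypothetical binary trait.
likes = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
trait = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(likes, trait)
new_user = [1, 0, 1]  # likes the first and third pages
print(f"P(trait | likes) = {predict_proba(new_user, w, b):.2f}")
```

With real data there are far more pages than people, which is why a dimensionality-reduction step and held-out evaluation matter much more than in this toy example.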

Now that the project is finished, our team is looking forward to working again with Juan Ramos-Cejudo, his clinic and the data they continue to gather. Working with larger samples will enable us to build better models, make faster diagnoses and, ultimately, help people become more aware of themselves and their digital doppelgangers.

Additionally, so that our work can be reproduced, we will soon share the Google Colab notebooks containing some of the scripts we used to preprocess the data gathered on participants. Here's a sneak peek!

Our Google Colab script to preprocess WhatsApp messages gathered on participants.
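Until the notebooks are published, here is a minimal sketch of one common preprocessing step; it is not our script. It turns an exported WhatsApp chat into structured records, assuming an Android-style export with day-first dates and a 24-hour clock (for example `17/12/2020, 21:05 - Ana: See you tomorrow`). Export formats vary by platform, language and phone settings, so the regular expression and date format are assumptions to adapt, and the names and messages are made up.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Assumed Android-style line format: "DD/MM/YYYY, HH:MM - Sender: text".
STAMP = r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - "
MESSAGE_RE = re.compile(STAMP + r"([^:]+): (.*)$")
SYSTEM_RE = re.compile(STAMP)  # timestamped line with no "Sender:" part


@dataclass
class Message:
    sent_at: datetime
    sender: str
    text: str


def parse_chat(raw: str) -> list[Message]:
    """Turn an exported chat into Message records, joining multi-line messages."""
    messages: list[Message] = []
    for line in raw.splitlines():
        match = MESSAGE_RE.match(line)
        if match:
            date_str, time_str, sender, text = match.groups()
            year = "%Y" if len(date_str.split("/")[2]) == 4 else "%y"
            sent_at = datetime.strptime(f"{date_str} {time_str}", f"%d/%m/{year} %H:%M")
            messages.append(Message(sent_at, sender, text))
        elif SYSTEM_RE.match(line):
            continue  # system notice (e.g. encryption banner): no sender, so skip it
        elif messages and line.strip():
            # A line without a timestamp continues the previous message.
            messages[-1].text += "\n" + line
    return messages


SAMPLE_CHAT = """17/12/2020, 21:00 - Messages and calls are end-to-end encrypted.
17/12/2020, 21:05 - Ana: See you tomorrow
17/12/2020, 21:06 - Luis: Sure!
Bring the photos
18/12/2020, 08:30 - Ana: On my way"""

messages = parse_chat(SAMPLE_CHAT)
print(Counter(m.sender for m in messages))  # Counter({'Ana': 2, 'Luis': 1})
```

A system notice that happens to contain a colon would still be read as a message, so a real pipeline would need extra filtering before counting messages or running text analysis on them.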

