Testing out our brand-spanking-new integration with Hugging Face NLP models, we analyzed the dialogue of every character in all 9 series of the US version of The Office. Working in a Graphext project, we used the language models to classify the lines spoken by Michael, Dwight, Pam, Jim, Darryl and the rest of the cast by sentiment, emotion, offensive language, irony and hate speech.
Bears. Beets. Battlestar Galactica. There are 54,626 lines of dialogue in The Office (US). You'd need 74 hours (3.1 straight days) to binge all 9 series in one go. That's 9.3 working days of comedy gold, or slightly less depending on whether you remember to skip the torturous intro theme.
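The runtime figures are simple unit conversions. Here is a minimal Python sketch that takes the 74-hour total as given and assumes an 8-hour working day. The article does not say which working-day length it used.

```python
# Convert the total runtime quoted above into straight days and working days.
TOTAL_HOURS = 74           # total runtime of all 9 series, as quoted in the article
HOURS_PER_DAY = 24
WORKING_HOURS_PER_DAY = 8  # assumption: a standard 8-hour working day

straight_days = TOTAL_HOURS / HOURS_PER_DAY         # about 3.08
working_days = TOTAL_HOURS / WORKING_HOURS_PER_DAY  # exactly 9.25

print(f"{straight_days:.2f} straight days, {working_days:.2f} working days")
```

Rounded up to one decimal place, these give the 3.1 and 9.3 days quoted above.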
30% of The Office's dialogue is ironic. 20% of it is positive and 20% is negative. Joy is the strongest emotion in the series, and offensive language features in 6% of all lines. How do we know this? We analyzed what the characters say with Graphext and applied Hugging Face NLP models to classify the dialogue from the hit series as part of our project.
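Headline figures like these are most naturally read as the share of all lines that a given model assigned to each label. Each model (sentiment, irony and so on) is counted separately. The sketch below assumes each line has already been reduced to a single top label per model. The helper `label_shares` and the toy data are hypothetical and do not come from the real Office predictions.

```python
from collections import Counter


def label_shares(predictions):
    """Return the share (0 to 1) of lines assigned to each label.

    `predictions` is a list with one predicted label per line of dialogue,
    e.g. the top label a sentiment classifier returned for each line.
    """
    if not predictions:
        return {}
    counts = Counter(predictions)
    total = len(predictions)
    return {label: count / total for label, count in counts.items()}


# Toy data, made up for illustration only.
toy_sentiment = ["neutral", "positive", "negative", "neutral", "neutral"]
print(label_shares(toy_sentiment))
# {'neutral': 0.6, 'positive': 0.2, 'negative': 0.2}
```

The irony figure would come from running the same helper over a separate list of labels produced by an irony model.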
"I'm not superstitious but I am a bit stitious."
- Michael Scott, Season 4, “Fun Run”
Variable Sidebar Charts: The distribution of intent in 54,626 lines of dialogue from The Office (US).
1. The Most Negative Character
Stanley Hudson is the most negative character in The Office! But thinking about it ... who else could it be? A third of Stanley's 671 lines are filled with some form of moaning, misery or mumbling about his co-workers. 5% of the time, Stanley's grumbles are aimed at Michael; at other times he's complaining about his finances, Dunder Mifflin's incompetent sales team or something to do with food.
Our text analysis also suggested that Stanley is Dunder Mifflin's angriest employee.
Sentiment Analysis: The sentiment of lines delivered by Stanley Hudson.
“I wake up every morning in a bed that's too small, drive my daughter to a school that's too expensive, and then I go to work to a job for which I get paid too little, but on Pretzel Day? Well, I like Pretzel Day.”
– Stanley Hudson, Season 3, “Initiation”
According to the sentiment predictions made by the Cardiff NLP model, Kelly, Angela and Meredith are among the next most negative characters.
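A ranking like this can be produced by grouping the per-line predictions by character and computing each character's share of lines with the target label. The sketch below assumes the data is a list of (character, predicted_label) pairs. The function name, the `min_lines` parameter and the toy data are all hypothetical. The article does not say whether characters with few lines were filtered out, but a threshold like `min_lines` is one way to keep minor characters from topping the list on a handful of lines.

```python
from collections import defaultdict


def rank_by_label_share(lines, label, min_lines=1):
    """Rank characters by the share of their lines carrying `label`.

    `lines` is an iterable of (character, predicted_label) pairs.
    `min_lines` (hypothetical) drops characters with fewer lines than this.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for character, predicted in lines:
        totals[character] += 1
        if predicted == label:
            hits[character] += 1
    shares = {
        character: hits[character] / n
        for character, n in totals.items()
        if n >= min_lines
    }
    # Highest share first; ties broken alphabetically for a stable order.
    return sorted(shares.items(), key=lambda item: (-item[1], item[0]))


# Toy data, made up for illustration only.
toy = [
    ("Stanley", "negative"), ("Stanley", "negative"), ("Stanley", "neutral"),
    ("Erin", "positive"), ("Erin", "neutral"),
    ("Kelly", "negative"), ("Kelly", "positive"),
    ("Kelly", "neutral"), ("Kelly", "neutral"),
]
print(rank_by_label_share(toy, "negative"))  # Stanley (2/3), Kelly (1/4), Erin (0)
```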
2. The Most Positive Character
After arriving as the new receptionist in series 5, Erin bursts onto the scene as The Office's most positive character. 23% of her dialogue is positive, and more often than not she is expressing some form of joy in her lines ... even when she is contending with the challenges of her troublesome relationship with Gabe.
Sentiment Analysis: The sentiment of lines delivered by Erin Hannon.
"I eat lunch in the car now. It's my alone time. It's just nice to have some time away from Gabe."
- Erin, Season 7, "Michael's Last Dundies"
But Erin's positivity isn't unrivalled. Both Andy and Michael closely follow her in the pecking order of positivity.
3. The Most Offensive Character
The results of the offensive language detection model were especially interesting to us, given that comedy shows almost always contain some form of offensive language, and it's often the blurred line between what's acceptable to say and what's not that gets the most laughs. In short, offensive characters can make us laugh precisely because they make us uncomfortable.
The Office has a number of prime candidates for the show's most offensive character. Dwight, Jan and Kelly were strong contenders here, but it is Todd Packer, best known for his sleazy one-liners, who claims the title.
"Where is Michael Snot? Sniffing some dude's thong? Probably."
- Todd Packer, Season 2, "Sexual Harassment"
Offensive Language Detection: Offensive language in lines delivered by Todd Packer.
Incredibly, Packer has only 66 lines in the whole 9 series, but 20 of them (about 30%) were classified as offensive. Most of these come from series 2 (2005) and take the form of derogatory comments about women in some way. Unsurprisingly, lines like these are less likely to feature in later series.
4. The Saddest Character
One of the most complex personalities in the show, Jan is also the saddest employee at Dunder Mifflin. 28% of Jan's dialogue expresses sadness, which is no wonder considering the journey her character takes throughout the 9 series, falling from a high-flying management position to a crazed, self-employed candle maker.
A quarter of Jan's lines containing some kind of sadness refer to Michael.
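The article does not say how "refer to Michael" was determined. A simple approach is to filter a character's lines to those with the target emotion and then count how many mention the name. The sketch below assumes a "mention" means a whole-word, case-insensitive match on the name. Under that assumption, pronouns like "he" or "him" and nicknames are missed. The function and the toy lines are hypothetical.

```python
import re


def share_mentioning(lines, label, name):
    """Among lines with `label`, return the share that mention `name`.

    `lines` is a list of (text, predicted_label) pairs for one character.
    A mention is assumed to be a whole-word, case-insensitive match.
    """
    pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
    labelled = [text for text, predicted in lines if predicted == label]
    if not labelled:
        return 0.0
    mentioning = sum(1 for text in labelled if pattern.search(text))
    return mentioning / len(labelled)


# Toy lines, invented for illustration (not real Office dialogue).
toy_jan = [
    ("Michael, we need to talk.", "sadness"),
    ("I just wanted things to be different.", "sadness"),
    ("This candle smells amazing.", "joy"),
    ("Why does Michael never listen?", "sadness"),
]
print(share_mentioning(toy_jan, "sadness", "Michael"))  # 2 of 3 sad lines
```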
Emotion Detection: The emotions detected in lines delivered by Jan Levinson-Gould.
5. The Most Joyous Character
When you think of joyous moments in The Office, you think of Pam and Jim's relationship or Michael's motivational speeches ... you don't think of Creed. But the office oddball is unwaveringly joyous - more so than any of the other characters in the show. And it checks out ...
Just under 50% of Creed's lines were classified as joyous, and the emotion is consistent throughout almost all seasons of the show. Perhaps because of his eccentric manner and apparent detachment from the office's goings-on, Creed often pops into an episode for a moment, spreads some joy around and then retreats to his zany corner.
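One way to check whether an emotion is "consistent throughout almost all seasons" is to compute a character's share of lines with that label separately for each season. The sketch below assumes each line is available as a (season, predicted_label) pair. The article does not describe its exact method, and the helper name and toy data are hypothetical.

```python
from collections import defaultdict


def share_by_season(lines, label):
    """Return {season: share of lines with `label`} for one character.

    `lines` is an iterable of (season, predicted_label) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for season, predicted in lines:
        totals[season] += 1
        if predicted == label:
            hits[season] += 1
    # Seasons in ascending order so the result reads chronologically.
    return {season: hits[season] / totals[season] for season in sorted(totals)}


# Toy data, made up for illustration only.
toy_creed = [(3, "joy"), (3, "neutral"), (4, "joy"), (4, "joy"), (5, "surprise")]
print(share_by_season(toy_creed, "joy"))  # {3: 0.5, 4: 1.0, 5: 0.0}
```

Roughly equal shares across seasons would indicate a consistent emotion. A spike in one season, like Packer's offensive lines in series 2, would stand out immediately.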
"Cool beans, man, I live by the quarry. We should hang out by the quarry and throw things down there."
- Creed Bratton, Season 5, "Frame Toby"
Emotion Detection: The emotions detected in lines delivered by Creed.
What About Michael & Dwight?
It's true ... neither of the public's most celebrated characters is among the show's most extreme. In part, this could be because they are the two characters with the most lines, and therefore have more opportunity to express a diverse, complex range of emotions.
For all of the energy that Dunder Mifflin's regional manager exudes on screen, he is one of the better-balanced characters in the show. Compared with his co-workers, Michael more commonly expresses anger or optimism. He's slightly more offensive than the average character but, surprisingly, less ironic.
“Would I rather be feared or loved? Easy. Both. I want people to be afraid of how much they love me.”
- Michael Scott, Season 2, “The Fight”
Variable Sidebar Charts: The defining features of Michael's dialogue.
Perhaps it's the constant pranks that Jim plays on Dwight or the frustration he encounters each time he goes for the assistant regional manager position. Either way, Dwight is one of the angriest characters in the office. He's also more negative than most of his peers and more offensive too!
“I signed up for Second Life about a year ago. Back then, my life was so great that I literally wanted a second one. Absolutely everything was the same…except I could fly.”
- Dwight Schrute, Season 4, "Local Ad"
Variable Sidebar Charts: The defining features of Dwight's dialogue.
Analyzing Language with Hugging Face Models
These models for detecting intent in text give a taste of what Hugging Face has to offer, but they barely scratch the surface. We've integrated Hugging Face with Graphext to make your text analysis much more powerful.
Browse the full list of models on the Hugging Face Hub and start deploying them by adding the code snippet below to your project recipe.
}) -> (ds.hf_sentiment)
When pasting the snippet into your project's code editor, make sure the input column points to the text you want to analyze. Then, add the correct model name as the "model" parameter. To learn more about adapting your project to use Hugging Face NLP models, check out this article in our technical documentation.
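Outside Graphext, the same kind of Cardiff NLP classifiers can be run directly with the Hugging Face `transformers` library. The minimal sketch below uses its `pipeline("text-classification", model=...)` API. The model IDs are assumptions: they are Cardiff NLP's TweetEval checkpoints on the Hub, and the article does not state exactly which checkpoints were used. Some older checkpoints report generic labels such as `LABEL_0`, which need mapping using the model card. The helper `classify_lines` is hypothetical and is not part of Graphext's recipe language.

```python
# Requires: pip install transformers torch
# Assumed model IDs (Cardiff NLP TweetEval checkpoints on the Hugging Face Hub).
CARDIFF_MODELS = {
    "sentiment": "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "emotion": "cardiffnlp/twitter-roberta-base-emotion",
    "offensive": "cardiffnlp/twitter-roberta-base-offensive",
    "irony": "cardiffnlp/twitter-roberta-base-irony",
    "hate": "cardiffnlp/twitter-roberta-base-hate",
}


def classify_lines(lines, task):
    """Return the top predicted label for each line of dialogue.

    `task` is one of the keys of CARDIFF_MODELS. The first call downloads
    the model from the Hugging Face Hub.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=CARDIFF_MODELS[task])
    # For a list input, the pipeline returns one {"label", "score"} dict per line.
    return [result["label"] for result in classifier(lines)]


# Example (downloads the model on first run):
# classify_lines(["Well, I like Pretzel Day."], "sentiment")
```

Running one pipeline per task mirrors the approach described in the article, where sentiment, emotion, offensive language, irony and hate speech were each scored by a separate model.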
Special acknowledgement to the Cardiff NLP team who developed the models used in this investigation! Check out more of their models here.