Each year, the Information Design and Visualization class at Penn works with an outside organization to create visualizations for actual data sets. This year, the organization was the Positive Psychology Center at Penn, which wanted to find new ways of looking at the data it collected through its World Well-Being Project.
As a brief introduction, the World Well-Being Project is an analysis of the language used by tens of thousands of Facebook users in their statuses. In addition to offering their statuses, Facebook volunteers also gave demographic information and took a personality test to measure their “Big Five” personality traits (“OCEAN”: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism). Researchers used this data to determine which words are most highly correlated with, and most frequently used by, people of each trait and demographic. For instance, there is a strong correlation between using words like “<3,” “shopping,” and “excited” and being female. Conversely, words like “fuck,” “wishes he,” and “ps3” have strong correlations with being male.
The researchers already generated word cloud visualizations of the data (example below), but they challenged us to take different approaches and create different designs with the same data in order to see if other insights might result.
In my case, I was interested in further examining the words themselves, not for their meaning but for their sounds. Sure, “fuck” is strongly correlated with being male because men tend to curse more, but is there a reason why men gravitate toward the word? Why this curse word in particular instead of others? The word’s meaning is no doubt a major reason, but I also wondered about a preference for certain sounds.
Unfortunately, it’s difficult to break up words into sounds without an extensive linguistics background, so I settled for the next best alternative: letters. What letters and characters do people of certain demographics and personality traits use more often? And what better way to visualize letters than with a keyboard – how would the keyboards of these different types of people differ?
I took the top 100 highest correlated words with each gender and each of the OCEAN traits and counted the number of times each letter and character appeared. Then, I mapped it to a keyboard in order to show which keys are used the most and which keys are rarely touched at all.
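As a rough illustration of that counting step, here is a minimal Python sketch. The function name `count_characters` is my own invention for this example, the three sample words stand in for a full top-100 list, and lowercasing everything and skipping spaces inside phrases are assumptions rather than a record of exactly how I tallied things.

```python
from collections import Counter


def count_characters(words):
    """Tally how often each letter or character appears across a word list.

    Assumptions for this sketch: letters are lowercased so that "A" and "a"
    land on the same key, and whitespace inside phrases is ignored.
    """
    counts = Counter()
    for word in words:
        counts.update(ch for ch in word.lower() if not ch.isspace())
    return counts


# A tiny stand-in for a full list of 100 correlated words.
sample_words = ["<3", "shopping", "excited"]
key_counts = count_characters(sample_words)

# Show the keys from most used to least used.
for key, n in key_counts.most_common():
    print(key, n)
```

Each count then becomes the value mapped onto the matching key of the keyboard.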
However, it’s difficult to discern the subtle differences in shading (the 3, 6, and 0 keys are actually shaded in the picture above, but I doubt you can tell even after I’ve mentioned it). So instead I chose to use a cartogram, wherein the size of each key is proportional to the frequency with which it is used.
To determine the size of each key, I divided the number of times that the letter/character appeared by the total number of letters and characters in the list of top 100 words. Then for all the letters, I compared this frequency with the dictionary frequency (the proportion with which the letter appears in the dictionary) in order to see if certain types of people favor (red) or avoid (blue) certain letters more than average. I also pulled out words from the list of 100 that were exemplary of why the keyboard looks the way that it does.
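The arithmetic behind the key sizes and colors can be sketched in a few lines of Python. Everything named here (`key_proportions`, `compare_to_baseline`, the `baseline` dictionary) is hypothetical, and the counts and baseline values are placeholders chosen only to make the example run, not real figures from the project or from any dictionary.

```python
def key_proportions(counts):
    """Divide each character's count by the total number of characters."""
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def compare_to_baseline(proportions, baseline):
    """Label each letter as favored (red), avoided (blue), or average.

    Only characters that have a baseline value are compared, which mirrors
    comparing letters, but not punctuation, against dictionary frequency.
    """
    labels = {}
    for ch, p in proportions.items():
        if ch not in baseline:
            continue
        if p > baseline[ch]:
            labels[ch] = "favored"
        elif p < baseline[ch]:
            labels[ch] = "avoided"
        else:
            labels[ch] = "average"
    return labels


# Placeholder counts and baseline proportions, for illustration only.
counts = {"s": 6, "e": 1, "<": 3}
baseline = {"s": 0.07, "e": 0.11}

proportions = key_proportions(counts)
labels = compare_to_baseline(proportions, baseline)
print(proportions)
print(labels)
```

The proportion sets the size of a key in the cartogram, and the label sets its color.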
A few interesting insights resulted:
- Females favor letters with softer sounds, like s, m, y, and h
- Males favor letters with harsher sounds, like c, n, and k
- People who were more open post lots of quotes, hence the mysterious words of .”- and : “
- For whatever reason, a lot of Tagalog (sana, wala, naman) is highly correlated with being less open, which put a disproportionate emphasis on the a
- More conscientious people use r–e–a–t–g–h–y in combination, in words like ready and great
- Less conscientious and less extraverted people use lots of emoticons involving the >, –, and : keys
- Less conscientious people also share a lot of YouTube videos, as almost a quarter of the top 100 words involved part of a YouTube URL
- More agreeable people heavily favor the letter a
- A full quarter of the top 100 words correlated with less agreeable people were variations of fuck
- More neurotic people share other posts frequently, as 12 of the top 100 words involved the phrase “put this as your status”
See the full presentation of the methodology as well as comparisons between the genders and traits here:
For a more detailed look at individual traits and genders, look here: