Using R to Overanalyze my 2020 Spotify Playlists

Isabella Fons
14 min read · Jan 22, 2021

Spotify Wrapped is undoubtedly one of my favorite parts of the year. Every December, I love seeing everyone around me get so excited about a data product, listening to the top songs of my closest friends, and self-indulging in the aggregations that defined my listening habits for the year.

But currently, Spotify Wrapped focuses on the individual songs a user listens to, rather than the playlists that a user creates. The thing is, though, that I define periods and phases of my life by the playlists I created during them. Listening to a playlist I made junior year of high school instantly takes me back to studying for the SATs and riding the bus to cross country meets. I wanted to indulge a bit more and analyze the playlists, rather than the individual tracks, that defined my 2020.

Luckily, I found the spotifyr package created by Charlie Thompson. This package contained everything I needed to analyze my own playlists, as well as many other features of individual tracks as provided by the Spotify API, such as key, duration, and fun things like danceability.

For this project, I was heavily inspired by Simran Vatsa’s tayloR analysis, which dove deep into the Taylor Swift discography, as well as Caitlin Hudon’s Blue Christmas, which was a search to find “the most depressing Christmas song.” Evidently, there are a lot of really interesting and unique things to do with the spotifyr package, and I’m excited to do more.

My full R code for the project, including how I cleaned the dataset, can be found here.

So, let’s dive into my own version of Spotify Wrapped.

The Questions

The spotifyr package provides an immense amount of information, and at first it was a bit overwhelming to decide exactly what I wanted to extract from my playlists. To narrow it down, I came up with seven questions I wanted to answer over the course of this project. And yes, I chose the number seven after Taylor Swift’s song of the same name:

  1. Which of my playlists is the most negative? The most positive?
  2. Which of my playlists is the most “danceable”?
  3. Which is the most energetic?
  4. Which of my playlists is the most cohesive?
  5. What was the happiest song I put onto a playlist? The most negative?
  6. Which song appeared across the most playlists?
  7. What songs are written in Waltz?

Negativity and Positivity

My first question asks about which of my playlists were the most positive and negative on average, which I was especially interested in this year given how emotional 2020 was. Luckily, the Spotify API gives tracks a valence score, defined as:

“A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).”

Here’s how I found my most negative and most positive playlists of the year:

library(dplyr)
library(knitr)
library(kableExtra)
# Mean valence per playlist, most positive first; uncomment tail(5) for the most negative
all.playlists %>% group_by(pname) %>% summarise(mean_valence = mean(valence)) %>%
arrange(desc(mean_valence)) %>% # tail(5) %>%
kable(col.names = c('Playlist', 'Mean Valence')) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(row = 1:5, background = "#181818", color = "#15b358")
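To see what the grouping step does without needing Spotify credentials, here is a minimal sketch on a made-up data frame. The name toy.playlists and all of its values are invented for illustration; only the column names pname, trackid, and valence mirror the ones used in the post.

```r
library(dplyr)

# Made-up stand-in for all.playlists: one row per (playlist, track) pair
toy.playlists <- data.frame(
  pname   = c("upbeat", "upbeat", "mellow", "mellow", "mixed", "mixed"),
  trackid = c("t1", "t2", "t3", "t4", "t5", "t1"),
  valence = c(0.9, 0.7, 0.2, 0.4, 0.1, 0.9),
  stringsAsFactors = FALSE
)

# Average valence per playlist, sorted from most to least positive
mean.valence <- toy.playlists %>%
  group_by(pname) %>%
  summarise(mean_valence = mean(valence)) %>%
  arrange(desc(mean_valence))

head(mean.valence, 1)  # the most positive playlist
tail(mean.valence, 1)  # the most negative playlist
```

Because the table is sorted in descending order, head() and tail() pick off the two ends of the ranking, which is what switching tail(5) on and off does in the snippet above.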

The top 5 most positive playlists:

Interestingly, my top 5 most positive playlists were not playlists I created to listen to often, but were instead curated for very specific purposes.

  • ‘back in the day’ is my Hannah Montana / High School Musical / Selena Gomez nostalgia playlist
  • ‘rock on’ is 90s music; I made it after watching Little Fires Everywhere
  • ‘my roommates and i are watching a 224 video klaine playlist’ is my… giant Glee playlist that I made when my roommates and I were doing just that
  • ‘my tumblr music player that i found html for in 7th grade’ is 2014 tumblr. Marina and the Diamonds, Lana del Rey, Sky Ferreira.
  • ‘sam!’ is the playlist I made for my Big when she graduated

The bottom 5 most negative playlists:

Now these playlists are more representative of what I listen to on the daily. I am a pretty slow-paced, emotional kind of gal, so you will definitely find me reaching for these playlists more often. For context:

  • ‘written in waltz’ is my attempt at making a playlist full of songs written in three-quarter time. Later on, I’ll use R to simply… do this for me
  • ‘it smells like cross country and i hope it never leaves me!!’ is just 4 Ben Howard songs because I would always listen to Ben Howard on the bus to cross country meets in high school and they remind me of fall. And the smell of fall will always remind me of cross country. It is interesting that one of my most negative playlists is songs I listened to on the bus on the way to cross country meets, but maybe that’s why I rarely raced well.
  • ‘coming of age tunnel: work edition’ is my coming-of-age film score playlist, filled with the scores from Call Me by Your Name, Lady Bird, Little Women, and a few others!
  • ‘because it is there.’ is a playlist I made right after a big job interview I had in January 2020, before everything in the world changed. Its title is a reference to the George Mallory quote.
  • It is actually pretty surprising to me that ‘arches’ is the most negative playlist I made in 2020, as it’s filled with most of my favorite songs such as Dog Years by Maggie Rogers and Stubborn Love by The Lumineers. It’s also called arches because I made it while doing work for my History of Architecture class. But after going back and checking the date I made it—March 22, 2020—it definitely makes sense that I was feeling down at that time.

Danceability

Which of my playlists is best to blast in the living room and jump up and down to with my roommates? A few come to mind, but the question can be answered definitively with Spotify’s measure of danceability, a quantifiable variable described as:

“Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.”

Rock on! I found the most and least danceable playlists of 2020 using the same logic as valence:

all.playlists %>% group_by(pname) %>% summarise(mean(danceability)) %>% 
arrange(dplyr::desc(`mean(danceability)`)) %>% tail(5) %>%
kable(.,col.names = c('Playlist', 'Mean Danceability')) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F) %>%
row_spec(row = 1:5, background = "#181818", color = "#15b358")

The top 5 most danceable playlists:

These playlists are not what I expected… I was sure my ‘back in the day’ Hannah Montana / HSM playlist, the one I jump around to with my roommates, would be up there. But instead…

  • ‘gratitude attitude’ is one of my favorite playlists and is filled with feel-good bops from artists like Maggie Rogers, Taylor Swift, Clairo, and Florence + The Machine. It’s definitely more uptempo and the songs have a good beat strength I would say, so I can see how it was deemed my most danceable playlist
  • ‘rock on’ is, again, my 90s song playlist. It’s mostly Alanis Morissette, so definitely ‘danceable’
  • ‘sam!’ is, again, the playlist I made for my big. My big and I have distinctly opposite music tastes, so this is mostly country (?) and Hilary Duff.
  • ‘maggie rogers + chanel’ is the one that I absolutely did not expect to be on here, but it makes sense. In January 2020, when things were still normal, Maggie Rogers wore vintage Chanel to the Grammys. So it was only fitting that I paired Chanel fashion show tracks with corresponding Maggie Rogers songs, resulting in a unique blend of Maggie Rogers and Chanel. Given that Maggie Rogers is a pretty ‘danceable’ artist by Spotify standards, and that Chanel show tracks are pretty upbeat and have a good rhythm, I can see now how this playlist cracked the top 5 for most danceable playlists.
  • ‘new years eve 2019’ is a collection of songs I would like to play on the dancefloor at my wedding, because on New Year’s Eve 2019 I was at a wedding. Definitely makes sense that my wedding dancefloor songs are… danceable

The bottom 5 least danceable playlists:

It makes sense that none of these playlists are danceable. I won’t go into detail on each, since they are all essentially made up of the same songs in a different order:

  • ‘sunsets in my bedroom,’ ‘arches,’ and ‘seabright’ are mostly just… Hozier
  • ‘floored’ is one I made in December 2020, mostly of pretty little songs that Floor me.
  • ‘the weary world rejoices’ was my attempt at making a Christmas music playlist

Energy

I’m notorious amongst my friend group for listening to low-energy, soft music. From my perspective, I’m definitely a slow, coffee shop music type of listener. Which of my playlists broke my usual mold, and which followed it closely? I used Spotify’s measure of energy to find out. Spotify quantifies energy as:

“Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.”

I used the same logic I used for valence and danceability when finding my highest and lowest energy playlists:

all.playlists %>% group_by(pname) %>% summarise(mean(energy)) %>% 
arrange(dplyr::desc(`mean(energy)`)) %>% #tail(5) %>%
kable(.,col.names = c('Playlist', 'Mean Energy')) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F) %>%
row_spec(row = 1:5, background = "#181818", color = "#15b358")
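Since the valence, danceability, and energy rankings all share the same logic, the three pipelines could be folded into one small helper. The sketch below is only an illustration: rank.playlists and the toy data are hypothetical names and values, not part of the original analysis, and the helper relies on dplyr's .data pronoun to look a column up by its name.

```r
library(dplyr)

# Hypothetical helper: mean of any audio feature per playlist, highest first
rank.playlists <- function(playlists, feature) {
  playlists %>%
    group_by(pname) %>%
    summarise(mean_value = mean(.data[[feature]])) %>%
    arrange(desc(mean_value))
}

# Made-up data with the same column names the post uses
toy.playlists <- data.frame(
  pname        = c("party", "party", "study", "study"),
  danceability = c(0.8, 0.6, 0.3, 0.5),
  energy       = c(0.9, 0.7, 0.2, 0.2),
  stringsAsFactors = FALSE
)

by.dance  <- rank.playlists(toy.playlists, "danceability")
by.energy <- rank.playlists(toy.playlists, "energy")
```

The result of either call can be piped into the same kable() styling used throughout the post.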

Top 5 playlists with the highest energy:

Although in a different order… these are the EXACT same playlists as my top 5 most positive playlists. It does make sense, but it’s interesting to see nonetheless.

The bottom 5 playlists with the lowest energy:

My lowest energy playlists, however, do differ from my most negative playlists. Some highlights for playlists not described earlier:

  • ‘i miss pittsburgh’ is mostly… the Legally Blonde the Musical soundtrack. Not entirely sure how that relates to Pittsburgh but I think it was along the lines of I go to college in Pittsburgh -> Legally Blonde takes place in college -> I miss college
  • ‘everything left to come’ is a playlist I didn’t finish, but is mostly songs from the musical Parade, which my roommate from sophomore year loved
  • ‘jazz and clean sheets because there’s nothing else quite that good’ is an all time favorite! It is my favorite jazz classics—Ella Fitzgerald, Duke Ellington, Frank Sinatra. It’s the perfect low energy playlist for winding down at night.

Cohesion

I originally planned to ask about cohesion after seeing it in Vatsa’s tayloR analysis, mentioned earlier. However, Spotify does not provide its own definition of cohesion from its API, so cohesion would have to be defined on my own terms.

I decided to measure cohesion as the range of valence across a playlist’s tracks: the highest valence minus the lowest. By doing this, I am able to see playlists that lean all negative or lean all positive, as well as playlists that are a bit all over the place. I could also incorporate danceability and energy into this measure, but I just stuck to valence as it encompasses the emotion of a playlist, which is ultimately what I am most interested in.

To measure this, I used a custom function for Range then followed the same steps I had been using for valence, danceability, and energy above:

# Range: absolute spread between a playlist's highest and lowest valence
Range <- function(max, min){
abs(max - min)
}
# Smallest range first, so the most cohesive playlists come out on top
all.playlists %>% group_by(pname) %>% summarise(valence_range = Range(max(valence),min(valence))) %>%
arrange(valence_range) %>% # head(5) %>%
kable(col.names = c('Playlist', 'Range of Valence')) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(row = 1:5, background = "#181818", color = "#15b358")
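Here is a toy run of the same range calculation, with a track count alongside it. This is a minimal sketch on invented numbers; toy.playlists and the n_tracks column are assumptions added for illustration and do not appear in the original code.

```r
library(dplyr)

# Made-up data: a two-track playlist and a four-track playlist
toy.playlists <- data.frame(
  pname   = c("small", "small", "big", "big", "big", "big"),
  valence = c(0.30, 0.35, 0.05, 0.50, 0.95, 0.40),
  stringsAsFactors = FALSE
)

# Range of valence and number of tracks per playlist, most cohesive first
cohesion <- toy.playlists %>%
  group_by(pname) %>%
  summarise(
    n_tracks      = n(),
    valence_range = max(valence) - min(valence)
  ) %>%
  arrange(valence_range)
```

Keeping the track count next to the range makes it easy to check whether a playlist looks cohesive simply because it contains few songs.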

The most cohesive playlists:

Unshockingly, my most cohesive playlists are quite small, with most having 7–9 songs. While ‘it smells like cross country and i hope it never leaves me!!,’ ‘glue,’ and ‘dainty’ are all small playlists with similar soft music, I am most surprised that ‘written in waltz’ makes the cut for one of the most cohesive playlists. It was my attempt at making a waltz playlist, but just because a song is written in three-quarter time doesn’t determine whether it is positive or negative. However, I mostly included Taylor Swift on that playlist, so it makes sense in hindsight.

The least cohesive playlists:

And unsurprisingly, my least cohesive playlists are the larger ones, with ‘cottage core’ having 91 songs. In depth:

  • ‘cottage core’ was made in the dog days of summer when the cottage core trend was taking over TikTok. I made a playlist of songs that remind me of what I would listen to while sewing my own clothes and baking bread, but then Taylor Swift released Folklore. And all of a sudden my once cohesive Sufjan Stevens-Bon Iver-The Lumineers playlist just became… Taylor Swift
  • ‘coming of age tunnel: work edition’ and ‘do u know’ are film score playlists, as mentioned earlier
  • ‘coming of age tunnel’ is essentially a film score playlist because it is songs from the trailers of coming of age movies. For reference, the coming of age tunnel refers to the Fort Pitt Tunnel which can be seen in the tunnel scene from The Perks of Being a Wallflower. It’s also the tunnel I drive through when coming back from the airport to college… very coming-of-age-y I would say

Happiest and Saddest Songs

Departing from analyzing playlists for a bit, I wanted to see which songs were the most and least positive that I added to any playlist. So, not necessarily the most and least positive song I listened to in 2020, but rather the happiest and saddest that were impactful enough to be included on a playlist.

To do this, I found the maximum and minimum valence from all the tracks that were on my 2020 playlists and found the track that had a valence equivalent to that value. Here is what that looked like for the most positive song:

# pull() hands get_track() a plain track ID rather than a one-column data frame
mos.pos = all.playlists %>% filter(valence == max(valence)) %>%
pull(trackid) %>% get_track()
kable(data.frame(mos.pos$name,mos.pos$artists$name),
col.names = c('Song','Artist'), caption = 'Most Positive Song on a 2020 Playlist') %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(row = 1, background = "#181818", color = "#15b358")

‘Crayola Doesn’t Make a Color for Your Eyes’ by Kristin Andreassen is a song I heard at a dance competition when I was 10 years old… and it’s about the cutest song ever. With a beat more along the lines of a nursery rhyme, but words that describe how we feel about the eyes of our crushes, it’s a song that has stuck with me from my childhood to my 20s.

Similarly, this is what that looked like for the most negative song:

The most negative song on a 2020 playlist was Looking Forward by Jon Brion, which is from one of my favorite films, Lady Bird. However, I was finding that most of the songs with lower valence were coming from my playlist “coming of age tunnel: work edition,” which is filled solely with songs from coming-of-age film scores, which lean more negative, slow, and soft in nature. Looking Forward by Jon Brion was one of the songs on said playlist. Out of curiosity, I wanted to find the most negative song *not* on this specific playlist, which resulted in the following filter. I also filtered out a playlist called “do u know,” which is just a saved version of the Call Me by Your Name score:

# Lowest valence among playlists that are not purely film scores
min.not.film = all.playlists %>% filter(pname != 'coming of age tunnel: work edition' & pname != 'do u know') %>%
pull(valence) %>% min()
# the same track is on two playlists, hence the head(1)
neg.not.film = all.playlists %>% filter(valence == min.not.film) %>%
pull(trackid) %>% head(1) %>% get_track()
# [2] picks the second credited artist on the track
kable(data.frame(neg.not.film$name,neg.not.film$artists$name[2]),
col.names = c('Song','Artist'),
caption = 'Most Negative Song on a 2020 Non-Film Score Playlist') %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(row = 1, background = "#181818", color = "#15b358")
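The two lookups above can be tried on made-up data without calling the Spotify API. In this minimal sketch, toy.playlists, the playlist names, and the track IDs are all invented; distinct() stands in for the head(1) trick by collapsing a track that sits on more than one playlist.

```r
library(dplyr)

# Made-up data: track t3 appears on two playlists
toy.playlists <- data.frame(
  pname   = c("film scores", "film scores", "favorites", "favorites", "daily"),
  trackid = c("t1", "t2", "t3", "t4", "t3"),
  valence = c(0.02, 0.30, 0.10, 0.95, 0.10),
  stringsAsFactors = FALSE
)

# Happiest track across every playlist
happiest <- toy.playlists %>%
  filter(valence == max(valence)) %>%
  distinct(trackid, valence)

# Saddest track once the film-score playlist is excluded
excluded <- c("film scores")
saddest.non.film <- toy.playlists %>%
  filter(!pname %in% excluded) %>%
  filter(valence == min(valence)) %>%
  distinct(trackid, valence)
```

Using %in% with a vector of names keeps the filter readable if more playlists need to be excluded later.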

The result, ‘Miroirs: III. Une barque sur l’ocean’ by Andre Laplante, also happens to be from a coming-of-age film score, this time from ‘Call Me by Your Name.’ However, this song in particular made its way onto many of my playlists, so it qualifies as my most negative song not on a solely film score playlist.

Song on the Most Playlists

I wanted to see which song was so important to me that I put it on the most playlists. I feel like this song would be distinctly different from my most listened to song of the year, which is given to me by Spotify Wrapped. The song that appears on the most playlists isn’t just a bop that I put on repeat every so often, but rather a song that evokes a wide range of emotions for me and is applicable in many different emotional contexts.

To find the song that appeared on the most playlists, I used a mode function from Stack Overflow and simply found the mode track ID from my data frame of all tracks and their corresponding playlists:

Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
most_common = get_track(Mode(all.playlists$trackid))
kable(data.frame(most_common$name,most_common$artist$name),
col.names = c('Song','Artist'),
caption = 'Most Common Song on my 2020 Playlists') %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F) %>%
row_spec(row = 1, background = "#181818", color = "#15b358")
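To see how the Mode function behaves, here is a small worked example on a made-up vector of track IDs. The IDs are placeholders, not real Spotify IDs.

```r
# Same Mode function as above: returns the most frequent value in x
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # count each value, keep the most frequent
}

# Placeholder IDs: track_a appears on three playlists
track.ids <- c("track_a", "track_b", "track_a", "track_c", "track_b", "track_a")
most.common <- Mode(track.ids)

# On a tie, which.max() picks the first maximum, so the value seen first in x wins
tie.winner <- Mode(c("track_a", "track_b", "track_b", "track_a"))
```

The tie behavior matters here: if two songs appeared on the same number of playlists, only the one encountered first in the data frame would be reported.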

One of my all-time favorite songs, Dog Years by Maggie Rogers, appears across the most playlists. For reference, my most played song of 2020 was Dog Days Are Over by Florence + The Machine, while Rogers’ Dog Years was fourth. Both songs have ‘dog’ in the title, a fun little coincidence.

Written in Waltz

I adore when songs are written in waltz. I love the unique rhythm and dare I say… danceability of the time signature. A few months ago, I attempted to make a playlist called ‘Written in Waltz’ which featured all the songs I liked that were written in three-quarter time. This proved to be pretty difficult, and it ended up featuring only songs that were blatantly written in waltz, such as Taylor Swift’s ‘Lover’ and ‘Dear John.’ However, the spotifyr package provides an estimated time signature for each song. I decided to just use R to build this playlist for me, featuring songs I already know I will enjoy since I added them to playlists this year.

To do this, I filtered the songs that were written in three-quarter time and added them to a list:

# time_signature == 3 means three beats per bar, i.e. three-quarter time
waltz = all.playlists %>% filter(time_signature == 3)
w.ids = waltz$trackid
w.name = list()
w.artist = list()
# seq_along() stays safe even if no tracks match, unlike 1:length()
for(i in seq_along(w.ids)){
w = get_track(w.ids[i])
w.name = append(w.name,w$name)
w.artist = append(w.artist,w$artists$name[1])
}
waltz.playlist = data.frame(song = unlist(w.name), artist = unlist(w.artist))
# distinct drops songs that appear on more than one playlist
waltz.playlist %>% distinct %>% arrange(song) %>% kable(., caption = 'Songs Written in Waltz') %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
row_spec(row = 1:65, background = "#181818", color = "#15b358")
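The filtering and de-duplication steps can be tried on made-up data without any API calls. This is a minimal sketch that assumes song and artist names are already columns of the data frame, which is not the case in the real data (there they come from get_track). All names and values below are invented.

```r
library(dplyr)

# Made-up data: "Song A" sits on two playlists, "Song B" is in 4/4
toy.playlists <- data.frame(
  pname          = c("a", "a", "b", "b"),
  song           = c("Song C", "Song A", "Song A", "Song B"),
  artist         = c("Artist 2", "Artist 1", "Artist 1", "Artist 3"),
  time_signature = c(3, 3, 3, 4),
  stringsAsFactors = FALSE
)

# Keep three-quarter-time songs, drop repeats, sort by title
waltz.toy <- toy.playlists %>%
  filter(time_signature == 3) %>%
  distinct(song, artist) %>%
  arrange(song)
```

Each song appears once in the result even if it was added to several playlists.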

And here is the resulting playlist!

Because I was attempting to make this playlist anyway, and just figured out how to use R to do it for me, you can find my waltz playlist here:

And there we have it! A complete deep dive into my own 2020 Spotify playlist habits using R. I absolutely loved doing this analysis to learn more about myself, the playlists I make, and of course the songs I like that are written in waltz.

I’m always happy to talk about Spotify, R, or both, so please reach out at any time!

Cheers,

Isabella
