Taylor Swift Data Analysis : Is Taylor Swift’s Song Making Your Mood?

To conclude my data analytics professional certification learning path, I am doing my own case study project. Taylor Swift’s new album “Midnights” was released on October 21, 2022, so I chose to analyse Taylor Swift’s songs on Spotify. Hi, Taylor. Look what you made me do: I conducted an R data analysis study of your masterpieces. Let’s connect on Kaggle.

Disclaimer: Because this is my first case study, the analysis will not be as polished as it could be, but I will do my best to describe the data analysis process. This project was inspired by the work of Rhisika Panwar, Joy Pham, and Aneri Dalwadi. It also allows me to develop my R fundamentals and data analytics skills.

Limitations: This analysis is limited to Taylor Swift song streams on Spotify from 2017 to 2021.


Taylor Swift has 33.2 billion streams and 58.4 million monthly listeners, placing her among the top 10 artists in Spotify history. Her fans come from all over the world and eagerly await each new album release. This analysis aims to learn more about what Taylor Swift’s listeners prefer, based on the songs’ characters.

Before we jump into the song character analysis, here is an explanation of each song character variable (source: Spotify developer documentation):

Danceability: The danceability of a song is driven by a variety of musical aspects, including pace, rhythm stability, beat power, and overall regularity. A value of 0.0 is the least danceable, while 1.0 is the most danceable.

Energy: A measure ranging from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks have a fast, loud, and noisy feel to them. Death metal, for example, has a high energy level, whereas a Bach prelude has a low energy level. This attribute is influenced by perceptual characteristics such as dynamic range, perceived loudness, timbre, onset rate, and general entropy.

Valence: A scale from 0.0 to 1.0 that describes the musical positivity conveyed by a track. Tracks with a high valence sound more positive (for example, happy, cheerful, and euphoric), whereas tracks with a low valence sound more negative (for example, sad, depressed, and angry).

Tempo: A track’s overall estimated tempo in beats per minute (BPM). Tempo is the speed or pace of a piece in musical terms, and it is derived directly from the average beat duration.

Popularity: The track’s popularity. The popularity is determined by an algorithm and is based primarily on the total number of plays the track has received and how recent those plays are. In general, songs that are being played a lot now will be more popular than songs that were played a lot in the past. Track popularity is used to calculate artist and album popularity. It is important to note that the popularity value may lag behind actual popularity by a few days because it is not updated in real time.

Ask Phase

Are Taylor Swift’s songs suitably danceable? Do they boost the listener’s happy vibes? Let’s break those questions down into the following topics:

  • Top 10 Taylor Swift Songs Streamed Worldwide 2017-2021
  • Listener preferences for the top ten most streamed Taylor Swift songs based on song character
  • Taylor Swift’s Top 10 Most-Streamed Songs’ Song Character and Total Streams Correlation

Prepare Phase

Taylor Swift’s song stream data on Spotify 2017-2021 and the Taylor Swift singles and discography data are the data sources for this project. The datasets are hosted on GitHub and Kaggle. Because the stream data from 2017 to 2021 is so large (3 gigabytes), I used Power Query to extract only the data I needed by converting the CSV file into a connection file (I don’t have access to MySQL yet, and the SQL sandbox has a restricted dataset size). To access the raw datasets, see the SpotifyCharts Dataset and the Taylor Swift Song Characteristic dataset.

Process Phase

During the process phase, Excel (Power Query), Tableau, and R are used to cleanse, aggregate, analyze, and visualize the data. This analysis made use of the following R packages:

  • dplyr
  • janitor
  • viridis
  • tidyverse
  • ggplot2
  • ggridges
  • data.table
  • ggside

Installing Packages and Opening Libraries
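
A minimal sketch of this step, assuming the packages listed above plus readxl and reshape2. Those two are an assumption: they are not in the list, but read_excel() and melt() are called later in the analysis. The install line is left commented out because it needs internet access.

```r
# Packages from the list above, plus readxl and reshape2 (assumed,
# because read_excel() and melt() are used later in the analysis).
pkgs <- c("dplyr", "janitor", "viridis", "tidyverse", "ggplot2",
          "ggridges", "data.table", "ggside", "readxl", "reshape2")

# Check which packages are installed, without attaching them yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Install only what is missing (uncomment to run).
# install.packages(pkgs[!installed])

# Attach every package that is available.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Any package that is still missing after this step shows up as FALSE in `installed`.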


Importing datasets

library(readxl)  # provides read_excel()
Tswift_Spotify_Streams <- read_excel("../input/tswift-spotify-streams/Tswift_Spotify_Streams.xlsx")
Tswift_song_characteristics <- read_excel("../input/tswift-song-characteristics/Tswift__song_characteristics.xlsx")
Tswift_song_characteristics_by_year <- read_excel("../input/tswift-song-characteristics-by-year/Tswift__song_characteristics_by_year.xlsx")

Preview Datasets

Cleaning and formatting

#change column name

colnames(Tswift_Spotify_Streams)[colnames(Tswift_Spotify_Streams) == "Column1"] ="year"
colnames(Tswift_song_characteristics)[colnames(Tswift_song_characteristics) == "name"] ="title"
colnames(Tswift_song_characteristics_by_year)[colnames(Tswift_song_characteristics_by_year) == "name"] ="title"
#change year column type into factor
Tswift_Spotify_Streams$year <- as.factor(Tswift_Spotify_Streams$year)
Tswift_song_characteristics_by_year$year <- as.factor(Tswift_song_characteristics_by_year$year)
#make sure danceability is stored as a number
Tswift_song_characteristics$danceability <- as.numeric(Tswift_song_characteristics$danceability)

#choose selected column

Tswift_Spotify_Streams = select(Tswift_Spotify_Streams, -2, -3, -4)

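#remove the empty rows (remove_empty() returns a new data frame, so assign the result back)

Tswift_Spotify_Streams <- Tswift_Spotify_Streams %>% remove_empty(which = "rows")
Tswift_song_characteristics <- Tswift_song_characteristics %>% remove_empty(which = "rows")
Tswift_song_characteristics_by_year <- Tswift_song_characteristics_by_year %>% remove_empty(which = "rows")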
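
A small sketch of why the cleaned result has to be assigned to a variable. It uses made-up data and a hypothetical base R helper, drop_empty_rows(), as a stand-in for janitor's remove_empty(), so it runs without extra packages. Calling the helper on its own only returns a cleaned copy, and the original data frame stays as it was.

```r
# Made-up data with one completely empty row (row 2).
toy <- data.frame(a = c(1, NA, 3), b = c("x", NA, "z"))

# Hypothetical helper: keep rows that have at least one non-missing value.
drop_empty_rows <- function(d) d[rowSums(!is.na(d)) > 0, , drop = FALSE]

# Without assignment the cleaned copy is only printed; `toy` is unchanged.
drop_empty_rows(toy)
rows_before <- nrow(toy)

# With assignment the cleaned copy is kept.
cleaned <- drop_empty_rows(toy)
rows_after <- nrow(cleaned)
```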

Merging Datasets

For the dataset merging step, see the section on listener preferences for the top 10 most streamed songs below.

Analyze and Share Phase

The figure below depicts the total streams of Taylor Swift songs on Spotify from 2017 to 2021 globally (I used Tableau Public to visualize this data):

According to the visualization above, American listeners dominate the audience for Taylor Swift’s songs.

I created a new dataframe that groups the Taylor Swift streams data by title and year of streaming. Let’s see in which year Taylor Swift’s songs were streamed the most. Here is a visualisation of the total number of Taylor Swift song streams from 2017 to 2021.

Tswift_Spotify_Stream_by_title <- Tswift_Spotify_Streams %>% group_by(title, year) %>% summarise(streams = sum(streams))
Tswift_Spotify_Stream_by_title %>% group_by(year) %>% summarise(Total_Streams = sum(streams/1000000)) %>% ggplot(., aes(x = year, y = Total_Streams, fill = year)) + scale_fill_viridis(discrete = T) + labs(title= "Taylor Swift Spotify Streams 2017-2021", x="Year",y="Total Streams (Mio)") + geom_col() + theme_light()

Listeners streamed Taylor Swift’s songs more in 2020 than in any other year. Let’s get more specific: I’d like to know which ten Taylor Swift songs were the most streamed on Spotify globally between 2017 and 2021.

Top 10 Taylor Swift Songs Streamed Worldwide 2017-2021

To identify the top ten most streamed Taylor Swift songs on Spotify globally, I sum the streams for each title in the Tswift_Spotify_Stream_by_title dataset, sort the totals in descending order, and slice the top ten.

Tswift_top_10_Global <- Tswift_Spotify_Stream_by_title %>% group_by(title) %>% summarise(total_streams = sum(streams)) %>% arrange(desc(total_streams)) %>% slice(1:10)
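
The same sum, sort, and slice pattern can be sketched on a tiny made-up table. This version uses base R only (aggregate(), order(), head()) so it runs without dplyr. The titles and stream counts are invented for illustration.

```r
# Made-up stream counts per title and year (not the real dataset).
toy_streams <- data.frame(
  title   = c("Song A", "Song A", "Song B", "Song B", "Song C"),
  year    = c(2019, 2020, 2019, 2020, 2020),
  streams = c(100, 250, 300, 100, 500)
)

# Sum streams per title (base R counterpart of group_by + summarise).
totals <- aggregate(streams ~ title, data = toy_streams, FUN = sum)

# Sort by total streams, descending, and keep the top 2 rows
# (counterpart of arrange(desc(...)) + slice(1:2)).
top_songs <- head(totals[order(-totals$streams), ], 2)
print(top_songs)
```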

To get a breakdown of total streams by year for each title, I created a new dataset by filtering the grouped stream data down to the ten titles in the top 10 global dataset.

Tswift_top_10_songs_year <- Tswift_Spotify_Stream_by_title %>% select(title, year, streams) %>% filter(title %in% c("Look What You Made Me Do", "ME! (feat. Brendon Urie of Panic! At The Disco)", "You Need To Calm Down", "willow", "Lover", "cardigan", "Delicate", "...Ready For It?", "exile (feat. Bon Iver)", "All Too Well (10 Minute Version) (Taylor's Version) (From The Vault)")) 
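
As an alternative to typing the ten titles by hand, the filter can take its titles from the top 10 table itself. Below is a minimal base R sketch on made-up data. Here top_titles is a stand-in for the title column of the top 10 dataset, and all names and values are invented.

```r
# Made-up per-year stream data (not the real dataset).
toy_by_year <- data.frame(
  title   = c("Song A", "Song A", "Song B", "Song C", "Song C"),
  year    = c(2019, 2020, 2020, 2019, 2020),
  streams = c(100, 250, 400, 300, 200)
)

# Stand-in for the title column of the top 10 table.
top_titles <- c("Song A", "Song C")

# Keep only the rows whose title appears in the top list.
top_by_year <- toy_by_year[toy_by_year$title %in% top_titles, ]
print(top_by_year)
```

Taking the titles from the table means the filter cannot fall out of step with the ranking if the data changes.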

From 2017 to 2021, here is a visual representation of the top ten most streamed Taylor Swift songs.

ggplot(Tswift_top_10_songs_year, aes(fill = year, y = title, x = streams)) + geom_col(position = "stack") + scale_fill_viridis(discrete = TRUE) + labs(title = "Taylor Swift Top 10 Spotify Streams 2017-2021", x = "Total Streams", y = "Song Title")

Listener preferences for Taylor Swift songs based on the top 10 most streamed songs

To determine listener preferences, I examined the song characters of the top 10 songs by merging the song characteristics dataset with the top 10 global dataset:

Tswift_top_10_song_characters <- merge(Tswift_song_characteristics, Tswift_top_10_Global, by="title")
Tswift_top_10_song_characters %>% mutate(total_streams_in_million = total_streams/1000000) %>% ggplot(aes(y = energy, x = valence, color = title, size = total_streams_in_million)) + geom_point(alpha = 0.5) + scale_size(range = c(2, 20), name="Total_Streams_in_Mio") + scale_color_viridis(discrete = TRUE) + labs(title = "Energy vs Valence", caption = "Is Taylor Swift's song encouraging cheerful, euphoric, and happy feelings?") + ylab("Energy") + xlab("Valence") + theme_light ()
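
One thing worth checking after a merge like this: by default merge() keeps only the titles that appear in both tables with exactly the same spelling, so a small difference in a title silently drops that song. A minimal sketch with made-up data:

```r
# Made-up song characteristics (not the real dataset).
characteristics <- data.frame(
  title  = c("Song A", "Song B", "Song C"),
  energy = c(0.8, 0.5, 0.6)
)

# Made-up top list; "song b" differs from "Song B" only in letter case.
top_songs <- data.frame(
  title         = c("Song A", "song b", "Song C"),
  total_streams = c(500, 400, 300)
)

# Default merge is an inner join on the `by` column.
merged <- merge(characteristics, top_songs, by = "title")

# Titles from the top list that did not survive the merge.
unmatched <- setdiff(top_songs$title, merged$title)
print(unmatched)
```

If the unmatched set is not empty for the real data, the merged table has fewer than ten songs.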

The top ten most streamed Taylor Swift songs show high energy and valence. Moreover, the higher a song’s energy and valence, the higher its stream count, which suggests that most listeners prefer Taylor Swift’s energetic, high-valence songs. Are all of these characteristics really related? Let’s look more closely.

The Song Characters Correlation with Total Streams of Taylor Swift’s Top 10 Most-Streamed Songs

Let’s remove the columns that are not song characters so that we can compute the correlation between each pair of song character variables.

Tswift_corr_matrix <- select(Tswift_song_characteristics_by_year, -1, -2)

Change the total number of streams to millions

# assumption: convert streams to millions, mirroring the conversion used in the scatter plot later on
Tswift_corr_matrix <- Tswift_corr_matrix %>% mutate(streams_in_million = streams/1000000)

I remove the streams column because two columns now show the total streams (streams and streams in millions).

Tswift_corr_matrix <- select(Tswift_corr_matrix, -streams)
Tswift_cormat <- round(cor(Tswift_corr_matrix), 2)

Now obtain the correlation matrix’s lower and upper triangles.

# return the matrix with the lower triangle set to NA (keeps the upper triangle)
get_upper_tri <- function(cormat) { cormat[lower.tri(cormat)] <- NA; cormat }
# return the matrix with the upper triangle set to NA (keeps the lower triangle)
get_lower_tri <- function(cormat) { cormat[upper.tri(cormat)] <- NA; cormat }
upper_tri <- get_upper_tri(Tswift_cormat)
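
To see what the upper-triangle step does, here is a sketch on made-up values for three song characters, using base R only. A correlation matrix is symmetric, so blanking the lower triangle removes the duplicated half before plotting.

```r
# Made-up values for three song characters (illustration only).
toy <- data.frame(
  energy  = c(0.2, 0.4, 0.6, 0.8),
  valence = c(0.1, 0.5, 0.4, 0.9),
  tempo   = c(120, 100, 140, 90)
)

# Pearson correlation matrix, rounded to two decimals.
cormat <- round(cor(toy), 2)

# Blank out the entries below the diagonal; the diagonal and the
# upper triangle are kept.
upper <- cormat
upper[lower.tri(upper)] <- NA
print(upper)
```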

Insert the correlation into the heatmap.

library(reshape2)  # provides melt() for matrices
melted_cormat <- melt(upper_tri, na.rm = TRUE)
ggplot(data = melted_cormat, aes(Var2, Var1, fill = value)) +
 geom_tile(color = "white") +
 scale_fill_gradient2(low = "yellow", high = "navy", mid = "hotpink4",
   midpoint = 0, limits = c(-1, 1), space = "Lab",
   name = "Pearson\nCorrelation") +
 labs(title = "Top 10 Songs' Characteristics vs Total Streams", caption = "Taylor Swift Spotify Data 2017-2021") +
 theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 12, hjust = 1))

The song character variables show a positive correlation with total streams, in particular energy, valence, tempo, and popularity.


According to the correlation analysis, the most streamed songs have a high level of energy, valence, tempo, and popularity. Valence, energy, and popularity are all strongly linked. There is a positive correlation between energy and valence, which suggests that energetic songs tend to convey cheerful, euphoric, and happy feelings.

According to the findings, Taylor Swift songs from 2017 to 2021 which are fast (tempo), loud, and noisy (energetic) are more likely to be streamed by Spotify listeners.

Tswift_song_characteristics_by_year %>% mutate(Streams_in_million = streams/1000000) %>% ggplot(aes(y = popularity, x = tempo, color = year, size = Streams_in_million)) + geom_point(alpha = 0.5) + scale_size(range = c(2, 20), name="Streams in millions") + scale_color_viridis(discrete = TRUE) + labs(title = "Popularity vs Tempo", caption = "Top 10 Taylor Swift Songs on Spotify 2017-2021") + ylab("Popularity") + xlab("Tempo (BPM)") + theme_light()


Taylor Swift’s songs make the listener feel upbeat, euphoric, and content. As a result, Taylor Swift’s songs are enjoyable to listen to.

This study set out to analyze the top ten most streamed Taylor Swift songs on Spotify from 2017 to 2021. Along the way, I found that 2018 had the lowest stream count and that the audience is dominated by American listeners. To dig deeper and grow the audience in other regions, I propose a follow-up analysis of listeners by region, lyrics, and album, so that Taylor Swift song recommendations are in line with the audience’s preferences.

I’m looking forward to doing an R data analysis of Taylor Swift’s latest album, “Midnights.” Will the songs’ character continue to elicit positive emotions?

PS. Taylor Swift breaks two records with “Midnights,” becoming the most-streamed artist on Spotify in October 2022.

Let’s connect on Kaggle: R-Taylor Swift Songs Spotify Stream Preference
B. Economics MBA-Entrepreneurship Data Analytics Certified Reach me at saraswatisepti@gmail.com
