Data scraping with R : DisneyPlusID vs NetflixID tweets

Data scraping is a method of extracting information from websites, documents, and other data sources using analytical techniques. Data scraping can be conducted in R to support marketing and social media research, here with a focus on text analysis. In this post, I will analyze Disney+ Indonesia and Netflix Indonesia by scraping their tweets with R. Disney and Netflix are increasing their investment in original content in order to better engage their audiences. By studying the popularity of their tweets, we can learn about the type of text content they share and how they frame it. This article covers only the basic concepts of scraping Twitter data with R and does not go into detail.

Data Scraping: Connect to the Twitter API

Please make sure you have Twitter API credentials before scraping data. If you do not have them yet, read this post to learn how to get them.

Let’s get started with the data scraping in RStudio:

Authenticate with your Twitter API keys to begin the data scraping process.

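# Replace the placeholders with your own credentials
api_key <- 'Your API Key'
api_secret <- 'Your API Secret'
access_token <- 'Your Access Token'
access_token_secret <- 'Your Token Secret'

# setup_twitter_oauth() comes from the twitteR package, which must be installed first
twitteR::setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)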
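Typing the keys directly into a script makes them easy to leak when the script is shared. As a minimal sketch, assuming the four credentials are stored in environment variables (for example in `~/.Renviron`), they can be read at run time instead. The helper `read_twitter_credentials()` and the variable names such as `TWITTER_API_KEY` are hypothetical and not part of twitteR.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables instead of typing them into the script.
# The variable names are assumptions; use whatever names you have set.
read_twitter_credentials <- function(
    vars = c(api_key             = "TWITTER_API_KEY",
             api_secret          = "TWITTER_API_SECRET",
             access_token        = "TWITTER_ACCESS_TOKEN",
             access_token_secret = "TWITTER_ACCESS_TOKEN_SECRET")) {
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)

  # Fail early with a readable message if anything is not set
  missing_vars <- vars[values == ""]
  if (length(missing_vars) > 0) {
    stop("Missing environment variables: ",
         paste(missing_vars, collapse = ", "))
  }
  as.list(values)
}

# Usage (needs the variables to be set and twitteR to be installed):
# creds <- read_twitter_credentials()
# twitteR::setup_twitter_oauth(creds$api_key, creds$api_secret,
#                              creds$access_token, creds$access_token_secret)
```

The helper stops with an error that lists the unset variables, so a missing key is caught before any request is sent.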

Install the required packages and libraries

# Install the packages once (tidyverse already includes dplyr, tibble, purrr and ggplot2)
install.packages(c("rtweet", "httpuv", "twitteR", "tm", "tidyverse", "tidytext"))

# Load the libraries
library(rtweet)
library(twitteR)    # setup_twitter_oauth(), userTimeline()
library(tm)
library(tidyverse)  # pipes, map_df(), ggplot()
library(tidytext)   # unnest_tokens(), stop_words

Comparing popularity by users (DisneyPlusID vs NetflixID)

To scrape Twitter data, we must supply the username of each account. In this analysis, I’ll use the Disney Plus Indonesia “DisneyPlusID” and Netflix Indonesia “NetflixID” Twitter accounts as the objects.

Now that API authentication is active, we can scrape each account's timeline and then convert the results into a data frame.

Note: This data was collected on November 15, 2022.

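# Request up to 3,200 tweets per timeline, then convert the list of tweets to a tibble
# (as_tibble() replaces the deprecated tbl_df())
netflixid_tweets <- userTimeline("NetflixID", n = 3200)
netflixid_df <- as_tibble(map_df(netflixid_tweets, as.data.frame))

disneyplusid_tweets <- userTimeline("DisneyPlusID", n = 3200)
disneyplusid_df <- as_tibble(map_df(disneyplusid_tweets, as.data.frame))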

Data Scraping Results

Combined Tweets

To compare the popularity of text-based tweets from DisneyPlusID and NetflixID, we must combine both data frames.

combined.tweets <- rbind(netflixid_df, disneyplusid_df)

Let’s compare the popularity of tweets based on favorite and retweet counts.

combined.tweets %>%
  ggplot(aes(x = log(favoriteCount), y = log(retweetCount), colour = screenName)) +
  geom_point()

Figure: DisneyPlusID vs NetflixID tweet popularity

According to the plot, NetflixID's tweets attract more favorites and retweets than DisneyPlusID's.
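One caveat for the plot above: `log(0)` is `-Inf`, so a tweet with zero favorites or zero retweets has no finite position on a log scale. A small sketch on made-up counts (not the scraped data) shows the difference between `log()` and `log1p()`, which computes `log(1 + x)`. Using `log1p()` here is a suggestion, not what the original plot used.

```r
# Made-up counts for illustration only (not the scraped data)
toy <- data.frame(
  screenName    = c("NetflixID", "NetflixID", "DisneyPlusID"),
  favoriteCount = c(0, 150, 12),
  retweetCount  = c(0, 40, 3)
)

log(toy$favoriteCount)    # first value is -Inf because log(0) is not finite
log1p(toy$favoriteCount)  # log(1 + x): first value is 0

# The same kind of plot with log1p() on both axes (needs ggplot2):
# ggplot(toy, aes(x = log1p(favoriteCount), y = log1p(retweetCount),
#                 colour = screenName)) +
#   geom_point()
```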

Descriptive Statistics

To back up the findings, we can use descriptive statistics to determine the mean, median, and maximum number of favorite counts.

combined.tweets %>%
  group_by(screenName) %>%
  summarise(
    mean_favorites   = mean(favoriteCount),
    median_favorites = median(favoriteCount),
    max_favorites    = max(favoriteCount)
  )
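Reporting the median next to the mean matters because a single viral tweet can pull the mean far above what a typical tweet receives. The sketch below uses made-up favorite counts (not the scraped data) to show that effect; the column names in `summarise()` are chosen for illustration.

```r
library(dplyr)

# Made-up counts for illustration only (not the scraped data)
toy <- tibble(
  screenName    = c("NetflixID", "NetflixID", "NetflixID",
                    "DisneyPlusID", "DisneyPlusID"),
  favoriteCount = c(10, 200, 30, 5, 15)
)

toy_summary <- toy %>%
  group_by(screenName) %>%
  summarise(
    tweets           = n(),
    mean_favorites   = mean(favoriteCount),
    median_favorites = median(favoriteCount),
    max_favorites    = max(favoriteCount)
  )

toy_summary
```

In this toy data the one tweet with 200 favorites lifts the NetflixID mean to 80 while its median stays at 30.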

Brand Voice Through Textual Analysis

A brand's voice and persona often show in how it delivers its message to engage its audience, for instance in the words and sentences used in each post. Textual analysis in R gives a glimpse of that brand voice.
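The core step of this kind of analysis is tokenization: splitting each tweet into one word per row and then counting the words. A minimal sketch on two made-up tweets, assuming the dplyr and tidytext packages are installed. The output column is named `word` here for readability, whereas the code in the following sections keeps the name `text`.

```r
library(dplyr)
library(tidytext)

# Made-up tweets for illustration only
toy_tweets <- tibble(
  id   = c("1", "2"),
  text = c("Streaming sekarang di Disney", "Serial baru, streaming sekarang!")
)

# One lowercase word per row; punctuation is dropped
toy_words <- toy_tweets %>%
  unnest_tokens(word, text)

# Frequency of each word across all tweets
toy_counts <- toy_words %>%
  count(word, sort = TRUE)

toy_counts
```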

Textual Analysis DisneyPlusID

disneyplusid.word <- disneyplusid_df %>% 
  select(id, text) %>% 
  unnest_tokens(text, text)
head(disneyplusid.word)

disneyplusid.count <- disneyplusid.word %>% 
  count(text, sort = TRUE) %>% 
  head(30) %>% 
  mutate(text = reorder(text, n))

disneyplusid.count %>%
  ggplot(aes(x = text, y = n)) + 
  geom_col() +
  coord_flip() + 
  theme_minimal()

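As we can see, some of the words are only fragments of links (“https”, “t.co”) or carry no real meaning on their own, such as “di” (the Indonesian preposition for “in” or “at”). We should therefore remove them, together with the English stop words that tidytext provides in stop_words.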

new_items <- c("https", "t.co", "di")

stop_words_new <- stop_words %>%
  pull(word) %>%
  append(new_items)

disneyplusid.count <- disneyplusid.word %>% 
  filter(!text %in% stop_words_new) %>%
  count(text, sort = TRUE) %>% 
  head(30) %>% 
  mutate(text = reorder(text, n))

ggplot(disneyplusid.count, aes(x = text, y = n)) + 
  geom_col() +
  coord_flip() +
  theme_minimal()
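Filtering out “https” and “t.co” removes the two most visible link fragments, but the random identifier at the end of each short link can still survive as a token. An alternative, sketched here on made-up text with base R only, is to strip the links before tokenizing. The regular expression is an assumption about how links appear in the tweet text.

```r
# Made-up tweet text for illustration only
toy_text <- c("Nonton sekarang https://t.co/abc123", "Serial baru di sini")

# Drop everything from http:// or https:// up to the next whitespace,
# then collapse the leftover whitespace
no_links <- gsub("https?://\\S+", "", toy_text, perl = TRUE)
no_links <- trimws(gsub("\\s+", " ", no_links, perl = TRUE))

no_links
```

In the pipeline above, the same `gsub()` call could be applied to the `text` column with `mutate()` before `unnest_tokens()`.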

Textual Analysis NetflixID

netflixid.word <- netflixid_df %>% 
  select(id, text) %>% 
  unnest_tokens(text, text)

head(netflixid.word)

netflixid.count <- netflixid.word %>% 
  count(text, sort = TRUE) %>% 
  head(30) %>% 
  mutate(text = reorder(text, n))

netflixid.count %>%
  ggplot(aes(x = text, y = n)) + 
  geom_col(fill = "#D81F26") +
  coord_flip() + 
  theme_minimal()

As with DisneyPlusID, some of the words are only fragments of links or carry no real meaning on their own, so we remove the same set of words.

new_items <- c("https", "t.co", "di")

stop_words_new <- stop_words %>%
  pull(word) %>%
  append(new_items)

netflixid.count <- netflixid.word %>% 
  filter(!text %in% stop_words_new) %>%
  count(text, sort = TRUE) %>% 
  head(30) %>% 
  mutate(text = reorder(text, n))

ggplot(netflixid.count, aes(x = text, y = n)) + 
  geom_col(fill = "#D81F26") +
  coord_flip() +
  theme_minimal()
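The DisneyPlusID and NetflixID sections run the same steps twice, so the pipeline could be wrapped in a function. This is a sketch, not the code used for the results above: `top_words()` and its arguments are hypothetical names, and it assumes a data frame with a `text` column plus the dplyr and tidytext packages.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper (not from the original analysis): the most frequent
# words in a data frame with a `text` column, after dropping stop words.
top_words <- function(tweets_df, extra_stop_words = character(), n_words = 30) {
  to_drop <- c(stop_words$word, extra_stop_words)
  tweets_df %>%
    select(text) %>%
    unnest_tokens(word, text) %>%
    filter(!word %in% to_drop) %>%
    count(word, sort = TRUE) %>%
    head(n_words)
}

# Made-up tweets for illustration only
toy_tweets <- tibble(
  text = c("Serial baru di Netflix", "Serial original di Netflix")
)
toy_top <- top_words(toy_tweets, extra_stop_words = c("https", "t.co", "di"))
toy_top
```

With the data frames from this post, the call would look like `top_words(disneyplusid_df, c("https", "t.co", "di"))`, and the result can be passed to the same `ggplot()` code after reordering the words by count.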

Conclusion

In conclusion, I found that NetflixID's text-based Twitter content is more popular and engaging than DisneyPlusID's. The most frequently used words also differ between the two accounts and point to different brand voices. NetflixID emphasizes audience engagement by using more informal words like “nggak, kalo, bikin, bakal, lagi, apa,” whereas DisneyPlusID emphasizes its product or service and tagline by frequently using words like “streaming, Disney Plus HD, ekslusif, serial, original, Marvel”.

As mentioned in the introduction, this is not a deep analysis. In a future post, I will go into greater detail about data scraping, cleaning text datasets, and text analysis. Thank you for reading my post. Please share it with others and let me know your thoughts in the comments, or don’t hesitate to say “Hi”.

Read : Taylor Swift Data Analysis : Is Taylor Swift’s Song Making Your Mood?

Let's connect

ssaras

medium.com/@hai.ssaras

kaggle.com/septisaraswati

github.com/haissaras
