Visualization Exercise

Author

Leah Lariscy

Political donation data from FiveThirtyEight

Data is from the FiveThirtyEight article Inside The Political Donation History Of Wealthy Sports Owners

Load packages

library(tidyverse) #for data cleaning
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.1.8
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
library(here) #for setting file paths
here() starts at /Users/leahlariscy/Desktop/MADA2023/leahlariscy-MADA-portfolio
library(ggthemes) #for loading ggplot themes
library(skimr) #for skimming data sets

Set figure size

knitr::opts_chunk$set(fig.width=10, fig.height=8) 

Load data

You will see that the data set includes the following variables -> Owner,Team,League, Recipient, Amount, Election Year, and Party

here()
[1] "/Users/leahlariscy/Desktop/MADA2023/leahlariscy-MADA-portfolio"
donation <- read_csv("data/sports-political-donations/sports-political-donations.csv")
Rows: 2798 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Owner, Team, League, Recipient, Amount, Party
dbl (1): Election Year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skim(donation)
Data summary
Name donation
Number of rows 2798
Number of columns 7
_______________________
Column type frequency:
character 6
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Owner 0 1 9 43 0 158 0
Team 0 1 9 59 0 115 0
League 0 1 3 14 0 16 0
Recipient 0 1 3 96 0 1274 0
Amount 0 1 3 10 0 244 0
Party 0 1 3 33 0 7 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Election Year 0 1 2017.93 1.6 2016 2016 2018 2020 2020 ▇▁▇▁▇

Clean and wrangle data

These are the following steps I took to prepare the data.

  1. Filter for rows containing only the 6 leagues, leaving out ones that were cross-listed among many leagues. This removed a large amount of the data, meaning that a majority of team owners own multiple teams across various leagues.

  2. Filter for rows containing only democrat or republican, leaving out bipartisan donations. I would have left these in but I had issuing renaming all variations to just bipartisan.

  3. Parse the numerical values from Amount so it can then act as a numerical.

  4. Group by League, Election Year, and Party

  5. Sum the Amount of each group to find the total dollar amount of donations from each of the 6 leagues to each of the two parties in a given year.

  6. Ungroup. Data is now ready to plot.

donation_clean <- donation %>% 
  filter(League ==c("MLB","NASCAR","NBA","NFL","NHL","WNBA")) %>% 
 
  filter(Party ==c("Democrat","Republican")) %>% 
  
  mutate(Amount = parse_number(Amount)) %>% 
  
  group_by(League,`Election Year`,Party) %>% 
  
  summarise(party_donations = sum(Amount)) %>% 
  
  ungroup() 
Warning: There was 1 warning in `filter()`.
ℹ In argument: `League == c("MLB", "NASCAR", "NBA", "NFL", "NHL", "WNBA")`.
Caused by warning in `League == c("MLB", "NASCAR", "NBA", "NFL", "NHL", "WNBA")`:
! longer object length is not a multiple of shorter object length
Warning: There was 1 warning in `filter()`.
ℹ In argument: `Party == c("Democrat", "Republican")`.
Caused by warning in `Party == c("Democrat", "Republican")`:
! longer object length is not a multiple of shorter object length
`summarise()` has grouped output by 'League', 'Election Year'. You can override
using the `.groups` argument.
skim(donation_clean)
Data summary
Name donation_clean
Number of rows 32
Number of columns 4
_______________________
Column type frequency:
character 2
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
League 0 1 3 6 0 6 0
Party 0 1 8 10 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Election Year 0 1 2018.0 1.68 2016 2016 2018 2020 2020 ▇▁▇▁▇
party_donations 0 1 138422.2 379996.99 1600 9250 18450 65300 1808600 ▇▁▁▁▁

Plot data

donation_plot <- donation_clean %>%
ggplot(aes(fill=Party, y=party_donations, x=`Election Year`)) +

  geom_bar(position = "fill", stat = "identity", color = "white") + #basic geometry of plot, bar plot

  facet_wrap(as.factor(donation_clean$League)) + #plot by individual League

  scale_fill_manual(name="DONATIONS TO", values=c("#00A5E3", "#CC0000"), #looked up hexcodes on google 
  labels=c("DEMOCRATS","REPUBLICANS")) + #fill colors in correct order

  theme_fivethirtyeight() + #theme from fivethirtyeight website

  scale_x_continuous(breaks = seq(2016,2020,2)) + #2 year breaks between 2016 & 2020

  labs(title = "Across leagues, majority of donations go to Republicans",
  subtitle = "Share of donations from team owners in six leagues, per year, league and party") +

  theme(legend.title = element_text(face = "bold", size = 16),
  legend.text = element_text(size = 16),
  plot.title.position = "plot",
  plot.title = element_text(face = "bold", size = 24),
  plot.subtitle = element_text(size = 18),
  legend.position = "top",
  axis.text.x = element_text(size = 14))
  donation_plot

You will notice that the proportions on my plot are slightly different from the original. This is because I excluded all cross-listed league affiliations (for the sake of time, but in the future I would like to be able to parse these out into individual observations). I also excluded bipartisan donations and NAs because I was having trouble renaming these all to bipartisan. I would also like to work on this, as I would like to be able to re-create this chart more accurately.

I also had some formatting issues that I couldn’t quite work out on my own. For one, I could not figure out how to change the size of the facet labels. I also struggled to find a way to change the y-axis tick labels to be percentages like on the original.