Flu Data Exploration

Author

Leah Lariscy

Let’s explore!

Load libraries first

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.1.8
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors
library(here)
here() starts at /Users/leahlariscy/Desktop/MADA2023/leahlariscy-MADA-portfolio

Load RDS file of clean symptom data

symptoms_clean <- readRDS(here("fluanalysis/data/processed_data/symptoms_clean.RDS"))

Let's look at the data to be sure it loaded correctly

skimr::skim(symptoms_clean)
Data summary
Name symptoms_clean
Number of rows 730
Number of columns 32
_______________________
Column type frequency:
factor 31
numeric 1
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
SwollenLymphNodes 0 1 FALSE 2 No: 418, Yes: 312
ChestCongestion 0 1 FALSE 2 Yes: 407, No: 323
ChillsSweats 0 1 FALSE 2 Yes: 600, No: 130
NasalCongestion 0 1 FALSE 2 Yes: 563, No: 167
CoughYN 0 1 FALSE 2 Yes: 655, No: 75
Sneeze 0 1 FALSE 2 Yes: 391, No: 339
Fatigue 0 1 FALSE 2 Yes: 666, No: 64
SubjectiveFever 0 1 FALSE 2 Yes: 500, No: 230
Headache 0 1 FALSE 2 Yes: 615, No: 115
Weakness 0 1 FALSE 4 Mod: 338, Mil: 223, Sev: 120, Non: 49
WeaknessYN 0 1 FALSE 2 Yes: 681, No: 49
CoughIntensity 0 1 FALSE 4 Mod: 357, Sev: 172, Mil: 154, Non: 47
CoughYN2 0 1 FALSE 2 Yes: 683, No: 47
Myalgia 0 1 FALSE 4 Mod: 325, Mil: 213, Sev: 113, Non: 79
MyalgiaYN 0 1 FALSE 2 Yes: 651, No: 79
RunnyNose 0 1 FALSE 2 Yes: 519, No: 211
AbPain 0 1 FALSE 2 No: 639, Yes: 91
ChestPain 0 1 FALSE 2 No: 497, Yes: 233
Diarrhea 0 1 FALSE 2 No: 631, Yes: 99
EyePn 0 1 FALSE 2 No: 617, Yes: 113
Insomnia 0 1 FALSE 2 Yes: 415, No: 315
ItchyEye 0 1 FALSE 2 No: 551, Yes: 179
Nausea 0 1 FALSE 2 No: 475, Yes: 255
EarPn 0 1 FALSE 2 No: 568, Yes: 162
Hearing 0 1 FALSE 2 No: 700, Yes: 30
Pharyngitis 0 1 FALSE 2 Yes: 611, No: 119
Breathless 0 1 FALSE 2 No: 436, Yes: 294
ToothPn 0 1 FALSE 2 No: 565, Yes: 165
Vision 0 1 FALSE 2 No: 711, Yes: 19
Vomit 0 1 FALSE 2 No: 652, Yes: 78
Wheeze 0 1 FALSE 2 No: 510, Yes: 220

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
BodyTemp 0 1 98.94 1.2 97.2 98.2 98.5 99.3 103.1 ▇▇▂▁▁

Looks like the data did load properly. There are 730 observations of 32 variables with no NAs. There are 31 factor variables and 1 numeric variable (BodyTemp).
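The same checks (dimensions, missing values, column types) can also be done directly in base R. Here is a minimal sketch; the `toy` data frame and its values are made up and only stand in for `symptoms_clean`.

```r
# Toy stand-in for symptoms_clean (values are made up for illustration)
toy <- data.frame(
  Nausea   = factor(c("No", "Yes", "No", "No")),
  Fatigue  = factor(c("Yes", "Yes", "No", "Yes")),
  BodyTemp = c(98.3, 100.4, 98.5, 99.2)
)

dims        <- dim(toy)         # number of rows, number of columns
n_missing   <- sum(is.na(toy))  # total count of NA cells
col_classes <- vapply(toy, function(x) class(x)[1], character(1))
type_counts <- table(col_classes)  # how many columns of each class

print(dims)
print(n_missing)
print(type_counts)
```

Swapping `toy` for `symptoms_clean` should reproduce the numbers reported by `skimr::skim()` above.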

Let's explore the important variables

Outcomes of interest: body temperature, nausea

Predictors of interest: weakness, fatigue, headache

Look at summary statistics of all variables

summary(symptoms_clean)
 SwollenLymphNodes ChestCongestion ChillsSweats NasalCongestion CoughYN  
 No :418           No :323         No :130      No :167         No : 75  
 Yes:312           Yes:407         Yes:600      Yes:563         Yes:655  
                                                                         
                                                                         
                                                                         
                                                                         
 Sneeze    Fatigue   SubjectiveFever Headache      Weakness   WeaknessYN
 No :339   No : 64   No :230         No :115   None    : 49   No : 49   
 Yes:391   Yes:666   Yes:500         Yes:615   Mild    :223   Yes:681   
                                               Moderate:338             
                                               Severe  :120             
                                                                        
                                                                        
  CoughIntensity CoughYN2      Myalgia    MyalgiaYN RunnyNose AbPain   
 None    : 47    No : 47   None    : 79   No : 79   No :211   No :639  
 Mild    :154    Yes:683   Mild    :213   Yes:651   Yes:519   Yes: 91  
 Moderate:357              Moderate:325                                
 Severe  :172              Severe  :113                                
                                                                       
                                                                       
 ChestPain Diarrhea  EyePn     Insomnia  ItchyEye  Nausea    EarPn    
 No :497   No :631   No :617   No :315   No :551   No :475   No :568  
 Yes:233   Yes: 99   Yes:113   Yes:415   Yes:179   Yes:255   Yes:162  
                                                                      
                                                                      
                                                                      
                                                                      
 Hearing   Pharyngitis Breathless ToothPn   Vision    Vomit     Wheeze   
 No :700   No :119     No :436    No :565   No :711   No :652   No :510  
 Yes: 30   Yes:611     Yes:294    Yes:165   Yes: 19   Yes: 78   Yes:220  
                                                                         
                                                                         
                                                                         
                                                                         
    BodyTemp     
 Min.   : 97.20  
 1st Qu.: 98.20  
 Median : 98.50  
 Mean   : 98.94  
 3rd Qu.: 99.30  
 Max.   :103.10  

Here we see that most of the categorical variables have either Yes or No responses, simply indicating presence-absence of the symptoms. A few others have a range of responses to address the severity of certain symptoms. There is one continuous variable, BodyTemp, which has a range from 97.2 to 103.1. Nausea, our other outcome of interest, had 475 No and 255 Yes.
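To put the Nausea counts on a proportion scale, we can plug the two counts from the summary into `prop.table()`. This sketch uses only the numbers reported above; the object names are just for illustration.

```r
# Nausea counts as reported by summary(symptoms_clean)
nausea_counts <- c(No = 475, Yes = 255)

# Share of each response, rounded to three decimals
nausea_props <- round(prop.table(nausea_counts), 3)
print(nausea_props)
```

So roughly a third of the 730 participants reported nausea.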

Histogram of BodyTemp

symptoms_clean %>% ggplot(aes(BodyTemp)) +
  geom_histogram(fill = "#d65aa8") +
  theme_light()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Here we see the distribution of BodyTemp, showing that the most commonly reported body temperature was slightly higher than 98. This checks out, as we saw in the summary report above that the median is 98.5.
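The `stat_bin()` message can be avoided by setting the bin width explicitly. Below is a small sketch of that; the `toy` temperatures are made up, and the width of 0.25 is an assumption for illustration rather than a recommended value.

```r
library(ggplot2)

# Toy temperatures (made up), standing in for symptoms_clean$BodyTemp
toy <- data.frame(
  BodyTemp = c(97.8, 98.1, 98.3, 98.5, 98.5, 98.7, 99.3, 100.2, 101.0, 102.4)
)

# Same plot as above, but with an explicit (assumed) bin width
p <- ggplot(toy, aes(BodyTemp)) +
  geom_histogram(binwidth = 0.25, fill = "#d65aa8") +
  theme_light()

# Median, for comparison with where the tallest bars sit
toy_median <- median(toy$BodyTemp)
print(toy_median)
```

With the real data, a few different widths are worth trying, since the apparent peak of a histogram can shift with the binning.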

Boxplot of body temp in regards to weakness presence/absence

symptoms_clean %>% ggplot(aes(WeaknessYN, BodyTemp, color = WeaknessYN)) +
  geom_boxplot() +
  theme_light()

At first glance, it appears that median body temp is higher when weakness is present.

Boxplot of body temp in regards to fatigue presence/absence

symptoms_clean %>% ggplot(aes(Fatigue, BodyTemp, color = Fatigue)) +
  geom_boxplot() +
  theme_light()

It appears that median body temp is slightly higher when fatigue is present.

Boxplot of body temp in regards to headache presence/absence

symptoms_clean %>% ggplot(aes(Headache, BodyTemp, color = Headache)) +
  geom_boxplot() +
  theme_light()

It appears that median body temp is slightly higher when headache is present.
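The center line of a boxplot marks the median, so one way to back up what the plots suggest is to compute both the mean and the median per group. This is a minimal base R sketch with made-up values; `toy` stands in for `symptoms_clean`, and the same pattern would apply to WeaknessYN and Fatigue.

```r
# Toy data (made up): one high temperature in the "Yes" group
toy <- data.frame(
  Headache = factor(c("No", "No", "No", "Yes", "Yes", "Yes", "Yes")),
  BodyTemp = c(98.0, 98.2, 98.4, 98.3, 98.6, 99.0, 102.5)
)

# Mean and median body temperature within each Headache group
group_means   <- tapply(toy$BodyTemp, toy$Headache, mean)
group_medians <- tapply(toy$BodyTemp, toy$Headache, median)

print(round(rbind(mean = group_means, median = group_medians), 2))
```

In this toy example the single high value pulls the "Yes" mean well above its median, which is why the two are worth looking at separately for a right-skewed variable like BodyTemp.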

Counts of Weakness presence/absence and Nausea presence/absence

symptoms_clean %>% ggplot(aes(WeaknessYN, Nausea)) +
  geom_count() +
  theme_light()

The most common combination is weakness + no nausea, followed by weakness + nausea. The count sizes are similar for both, though, so there may or may not be a significant difference there. The least common observation is lack of weakness but presence of nausea. What this shows is that weakness is more common than no weakness, but that weakness does not necessarily determine nausea.
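Point sizes in a count plot can be hard to compare by eye, so it may help to print the underlying table as well. Here is a sketch using `table()` and `prop.table()` on made-up data; `toy` is hypothetical and these are not the real counts.

```r
# Toy data (made up), standing in for symptoms_clean
toy <- data.frame(
  WeaknessYN = factor(c("Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No")),
  Nausea     = factor(c("No", "No", "No", "Yes", "Yes", "No", "No", "Yes"))
)

# Counts for each WeaknessYN x Nausea combination
counts <- table(WeaknessYN = toy$WeaknessYN, Nausea = toy$Nausea)

# Share of nausea within each weakness group (each row sums to 1)
row_props <- prop.table(counts, margin = 1)

print(counts)
print(round(row_props, 2))
```

Row proportions make the comparison fairer than raw counts, because the group with weakness is much larger than the group without it.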

Counts of Fatigue presence/absence and Nausea presence/absence

symptoms_clean %>% ggplot(aes(Fatigue, Nausea)) +
  geom_count() +
  theme_light()

This output is similar to the one above: fatigue is more common than no fatigue, and among those with fatigue, absence of nausea is slightly more common than presence.

Counts of Headache presence/absence and Nausea presence/absence

symptoms_clean %>% ggplot(aes(Headache, Nausea)) +
  geom_count() +
  theme_light()

As in the previous two outputs, headaches are more common than not, and among those with headaches, absence of nausea is slightly more common than presence.
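None of the plots above tells us whether these associations with nausea are statistically meaningful. One possible follow-up, which is not part of the analysis here, is Fisher's exact test on each 2x2 table. The sketch below uses made-up data; `toy` and the choice of test are assumptions for illustration.

```r
# Toy data (made up), standing in for symptoms_clean
toy <- data.frame(
  Fatigue  = factor(c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No")),
  Headache = factor(c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "Yes")),
  Nausea   = factor(c("Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "Yes", "No"))
)

predictors <- c("Fatigue", "Headache")

# Fisher's exact test of each predictor against Nausea; keep only the p-value
p_values <- vapply(
  predictors,
  function(v) fisher.test(table(toy[[v]], toy$Nausea))$p.value,
  numeric(1)
)

print(round(p_values, 3))
```

With the real data, WeaknessYN could be added to `predictors` in the same way.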