Let's look at the data to be sure it loaded correctly.
skimr::skim(symptoms_clean)
Data summary
Name                      symptoms_clean
Number of rows            730
Number of columns         32

Column type frequency:
  factor                  31
  numeric                 1

Group variables           None
Variable type: factor
skim_variable      n_missing  complete_rate  ordered  n_unique  top_counts
SwollenLymphNodes  0          1              FALSE    2         No: 418, Yes: 312
ChestCongestion    0          1              FALSE    2         Yes: 407, No: 323
ChillsSweats       0          1              FALSE    2         Yes: 600, No: 130
NasalCongestion    0          1              FALSE    2         Yes: 563, No: 167
CoughYN            0          1              FALSE    2         Yes: 655, No: 75
Sneeze             0          1              FALSE    2         Yes: 391, No: 339
Fatigue            0          1              FALSE    2         Yes: 666, No: 64
SubjectiveFever    0          1              FALSE    2         Yes: 500, No: 230
Headache           0          1              FALSE    2         Yes: 615, No: 115
Weakness           0          1              FALSE    4         Mod: 338, Mil: 223, Sev: 120, Non: 49
WeaknessYN         0          1              FALSE    2         Yes: 681, No: 49
CoughIntensity     0          1              FALSE    4         Mod: 357, Sev: 172, Mil: 154, Non: 47
CoughYN2           0          1              FALSE    2         Yes: 683, No: 47
Myalgia            0          1              FALSE    4         Mod: 325, Mil: 213, Sev: 113, Non: 79
MyalgiaYN          0          1              FALSE    2         Yes: 651, No: 79
RunnyNose          0          1              FALSE    2         Yes: 519, No: 211
AbPain             0          1              FALSE    2         No: 639, Yes: 91
ChestPain          0          1              FALSE    2         No: 497, Yes: 233
Diarrhea           0          1              FALSE    2         No: 631, Yes: 99
EyePn              0          1              FALSE    2         No: 617, Yes: 113
Insomnia           0          1              FALSE    2         Yes: 415, No: 315
ItchyEye           0          1              FALSE    2         No: 551, Yes: 179
Nausea             0          1              FALSE    2         No: 475, Yes: 255
EarPn              0          1              FALSE    2         No: 568, Yes: 162
Hearing            0          1              FALSE    2         No: 700, Yes: 30
Pharyngitis        0          1              FALSE    2         Yes: 611, No: 119
Breathless         0          1              FALSE    2         No: 436, Yes: 294
ToothPn            0          1              FALSE    2         No: 565, Yes: 165
Vision             0          1              FALSE    2         No: 711, Yes: 19
Vomit              0          1              FALSE    2         No: 652, Yes: 78
Wheeze             0          1              FALSE    2         No: 510, Yes: 220
Variable type: numeric
skim_variable  n_missing  complete_rate  mean   sd   p0    p25   p50   p75   p100   hist
BodyTemp       0          1              98.94  1.2  97.2  98.2  98.5  99.3  103.1  ▇▇▂▁▁
It looks like the data loaded properly. There are 730 observations of 32 variables with no NAs: 31 factor variables and 1 numeric variable (BodyTemp).
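The same check can be made programmatically instead of reading it off the skim table. The following is a minimal sketch in base R; the data frame `toy` is a small, made-up stand-in for `symptoms_clean` so that the example runs on its own, and its values are illustrative only.

```r
# Hypothetical stand-in for symptoms_clean; values are made up for illustration.
toy <- data.frame(
  Nausea   = factor(c("No", "Yes", "No", "No")),
  Fatigue  = factor(c("Yes", "Yes", "No", "Yes")),
  BodyTemp = c(98.3, 100.4, 98.8, 99.2)
)

# Dimensions: number of rows (observations) and columns (variables).
dims <- dim(toy)

# Total number of missing values across the whole data frame.
n_missing <- sum(is.na(toy))

# Number of columns of each class (here: factor vs. numeric).
type_counts <- table(vapply(toy, function(col) class(col)[1], character(1)))

print(dims)
print(n_missing)
print(type_counts)
```

Run against `symptoms_clean` instead of `toy`, this should agree with the skim output: 730 rows, 32 columns, no missing values, 31 factor columns and 1 numeric column.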
Let's explore the important variables.
Outcomes of interest: body temperature, nausea
Predictors of interest: weakness, fatigue, headache
Look at summary statistics of all variables
summary(symptoms_clean)
 SwollenLymphNodes ChestCongestion ChillsSweats NasalCongestion CoughYN
 No :418           No :323         No :130      No :167         No : 75
 Yes:312           Yes:407         Yes:600      Yes:563         Yes:655

 Sneeze    Fatigue   SubjectiveFever Headache      Weakness   WeaknessYN
 No :339   No : 64   No :230         No :115   None    : 49   No : 49
 Yes:391   Yes:666   Yes:500         Yes:615   Mild    :223   Yes:681
                                               Moderate:338
                                               Severe  :120

  CoughIntensity CoughYN2      Myalgia    MyalgiaYN RunnyNose AbPain
 None    : 47    No : 47   None    : 79   No : 79   No :211   No :639
 Mild    :154    Yes:683   Mild    :213   Yes:651   Yes:519   Yes: 91
 Moderate:357              Moderate:325
 Severe  :172              Severe  :113

 ChestPain Diarrhea  EyePn     Insomnia  ItchyEye  Nausea    EarPn
 No :497   No :631   No :617   No :315   No :551   No :475   No :568
 Yes:233   Yes: 99   Yes:113   Yes:415   Yes:179   Yes:255   Yes:162

 Hearing   Pharyngitis Breathless ToothPn   Vision    Vomit     Wheeze
 No :700   No :119     No :436    No :565   No :711   No :652   No :510
 Yes: 30   Yes:611     Yes:294    Yes:165   Yes: 19   Yes: 78   Yes:220

    BodyTemp
 Min.   : 97.20
 1st Qu.: 98.20
 Median : 98.50
 Mean   : 98.94
 3rd Qu.: 99.30
 Max.   :103.10
Here we see that most of the categorical variables have Yes or No responses, simply indicating presence or absence of a symptom. Three of them (Weakness, CoughIntensity, and Myalgia) instead record severity on four levels: None, Mild, Moderate, and Severe. There is one continuous variable, BodyTemp, which ranges from 97.2 to 103.1. Nausea, our other outcome of interest, has 475 No and 255 Yes responses.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Here we see the distribution of BodyTemp, showing that the most commonly reported body temperature was slightly higher than 98. This checks out, as the summary report above shows that the median is 98.5.
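The `stat_bin()` message appears because the histogram used the default of 30 bins. One way to avoid it is to set `binwidth` explicitly. The sketch below assumes ggplot2 and uses a made-up data frame `toy` in place of `symptoms_clean`; the bin width of 0.2 is an arbitrary illustrative choice, not a value taken from the original analysis.

```r
library(ggplot2)

# Hypothetical stand-in for symptoms_clean; temperatures are made up.
toy <- data.frame(
  BodyTemp = c(97.8, 98.1, 98.3, 98.5, 98.5, 98.7, 99.3, 100.2, 101.5)
)

# Setting binwidth explicitly replaces the default of 30 bins,
# so ggplot2 no longer prints the "Pick better value" message.
# 0.2 is an arbitrary choice made for this example.
temp_hist <- ggplot(toy, aes(x = BodyTemp)) +
  geom_histogram(binwidth = 0.2) +
  theme_light()

print(temp_hist)
```

A narrower bin width shows more detail around the peak near 98.5, while a wider one gives a smoother overall shape; which is preferable depends on how finely the temperatures were recorded.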
Boxplot of body temperature by weakness presence/absence
symptoms_clean %>%
  ggplot(aes(WeaknessYN, BodyTemp, color = WeaknessYN)) +
  geom_boxplot() +
  theme_light()
At first glance, median body temperature (the center line of each box) appears to be higher when weakness is present.
Boxplot of body temperature by fatigue presence/absence
symptoms_clean %>%
  ggplot(aes(Fatigue, BodyTemp, color = Fatigue)) +
  geom_boxplot() +
  theme_light()
Median body temperature appears to be slightly higher when fatigue is present.
Boxplot of body temperature by headache presence/absence
symptoms_clean %>%
  ggplot(aes(Headache, BodyTemp, color = Headache)) +
  geom_boxplot() +
  theme_light()
Median body temperature appears to be slightly higher when headache is present.
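A boxplot's center line marks the median rather than the mean, so group differences read off the plots are approximate. Computing both statistics per group makes the comparison explicit. This is a minimal base R sketch; `toy` is a made-up stand-in for `symptoms_clean`, and the same pattern would apply to Fatigue or Headache by swapping the grouping column.

```r
# Hypothetical stand-in for symptoms_clean; values are made up for illustration.
toy <- data.frame(
  WeaknessYN = factor(c("No", "No", "Yes", "Yes", "Yes", "Yes")),
  BodyTemp   = c(98.0, 98.4, 98.2, 98.6, 99.0, 101.0)
)

# Split BodyTemp into one numeric vector per level of the grouping factor.
groups <- split(toy$BodyTemp, toy$WeaknessYN)

# Median and mean BodyTemp within each group, one row per level.
temp_by_weakness <- data.frame(
  level  = names(groups),
  median = vapply(groups, median, numeric(1)),
  mean   = vapply(groups, mean, numeric(1)),
  row.names = NULL
)

print(temp_by_weakness)
```

In this toy example the single high reading of 101.0 pulls the "Yes" mean above its median, which is the kind of skew a boxplot alone would not reveal about the mean.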
Counts of Weakness presence/absence and Nausea presence/absence
The most common combination is weakness without nausea, followed by weakness with nausea. The counts are similar for both, though, so there may or may not be a significant difference there. The least common combination is nausea without weakness. What this shows is that weakness is more common than no weakness, but that weakness does not necessarily determine nausea.
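Whether the two factors are associated can be checked with a cross-tabulation and a test of independence. The sketch below uses base R's `table()`, `prop.table()`, and `chisq.test()`. The chi-squared test is one possible choice and is not something the original analysis ran; `toy` and its counts are made up and do not reflect the study data.

```r
# Hypothetical stand-in for symptoms_clean; counts are made up for illustration.
toy <- data.frame(
  WeaknessYN = factor(rep(c("No", "Yes"), times = c(20, 80))),
  Nausea     = factor(c(rep(c("No", "Yes"), times = c(15, 5)),
                        rep(c("No", "Yes"), times = c(50, 30))))
)

# Cross-tabulated counts: rows are WeaknessYN, columns are Nausea.
counts <- table(WeaknessYN = toy$WeaknessYN, Nausea = toy$Nausea)

# Row proportions: share of Nausea "No"/"Yes" within each WeaknessYN level.
row_props <- prop.table(counts, margin = 1)

# Pearson chi-squared test of independence between the two factors.
test <- chisq.test(counts)

print(counts)
print(round(row_props, 2))
print(test$p.value)
```

Comparing row proportions is more informative than comparing raw counts here, because weakness is far more common than its absence and so dominates the counts regardless of any link to nausea.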
Counts of Fatigue presence/absence and Nausea presence/absence
As in the previous two outputs, headaches are more common than not, but headache without nausea was slightly more common than headache with nausea.