5 Test Yourselves 2

Please read the Using tidyverse section before working on this exercise, especially if you have never used tidyverse before.

As before, suggested answers are at the bottom of this page, but please do try the exercise before looking at them. :)

5.1 Instructions

  1. Download the data file here: SWB.csv.
  2. Start an R Project in the folder where you saved the data file.
  3. Open up a new script file in RStudio.
  4. Load the tidyverse and psych packages.
  5. Read in the data file SWB.csv. Give this dataset the name dataset.
  6. Examine dataset. At a minimum, you should be able to tell that there are 343 rows and 23 columns, and to identify the name and data type of each variable.
  7. Convert marital_status and have_children from integer to factor. Add value labels to the factors. Refer to the legend below for how each variable is coded.
  8. Check that the two variables are indeed converted to factors.
  9. Create a subset of the dataset of only married parents (i.e., married people who have children). Further subset the dataset such that only the variables swls2021_1, swls2021_2, swls2021_3, swls2021_4, and swls2021_5 are in the dataset. You should have 68 rows and 5 columns in this subset. Give this subset the name subset.
  10. Using subset, compute the average of swls2021_1, swls2021_2, swls2021_3, swls2021_4, and swls2021_5 for each person. Call this variable swls2021_avg.
  11. Create a histogram for swls2021_avg. Although I’d be okay with just the basic histogram, I’d like you to play around with the different customisations! Change the bar fill color, change the range of the x-axis, make the background white, etc. (I just want you to explore and have fun here!)
  12. Using subset, get the following summary statistics for swls2021_avg: maximum, minimum, mean, and sd. Specifically, use the summarise() function from the dplyr package. Then confirm the results using the describe() function from the psych package. Check that the maximum is 5.4, the minimum is 2.6, the mean is 4.18, and the sd is 0.707.
  13. Combine different functions from the dplyr package to find out the number of married parents who have swls2021_avg greater than or equal to 4.18. Check that the number is 36.
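
Step 7 relies on factor() pairing each integer code in levels with the text in the same position of labels. The sketch below uses made-up codes to show that mapping and what happens to a code that is not listed. The vector codes and the invalid value 9 are assumptions for illustration, not values from SWB.csv.

```r
# Toy codes, NOT taken from SWB.csv; 9 is a deliberately invalid code
codes <- c(1, 3, 2, 1, 9)

# levels lists the codes found in the data; labels gives the text for each,
# matched by position (1 -> married, 2 -> divorced, 3 -> widowed)
status <- factor(codes,
                 levels = c(1, 2, 3),
                 labels = c("married", "divorced", "widowed"))

is.factor(status)               # confirm the conversion worked
levels(status)                  # the labels, in the order supplied
table(status, useNA = "ifany")  # counts per label, plus any NA
```

Any value that is missing from levels becomes NA, so running table() straight after the conversion is a quick way to catch a miscoded or mistyped level.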

5.2 Legend

| Variable Name | Variable Label | Value Label |
|---|---|---|
| pin | participant identification number | |
| gender | gender | 0 = male, 1 = female |
| marital_status | marital status | 1 = married, 2 = divorced, 3 = widowed |
| have_children | parental status | 0 = no children, 1 = have children |
| mvs_1 | My life would be better if I own certain things I don’t have. | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| mvs_2 | The things I own say a lot about how well I’m doing. | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| mvs_3 | I’d be happier if I could afford to buy more things. | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| mvs_4 | It bothers me that I can’t afford to buy things I’d like. | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| mvs_5 | Buying things gives me a lot of pleasure. | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| mvs_6 | I admire people who own expensive homes, cars, clothes. | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| mvs_7 | I like to own things that impress people. | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| mvs_8 | I like a lot of luxury in my life. | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| mvs_9 | I try to keep my life simple, as far as possessions are concerned. (R) | 1 = strongly disagree, 2 = disagree, 3 = neither disagree nor agree, 4 = agree, 5 = strongly agree |
| swls2019_1 | In most ways my life is close to my ideal. (2019) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2019_2 | The conditions of my life are excellent. (2019) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2019_3 | I am satisfied with my life. (2019) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2019_4 | So far I have gotten the important things I want in life. (2019) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2019_5 | If I could live my life over, I would change almost nothing. (2019) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2021_1 | In most ways my life is close to my ideal. (2021) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2021_2 | The conditions of my life are excellent. (2021) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2021_3 | I am satisfied with my life. (2021) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2021_4 | So far I have gotten the important things I want in life. (2021) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
| swls2021_5 | If I could live my life over, I would change almost nothing. (2021) | 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = neither disagree nor agree, 5 = slightly agree, 6 = agree, 7 = strongly agree |
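
The (R) next to mvs_9 presumably marks a reverse-keyed item. This exercise does not use the mvs items, but if you ever average them, a reverse-keyed item on a 1 to 5 scale is usually recoded as 6 minus the response first. Here is a minimal sketch on toy data; the data frame toy and the new column name mvs_9_r are made up for illustration.

```r
library(dplyr)

# Toy responses on the 1-5 scale; these are not real SWB.csv values
toy <- tibble(mvs_8 = c(1, 4, 5),
              mvs_9 = c(5, 2, 1))

toy <- toy %>%
  # (scale max + scale min) - response flips the scale:
  # 1 <-> 5, 2 <-> 4, and 3 stays 3
  mutate(mvs_9_r = 6 - mvs_9)

toy
```

Keeping the original column and storing the recoded values under a new name makes it easy to check the recode against the raw responses.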

5.3 Suggested Answers

# Skipped Steps 1 to 3. 

# 4. Load packages
library(tidyverse)
library(psych)

# 5. Read .csv file
dataset <- read_csv("SWB.csv") 
# read.csv("SWB.csv") also okay

# 6. Examine the data
# Take a look at the variable names and data types
glimpse(dataset)  # str(dataset) or View(dataset) also okay

# 7. Convert integer to factor and add value labels to the factor.
dataset$marital_status <- factor(dataset$marital_status, 
                                 levels = c(1, 2, 3), 
                                 labels = c("married", "divorced", "widowed"))

dataset$have_children <- factor(dataset$have_children, 
                                levels = c(0, 1), 
                                labels = c("no children", "have children"))

# 8. Check variables are now factors
glimpse(dataset$marital_status)  # str(dataset$marital_status) also okay
glimpse(dataset$have_children) # str(dataset$have_children) also okay

# 9. Create subset 
subset <- dataset %>% 
  filter(marital_status == "married" & have_children == "have children") %>% 
  select(swls2021_1:swls2021_5)

# 10. Compute swls2021_avg
subset <- subset %>% 
  mutate(swls2021_avg = rowMeans(across(c(swls2021_1:swls2021_5)))) 

# 11. Create a histogram for swls2021_avg
ggplot(data = subset) + 
  # fill sets the bar colour; colour sets the bar outline
  geom_histogram(aes(x = swls2021_avg), fill = "steelblue", colour = "white") + 
  scale_x_continuous(expand = c(0, 0),
                     limits = c(0, 8),
                     # Every 0.5 units, a tick will appear on x-axis
                     breaks = seq(0, 8, 0.5)) + 
  scale_y_continuous(expand = c(0, 0), 
                     # Every 1 unit, a tick will appear on y-axis
                     breaks = seq(0, 20, 1)) + 
  labs(x = "Average 2021 Satisfaction With Life Scores",
       y = "Frequency", 
       title = "Histogram of Average 2021 Satisfaction With Life Scores") + 
  # Choose a theme (type theme_ in RStudio and the autocomplete will suggest the available themes)
  theme_classic()

# 12A. Get summary statistics using the summarise() function
# Maximum is 5.4, minimum is 2.6, mean is 4.18, and sd is 0.707.
subset %>% 
  summarise(avg_swls = mean(swls2021_avg), 
            min_swls = min(swls2021_avg), 
            max_swls = max(swls2021_avg), 
            sd_swls = sd(swls2021_avg))

# 12B. Get summary statistics using the describe() function
# Maximum is 5.4, minimum is 2.6, mean is 4.18, and sd is 0.707.
describe(subset$swls2021_avg)

# 13A. Create a subset that only contains people with an average SWLS of 4.18 and above
# Then, count the number of people remaining 
subset %>% 
  filter(swls2021_avg >= 4.18) %>% 
  summarise(total = n())

# 13B. Alternative: apply the same filter as in 13A,
# then count the remaining rows with nrow() instead of summarise()
subset %>% 
  filter(swls2021_avg >= 4.18) %>% 
  nrow()  # count the number of rows remaining
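
Two further ways to get the same count may be worth knowing. The sketch below runs on toy averages rather than the real subset; the data frame toy, the column name at_or_above, and the object n_high are assumptions for illustration.

```r
library(dplyr)

# Toy averages, not the real subset of married parents
toy <- tibble(swls2021_avg = c(2.6, 4.0, 4.2, 4.2, 5.4))

# count() tallies rows per group; here the grouping is the TRUE/FALSE
# result of the comparison, so both sides of the cut-off are reported
toy %>% 
  count(at_or_above = swls2021_avg >= 4.18)

# Summing a logical vector counts its TRUE values,
# so no filter() step is needed
n_high <- toy %>% 
  summarise(total = sum(swls2021_avg >= 4.18)) %>% 
  pull(total)

n_high
```

On the toy data, three of the five rows are at or above 4.18. The count() version has the advantage of showing how many rows fall below the cut-off as well.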