Using the survey package in R to analyze the European Social Survey, part 2
Recoding the party support measure
We copy and paste the old labels and type the new ones:
ess8_at <- ess8_at %>%
  mutate(at_party_vote = fct_recode(prtvtbat,
    "Social Democratic Party SP" = "SP\xd6",
    "People's Party VP" = "\xd6VP",
    "Freedom Party FP" = "FP\xd6",
    "Alliance for the Future of Austria BZ" = "BZ\xd6",
    "The Greens Gr" = "Gr\xfcne",
    "Communist Party of Austria KP" = "KP\xd6",
    "New Austria and Liberal Forum NEOS" = "NEOS",
    "Pirate Party of Austria PIRAIT" = "Piratenpartei \xd6sterreich",
    "Team Stronach for Austria" = "Team Frank Stronach",
    NULL = "Other",
    NULL = "Not applicable",
    NULL = "Refusal",
    NULL = "Don't know",
    NULL = "No answer"))
table(ess8_at$at_party_vote)
##
## Social Democratic Party SP
## 475
## People's Party VP
## 358
## Freedom Party FP
## 250
## Alliance for the Future of Austria BZ
## 10
## The Greens Gr
## 185
## Communist Party of Austria KP
## 6
## New Austria and Liberal Forum NEOS
## 36
## Pirate Party of Austria PIRAIT
## 4
## Team Stronach for Austria
## 18
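To see what assigning NULL does, here is a minimal sketch on a toy factor. The data and the names vote and vote_clean are made up for illustration and are not part of the ESS file; the point is only that levels recoded to NULL become missing values and disappear from the table.

```r
library(forcats)

# Toy factor standing in for a party-vote variable (made-up data)
vote <- factor(c("A", "B", "Refusal", "A", "Don't know", "B", "A"))

# Rename the substantive levels and send the non-responses to NA
vote_clean <- fct_recode(vote,
  "Party A" = "A",
  "Party B" = "B",
  NULL = "Refusal",
  NULL = "Don't know"
)

levels(vote_clean)                  # only the renamed levels remain
sum(is.na(vote_clean))              # responses that are now missing
table(vote_clean, useNA = "ifany")  # table() hides NA unless asked
```

By default table() omits the NA responses, which is why the non-response categories no longer appear in the output above.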
Collapsing or grouping multiple labels together
The fct_collapse() function groups several labels into one new category, which you specify. Here, multiple responses are grouped into NULL, so you don’t have to type NULL over and over:
ess8_at <- ess8_at %>%
  mutate(at_party_vote = fct_recode(prtvtbat,
    "Social Democratic Party SP" = "SP\xd6",
    "People's Party VP" = "\xd6VP",
    "Freedom Party FP" = "FP\xd6",
    "Alliance for the Future of Austria BZ" = "BZ\xd6",
    "The Greens Gr" = "Gr\xfcne",
    "Communist Party of Austria KP" = "KP\xd6",
    "New Austria and Liberal Forum NEOS" = "NEOS",
    "Pirate Party of Austria PIRAIT" = "Piratenpartei \xd6sterreich",
    "Team Stronach for Austria" = "Team Frank Stronach")) %>%
  mutate(at_party_vote = fct_collapse(at_party_vote,
    NULL = c("Other", "Not applicable", "Refusal", "Don't know", "No answer")
  ))
table(ess8_at$at_party_vote)
##
## Social Democratic Party SP
## 475
## People's Party VP
## 358
## Freedom Party FP
## 250
## Alliance for the Future of Austria BZ
## 10
## The Greens Gr
## 185
## Communist Party of Austria KP
## 6
## New Austria and Liberal Forum NEOS
## 36
## Pirate Party of Austria PIRAIT
## 4
## Team Stronach for Austria
## 18
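fct_collapse() is not limited to NULL. As a rough sketch on made-up data (the factor vote and the group names "Major", "Minor", and "Missing" are hypothetical), the same call can fold several labels into named groups:

```r
library(forcats)

# Toy factor with four parties and two non-response codes (made-up data)
vote <- factor(c("A", "B", "C", "D", "Refusal", "Don't know", "A", "B"))

# Each new level name is followed by the vector of old labels it absorbs
vote_grouped <- fct_collapse(vote,
  "Major"   = c("A", "B"),
  "Minor"   = c("C", "D"),
  "Missing" = c("Refusal", "Don't know")
)

table(vote_grouped)
```

Any label not mentioned in the call is left as it was.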
Collapsing smaller groups into one “other” category
Given Austria’s multi-party system, we could combine the smaller political parties into an ‘other’ category. The function fct_lump() will do this automatically:
ess8_at<-ess8_at %>%
mutate(at_party_vote = fct_lump(prtvtbat))
table(ess8_at$at_party_vote)
##
## SP\xd6 \xd6VP FP\xd6 Gr\xfcne Not applicable
## 475 358 250 185 418
## Refusal Other
## 205 119
Of course, we need to be careful: the default choices may not make sense. You can control how many of the most frequent categories are kept with n=, such as fct_lump(prtvtbat, n=10); everything else is lumped into “Other”.
##
## SP VP FP BZ Gr KP NEOS PIR Strn
## 475 358 250 10 185 6 36 4 18
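A small sketch of how n= changes the result, using a made-up factor (the level names and counts are invented for illustration). With n = 2 the two most frequent levels are kept and the rest go into "Other":

```r
library(forcats)

# Toy factor: five levels with clearly different frequencies (made-up data)
vote <- factor(rep(c("A", "B", "C", "D", "E"), times = c(40, 30, 20, 6, 4)))

# Default behaviour: forcats decides how many small levels to lump
lumped_default <- fct_lump(vote)
table(lumped_default)

# Keep only the 2 most frequent levels; the remainder becomes "Other"
lumped_n2 <- fct_lump(vote, n = 2)
table(lumped_n2)
```

Comparing the two tables is a quick way to check whether the default grouping is the one you actually want.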
Removing unused factor labels
If a variable includes unused labels (for example, a label that records 0 people), they can be dropped with fct_drop(). This function simply removes the unused labels.
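ess8_at <- ess8_at %>%
  # drop unused labels from the recoded variable, rather than overwriting it with the raw prtvtbat
  mutate(at_party_vote = fct_drop(at_party_vote))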
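As a quick illustration on a made-up factor (not the ESS data), an unused label shows up in table() with a count of 0 until it is dropped:

```r
library(forcats)

# Toy factor where level "C" is declared but never observed (made-up data)
vote <- factor(c("A", "B", "A"), levels = c("A", "B", "C"))
table(vote)          # "C" is listed with a count of 0

vote_dropped <- fct_drop(vote)
levels(vote_dropped) # "C" is gone; the observed values are untouched
```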
And we will collapse a few categories of the party closeness variable:
ess8_at <- ess8_at %>%
  # n belongs inside fct_lump(); keeping the 4 largest parties matches the table below
  mutate(at_party_vote = fct_lump(at_party_vote, n = 4))
table(ess8_at$at_party_vote)
##
## SP VP FP Gr Other
## 475 358 250 185 74
And we will recode the left-right measures to exclude the “refusal”, “don’t know”, and “no answer” responses:
ess8_at<-ess8_at %>%
mutate(left_right = fct_recode(left_right,
NULL= "Refusal",
NULL = "Don't know",
NULL = "No answer"))
ess8_at<-ess8_at %>%
mutate(left_right3cat = fct_recode(left_right3cat,
NULL= "Refusal",
NULL = "Don't know",
NULL = "No answer"))
Since we created a new variable and recoded the left-right self-placement, we need to reset the survey design object. We can update the existing object, or just overwrite it:
ess_design_at <-
svydesign(
ids = ~0 ,
strata=NULL,
weights=~pspwght,
data = ess8_at)
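The code above takes the overwrite route. For the other option, the survey package provides an update() method for design objects that adds or replaces variables in place. A minimal sketch on a made-up data frame (toy, des, and high_score are hypothetical names, and the weights are invented):

```r
library(survey)

# Toy data with a weight column (made-up values)
toy <- data.frame(
  score = c(2, 5, 7, 9, 4, 6),
  party = factor(c("A", "A", "B", "B", "C", "C")),
  w     = c(1.2, 0.8, 1.0, 1.5, 0.7, 0.9)
)

# Weights-only design, mirroring the structure used for the ESS data
des <- svydesign(ids = ~1, weights = ~w, data = toy)

# Add a derived variable to the existing design instead of rebuilding it
des <- update(des, high_score = as.numeric(score > 5))

svymean(~high_score, des)
```

Updating keeps the design specification in one place, while overwriting is simpler when many variables in the data frame have changed at once.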
We use the svyby() function, which calculates means and other summary statistics for a numeric variable across the levels of a qualitative (factor) variable. While box-and-whisker plots show median left-right placement by political party, we can use svyby() to calculate arithmetic means. The variable on which you want to calculate a mean is specified at the beginning of the function, following a tilde symbol, ~.
Of course, we need to specify what to do with missing values, which are found throughout the dataset, before calculating the mean. We add na.rm=TRUE to remove responses with missing values. To calculate a mean, a variable needs to be treated as a numeric score; below, we calculate a mean on the 11-point left to right scale. Because left_right is stored as a factor, we wrap it in as.numeric() so that it is treated as a numeric score. Remember that when coercing a factor to numeric, R converts the factor levels to integers beginning at 1, so the scores here range from 1 to 11 rather than the 0 to 10 of the original lrscale coding. For comparing parties on lrscale, this is fine.
svyby(~as.numeric(left_right), by=~at_party_vote, design=ess_design_at, svymean, na.rm=TRUE)
## at_party_vote as.numeric(left_right) se
## SP SP 5.188813 0.1081252
## VP VP 6.716724 0.0906762
## FP FP 7.574956 0.1367827
## Gr Gr 4.221286 0.1671370
## Other Other 5.905669 0.2728092
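The factor-to-integer coercion is easy to check on a small example. The sketch below assumes the endpoints of the scale are labelled "Left" and "Right" with the digits 1 to 9 in between; the exact labels in the ESS file may differ.

```r
# Toy left-right factor with 11 ordered levels (labels are an assumption)
lr <- factor(c("Left", "5", "Right", "1"),
             levels = c("Left", as.character(1:9), "Right"))

# as.numeric() returns the position of each level, starting at 1
codes <- as.numeric(lr)
codes

# Subtracting 1 maps the codes back onto a 0 to 10 scale
codes - 1
```

Here "5" is the sixth level, so it is coded as 6: every mean computed on the coerced scores sits one point above its value on the 0 to 10 scale, while differences between groups are unaffected.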
The svymean argument identifies the arithmetic mean as the statistic to calculate. One additional point: although the results are not shown here, we could combine the command above with the subset() function. For example, if we had set the survey design on the entire dataset across all ESS nations, we could subset it to a particular country:
svyby(~as.numeric(left_right), ~at_party_vote, subset(ess_design, cntry=="AT"), svymean, na.rm=TRUE)
To produce column or row percentages, we use a prop.table() function to wrap around the svytable() function. margin=2 means column proportions. margin=1 would produce row proportions.
tab1<-prop.table(svytable(~left_right3cat + at_party_vote, ess_design_at), margin=2)
## these are column proportions
knitr::kable(tab1*100, digits = 2, caption = "Percentage of Austrian party supporters identifying on left, right, or center")
Percentage of Austrian party supporters identifying on left, right, or center
|        |    SP |    VP |    FP |    Gr | Other |
|:-------|------:|------:|------:|------:|------:|
| left   | 50.86 |  9.86 |  6.74 | 77.54 | 32.13 |
| center | 37.16 | 39.38 | 30.47 | 14.50 | 22.29 |
| right  | 11.98 | 50.76 | 62.79 |  7.96 | 45.58 |
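The difference between margin=1 and margin=2 is easiest to see on a tiny table. A sketch with made-up counts (the names tab, col_prop, and row_prop are only for illustration):

```r
# Toy 2 x 2 table of counts: rows are placements, columns are parties
tab <- matrix(c(10, 30, 20, 40), nrow = 2,
              dimnames = list(placement = c("left", "right"),
                              party = c("A", "B")))

# margin = 2: each column sums to 1 (placement shares within each party)
col_prop <- prop.table(tab, margin = 2)
colSums(col_prop)

# margin = 1: each row sums to 1 (party shares within each placement)
row_prop <- prop.table(tab, margin = 1)
rowSums(row_prop)
```

In the table above, the columns are the parties, so margin=2 answers "what share of each party's supporters place themselves on the left, center, or right?"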
We can add a sample-weight-adjusted Chi-square test of independence by setting statistic="Chisq", for example in svychisq().
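A rough sketch of such a test with svychisq() from the survey package, run on simulated data rather than the ESS file (the data frame toy, its columns, and the weights are all invented for illustration):

```r
library(survey)

set.seed(1)
n <- 300

# Simulated respondents: a party, a weight, and a placement that depends on party
toy <- data.frame(
  party = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  w     = runif(n, 0.5, 2)
)
placements <- c("left", "center", "right")
toy$placement <- factor(ifelse(
  toy$party == "A",
  sample(placements, n, replace = TRUE, prob = c(0.6, 0.3, 0.1)),
  sample(placements, n, replace = TRUE, prob = c(0.2, 0.3, 0.5))
))

# Weights-only design, as in the Austrian example
des <- svydesign(ids = ~1, weights = ~w, data = toy)

# Pearson chi-square with a design-based adjustment
res <- svychisq(~placement + party, des, statistic = "Chisq")
res
```

With the Austrian design object, the analogous call would be svychisq(~left_right3cat + at_party_vote, ess_design_at, statistic="Chisq").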
Given a table of results, you can construct simple barplots where it makes sense to do so, in this case of the column proportions (barplot graphic excluded):
barplot(tab1,beside=FALSE,legend=TRUE, main="left-right ID by party vote choice", ylab="proportion of left, right, or center placement")
A histogram of left right placement in Austria.
Since the variable is currently stored as a factor, we will use as.numeric() to make it numeric:
svyhist(~ as.numeric(left_right), ess_design_at, main="Left to right self identification", xlab="left (min) right (max)")
We can subset the design by age or by party:
svyhist(~ as.numeric(left_right), subset (ess_design_at, agea<=35), main="Left to right ID, Austrians 35 and younger", xlab="left (min) right (max)")
# svyhist(~ as.numeric(left_right), subset (ess_design_at, at_party_vote=="VP"))
A Boxplot.
It requires two variables – the numeric scores with which to calculate the boxplot, and a factor that determines the categories within which the scores are calculated.
svyboxplot(as.numeric(left_right) ~ at_party_vote , ess_design_at, na.rm=TRUE,
main="left right placement by party closeness, Austria",
ylab="left (1) to right (11)")
More in the next post on regression modeling and visualization with the survey package, and on analyzing other nationally representative datasets.