Using the survey package in R to analyze the European Social Survey
For future reference, I’d like to have a record of tools for analyzing the European Social Survey, via the “survey” package by Lumley (https://cran.r-project.org/web/packages/survey/survey.pdf). In this post, I simply set up the survey object and demonstrate the tabulation of responses.
The examples below require the survey, dplyr, and forcats packages:
library(survey)
library(dplyr)
library(forcats)
Below I load a version of the 8th round of the European Social Survey dataset (http://europeansocialsurvey.org):
load(file=url("https://github.com/whittkilburn/ESSdata/raw/master/ess%20round%208%20workspace.rdata"))
The dataframe within the workspace is ess8; it was imported from a Stata datafile with the foreign package. Factor labels were preserved for available columns, with one exception: the sampling weight column was replaced with a numeric version. The factor labels are there to make it easier for students to tabulate variables without having to check the attributes of each one, as would be the case if the dataset were imported via the 'haven' package. For example, the variable recording an individual’s left-right self-placement is preserved with factor labels:
table(ess8$lrscale)
##
## Left 1 2 3 4 5
## 1463 846 2121 3754 3780 12389
## 6 7 8 9 Right Refusal
## 4105 4269 3265 999 1592 1487
## Don't know No answer
## 4305 12
str(ess8$lrscale)
## Factor w/ 14 levels "Left","1","2",..: 1 2 6 1 6 6 5 6 6 6 ...
We will use the mutate() function to add a copy of the lrscale variable, called left_right, as a new column at the end of the dataset. The following code simply creates an identical copy of the old variable. We use ess8 <- to store the result of mutate() back in the original dataset.
ess8 <- ess8 %>%
mutate(left_right = lrscale)
## mutate(NEW = OLD): the new variable name goes on the left, the original variable name on the right
table(ess8$left_right)
##
## Left 1 2 3 4 5
## 1463 846 2121 3754 3780 12389
## 6 7 8 9 Right Refusal
## 4105 4269 3265 999 1592 1487
## Don't know No answer
## 4305 12
For simplicity, let’s work with just the sample from Austria, which we select with filter():
ess8_at<-ess8 %>%
filter(cntry=="AT")
Creating the survey design object
The ESS dataset contains two different weights: pweight is the population size weight, and pspwght is the post-stratification weight, which serves as the survey analysis weight here. Since I’m working with just the Austrian sample, we’ll use pspwght. In a future post I’ll construct some multi-level analyses using both.
The code below uses the svydesign() function to create an object, ess_design_at, which contains the information we need to account for the survey design. It doesn’t show any results; it just stores the design in ess_design_at:
ess_design_at <-
svydesign(
ids = ~0 ,
strata=NULL,
weights=~pspwght,
data = ess8_at)
The part of the statement that says weights=~pspwght sets the sampling weight. Apart from naming your own survey object something other than ess_design_at, you will use your own weight variable in place of pspwght.
As for the other parts of the statement: ids = ~0 means we don’t have a variable for sampling clusters, and strata=NULL means no stratification variable is specified. Then we list the weight and the dataset. The weight must be numeric: if you pass a qualitative (factor) variable as the weight in the design statement, you will see the error Error in 1/as.matrix(weights) : non-numeric argument to binary operator.
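If your own weight column did arrive as a factor, one way around the error is to convert it before building the design. The sketch below uses a small made-up data frame (toy, answer, wt_factor, and wt are hypothetical names, not ESS variables) and assumes the factor levels are the weight values written as text:

```r
library(survey)

# Toy data: the weight column was (hypothetically) imported as a factor
toy <- data.frame(
  answer    = factor(c("yes", "no", "yes", "no")),
  wt_factor = factor(c("0.8", "1.2", "1.0", "1.0"))
)

# as.numeric() on a factor returns the internal level codes, so convert
# to character first to recover the actual weight values
toy$wt <- as.numeric(as.character(toy$wt_factor))

# With a numeric weight, the design statement runs without the error
toy_design <- svydesign(ids = ~0, strata = NULL, weights = ~wt, data = toy)
svytable(~answer, toy_design)
```

Checking is.numeric() on the weight column before calling svydesign() is a quick way to catch the problem early.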
Survey analysis
Each survey analysis command takes both a variable to analyze and the survey design object. svytable() produces frequency counts. In this case, the variable to analyze is left_right, and the survey design object, ess_design_at, comes next.
svytable(~ left_right, ess_design_at )
## left_right
## Left 1 2 3 4 5
## 68.88656 38.72109 123.95784 170.99772 215.48897 651.09398
## 6 7 8 9 Right Refusal
## 220.82195 135.83134 107.26632 30.88962 56.81019 82.46806
## Don't know No answer
## 106.76636 0.00000
The counts have decimal places because they are sums of survey weights rather than counts of respondents. We can round entries to the nearest integer with round=TRUE:
svytable(~ left_right, ess_design_at, round=TRUE )
## left_right
## Left 1 2 3 4 5
## 69 39 124 171 215 651
## 6 7 8 9 Right Refusal
## 221 136 107 31 57 82
## Don't know No answer
## 107 0
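To see where the fractional counts come from, it can help to reproduce svytable() by hand on a tiny example. The sketch below uses made-up data (the toy data frame and its values are hypothetical; only the column names mirror the ESS ones) and compares the weighted table with a plain sum of weights per category:

```r
library(survey)

# Toy data: three categories with unequal weights
toy <- data.frame(
  left_right = factor(
    c("Left", "Left", "Center", "Right", "Right", "Right"),
    levels = c("Left", "Center", "Right")
  ),
  pspwght = c(0.5, 1.5, 2.0, 0.25, 0.75, 1.0)
)

toy_design <- svydesign(ids = ~0, weights = ~pspwght, data = toy)

# Weighted frequency table from the survey package
weighted <- svytable(~left_right, toy_design)

# The same numbers by hand: add up the weights within each category
by_hand <- tapply(toy$pspwght, toy$left_right, sum)

weighted
by_hand
```

Each entry of the weighted table is the total weight of the respondents in that category, which is why it is generally not a whole number.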
These are frequencies. We can calculate percentages by dividing each entry by the table total and multiplying by 100. To shorten the code, we store the result of svytable() as table_a:
table_a<-svytable(~ left_right, ess_design_at)
table_a/sum(table_a)*100
## left_right
## Left 1 2 3 4 5
## 3.427192 1.926422 6.167057 8.507349 10.720844 32.392735
## 6 7 8 9 Right Refusal
## 10.986167 6.757778 5.336633 1.536797 2.826378 4.102889
## Don't know No answer
## 5.311759 0.000000
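Base R's prop.table() does the same division in one step. The sketch below uses a small made-up table (toy_counts is hypothetical) rather than the ESS data; since svytable() returns an object that inherits from the table class, the same call should work on table_a as well:

```r
# Toy weighted counts standing in for a svytable() result
toy_counts <- as.table(c(Left = 2.5, Center = 5, Right = 2.5))

# Percentages by explicit division, as in the post
by_division <- toy_counts / sum(toy_counts) * 100

# The same percentages using prop.table()
by_prop_table <- prop.table(toy_counts) * 100

by_division
by_prop_table
```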
Creating better formatted tables
Usually, the purpose of running these svytable() functions is to calculate various statistics and then create a neat table in word processing software. One additional command neatens up the results: knitr::kable(, digits=2), where the table goes before the comma. Here are two examples. The first is a set of simple frequencies; the second is the percentages:
knitr::kable(table_a, digits = 2)
left_right | Freq |
---|---|
Left | 68.89 |
1 | 38.72 |
2 | 123.96 |
3 | 171.00 |
4 | 215.49 |
5 | 651.09 |
6 | 220.82 |
7 | 135.83 |
8 | 107.27 |
9 | 30.89 |
Right | 56.81 |
Refusal | 82.47 |
Don’t know | 106.77 |
No answer | 0.00 |
# While the table header shows "Freq", these are actually percentages
knitr::kable(table_a/sum(table_a)*100 , digits=2)
left_right | Freq |
---|---|
Left | 3.43 |
1 | 1.93 |
2 | 6.17 |
3 | 8.51 |
4 | 10.72 |
5 | 32.39 |
6 | 10.99 |
7 | 6.76 |
8 | 5.34 |
9 | 1.54 |
Right | 2.83 |
Refusal | 4.10 |
Don’t know | 5.31 |
No answer | 0.00 |
We can change the column headers with col.names (a caption can be added in the same way with the caption argument):
knitr::kable(table_a/sum(table_a)*100 , digits=2, col.names=c("left to right ID", "Percent"))
left to right ID | Percent |
---|---|
Left | 3.43 |
1 | 1.93 |
2 | 6.17 |
3 | 8.51 |
4 | 10.72 |
5 | 32.39 |
6 | 10.99 |
7 | 6.76 |
8 | 5.34 |
9 | 1.54 |
Right | 2.83 |
Refusal | 4.10 |
Don’t know | 5.31 |
No answer | 0.00 |
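For completeness, here is a sketch of the caption argument of knitr::kable(), again on a small made-up table (toy_pct and the caption text are hypothetical):

```r
library(knitr)

# Toy percentages standing in for table_a/sum(table_a)*100
toy_pct <- as.table(c(Left = 25, Center = 50, Right = 25))

# col.names relabels the columns; caption adds a title to the table
out <- kable(
  toy_pct,
  digits    = 2,
  col.names = c("left to right ID", "Percent"),
  caption   = "Left-right self-placement (toy data)"
)
out
```

How the caption is rendered depends on the output format of the document being knitted.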
We can calculate statistics for a subset, such as younger people under 30 years old. The age variable in the dataset is called agea:
svytable(~ left_right, subset(ess_design_at, agea<30), round=TRUE)
## left_right
## Left 1 2 3 4 5
## 16 14 51 61 35 117
## 6 7 8 9 Right Refusal
## 30 26 22 6 14 15
## Don't know No answer
## 23 0
These are frequency counts, rounded to the nearest integer because of round=TRUE.
When subsetting, the survey design goes inside the subset() function. We can also calculate mean self-placement on the left-right scale by party. But first let’s create a left-right variable that is simply left, center, and right:
ess8_at <- ess8_at %>%
mutate(left_right3cat = fct_collapse(left_right,
left = c("Left", "1" , "2", "3", "4"),
center = "5",
right = c("6", "7", "8", "9", "Right") ))
table(ess8_at$left_right3cat)
##
## left center right Refusal Don't know No answer
## 575 677 576 87 95 0
Not to get bogged down in more recoding, but we also need to recode a measure of party support.
Let’s look at the political party an Austrian respondent reports voting for in the last election:
table(ess8_at$prtvtbat)
##
## SP\xd6 \xd6VP
## 475 358
## FP\xd6 BZ\xd6
## 250 10
## Gr\xfcne KP\xd6
## 185 6
## NEOS Piratenpartei \xd6sterreich
## 36 4
## Team Frank Stronach Other
## 18 7
## Not applicable Refusal
## 418 205
## Don't know No answer
## 38 0
The labels contain stray character escapes (such as \xd6) left over from converting the responses to the English character set, so we want to clean them up and provide more descriptive labels in English. We will use fct_recode() to alter the existing labels.
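As a rough preview, the sketch below works on a few made-up labels rather than the real prtvtbat column. It assumes the stray bytes are Latin-1 (so that \xd6 is Ö and \xfc is ü), repairs them with iconv(), and then relabels with fct_recode(); the English labels are placeholders chosen for illustration:

```r
library(forcats)

# A few labels with the same byte escapes seen in the table above
raw_labels <- c("SP\xd6", "\xd6VP", "Gr\xfcne", "NEOS")

# Assumption: the bytes are Latin-1; convert them to UTF-8
clean_labels <- iconv(raw_labels, from = "latin1", to = "UTF-8")

# Toy factor standing in for ess8_at$prtvtbat
party <- factor(clean_labels, levels = clean_labels)

# fct_recode() takes "new label" = "old label" pairs;
# levels that are not mentioned (here NEOS) are left unchanged
party_en <- fct_recode(
  party,
  "Social Democrats" = "SP\u00d6",
  "People's Party"   = "\u00d6VP",
  "Greens"           = "Gr\u00fcne"
)
levels(party_en)
```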
See part 2 of this post, next: