Using the survey package in R to analyze the European Social Survey, part 2


Recoding the party support measure
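We copy and paste the old labels and type the new ones. In fct_recode(), the new label goes on the left and the old label on the right; assigning NULL removes a label, turning those responses into missing values: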
ess8_at<-ess8_at %>%
  mutate(at_party_vote = fct_recode(prtvtbat,
    "Social Democratic Party SP" = "SP\xd6",
    "People's Party VP" = "\xd6VP",
    "Freedom Party FP" = "FP\xd6",
    "Alliance for the Future of Austria BZ"= "BZ\xd6",
    "The Greens Gr" = "Gr\xfcne",
    "Communist Party of Austria KP" = "KP\xd6",
    "New Austria and Liberal Forum NEOS" = "NEOS",
    "Pirate Party of Austria PIRAIT" = "Piratenpartei \xd6sterreich",
    "Team Stronach for Austria" = "Team Frank Stronach",
    NULL = "Other",
    NULL = "Not applicable",
    NULL = "Refusal",
    NULL = "Don't know",
    NULL = "No answer"))

table(ess8_at$at_party_vote)
## 
##            Social Democratic Party SP 
##                                   475 
##                     People's Party VP 
##                                   358 
##                      Freedom Party FP 
##                                   250 
## Alliance for the Future of Austria BZ 
##                                    10 
##                         The Greens Gr 
##                                   185 
##         Communist Party of Austria KP 
##                                     6 
##    New Austria and Liberal Forum NEOS 
##                                    36 
##        Pirate Party of Austria PIRAIT 
##                                     4 
##             Team Stronach for Austria 
##                                    18
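To see the recoding pattern in isolation, here is a minimal sketch on a made-up factor. The object names vote and vote_clean and the labels are illustrative assumptions, not part of the ESS data.

```r
library(forcats)

# Toy factor standing in for a vote-choice variable (labels are made up)
vote <- factor(c("A Party", "B Party", "Refusal", "A Party", "Don't know"))

# New label on the left, old label on the right; NULL turns a level into NA
vote_clean <- fct_recode(vote,
  "Party A" = "A Party",
  "Party B" = "B Party",
  NULL = "Refusal",
  NULL = "Don't know")

levels(vote_clean)                  # only the two party labels remain
table(vote_clean, useNA = "ifany")  # the removed responses show up as NA
```

The two removed labels no longer appear as levels, and the responses that carried them are counted as missing.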

Collapsing or grouping multiple labels together

The fct_collapse() function will group together labels into one new category, which you specify. Here, multiple responses are grouped into NULL, which in this case doesn’t require you to type NULL over and over:
ess8_at<-ess8_at %>%
  mutate(at_party_vote = fct_recode(prtvtbat,
    "Social Democratic Party SP" = "SP\xd6",
    "People's Party VP" = "\xd6VP",
    "Freedom Party FP" = "FP\xd6",
    "Alliance for the Future of Austria BZ"= "BZ\xd6",
    "The Greens Gr" = "Gr\xfcne",
    "Communist Party of Austria KP" = "KP\xd6",
    "New Austria and Liberal Forum NEOS" = "NEOS",
    "Pirate Party of Austria PIRAIT" = "Piratenpartei \xd6sterreich",
    "Team Stronach for Austria" = "Team Frank Stronach" )) %>%
  mutate(at_party_vote = fct_collapse(at_party_vote, 
     NULL = c("Other", "Not applicable", "Refusal", "Don't know", "No answer")
         )) 

table(ess8_at$at_party_vote)    
## 
##            Social Democratic Party SP 
##                                   475 
##                     People's Party VP 
##                                   358 
##                      Freedom Party FP 
##                                   250 
## Alliance for the Future of Austria BZ 
##                                    10 
##                         The Greens Gr 
##                                   185 
##         Communist Party of Austria KP 
##                                     6 
##    New Austria and Liberal Forum NEOS 
##                                    36 
##        Pirate Party of Austria PIRAIT 
##                                     4 
##             Team Stronach for Austria 
##                                    18
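fct_collapse() is not limited to NULL. As a small sketch on made-up data (the names resp and resp3 and the labels are assumptions for illustration), several old labels can be grouped under any new label:

```r
library(forcats)

# Toy factor with five made-up response labels
resp <- factor(c("Left", "Far left", "Right", "Far right", "Centre", "Left"))

# Each new label is followed by the vector of old labels it should absorb
resp3 <- fct_collapse(resp,
  left  = c("Far left", "Left"),
  right = c("Far right", "Right"))

table(resp3)  # "Centre" is untouched; the other four labels become two
```

Labels not mentioned in the call, such as "Centre" here, are left as they are.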

Collapsing smaller groups into one “other” category

Given Austria’s multi-party system, we could combine the smaller political parties into an ‘other’ category. The function fct_lump() will do this automatically:
ess8_at<-ess8_at %>%
  mutate(at_party_vote = fct_lump(prtvtbat))
         
table(ess8_at$at_party_vote)
## 
##         SP\xd6         \xd6VP         FP\xd6       Gr\xfcne Not applicable 
##            475            358            250            185            418 
##        Refusal          Other 
##            205            119
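Of course, we need to be careful. The default choices may not make sense: here, "Not applicable" and "Refusal" are kept as categories because they are more frequent than most parties. You can control how many of the most frequent categories are kept with n=, such as fct_lump(prtvtbat, n=10); everything else is placed in "Other".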
## 
##   SP   VP   FP   BZ   Gr   KP NEOS  PIR Strn 
##  475  358  250   10  185    6   36    4   18
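A short sketch of how n= behaves, using a toy factor built from five of the counts above. The object names party and lumped are assumptions for illustration.

```r
library(forcats)

# Toy factor reusing five of the counts shown above
party <- factor(rep(c("SP", "VP", "FP", "BZ", "Gr"),
                    times = c(475, 358, 250, 10, 185)))

# Keep the 2 most frequent levels; every other level goes into "Other"
lumped <- fct_lump(party, n = 2)

table(lumped)
```

With n = 2, SP and VP are kept and the remaining 445 responses (250 + 10 + 185) are grouped as "Other", which is then larger than VP. This is the kind of result that is worth checking before using the lumped variable.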

Removing unused factor labels

If a variable includes unused labels (for example, a label that records 0 people), they can be dropped with fct_drop(). This function simply removes any label that is unused.
ess8_at<-ess8_at %>%
  mutate(at_party_vote = fct_drop(at_party_vote))
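As a minimal illustration on a made-up factor (the name f is an assumption), a level that is declared but never observed disappears after fct_drop():

```r
library(forcats)

# "c" is a declared level but no observation uses it
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

table(f)            # "c" is listed with a count of 0
table(fct_drop(f))  # "c" is no longer listed
```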
And we will collapse the smaller categories of the party vote variable, keeping the four largest parties:
ess8_at<-ess8_at %>%
  mutate(at_party_vote = fct_lump(at_party_vote, n=4))
         
table(ess8_at$at_party_vote)
## 
##    SP    VP    FP    Gr Other 
##   475   358   250   185    74
And we will recode the left-right measures to exclude the “refusal”, “don’t know”, and “no answer” responses:
ess8_at<-ess8_at %>%
  mutate(left_right = fct_recode(left_right, 
        NULL= "Refusal",
        NULL = "Don't know",
        NULL = "No answer"))

ess8_at<-ess8_at %>%
  mutate(left_right3cat = fct_recode(left_right3cat, 
        NULL= "Refusal",
        NULL = "Don't know",
        NULL = "No answer"))
Since we created a new variable and recoded the left-right self-placement, we need to reset the survey design object. We can update the existing object, or just overwrite it:
ess_design_at <- 
  svydesign(
    ids = ~0 ,
    strata=NULL,
    weights=~pspwght,
    data = ess8_at)
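For the other route, updating rather than overwriting, the survey package provides an update() method for design objects. The following is a sketch on toy data; the data frame toy, its columns y and w, and the derived variable y_high are made up for illustration.

```r
library(survey)

# Toy data; 'w' is a made-up weight column
toy <- data.frame(y = c(1, 2, 3, 4), w = c(1, 1, 2, 2))
des <- svydesign(ids = ~1, weights = ~w, data = toy)

# Add a derived variable to the design object itself
des <- update(des, y_high = as.numeric(y > 2))

# Weighted share of cases with y above 2
svymean(~y_high, des)
```

Note that update() changes the variables stored inside the design object, not the original data frame. If the data frame has already been recoded with mutate(), as in this post, overwriting the design is the simpler choice.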
We use the svyby() function, which allows us to calculate means, and other summary statistics, on a numeric variable across levels of a qualitative, factor variable. While a box-and-whisker plot shows the median left-right placement by political party, we can use svyby() to calculate arithmetic means. The variable upon which you want to calculate a mean is specified at the beginning of the function, following a tilde symbol, ~.
Of course, we need to specify what to do when we encounter missing values, which are found throughout the dataset, before calculating the mean. We add na.rm=TRUE to remove responses with missing values prior to calculating the means. To calculate a mean, a variable needs to be treated as a numeric score; in the case below, we calculate a mean on the 11-point left-to-right scale (lrscale, scored 0 to 10 in the ESS). Because left_right is stored as a factor, we wrap it in as.numeric() so that it is treated as a numeric score. Remember that when a factor is coerced to numeric, R converts the factor levels to integers beginning at 1, so the means below fall on a 1 to 11 scale rather than 0 to 10. In the case of lrscale, this is fine, since the order and spacing of the categories are preserved.
svyby(~as.numeric(left_right), by=~at_party_vote, design=ess_design_at, svymean, na.rm=TRUE)
##       at_party_vote as.numeric(left_right)        se
## SP               SP               5.188813 0.1081252
## VP               VP               6.716724 0.0906762
## FP               FP               7.574956 0.1367827
## Gr               Gr               4.221286 0.1671370
## Other         Other               5.905669 0.2728092
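The factor-to-numeric behavior is easy to check on a small made-up factor (the name f and its labels are assumptions for illustration):

```r
# Toy factor whose labels look like numbers
f <- factor(c("0", "5", "10"), levels = c("0", "5", "10"))

as.numeric(f)                # level codes: 1 2 3
as.numeric(as.character(f))  # the labels themselves: 0 5 10
```

as.numeric() on a factor returns the position of each level, not the label. When the labels are themselves numbers and the original values matter, converting through as.character() first recovers them.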
The function svymean identifies the arithmetic mean as the statistic to calculate. In addition, although the results are not shown here, we could combine the command above with the subset() function. For example, if we had set the survey design on the entire dataset across all ESS nations, we could subset it by a particular country:
svyby(~as.numeric(left_right), ~at_party_vote, subset(ess_design, cntry=="AT"), svymean, na.rm=TRUE)
To produce column or row percentages, we use a prop.table() function to wrap around the svytable() function. margin=2 means column proportions. margin=1 would produce row proportions.
tab1<-prop.table(svytable(~left_right3cat + at_party_vote, ess_design_at), margin=2)
## these are column proportions
knitr::kable(tab1*100, digits = 2, caption = "Percentage of Austrian party supporters identifying on left, right, or center")
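Percentage of Austrian party supporters identifying on left, right, or center

|        |    SP |    VP |    FP |    Gr | Other |
|--------|-------|-------|-------|-------|-------|
| left   | 50.86 |  9.86 |  6.74 | 77.54 | 32.13 |
| center | 37.16 | 39.38 | 30.47 | 14.50 | 22.29 |
| right  | 11.98 | 50.76 | 62.79 |  7.96 | 45.58 |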
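The effect of the margin argument can be checked on a small made-up matrix (the name m and its values are assumptions for illustration):

```r
# Toy 2 x 2 table of counts
m <- matrix(c(10, 30, 20, 40), nrow = 2,
            dimnames = list(row = c("r1", "r2"), col = c("c1", "c2")))

prop.table(m, margin = 2)  # column proportions: each column sums to 1
prop.table(m, margin = 1)  # row proportions: each row sums to 1
```

In the party table above, margin=2 is why each party's column adds up to 100 percent.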
We can add a sample-weight-adjusted Chi-square test of independence with the svychisq() function, setting statistic="Chisq".
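As a sketch of the call, the example below uses the api data that ships with the survey package rather than the ESS file, so that it runs on its own. The design object dclus1 follows the package's own examples.

```r
library(survey)

# Example data shipped with the survey package
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Design-adjusted Chi-square test of independence for a two-way table
res <- svychisq(~sch.wide + stype, dclus1, statistic = "Chisq")
res
```

For the table in this post, the analogous call would presumably be svychisq(~left_right3cat + at_party_vote, ess_design_at, statistic="Chisq").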
Given a table of results, you can construct simple barplots if it makes sense to do so. In this case, we plot the column proportions. Barplot graphic excluded:
barplot(tab1, beside=FALSE, legend=TRUE, main="left-right ID by party vote choice", ylab="proportion of left, right, or center placement")

A histogram of left right placement in Austria.

Since the variable is currently stored as a factor, we will use as.numeric() to make it numeric:
svyhist(~ as.numeric(left_right), ess_design_at, main="Left to right self identification", xlab="left (min) right (max)")
We can subset it by the age variable or by party:
svyhist(~ as.numeric(left_right), subset(ess_design_at, agea<=35), main="Left to right ID, Austrians 35 and younger", xlab="left (min) right (max)")
svyhist(~ as.numeric(left_right), subset (ess_design_at, at_party_vote=="VP"))

A Boxplot.

It requires two variables – the numeric scores with which to calculate the boxplot, and a factor that determines the categories within which the scores are calculated.
svyboxplot(as.numeric(left_right) ~ at_party_vote , ess_design_at, na.rm=TRUE,
           main="left right placement by party closeness, Austria",
           ylab="left (1) to right (11)")
More in the next post on regression modeling and visualization with the survey package, and on analyzing other nationally representative datasets.
