Skip to main content

More contour and density plots [stat_density2d() and hdrcde()] of Michigan lottery sales in Grand Rapids

After the prior post of a density map of lottery sales, I thought perhaps I had incorrectly passed on some arguments within ggplot for the use of stat_density2d(). So I looked back through the documentation for stat_density2d()at docs.gggplot2.org.  The example in the documentation is the Old Faithful geyser data, which I recalled from other contour/density plot analyses in Antony Unwin's Graphical Data Analysis with R.  

Unwin's discussion of density plots relies on both ggplot() and the hdrcde() packages.  The two packages use different engines for density estimation/contour lines, so perhaps it could be interesting to compare the two. 

Let's start with the contour/density estimation in Unwin's book.  Unwin begins with a scatterplot and contour lines for Old Faithful, which shows three distinct clusters of eruptions: 

ggplot(geyser, aes(duration, waiting)) + geom_point() +
       geom_density2d() + 
       ggtitle("Old Faithful geyser eruption duration 
                and inter-eruption time in minutes")  


Unwin contrasts the default contour lines from ggplot()with specified proportions in hdrcde(). Selecting the proportions 1, 5%, 50%, and 75% results in the following plot, which does not show the three clusters as clearly as ggplot().

par(mar=3.1, 4.1, 1.1, 2.1)


with(geyser, hdr.boxplot.2d(duration, waiting, 
     show.points=TRUE, 
     prob=c(0.01, 0.05, 0.50, 0.75)))



There's not really any guidance in Unwin's book for selecting the proportions to plot. Or an explanation of the methodological differences between the two packages. Maybe appropriate density estimation and smoothness/proportion or bandwidth choices are obvious to a statistician, but not me. I guess since these are exploratory plots, we could select a range of proportions to examine how it affects the the contour lines. Anyhow, it's interesting that the proportions specified with hdrcde() do not as clearly show the three clusters. 

Let's try these plot packages with the lottery data, starting with a simple scatterplot of the geographic coordinates:

ggplot(Kentretaildensity3, aes(Long, Lat)) + geom_point()





This map is `zoomed out' to a greater extent than the lottery maps from the prior post. But you can still see the two primary north-south streets with high concentrations of lottery retailers, from Alpine avenue to the right of the -85.7 Longitude line, and Division running south of downtown a bit further to the right. Plainfield avenue is the curvy line extending out in a northeastern direction.  

We can plot some contour lines around these points to highlight some of the clusters. Unless I misunderstand the ggplot documentation, the geom_density2d() function relies upon the kde2d() function in the MASS package for a two-dimensional kernal density estimation. The MASS package explains that, without specifying the 'bins' or bandwith, a default is chosen according to the option 'bandwidth.rnd', about which I ran out of time to read! 

ggplot(Kentretaildensity3, aes(Long, Lat)) + geom_point() + geom_density2d() 




The contour lines display the same `epicenter' of lottery sales on the near west side of GR. The shape of the outermost contour line is similar to the lines from the map in the prior post, with a smaller extent of the data.  A small contour circle is drawn around the southeast smaller area of concentrated lottery sales.

The ggplot2 package conveniently drops missing values, but apparently hdrcde does not.  So first, I drop the addresses with missing coordinates --- 124 in total.  With the hdr.boxplot.2d() function, I draw attention to the highest density regions, regions that do not include proportions of the lottery retailers, starting at the 20th percentile and moving up through 20 percent increments.  So for example, in the plot below, the darkest region  centered around the area of the highest concentration of retailers, contains all but 80 percent of the retailers. The largest, lightest gray cloud of points contains all but 20 percent of the retailers. From there, the concentric areas include all but 40, and 60 percent.  (If I mis-state the explanation, perhaps a reader could correct my interpretation.) 

par(mar=c(3.1, 4.1, 1.1, 2.1)) # par setting from Unwin

with(Kentretaildensity2, hdr.boxplot.2d(Long, Lat, show.points=TRUE, prob=c(0.20, 0.40, 0.60, 0.80)))













I'm not aware of a method to add a map layer to the hdr plot . So we'll go back to adding the map layers to the contour lines.  

The zoom function on ggmap(), as in grmap<-get_map("grand rapids, mi", zoom = 11)doesn't line up perfectly with the points in the countour lines. I'm not sure how to work around that, so the contour lines + Google maps below show a slightly different extent of metro Grand Rapids. (Similarly, the zoom extent on a Google map around 'Kent County, MI' doesn't seem to work correctly, either.)

Here's the first map: 

## A bigger extent of metro GR than prior post, I think: 
grmap<-qmap("grand rapids, mi", zoom = 11)
plot(grmap)

grmap + 
 ggtitle("Michigan Lottery retailers in metro Grand Rapids, 2015") +
 geom_point(aes(x = Long, y = Lat), alpha = .4, 
            data = Kentretail) + theme(legend.position="none")















Now adding in the contour lines to this map, zoomed in one step further to Grand Rapids:

GrandRapidsMap2 +
ggtitle("Michigan Lottery retailers in metro Grand Rapids, 2015") +
geom_point(aes(x = Long, y = Lat), alpha = .4, 
           data = Kentretail) + theme(legend.position="none") + 
geom_density2d(aes(x=Long, y=Lat), bins=5, size=1, 
           data = Kentretail)








Adding in the shading, along with the points and contour lines:

GrandRapidsMap2 + 
ggtitle("Michigan Lottery retailers in metro Grand Rapids, 2015") +
geom_point(aes(x = Long, y = Lat), alpha = .4, 
               data = Kentretail) + theme(legend.position="none") + 
stat_density2d(aes(x=Long, y=Lat, fill=..level..,
               alpha=..level..), bins=5, size=1, data =
               Kentretail, geom="polygon")









Finally, one additional plot shows the point coordinates of lottery retailers along with a marker drawn according to the total sales of lottery tickets in 2015.  

grmap + ggtitle("Michigan Lottery retailers in metro Grand Rapids and total lottery sales in 2015") +
  geom_point(aes(x = Long, y = Lat, size = X2015), alpha = .4, 
             data = Kentretail) + theme(legend.position="none")













The plot above relies upon a default scaling for the size of the markers, which could be changed with scale_size() [such as scale_size(range = c(1, 10))], but given the difficulties interpreting maps like  these --visually observing differences in size, and the need to calculate the circles differently for better visual interpretation (see this brief guide), I'll leave the default settings.  

Geographers seem to prefer high contrast colors to range across the size of the circles as well. And of course, a legend would be useful, too: 

GrandRapidsMap2 +
 ggtitle("Michigan Lottery retailers in metro Grand Rapids and total
          lottery sales in 2015") +
 geom_point(aes(x = Long, y = Lat, size = X2015, 
            colour=X2015), alpha = .4, data = Kentretail) + 
 scale_colour_gradient(high = "red", low = "green") 



Sales are in U.S. dollars, reported by the Michigan Lottery Commission for annual totals in 2015.  In this map, notice how higher dollar lottery sales occurred outside of downtown, though there is a larger number of lower volume lottery retailers in the downtown area!

One final thing -- I can't figure out how to get the coordinates to plot on the map layer axes.  the get_map() function that produces a map with the labelled axes isn't compatible with the other ggplot functions.  If anyone has a suggestion for how to do this, please let me know.  

The code for each of the figures should appear below, in one bloc:




## The first set of lines below are from Unwin's http://www.gradaanwr.net/
library(ggplot2)

# page 80 of Unwin
data(geyser, package="MASS")
ggplot(geyser, aes(duration, waiting)) + geom_point()

## page 81 of Unwin
ggplot(geyser, aes(duration, waiting)) + geom_point() + geom_density2d() + ggtitle("Old Faithful geyser eruption duration and inter-eruption time in minutes")  

install.packages("hdrcde")
library(hdrcde)

par(mar=3.1, 4.1, 1.1, 2.1)
with(geyser, hdr.boxplot.2d(duration, waiting, show.points=TRUE, prob=c(0.01, 0.05, 0.50, 0.75)))

## plotting same contour style with the lottery data
## To compare with the GR map, subset the Kent county points over GR
names(Kentretail)
Kentretaildensity<-Kentretail
ggplot(Kentretaildensity, aes(Long, Lat)) + geom_point() + ggtitle("Coordinates of Michigan Lottery retailers in metro Grand Rapids")

## Adding contour lines
ggplot(Kentretaildensity, aes(Long, Lat)) + geom_point() + geom_density2d() + 
        ggtitle("Coordinates of Michigan Lottery retailers in metro Grand Rapids")

## trying hdrcde()
### first subsetting to non-missing addresses:
Kentretaildensity2<-subset(Kentretaildensity, is.na(Kentretaildensity$Lat)==FALSE)
table(is.na(Kentretaildensity2$Lat))
par(mar=c(3.1, 4.1, 1.1, 2.1))
with(Kentretaildensity2, hdr.boxplot.2d(Long, Lat, show.points=TRUE, prob=c(0.20, 0.40, 0.60, 0.80)))

## Map and Contour lines

## A bigger extent of metro GR than prior post: 
grmap<-qmap("grand rapids, mi", zoom = 11)
plot(grmap)

## points
grmap + ggtitle("Michigan Lottery retailers in metro Grand Rapids, 2015") +
  geom_point(aes(x = Long, y = Lat), alpha = .4, 
             data = Kentretail) + theme(legend.position="none")

## points and contour lines on  zoom=12
GrandRapidsMap2 + ggtitle("Michigan Lottery retailers in metro Grand Rapids, 2015") +
  geom_point(aes(x = Long, y = Lat), alpha = .4, 
             data = Kentretail) + theme(legend.position="none") + 
  geom_density2d(aes(x=Long, y=Lat), bins=5, size=1, data = Kentretail)

## points and contour lines on  zoom=12
## fill
GrandRapidsMap2 + ggtitle("Michigan Lottery retailers in metro Grand Rapids, 2015") +
  geom_point(aes(x = Long, y = Lat), alpha = .4, 
             data = Kentretail) + theme(legend.position="none") + 
  stat_density2d(aes(x=Long, y=Lat, fill=..level.., alpha=..level..), bins=5, size=1, data = Kentretail, geom="polygon")

### and marker size:
## points
GrandRapidsMap2 + ggtitle("Michigan Lottery retailers in metro Grand Rapids and total lottery sales in 2015") +
  geom_point(aes(x = Long, y = Lat, size = X2015), alpha = .4, 
             data = Kentretail) +  theme(legend.position="none")  

## Changing the default size and adding back the legend
GrandRapidsMap2 + ggtitle("Michigan Lottery retailers in metro Grand Rapids and total lottery sales in 2015") +
  geom_point(aes(x = Long, y = Lat, size = X2015), alpha = .4, 
             data = Kentretail) + scale_size(range = c(1, 10))
  
## color and size of marker
GrandRapidsMap2 + ggtitle("Michigan Lottery retailers in metro Grand Rapids and total lottery sales in 2015") +
  geom_point(aes(x = Long, y = Lat, size = X2015, colour=X2015), alpha = .4, 
             data = Kentretail) + 
  scale_colour_gradient(high = "red", low = "green") 

Comments

Popular posts from this blog

Using the survey package in R to analyze the European Social Survey, part 1

Using the survey package in R to analyze the European Social Survey For future reference, I’d like to have a record of tools for analyzing the European Social Survey, via the “survey” package by Lumley ( https://cran.r-project.org/web/packages/survey/survey.pdf ). In this post, I simply setup the survey object and demonstrate the tabulation of responses. The examples below require the survey , dplyr , and forcats packages: library(survey) library(dplyr) library(forcats) Below I load a version of the 8th round of the European Social Survey dataset ( http://europeansocialsurvey.org ) load(file=url("https://github.com/whittkilburn/ESSdata/raw/master/ess%20round%208%20workspace.rdata")) The dataframe within the workspace is ess8 ; it was imported from a Stata datafile with the foreign package; factor labels were preserved for available columns, with one exception: the sampling weight column was replaced with a

Using the survey package in R to analyze the European Social Survey, part 2

Using the survey package in R to analyze the European Social Survey, part 2 Recoding the party support measure We copy paste the old labels and type the new: ess8_at<-ess8_at  %>%    mutate ( at_party_vote =   fct_recode (prtvtbat,      "Social Democratic Party SP"  =  "SP \xd6 " ,      "People's Party VP"  =  " \xd6 VP" ,      "Freedom Party FP"  =  "FP \xd6 " ,      "Alliance for the Future of Austria BZ" =  "BZ \xd6 " ,      "The Greens Gr"  =  "Gr \xfc ne" ,      "Communist Party of Austria KP"  =  "KP \xd6 " ,      "New Austria and Liberal Forum NEOS"  =  "NEOS" ,      "Pirate Party of Austria PIRAIT"  =  "Piratenpartei  \xd6 sterreich" ,      "Team Stronach for Austria"  =  "Team Frank Stronach" ,      NULL =   "Other" ,      NULL =   "Not applicable" ,