
Choroplethic and Totoro-like density maps of Michigan lottery sales in Grand Rapids, with ggmap/ggplot2

The data start with Kent County lottery retailers: records of individual retailers, with addresses geocoded through the Census Bureau. Sales totals from these records were aggregated to Census tracts and merged with Census records of tract demographic characteristics, such as median family income and educational attainment. The Kent subset of the tract-level data, kent_lottery_census, is downloadable from the code at the bottom of the page and can be used to replicate the figures in this post.

In kent_lottery_census, the unit of observation is the census tract: there are 128 rows, one for each tract within Kent County. The columns include income, population by degree attainment, total sales within the tract, sales for 2015, and a count of retailers (licensees) in the tract.
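
To make that structure concrete, here is a minimal sketch of the aggregation and merge in base R, using made-up numbers. The data frames `retailers` and `tracts`, their column names, and every value in them are hypothetical stand-ins, not the actual Michigan Lottery or Census files.

```r
# Hypothetical retailer-level records: one row per retailer, already geocoded to a tract
retailers <- data.frame(
  tract = c(1, 1, 2, 3),
  sales = c(3.0e6, 1.5e6, 6100, 2.5e5)
)

# Hypothetical tract-level Census records
tracts <- data.frame(
  tract = c(1, 2, 3, 4),
  median_family_income = c(38000, 97000, 52000, 120000),
  population = c(4200, 5000, 3100, 2800)
)

# Aggregate retailers to tracts: total sales and retailer count (n)
tract_sales <- aggregate(sales ~ tract, data = retailers, FUN = sum)
tract_n <- aggregate(sales ~ tract, data = retailers, FUN = length)
names(tract_n)[2] <- "n"
tract_sales <- merge(tract_sales, tract_n, by = "tract")

# Left join onto the Census records; tracts without retailers get NA
toy_census <- merge(tracts, tract_sales, by = "tract", all.x = TRUE)

# Replace NA with 0 for tracts without retailers
toy_census$n <- ifelse(is.na(toy_census$n), 0, toy_census$n)
toy_census$sales <- ifelse(is.na(toy_census$sales), 0, toy_census$sales)

# Per capita sales within each tract
toy_census$pc_sales <- toy_census$sales / toy_census$population
toy_census
```

The left join matters here: a tract with no retailers should stay in the data with zero sales rather than drop out, which is why the NA values are replaced after the merge.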

With the tract-level data, I wondered whether there was evidence that tracts with lower income households held both a larger share of retailers and a larger share of overall sales, relative to higher income tracts. The scatterplot below (Figure 1) displays lottery sales per capita within the tracts, by median family income. The size of each point gives a rough approximation, in tens, of the number of retailers in each tract. The curved portion of the LOESS line follows the skewed distribution of sales around the moderate family income of $40,000, then settles into a roughly linear trend toward the near-zero sales per capita among the County's highest income census tracts, at about $120,000.
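
As a rough illustration of what the smoother is doing: with default settings and fewer than 1,000 observations, geom_smooth() fits a LOESS curve. The sketch below fits loess() directly in base R on simulated data. The simulated incomes and sales are invented for illustration and are not the Kent County figures.

```r
set.seed(1)

# Simulated tract data (invented values): a bump in sales near $40,000
income <- runif(128, 20000, 120000)
pc_sales <- pmax(0, 6000 * exp(-((income - 40000) / 25000)^2) + rnorm(128, sd = 500))
toy <- data.frame(income, pc_sales)

# Local regression of per capita sales on income; span = 0.75 is the loess() default
fit <- loess(pc_sales ~ income, data = toy, span = 0.75)

# Fitted values at a few income levels inside the observed range
grid <- data.frame(income = c(40000, 80000, 110000))
grid$fitted <- predict(fit, newdata = grid)
grid
```

Because LOESS fits a separate weighted regression in each neighborhood of the x-axis, the curve can bend where the data are skewed and straighten out where the relationship is closer to linear.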

Of course, the focus on per capita sales is misleading; retailers do not exclusively sell tickets to census tract residents. And the leading retailers of lottery tickets are located on major Grand Rapids thoroughfares and in grocery or convenience stores and bars.

Figure 1. Lottery sales per capita in Kent County census tracts decrease over tract median family income

Still, the same pattern of sales across tract income persists when focusing on total sales rather than per capita sales: sales decrease as median family income rises. Figure 2 displays sales in millions of U.S. dollars, simply the sum of all lottery ticket sales within census tracts from 2007 to 2016. The largest volume of sales occurred in census tract 19, with over $51 million in total sales since 2007. The smallest non-zero volume of sales occurred in census tract 124, in the eastern suburbs of Grand Rapids, totaling just $6,100, perhaps owing to a single retailer that may have suspended sales. I did not investigate further to determine the source of these sales.

I attempted a few choroplethic maps based on census tract boundaries. With the boundary line files of Michigan tracts from the Census Bureau, I created a shapefile subset in ArcGIS for Kent County alone.

I plotted the sales data from Figure 2 within the census tracts, displayed on a Google map base layer. The color gradient, running from magenta to green for higher total sales, roughly mimics the 'magenta2green' color ramp from the colorRamps package. While some tracts are dark magenta, showing little to no ticket sales, the lighter magenta to green tracts are centered around the city of Grand Rapids and the major streets leading into Plainfield Twp. and Walker to the north, and Wyoming and Kentwood to the south.

The bright green tract in the middle of the map is the epicenter of lottery sales in metro Grand Rapids, the neighborhood to the west of the Grand River. Plotting sales totals from a single year on a basemap layer (here, in contrast with a prior post, Stamen maps in the 'toner' style) shows the distribution of lottery retailers and sales totals for calendar year 2015, the last full year for which I have sales data. The 'toner' style map contrasts well with the green and pink points. As discussed in a prior post, while there are more retailers in central Grand Rapids, the higher volume retailers tend to be located on the periphery of the city. The largest green point is in the northeast quadrant of the map, on Plainfield Avenue near Plainfield Twp.

Incidentally, the top ten retailers appear to be visible within the map and are listed below.  The largest retailer is the Meijer on Plainfield Avenue, which sold $972,316.50 worth of lottery tickets in 2015.

The lottery records seem to show three distinct types of retailers: 1) grocery stores, such as Meijer, 2) pizza/bar places, such as Florentine's, and 3) party and convenience stores.

One final map: I plotted the location of lottery retailers by year, from 2007 to 2016. The Michigan Lottery began in the 1970s, so nine years is only a short span of its history. Still, the changes in the density of retailer locations show that, over time, a dense cluster of retailers on the near west side has emerged from a broader cluster of retailers within downtown Grand Rapids.
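
The density layers in these maps come from stat_density2d(), which relies on MASS::kde2d() for two-dimensional kernel density estimation. Below is a minimal sketch of that estimate on simulated coordinates. The points are random draws around arbitrary centers, not actual retailer locations, and the grid size of 50 is an arbitrary choice.

```r
library(MASS)  # a recommended package included with standard R installations

set.seed(42)

# Simulated longitude/latitude: a tight cluster plus a broader scatter
long <- c(rnorm(60, -85.69, 0.01), rnorm(40, -85.66, 0.04))
lat  <- c(rnorm(60, 42.97, 0.01), rnorm(40, 42.96, 0.03))

# Kernel density estimate evaluated on a 50 x 50 grid
dens <- kde2d(long, lat, n = 50)

# Grid coordinates of the density peak
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)
peak_location <- c(long = dens$x[peak[1, "row"]], lat = dens$y[peak[1, "col"]])
peak_location
```

Calling contour(dens) on the result draws contour lines comparable to the filled polygons in the faceted maps, where the same kind of estimate is computed separately for each year.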

R code for replicating these results is below. Note the resemblance of these density plots to a Totoro.



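## packages used below
library(dplyr)    # mutate, filter, rename, left_join, %>%
library(tidyr)    # gather
library(ggplot2)  # plotting
library(ggmap)    # qmap base layers

## tracts where there were no sales of tickets have NA replaced with 0
kent_lottery_census<- kent_lottery_census %>%
  mutate(n=ifelse(is.na(n), 0, n))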

# FIGURE HERE -------------------------------------------------------------
#  print(n=20) 
ggplot(data=kent_lottery_census, mapping = aes(x=median_family_income, y=pc_sales)) + 
  geom_point(aes(size=n), alpha=1/3) +
  scale_y_continuous(breaks=seq(0, 16000, by=2000)) +
  scale_x_continuous(breaks=seq(0, 125000, by=20000)) +
  geom_smooth(se=FALSE) +
  labs(
    title="Lottery sales per capita by median family income, Kent County census tracts",
    subtitle="Number of retailers (n) per census tract",
    caption="Author's own analysis. Data from U.S. Census Bureau, Michigan Lottery",
    y="total lottery sales (2003-2015) per capita (2010 Decennial Census )",
    x="median family income within 2010 Census tracts, in 2010 US Dollars"
  )

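## derive total sales in millions of dollars; the original mutate() step is missing
## from this listing, and `total_sales` is an assumed name for the tract-level sales column
kent_lottery_census <- kent_lottery_census %>%
  mutate(sales_millions = total_sales / 1e6)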

ggplot(data=kent_lottery_census, mapping = aes(x=median_family_income, y=sales_millions)) + 
  geom_point(aes(size=n), alpha=1/3) +
  scale_y_continuous(breaks=seq(0, 300, by=5)) +
  scale_x_continuous(breaks=seq(0, 125000, by=20000)) +
  geom_smooth(se=FALSE) +
  labs(
    title="Total Lottery sales in millions by median family income, Kent County census tracts",
    subtitle="Number of retailers (n) per census tract",
    caption="Author's own analysis. Data from U.S. Census Bureau, Michigan Lottery",
    y="total lottery sales (2003-2015) in millions of annual dollars ",
    x="median family income within 2010 Census tracts, in 2010 US Dollars"
  )


## Download these two files to your working directory for the Shapefile:

kent_map_tracts<-fortify(kent_map_tracts, region="NAME")

kc + geom_polygon(aes(x=long, y=lat, group=group), data=kent_map_tracts, 
                  color="white", fill="black", alpha=.4, size=.3) 

## make sure the tract ids have correct decimal place:
# kent_lottery_census$tract<-kent_lottery_census$tract/100
# kent_lottery_census$tract<-kent_lottery_census$tract*100


## two datasets here, first is census data, second is map


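## fortify(..., region="NAME") stores the tract name in a column called id, so the
## census data need a matching id column for the join. The original step is missing
## from this listing; building id from tract as a character string is an assumption.
kent_lottery_census<-kent_lottery_census %>%
  mutate(id=as.character(tract))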

## merge the kent sales and census dataset onto the kent_map_tracts object
kent_map_tracts<-left_join(kent_map_tracts, kent_lottery_census, by="id")  

## a further transformation of kent_map_tracts is cut off at this point in the listing
# kent_map_tracts <- kent_map_tracts %>%

## the base layer for the map is kc 
kc<-qmap('Kent County, Michigan', zoom= 10)

kc  + geom_polygon(aes(x=long, y=lat, group=group, fill=sales_millions), data=kent_map_tracts, 
                  color="white",  alpha=.6, size=.3) + 
scale_fill_gradient(high = "green", low = "magenta2")  + labs(fill="sales in US Millions", 
  title="Michigan Lottery ticket sales in Kent County, 2007 to mid 2016", subtitle="by U.S. Census Bureau Tracts, 2010 vintage",
  caption="Author's own analysis,") 

## base layer for the point and density maps: Stamen 'toner' style
kc3<-qmap('Grand Rapids, Michigan', zoom= 12, maptype = "toner", source = "stamen")
kc3+ ggtitle("Michigan Lottery retailer sales in metro Grand Rapids, 2015") +
  geom_point(aes(x = Long, y = Lat, size = X2015, colour=X2015), alpha = .6, 
             data = Kentretail) + scale_colour_gradient(high = "green2", low = "magenta2") +
  labs(size="2015 sales in USD", colour=" ")



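## Kentretail_tibble is assumed here to be a tibble copy of Kentretail;
## the line that created it is missing from this listing
Kentretail_tibble <- as_tibble(Kentretail)

## rename the year columns (X2003 ... X2016) to plain years
Kentretail_tibble<-Kentretail_tibble %>%
  rename(`2003`=X2003, `2004`=X2004, `2005`=X2005, `2006`=X2006, `2007`=X2007, `2008`=X2008,
         `2009`=X2009,`2010`=X2010, `2011`=X2011,`2012`=X2012,`2013`=X2013,`2014`=X2014, `2015`=X2015,
         `2016`=X2016)

## Grand Rapids retailers, sorted by 2015 sales (largest first)
Kentretail_tibble %>%
  filter(City=="GRAND RAPIDS") %>%
  arrange(desc(`2015`)) %>%
  select(Retailer.Name, `2015`,everything())

## reshape to long format: one row per retailer and year
Kentretail_tibbleLONG <- Kentretail_tibble %>%
  gather(`2003`, `2004`, `2005`, `2006`, `2007`, `2008`,
         `2009`, `2010`, `2011`, `2012`, `2013`, `2014`, `2015`, `2016`, key=year, value=sales) 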

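## Set Long and Lat to NA where sales is NA, otherwise keep the coordinates,
## so retailers with no sales in a given year are not plotted in that year
Kentretail_tibbleLONG <- Kentretail_tibbleLONG %>%
    mutate(Long=ifelse(is.na(sales), NA, Long)) %>%
    mutate(Lat=ifelse(is.na(sales), NA, Lat)) 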

kc3 + 
    geom_point(aes(x = Long, y = Lat), alpha = .4, 
               data = Kentretail_tibbleLONG) + theme(legend.position="none") + 
    stat_density2d(aes(x=Long, y=Lat, fill=..level..,
                 alpha=..level..), bins=6,  data=Kentretail_tibbleLONG, geom="polygon")  +
      scale_fill_gradient(high = "green4", low = "magenta2") + 
      facet_wrap(~ year) +
     labs(title="Density of Michigan Lottery retailers, 2003-2016, Grand Rapids", subtitle="density of retailers from low (magenta) to high (green)",
          caption="Author's own analysis. Data source: Michigan Lottery and US Census Bureau")

