Showing posts from 2017

Choroplethic and Totoro-like density maps of Michigan lottery sales in Grand Rapids, with ggmap/ggplot2

The data start with records of individual lottery retailers in Kent County, with addresses geocoded through the Census Bureau. These records, with sales totals aggregated to Census tracts, were merged with Census records of various demographic characteristics of the tracts, such as median family income and educational attainment. The Kent subset of the tract-level data, kent_lottery_census, is downloadable from the code at the bottom of the page and can be used to replicate the figures in this post. In kent_lottery_census, the unit of observation is the census tract: there are 128 rows, one for each tract within Kent County, with columns for income, population by degree attainment, total sales within the tract, sales for 2015, and a retailer or licensee count by tract. With the tract-level data, I wondered whether there was evidence that tracts including lower income households held both a larger share of retailers and overall sales

tidy text mining --- sentiment and most frequent words (MFW) analyses of Star Trek DS9 first episode, "The Nagus"

A few months ago, the Revolution Analytics newsletter directed me to the 'tidy data' approach to text mining by Julia Silge and David Robinson. I began trying out their tidytext R package on The Federalist papers, attempting a rough replication of Mosteller and Wallace (1964), and then on inaugural addresses by U.S. presidents. The Federalist analysis ended up morphing into an application of Burrows's 'rolling delta' and the use of a different R text analytics package. More on that in a subsequent post. Silge and Robinson's text mining examples include a lexicon-based sentiment analysis of Jane Austen's novels. One example showed the net positive versus negative change in sentiment over the progression of each novel. So while mulling over what to do next with tidy text mining, I was re-watching the pilot episode of my favorite Star Trek series, Deep Space Nine. I wondered to what extent the dialogue spoken by characters in
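As a rough illustration of what a lexicon-based sentiment tally involves, here is a minimal base R sketch. It is a stand-in for the tidytext workflow (where unnest_tokens() and an inner join against get_sentiments() would do the same job), not the code from the post. The dialogue lines, the speaker labels, and the eight-word lexicon are all invented for the example.

```r
# Toy dialogue lines, invented for illustration (not from any transcript).
dialogue <- data.frame(
  speaker = c("A", "A", "B", "B"),
  text = c("This is a good and happy day",
           "What a terrible mistake",
           "I love this wonderful station",
           "A bad sad awful ending"),
  stringsAsFactors = FALSE
)

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
lexicon <- data.frame(
  word  = c("good", "happy", "love", "wonderful",
            "terrible", "bad", "sad", "awful"),
  score = c(1, 1, 1, 1, -1, -1, -1, -1),
  stringsAsFactors = FALSE
)

# Reshape to one token per row, the 'tidy text' layout.
tokens <- do.call(rbind, lapply(seq_len(nrow(dialogue)), function(i) {
  words <- strsplit(tolower(dialogue$text[i]), "[^a-z']+")[[1]]
  data.frame(speaker = dialogue$speaker[i],
             word = words[words != ""],
             stringsAsFactors = FALSE)
}))

# Keep only tokens found in the lexicon, then sum scores per speaker.
scored <- merge(tokens, lexicon, by = "word")
net_sentiment <- aggregate(score ~ speaker, data = scored, FUN = sum)

# Most frequent words across all lines.
word_counts <- sort(table(tokens$word), decreasing = TRUE)

print(net_sentiment)
```

Words missing from the lexicon simply drop out at the merge step, which is why the choice of lexicon matters so much for this kind of analysis.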

More contour and density plots [stat_density2d() and hdrcde()] of Michigan lottery sales in Grand Rapids

After the prior post with a density map of lottery sales, I thought perhaps I had passed some arguments incorrectly within ggplot when using stat_density2d(). So I looked back through the documentation for stat_density2d(). The example in the documentation uses the Old Faithful geyser data, which I recalled from other contour/density plot analyses in Antony Unwin's Graphical Data Analysis with R. Unwin's discussion of density plots relies on both the ggplot2 and hdrcde packages. The two packages use different engines for density estimation/contour lines, so perhaps it could be interesting to compare the two. Let's start with the contour/density estimation in Unwin's book. Unwin begins with a scatterplot and contour lines for Old Faithful, which shows three distinct clusters of eruptions: ggplot(geyser, aes(duration, waiting)) + geom_point() + geom_density2d() + ggtitle("Old Faithful geyser eruption d
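To see what sits underneath the ggplot2 contour layer, the sketch below calls MASS::kde2d() directly on the same geyser data; as far as I know, this is the estimator that stat_density2d() uses by default. The 50 x 50 grid size is an arbitrary choice for the example, and the bandwidth is left at kde2d()'s default.

```r
library(MASS)  # ships with R; provides the geyser data and kde2d()

data(geyser)

# Two-dimensional kernel density estimate on a 50 x 50 grid.
# dens$x and dens$y hold the grid coordinates, dens$z the density values.
dens <- kde2d(geyser$duration, geyser$waiting, n = 50)

# Extract contour lines from the estimated surface without drawing them;
# contour(dens) would draw the same lines on a plot.
contours <- contourLines(dens$x, dens$y, dens$z)

# Each element is one contour segment with its level and x/y coordinates.
length(contours)
range(dens$z)
```

Changing the h argument of kde2d() is the quickest way to see how sensitive the clusters are to the bandwidth.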

Locating Michigan Lottery Retailers with the U.S. Census Bureau Geocoder API in R, part 1

This post records the process of cleaning and geocoding U.S. address records. The addresses are official records of Michigan lottery retailers. Michigan contracts with one of the multinational gambling companies to run the State lottery, and participates in the multi-state Powerball lottery as well. The data described here are retailer-based records of lottery sales from 2003 to 2016. The purpose of the geocoding was to explore the distribution of sales across the State and to prepare to merge the dataset with various Census Bureau measures of geographic place characteristics. Of course, there’s a lengthy literature on individual-level correlates of lottery ticket purchases, but I have been curious about where retailers tend to cluster throughout cities and townships in the State. In what follows I describe the process of geocoding latitude and longitude coordinates, as well as U.S. Census geographies, of Michigan lottery retailers. The lottery sales data were provided to
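For a sense of what a single-address lookup against the Census Geocoder looks like, here is a minimal sketch that only builds the request URL and makes no network call. The helper name build_geocoder_url() is hypothetical, and the endpoint path and the benchmark value reflect my understanding of the public API, so they should be checked against the Census Bureau's documentation before use. The sample address is the Census Bureau's own headquarters, not a lottery retailer.

```r
# Hypothetical helper: assemble a one-line-address request URL.
# Endpoint and parameter values are assumptions to verify against the
# Census Geocoder documentation.
build_geocoder_url <- function(address, benchmark = "Public_AR_Current") {
  base <- "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
  query <- c(address = address, benchmark = benchmark, format = "json")
  # Percent-encode each value so spaces and commas are safe in the URL.
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(query), "=", encoded, collapse = "&"))
}

example_url <- build_geocoder_url("4600 Silver Hill Rd, Washington, DC 20233")
print(example_url)
```

The resulting URL could then be fetched with any HTTP client; batch submission of many addresses goes through a different endpoint and is not covered by this sketch.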