Showing posts from July, 2017

Choroplethic and Totoro-like density maps of Michigan lottery sales in Grand Rapids, with ggmap/ggplot2

The data start with the records of individual lottery retailers in Kent County, with addresses geocoded through the Census Bureau. These records, with sales totals aggregated to Census tracts, were merged with Census records of various demographic characteristics of the tracts, such as median family income and educational attainment. The Kent subset of the tract-level data, kent_lottery_census, is downloadable from the code at the bottom of the page and can be used to replicate the figures in this post. In kent_lottery_census, the unit of observation is the census tract: there are 128 rows, one for each tract within Kent County, with columns for income, population by degree attainment, total sales within the tract, sales for 2015, and a retailer or licensee count by tract. With the tract-level data, I wondered whether there was evidence that tracts including lower-income households held a larger share of both retailers and overall sales

tidy text mining --- sentiment and most frequent words (MFW) analyses of Star Trek DS9 first episode, "The Nagus"

A few months ago, the Revolution Analytics newsletter directed me to the 'tidy data' approach to text mining by Julia Silge and David Robinson. I began trying out their tidytext R package on The Federalist Papers, attempting to replicate an analysis similar to Mosteller and Wallace (1964), and then on inaugural addresses by U.S. presidents. The Federalist analysis ended up morphing into an application of Burrows's 'rolling delta' and the use of a different R text analytics package. More on that in a subsequent post. Silge and Robinson's text mining examples include a lexicon-based sentiment analysis of Jane Austen's novels. One example tracked the net change in positive versus negative sentiment over the progression of each novel. So while mulling over what to do next with tidy text mining, I was re-watching the pilot episode of my favorite Star Trek series, Deep Space Nine. I wondered to what extent the dialogue spoken by characters in