Using tidycensus and leaflet to map Census data

By Julia Silge

June 24, 2017

Recently, I have been following the development and release of Kyle Walker’s tidycensus package. I have been filled with amazement, delight, and well, perhaps another feeling…

But seriously, I have worked with US Census data a lot in the past and this package

  • is such a valuable addition to the R ecosystem and
  • would have saved me SO MUCH ENERGY, HEADACHE, and TIME.

I was working this weekend on a side project with an old friend about opioid usage in Texas and needed to download some Census data again. A perfect opportunity to give this new package a little run-through!

Exercising my joygret

Before running code like the following from tidycensus, you need to obtain an API key from the Census and then use the function census_api_key() to set it in R.

library(tidyverse)
library(tidycensus)

texas_pop <- get_acs(geography = "county", 
                     variables = "B01003_001", 
                     state = "TX",
                     geometry = TRUE) 

texas_pop
## Simple feature collection with 254 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
## # A tibble: 254 x 6
##    GEOID                    NAME   variable estimate   moe               geometry
##    <chr>                   <chr>      <chr>    <dbl> <dbl> <S3: sfc_MULTIPOLYGON>
##  1 48007   Aransas County, Texas B01003_001    24292     0 <S3: sfc_MULTIPOLYGON>
##  2 48025       Bee County, Texas B01003_001    32659     0 <S3: sfc_MULTIPOLYGON>
##  3 48035    Bosque County, Texas B01003_001    17971     0 <S3: sfc_MULTIPOLYGON>
##  4 48067      Cass County, Texas B01003_001    30328     0 <S3: sfc_MULTIPOLYGON>
##  5 48083   Coleman County, Texas B01003_001     8536     0 <S3: sfc_MULTIPOLYGON>
##  6 48091     Comal County, Texas B01003_001   119632     0 <S3: sfc_MULTIPOLYGON>
##  7 48103     Crane County, Texas B01003_001     4730     0 <S3: sfc_MULTIPOLYGON>
##  8 48139     Ellis County, Texas B01003_001   157058     0 <S3: sfc_MULTIPOLYGON>
##  9 48151    Fisher County, Texas B01003_001     3858     0 <S3: sfc_MULTIPOLYGON>
## 10 48167 Galveston County, Texas B01003_001   308163     0 <S3: sfc_MULTIPOLYGON>
## # ... with 244 more rows

There we go! The total population in each county in Texas, in a tidyverse-ready data frame. If you want to get information for multiple states, just use purrr. The US Census tabulates lots of important kinds of information here in the United States, although there has been troubling uncertainty about leadership and funding there in recent months.

So we have this data in a form that will be easy to manipulate; what if we want to map it? Kyle Walker again has this taken care of, with his tigris package (a dependency of tidycensus); if you set geometry = TRUE the way that I did when I downloaded the Census data above, tigris handles downloading the shapefiles from the Census, with support for sf simple features. Kyle has a vignette for mapping using ggplot2, but you can also pipe straight into leaflet.

library(leaflet)
library(stringr)
library(sf)

pal <- colorQuantile(palette = "viridis", domain = texas_pop$estimate, n = 10)

texas_pop %>%
    st_transform(crs = "+init=epsg:4326") %>%
    leaflet(width = "100%") %>%
    addProviderTiles(provider = "CartoDB.Positron") %>%
    addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ pal(estimate)) %>%
    addLegend("bottomright", 
              pal = pal, 
              values = ~ estimate,
              title = "Population percentiles",
              opacity = 1)

What is that st_transform doing? Well, I am no cartographer and I am still fuzzy on these issues, but it is doing a projection onto a certain reference system of the spatial information contained in the sf column. The specific choice of an EPSG code of 4326 is for a given projection.

A couple more examples

Let’s look at the counties in Utah (where I live) while we’re at it. Let’s map color to population here, instead of quantiles, just for something different.

utah_pop <- get_acs(geography = "county", 
                    variables = "B01003_001", 
                    state = "UT",
                    geometry = TRUE)

pal <- colorNumeric(palette = "plasma", 
                    domain = utah_pop$estimate)

utah_pop %>%
    st_transform(crs = "+init=epsg:4326") %>%
    leaflet(width = "100%") %>%
    addProviderTiles(provider = "CartoDB.Positron") %>%
    addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ pal(estimate)) %>%
    addLegend("bottomright", 
              pal = pal, 
              values = ~ estimate,
              title = "County Populations",
              opacity = 1)

Yep, that is right, although still remarkable to me. Utah is largely an extremely rural state, with lots of people here in Salt Lake City where I live and then in the corridor to the north and south.

There is so much other information available from the Census. For example, what if I want to look at the median home value in Salt Lake County, at the census tract level?

slc_value <- get_acs(geography = "tract", 
                    variables = "B25077_001", 
                    state = "UT",
                    county = "Salt Lake County",
                    geometry = TRUE)

pal <- colorNumeric(palette = "viridis", 
                    domain = slc_value$estimate)

slc_value %>%
    st_transform(crs = "+init=epsg:4326") %>%
    leaflet(width = "100%") %>%
    addProviderTiles(provider = "CartoDB.Positron") %>%
    addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ pal(estimate)) %>%
    addLegend("bottomright", 
              pal = pal, 
              values = ~ estimate,
              title = "Median Home Value",
              labFormat = labelFormat(prefix = "$"),
              opacity = 1)

The two census tracts with NA values are the airport on the west side and the University of Utah on the east side. You can very obviously see the east-to-west gradient that comes as no surprise to us locals, and that priciest tract is up against one of the canyons with beautiful views. But mainly please notice with what ease I made this interactive map!

The End

Maybe the main reason I wrote up this blog post is to say how streamlined and easy it now is to get Census data into R and plot it, but another reason is to demonstrate, at least to myself, how little effort it takes to make a blog post with, say, interactive leaflet components with my new blogging workflow. I recently changed my blog from a Jekyll blog hosted on GitHub Pages to a blog built with blogdown and Hugo, deployed using Netlify. I am finding this workflow so great, and this post with its leaflet maps went off without a hitch! I would like to say a big THANK YOU to Yihui Xie for his work on R Markdown, knitr, and blogdown. Let me know if you have any questions!

Posted on:
June 24, 2017
Length:
9 minute read, 1849 words
Tags:
rstats
See Also:
Topic modeling for #TidyTuesday Spice Girls lyrics
Predicting viewership for #TidyTuesday Doctor Who episodes
Spatial resampling for #TidyTuesday and the #30DayMapChallenge