14  Mapping with R

Published

August 30, 2024

Keywords

geospatial, sf, simple features, tidycensus, tmap, leaflet, rnaturalearth, usmap

14.1 Introduction

14.1.1 Learning Outcomes

  • Develop strategies for geospatial analysis based on constructs such as vectors vs raster maps, simple features, and Coordinate Reference Systems (CRS).
  • Use {sf} package functions to identify and convert the CRS for geospatial data.
  • Create static choropleth maps using {ggplot2} with Census Bureau data and other data.
  • Create interactive maps in R using {leaflet}.
  • Create raster and vector maps using {tmap}, {rnaturalearth}, and {maps}.
  • Create choropleth maps of US States and Counties, with Alaska and Hawaii re-positioned, with {usmap}.

14.1.2 References

14.1.2.1 Other References

14.2 Key Constructs in Geospatial Analysis

14.2.1 Open Geospatial Information

Geospatial information is about things and their location in the world.

  • Humans have been interested in geospatial information for millennia, even before the “geometers” of Egypt used ropes and measurements to re-establish the boundaries of farmland after the Nile river flooded.
  • Geospatial information allows us to make maps.

There are many different standards and systems for creating, organizing, and sharing geospatial information.

Our focus is on open-source standards and data as promulgated by the Open Geospatial Consortium (OGC).

14.2.2 Vector versus Raster Models/Data

There are two main models for representing the world: Vector and Raster.

  • The vector data model represents the world using points, lines, and polygons.
    • These have discrete, well-defined borders.
    • Vector data tends to dominate the social sciences because human settlements tend to have discrete borders.
  • The raster data model divides the surface up into cells of constant size.
    • Raster datasets are the basis of background images used in web-mapping.
    • Raster models aggregate spatially specific features to a given resolution.
    • Raster dominates many environmental sciences because of the reliance on remote sensing data.

These two models are fundamentally different in how they represent the world so they use different methods and tools.

  • They can work together though, e.g., place a raster background under a vector map.

We will focus on vector models with a few examples of raster data.

14.2.3 Geospatial Features

We use maps to display information about features (things or objects) of, or in, our world.

  • Features can be stand-alone or composed of other features, e.g., a state can be composed of counties, or a forest can be composed of individual trees.

Features have two kinds of attributes.

  • Spatial attributes describe the geometry of the feature.
    • This includes the location and shape of the feature.
    • The geometry is described based on a reference point of the feature, e.g., a point, a point on an edge, or the center point.
  • Non-spatial attributes describe other information about or associated with the feature.
    • These could be physical such as color, material, or more abstract concepts such as its population or average vaccination rate.

14.2.4 Simple Features

The OGC maintains a multi-part standard for “Simple Features” to provide a precise and consistent set of definitions and methods for representing and manipulating data about features.

This set of standards defines a Simple Feature as having “both spatial and non-spatial attributes.”

  • Spatial attributes are geometry valued, and simple features are based on 2D geometry with linear interpolation between vertices.
  • This definition has been extended to 3D/4D features so there is lots of flexibility.

14.2.4.1 SF Points

“Geometry-valued” means simple feature geometries are described using points specified as coordinates in 2, 3, or even 4-dimensional space.

  • The point coordinates are defined as X, Y, Z, and sometimes the data may have an M.
    • X specifies “easting” or “longitude” and Y specifies “northing” or “latitude”. Together they are XY.
    • Z specifies altitude, if present for XYZ.
    • M specifies some other dimension just for that point, e.g., time or temperature.
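
A minimal sketch with the {sf} point constructors (the coordinate values are arbitrary):

library(sf)
p_xy  <- st_point(c(30, 10))     ## XY: easting/longitude and northing/latitude
p_xyz <- st_point(c(30, 10, 5))  ## XYZ: adds altitude as Z
st_as_text(p_xy)                 ## "POINT (30 10)" in well-known text (WKT)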

14.2.4.2 SF Geometry Types

POINT: zero-dimensional geometry containing a single point
LINESTRING: sequence of points connected by straight, non-self-intersecting line pieces; one-dimensional geometry
POLYGON: geometry with a positive area (two-dimensional); sequences of points form closed, non-self-intersecting rings; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring
MULTIPOINT: set of points; a MULTIPOINT is simple if no two points in the MULTIPOINT are equal
MULTILINESTRING: set of linestrings
MULTIPOLYGON: set of polygons
GEOMETRYCOLLECTION: set of geometries of any type except GEOMETRYCOLLECTION
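
As an illustrative sketch, these geometry types can be built by hand with the {sf} constructors (all coordinates are arbitrary):

library(sf)
ls <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))        ## one-dimensional LINESTRING
outer <- rbind(c(0, 0), c(3, 0), c(3, 3), c(0, 3), c(0, 0))  ## closed exterior ring
hole  <- rbind(c(1, 1), c(2, 1), c(2, 2), c(1, 2), c(1, 1))  ## closed interior ring
poly <- st_polygon(list(outer, hole))                        ## POLYGON with a hole
st_geometry_type(poly)                                       ## POLYGON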

Figure 14.1 shows various types of simple features (Lovelace, Nowosad, and Muenchow 2023).

Figure 14.1: Simple Features

14.2.5 Coordinate Projections

Working with mapping data can involve a lot of complex math to ensure different data sources are using the same methods for identifying location and/or shape.

  • Fortunately, you don’t have to do the math!

Many mapping or Geographic Information Systems (GIS), regardless of programming language, use three open-source libraries to access standard functions or data to manipulate SF data.

  • The GEOS or Geometry Engine, Open Source library. This contains fundamental algorithms for processing linear geometry on the 2D Cartesian plane. Functions include shifting, rotating, expanding, and measuring distances and areas.
  • The GDAL (goo-dal) library. This contains functions for reading or writing vector or raster files and a translator library for converting between raster and vector geospatial data formats.
  • The PROJ library. This provides generic functions to transform geospatial coordinates from one coordinate reference system (CRS) to another. This is to help you ensure all data sources in a map are using the same CRS.
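
You can check which versions of these libraries your R session is linked against with an {sf} helper:

library(sf)
sf_extSoftVersion()  ## reports the GEOS, GDAL, and PROJ versions sf is using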

14.2.6 Datums, CRS, and Projections

The underlying math is broken out into three different sets of information: a Datum, a Coordinate Reference System, and a Projection.

A Datum is a mathematical model of a 3D object that creates an underlying reference model, e.g., a sphere, that allows us to use coordinates to identify points in each dimension of the model, typically x, y, z.

  • Unfortunately, the earth is not close to a perfect sphere or ellipsoid (it’s an irregular squashed ellipsoid).
  • The surface of the earth is also rough, so no model is equally accurate everywhere.
  • A more complex mathematical model called a geoid is used to represent the earth.
  • The parameters of the geoid model can be “tweaked” to accommodate a portion of the earth more accurately.
  • The “tweaked” version of the geoid is known as a datum.
  • Since a geoid and datum represent a sphere-like surface, there are infinitely many ways to choose an origin, the point where x=0, y=0, and z=0 (you identify every other point with respect to the origin). Thus we need a Coordinate Reference System.

A Coordinate Reference System (CRS) starts with a chosen datum and then defines a specific point in that datum as the origin.

  • A CRS may use angular measures for x and y (longitude and latitude) and define a point from the “center” in terms of angles relative to the origin.
  • Think of the origin as a point on the surface of the datum, e.g., where longitude is 0, latitude is 0, and elevation is 0.
  • The origin is the “reference value” for 0,0,0 and all other points are defined relative to that origin.

A Projection takes the coordinates for a point in a given 3-D CRS and mathematically converts them to identify a point on a 2D Cartesian surface.

  • Each projection chooses its own origin for x, y on a 2D plane.
  • This is often the lower, left-hand corner of the plane for the areas of interest.
  • A projection uses the terms Easting and Northing to describe the x and y values from its own origin for the plane.

Datums, CRSs, and Projections are constantly being updated as new data comes in from satellites, as the earth’s shape changes, and the tectonic plates move affecting surface heights.

14.2.6.1 Coordinate Reference Systems (CRS)

A Coordinate Reference System (CRS) defines how the spatial elements of a feature relate to the surface of the Earth.

  • The CRS uses a mathematical model (datum) for representing a 3D location/shape on a 3D surface or a 2D plane.
Important

A CRS can get quite complicated as the earth is not a perfect sphere and is not perfectly smooth.

  • This means there are accuracy/precision trade-offs among attributes such as location, linear distances, direction, and areas, as well as the interpretability of the visual representation.

  • Any CRS has to make choices about which aspects of a feature or a set of features to make most accurate while sacrificing accuracy on others.

  • There are two types of CRS, geographic or projected, which make different trade-offs across the measurement attributes.

14.2.6.1.1 Geographic CRS use Angular Measures to locate a Feature on a 3-D “Sphere”

Geographic CRS identify any location on the Earth’s surface using two values: longitude and latitude, which are angular measurements from a fixed point on the globe (0,0).

  • Remember a Geographic CRS as “G stands for Globe”.

A common geocentric datum is WGS84 (World Geodetic System 1984).

  • WGS 84 tries to provide a global perspective. It uses the Earth’s center of gravity as its (0,0) center so the accuracy of projections is not optimized for a specific location.
  • Local datums, such as NAD83 (North American Datum 1983), shift the ellipsoidal surface to better align with the surface at a particular location (e.g., North America).
    • Note that the Global Positioning System (GPS) uses WGS84; NAD83 is the datum most commonly used by U.S. federal mapping agencies.

Hundreds of datums have been developed over the last few centuries, but with the advent of better satellite-enabled measurements, countries and organizations are focusing on updating common standards like WGS84, NAD83, or other local datums, e.g., the British Ordnance Survey National Grid.

Important

Geographic (or geocentric) CRS use angular measurements (Long, Lat) so they are not accurate in measuring distances using linear measures such as meters.

14.2.6.2 Projected CRS transform the Angular Measures from a Geographic CRS to Linear Measures on a Plane

Projected CRSs transform the measures of long, lat from a 3-D globe (as defined by a geographic CRS) into 2-D Cartesian Coordinates on an implicitly flat plane, like a map.

  • Remember a projected CRS as “P stands for Plane”.
  • Think of drawing a triangle on an orange and then what happens when you smash the orange flat on the table.

Each projected CRS has its own (0,0) origin, x and y axes, and a linear unit of measurement such as meters.

Important

Converting the 3-D surface of the Earth into 2-D Easting and Northing (x and y) values always causes some distortion in other metrics such as area, direction, distance, and shape.

If you need to do measurements using linear distance, distance between points, or display a linear or radial distance from a feature or point, use a projected CRS.
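
For example, a minimal sketch of measuring a linear distance after reprojecting to a projected CRS (the points and the UTM zone are illustrative):

library(sf)
## Two WGS84 long/lat points (illustrative coordinates near Washington, DC)
pts <- st_sfc(st_point(c(-77.0888, 38.9375)),
              st_point(c(-77.0365, 38.8977)),
              crs = 4326)
pts_utm <- st_transform(pts, 32618)  ## reproject to UTM zone 18N, a projected CRS
st_distance(pts_utm)                 ## pairwise distance matrix in meters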

Depending upon the analysis, there are different types of projections designed to preserve accuracy for (at most) one or two of the attributes.

  • All projected CRSs are either equal-area, equidistant, conformal (with shapes remaining unchanged), or some combination of compromises of those.

A popular projected CRS is the Universal Transverse Mercator (UTM).

The UTM is actually a set of CRSs: it divides the Earth into 60 longitudinal wedges and 20 latitudinal segments.

  • Each wedge/segment combination has its own CRS to provide the best accuracy for that region.

Every place on Earth (other than the polar regions) has a UTM code to identify the wedge and segment.

  • “60H” refers to northern New Zealand.

To find the UTM code for a given longitude and latitude, write a short function as shown below.

  • You can then use sf::st_crs(my_utm_code) to look up the CRS for the UTM code and the output tells you the type of CRS and the underlying datum.
# https://stackoverflow.com/questions/9186496/determining-utm-zone-to-convert-from-longitude-latitude/9188972#9188972

lonlat2UTM <- function(lonlat) {
  ## UTM zone number from the longitude (zones are 6 degrees wide, numbered 1-60)
  utm <- (floor((lonlat[1] + 180) / 6) %% 60) + 1
  if (lonlat[2] > 0) {
    utm + 32600 ## EPSG codes 326xx are northern-hemisphere UTM zones
  } else {
    utm + 32700 ## EPSG codes 327xx are southern-hemisphere UTM zones
  }
}
## Source the function and then use it on a longitude and latitude of your choice
utm_au <- lonlat2UTM(c(-77.0888, 38.9375)) ## Long, Lat for AU
## Look up the EPSG Code
sf::st_crs(utm_au) ## get the EPSG code and check https://epsg.io/32618
Coordinate Reference System:
  User input: EPSG:32618 
  wkt:
PROJCRS["WGS 84 / UTM zone 18N",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 18N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-75,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Between 78°W and 72°W, northern hemisphere between equator and 84°N, onshore and offshore. Bahamas. Canada - Nunavut; Ontario; Quebec. Colombia. Cuba. Ecuador. Greenland. Haiti. Jamaica. Panama. Turks and Caicos Islands. United States (USA). Venezuela."],
        BBOX[0,-78,84,-72]],
    ID["EPSG",32618]]

14.2.6.3 Specifying the CRS

Each CRS can be identified by an EPSG code for a standard CRS, or a proj4string definition for a customized CRS.

  • An EPSG code refers to one specific CRS definition in the PROJ library.
  • EPSG stands for the European Petroleum Survey Group, which first created the dataset in 1985.
  • The now-retired {rgdal} package listed all of the available CRSs via crs_data <- rgdal::make_EPSG() and View(crs_data); today you can look up any individual code with sf::st_crs() or browse the registry at https://epsg.io/, as in the sketch below.
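
Both identification styles are accepted by sf::st_crs(), as in this quick sketch:

sf::st_crs(4326)                          ## a standard CRS via its EPSG code
sf::st_crs("+proj=longlat +datum=WGS84")  ## a customized CRS via a proj4string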

14.2.6.4 Some Common CRS EPSG Codes for North America

See EPSG Geodetic Parameter Dataset for info on any CRS.

  • 4326 is the WGS84.

    • Geographic with Lat and Long.
    • The WGS84 is the most common CRS in the world so is a convenient baseline for transforming other data to be consistent.
    • Commonly used by organizations that provide GIS data for the entire globe or many countries.
    • It is the CRS used by Google Earth as the default.
  • 4269 is NAD83 (North American Datum 1983).

    • Geographic with Lat and Long.
      • Most commonly used by U.S. federal agencies, e.g., the Census Bureau.
      • Although WGS84 and NAD83 are not exactly the same they are generally considered equivalent.
      • Don’t bother to transform between NAD83 and WGS84 unless you require extremely high precision (rare).
  • 8255 NAD83(CSRS)v7 is a geographic CRS used in Canada.

  • 26915 is a projected CRS used in North America

    • UTM zone 15N.
    • It is based on the NAD83 datum and as a projected CRS uses Easting and Northing.
    • Last updated on 3/14/2020.
  • 3857 (sometimes seen as 900913) is the Pseudo-Mercator projection used by Google Maps.

    • 900913 is the “unofficial” code used by the open source projects.
    • Mercator preserves direction so is useful for navigation.

14.3 The {sf} Package

The R {sf} package provides a class system for working with geographic vector data (for raster data see the {terra} package).

  • {sf} supersedes the older {sp} package. Many packages may still use {sp}, which stores GIS data in a “Spatial” class. You can convert data from those packages to the newer {sf} format using st_as_sf(), as in the sketch below.
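
A minimal sketch of that conversion ({sp} is loaded here only to create a legacy object for the demo; the coordinates are arbitrary):

library(sp)
library(sf)
sp_object <- SpatialPoints(cbind(-77.09, 38.94))  ## a legacy {sp} "Spatial" object
sf_object <- st_as_sf(sp_object)                  ## convert to a modern sf object
class(sf_object)                                  ## "sf" "data.frame"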

The {sf} package can represent all common vector geometry types: points, lines, polygons and their respective ‘multi’ versions (which group together features of the same type into a single feature).

{sf} provides a unified interface to the GEOS library for geometry operations, the GDAL library for reading and writing geographic data files, and the PROJ library for representing and transforming data between projected coordinate reference systems.

Simple feature objects in R are stored in a data frame, with geographic data occupying a special list column, usually named geom or geometry.

  • The addition of simple features data to a data frame changes the class of the data frame to a simple features object.

{sf} stores data for a single simple feature geometry in an sfg object.

  • The geometry column in an {sf} data frame (a simple feature geometry column (sfc)) is a list of sfg objects.
  • The sf object contains information about the underlying CRS, much like a date-time carries a time zone.
Important

All geometries in a simple features (sfc) object must have the same CRS.
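
A minimal sketch of this layering, from sfg to sfc to sf (the names and coordinates are illustrative):

library(sf)
pt1 <- st_point(c(-77.0888, 38.9375))  ## an sfg: a single POINT geometry
pt2 <- st_point(c(-77.0365, 38.8977))
geom <- st_sfc(pt1, pt2, crs = 4326)   ## an sfc: a geometry list column with one shared CRS
places <- st_sf(name = c("campus", "downtown"), geometry = geom)  ## an sf data frame
class(places)                          ## "sf" "data.frame"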

14.3.1 Transforming SF Data From One CRS to Another CRS

Most data sets loaded from open data will have a CRS specified in the geometry column.

Use sf::st_crs(my_sf_data_frame$geometry) to get the CRS of the data.

  • If the data does not have a CRS associated with it, you can add one using st_set_crs(); check the data documentation to see if it tells you which CRS to use.

If you want to add points or combine data from different sources, all the data must be in the same CRS to create meaningful maps.

  • If you want to do some spatial transformations or measurements, you will want the data in a projected CRS.

Vector geometries are made up of points, and points form the basis of more complex objects such as lines and polygons.

Reprojecting vectors (transforming from one CRS to another) thus consists of transforming the coordinates of these points.

  • Use st_transform(my_sf_data, new_epsg) to reproject (using the PROJ library) a simple features tibble geometry.
  • When plotting with ggplot2 and geom_sf(), you can add coord_sf(crs = ) to adjust the CRS for the plot, as in the sketch below.
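
A minimal sketch of the reprojection workflow (the object and CRS codes are illustrative):

library(sf)
## An illustrative sf object with a geographic CRS (NAD83, EPSG 4269)
my_sf <- st_sf(id = 1, geometry = st_sfc(st_point(c(-77.09, 38.94)), crs = 4269))
st_crs(my_sf)$epsg                       ## 4269
my_sf_utm <- st_transform(my_sf, 26915)  ## reproject to NAD83 / UTM zone 15N (meters)
st_crs(my_sf_utm)$epsg                   ## 26915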

14.3.2 Sources of Mapping Data

The US Census Bureau has a wide variety of map data for different geographic entities in the United States.

  • It uses vector model data stored in a format known as a “shapefile”.

Many state and local governments also use their open data sources to make their own shapefiles available if they are not available at the Census Bureau.

When searching for data, look for the term “shapefile”.

  • Shapefiles are often downloadable as a Zip file. When unzipped, there is a folder with a set of related files holding the specific information.
  • They can be very large (hundreds of MB) so be careful about uploading to a cloud repository.

14.4 Creating Maps with {ggplot2} and Census Bureau Data

14.4.1 Census Bureau Demographic Data

Two main sources of interest for us:

  • The decennial census (every 10 years) and
  • The American Community Survey (ACS) which is updated every year and covers the past five years.
  • See Comparison of census and ACS

Data is compiled into many different tables to include products known as summary files.

  • Census Summary File 1 (sf1) contains data from the “100%” census down to the Census Block level. It has information on age, sex, race, Hispanic/Latino origin, household relationship, and whether the residence is owned or rented.
  • Census Summary File 3 (sf3) contains data from the 1-in-6 sample who get the longer version of the census form. It has data for the population down to the Census Block Group level (less resolution than SF1 and SF2). It presents detailed population and housing data (such as place of birth, education, employment status, income, and value of housing unit).

The ACS has data from an ongoing survey that provides data every year for areas of over 65,000 population.

  • The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population.
  • Much of the ACS data are available separately by age group, race, Hispanic origin, and sex.
    • Select data are available for the nation, all 50 states, the District of Columbia, Puerto Rico, every congressional district, every metropolitan area, and all counties and places with populations of 65,000 or more.
  • There are several versions: a 1-year version (acs1), a 5-year version (acs5), and extended details, which are released at different times of the year, typically the 1-year in September and the 5-year in December.

Each of the many thousands of variables in the Census data has its own unique identifier - an alpha-numeric “name”.

  • These can change over time, so each product also has a table listing the available names along with the descriptors known as Label and Concept.

The Census Bureau data is collected at one level and aggregated at different levels, as seen in Figure 14.2.

Census Block -> Census Block Group -> Census Tract -> County -> State -> National

  • It is also organized into other geographic entities such as congressional districts and major metropolitan areas that may be available depending upon the source. See the Glossary on geographies.
  • Depending upon the entity, the Census Bureau makes geometry shape files available which are useful for mapping at various levels.
Figure 14.2: The hierarchy of Census Bureau geographic entities shows how data from census tract level is aggregated and then disaggregated.
  • We can use this data to create choropleth maps where we can represent one or more attributes of a geographic feature by color coding the shape of the feature on a 2D map.

14.4.2 The {tidycensus} Package

Note

This section is for readers who have not used the {tidycensus} package before.

  • There are multiple R packages for working with the Census Bureau’s 300+ APIs.
  • The {tidycensus} package allows users to interface with the US Census Bureau’s decennial Census and five-year American Community Survey APIs and return tidyverse-ready data frames, with the option for including simple feature geometry at various levels.
  • Install the {tidycensus} package using the console.
  • Load the packages.
library(tidycensus)
library(tidyverse)
14.4.2.0.1 Your Census API Key
  • If you don’t have one already, you need an “API Key” to access the Census Bureau APIs.
  • Request a key from the census bureau: https://api.census.gov/data/key_signup.html
  • You should get an email in a few seconds. Click on the link in the email to activate your key.
  • Copy the key from the email - it will be long….
14.4.2.0.2 Set up your key using the {keyring} package
  • Install the {keyring} package using the console.
  • Use key_set() to store your key in your computer’s password manager.
  • Do not use tidycensus::census_api_key() with install = TRUE, as that puts your key in plain text in your .Renviron file.
library(keyring)
# key_set("API_KEY_CENSUS")
14.4.2.0.3 Accessing your key
  • When an API function asks for a password or credential, use your new key
  • Use key_get() to put your key value into a variable you can reuse or, to be even more secure, just use it inside the API call each time.
# my_census_key <- key_get("API_KEY_CENSUS")
14.4.2.0.4 {tidycensus} Functions for the Decennial Census and the American Community Survey
  • There are two major functions implemented in {tidycensus}:

  • get_decennial() grants access to the 1990, 2000, 2010, and 2020 decennial US Census APIs, and

    • Unfortunately, as of 21 Sep 2020, the U.S. Census Bureau identified a problem with tract numbers in the 1990 API endpoints, so that data is temporarily unavailable.
    • The 2020 decennial census has a limited release of data.
  • get_acs() grants access to the 5-year American Community Survey APIs.

  • Example: let’s get the decennial census data for median age by state in 2010; the variable name is “P013001”.

age10 <- get_decennial(
  geography = "state",
  variables = "P013001",
  year = 2010,
  key = key_get("API_KEY_CENSUS")
)
head(age10)
# A tibble: 6 × 4
  GEOID NAME       variable value
  <chr> <chr>      <chr>    <dbl>
1 01    Alabama    P013001   37.9
2 02    Alaska     P013001   33.8
3 04    Arizona    P013001   35.9
4 05    Arkansas   P013001   37.4
5 06    California P013001   35.2
6 22    Louisiana  P013001   35.8
nrow(age10)
[1] 52
  • Plot the data and reorder the states, on the y axis, by value.
age10 |>
  ggplot(aes(x = value, y = fct_reorder(NAME, value))) +
  geom_point()

14.4.2.0.5 Searching for Variable Identifiers
  • Getting variables from the Census or ACS requires knowing the variable name - and there are thousands of these names across the different products.
  • To rapidly search for variables, use the load_variables() function.
  • Two required arguments:
    • the year of the Census or end year of the ACS sample, and
    • the dataset: “sf1”, “sf3”, “acs1”, “acs5” etc…
  • For efficiency, assign the result to a variable, setting cache = TRUE to store the result on your computer for future access.
  • You can view and filter, or use {stringr} to search the labels and concepts for data of interest.
  • Example: Let’s use load_variables() to get the variables for the concept median age by sex.
acs18_5_vars <- load_variables(2018, "acs5", cache = TRUE)
acs18_5_vars |>
  filter(str_detect(concept, "MEDIAN AGE BY SEX")) |>
  head()
# A tibble: 6 × 4
  name        label                           concept                  geography
  <chr>       <chr>                           <chr>                    <chr>    
1 B01002A_001 Estimate!!Median age --!!Total  MEDIAN AGE BY SEX (WHIT… block gr…
2 B01002A_002 Estimate!!Median age --!!Male   MEDIAN AGE BY SEX (WHIT… block gr…
3 B01002A_003 Estimate!!Median age --!!Female MEDIAN AGE BY SEX (WHIT… block gr…
4 B01002B_001 Estimate!!Median age --!!Total  MEDIAN AGE BY SEX (BLAC… block gr…
5 B01002B_002 Estimate!!Median age --!!Male   MEDIAN AGE BY SEX (BLAC… block gr…
6 B01002B_003 Estimate!!Median age --!!Female MEDIAN AGE BY SEX (BLAC… block gr…
acs20_5_vars <- load_variables(2020, "acs5", cache = TRUE)
acs21_1_vars <- load_variables(2021, "acs1", cache = TRUE)
acs21_1_vars |>
  filter(str_detect(concept, "MEDIAN AGE BY SEX")) |>
  head()
# A tibble: 6 × 3
  name        label                           concept                           
  <chr>       <chr>                           <chr>                             
1 B01002A_001 Estimate!!Median age --!!Total: MEDIAN AGE BY SEX (WHITE ALONE)   
2 B01002A_002 Estimate!!Median age --!!Male   MEDIAN AGE BY SEX (WHITE ALONE)   
3 B01002A_003 Estimate!!Median age --!!Female MEDIAN AGE BY SEX (WHITE ALONE)   
4 B01002B_001 Estimate!!Median age --!!Total: MEDIAN AGE BY SEX (BLACK OR AFRIC…
5 B01002B_002 Estimate!!Median age --!!Male   MEDIAN AGE BY SEX (BLACK OR AFRIC…
6 B01002B_003 Estimate!!Median age --!!Female MEDIAN AGE BY SEX (BLACK OR AFRIC…

14.4.3 Making Choropleth Maps with Census Data

14.4.3.1 Set up Your Environment with the Required Libraries

The {tidycensus} package makes it easier to work with census data.

The SimpleFeatures {sf} package provides a standardized way to encode spatial vector data in data frames.

  • The {tigris} package works with Census Bureau “TIGER/Line” shapefiles for use in data frames.
    • To save the census data between sessions, set the option: options(tigris_use_cache = TRUE).
  • The {viridis} package has color palettes that are perceptually uniform and legible for colorblind individuals and in black and white.
    • These are popular for data visualization, including for choropleth mapping.
## install.packages for the following
library(tidyverse)
library(tidycensus)
library(sf) # https://r-spatial.github.io/sf/
library(tigris) # https://cran.r-project.org/web/packages/tigris/tigris.pdf
options(tigris_use_cache = TRUE)
library(viridis) # a color palette

14.4.3.2 Download Census Data with Geography (SF) Objects

The following example gets median household income from the 2016-2020 ACS for census tracts for Baltimore City.

  • Note the FIPS code is an older system of identification for geographic entities in the United States.
  • The current standard is to use GEOIDs.
  • You will often see both to facilitate translation.
acs20_5_vars <- load_variables(2020, "acs5", cache = TRUE)

acs20_5_vars |>
  filter(str_detect(concept, "MEDIAN HOUSEHOLD")) |>
  head()
# A tibble: 6 × 4
  name        label                                            concept geography
  <chr>       <chr>                                            <chr>   <chr>    
1 B19013A_001 Estimate!!Median household income in the past 1… MEDIAN… tract    
2 B19013B_001 Estimate!!Median household income in the past 1… MEDIAN… tract    
3 B19013C_001 Estimate!!Median household income in the past 1… MEDIAN… tract    
4 B19013D_001 Estimate!!Median household income in the past 1… MEDIAN… tract    
5 B19013E_001 Estimate!!Median household income in the past 1… MEDIAN… county   
6 B19013F_001 Estimate!!Median household income in the past 1… MEDIAN… tract    
baltimore <- get_acs(
  state = "MD",
  county = "Baltimore City",
  geography = "tract",
  year = 2020,
  variables = "B19013_001",
  geometry = TRUE,
  key = Sys.getenv("API_KEY_CENSUS")
)
glimpse(baltimore)
Rows: 199
Columns: 6
$ GEOID    <chr> "24510151200", "24510271200", "24510140100", "24510090600", "…
$ NAME     <chr> "Census Tract 1512, Baltimore city, Maryland", "Census Tract …
$ variable <chr> "B19013_001", "B19013_001", "B19013_001", "B19013_001", "B190…
$ estimate <dbl> 22632, 195353, 57292, 50048, 53543, 151389, 84318, 46250, 375…
$ moe      <dbl> 5206, 28079, 26401, 16043, 4417, 22844, 16213, 14519, 11730, …
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-76.66589 3..., MULTIPOLYGON (((…

Let’s check the CRS for the geographic data.

st_crs(baltimore$geometry)
Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]
  • GEOGCRS means geographic CRS.
  • The datum is NAD83 and you can see the model parameters.
  • The EPSG is 4269.

We could convert to the projected Pseudo-Mercator (EPSG 3857) using the code below, or do it directly in the plot.

st_crs(st_transform(baltimore, 3857))
Coordinate Reference System:
  User input: EPSG:3857 
  wkt:
PROJCRS["WGS 84 / Pseudo-Mercator",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["Popular Visualisation Pseudo-Mercator",
        METHOD["Popular Visualisation Pseudo Mercator",
            ID["EPSG",1024]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Web mapping and visualisation."],
        AREA["World between 85.06°S and 85.06°N."],
        BBOX[-85.06,-180,85.06,180]],
    ID["EPSG",3857]]
  • PROJCRS means a planar projected CRS - based on the geographic BASEGEOGCRS WGS84.

We can quickly turn into a plot.

  • Try the different CRS values in the coord_sf() line.
  • Check epsg.io to see the different centers and, for projections, the accuracy.
    • 4269: NAD83 for North America (geographic)
    • 4326: WGS84, used by GPS (geographic)
    • 3857: Pseudo-Mercator projected coordinate system for Google Maps
    • 26915: NAD83 / UTM zone 15N of the projected Universal Transverse Mercator coordinate system
baltimore |>
  ggplot(aes(fill = estimate)) +
  geom_sf(color = NA) +
  coord_sf(crs = 4269) + 
  scale_fill_viridis_c(option = "magma")

14.4.3.3 Faceted Mapping

{ggplot2}’s support for small multiples works very well with the tidy data format returned by {tidycensus}.

  • However, many Census and ACS variables return counts, which are generally inappropriate for choropleth mapping.
  • To overcome this, get_decennial() and get_acs() have an optional argument, summary_var, that can work as a multi-group denominator.

Let’s get Baltimore City’s population count data for single race non-Hispanic whites, non-Hispanic blacks, non-Hispanic Asians, and Hispanic_Latinos of all races by Census tract for the 2020 Census, and specify total population as the summary variable.

You can compare the data for 2020 and 2010 but there are different summary files and variable names.

  • Get the list of variable names and review the hierarchy to choose the ones of interest.
dec20_var <- load_variables(2020, "pl", cache = TRUE)


racevars_20 <- c(
  White = "P2_005N",
  Black = "P2_006N",
  Asian = "P2_008N",
  Hispanic_Latino = "P2_002N"
)

baltimore_race_20 <- get_decennial(
  state = "MD", county = "Baltimore City",
  geography = "tract",
  year = "2020",
  variables = racevars_20, geometry = TRUE,
  summary_var = "P2_001N",
  key = Sys.getenv("API_KEY_CENSUS")
)
glimpse(baltimore_race_20)
Rows: 796
Columns: 6
$ GEOID         <chr> "24510151200", "24510151200", "24510151200", "2451015120…
$ NAME          <chr> "Census Tract 1512, Baltimore city, Maryland", "Census T…
$ variable      <chr> "White", "Black", "Asian", "Hispanic_Latino", "White", "…
$ value         <dbl> 159, 2995, 9, 66, 4626, 686, 222, 275, 2353, 1494, 596, …
$ summary_value <dbl> 3340, 3340, 3340, 3340, 6121, 6121, 6121, 6121, 5034, 50…
$ geometry      <MULTIPOLYGON [°]> MULTIPOLYGON (((-76.66589 3..., MULTIPOLYGO…

Note the 2010 census uses different variable names for the Summary File 1 data.

dec10_var <- load_variables(2010, "sf1", cache = TRUE)

racevars_10 <- c(
  White = "P005003",
  Black = "P005004",
  Asian = "P005006",
  Hispanic = "P005010"
)

baltimore_race_10 <- get_decennial(
  state = "MD", county = "Baltimore City",
  geography = "tract",
  year = "2010",
  variables = racevars_10, geometry = TRUE,
  summary_var = "P001001",
  key = Sys.getenv("API_KEY_CENSUS")
)
# glimpse(baltimore_race_10)

Plot percent by Tract.

baltimore_race_20 |>
  mutate(pct = 100 * (value / summary_value)) |>
  ggplot(aes(fill = pct)) +
  facet_wrap(~ variable) +
  geom_sf(color = NA) +
  # coord_sf(crs = 26915) + # un-comment to see shift in map
  scale_fill_viridis_c()

baltimore_race_10 |>
  mutate(pct = 100 * (value / summary_value)) |>
  ggplot(aes(fill = pct)) +
  facet_wrap(~variable) +
  geom_sf(color = NA) +
  # coord_sf(crs = 26915) + # un-comment to see shift in map
  scale_fill_viridis_c()

2020 Census

2010 Census

Percentage of Race for single race non-hispanic-latino and all race hispanic-latino by census tract.

14.4.4 Create an Interactive Map (only in HTML) with the {mapview} Package

The {mapview} package is for quick interactive analysis of spatial data.

  • It creates a leaflet widget which allows you to interact, e.g., scroll and zoom.
  • With the update to version 2.9.0, you need to set fgb = FALSE to be able to create a self-contained or embedded HTML file.
  • For a development version of mapview use remotes::install_github("r-spatial/mapview").
library(mapview)
mapviewOptions(fgb = FALSE)
mapview(baltimore, zcol = "estimate")

14.4.5 DC Example with Census and Non-Census Data

14.4.5.1 Get the Census Data

Download the median household income for DC from the 2016-2020 ACS at the census tract level with geography objects.

Add an explicit variable for CENSUS_TRACT, extracted from the GEOID.

dc_median <- get_acs(
  state = "DC", county = "District of Columbia",
  geography = "tract",
  variables = "B19013_001",
  geometry = TRUE,
  key = Sys.getenv("API_KEY_CENSUS")
)
glimpse(dc_median)
Rows: 206
Columns: 6
$ GEOID    <chr> "11001000201", "11001010300", "11001002801", "11001004002", "…
$ NAME     <chr> "Census Tract 2.01; District of Columbia; District of Columbi…
$ variable <chr> "B19013_001", "B19013_001", "B19013_001", "B19013_001", "B190…
$ estimate <dbl> NA, 94145, 80223, 170216, 164297, 54437, 79194, 105833, 10910…
$ moe      <dbl> NA, 24793, 18396, 34250, 45958, 17427, 14965, 31629, 20010, 2…
$ geometry <POLYGON [°]> POLYGON ((-77.07902 38.9126..., POLYGON ((-77.03636 3…
dc_median |>
  mutate(CENSUS_TRACT = str_sub(GEOID, 6, 11)) ->
dc_median
glimpse(dc_median)
Rows: 206
Columns: 7
$ GEOID        <chr> "11001000201", "11001010300", "11001002801", "11001004002…
$ NAME         <chr> "Census Tract 2.01; District of Columbia; District of Col…
$ variable     <chr> "B19013_001", "B19013_001", "B19013_001", "B19013_001", "…
$ estimate     <dbl> NA, 94145, 80223, 170216, 164297, 54437, 79194, 105833, 1…
$ moe          <dbl> NA, 24793, 18396, 34250, 45958, 17427, 14965, 31629, 2001…
$ geometry     <POLYGON [°]> POLYGON ((-77.07902 38.9126..., POLYGON ((-77.036…
$ CENSUS_TRACT <chr> "000201", "010300", "002801", "004002", "006700", "007707…

14.4.5.2 Plot the Median Household Income

The variable is estimate.

dc_median |>
  ggplot(aes(fill = estimate)) +
  geom_sf(aes(geometry = geometry), color = NA) +
  scale_fill_viridis_c()

Check the CRS.

st_crs(dc_median)
Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]

14.4.6 Get Non-Census Crime data

Go to DC Open Data Crime Incidents for 2022.

  • Click on Download and then select CSV Download options and download the “file previously generated”.
  • Save to a data directory beneath your working directory for your .qmd file.
crime <- read_csv("./data/Crime_Incidents_in_2022.csv",
  col_types = cols(
    DISTRICT = col_factor(),
    WARD = col_factor(),
    PSA = col_factor(),
    METHOD = col_factor()
  )
)

14.4.6.1 Plot Crimes by Census Tract

Plot the total number of crimes in each tract.

  • Filter out METHOD values of OTHERS, leaving GUN and KNIFE.
  • Note: the result of the join is a regular tibble (the crime counts are not an sf object), so we have to explicitly identify the geometry column in geom_sf().
crime |>
  filter(METHOD != "OTHERS") |>
  group_by(CENSUS_TRACT, METHOD) |>
  count() |>
  left_join(dc_median, by = "CENSUS_TRACT") |>
  ggplot(aes(fill = n)) +
  geom_sf(aes(geometry = geometry), color = NA) +
  scale_fill_viridis_c() +
  facet_wrap(~METHOD)

14.4.6.2 Now Get Shapefiles for the Wards

Go to DC Open Data and download the shapefile for the wards.

  • This will be a zip file as it has multiple files in a folder.
  • It includes the data in the CSV plus the geometry data for the wards.

Unzip into the data directory.

Use sf::read_sf() to load the shapefile using the directory name (not a file name).

  • Check the names of the data.
  • Convert WARD from integer to a factor.
  • We don’t need all the Census Bureau data, so we can remove columns that start with P00 or H00.
  • Check the class.
wards <- read_sf("./data/Wards_from_2022")
#names(wards)
wards |>
  mutate(WARD = as.factor(as.character(WARD))) |>
  select(!(starts_with("P00") | starts_with("H00"))) ->
wards
class(wards)
[1] "sf"         "tbl_df"     "tbl"        "data.frame"
head(wards)
Simple feature collection with 6 features and 24 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -77.08172 ymin: 38.79164 xmax: -76.90915 ymax: 38.9573
Geodetic CRS:  WGS 84
# A tibble: 6 × 25
  WARD  NAME   REP_NAME     WEB_URL REP_PHONE REP_EMAIL REP_OFFICE WARD_ID LABEL
  <fct> <chr>  <chr>        <chr>   <chr>     <chr>     <chr>      <chr>   <chr>
1 8     Ward 8 Trayon Whit… https:… (202) 72… twhite@d… 1350 Penn… 8       Ward…
2 6     Ward 6 Charles All… https:… (202) 72… callen@d… 1350 Penn… 6       Ward…
3 7     Ward 7 Vincent Gray https:… (202) 72… vgray@dc… 1350 Penn… 7       Ward…
4 2     Ward 2 Brooke Pinto https:… (202) 72… bpinto@d… 1350 Penn… 2       Ward…
5 1     Ward 1 Brianne Nad… https:… (202) 72… bnadeau@… 1350 Penn… 1       Ward…
6 5     Ward 5 Zachary Par… https:… (202) 72… kmcduffi… 1350 Penn… 5       Ward…
# ℹ 16 more variables: STUSAB <chr>, SUMLEV <chr>, GEOID <chr>, GEOCODE <chr>,
#   STATE <chr>, POP100 <int>, HU100 <int>, OBJECTID <int>, GLOBALID <chr>,
#   CREATED_US <chr>, CREATED_DA <chr>, LAST_EDITE <chr>, LAST_EDI_1 <date>,
#   SHAPEAREA <int>, SHAPELEN <int>, geometry <POLYGON [°]>

Check the CRS for the data.

st_crs(wards)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

This is WGS84, which is different from the NAD83 we got from the Census Bureau.

  • However, we are not combining the two geometry columns, so that is okay.

Now plot at the Ward Level.

crime |>
  filter(METHOD != "OTHERS") |>
  group_by(WARD, METHOD) |>
  count() |>
  left_join(wards, by = "WARD") |>  #glimpse()
  ggplot(aes(fill = n)) +
  geom_sf(aes(geometry = geometry), color = NA) +
  scale_fill_viridis_c() +
  facet_wrap(~ METHOD) +
  geom_sf_label(aes(label = WARD, geometry = geometry),
    color = "red", label.size = 0
  )

Compare totals by ward.

crime |>
  filter(METHOD != "OTHERS") |>
  group_by(WARD, METHOD) |>
  count() |>
  ggplot(aes(x = WARD, y = n, color = METHOD)) +
  geom_point()

Some census tracts span more than one ward, so the tract-level census data does not line up exactly with the wards, but you could approximate.

crime |>
  select(CENSUS_TRACT, WARD) |>
  group_by(CENSUS_TRACT) |>
  unique() |>
  count() |>
  filter(n > 1) |>
  arrange(desc(n))
# A tibble: 20 × 2
# Groups:   CENSUS_TRACT [20]
   CENSUS_TRACT     n
   <chr>        <int>
 1 000300           2
 2 001401           2
 3 001402           2
 4 002900           2
 5 003400           2
 6 004201           2
 7 004300           2
 8 004402           2
 9 004704           2
10 004801           2
11 004901           2
12 005001           2
13 006802           2
14 006804           2
15 007601           2
16 007903           2
17 008002           2
18 010601           2
19 980000           2
20 <NA>             2

14.4.7 Add Points of Interest to the Maps

Get and set the Longitude and Latitude for AU as an sf object.

au_latlong <- tibble(longitude = -77.0869, latitude = 38.9370)
au_latlong <- st_as_sf(au_latlong,
  coords = c("longitude", "latitude"),
  crs = 4326, ## same as the DC Wards shapefile WGS84 Geo
  agr = "constant"
)

Add to the crime plot.

crime |>
  filter(METHOD != "OTHERS") |>
  group_by(WARD, METHOD) |>
  count() |>
  left_join(wards, by = "WARD") |>
  ggplot(aes(fill = n)) +
  geom_sf(aes(geometry = geometry), color = NA) +
  scale_fill_viridis_c() +
  facet_wrap(~ METHOD) +
  geom_sf(data = au_latlong, size = 3, shape = 23, fill = "red")

Add to the median household income plot.

dc_median |>
  ggplot(aes(fill = estimate)) +
  geom_sf(aes(geometry = geometry), color = NA) +
  scale_fill_viridis_c() +
  geom_sf(data = au_latlong, size = 2, shape = 23, fill = "red")

Note that we did not convert the AU location from WGS84 back to NAD83 because we did not need that much accuracy, but we could have using st_transform().

14.4.8 Density Plots

  • Plot the points for gun and knife crimes using Latitude and Longitude as numeric values, not simple features.
crime |>
  filter(METHOD != "OTHERS") |>
  ggplot(aes(x = LONGITUDE, y = LATITUDE, color = METHOD)) +
  geom_point(alpha = .3) +
  scale_color_viridis_d()

Create a contour 2d density plot.

crime |>
  filter(METHOD != "OTHERS") |>
  ggplot(aes(x = LONGITUDE, y = LATITUDE)) +
  geom_density2d(binwidth = 10) ## adjust binwidth for resolution

Create a filled 2d density plot.

crime |>
  filter(METHOD != "OTHERS") |>
  ggplot(aes(x = LONGITUDE, y = LATITUDE)) +
  geom_density2d_filled(bins = 10) + ## adjust bins for resolution
  theme(legend.position = "bottom") ## + facet_grid(~ METHOD)

Zoom in by filtering the data.

crime |>
  filter(METHOD != "OTHERS", WARD %in% c(7, 8)) |>
  ggplot(aes(x = LONGITUDE, y = LATITUDE)) +
  facet_grid(~METHOD) +
  geom_density2d_filled(bins = 10) +
  theme(legend.position = "top")

Zoom in by adjusting the map limits to see a different view of more data.

crime |>
  filter(METHOD != "OTHERS") |>
  ggplot(aes(x = LONGITUDE, y = LATITUDE)) +
  facet_grid(~METHOD) +
  coord_sf(xlim = c(-77.025, -76.9), ylim = c(38.82, 38.91)) +
  geom_density2d_filled(bins = 10) +
  theme(legend.position = "top")

Combine layers with map.

crime |>
  filter(METHOD != "OTHERS") |> ## 7 minutes to run for all data
  left_join(wards, by = "WARD") |> ## get the sf data for Wards
  ggplot(aes(x = LONGITUDE, y = LATITUDE)) +
  geom_sf(aes(geometry = geometry), color = "green") +
  geom_density2d(bins = 10) +
  geom_sf_label(aes(label = WARD, geometry = geometry),
    color = "red", label.size = 0
  )

Try with filled contour.

crime |>
  filter(METHOD != "OTHERS") |>
  left_join(wards, by = "WARD") |> ## get the sf data for Wards
  ggplot(aes(x = LONGITUDE, y = LATITUDE)) +
  geom_sf(aes(geometry = geometry), color = "black") +
  geom_density2d_filled(bins = 10, alpha = .4) +
  geom_sf_label(aes(label = WARD, geometry = geometry),
    color = "red", label.size = 0
  )

14.5 Working with Open Data and the {leaflet} package

This section is based on the tutorial by Ryan Reynolds in the references.

14.5.1 Government Open Data Initiatives

Many locales have publicly available map data (and other data as well).

14.5.2 Example with Affordable Housing Grants

  • The US Department of Housing and Urban Development (HUD) gives grants to local partners to help rebuild and build affordable housing.
  • The data source is the City of New Orleans Open Data Portal.
  • Observational Units: Grants from HUD to help rebuild and build affordable housing in the city.
  • Variables: information about each funding award and the location of the housing development supported by the funding.
  • This example provides insight into how the city is continuing to recover from the long-term impact of Hurricane Katrina.

14.5.3 Set up the Environment with the Required Libraries

  • The SimpleFeatures {sf} R package provides a standardized way to encode spatial vector data.
  • It binds to GDAL for reading and writing data, to GEOS for geometrical operations, and to PROJ for projection conversions and datum transformations.
  • Leaflet is a popular open-source JavaScript library for interactive maps used by many websites and GIS products.
  • The {leaflet} R package makes it easy to integrate and control Leaflet maps in R.
## install.packages for the following if needed
library(tidyverse)
library(sf)
library(leaflet)
library(viridis) #a color palette

14.5.4 Download and Load the Data

14.5.4.1 Download the Data

Two different datasets:

  • The main dataset of HUD grants.
  • A shapefile of New Orleans neighborhoods.

Select Export, then Download, then CSV format for the first file, and Shapefile for the second (a pre-defined geospatial format managed by ESRI, a commercial GIS provider).

Download the files into your data directory.

Unzip the shapefile into its directory.

14.5.4.2 Load and Convert the data

The Housing_and_Urban_Development__HUD__Grants.csv data has numeric values for the longitude and latitude of each grant.

We want to convert them into simple features points.

The st_as_sf() function converts longitude and latitude into a simple features “POINT” object containing longitude and latitude.

  • WGS84 EPSG 4326 coordinate reference system is a good default for longitude and latitude data.
read_csv("./data/Housing_and_Urban_Development__HUD__Grants.csv") |>
  ## note the double underscores in the file name
  ## Convert Latitude and Longitude columns into a simple features POINT geometry
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326) ->
  hud_grants

glimpse(hud_grants)
Rows: 342
Columns: 5
$ `OCD Program` <chr> "Owner Occupied Rental", "Rental", "Homebuyer", "Rental"…
$ Partnership   <chr> "PRC Rebuilding Together Award", "Gert Town Ent Econ Red…
$ Status        <chr> "Complete", "Complete", "Active", "Active", "Active", "C…
$ Address       <chr> "3915 Marais St", "3235 Fern St", "3229 Washington Ave",…
$ geometry      <POINT [°]> POINT (-90.0357 29.96678), POINT (-90.11209 29.960…
st_crs(hud_grants$geometry)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Load the shapes for the neighborhoods and check the CRS.

neighborhoods <- read_sf("./data/Neighborhood Statistical Areas Demographics 2020")
glimpse(neighborhoods)
Rows: 72
Columns: 45
$ objectid   <dbl> 4, 26, 33, 16, 25, 34, 72, 10, 63, 2, 19, 65, 46, 39, 42, 5…
$ gnocdc_lab <chr> "AUDUBON", "FRERET", "IBERVILLE", "DILLARD", "FLORIDA DEV",…
$ lup_lab    <chr> "AUDUBON - UNIVERSITY", "FRERET", "IBERVILLE HOUSING DEV", …
$ neigh_id   <chr> "10A", "11B", "6C", "3D", "8C", "13B", "13A", "11A", "13F",…
$ cnt_lup_la <dbl> 17, 3, 2, 6, 1, 5, 6, 7, 5, 3, 4, 3, 4, 2, 8, 3, 11, 7, 4, …
$ sum_pop100 <dbl> 17054, 1643, 1122, 5413, 160, 3104, 4742, 6039, 2295, 2289,…
$ sum_hu100  <dbl> 5862, 931, 647, 2483, 52, 1922, 2740, 3056, 1269, 1414, 153…
$ sum_totpop <dbl> 17054, 1643, 1122, 5413, 160, 3104, 4742, 6039, 2295, 2289,…
$ sum_white  <dbl> 13466, 793, 87, 447, 2, 2069, 3470, 2491, 624, 1664, 1633, …
$ sum_black  <dbl> 434, 627, 967, 4593, 154, 620, 682, 2735, 1457, 348, 618, 3…
$ sum_amer_i <dbl> 70, 14, 7, 16, 0, 9, 11, 33, 4, 13, 8, 9, 3, 3, 21, 16, 12,…
$ sum_asian  <dbl> 500, 34, 8, 28, 0, 51, 109, 95, 38, 18, 45, 64, 17, 40, 250…
$ sum_pac_is <dbl> 29, 0, 0, 2, 1, 0, 4, 0, 1, 4, 2, 1, 0, 0, 0, 2, 6, 0, 2, 0…
$ sum_other_ <dbl> 322, 47, 10, 94, 0, 84, 83, 242, 20, 29, 63, 59, 37, 32, 17…
$ sum_multi_ <dbl> 2233, 128, 43, 233, 3, 271, 383, 443, 151, 213, 232, 225, 1…
$ sum_totp_2 <dbl> 17054, 1643, 1122, 5413, 160, 3104, 4742, 6039, 2295, 2289,…
$ sum_hisp_l <dbl> 3057, 162, 34, 174, 2, 274, 367, 631, 110, 165, 209, 262, 9…
$ sum_non_hi <dbl> 13997, 1481, 1088, 5239, 158, 2830, 4375, 5408, 2185, 2124,…
$ sum_totp_3 <dbl> 15066, 1408, 816, 4582, 107, 2716, 3927, 4889, 1769, 1897, …
$ sum_white_ <dbl> 11894, 695, 73, 385, 2, 1821, 2888, 2094, 571, 1422, 1423, …
$ sum_black_ <dbl> 384, 534, 688, 3912, 102, 547, 585, 2198, 1027, 272, 522, 3…
$ sum_amerin <dbl> 63, 12, 7, 13, 0, 9, 10, 22, 3, 13, 0, 4, 2, 2, 18, 11, 4, …
$ sum_asian_ <dbl> 427, 30, 6, 28, 0, 46, 85, 89, 38, 17, 32, 59, 16, 33, 241,…
$ sum_pacisl <dbl> 29, 0, 0, 2, 1, 0, 4, 0, 1, 2, 2, 1, 0, 0, 0, 2, 6, 0, 2, 0…
$ sum_othe_2 <dbl> 298, 34, 9, 74, 0, 67, 78, 165, 17, 25, 51, 45, 25, 18, 162…
$ sum_mult_2 <dbl> 1971, 103, 33, 168, 2, 226, 277, 321, 112, 146, 165, 186, 6…
$ sum_totp_4 <dbl> 15066, 1408, 816, 4582, 107, 2716, 3927, 4889, 1769, 1897, …
$ sum_hisp_2 <dbl> 2877, 117, 31, 112, 1, 226, 299, 442, 84, 115, 148, 203, 56…
$ sum_non__2 <dbl> 12189, 1291, 785, 4470, 106, 2490, 3628, 4447, 1685, 1782, …
$ sum_total_ <dbl> 5862, 931, 647, 2483, 52, 1922, 2740, 3056, 1269, 1414, 153…
$ sum_occ_hu <dbl> 4923, 782, 604, 2175, 51, 1650, 2389, 2643, 1156, 1162, 132…
$ sum_vacant <dbl> 939, 149, 43, 308, 1, 272, 351, 413, 113, 252, 209, 322, 26…
$ sum_totp_5 <dbl> 5536, 30, 0, 709, 0, 0, 120, 9, 0, 7, 0, 365, 20, 0, 7, 7, …
$ sum_instgq <dbl> 162, 0, 0, 0, 0, 0, 108, 0, 0, 0, 0, 327, 0, 0, 0, 7, 0, 0,…
$ sum_adulco <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ sum_juvcor <dbl> 0, 0, 0, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0…
$ sum_nursho <dbl> 162, 0, 0, 0, 0, 0, 90, 0, 0, 0, 0, 295, 0, 0, 0, 0, 0, 0, …
$ sum_othins <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0…
$ sum_nonins <dbl> 5374, 30, 0, 709, 0, 0, 12, 9, 0, 7, 0, 38, 20, 0, 7, 0, 10…
$ sum_col_do <dbl> 5344, 0, 0, 707, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ sum_milita <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ sum_othnon <dbl> 30, 30, 0, 2, 0, 0, 12, 9, 0, 7, 0, 38, 20, 0, 7, 0, 10, 5,…
$ shape_star <dbl> 63257823, 5464058, 1747063, 26475756, 1540363, 23457801, 38…
$ shape_stle <dbl> 39015.654, 9710.452, 5408.459, 20830.352, 5959.030, 26507.6…
$ geometry   <POLYGON [°]> POLYGON ((-90.10964 29.9413..., POLYGON ((-90.10331…
st_crs(neighborhoods)
Coordinate Reference System:
  User input: WGS84(DD) 
  wkt:
GEOGCRS["WGS84(DD)",
    DATUM["WGS84",
        ELLIPSOID["WGS84",6378137,298.257223563,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic longitude",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic latitude",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]]]

The WGS84(DD) means decimal degrees. Note there is no EPSG ID for the CRS itself in this WKT; the ID["EPSG",9001] identifies the metre length unit of the ellipsoid. Despite the nonstandard label, this is a geographic CRS.

14.5.5 Clean the Data

When doing geospatial mapping make sure all your spatial datasets are in the same projection system. In this case let’s use the WGS84 EPSG 4326 as a geographic CRS.

  • We are interested in locations, not measuring areas or distances, so area distortion is okay on this city scale.

The function sf::st_transform(4326) takes the neighborhoods dataset and transforms it into the chosen CRS.

We will also add labels and use raw HTML to bold the word “Neighborhood” and then {stringr} to convert to title case.

neighborhoods_clean <- neighborhoods |>
  st_transform(4326) |> ## transform to WGS 84
  ## Add label formatted with HTML and title case
  mutate(neighborhood_label = str_c('<b>Neighborhood:</b> ',
                                    str_to_title(gnocdc_lab)))

Clean up variable names in hud_grants, filter out rows with no program data, and add a label.

  • Note the addition of some html to create line breaks.
hud_grants_clean <- hud_grants |> 
  rename(OCD_Program = `OCD Program`) |> 
  filter(OCD_Program != "None") |> 
  mutate(popup_label = paste(paste0('<b>Partner: ', Partnership, '</b>'),
                         paste0('Address: ', Address), 
                         sep = '<br/>')) ## HTML line break

14.5.6 The {leaflet} Package

Leaflet builds maps in a fashion similar to how {ggplot2} builds graphics, one layer at a time.

  • First create the map.
  • Add neighborhood areas.
  • Add grant project markers.

Each piece is layered on top of the previous one, both visually in the displayed map and sequentially in your code.

Let’s walk piece-by-piece through making the map, with images of the map along the way.

{leaflet} has a number of functions for adding controls or layers to the map to create a new map object.

  1. First create the base layer map.
  • Leaflet supports base maps using map tiles, popularized by Google Maps and now used by many interactive web maps.
  • The easiest way to add tiles is by calling addTiles() with no arguments.
  • The default is to use OpenStreetMap tiles.

A vector tile is a lightweight data format for storing geospatial vector data, such as points, lines, and polygons.

  • Vector tiles are used to create vector tilesets.

  • Vector tiles encode geographic information according to the Vector tile specification.

  • They can be accessed with styles (rules) that define how to display the tiles.

  • leaflet() creates the underlying interactive map widget.

  • addTiles() adds the map tiles as the base tile layer of the map.

leaflet() |> 
    addTiles() -> p
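
As an aside, you can swap in a different base map with addProviderTiles(); a quick sketch using one of the bundled third-party providers (p_alt is just an illustrative name):

leaflet() |>
  addProviderTiles(providers$CartoDB.Positron) ->
p_alt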
  2. Add the neighborhood polygons to the map.
p |>
  addPolygons(data = neighborhoods_clean, color = 'white', weight = 1.5,
              opacity = 1, fillColor = "#e8e8e8", fillOpacity = .8,
              highlightOptions = highlightOptions(color = "#FFF1BE", weight = 5),
              popup = ~ neighborhood_label) ->
  p
  3. Add the grants.
  • colorFactor() is a Leaflet function to map factor levels to colors according to a given palette. The domain is the possible values that can be mapped.
    • Leaflet has functions to handle numeric, factor, or character data values.
  • addCircleMarkers() is another leaflet function for adding a layer of markers to the map.
pal <- colorFactor(
  palette = viridis_pal(begin = .4, end = .95, option = 'A')(3), # scales::viridis_pal()
  domain = hud_grants_clean$OCD_Program
)

p |> 
  addCircleMarkers(data = hud_grants_clean, popup = ~popup_label,
                   stroke = F, radius = 4,
                   fillColor = ~ pal(OCD_Program), fillOpacity = 1) ->
  p
  4. addLegend() uses the colorFactor values to map the colors to the programs.

After this step, your map is complete.

p |> 
  addLegend(data = hud_grants_clean, pal = pal, values = ~ OCD_Program,
            title = "HUD Grant Program") ->
  p

14.5.7 Leaflet in Shiny

Leaflet Maps also work in Shiny.

One can track events associated with the map and use reactive events to update the data used for other displays.

As an example, assume your Shiny app is using a leaflet map of the United States by state.

You can react to input$map_shape_click (the event name combines the map’s output id, here "map", with the event type) in a reactive expression and use the response to filter data to the chosen state.

# not complete working code

ggplot_data <- reactive({
  state_click <- input$map_shape_click$id # id set via layerId in addPolygons()
  df_state_plot[df_state_plot$State %in% state_click, ]
})

output$stateLoanRangePlot <- renderPlot({
  if (nrow(ggplot_data()) > 0) { # filter the data using the clicked state
    my_data <- ggplot_data()
  } else { # set a default for California
    my_data <- df_state_plot[df_state_plot$State %in% "CA", ]
  }
  ggplot(data = my_data, aes(x = x)) +
    etc...
})
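
For input$map_shape_click to return a usable state identifier, the polygons need a layerId when the leaflet map is built; the clicked shape’s id comes back as input$map_shape_click$id. A minimal sketch of the map side, assuming an sf object states_sf with a State column (both names are hypothetical):

library(leaflet)

## inside server(); the ui contains leafletOutput("map")
output$map <- renderLeaflet({
  leaflet(states_sf) |>
    addTiles() |>
    addPolygons(layerId = ~State) # State comes back as the click event's id
})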

14.6 Making Thematic Maps in R using the {tmap} Package

14.6.1 The {tmap} Package

The {tmap} package helps one create more complicated “thematic” maps (choropleth, bubble, etc.) using R.

  • {tmap} was designed to work like {ggplot2} by building up the layers of a map.
  • It works with spatial objects from the {sf} package.
  • It can make static maps or interactive maps (using the {leaflet} package).

{tmap} can be used in two modes:

  • plot: static maps, shown in the window which can be exported in different formats.
  • view: interactive maps, shown in the viewing window or in the browser, which can be exported to standalone HTML files.

The fundamental object in {tmap} is a tmap_element or “shape”, built from an {sf} object.

All {tmap} functions for working with elements begin with tm_; other functions often begin with tmap_.

This section follows the “tmap: get started” vignette from the references.

  • Use the console to install the {tmap} package and then use library(tmap) to load and attach it.
library(tidyverse)
library(tmap)

To get started, let’s load a world map.

  • Load the World spatial object (a data frame with geographic data).
  • tm_shape() creates a tm_element of class tm_shape.
data("World")
tm_shape(World) |> str()
List of 1
 $ tm_shape:List of 14
  ..$ shp_name         : chr "World"
  ..$ shp              :Classes 'sf' and 'data.frame':  177 obs. of  16 variables:
  .. ..$ iso_a3      : Factor w/ 177 levels "AFG","AGO","ALB",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ name        : Factor w/ 177 levels "Afghanistan",..: 1 4 2 166 6 7 5 56 8 9 ...
  .. ..$ sovereignt  : Factor w/ 171 levels "Afghanistan",..: 1 4 2 159 6 7 5 52 8 9 ...
  .. ..$ continent   : Factor w/ 8 levels "Africa","Antarctica",..: 3 1 4 3 8 3 2 7 6 4 ...
  .. ..$ area        : Units: [km^2] num [1:177] 652860 1246700 27400 71252 2736690 ...
  .. ..$ pop_est     : num [1:177] 28400000 12799293 3639453 4798491 40913584 ...
  .. ..$ pop_est_dens: num [1:177] 43.5 10.3 132.8 67.3 15 ...
  .. ..$ economy     : Factor w/ 7 levels "1. Developed region: G7",..: 7 7 6 6 5 6 6 6 2 2 ...
  .. ..$ income_grp  : Factor w/ 5 levels "1. High income: OECD",..: 5 3 4 2 3 4 2 2 1 1 ...
  .. ..$ gdp_cap_est : num [1:177] 784 8618 5993 38408 14027 ...
  .. ..$ life_exp    : num [1:177] 59.7 NA 77.3 NA 75.9 ...
  .. ..$ well_being  : num [1:177] 3.8 NA 5.5 NA 6.5 4.3 NA NA 7.2 7.4 ...
  .. ..$ footprint   : num [1:177] 0.79 NA 2.21 NA 3.14 2.23 NA NA 9.31 6.06 ...
  .. ..$ inequality  : num [1:177] 0.427 NA 0.165 NA 0.164 ...
  .. ..$ HPI         : num [1:177] 20.2 NA 36.8 NA 35.2 ...
  .. ..$ geometry    :sfc_MULTIPOLYGON of length 177; first list element: List of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ : num [1:69, 1:2] 61.2 62.2 63 63.2 64 ...
  .. .. ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
  .. ..- attr(*, "sf_column")= chr "geometry"
  .. ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  .. .. ..- attr(*, "names")= chr [1:15] "iso_a3" "name" "sovereignt" "continent" ...
  ..$ name             : NULL
  ..$ is.master        : logi NA
  ..$ projection       : NULL
  ..$ bbox             : NULL
  ..$ unit             : NULL
  ..$ simplify         : num 1
  ..$ point.per        : logi NA
  ..$ line.center      : chr "midpoint"
  ..$ filter           : NULL
  ..$ raster.downsample: logi TRUE
  ..$ raster.warp      : logi TRUE
  ..$ check_shape      : logi TRUE
 - attr(*, "class")= chr "tmap"

Now we can add a polygon map layer with tm_polygons(), mapping a numeric variable (here "HPI") to the fill color via the col = argument.

tm_shape(World) +
  tm_polygons("HPI")

14.6.2 Changing Mode from plot to view

To change the mode from plot to view, use tmap_mode().

  • This invokes the {leaflet} package for interactivity (HTML output only).
tmap_mode("view")
tm_shape(World) +
  tm_polygons("life_exp")

We can go back to “plot” as well.

tmap_mode("plot")
tm_shape(World) +
  tm_polygons("life_exp")

14.6.3 Combining Objects as Layers

Let’s load different kinds of objects and see how {tmap} can easily combine them.

Use data() to access the built-in data for metro, rivers, and land.

  • The metro data has an old-style CRS, so we will get a warning to update it when we check its CRS.
  • Use sf::st_crs(metro$geometry) <- "EPSG:4326" to update to WGS84.
data(metro, rivers, land)
#sf::st_crs(metro$geometry)
sf::st_crs(metro$geometry) <- "EPSG:4326" # fix old style CRS for metro

Now we can create a map with a combination of raster data and vector features data.

  • Use tm_shape(land) and then add a raster map with elevation.
  • Add vector shapes for rivers, colored in blue.
tm_shape(land) + # create an element from class stars
  tm_raster("elevation", palette = terrain.colors(10)) +
tm_shape(rivers) + # add a second shape
  tm_lines(col = "blue") -> map_land_rivers
map_land_rivers

Add a shape for the world country borders (in white) and country names.

map_land_rivers +
  tm_shape(World) + # create an element
  tm_borders("white", lwd = .5) + # draw polygon borders only
  tm_text("iso_a3", size = "AREA") -> # add text labels
map_with_country_borders
map_with_country_borders

  • Add data for metropolitan areas as symbols.
  • Move the legend outside the map.
map_with_country_borders +
  tm_shape(metro) + # create an element
  tm_symbols(col = "red", size = "pop2020", scale = .5) +
  tm_legend(legend.outside = TRUE) # move legend

14.6.4 Faceting

One can facet maps in three ways:

  1. Facet by a variable by adding multiple variables to the aesthetic in tm_polygons().
tmap_mode("view")
tm_shape(World) +
    tm_polygons(c("HPI", "economy")) +
    tm_facets(sync = TRUE, ncol = 2) #specify the facet output structure
  2. Facet by using the by argument of tm_facets().
  • Should be a factor/character with a “small” number of distinct values.
tmap_mode("plot")

World |> 
  filter(!continent %in% c("Antarctica", "Seven seas (open ocean)")) |> 
tm_shape() +
    tm_polygons("HPI", palette = "RdYlBu") +
    tm_facets(by = "continent", ncol = 3)

  3. Create multiple maps and use tmap_arrange() to manually arrange them as facets.
tm1 <- tm_shape(World) + 
  tm_polygons("pop_est", convert2density = TRUE) +
  tm_legend(legend.outside = TRUE) 
tm2 <- tm_shape(World) +
  tm_bubbles(size = "pop_est") +
  tm_legend(legend.outside = TRUE) 
tm3 <- tm_shape(metro) +
  tm_symbols(size = "pop2030", col = "green") +
  tm_legend(legend.outside = TRUE) 
tmap_arrange(tm1, tm2, tm3)

14.6.5 Exporting Maps

You can save images using tmap_save().

tm <- tm_shape(World) +
    tm_polygons("HPI", legend.title = "Happy Planet Index")

## save an image ("plot" mode)
tmap_save(tm, filename = "./output/world_map.png")

## save as stand-alone HTML file ("view" mode)
tmap_save(tm, filename = "./output/world_map.html")

14.6.6 Shiny integration

It is possible to use {tmap} in {shiny}.

  • There are special Output and Render functions for {tmap}.
  • It invokes a leaflet widget for display.
library(shiny)
library(tmap)
data(World)

ui <- fluidPage(
  tmapOutput("my_tmap")
)

server <- function(input, output, session) {
  output$my_tmap <- renderTmap({
    tm_shape(World) +
      tm_polygons("HPI",
                  legend.title = "Happy Planet Index")
  })
}
shinyApp(ui, server)

14.7 Using Natural Earth Data for Mapping in R

This section is based on the rspatial tutorial.

“Natural Earth is a public domain map dataset including vector country and other administrative boundaries.” rnaturalearth is designed to provide data allowing creation of more elaborate maps in other visualisation packages (e.g. ggplot2, tmap and choroplethr). South (2023)
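
The examples below assume a world data frame of country polygons. A minimal sketch of how it is typically created with {rnaturalearth} (the "medium" scale also requires the {rnaturalearthdata} package):

library(rnaturalearth)
library(sf)
world <- ne_countries(scale = "medium", returnclass = "sf")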

14.7.2 Country Names and Other Names (geom_text and annotate)

The world df contains country names and the coordinates of the centroid of each country.

Use geom_text() to add a layer of text to a map using geographic coordinates.

We have a very flexible control to adjust the text:

  • The size (argument size);
  • The alignment, which is centered by default on the coordinates provided.
    • Adjust horizontally or vertically using the arguments hjust and vjust, either a number between 0 (left/bottom) and 1 (right/top) or a character (“left”, “middle”, “right”, “bottom”, “center”, “top”).
    • Offset horizontally or vertically with the arguments nudge_x and nudge_y;
  • Set the text color with color and the font style with fontface.
  • To control the overlap of labels, use check_overlap, which removes overlapping text.

For the text labels, define the centroid of the countries with sf::st_centroid().

Finally, the annotate() function can be used to add a single character string at a specific location, as shown here to add the “Gulf of Mexico”.

sf_use_s2(FALSE)

world_points <- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))

ggplot(data = world) +
  geom_sf() +
  geom_text(
    data = world_points, aes(x = X, y = Y, label = name),
    color = "darkblue", fontface = "bold", check_overlap = FALSE
  ) +
  annotate(
    geom = "text", x = -90, y = 26, label = "Gulf of Mexico",
    fontface = "italic", color = "grey22", size = 6
  ) +
  coord_sf(xlim = c(-97.15, -70.12), ylim = c(7.65, 30.97), expand = FALSE)

14.7.3 Set the Theme

The theme of the map can be edited to make the map more appealing.

  • theme_bw() is a good default, but there are many possible themes.

Specific theme elements can be tweaked to get to the final outcome:

  • Adjust grid lines (graticules) on the map using panel.grid.major and panel.grid.minor; here set to a gray color and dashed line type.
  • Color the map background with panel.background; here set the ocean to light blue.
  • See help for the function theme().
library(ggspatial) # provides annotation_scale() and annotation_north_arrow()
ggplot(data = world) +
  geom_sf(fill = "antiquewhite") +
  geom_text(
    data = world_points, aes(x = X, y = Y, label = name),
    color = "darkblue", fontface = "bold", check_overlap = FALSE
  ) +
  annotate(
    geom = "text", x = -90, y = 26, label = "Gulf of Mexico",
    fontface = "italic", color = "grey22", size = 6
  ) +
  annotation_scale(location = "bl", width_hint = 0.5) +
  annotation_north_arrow(
    location = "bl", which_north = "true",
    pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
    style = north_arrow_fancy_orienteering
  ) +
  coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE) +
  xlab("Longitude") +
  ylab("Latitude") +
  ggtitle("Map of the Gulf of Mexico and the Caribbean Sea") +
  theme(
    panel.grid.major = element_line(
      color = gray(.5),
      linetype = "dashed", linewidth = 0.5
    ),
    panel.background = element_rect(fill = "aliceblue")
  )

14.7.4 Save the Map with ggsave()

ggsave() allows a graphic to be saved in a variety of formats, including the most common PNG (raster bitmap) and PDF (vector graphics), with control over the size and resolution of the outcome.

Here we save a PDF version of the map, which keeps the best quality, and a PNG version of it for web purposes.

ggsave("./output/map.pdf")
ggsave("./output/map_web.png", width = 6, height = 6, dpi = "screen")

14.7.5 Adding Data to a Map

14.7.6 Individual Points

We start by defining two study sites, according to their longitude and latitude, stored in a regular data frame.

sites <- data.frame(
  longitude = c(-80.144005, -80.109),
  latitude = c(26.479005, 26.83)
)
sites
  longitude latitude
1 -80.14401   26.479
2 -80.10900   26.830

You could use geom_point() to add points, but it works on the X, Y scale of the plot, not the map.

The recommended approach is to use functions from the {sf} package to convert locations to sf objects which can handle conversions between projections.

  • That requires entering the projection of the original points as part of the conversion function.

Then you can plot using the map information so the points remain in the right place, even if you change the limits of the map.

sites <- st_as_sf(sites,
  coords = c("longitude", "latitude"),
  crs = 4326, agr = "constant"
) ## WGS84

ggplot(data = world) +
  geom_sf() +
  geom_sf(data = sites, size = 4, shape = 23, fill = "darkred") +
  coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE)
## Note points stay in the right location when zoomed in
ggplot(data = world) +
  geom_sf() +
  geom_sf(data = sites, size = 4, shape = 23, fill = "darkred") +
  coord_sf(xlim = c(-82, -78), ylim = c(24.5, 28), expand = FALSE)

  • Note that coord_sf() has to be called after all geom_sf() calls, so as to supersede any former input.

14.8 Using the {maps} Package with State, County, and City Data

The {maps} package is another option for working with {ggplot2}.

Its world map is derived from Natural Earth data, and it adds boundary data for US states and counties.

  • Install the package using the console.
  • Note that when you load it, it masks the purrr::map() function.
  • If you want to use purrr::map(), call it with its namespace prefix or detach {maps}, as sketched below.
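A quick illustration of the masking and both workarounds (a sketch):

library(purrr)
library(maps)          # maps::map() now masks purrr::map()
purrr::map(1:3, sqrt)  # use the namespace prefix to get purrr's map()
## or remove {maps} from the search path entirely:
## detach("package:maps", unload = TRUE)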
library(maps)
states <- st_as_sf(map("state", plot = FALSE, fill = TRUE))
head(states)
Simple feature collection with 6 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.3834 ymin: 30.24071 xmax: -71.78015 ymax: 42.04937
Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
                     ID                           geom
alabama         alabama MULTIPOLYGON (((-87.46201 3...
arizona         arizona MULTIPOLYGON (((-114.6374 3...
arkansas       arkansas MULTIPOLYGON (((-94.05103 3...
california   california MULTIPOLYGON (((-120.006 42...
colorado       colorado MULTIPOLYGON (((-102.0552 4...
connecticut connecticut MULTIPOLYGON (((-73.49902 4...

State names are part of this data, as the ID variable.

A simple (but not necessarily optimal) way to add state names is to compute the centroid of each state polygon and use it as the coordinates at which to draw the name.

  • Centroids are computed with the function st_centroid, their coordinates extracted with st_coordinates, and attached to the state object.
states <- bind_cols(states, as_tibble(st_coordinates(st_centroid(states))))
states$ID <- str_to_title(states$ID)

State data is directly plotted as an additional sf layer using geom_sf for simple features.

  • State names can be added using geom_text(), declaring coordinates on the X-axis and Y-axis, as well as the label (from ID), and a relatively big font size.
ggplot(data = world) +
  geom_sf() +
  geom_sf(data = states, fill = NA) +
  geom_text(data = states, aes(X, Y, label = ID), size = 5) +
  coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE)

  • Use nudge_* to adjust names as desired.
states$nudge_y <- -1
states$nudge_y[states$ID == "Florida"] <- 1.1
states$nudge_y[states$ID == "South Carolina"] <- .1

ggplot(data = world) +
  geom_sf() +
  geom_sf(data = states, fill = NA) +
  coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) +
  geom_text(
    data = states, aes(X, Y, label = ID), size = 5, fontface = "bold",
    nudge_y = states$nudge_y
  )

We can also get county data from {maps}.

counties <- st_as_sf(map("county", plot = FALSE, fill = TRUE))
counties |>
  filter(str_detect(ID, "florida")) ->
counties
counties$area <- as.numeric(st_area(counties)) # area in square meters
head(counties)
Simple feature collection with 6 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -85.98951 ymin: 25.94926 xmax: -80.08804 ymax: 30.57303
Geodetic CRS:  +proj=longlat +ellps=clrk66 +no_defs +type=crs
                               ID                           geom       area
florida,alachua   florida,alachua MULTIPOLYGON (((-82.66062 2... 2498821972
florida,baker       florida,baker MULTIPOLYGON (((-82.04182 3... 1542442788
florida,bay           florida,bay MULTIPOLYGON (((-85.40509 3... 1946558057
florida,bradford florida,bradford MULTIPOLYGON (((-82.4257 29...  818885030
florida,brevard   florida,brevard MULTIPOLYGON (((-80.94747 2... 2189639534
florida,broward   florida,broward MULTIPOLYGON (((-80.89018 2... 3167310672

Create a choropleth map of the area of each county in Florida.

ggplot(data = world) +
  geom_sf() +
  geom_sf(data = states, fill = NA) +
  geom_text(
    data = states, aes(X, Y, label = ID), size = 5, fontface = "bold",
    nudge_y = states$nudge_y
  ) +
  geom_sf(data = counties, aes(fill = area)) +
  scale_fill_viridis_c(trans = "sqrt", alpha = .4) +
  coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE)

You can add cities if you know their latitude and longitude.

  • If you have a Google Maps API key, you can use googleway::google_geocode() to look them up, as sketched below.
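
A minimal geocoding sketch, assuming a valid Google Maps API key (the key string and address are placeholders):

library(googleway)
res <- google_geocode(address = "Sarasota, FL", key = "YOUR_API_KEY")
res$results$geometry$location # a data frame with lat and lng columns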

Add the five largest cities in Florida to a data frame and convert it to an sf object.

flcities <- tibble(
  state = rep("Florida", 5),
  city = c("Miami", "Tampa", "Orlando", "Jacksonville", "Sarasota"),
  lat = c(25.7616798, 27.950575, 28.5383355, 30.3321838, 27.3364347),
  lng = c(-80.1917902, -82.4571776, -81.3792365, -81.655651, -82.5306527)
)

flcities <- st_as_sf(flcities,
  coords = c("lng", "lat"), remove = FALSE,
  crs = 4326, agr = "constant"
)

ggplot(data = world) +
  geom_sf() +
  geom_sf(data = states, fill = NA) +
  geom_text(
    data = states, aes(X, Y, label = ID), size = 5, fontface = "bold",
    nudge_y = states$nudge_y
  ) +
  geom_sf(data = counties, aes(fill = area)) +
  scale_fill_viridis_c(trans = "sqrt", alpha = .4) +
  geom_sf(data = flcities) +
  geom_text(
    data = flcities, aes(x = lng, y = lat, label = city),
    size = 3.9, col = "black", fontface = "bold"
  ) +
  coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE)

Use the {ggrepel} package to fix the city name locations.

library("ggrepel")
ggplot(data = world) +
  geom_sf() +
  geom_sf(data = states, fill = NA) +
  geom_text(
    data = states, aes(X, Y, label = ID), size = 5, fontface = "bold",
    nudge_y = states$nudge_y
  ) +
  geom_sf(data = counties, aes(fill = area)) +
  scale_fill_viridis_c(trans = "sqrt", alpha = .4) +
  geom_text_repel(
    data = flcities, aes(x = lng, y = lat, label = city),
    fontface = "bold", 
    nudge_x = c(1, -1.5, 2, 2, -1), 
    nudge_y = c(0.25, -0.25, 0.5, 0.5, -0.5)) +
  coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE)

Combine the data all together with some extra formatting.

ggplot(data = world) +
  geom_sf(fill = "antiquewhite1") +
  geom_sf(data = counties, aes(fill = area)) +
  geom_sf(data = states, fill = NA) +
  geom_sf(data = sites, size = 4, shape = 23, fill = "darkred") +
  geom_sf(data = flcities) +
  geom_text_repel(
    data = flcities, aes(x = lng, y = lat, label = city),
    fontface = "bold", nudge_x = c(1, -1.5, 2, 2, -1), nudge_y = c(
      0.25,
      -0.25, 0.5, 0.5, -0.5
    )
  ) +
  geom_label(
    data = states, aes(X, Y, label = ID), size = 5, fontface = "bold",
    nudge_y = states$nudge_y
  ) +
  scale_fill_viridis_c(trans = "sqrt", alpha = .4) +
  annotation_scale(location = "bl", width_hint = 0.4) +
  annotation_north_arrow(
    location = "bl", which_north = "true",
    pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
    style = north_arrow_fancy_orienteering
  ) +
  coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) +
  xlab("Longitude") +
  ylab("Latitude") +
  ggtitle("Observation Sites", subtitle = "(2 sites in Palm Beach County, Florida)") +
  theme(panel.grid.major = element_line(
    color = gray(0.5), linetype = "dashed",
    linewidth = 0.5
  ), panel.background = element_rect(fill = "aliceblue"))

14.9 Mapping the US with the {usmap} Package

The {usmap} package allows you to create choropleth maps of the United States with Alaska and Hawaii “moved” so they fit in the bottom left-hand corner of the map.

  • It is difficult to create nice US choropleths that include Alaska and Hawaii given their distance from the other 48 states.
  • The {usmap} data frames have Alaska and Hawaii conveniently placed to the bottom left.

The {usmap} package has convenience functions for plotting choropleths and working with FIPS codes.

  • usmap::plot_usmap() returns a ggplot object, which means we can add {ggplot2} layers to the plot right out of the box.

Let’s load several libraries and the {usmap} data set for state populations.

library(tidyverse)
library(lubridate)
library(usmap)
data(statepop)
head(statepop)
# A tibble: 6 × 4
  fips  abbr  full       pop_2022
  <chr> <chr> <chr>         <dbl>
1 01    AL    Alabama     5074296
2 02    AK    Alaska       733583
3 04    AZ    Arizona     7359197
4 05    AR    Arkansas    3045637
5 06    CA    California 39029342
6 08    CO    Colorado    5839926

Note the column fips for Federal Information Processing System (FIPS) codes.

  • These are an older US government standard for using numbers to uniquely identify geographic areas.
  • The number of digits in a FIPS code varies depending on the level of geography.
    • State-level FIPS codes have two digits and county-level FIPS codes have five digits of which the first two are the FIPS code of the state to which the county belongs.
  • The US Census Bureau often provides FIPS codes with the data.
Important

Functions in {usmap} are built around the FIPS code identification system.
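
{usmap} also has helpers for moving between names, abbreviations, and FIPS codes; a quick sketch:

library(usmap)
fips("NJ")           # state abbreviation to a two-digit FIPS code: "34"
fips("NJ", "Bergen") # state plus county to a five-digit FIPS code: "34003"
fips_info("24031")   # look up the state and county for a FIPS code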

14.9.1 Create Maps of the Country, Regions and States

Let’s load some data saved from the Census Bureau at https://raw.githubusercontent.com/AU-datascience/data/main/413-613/state_geo_data.csv.

states <- read_csv("https://raw.githubusercontent.com/AU-datascience/data/main/413-613/state_geo_data.csv", col_types = cols(OID = col_character()))

Create an empty map with a colored background and labels for the states.

plot_usmap(regions = "states", labels = TRUE) +
  labs(
    title = "US States",
    subtitle = "This is a blank map of the states of the United States."
  ) +
  theme(panel.background = element_rect(color = "black", fill = "lightblue"))

Note the locations of Alaska and Hawaii are not geographically correct but they are convenient for display.

14.9.1.1 Create a map of the South Atlantic region

The package uses pre-defined Census regions, e.g., .south_atlantic.

plot_usmap(include = .south_atlantic, labels = TRUE) +
  labs(
    title = "US South Atlantic States",
    subtitle = "This is a blank map of the states in the South Atlantic Census Region"
  ) +
  theme(panel.background = element_rect(color = "black", fill = "lightblue"))

14.9.1.2 Create a map of the Counties

Map with just the borders.

plot_usmap(regions = "counties") +
  labs(
    title = "US Counties",
    subtitle = "This is a blank map of the counties of the United States."
  ) +
  theme(panel.background = element_rect(color = "black", fill = "lightblue"))

14.9.1.3 Create a map of the Counties in Specific States

Use include = with a character vector of state abbreviations.

plot_usmap("counties", include = c("VA", "MD", "DC")) +
  labs(
    title = "Custom DMV Area",
    subtitle = "These are the counties in MD, DC, and VA."
  ) +
  theme(panel.background = element_rect(color = "blue", fill = "lightblue"))

14.9.1.4 Create a Map of Specific Counties in a Custom Region

Select by FIPS (five-digit code).

  • You could read in the data from the referenced FIPS site to create a data frame you could search.
dmv_county_fips <- c(
  "11001", "24003", "24009", "24017", "24021", "24027", "24031",
  "24033", "51013", "51059", "51061", "51107", "51153", "51179",
  "51510", "51600", "51610", "51630", "51683", "51685"
)
plot_usmap("counties", include = dmv_county_fips, labels = TRUE) +
  labs(
    title = "Greater DMV Area",
    subtitle = "These are the Counties/Cities in the Greater DMV area."
  ) +
  theme(panel.background = element_rect(color = "blue", fill = "lightblue"))

14.9.2 Add COVID Data to Maps

14.9.2.1 Load COVID Data from Johns Hopkins

COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University

On March 10, 2023, the Johns Hopkins Coronavirus Resource Center ceased its collecting and reporting of global COVID-19 data.

Download the data at https://raw.githubusercontent.com/AU-datascience/data/main/413-613/JHU_covid_us_23_03_09_clean.rds using the following and glimpse the data.

df_all <- read_rds("https://raw.githubusercontent.com/AU-datascience/data/main/413-613/JHU_covid_us_23_03_09_clean.rds")
glimpse(df_all)
Rows: 7,365,492
Columns: 9
$ case_type          <fct> confirmed_US, confirmed_US, confirmed_US, confirmed…
$ cFIPS              <chr> "01001", "01001", "01001", "01001", "01001", "01001…
$ sFIPS              <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01…
$ County             <chr> "Autauga", "Autauga", "Autauga", "Autauga", "Autaug…
$ Province_State     <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabam…
$ Population         <dbl> 55869, 55869, 55869, 55869, 55869, 55869, 55869, 55…
$ Date               <date> 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 20…
$ Daily_Total        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ Daily_Total_percap <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

14.9.3 State Level Data

Total Cases by State.

df_all |>
  filter(Date == max(Date), Population > 0, !is.na(sFIPS)) |>
  group_by(sFIPS, case_type, Date) |>
  summarise(
    Current_Total = sum(Daily_Total),
    Current_Total_percap = sum(Daily_Total) / sum(Population),
    .groups = "drop"
  ) ->
state_totals


state_totals |>
  group_by(case_type) |>
  summarize(sumc = sum(Current_Total), n = n())
# A tibble: 2 × 3
  case_type         sumc     n
  <fct>            <dbl> <int>
1 confirmed_US 102548185    52
2 deaths_US      1102336    52

Add to US Maps.

state_totals |>
  filter(case_type == "confirmed_US") |>
  rename(fips = sFIPS) ->
state_cases

plot_usmap(data = state_cases, values = "Current_Total", color = "blue") +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Cases", label = scales::comma
  ) +
  labs(
    title = "US States",
    subtitle = paste0(
      "Total Cases by State as of ",
      max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "top")

state_totals |>
  filter(case_type == "deaths_US") |>
  rename(fips = sFIPS) ->
state_cases
plot_usmap(data = state_cases, values = "Current_Total", color = "blue") +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Deaths", label = scales::comma
  ) +
  labs(
    title = "US States",
    subtitle = paste0("Total Deaths by State as of ", max(state_cases$Date))
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "top")

Per Capita Maps.

state_totals |>
  filter(case_type == "confirmed_US") |>
  rename(fips = sFIPS) ->
state_cases

plot_usmap(
  data = state_cases, values = "Current_Total_percap",
  color = "blue"
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Cases per Capita",
    label = scales::comma
  ) +
  labs(
    title = "US States",
    subtitle = paste0(
      "Cases per Capita by State as of ",
      max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "right")

state_totals |>
  filter(case_type == "deaths_US") |>
  rename(fips = sFIPS) ->
state_cases

plot_usmap(
  data = state_cases, values = "Current_Total_percap",
  color = "blue"
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Deaths Per Capita",
    label = scales::comma
  ) +
  labs(
    title = "US States",
    subtitle = paste0("Deaths per Capita by State as of ",
      ... = max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "right")

Using a log10 Transformation.

state_totals |>
  filter(case_type == "deaths_US") |>
  mutate(
    Current_Total10 = log10(Current_Total),
    Current_Total_percap10 = log10(Current_Total_percap)
  ) |>
  rename(fips = sFIPS) ->
state_cases

plot_usmap(
  data = state_cases, values = "Current_Total_percap10",
  color = "blue"
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Total Deaths per Capita (Log 10)",
    label = scales::comma
  ) +
  labs(
    title = "US States",
    subtitle = paste0("Log(10) of Total Deaths per Capita by State as of ",
      ... = max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "right")

Use a Join to Create Per-Area Maps.

state_cases |>
  left_join(states, by = c("fips" = "STATE")) |>
  mutate(Current_Total_perarea = log10(Current_Total / AREALAND)) ->
state_casesa

plot_usmap(
  data = state_casesa, values = "Current_Total_perarea",
  color = "blue"
) +
  scale_fill_continuous(
    low = "white", high = "blue",
    name = "Deaths per unit Area (Log 10)",
    label = scales::comma
  ) +
  labs(
    title = "US States",
    subtitle = paste0("Deaths per unit area by State as of ",
      ... = max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "right")

14.9.4 County Level Data

Current Total Cases by County (using the latest date).

df_all |>
  filter(Date == max(Date), Population > 0, !is.na(cFIPS)) |>
  group_by(cFIPS, case_type, Date) |>
  summarise(
    Current_Total = sum(Daily_Total),
    Current_Total_percap = sum(Daily_Total) / sum(Population),
    .groups = "drop"
  ) ->
county_totals

Add to a County Map for New York and New Jersey.

county_totals |>
  filter(case_type == "confirmed_US") |>
  mutate(Current_Total_log2 = log2(Current_Total)) |>
  rename(fips = cFIPS) ->
county_cases

plot_usmap(
  data = county_cases, include = c("NY", "NJ"),
  values = "Current_Total", color = "blue"
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Confirmed Cases", label = scales::comma
  ) +
  labs(
    title = "US States",
    subtitle = paste0(
      "Total Cases by County as of ",
      max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "top")

Add to a County/City Map for DMV.

plot_usmap(
  data = county_cases, include = dmv_county_fips, labels = TRUE,
  values = "Current_Total", color = "blue"
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Cases", label = scales::comma
  ) +
  labs(
    title = "DMV Region",
    subtitle = paste0(
      "Cases by County/City as of ",
      max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "top")

Include region = "counties" to add the labels with labels = TRUE argument.

county_totals |>
  filter(case_type == "deaths_US") |>
  rename(fips = cFIPS) ->
county_deaths
plot_usmap(
  regions = "counties", data = county_deaths, include = dmv_county_fips,
  values = "Current_Total", color = "blue",
  labels = TRUE
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Deaths", label = scales::comma
  ) +
  labs(
    title = "DMV Region",
    subtitle = paste0(
      "Deaths by County/City as of ",
      max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "top")

Per Capita Cases.

plot_usmap(
  data = county_cases, include = dmv_county_fips,
  values = "Current_Total_percap", color = "blue"
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Cases Per Capita", label = scales::comma
  ) +
  labs(
    title = "DMV Region",
    subtitle = paste0(
      "Cases per Capita by County/City as of ",
      max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "top")

Per Capita Deaths in DMV.

county_totals |>
  filter(case_type == "deaths_US") |>
  rename(fips = cFIPS) ->
county_deaths
plot_usmap(
  data = county_deaths, include = dmv_county_fips,
  values = "Current_Total_percap", color = "blue",
  labels = TRUE
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "Deaths per Capita", label = scales::comma
  ) +
  labs(
    title = "DMV Region",
    subtitle = paste0(
      "Deaths per Capita by County/City as of ",
      max(state_cases$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "top")

14.9.5 Create Daily Change Data from Cumulative Totals

We need to convert the cumulative daily totals to daily differences.

  • This gets a bit complicated: corrections can reduce the cumulative totals on some days, so the maximum daily total may not occur on the latest date for some geographies.
  • We have to arrange the data by date within each geography before differencing.
df_all |>
  filter(!Daily_Total == 0) |>
  select(Province_State, County, cFIPS, Date, case_type, Daily_Total) |>
  pivot_wider(names_from = case_type, values_from = Daily_Total, values_fill = 0) |>
  rename("cases" = confirmed_US, "deaths" = deaths_US, "fips" = cFIPS) |>
  arrange(Province_State, County, Date) |>
  group_by(Province_State, County) |>
  mutate(
    first_date = min(Date),
    first_cases = cases[Date == first_date],
    first_deaths = deaths[Date == first_date],
    daily_cases = c(min(first_cases), diff(cases)),
    daily_deaths = c(min(first_deaths), diff(deaths))
  ) |>
  select(Date, County, Province_State, fips, cases, daily_cases, deaths, daily_deaths) ->
df_daily_data
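
As a quick sanity check on the reconstruction (a sketch; "11001" is Washington, DC from the DMV FIPS list used below), the cumulative sum of the daily differences within a single county should reproduce the running totals:

df_daily_data |>
  filter(fips == "11001") |>
  mutate(rebuilt_cases = cumsum(daily_cases)) |>
  summarize(all_match = all(rebuilt_cases == cases))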

Map the maximum case count for each county at any time.

dmv_county_fips <- c(
  "11001", "24003", "24009", "24017", "24021", "24027", "24031",
  "24033", "51013", "51059", "51061", "51107", "51153", "51179",
  "51510", "51600", "51610", "51630", "51683", "51685"
)
df_daily_data |>
  group_by(Province_State, County, fips) |>
  filter(fips %in% dmv_county_fips)  |>
  summarize(max_cases = max(daily_cases), .groups = "drop") ->
max_cases_df

plot_usmap(
  data = max_cases_df, include = dmv_county_fips,
  values = "max_cases", color = "blue"
) +
  scale_fill_continuous(
    low = "white", high = "red",
    name = "max_cases", label = scales::comma
  ) +
  labs(
    title = "DMV Region",
    subtitle = paste0(
      "Total Deaths per Capita by County/City as of ",
      max(df_daily_data$Date)
    )
  ) +
  theme(panel.background = element_rect(color = "black", fill = "white")) +
  theme(legend.position = "top")