| Title: | Data on the States and Counties of the United States | 
| Version: | 0.3.1 | 
| Description: | Demographic data on the United States at the county and state levels spanning multiple years. | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.3.1 | 
| URL: | https://github.com/OpenIntroStat/usdata, https://openintrostat.github.io/usdata/ | 
| BugReports: | https://github.com/OpenIntroStat/usdata/issues | 
| Suggests: | dplyr, ggplot2, maps, lubridate, sf, testthat | 
| Imports: | tibble | 
| Depends: | R (≥ 2.10) | 
| NeedsCompilation: | no | 
| Packaged: | 2024-06-02 01:19:18 UTC; mine | 
| Author: | Mine Çetinkaya-Rundel | 
| Maintainer: | Mine Çetinkaya-Rundel <cetinkaya.mine@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-06-02 09:40:02 UTC | 
usdata: Data on the States and Counties of the United States
Description
 
Demographic data on the United States at the county and state levels spanning multiple years.
Author(s)
Maintainer: Mine Çetinkaya-Rundel cetinkaya.mine@gmail.com (ORCID)
Authors:
- David Diez david@openintro.org 
- Leah Dorazio leah.dorazio@sfuhs.org 
See Also
Useful links:
- Report bugs at https://github.com/OpenIntroStat/usdata/issues 
Convert state abbreviations to names
Description
Two utility functions: state2abbr() converts state names to state abbreviations, and abbr2state() does the opposite.
Usage
abbr2state(abbr)
Arguments
| abbr | A vector of state abbreviations. |
Value
Returns a vector of the same length with the corresponding state names or abbreviations.
Author(s)
David Diez
See Also
state2abbr, county, county_complete
Examples
abbr2state("MN")
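abbr2state() maps abbreviations to names element by element and returns a vector of the same length. The sketch below is a hypothetical base-R equivalent for illustration, not the usdata implementation: abbr_to_name() and name_to_abbr() are made-up names, and the lookup assumes R's built-in state.abb and state.name vectors (datasets package), which list only the 50 states, so inputs such as "DC" return NA here.

```r
# Hypothetical base-R sketch (not the usdata implementation).
# Assumes R's built-in state.abb and state.name vectors (datasets package),
# which list the 50 states in the same order.
abbr_to_name <- function(abbr) {
  # match() is vectorized; unknown abbreviations give NA
  state.name[match(toupper(abbr), state.abb)]
}

name_to_abbr <- function(name) {
  # Case-insensitive lookup on the full state name
  state.abb[match(tolower(name), tolower(state.name))]
}

abbr_to_name(c("MN", "ca", "DC"))
name_to_abbr("Minnesota")
```

Because match() works on whole vectors, the same approach applies directly to a column of state names; any name outside the 50 states comes back as NA.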
Airline Delays for December 2019 and 2020.
Description
Summary counts of arriving flights and delays for each carrier at each US airport.
Usage
airline_delay
Format
A data frame with 3351 rows and 21 variables.
- year
- Year the data were collected.
- month
- Numeric representation of the month.
- carrier
- Carrier code.
- carrier_name
- Carrier name.
- airport
- Airport code.
- airport_name
- Name of airport.
- arr_flights
- Number of flights arriving at the airport.
- arr_del15
- Number of flights arriving more than 15 minutes late.
- carrier_ct
- Number of flights delayed due to the air carrier (e.g., no crew).
- weather_ct
- Number of flights delayed due to weather.
- nas_ct
- Number of flights delayed due to the National Aviation System (e.g., heavy air traffic).
- security_ct
- Number of flights delayed due to a security breach.
- late_aircraft_ct
- Number of flights delayed because a previous flight using the same aircraft arrived late.
- arr_cancelled
- Number of cancelled flights.
- arr_diverted
- Number of diverted flights.
- arr_delay
- Total arrival delay (minutes) of delayed flights.
- carrier_delay
- Total delay (minutes) due to the air carrier.
- weather_delay
- Total delay (minutes) due to inclement weather.
- nas_delay
- Total delay (minutes) due to the National Aviation System.
- security_delay
- Total delay (minutes) due to a security issue.
- late_aircraft_delay
- Total delay (minutes) caused by a previous flight on the same aircraft arriving late.
Source
Bureau of Transportation Statistics
Examples
library(ggplot2)
ggplot(airline_delay, aes(arr_flights, arr_del15, color = as.factor(year))) +
  geom_point(alpha = 0.3) +
  labs(
    x = "Total Number of inbound flights",
    y = "Number of flights delayed by more than 15 mins",
    title = "Inbound vs delayed flights by year",
    color = "Year"
  )
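The plot compares arr_del15 with arr_flights airport by airport. A natural numeric summary is the share of arriving flights delayed by more than 15 minutes for each carrier. The helper below is a hypothetical base-R sketch (delay_share_by_carrier() is not part of usdata); it assumes only the carrier_name, arr_flights, and arr_del15 columns documented above, and it pools both years.

```r
# Hypothetical helper (not part of usdata): share of arriving flights delayed
# by more than 15 minutes, per carrier, pooled over all airports and years.
delay_share_by_carrier <- function(df) {
  # Keep rows where both counts are present so numerator and denominator match
  ok <- !is.na(df$arr_flights) & !is.na(df$arr_del15)
  df <- df[ok, ]
  flights <- tapply(df$arr_flights, df$carrier_name, sum)
  delayed <- tapply(df$arr_del15, df$carrier_name, sum)
  share <- delayed / flights
  # Convert the 1-d array from tapply() to a plain named vector, then sort
  share <- setNames(as.vector(share), names(share))
  sort(share, decreasing = TRUE)
}

if (requireNamespace("usdata", quietly = TRUE)) {
  print(head(delay_share_by_carrier(usdata::airline_delay)))
}
```

Dividing summed counts, rather than averaging per-airport rates, weights each airport by its traffic.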
United States Counties
Description
Data for 3142 counties in the United States. See the
county_complete data set for additional variables.
Usage
county
Format
A data frame with 3142 observations on the following 15 variables.
- name
- County names. 
- state
- State names. 
- pop2000
- Population in 2000. 
- pop2010
- Population in 2010. 
- pop2017
- Population in 2017. 
- pop_change
- Population change from 2010 to 2017. 
- poverty
- Percent of population in poverty in 2017. 
- homeownership
- Home ownership rate, 2006-2010. 
- multi_unit
- Percent of housing units in multi-unit structures, 2006-2010. 
- unemployment_rate
- Unemployment rate in 2017. 
- metro
- Whether the county contains a metropolitan area. 
- median_edu
- Median education level (2013-2017). 
- per_capita_income
- Per capita (per person) income (2013-2017). 
- median_hh_income
- Median household income. 
- smoking_ban
- Type of county-level smoking ban in place in 2010, taking one of the values "none", "partial", or "comprehensive".
Source
These data were collected from Census Quick Facts (no longer available as of 2020) and its accompanying pages. Smoking ban data were from a variety of sources.
See Also
Examples
library(ggplot2)
ggplot(county, aes(x = median_edu, y = median_hh_income)) +
  geom_boxplot()
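The boxplot shows how median_hh_income varies across median_edu levels. A numeric companion is the median of median_hh_income within each level. The helper below is a hypothetical sketch (income_by_education() is not part of usdata); note that it takes a median of county-level values, so it is not a population-weighted national figure.

```r
# Hypothetical helper: median of county-level median_hh_income within each
# median_edu category (a median of county values, not population-weighted).
income_by_education <- function(df) {
  # Drop counties missing either value before grouping
  ok <- !is.na(df$median_edu) & !is.na(df$median_hh_income)
  res <- tapply(df$median_hh_income[ok], df$median_edu[ok], median)
  # Return a plain named vector; factor levels with no counties come back as NA
  setNames(as.vector(res), names(res))
}

if (requireNamespace("usdata", quietly = TRUE)) {
  print(income_by_education(usdata::county))
}
```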
American Community Survey 2019
Description
Data for 3142 counties in the United States, with many variables from the 2019 American Community Survey.
Usage
county_2019
Format
A data frame with 3142 observations on the following 95 variables.
- state
- State. 
- name
- County name. 
- fips
- FIPS code. 
- median_individual_income
- Median individual income (2019). 
- median_individual_income_moe
- Margin of error for median_individual_income.
- pop
- 2019 population. 
- pop_moe
- Margin of error for pop.
- white
- Percent of population that is white alone (2015-2019). 
- white_moe
- Margin of error for white.
- black
- Percent of population that is black alone (2015-2019). 
- black_moe
- Margin of error for black.
- native
- Percent of population that is Native American alone (2015-2019). 
- native_moe
- Margin of error for native.
- asian
- Percent of population that is Asian alone (2015-2019). 
- asian_moe
- Margin of error for asian.
- pac_isl
- Percent of population that is Native Hawaiian or other Pacific Islander alone (2015-2019). 
- pac_isl_moe
- Margin of error for pac_isl.
- other_single_race
- Percent of population that is some other race alone (2015-2019). 
- other_single_race_moe
- Margin of error for other_single_race.
- two_plus_races
- Percent of population that is two or more races (2015-2019). 
- two_plus_races_moe
- Margin of error for two_plus_races.
- hispanic
- Percent of population that identifies as Hispanic or Latino (2015-2019). 
- hispanic_moe
- Margin of error for hispanic.
- white_not_hispanic
- Percent of population that is white alone, not Hispanic or Latino (2015-2019). 
- white_not_hispanic_moe
- Margin of error for white_not_hispanic.
- median_age
- Median age (2015-2019). 
- median_age_moe
- Margin of error for median_age.
- age_under_5
- Percent of population under 5 (2015-2019). 
- age_under_5_moe
- Margin of error for age_under_5.
- age_over_85
- Percent of population 85 and over (2015-2019). 
- age_over_85_moe
- Margin of error for age_over_85.
- age_over_18
- Percent of population 18 and over (2015-2019). 
- age_over_18_moe
- Margin of error for age_over_18.
- age_over_65
- Percent of population 65 and over (2015-2019). 
- age_over_65_moe
- Margin of error for age_over_65.
- mean_work_travel
- Mean travel time to work (2015-2019). 
- mean_work_travel_moe
- Margin of error for mean_work_travel.
- persons_per_household
- Persons per household (2015-2019).
- persons_per_household_moe
- Margin of error for persons_per_household.
- avg_family_size
- Average family size (2015-2019). 
- avg_family_size_moe
- Margin of error for avg_family_size.
- housing_one_unit_structures
- Percent of housing units in 1-unit structures (2015-2019). 
- housing_one_unit_structures_moe
- Margin of error for housing_one_unit_structures.
- housing_two_unit_structures
- Percent of housing units in multi-unit structures (2015-2019). 
- housing_two_unit_structures_moe
- Margin of error for housing_two_unit_structures.
- housing_mobile_homes
- Percent of housing units in mobile homes and other types of units (2015-2019). 
- housing_mobile_homes_moe
- Margin of error for housing_mobile_homes.
- median_individual_income_age_25plus
- Median individual income of the population 25 and older (2019 dollars, 2015-2019).
- median_individual_income_age_25plus_moe
- Margin of error for median_individual_income_age_25plus.
- hs_grad
- Percent of population 25 and older that is a high school graduate (2015-2019). 
- hs_grad_moe
- Margin of error for hs_grad.
- bachelors
- Percent of population 25 and older that earned a Bachelor's degree or higher (2015-2019). 
- bachelors_moe
- Margin of error for bachelors.
- households
- Total households (2015-2019). 
- households_moe
- Margin of error for households.
- households_speak_spanish
- Percent of households speaking Spanish (2015-2019). 
- households_speak_spanish_moe
- Margin of error for households_speak_spanish.
- households_speak_other_indo_euro_lang
- Percent of households speaking another Indo-European language (2015-2019).
- households_speak_other_indo_euro_lang_moe
- Margin of error for households_speak_other_indo_euro_lang.
- households_speak_asian_or_pac_isl
- Percent of households speaking an Asian or Pacific Island language (2015-2019).
- households_speak_asian_or_pac_isl_moe
- Margin of error for households_speak_asian_or_pac_isl.
- households_speak_other
- Percent of households speaking a language outside the English, Spanish, other Indo-European, and Asian/Pacific Island groups (2015-2019).
- households_speak_other_moe
- Margin of error for households_speak_other.
- households_speak_limited_english
- Percent of limited English-speaking households (2015-2019). 
- households_speak_limited_english_moe
- Margin of error for households_speak_limited_english.
- poverty
- Percent of population below the poverty level (2015-2019). 
- poverty_moe
- Margin of error for poverty.
- poverty_under_18
- Percent of population under 18 below the poverty level (2015-2019). 
- poverty_under_18_moe
- Margin of error for poverty_under_18.
- poverty_65_and_over
- Percent of population 65 and over below the poverty level (2015-2019). 
- poverty_65_and_over_moe
- Margin of error for poverty_65_and_over.
- mean_household_income
- Mean household income (2019 dollars, 2015-2019). 
- mean_household_income_moe
- Margin of error for mean_household_income.
- per_capita_income
- Per capita money income in past 12 months (2019 dollars, 2015-2019). 
- per_capita_income_moe
- Margin of error for per_capita_income.
- median_household_income
- Median household income (2015-2019). 
- median_household_income_moe
- Margin of error for median_household_income.
- veterans
- Percent of the civilian population 18 and over that are veterans (2015-2019).
- veterans_moe
- Margin of error for veterans.
- unemployment_rate
- Unemployment rate among those ages 20-64 (2015-2019). 
- unemployment_rate_moe
- Margin of error for unemployment_rate.
- uninsured
- Percent of civilian noninstitutionalized population that is uninsured (2015-2019). 
- uninsured_moe
- Margin of error for uninsured.
- uninsured_under_6
- Percent of population under 6 years that is uninsured (2015-2019). 
- uninsured_under_6_moe
- Margin of error for uninsured_under_6.
- uninsured_under_19
- Percent of population under 19 that is uninsured (2015-2019). 
- uninsured_under_19_moe
- Margin of error for uninsured_under_19.
- uninsured_65_and_older
- Percent of population 65 and older that is uninsured (2015-2019). 
- uninsured_65_and_older_moe
- Margin of error for uninsured_65_and_older.
- household_has_computer
- Percent of households that have a desktop or laptop computer (2015-2019).
- household_has_computer_moe
- Margin of error for household_has_computer.
- household_has_smartphone
- Percent of households that have a smartphone (2015-2019).
- household_has_smartphone_moe
- Margin of error for household_has_smartphone.
- household_has_broadband
- Percent of households that have a broadband internet subscription (2015-2019).
- household_has_broadband_moe
- Margin of error for household_has_broadband.
Source
The data were downloaded via the tidycensus R package.
See Also
Examples
library(ggplot2)
ggplot(
  county_2019,
  aes(
    x = hs_grad, y = median_individual_income,
    size = sqrt(pop) / 1000
  )
) +
  geom_point(alpha = 0.5) +
  guides(size = "none") +
  labs(
    x = "Percentage of population graduated from high school",
    y = "Median individual income"
  )
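Each estimate in county_2019 has a matching _moe column. Assuming these follow the Census Bureau convention of publishing ACS margins of error at the 90 percent confidence level (also the default in tidycensus), the standard error is roughly moe / 1.645, and an interval at another level can be obtained by rescaling. The helper below is a hypothetical sketch (acs_interval() is not part of usdata) that treats the sampling distribution as approximately normal.

```r
# Hypothetical helper: interval for an ACS estimate from its margin of error.
# Assumption: moe is a 90% margin of error, so se = moe / qnorm(0.95).
acs_interval <- function(estimate, moe, level = 0.90) {
  se <- moe / qnorm(0.95)           # qnorm(0.95) is about 1.645
  z <- qnorm(1 - (1 - level) / 2)   # two-sided critical value for `level`
  data.frame(
    estimate = estimate,
    lower = estimate - z * se,
    upper = estimate + z * se
  )
}

acs_interval(estimate = 85, moe = 2)                # 90%: 83 to 87
acs_interval(estimate = 85, moe = 2, level = 0.95)  # wider 95% interval

if (requireNamespace("usdata", quietly = TRUE)) {
  print(head(with(usdata::county_2019, acs_interval(hs_grad, hs_grad_moe))))
}
```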
United States Counties
Description
Data for 3142 counties in the United States.
Usage
county_complete
Format
A data frame with 3142 observations on the following 188 variables.
- state
- State. 
- name
- County name. 
- fips
- FIPS code. 
- pop2000
- 2000 population. 
- pop2010
- 2010 population. 
- pop2011
- 2011 population. 
- pop2012
- 2012 population. 
- pop2013
- 2013 population. 
- pop2014
- 2014 population. 
- pop2015
- 2015 population. 
- pop2016
- 2016 population. 
- pop2017
- 2017 population. 
- age_under_5_2010
- Percent of population under 5 (2010). 
- age_under_5_2017
- Percent of population under 5 (2017). 
- age_under_18_2010
- Percent of population under 18 (2010). 
- age_over_65_2010
- Percent of population over 65 (2010). 
- age_over_65_2017
- Percent of population over 65 (2017). 
- median_age_2017
- Median age (2017). 
- female_2010
- Percent of population that is female (2010). 
- white_2010
- Percent of population that is white (2010). 
- black_2010
- Percent of population that is black (2010). 
- black_2017
- Percent of population that is black (2017). 
- native_2010
- Percent of population that is Native American (2010).
- native_2017
- Percent of population that is Native American (2017).
- asian_2010
- Percent of population that is Asian (2010).
- asian_2017
- Percent of population that is Asian (2017).
- pac_isl_2010
- Percent of population that is Native Hawaiian or other Pacific Islander (2010).
- pac_isl_2017
- Percent of population that is Native Hawaiian or other Pacific Islander (2017).
- other_single_race_2017
- Percent of population that identifies as another single race (2017). 
- two_plus_races_2010
- Percent of population that identifies as two or more races (2010). 
- two_plus_races_2017
- Percent of population that identifies as two or more races (2017). 
- hispanic_2010
- Percent of population that is Hispanic (2010). 
- hispanic_2017
- Percent of population that is Hispanic (2017). 
- white_not_hispanic_2010
- Percent of population that is white and not Hispanic (2010). 
- white_not_hispanic_2017
- Percent of population that is white and not Hispanic (2017). 
- speak_english_only_2017
- Percent of population that speaks English only (2017). 
- no_move_in_one_plus_year_2010
- Percent of population that has not moved in at least one year (2006-2010). 
- foreign_born_2010
- Percent of population that is foreign-born (2006-2010). 
- foreign_spoken_at_home_2010
- Percent of population that speaks a foreign language at home (2006-2010). 
- women_16_to_50_birth_rate_2017
- Birth rate for women ages 16 to 50 (2017). 
- hs_grad_2010
- Percent of population that is a high school graduate (2006-2010). 
- hs_grad_2016
- Percent of population that is a high school graduate (2012-2016). 
- hs_grad_2017
- Percent of population that is a high school graduate (2017). 
- some_college_2016
- Percent of population with some college education (2012-2016). 
- some_college_2017
- Percent of population with some college education (2017). 
- bachelors_2010
- Percent of population that earned a bachelor's degree (2006-2010). 
- bachelors_2016
- Percent of population that earned a bachelor's degree (2012-2016). 
- bachelors_2017
- Percent of population that earned a bachelor's degree (2017). 
- veterans_2010
- Percent of population that are veterans (2006-2010). 
- veterans_2017
- Percent of population that are veterans (2017). 
- mean_work_travel_2010
- Mean travel time to work (2006-2010). 
- mean_work_travel_2017
- Mean travel time to work (2017). 
- broadband_2017
- Percent of population who has access to broadband (2017). 
- computer_2017
- Percent of population who has access to a computer (2017). 
- housing_units_2010
- Number of housing units (2010). 
- homeownership_2010
- Home ownership rate (2006-2010). 
- housing_multi_unit_2010
- Housing units in multi-unit structures (2006-2010). 
- median_val_owner_occupied_2010
- Median value of owner-occupied housing units (2006-2010). 
- households_2010
- Households (2006-2010). 
- households_2017
- Households (2017). 
- persons_per_household_2010
- Persons per household (2006-2010). 
- persons_per_household_2017
- Persons per household (2017). 
- per_capita_income_2010
- Per capita money income in past 12 months (2010 dollars, 2006-2010).
- per_capita_income_2017
- Per capita money income in past 12 months (2017 dollars, 2017).
- metro_2013
- Whether the county contained a metropolitan area in 2013. 
- median_household_income_2010
- Median household income (2006-2010). 
- median_household_income_2016
- Median household income (2012-2016). 
- median_household_income_2017
- Median household income (2017). 
- private_nonfarm_establishments_2009
- Private nonfarm establishments (2009). 
- private_nonfarm_employment_2009
- Private nonfarm employment (2009). 
- percent_change_private_nonfarm_employment_2009
- Private nonfarm employment, percent change from 2000 to 2009. 
- nonemployment_establishments_2009
- Nonemployer establishments (2009). 
- firms_2007
- Total number of firms (2007). 
- black_owned_firms_2007
- Black-owned firms, percent (2007). 
- native_owned_firms_2007
- Native American-owned firms, percent (2007). 
- asian_owned_firms_2007
- Asian-owned firms, percent (2007). 
- pac_isl_owned_firms_2007
- Native Hawaiian and other Pacific Islander-owned firms, percent (2007). 
- hispanic_owned_firms_2007
- Hispanic-owned firms, percent (2007). 
- women_owned_firms_2007
- Women-owned firms, percent (2007). 
- manufacturer_shipments_2007
- Manufacturer shipments, 2007 ($1000). 
- mercent_whole_sales_2007
- Merchant wholesaler sales, 2007 ($1000).
- sales_2007
- Retail sales, 2007 ($1000). 
- sales_per_capita_2007
- Retail sales per capita, 2007. 
- accommodation_food_service_2007
- Accommodation and food services sales, 2007 ($1000). 
- building_permits_2010
- Building permits (2010). 
- fed_spending_2009
- Federal spending, in thousands of dollars (2009). 
- area_2010
- Land area in square miles (2010). 
- density_2010
- Persons per square mile (2010). 
- smoking_ban_2010
- Type of county-level smoking ban in place in 2010, taking one of the values "none", "partial", or "comprehensive".
- poverty_2010
- Percent of population below poverty level (2006-2010). 
- poverty_2016
- Percent of population below poverty level (2012-2016). 
- poverty_2017
- Percent of population below poverty level (2017). 
- poverty_age_under_5_2017
- Percent of population under age 5 below poverty level (2017). 
- poverty_age_under_18_2017
- Percent of population under age 18 below poverty level (2017). 
- civilian_labor_force_2007
- Civilian labor force in 2007. 
- employed_2007
- Number of civilians employed in 2007. 
- unemployed_2007
- Number of civilians unemployed in 2007. 
- unemployment_rate_2007
- Unemployment rate in 2007. 
- civilian_labor_force_2008
- Civilian labor force in 2008. 
- employed_2008
- Number of civilians employed in 2008. 
- unemployed_2008
- Number of civilians unemployed in 2008. 
- unemployment_rate_2008
- Unemployment rate in 2008. 
- civilian_labor_force_2009
- Civilian labor force in 2009. 
- employed_2009
- Number of civilians employed in 2009. 
- unemployed_2009
- Number of civilians unemployed in 2009. 
- unemployment_rate_2009
- Unemployment rate in 2009. 
- civilian_labor_force_2010
- Civilian labor force in 2010. 
- employed_2010
- Number of civilians employed in 2010. 
- unemployed_2010
- Number of civilians unemployed in 2010. 
- unemployment_rate_2010
- Unemployment rate in 2010. 
- civilian_labor_force_2011
- Civilian labor force in 2011. 
- employed_2011
- Number of civilians employed in 2011. 
- unemployed_2011
- Number of civilians unemployed in 2011. 
- unemployment_rate_2011
- Unemployment rate in 2011. 
- civilian_labor_force_2012
- Civilian labor force in 2012. 
- employed_2012
- Number of civilians employed in 2012. 
- unemployed_2012
- Number of civilians unemployed in 2012. 
- unemployment_rate_2012
- Unemployment rate in 2012. 
- civilian_labor_force_2013
- Civilian labor force in 2013. 
- employed_2013
- Number of civilians employed in 2013. 
- unemployed_2013
- Number of civilians unemployed in 2013. 
- unemployment_rate_2013
- Unemployment rate in 2013. 
- civilian_labor_force_2014
- Civilian labor force in 2014. 
- employed_2014
- Number of civilians employed in 2014. 
- unemployed_2014
- Number of civilians unemployed in 2014. 
- unemployment_rate_2014
- Unemployment rate in 2014. 
- civilian_labor_force_2015
- Civilian labor force in 2015. 
- employed_2015
- Number of civilians employed in 2015. 
- unemployed_2015
- Number of civilians unemployed in 2015. 
- unemployment_rate_2015
- Unemployment rate in 2015. 
- civilian_labor_force_2016
- Civilian labor force in 2016. 
- employed_2016
- Number of civilians employed in 2016. 
- unemployed_2016
- Number of civilians unemployed in 2016. 
- unemployment_rate_2016
- Unemployment rate in 2016. 
- uninsured_2017
- Percent of population who are uninsured (2017). 
- uninsured_age_under_6_2017
- Percent of population under 6 who are uninsured (2017). 
- uninsured_age_under_19_2017
- Percent of population under 19 who are uninsured (2017). 
- uninsured_age_over_74_2017
- Percent of population over 74 who are uninsured (2017).
- civilian_labor_force_2017
- Civilian labor force in 2017. 
- employed_2017
- Number of civilians employed in 2017. 
- unemployed_2017
- Number of civilians unemployed in 2017. 
- unemployment_rate_2017
- Unemployment rate in 2017. 
- median_individual_income_2019
- Median individual income (2019). 
- pop_2019
- 2019 population. 
- white_2019
- Percent of population that is white alone (2015-2019). 
- black_2019
- Percent of population that is black alone (2015-2019). 
- native_2019
- Percent of population that is Native American alone (2015-2019). 
- asian_2019
- Percent of population that is Asian alone (2015-2019). 
- pac_isl_2019
- Percent of population that is Native Hawaiian or other Pacific Islander alone (2015-2019). 
- other_single_race_2019
- Percent of population that is some other race alone (2015-2019). 
- two_plus_races_2019
- Percent of population that is two or more races (2015-2019). 
- hispanic_2019
- Percent of population that identifies as Hispanic or Latino (2015-2019). 
- white_not_hispanic_2019
- Percent of population that is white alone, not Hispanic or Latino (2015-2019). 
- median_age_2019
- Median age (2015-2019). 
- age_under_5_2019
- Percent of population under 5 (2015-2019). 
- age_over_85_2019
- Percent of population 85 and over (2015-2019). 
- age_over_18_2019
- Percent of population 18 and over (2015-2019). 
- age_over_65_2019
- Percent of population 65 and over (2015-2019). 
- mean_work_travel_2019
- Mean travel time to work (2015-2019). 
- persons_per_household_2019
- Persons per household (2015-2019).
- avg_family_size_2019
- Average family size (2015-2019). 
- housing_one_unit_structures_2019
- Percent of housing units in 1-unit structures (2015-2019). 
- housing_two_unit_structures_2019
- Percent of housing units in multi-unit structures (2015-2019). 
- housing_mobile_homes_2019
- Percent of housing units in mobile homes and other types of units (2015-2019). 
- median_individual_income_age_25plus_2019
- Median individual income of the population 25 and older (2019 dollars, 2015-2019).
- hs_grad_2019
- Percent of population 25 and older that is a high school graduate (2015-2019). 
- bachelors_2019
- Percent of population 25 and older that earned a Bachelor's degree or higher (2015-2019). 
- households_2019
- Total households (2015-2019). 
- households_speak_spanish_2019
- Percent of households speaking Spanish (2015-2019). 
- households_speak_other_indo_euro_lang_2019
- Percent of households speaking another Indo-European language (2015-2019).
- households_speak_asian_or_pac_isl_2019
- Percent of households speaking an Asian or Pacific Island language (2015-2019).
- households_speak_other_2019
- Percent of households speaking a language outside the English, Spanish, other Indo-European, and Asian/Pacific Island groups (2015-2019).
- households_speak_limited_english_2019
- Percent of limited English-speaking households (2015-2019). 
- poverty_2019
- Percent of population below the poverty level (2015-2019). 
- poverty_under_18_2019
- Percent of population under 18 below the poverty level (2015-2019). 
- poverty_65_and_over_2019
- Percent of population 65 and over below the poverty level (2015-2019). 
- mean_household_income_2019
- Mean household income (2019 dollars, 2015-2019). 
- per_capita_income_2019
- Per capita money income in past 12 months (2019 dollars, 2015-2019). 
- median_household_income_2019
- Median household income (2015-2019). 
- veterans_2019
- Percent of the civilian population 18 and over that are veterans (2015-2019).
- unemployment_rate_2019
- Unemployment rate among those ages 20-64 (2015-2019). 
- uninsured_2019
- Percent of civilian noninstitutionalized population that is uninsured (2015-2019). 
- uninsured_under_6_2019
- Percent of population under 6 years that is uninsured (2015-2019). 
- uninsured_under_19_2019
- Percent of population under 19 that is uninsured (2015-2019). 
- uninsured_65_and_older_2019
- Percent of population 65 and older that is uninsured (2015-2019). 
- household_has_computer_2019
- Percent of households that have a desktop or laptop computer (2015-2019).
- household_has_smartphone_2019
- Percent of households that have a smartphone (2015-2019).
- household_has_broadband_2019
- Percent of households that have a broadband internet subscription (2015-2019).
Source
The data prior to 2011 were from http://census.gov, though the exact page they came from is no longer available.
More recent data come from the following sources.
- Downloaded via the tidycensus R package.
- Download links for spreadsheets were found on https://www.ers.usda.gov/data-products/county-level-data-sets/download-data 
- Unemployment - Bureau of Labor Statistics - LAUS data - https://www.bls.gov/lau/. 
- Median Household Income - Census Bureau - Small Area Income and Poverty Estimates (SAIPE) data. 
- The original data table was prepared by USDA, Economic Research Service. 
- Census Bureau. 
- 2012-16 American Community Survey 5-yr average. 
- The original data table was prepared by USDA, Economic Research Service. 
- Tim Parker (tparker at ers.usda.gov) is the contact for much of the new data incorporated into this data set. 
See Also
Examples
library(dplyr)
library(ggplot2)
county_complete |>
  mutate(
    pop_change = 100 * ((pop2017 / pop2013) - 1),
    metro_area = if_else(metro_2013 == 1, TRUE, FALSE)
  ) |>
  ggplot(aes(
    x = poverty_2016,
    y = pop_change,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Percentage of population in poverty (2016)",
    y = "Percentage population change from 2013 to 2017",
    color = "Metropolitan area",
    title = "Population change and poverty"
  )
# Counties with high population change
county_complete |>
  mutate(pop_change = 100 * ((pop2017 / pop2013) - 1)) |>
  filter(pop_change < -10 | pop_change > 25) |>
  select(state, name, fips, pop_change)
# Population by metro area
county_complete |>
  mutate(metro_area = if_else(metro_2013 == 1, TRUE, FALSE)) |>
  filter(!is.na(metro_area)) |>
  ggplot(aes(x = metro_area, y = log(pop2017))) +
  geom_violin() +
  labs(
    x = "Metro area",
    y = "Log of population in 2017",
    title = "Population by metro area"
  )
# Poverty and median household income
county_complete |>
  mutate(metro_area = if_else(metro_2013 == 1, TRUE, FALSE)) |>
  ggplot(aes(
    x = poverty_2016,
    y = median_household_income_2016,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Percentage of population in poverty (2016)",
    y = "Median household income (2016)",
    color = "Metropolitan area",
    title = "Poverty and median household income"
  )
# Unemployment rate and poverty
county_complete |>
  mutate(metro_area = if_else(metro_2013 == 1, TRUE, FALSE)) |>
  ggplot(aes(
    x = unemployment_rate_2017,
    y = poverty_2016,
    color = metro_area,
    size = sqrt(pop2017) / 1e3
  )) +
  geom_point(alpha = 0.5) +
  scale_color_discrete(na.translate = FALSE) +
  guides(size = "none") +
  labs(
    x = "Unemployment rate (2017)",
    y = "Percentage of population in poverty (2016)",
    color = "Metropolitan area",
    title = "Unemployment rate and poverty"
  )
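The examples above compute the 2013 to 2017 percent population change inline. The hypothetical helpers below (not part of usdata) generalize that expression to any pair of popYYYY columns and recompute a year's unemployment rate from the unemployed_YYYY and civilian_labor_force_YYYY counts. Two assumptions: the popYYYY naming holds only for pop2000 through pop2017 (the 2019 column is pop_2019), and unemployment_rate_YYYY is taken to be in percent, which the variable list does not state.

```r
# Hypothetical helpers (not part of usdata).
# Percent population change between two years, generalizing
# 100 * ((pop2017 / pop2013) - 1) from the examples above.
pop_pct_change <- function(df, from, to) {
  100 * (df[[paste0("pop", to)]] / df[[paste0("pop", from)]] - 1)
}

# Unemployment rate (percent) recomputed from the raw counts for one year
recompute_unemployment <- function(df, year) {
  unemployed <- df[[paste0("unemployed_", year)]]
  labor_force <- df[[paste0("civilian_labor_force_", year)]]
  100 * unemployed / labor_force
}

if (requireNamespace("usdata", quietly = TRUE)) {
  cc <- usdata::county_complete
  print(summary(pop_pct_change(cc, 2010, 2017)))
  # Differences should be near zero if unemployment_rate_2010 is in percent
  print(summary(recompute_unemployment(cc, 2010) - cc$unemployment_rate_2010))
}
```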
Fatal Police Shootings data.
Description
A subset of the Washington Post database. Contains records of every fatal police shooting by an on-duty officer since January 1, 2015.
Usage
fatal_police_shootings
Format
A data frame with 6421 rows and 12 variables.
- date
- Date of fatal shooting.
- manner_of_death
- Shot, or shot and Tasered.
- armed
- Type of implement, if any, that the victim was armed with and that a police officer believed could inflict harm.
- age
- The age of the victim.
- gender
- The gender of the victim. The Post identifies victims by the gender they identify with if reports indicate that it differs from their biological sex.
- race
- Race of the victim. W: White non-Hispanic; B: Black non-Hispanic; A: Asian; N: Native American; H: Hispanic; O: Other; None: unknown.
- city
- The municipality where the fatal shooting took place. Note that in some cases this field may contain a county name if a more specific municipality is unavailable or unknown.
- state
- Two-letter postal code abbreviation.
- signs_of_mental_illness
- Whether news reports indicated the victim had a history of mental health issues, expressed suicidal intentions, or was experiencing mental distress at the time of the shooting.
- threat_level
- Threat level. The "attack" category flags the highest level of threat: the most direct and immediate threat to life, including incidents where officers or others were shot at, threatened with a gun, or attacked with other weapons or physical force. The "other" and "undetermined" categories cover all remaining cases; "other" includes many incidents where officers or others faced significant threats.
- flee
- Whether news reports indicated the victim was moving away from officers: by Foot, by Car, or Not fleeing.
- body_camera
- Whether news reports indicated an officer was wearing a body camera that may have recorded some portion of the incident.
Source
Examples
library(dplyr)
# List race frequency and percentage
fatal_police_shootings |>
  group_by(race) |>
  summarize(n = n()) |>
  mutate(freq = n / sum(n) * 100)
# List different weapons that victims were armed with
fatal_police_shootings |>
  distinct(armed)
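For readers without dplyr, the race frequency table can be sketched in base R. The hypothetical race_share() below (not part of usdata) uses table() with useNA = "ifany", so a missing race appears as its own row, mirroring how group_by() keeps an NA group.

```r
# Hypothetical base-R sketch of the race frequency table above.
race_share <- function(race) {
  # Count each race code; useNA = "ifany" adds an NA count only if NAs exist
  counts <- table(race, useNA = "ifany")
  data.frame(
    race = names(counts),
    n = as.vector(counts),
    freq = as.vector(100 * prop.table(counts))
  )
}

if (requireNamespace("usdata", quietly = TRUE)) {
  print(race_share(usdata::fatal_police_shootings$race))
}
```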
Gerrymander
Description
A data set on gerrymandering and its influence on House elections, originally built by Jeff Whitmer.
Usage
gerrymander
Format
A data frame with 435 rows and 12 variables:
- district
- Congressional district. 
- last_name
- Last name of 2016 election winner. 
- first_name
- First name of 2016 election winner.
- party16
- Political party of 2016 election winner. 
- clinton16
- Percent of vote received by Clinton in 2016 Presidential Election. 
- trump16
- Percent of vote received by Trump in 2016 Presidential Election. 
- dem16
- Whether a Democrat won the 2016 House election: 1 (yes) or 0 (no).
- state
- State the Representative is from. 
- party18
- Political party of the 2018 election winner.
- dem18
- Whether a Democrat won the 2018 House election: 1 (yes) or 0 (no).
- flip18
- Whether a Democrat flipped the seat in the 2018 election: 1 (yes) or 0 (no).
- gerry
- Categorical variable for prevalence of gerrymandering with levels of low, mid and high. 
Source
Examples
library(ggplot2)
library(dplyr)
ggplot(gerrymander |> filter(gerry != "mid"), aes(clinton16, dem16, color = gerry)) +
  geom_jitter(height = 0.05, size = 3, shape = 1) +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE) +
  scale_color_manual(values = c("purple", "orange")) +
  labs(
    title = "Logistic Regression of 2016 House Elections",
    subtitle = "by Congressional District",
    x = "Percent of Presidential Vote Won by Clinton",
    y = "Seat Won by Democrat Candidate",
    color = "Gerrymandering"
  )
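The `flip18` indicator can be cross-checked against `dem16` and `dem18`. The sketch below assumes a seat counts as flipped exactly when a non-Democrat won it in 2016 and a Democrat won it in 2018; that reading of `flip18` is an assumption, and the three rows (including the placeholder district labels) are made up rather than taken from `gerrymander`.

```r
# Made-up rows standing in for gerrymander (not real districts)
toy <- data.frame(
  district = c("XX-01", "XX-02", "XX-03"),
  dem16 = c(0, 1, 0),
  dem18 = c(1, 1, 0)
)
# Assumed definition: flipped = not Democratic in 2016, Democratic in 2018
toy$flip18_check <- as.integer(toy$dem16 == 0 & toy$dem18 == 1)
toy
```

With the real data, `with(gerrymander, table(flip18, dem16 == 0 & dem18 == 1))` would show whether this assumed definition matches the stored column.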
Election results for 2010 Governor races in the U.S.
Description
Election results for 2010 Governor races in the U.S.
Usage
govrace10
Format
A data frame with 37 observations on the following 23 variables.
- id
- Unique identifier for the race, which does not overlap with other 2010 races (see houserace10 and senaterace10)
- state
- State name 
- abbr
- State name abbreviation 
- name1
- Name of the winning candidate 
- perc1
- Percentage of vote for winning candidate (if more than one candidate) 
- party1
- Party of winning candidate 
- votes1
- Number of votes for winning candidate 
- name2
- Name of candidate with second most votes 
- perc2
- Percentage of vote for candidate who came in second 
- party2
- Party of candidate with second most votes 
- votes2
- Number of votes for candidate who came in second 
- name3
- Name of candidate with third most votes 
- perc3
- Percentage of vote for candidate who came in third 
- party3
- Party of candidate with third most votes 
- votes3
- Number of votes for candidate who came in third 
- name4
- Name of candidate with fourth most votes 
- perc4
- Percentage of vote for candidate who came in fourth 
- party4
- Party of candidate with fourth most votes 
- votes4
- Number of votes for candidate who came in fourth 
- name5
- Name of candidate with fifth most votes 
- perc5
- Percentage of vote for candidate who came in fifth 
- party5
- Party of candidate with fifth most votes 
- votes5
- Number of votes for candidate who came in fifth 
Source
MSNBC.com, retrieved 2010-11-09.
Examples
table(govrace10$party1, govrace10$party2)
Election results for the 2010 U.S. House of Representatives races
Description
Election results for the 2010 U.S. House of Representatives races
Usage
houserace10
Format
A data frame with 435 observations on the following 24 variables.
- id
- Unique identifier for the race, which does not overlap with other 2010 races (see govrace10 and senaterace10)
- state
- State name 
- abbr
- State name abbreviation 
- num
- District number for the state 
- name1
- Name of the winning candidate 
- perc1
- Percentage of vote for winning candidate (if more than one candidate) 
- party1
- Party of winning candidate 
- votes1
- Number of votes for winning candidate 
- name2
- Name of candidate with second most votes 
- perc2
- Percentage of vote for candidate who came in second 
- party2
- Party of candidate with second most votes 
- votes2
- Number of votes for candidate who came in second 
- name3
- Name of candidate with third most votes 
- perc3
- Percentage of vote for candidate who came in third 
- party3
- Party of candidate with third most votes 
- votes3
- Number of votes for candidate who came in third 
- name4
- Name of candidate with fourth most votes 
- perc4
- Percentage of vote for candidate who came in fourth 
- party4
- Party of candidate with fourth most votes 
- votes4
- Number of votes for candidate who came in fourth 
- name5
- Name of candidate with fifth most votes 
- perc5
- Percentage of vote for candidate who came in fifth 
- party5
- Party of candidate with fifth most votes 
- votes5
- Number of votes for candidate who came in fifth 
Details
The analysis in the Examples section was inspired by, and is similar to, Nate Silver's district-level analysis on the FiveThirtyEight blog at the New York Times: https://fivethirtyeight.com/features/2010-an-aligning-election/
Source
MSNBC.com, retrieved 2010-11-09.
Examples
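# Count House seats won by each party in each state
hr <- table(houserace10[, c("abbr", "party1")])
# 2008 Obama vote share by state; DC has no voting House seat
pr <- prrace08[prrace08$state != "DC", c("state", "p_obama")]
# Keep the House table rows in the same order as pr
hr <- hr[as.character(pr$state), ]
# Number of House seats in each state
nr <- rowSums(hr)
# State-level logistic regression with seat counts by party as the response
(fit <- glm(hr ~ pr$p_obama, family = binomial))
# District-level version: one 0/1 outcome per House race
x1 <- pr$p_obama[match(houserace10$abbr, pr$state)]
y1 <- (houserace10$party1 == "Democrat") + 0
g <- glm(y1 ~ x1, family = binomial)
# Share of each state's seats won by Democrats; point size grows with seat count
plot(pr$p_obama, hr[, "Democrat"] / nr,
  pch = 19, cex = sqrt(nr), col = "#22558844",
  xlim = c(20, 80), ylim = c(0, 1),
  xlab = "Percent vote for Obama in 2008",
  ylab = "Probability of Democrat winning House seat"
)
# Logistic curve with hard-coded intercept and slope
X <- seq(0, 100, 0.1)
lo <- -5.6079 + 0.1009 * X
p <- exp(lo) / (1 + exp(lo))
lines(X, p)
# Reference lines at probabilities 0 and 1
abline(h = 0:1, lty = 2, col = "#888888")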
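The last lines of the example draw a logistic curve by hand: `lo` is the log-odds and `exp(lo) / (1 + exp(lo))` maps it to a probability. Base R's `plogis()` computes the same inverse-logit transformation, and setting the log-odds to zero gives the Obama vote percentage at which the curve crosses 0.5. This sketch reuses only the two coefficients hard-coded in the example (-5.6079 and 0.1009).

```r
# Intercept and slope hard-coded in the example
b0 <- -5.6079
b1 <- 0.1009
X <- seq(0, 100, 0.1)
lo <- b0 + b1 * X
# Manual inverse logit, as in the example
p_manual <- exp(lo) / (1 + exp(lo))
# The logistic distribution function gives the same probabilities
p_builtin <- plogis(lo)
# Log-odds are zero (probability 0.5) where b0 + b1 * X = 0
x50 <- -b0 / b1
round(x50, 2)
```

Under these coefficients, a district in a state where Obama received roughly 55.6% of the 2008 vote has an estimated even chance of electing a Democrat.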
Pierce County House Sales Data for 2020
Description
Real estate sales for Pierce County, WA in 2020.
Usage
pierce_county_house_sales
Format
A data frame with 16814 rows and 19 variables.
- sale_date
- Date the legal document (deed) was executed. 
- sale_price
- Dollar amount recorded for the sale. 
- house_square_feet
- Sum of the square feet for the building. 
- attic_finished_square_feet
- Finished living area in the attic. 
- basement_square_feet
- Total square footage of the basement. 
- attached_garage_square_feet
- Total square footage of the attached or built in garage(s). 
- detached_garage_square_feet
- Total detached garage(s) square footage. 
- fireplaces
- Total count of single, double or PreFab stoves. 
- hvac_description
- Text description of the predominant heating source for the built-as structure (e.g., Forced Air, Electric Baseboard, Steam). 
- exterior
- Predominant type of construction materials used for the exterior siding on Residential Buildings. 
- interior
- Predominant type of material used on the interior walls (e.g., Sheetrock or Paneling). 
- stories
- Number of floors/building levels above grade. Stories do not include attic or basement areas. 
- roof_cover
- Material used for the roof (e.g., Composition Shingles, Wood Shake, Concrete Tile). 
- year_built
- Year the building was built, as stated by the building permit or a historical record. 
- bedrooms
- Number of bedrooms listed for a residential property. 
- bathrooms
- Number of baths listed for a residential property, given as a decimal (e.g., 2.75 = two full baths and one three-quarter bath). A tub/sink/toilet combination (plus any additional fixtures) counts as 1.0 bath, a shower/sink/toilet combination (plus any additional fixtures) as 0.75 bath, and a sink/toilet combination as 0.5 bath. 
- waterfront_type
- Describes the type of waterfront the property adjoins or has legal access to. 
- view_quality
- Assigned to reflect the market appeal of the overall view available from the dwelling or property. 
- utility_sewer
- Identifies whether a sewer or septic system is installed, available, or not available, or whether the property does not support an on-site sewage disposal system. 
Source
Examples
library(dplyr)
library(lubridate)
# List house sales frequency and average price grouped by month
pierce_county_house_sales |>
  mutate(month_sale = month(sale_date)) |>
  group_by(month_sale) |>
  summarize(freq = n(), mean_price = mean(sale_price)) |>
  arrange(desc(freq))
# List house sales frequency and average price grouped by waterfront type
pierce_county_house_sales |>
  group_by(waterfront_type) |>
  summarize(freq = n(), mean_price = mean(sale_price)) |>
  arrange(desc(mean_price))
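The `bathrooms` encoding is a weighted count: full baths count 1.0, three-quarter baths 0.75, and half baths 0.5. The helper below, `bath_total()`, is hypothetical (not part of usdata) and simply applies those weights. The reverse direction is not unique (1.5 could be one full bath plus one half bath, or three half baths), so the sketch only encodes.

```r
# Hypothetical helper: combine counts of each bath type into the decimal code
bath_total <- function(full = 0, three_quarter = 0, half = 0) {
  # Weights from the bathrooms description: 1.0, 0.75, and 0.5
  1.0 * full + 0.75 * three_quarter + 0.5 * half
}
bath_total(full = 2, three_quarter = 1)
bath_total(full = 1, half = 1)
```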
Population Age 2019 Data.
Description
State-level data on population by age.
Usage
pop_age_2019
Format
A data frame with 2820 rows and 5 variables.
- state
- State as 2 letter abbreviation. 
- state_name
- State name. 
- age
- Age cohort for population. 
- population
- Population of age cohort. 
- state_total_population
- Total estimated state population in 2019. 
Source
Centers for Disease Control and Prevention
Examples
library(dplyr)
# List age population for each state with percent of total
pop_age_2019 |>
  group_by(state_name, age) |>
  mutate(percent = population / state_total_population * 100) |>
  select(state_name, age, population, percent)
# List states from largest to smallest total population
pop_age_2019 |>
  select(state_name, state_total_population) |>
  distinct() |>
  arrange(desc(state_total_population))
Population Race 2019 Data.
Description
State-level data on population by race.
Usage
pop_race_2019
Format
A data frame with 2820 rows and 6 variables.
- state
- State as 2 letter abbreviation. 
- state_name
- State name. 
- race
- Race cohort for population. 
- hispanic
- Indicates whether the population is Hispanic or Latino. 
- population
- Population of race cohort. 
- state_total_population
- Total estimated state population in 2019. 
Source
Centers for Disease Control and Prevention
Examples
library(dplyr)
# List race population for each state with percent of total
pop_race_2019 |>
  group_by(state_name, race, hispanic) |>
  mutate(percent = population / state_total_population * 100) |>
  select(state_name, race, hispanic, population, percent)
# List states from largest to smallest total population
pop_race_2019 |>
  select(state_name, state_total_population) |>
  distinct() |>
  arrange(desc(state_total_population))
Presidential Power.
Description
Data from a Pew Research Center poll about Presidential power/control over gas prices.
Usage
prez_pwr
Format
A data frame with 365 rows and 3 variables.
- president
- Sitting President at time of the poll. 
- party
- Political party of the respondent with levels d(emocrat) and r(epublican). 
- has_pwr
- Respondent answer to the question: "Is the price of gasoline something the president can do a lot about, or is that beyond the president's control?" 
Source
Pew Research Center, May 2006 & March 2012.
Examples
library(ggplot2)
ggplot(prez_pwr, aes(has_pwr, fill = party)) +
  geom_bar() +
  labs(
    title = "Is the price of gasoline something the president can do a lot about?",
    x = "",
    y = "Number of respondents",
    fill = "Respondent Party"
  ) +
  facet_wrap(~president)
Election results for the 2008 U.S. Presidential race
Description
Election results for the 2008 U.S. Presidential race
Usage
prrace08
Format
A data frame with 51 observations on the following 7 variables.
- state
- State name abbreviation 
- state_full
- Full state name 
- n_obama
- Number of votes for Barack Obama 
- p_obama
- Percentage of votes for Barack Obama 
- n_mc_cain
- Number of votes for John McCain 
- p_mc_cain
- Percentage of votes for John McCain 
- el_votes
- Number of electoral votes for a state 
Details
In Nebraska, 4 electoral votes went to McCain and 1 to Obama. Everywhere else, electoral votes were awarded winner-take-all.
Source
Presidential Election of 2008, Electoral and Popular Vote Summary, retrieved 2011-04-21.
Examples
# ===> Obtain 2010 US House Election Data <===#
# Count House seats won by each party in each state
hr <- table(houserace10[, c("abbr", "party1")])
# ===> Obtain 2008 President Election Data <===#
# DC has no voting House seat, so drop it and align the two tables
pr <- prrace08[prrace08$state != "DC", c("state", "p_obama")]
hr <- hr[as.character(pr$state), ]
nr <- rowSums(hr)
(fit <- glm(hr ~ pr$p_obama, family = binomial))
# ===> Visualizing Binomial outcomes <===#
plot(pr$p_obama, hr[, "Democrat"] / nr,
  pch = 19, cex = sqrt(nr), col = "#22558844",
  xlim = c(20, 80), ylim = c(0, 1), xlab = "Percent vote for Obama in 2008",
  ylab = "Probability of Democrat winning House seat"
)
# ===> Logistic Regression <===#
# One 0/1 outcome per House race, with the state's Obama share as predictor
x1 <- pr$p_obama[match(houserace10$abbr, pr$state)]
y1 <- (houserace10$party1 == "Democrat") + 0
g <- glm(y1 ~ x1, family = binomial)
# Logistic curve with hard-coded intercept and slope
X <- seq(0, 100, 0.1)
lo <- -5.6079 + 0.1009 * X
p <- exp(lo) / (1 + exp(lo))
lines(X, p)
abline(h = 0:1, lty = 2, col = "#888888")
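The Details note says electoral votes were winner-take-all everywhere except Nebraska, where McCain received 4 and Obama 1. A minimal sketch of that tallying rule follows; the rows and vote shares are invented (only Nebraska's 4 + 1 = 5 electoral votes come from the note), and the column names mirror `prrace08`.

```r
# Invented rows shaped like prrace08 (shares and the AA/BB states are made up)
toy <- data.frame(
  state = c("AA", "BB", "NE"),
  p_obama = c(55, 40, 42),
  p_mc_cain = c(44, 59, 57),
  el_votes = c(10, 6, 5)
)
# Winner-take-all: each state's electoral votes go to its popular-vote winner
obama_ev <- ifelse(toy$p_obama > toy$p_mc_cain, toy$el_votes, 0)
mccain_ev <- toy$el_votes - obama_ev
# Nebraska exception from the Details note: 1 vote to Obama, 4 to McCain
ne <- toy$state == "NE"
obama_ev[ne] <- 1
mccain_ev[ne] <- 4
c(obama = sum(obama_ev), mccain = sum(mccain_ev))
```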
Election results for the 2010 U.S. Senate races
Description
Election results for the 2010 U.S. Senate races
Usage
senaterace10
Format
A data frame with 38 observations on the following 23 variables.
- id
- Unique identifier for the race, which does not overlap with other 2010 races (see govrace10 and houserace10)
- state
- State name 
- abbr
- State name abbreviation 
- name1
- Name of the winning candidate 
- perc1
- Percentage of vote for winning candidate (if more than one candidate) 
- party1
- Party of winning candidate 
- votes1
- Number of votes for winning candidate 
- name2
- Name of candidate with second most votes 
- perc2
- Percentage of vote for candidate who came in second 
- party2
- Party of candidate with second most votes 
- votes2
- Number of votes for candidate who came in second 
- name3
- Name of candidate with third most votes 
- perc3
- Percentage of vote for candidate who came in third 
- party3
- Party of candidate with third most votes 
- votes3
- Number of votes for candidate who came in third 
- name4
- Name of candidate with fourth most votes 
- perc4
- Percentage of vote for candidate who came in fourth 
- party4
- Party of candidate with fourth most votes 
- votes4
- Number of votes for candidate who came in fourth 
- name5
- Name of candidate with fifth most votes 
- perc5
- Percentage of vote for candidate who came in fifth 
- party5
- Party of candidate with fifth most votes 
- votes5
- Number of votes for candidate who came in fifth 
Source
MSNBC.com, retrieved 2010-11-09.
Examples
library(ggplot2)
ggplot(senaterace10, aes(x = perc1)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Winning candidate vote percentage")
Convert state names to abbreviations
Description
Two utility functions. One converts state names to the state abbreviations, and the second does the opposite.
Usage
state2abbr(state)
Arguments
| state | A vector of state names; some fuzzy matching is applied, so minor spelling and capitalization errors are tolerated. | 
Value
Returns a vector of the same length with the corresponding state names or abbreviations.
Author(s)
David Diez
See Also
abbr2state, county, county_complete
Examples
state2abbr("Minnesota")
# Some spelling/capitalization errors okay
state2abbr("mINnesta")
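The documentation does not say how `state2abbr()` performs its fuzzy matching. One common approach, shown here as a hypothetical sketch rather than the package's implementation, picks the state name with the smallest edit distance using base R's `adist()` with the built-in `state.name` and `state.abb` vectors. Those built-in vectors cover only the 50 states, not DC, and ties go to the first state in alphabetical order.

```r
# Hypothetical helper: not the usdata implementation of state2abbr()
nearest_state_abbr <- function(x) {
  # Generalized Levenshtein distance from each input to every state name
  d <- adist(tolower(x), tolower(state.name))
  # For each input (row), take the closest state name
  idx <- apply(d, 1, which.min)
  state.abb[idx]
}
nearest_state_abbr(c("mINnesta", "wisconsin"))
```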
State-level data
Description
Information about each state collected from both the official US Census website and from various other sources.
Usage
state_stats
Format
A data frame with 51 observations on the following 24 variables.
- state
- State name. 
- abbr
- State abbreviation (e.g., "MN").
- fips
- FIPS code. 
- pop2010
- Population in 2010. 
- pop2000
- Population in 2000. 
- homeownership
- Home ownership rate. 
- multiunit
- Percent of living units that are in multi-unit structures. 
- income
- Average income per capita. 
- med_income
- Median household income. 
- poverty
- Poverty rate. 
- fed_spend
- Federal spending per capita. 
- land_area
- Land area. 
- smoke
- Percent of population that smokes. 
- murder
- Murders per 100,000 people. 
- robbery
- Robberies per 100,000. 
- agg_assault
- Aggravated assaults per 100,000. 
- larceny
- Larcenies per 100,000. 
- motor_theft
- Vehicle theft per 100,000. 
- soc_sec
- Percent of individuals collecting social security. 
- nuclear
- Percent of power coming from nuclear sources. 
- coal
- Percent of power coming from coal sources. 
- tr_deaths
- Traffic deaths per 100,000. 
- tr_deaths_no_alc
- Traffic deaths per 100,000 where alcohol was not a factor. 
- unempl
- Unemployment rate (February 2012, preliminary). 
Source
Census Quick Facts (no longer available as of 2020), InfoChimps (also no longer available as of 2020), National Highway Traffic Safety Administration (tr_deaths, tr_deaths_no_alc), and Bureau of Labor Statistics (unempl).
Examples
library(ggplot2)
library(dplyr)
library(maps)
states_selected <- state_stats |>
  mutate(region = tolower(state)) |>
  select(region, unempl, murder, nuclear)
# Join state outlines to the selected variables by lowercase state name
states_map <- map_data("state") |>
  inner_join(states_selected, by = "region")
# Unemployment map
ggplot(states_map, aes(map_id = region)) +
  geom_map(aes(fill = unempl), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Unemployment\n(%)")
# Murder rate map
states_map |>
  filter(region != "district of columbia") |>
  ggplot(aes(map_id = region)) +
  geom_map(aes(fill = murder), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Murders\nper 100k")
# Nuclear energy map
ggplot(states_map, aes(map_id = region)) +
  geom_map(aes(fill = nuclear), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c() +
  labs(x = "", y = "", fill = "Nuclear energy\n(%)")
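Several `state_stats` variables (`murder`, `robbery`, `agg_assault`, `larceny`, `motor_theft`, `tr_deaths`, `tr_deaths_no_alc`) are rates per 100,000 people. A rate of this kind is the event count divided by the population, times 100,000; the counts and populations below are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical event counts and populations for two made-up states
events <- c(150, 40)
population <- c(3000000, 500000)
# Rate per 100,000 people = count / population * 100,000
rate_per_100k <- events / population * 1e5
rate_per_100k
```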
Summary of many state-level variables
Description
Census data for the 50 states plus DC and Puerto Rico.
Usage
urban_owner
Format
A data frame with 52 observations on the following 28 variables.
- state
- State 
- total_housing_units_2000
- Total housing units available in 2000. 
- total_housing_units_2010
- Total housing units available in 2010. 
- pct_vacant
- a numeric vector 
- occupied
- Occupied. 
- pct_owner_occupied
- a numeric vector 
- pop_st
- a numeric vector 
- area_st
- a numeric vector 
- pop_urban
- a numeric vector 
- poppct_urban
- a numeric vector 
- area_urban
- a numeric vector 
- areapct_urban
- a numeric vector 
- popden_urban
- a numeric vector 
- pop_ua
- a numeric vector 
- poppct_urban.1
- a numeric vector 
- area_ua
- a numeric vector 
- areapct_ua
- a numeric vector 
- popden_ua
- a numeric vector 
- pop_uc
- a numeric vector 
- poppct_uc
- a numeric vector 
- area_uc
- a numeric vector 
- areapct_uc
- a numeric vector 
- popden_uc
- a numeric vector 
- pop_rural
- a numeric vector 
- poppct_rural
- a numeric vector 
- area_rural
- a numeric vector 
- areapct_rural
- a numeric vector 
- popden_rural
- a numeric vector 
Source
US Census.
Examples
urban_owner
State summary info
Description
Census info for the 50 US states plus DC.
Usage
urban_rural_pop
Format
A data frame with 51 observations on the following 5 variables.
- state
- US state. 
- urban_in
- a numeric vector 
- urban_out
- a numeric vector 
- rural_farm
- a numeric vector 
- rural_nonfarm
- a numeric vector 
Source
US Census.
Examples
urban_rural_pop
US Crime Rates
Description
National data on the number of crimes committed in the US between 1960 and 2019.
Usage
us_crime_rates
Format
A data frame with 60 rows and 12 variables.
- year
- Year data was collected. 
- population
- Population of the United States the year data was collected. 
- total
- Total number of violent and property crimes committed. 
- violent
- Total number of violent crimes committed. 
- property
- Total number of property crimes committed. 
- murder
- Number of murders committed. Counted in violent total. 
- forcible_rape
- Number of forcible rapes committed. Counted in violent total. 
- robbery
- Number of robberies committed. Counted in violent total. 
- aggravated_assault
- Number of aggravated assaults committed. Counted in violent total. 
- burglary
- Number of burglaries committed. Counted in property total. 
- larceny_theft
- Number of larceny thefts committed. Counted in property total. 
- vehicle_theft
- Number of vehicle thefts committed. Counted in property total. 
Source
Examples
library(ggplot2)
ggplot(us_crime_rates, aes(x = population, y = total)) +
  geom_point() +
  labs(
    title = "Crimes vs. Population",
    x = "Population",
    y = "Total Number of Crimes"
  )
ggplot(us_crime_rates, aes(x = murder)) +
  geom_boxplot() +
  labs(
    title = "US Murders",
    subtitle = "1960 - 2019",
    x = "Number of Murders"
  ) +
  theme(axis.text.y = element_blank())
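The variable descriptions say `murder`, `forcible_rape`, `robbery`, and `aggravated_assault` are counted in the violent total, and `burglary`, `larceny_theft`, and `vehicle_theft` in the property total. A consistency check along those lines could look like the sketch below; it additionally assumes `total` equals `violent` plus `property`, and the single row is made up.

```r
# One made-up row with the same column names as us_crime_rates
crimes <- data.frame(
  murder = 10, forcible_rape = 20, robbery = 30, aggravated_assault = 40,
  burglary = 100, larceny_theft = 200, vehicle_theft = 50,
  violent = 100, property = 350, total = 450
)
# Recompute each subtotal from its components
violent_sum <- with(crimes, murder + forcible_rape + robbery + aggravated_assault)
property_sum <- with(crimes, burglary + larceny_theft + vehicle_theft)
# Compare with the stored totals (assumes total = violent + property)
c(
  violent_ok = violent_sum == crimes$violent,
  property_ok = property_sum == crimes$property,
  total_ok = crimes$violent + crimes$property == crimes$total
)
```

Applied to `us_crime_rates` itself, the same expressions return one logical value per year.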
US Temperature Data
Description
A representative set of monitoring locations that had data for both years of interest (1950 and 2022) was taken from NOAA data. The locations were chosen to spread the measurements across the continental United States. Daily high and low temperatures are given for each of 24 weather stations.
Usage
us_temp
Format
A data frame with 17250 observations on the following 9 variables.
- station
- Station ID, measurements from 24 stations. 
- name
- Name of the station. 
- latitude
- Latitude of the station. 
- longitude
- Longitude of the station. 
- elevation
- Elevation of the station. 
- date
- Date of observed temperature. 
- tmax
- High temperature for the observed day. 
- tmin
- Low temperature for the observed day. 
- year
- Factor variable for year, with levels 1950 and 2022.
Details
Please keep in mind that these are two annual snapshots from a few dozen arbitrarily selected weather stations. A complete analysis would consider more than two years of data and a more precise random sample uniformly distributed across the United States.
Source
https://www.ncei.noaa.gov/cdo-web/, retrieved 2023-09-23.
Examples
library(ggplot2)
library(maps)
library(sf)
library(dplyr)
# Summarize temperature by station and year for plotting
summarized_temp <- us_temp |>
  group_by(station, year, latitude, longitude) |>
  summarize(tmax_med = median(tmax, na.rm = TRUE), .groups = "drop") |>
  mutate(plot_shift = ifelse(year == "1950", 0, 2))
# Make a map of the US as a baseline
usa <- st_as_sf(maps::map("state", fill = TRUE, plot = FALSE))
# Layer the US map with summarized temperatures
ggplot(data = usa) +
  geom_sf() +
  geom_point(
    data = summarized_temp,
    aes(x = longitude + plot_shift, y = latitude, fill = tmax_med, shape = year),
    color = "black", size = 3
  ) +
  scale_fill_gradient(high = "red", low = "yellow") +
  scale_shape_manual(values = c(21, 24)) +
  labs(
    title = "Median high temperature, 1950 and 2022",
    x = "Longitude",
    y = "Latitude",
    fill = "Median\nhigh temp",
    shape = "Year"
  )
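Because the data hold one snapshot per station for each of two years, a natural per-station summary is the change in median daily high between 1950 and 2022. The sketch below does this in base R with `aggregate()` and `reshape()` on a small made-up data frame that mimics the `station`, `year`, and `tmax` columns; as the Details note cautions, differences from two years at a few stations should be read with care.

```r
# Made-up data mimicking us_temp's station, year, and tmax columns
temps <- data.frame(
  station = rep(c("A", "B"), each = 4),
  year = rep(c("1950", "1950", "2022", "2022"), 2),
  tmax = c(70, 74, 73, 77, 60, 62, 61, 65)
)
# Median daily high for each station and year (rows with NA tmax are dropped)
med <- aggregate(tmax ~ station + year, data = temps, FUN = median)
# One row per station, with columns tmax.1950 and tmax.2022
wide <- reshape(med, idvar = "station", timevar = "year", direction = "wide")
# Change in median high between the two snapshot years
wide$change <- wide$tmax.2022 - wide$tmax.1950
wide
```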
American Time Use Survey 2009 - 2019
Description
Average Time Spent on Activities by Americans
Usage
us_time_survey
Format
A data frame with 11 rows and 8 variables.
- year
- Year data collected 
- household_activities
- Average hours per day spent on household activities, including travel. 
- eating_and_drinking
- Average hours per day spent eating and drinking, including travel. 
- leisure_and_sports
- Average hours per day spent on leisure and sports, including travel. 
- sleeping
- Average hours per day spent sleeping. 
- caring_children
- Average hours per day spent caring for and helping children under 18 years of age. 
- working_employed
- Average hours spent working, for employed persons 15 years and older. 
- working_employed_days_worked
- Average hours per day spent working on days worked, for employed persons 15 years and older. 
Source
Examples
library(ggplot2)
us_time_survey$year <- as.factor(us_time_survey$year)
ggplot(us_time_survey, aes(year, sleeping)) +
  geom_point(alpha = 0.3) +
  labs(
    x = "Year",
    y = "Average hours spent Sleeping",
    title = "US Average hours spent sleeping, 2009 - 2019"
  )
Predicting who would vote for NSA Mass Surveillance
Description
In 2013, the House of Representatives voted against stopping the National Security Agency's (NSA's) mass surveillance of phone records. We look at two predictors of how a representative voted: their party and how much money they had received from the private defense industry.
Usage
vote_nsa
Format
A data frame with 434 observations on the following 5 variables.
- name
- Name of the Congressional representative. 
- party
- The party of the representative: D for Democrat and R for Republican.
- state
- State for the representative. 
- money
- Money received from the defense industry for their campaigns. 
- phone_spy_vote
- Voting to rein in the phone dragnet or continue allowing mass surveillance. 
Source
MapLight. Available at http://s3.documentcloud.org/documents/741074/amash-amendment-vote-maplight.pdf.
References
Kravets, D., 2013. Lawmakers Who Upheld NSA Phone Spying Received Double The Defense Industry Cash. WIRED. Available at https://www.wired.com/2013/07/money-nsa-vote/.
Examples
table(vote_nsa$party, vote_nsa$phone_spy_vote)
boxplot(vote_nsa$money / 1000 ~ vote_nsa$phone_spy_vote,
  ylab = "$1000s Received from Defense Industry"
)
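The `table()` call in the example gives raw counts; dividing each row by its total shows, within each party, the share of representatives voting each way. `prop.table()` with `margin = 1` does this. The counts below are made up, and the column labels `vote_a` and `vote_b` are placeholders for the levels of `phone_spy_vote`.

```r
# Made-up vote counts by party (rows) and vote (columns); not the real data
counts <- matrix(
  c(90, 110, 60, 170),
  nrow = 2, byrow = TRUE,
  dimnames = list(party = c("D", "R"), vote = c("vote_a", "vote_b"))
)
# Row proportions: each party's counts divided by that party's total
shares <- prop.table(counts, margin = 1)
round(shares, 3)
```

With the real data, the analogous call would be `prop.table(table(vote_nsa$party, vote_nsa$phone_spy_vote), margin = 1)`.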
US Voter Turnout Data.
Description
State-level data on federal elections held in November between 1980 and 2014.
Usage
voter_count
Format
A data frame with 936 rows and 7 variables.
- year
- Year election was held. 
- region
- Specifies if data is state or national total. 
- voting_eligible_population
- Number of citizens eligible to vote; does not count felons. 
- total_ballots_counted
- Number of ballots cast. 
- highest_office
- Number of ballots that contained a vote for the highest office of that election. 
- percent_total_ballots_counted
- Overall voter turnout percentage. 
- percent_highest_office
- Highest office voter turnout percentage. 
Source
United States Election Project
Examples
library(ggplot2)
ggplot(voter_count, aes(x = percent_highest_office, y = percent_total_ballots_counted)) +
  geom_point() +
  labs(
    title = "Total Ballots vs. Highest Office",
    x = "Highest Office",
    y = "Total Ballots"
  )
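The two turnout percentages can be read as ballot counts divided by the voting-eligible population. The sketch below assumes that definition (the documentation does not state the formula) and uses made-up numbers for a single row with the same column names as `voter_count`.

```r
# Made-up row with voter_count's column names
vc <- data.frame(
  voting_eligible_population = 4000000,
  total_ballots_counted = 2400000,
  highest_office = 2350000
)
# Assumed definitions: turnout = ballots / eligible population * 100
vc$pct_total <- vc$total_ballots_counted / vc$voting_eligible_population * 100
vc$pct_highest <- vc$highest_office / vc$voting_eligible_population * 100
vc[, c("pct_total", "pct_highest")]
```

Because a ballot can skip the highest office, the highest-office percentage should not exceed the total-ballots percentage; comparing recomputed values with the stored `percent_*` columns is a quick way to test the assumed definition.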