Tutorial: Accessing and Exploring the Data
This tutorial walks through fetching, cleaning, and taking a first look at the Utah housing affordability dataset.
Installation
Install the utah-housing package and set your Census API key before running any code cells.
pip install utah-housing
Set your Census API key (obtain a free key at https://api.census.gov/data/key_signup.html):
import os
# Replace the placeholder with your own key, and avoid committing real keys to shared notebooks
os.environ["CENSUS_API_KEY"] = "your_census_api_key_here"
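A missing or placeholder key may only surface as an error once the API is actually called. The sketch below checks the key up front. `require_census_key` is a hypothetical helper, not part of utah-housing. It assumes the key is read from the `CENSUS_API_KEY` environment variable, as the setup cell implies.

```python
import os

# Placeholder value used in this tutorial's setup cell
PLACEHOLDER = "your_census_api_key_here"


def require_census_key(var_name: str = "CENSUS_API_KEY") -> str:
    """Return the Census API key from the environment, or raise a clear error.

    Hypothetical helper: assumes the key lives in the CENSUS_API_KEY
    environment variable, as set in the setup cell.
    """
    key = os.environ.get(var_name, "").strip()
    # Treat an unset, empty, or still-placeholder key as missing
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{var_name} is not set; get a free key at "
            "https://api.census.gov/data/key_signup.html"
        )
    return key
```

If you call `require_census_key()` before `fetch_all_years(...)`, a forgotten key produces an immediate, readable error instead of a failure partway through the download.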
Reading in the Data
import pandas as pd
from utah_housing import fetch_all_years, OUTCOME, BASE_PREDICTORS, COMPLEX_PREDICTORS
# Fetch ACS 5-year estimates for Utah census tracts, 2009–2023
df = fetch_all_years(years=range(2009, 2024))
df.head()
# Save to CSV so you can reload without hitting the API again
os.makedirs("data", exist_ok=True)  # to_csv does not create missing folders
df.to_csv("data/utah_housing.csv", index=False)
# Reload from CSV (faster than re-fetching)
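# Read GEOID as a string so it stays an identifier rather than being parsed as a number
df = pd.read_csv("data/utah_housing.csv", dtype={"GEOID": str})
df.head()
Initial Exploration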
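The missing-value check in this section reports raw counts. Those counts are easier to judge next to the share of rows they represent. Below is a minimal sketch on a toy frame. The column names `median_rent` and `pct_renters` are made up for illustration and are not the package's variables. `missing_report` is a hypothetical helper that lists missing values by column, most-missing first.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the tract-level data; column names and values are hypothetical
toy = pd.DataFrame({
    "GEOID": ["49035100100", "49035100200", "49049000100", "49049000200"],
    "year": [2022, 2022, 2023, 2023],
    "median_rent": [1200.0, np.nan, 950.0, np.nan],
    "pct_renters": [0.41, 0.55, np.nan, 0.30],
})


def missing_report(frame: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    n_missing = frame[cols].isnull().sum()
    pct_missing = (n_missing / len(frame) * 100).round(1)
    return (
        pd.DataFrame({"n_missing": n_missing, "pct_missing": pct_missing})
        .sort_values("n_missing", ascending=False)
    )


report = missing_report(toy, ["year", "GEOID", "median_rent", "pct_renters"])
print(report)
```

With the tutorial's data, the equivalent call is `missing_report(df, analysis_cols)` once `analysis_cols` has been defined.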
print("Shape:", df.shape)
print("\nOutcome variable:", OUTCOME)
print("Base predictors:", BASE_PREDICTORS)
print("Complex predictors:", COMPLEX_PREDICTORS)
# Check for missing values in key columns
analysis_cols = ["year", "GEOID"] + [OUTCOME] + COMPLEX_PREDICTORS
df[analysis_cols].isnull().sum()
Cleaning the Data (optional: not needed if you run the analysis through the package)
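On a toy frame, the two cleaning steps in this section look like this. `OUTCOME_COL` is a hypothetical stand-in for the package's `OUTCOME`. The NAME strings are invented. The third string uses semicolons only to show a pattern that tolerates either separator, so check which separator your own data actually uses.

```python
import numpy as np
import pandas as pd

OUTCOME_COL = "median_rent"  # hypothetical stand-in for the package's OUTCOME

# Invented NAME strings in the "Census Tract ..., <County> County, Utah" shape
toy = pd.DataFrame({
    "NAME": [
        "Census Tract 1001.01, Salt Lake County, Utah",
        "Census Tract 1002, Salt Lake County, Utah",
        "Census Tract 9401; San Juan County; Utah",
    ],
    OUTCOME_COL: [1200.0, np.nan, 700.0],
})

# Step 1: drop rows with a missing outcome; .copy() avoids SettingWithCopyWarning below
clean = toy.dropna(subset=[OUTCOME_COL]).copy()

# Step 2: capture the text between the first separator and " County";
# [,;] accepts either a comma or a semicolon
clean["county_name"] = clean["NAME"].str.extract(r"[,;]\s*(.+?)\s+County", expand=False)
print(clean[["NAME", "county_name"]])
```

The lazy `(.+?)` stops at the first " County", so multi-word names such as "Salt Lake" are captured whole.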
# Drop rows missing the outcome variable
df_clean = df.dropna(subset=[OUTCOME]).copy()
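# Extract the county name from the NAME field (e.g. "..., Salt Lake County, Utah" -> "Salt Lake") for easier filtering;
# [,;] accepts a comma or a semicolon separator, in case NAME formatting differs between ACS vintages
df_clean["county_name"] = df_clean["NAME"].str.extract(r"[,;]\s*(.+?)\s+County", expand=False)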
print(f"Rows after cleaning: {len(df_clean):,}")
df_clean.head()
Subsetting by County or Year
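Each filter in this section uses a single condition. Below is a small sketch on a toy frame with made-up GEOIDs and counties. It shows `isin()` for selecting several counties at once and two equivalent ways to combine a county condition with a year condition. It also counts tracts per county and year, which is a quick way to spot a subset that looks incomplete.

```python
import pandas as pd

# Toy frame; all values are made up for illustration
toy = pd.DataFrame({
    "GEOID": ["49035100100", "49035100200", "49049000100", "49011125101"],
    "county_name": ["Salt Lake", "Salt Lake", "Utah", "Davis"],
    "year": [2022, 2023, 2023, 2023],
})

# Several counties at once: isin() instead of chaining == comparisons with |
three_counties = toy[toy["county_name"].isin(["Salt Lake", "Utah", "Davis"])]

# County AND year: wrap each condition in parentheses and combine with &
slc_2023 = toy[(toy["county_name"] == "Salt Lake") & (toy["year"] == 2023)]

# The same filter written with DataFrame.query
slc_2023_q = toy.query("county_name == 'Salt Lake' and year == 2023")

# Number of tracts per county and year
counts = toy.groupby(["county_name", "year"]).size()
print(counts)
```

The parentheses around each condition matter: `&` binds more tightly than `==` in Python, so leaving them out raises an error or gives the wrong mask.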
# Filter to a specific county
salt_lake = df_clean[df_clean["county_name"] == "Salt Lake"]
print(salt_lake.shape)
# Filter to a specific year
df_2023 = df_clean[df_clean["year"] == 2023]
print(df_2023.shape)
Basic Summary Statistics
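Calling `describe()` on the full frame pools every year together. To see how values change over time, grouping by `year` first is often more informative. Below is a minimal sketch on a toy frame. `OUTCOME_COL` and its values are hypothetical stand-ins, not the package's data.

```python
import pandas as pd

OUTCOME_COL = "median_rent"  # hypothetical stand-in for the package's OUTCOME

# Toy values, three tracts per year
toy = pd.DataFrame({
    "year": [2022, 2022, 2022, 2023, 2023, 2023],
    OUTCOME_COL: [900.0, 1100.0, 1500.0, 1000.0, 1200.0, 1700.0],
})

# One row per year: number of non-missing values, median, and mean of the outcome
by_year = toy.groupby("year")[OUTCOME_COL].agg(["count", "median", "mean"]).round(2)
print(by_year)
```

On the tutorial's data, the analogous call would be `df_clean.groupby("year")[OUTCOME].agg(["count", "median", "mean"])`.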
summary_cols = [OUTCOME] + COMPLEX_PREDICTORS
df_clean[summary_cols].describe().round(2)
The data is now ready for exploratory data analysis (EDA) and modeling (see the Technical Report).