--- title: "Working with Current Population Survey (CPS) Data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Working with Current Population Survey (CPS) Data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` ```{r setup} library(BLSloadR) library(dplyr) library(ggplot2) ``` ## Introduction The Bureau of Labor Statistics' Current Population Survey (LN series) contains comprehensive labor force statistics including employment, unemployment, and labor force participation data broken down by various demographic and economic characteristics. BLSloadR provides three complementary functions to make working with CPS data easier: 1. **`explore_cps_characteristics()`** - Discover available demographic/economic filters 2. **`explore_cps_series()`** - Search for specific series by keywords or characteristics 3. **`get_cps_subset()`** - Retrieve the actual time series data This vignette demonstrates how to use these functions together to efficiently discover and analyze CPS data. ## Basic Workflow ### Step 1: Explore Available Characteristics Before filtering data, you need to know what characteristics are available and their valid codes. ```{r} # List all available characteristics all_characteristics <- explore_cps_characteristics() head(all_characteristics, 10) ``` This returns a data frame showing: - `characteristic`: The name of the characteristic (e.g., "ages", "sexs", "education") - `code_column`: How it appears in filters (e.g., "ages_code", "sexs_code") - `description`: What the characteristic represents ### Step 2: Examine Specific Characteristic Codes Once you identify a characteristic of interest, explore its valid codes: ```{r} # See all valid sex/gender codes sex_codes <- explore_cps_characteristics("sexs") sex_codes # sexs_code sexs_text # 1 0 Both Sexes # 2 1 Men # 3 2 Women # See age group codes age_codes <- explore_cps_characteristics("ages") head(age_codes, 10) ``` ### Step 3: Search for Relevant Series Use keywords and filters to find the specific series you need: ```{r} # Simple keyword search unemployment_series <- explore_cps_series( search = "unemployment rate", max_results = 10 ) # Filter by demographics women_series <- explore_cps_series( search = "unemployment rate", characteristics = list( sexs_code = "2", # Women ages_code = "00" # 16 years and over ), seasonal = "S", # Seasonally adjusted max_results = 5 ) women_series[, c("series_id", "series_title")] ``` ### Step 4: Retrieve the Data Once you've identified the series ID(s) you need, retrieve the data: ```{r} # Get data for a specific series data <- get_cps_subset( series_ids = "LNS14000002", # Unemployment rate - Women simplify_table = TRUE, cache = TRUE ) # View the data head(data$data) ``` ## Practical Examples ### Example 1: Comparing Unemployment by Sex ```{r} # Step 1: Find unemployment rate series by sex unemployment_by_sex <- explore_cps_series( search = "unemployment rate", characteristics = list(ages_code = "00"), seasonal = "S", max_results = 10 ) # Filter to just the main series for each sex main_series <- unemployment_by_sex |> filter(grepl("^(Seas) Unemployment Rate - (Men|Women)$", series_title)) # Step 2: Get the data unemp_data <- get_cps_subset( series_ids = main_series$series_id, simplify_table = TRUE, cache = TRUE ) # Step 3: Analyze recent trends recent_data <- unemp_data$data |> filter(year >= "2020") |> select(date, value, series_title, sexs_text) # Plot comparison ggplot(recent_data, aes(x = date, y = value, color = sexs_text)) + geom_line(size = 1) + labs( title = "Unemployment Rate by Sex (2020-Present)", x = "Date", y = "Unemployment Rate (%)", color = "Sex" ) + theme_minimal() ``` ### Example 2: Education and Unemployment ```{r} # Step 1: Explore education codes education_codes <- explore_cps_characteristics("education") # Focus on key education levels key_education <- education_codes |> filter(education_code %in% c("11", "19", "40")) # Less than HS, HS grad, Bachelor's+ # Step 2: Search for unemployment series by education edu_series <- explore_cps_series( search = "unemployment rate", characteristics = list( education_code = c("11", "19", "40"), ages_code = "00" ), seasonal = "S", max_results = 20 ) # Step 3: Get the data edu_unemployment <- get_cps_subset( series_ids = edu_series$series_id, simplify_table = TRUE, cache = TRUE ) # Step 4: Compare rates latest_rates <- edu_unemployment$data |> filter(year == "2025") |> group_by(education_text) |> summarize(avg_rate = mean(value, na.rm = TRUE)) |> arrange(avg_rate) latest_rates ``` ### Example 3: Labor Force Participation Trends ```{r} # Find labor force participation rate for women aged 25-54 lfpr_series <- explore_cps_series( search = "labor force participation rate", characteristics = list( sexs_code = "2", # Women ages_code = "33" # 25-54 years ), seasonal = "S", max_results = 5 ) # Get historical data lfpr_data <- get_cps_subset( series_ids = lfpr_series$series_id[1], simplify_table = TRUE, cache = TRUE ) # Analyze long-term trend ggplot(lfpr_data$data, aes(x = date, y = value)) + geom_line(color = "steelblue", size = 1) + geom_smooth(method = "loess", se = FALSE, color = "red", linetype = "dashed") + labs( title = "Labor Force Participation Rate: Women Aged 25-54", subtitle = "Seasonally Adjusted", x = "Year", y = "Participation Rate (%)" ) + theme_minimal() ``` ### Example 4: Demographic Deep Dive Analyze unemployment across multiple demographic dimensions: ```{r} # Explore race codes race_codes <- explore_cps_characteristics("race") race_codes # Find unemployment data by race for young adults race_unemployment <- explore_cps_series( search = "unemployment rate", characteristics = list( ages_code = "20", # 20-24 years race_code = c("01", "03", "04") # White, Black, Asian ), seasonal = "S", max_results = 20 ) # Get the data race_data <- get_cps_subset( series_ids = race_unemployment$series_id, simplify_table = TRUE, cache = TRUE ) # Compare recent rates race_data$data |> filter(year >= "2023") |> group_by(race_text) |> summarize( avg_rate = mean(value, na.rm = TRUE), min_rate = min(value, na.rm = TRUE), max_rate = max(value, na.rm = TRUE) ) ``` ## Advanced Tips ### Combining Multiple Filters You can combine search terms with multiple characteristic filters: ```{r} # Find unemployment data for Hispanic women with some college specific_series <- explore_cps_series( search = c("unemployment", "rate"), characteristics = list( sexs_code = "2", orig_code = "04", # Hispanic or Latino origin education_code = "20" # Some college or associate degree ), seasonal = "S" ) ``` ### Efficient Caching Use a persistent cache directory to avoid re-downloading data: ```{r} # Set up a permanent cache location cache_location <- "C:/BLS_data_cache" # All three functions support caching characteristics <- explore_cps_characteristics("ages", cache_dir = cache_location) series <- explore_cps_series(search = "unemployment", cache_dir = cache_location) data <- get_cps_subset(series_ids = series$series_id[1], cache_dir = cache_location) ``` ### Working with Multiple Series Retrieve data for many series at once: ```{r} # Get data for multiple related series all_age_groups <- explore_cps_series( search = "unemployment rate", characteristics = list(sexs_code = "0"), # Both sexes seasonal = "S", max_results = 50 ) # Filter to specific age breakdowns age_series <- all_age_groups |> filter(grepl("yrs", series_title)) # Get all data at once multi_series_data <- get_cps_subset( series_ids = age_series$series_id, simplify_table = TRUE, cache = TRUE ) ``` ## Understanding the Data Structure The `get_cps_subset()` function returns a `bls_data_collection` object with several components: ```{r} # Get sample data sample_data <- get_cps_subset(series_ids = "LNS14000000") # Access the data str(sample_data) # Main data table head(sample_data$data) # Download diagnostics sample_data$download_diagnostics # Processing summary sample_data$summary ``` ## Best Practices 1. **Start broad, then narrow**: Use `explore_cps_characteristics()` first to understand what's available 2. **Use keyword search**: The `search` parameter in `explore_cps_series()` is very flexible - try different terms 3. **Enable caching**: Set `cache = TRUE` (the default) to speed up repeated queries 4. **Check data ranges**: Different series have different start/end dates - use `begin_year` and `end_year` columns 5. **Simplify when possible**: Use `simplify_table = TRUE` to get clean, analysis-ready data with proper date columns 6. **Review series titles**: Always check the `series_title` to ensure you're getting exactly what you want ## Additional Resources - [BLS CPS Overview](https://www.bls.gov/cps/) - [BLS Labor Force Statistics](https://www.bls.gov/cps/tables.htm) - For other BLS datasets, see `vignette("BLSloadR-intro")` ## Conclusion The CPS data discovery functions in BLSloadR make it easy to: - Discover what demographic breakdowns are available - Search for specific series without knowing exact IDs - Retrieve clean, analysis-ready data By combining `explore_cps_characteristics()`, `explore_cps_series()`, and `get_cps_subset()`, you can efficiently navigate the complex CPS dataset and focus on your analysis.