This patch includes a critical fix for rate-limit issues when downloading data from the BLS. It adds a BLS_USER_AGENT environment variable, whose value is used to populate the file download requests sent to the BLS. Users who encounter a 403 error on most requests will need to set this environment variable to ensure smooth downloads. Additional documentation and warning messages will be implemented in a future patch.
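As a minimal sketch of the setup step described above, the variable can be set for the current session with base R's Sys.setenv(). The value below is a placeholder: the format that BLSloadR or the BLS expects is not stated in these notes, so treat the string as an assumption and substitute one that identifies you.

```r
# Set the user agent for the current R session.
# NOTE: placeholder value (an assumption); replace it with your own string.
Sys.setenv(BLS_USER_AGENT = "your-name-or-org (you@example.com)")

# Confirm the variable is visible to R before calling any download function.
Sys.getenv("BLS_USER_AGENT")
```

To keep the setting across sessions, the same name and value can be placed on a line in your .Renviron file, which R reads at startup.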
explore_cps_characteristics() now accepts a static argument. When set to TRUE, it uses a built-in lookup table to determine the available CPS subset codes. Because the BLS mapping files contain codes that are not in use in the data, this lookup table was built from actual CPS data and avoids filtering to CPS subsets that do not exist. If this argument is not set, or is set to FALSE, the function continues to infer potential mappings from the live BLS data.
national_cps_availability is a new data table included in the package containing a list and description of filters by which a user may subset the CPS data. When a particular data element is specified it provides a list of data subsetting codes, their description, and other available codes by which this data may be subset further.
get_cps_subset() - allows users to pull slices of the full CPS database into a more readable table
explore_cps_characteristics() and explore_cps_series() provide console-based access to the CPS data to guide usage of get_cps_subset()
This release reflows significant sections of code to improve readability. It also consolidates the creation of headers for data requests sent to the BLS, laying the groundwork for future improvements to these headers so they can better reflect a user's actual machine.
The package now includes explicit memory cleanup and notifications about large data files and expected memory usage.
When working with large files, the data file pre-cleaning now samples 10,000 rows instead of the full file to improve performance.
Help documentation for the new functions has been added, as have help articles to guide users in interacting with the CPS data.
A large test suite has been added to the package using testthat for the major functions (both user-facing and background helpers) to identify bugs.
This patch updates BLSloadR to better address 403 and other download errors with two updates.
fread_bls()
Because some BLS series update only infrequently, using a local file cache reduces the need to repeatedly re-download data from the BLS.
BLS_CACHE_DIR can be set to a file path to use as the BLSloadR cache folder. If this is not set, but caching is selected, the system will default to the path given by tools::R_user_dir("BLSloadR", which = "cache")
USE_BLS_CACHE can be used to make functions default to using the cache, without needing to manually set an argument in each call.
To use the cache, pass the cache=TRUE argument in your function call or set the USE_BLS_CACHE environment variable to "TRUE"
In addition to implementing a local file cache, some improvements have been made to the operation of fread_bls() behind the scenes to more efficiently check BLS files for issues like phantom columns. With a local file cache in place, this checking is now the slowest part of the process, so future enhancements may include options to skip some of this processing for files where the BLS file structure is already known and verified.
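The cache settings described above can be sketched in base R as follows. The folder path is a hypothetical example, and resolve_cache_dir() is an illustrative helper that is not part of BLSloadR; it only mirrors the documented fallback from BLS_CACHE_DIR to tools::R_user_dir().

```r
# Point the cache at a folder of your choice (hypothetical path for illustration).
cache_dir <- file.path(tempdir(), "blsloadr-cache")
Sys.setenv(BLS_CACHE_DIR = cache_dir)

# Opt in to caching by default, so cache = TRUE need not be passed in each call.
Sys.setenv(USE_BLS_CACHE = "TRUE")

# Hypothetical helper (not a BLSloadR function) mirroring the documented fallback:
# use BLS_CACHE_DIR when it is set, otherwise the per-user cache directory.
resolve_cache_dir <- function() {
  dir <- Sys.getenv("BLS_CACHE_DIR", unset = "")
  if (nzchar(dir)) dir else tools::R_user_dir("BLSloadR", which = "cache")
}

resolve_cache_dir()
```

Setting the two variables once, for example in .Renviron, means individual calls do not need a cache argument; unsetting BLS_CACHE_DIR falls back to the per-user cache directory.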
Added fast_read option in get_oews() to improve function performance. This option parses the series_id within the data file instead of reading in the full series file, which avoids redundant downloads.
Added vignette documenting use of file cache.
Added article describing usage of get_qcew().
load_bls_dataset():
Added which_data argument to this function, which allows it to be used in a pipeline without manual console entry for any BLS dataset that has exactly 1 series file and at most 1 aspect file.
get_salt()
get_ces() - Major performance improvements with new filtering options:
states parameter: Download data for specific states only (90%+ faster than full download)
industry_filter parameter: Focus on specific industries (retail_trade, manufacturing, etc.)
current_year_only parameter: Get only recent data (2006-present) instead of complete history
get_national_ces() - New specialized dataset options for optimal performance:
dataset_filter parameter with 4 options: all_data, current_seasonally_adjusted, real_earnings_all_employees, real_earnings_production
get_qcew() - New function designed to access the Quarterly Census of Employment and Wages (QCEW):
list_ces_states() - List available states for filtering
list_ces_industries() - List available industry filters with descriptions
show_ces_options() - Comprehensive usage guide for CES options
list_national_ces_options() - List national dataset filter options
show_national_ces_options() - Usage guide for national CES datasets
area_lookup data table has details on QCEW area codes to pre-filter data requests.
ind_lookup data table has details on NAICS codes used in QCEW files.
download_bls_files() (downloads[['key']] vs downloads$key)
suppress_warnings instead of mixed naming)
show_warnings changed to suppress_warnings in get_national_ces() for consistency