Function Documentation

Self-Sufficiency Data

Code to define and interact with tables for SSS data.

sss.sss_table.check_extra_columns(df: DataFrame)

Add three boolean columns to dataframe to indicate special cost breakdowns.

Specifically, this function checks if the sss dataframe contains additional cost breakdown for arpa, health care, and broadband_&_cell_phone. If these additional breakdowns are present in the dataframe, we fill each of the breakdown dataframe. If these breakdowns are not present, the new dataframes will be empty. We then update the boolean column to represent whether these breakdowns occur in other helper tables.

Parameters:

df (pandas.dataframe) – this is the dataframe

Returns:

df (pandas.datafranme) – the returned dataframe has additional columns these columns indicate whether there are any secondary tables
arpa (pandas.dataframe) – this dataframe contains the additional breakdown of arpa
health_care (pandas.dataframe) – this dataframe contains the additional breakdown of health care
miscellaneous (pandas.dataframe) – this dataframe contains the additional breakdown of miscellaneous

sss.sss_table.data_folder_to_database(data_path, testing=False)

Read path of the data, add data to SQL table.

Here, we are using a path that holds the data to loop through the SSS data. SSS data is found here: https://selfsufficiencystandard.org/state-data/. We create the pandas.dataframe from the file being read. Then we call a previous function to add boolean columns. We then create the SQL table.

Parameters:

data_folder (str) – path name of the folder or file that we want to read into the database
testing (bool) – If true, use the testing database rather than the default database

sss.sss_table.prepare_for_database(df)

Prepare pandas dataframe for database.

Specifically, this function takes in the dataframe containing data for the primary table. The dataframe is prepared by updating certain columns, adding a new weighted children column, and by dropping all duplicates. This functions is meant to update the dataframe so there are no issues transferring data into the database.

Parameters:: df (pandas.dataframe) – this is the sss dataframe
Returns:: df – the returned dataframe does not contain duplicates
Return type:: pandas.datafranme

sss.sss_table.read_file(file: str)

Read SSS data file into a Pandas dataframe.

Read the sheet of interest (family_type) and standardize the column names.

Parameters:

file (str) – file is the path of the file you want to add to the database

Returns:

df (pandas.datafranme) – the returned dataframe has columns similar to that of the primary table
file (str) – file name that was read into the dataframe

sss.sss_table.remove_state_year(state, year, testing=False)

Remove rows for a specific state and year.

Parameters:

state (str) – The state to remove from live database.
year (int) – The year to remove from live database.
testing (bool) – If true, use the testing database rather than the default database

sss.sss_table.update_columns(data_path, columns=None, testing=False)

Update the specified columns using data in the data_path.

Read in the data from the data_path and update only the specified columns. Assumes that the rows for this data already exist in the database.

Parameters:

data_path (str) – path name of the folder or file that we want to read into the database
columns (list of str) – List of column names to update
testing (bool) – If true, use the testing database rather than the default database

Geographical Identifiers

Code that defines the GeoID table.

sss.geoid.geo_identifier_creator(county_table, cpi_table)

Create geoidentifier info for database.

This function needs the following files: SSScounty-place-list_20220720.xlsx (county_table) StateAbbreviation_Regions_07192022_AKu.xlsx (cpi)

Parameters:

county_file (string) – The path of the county table including FIPS code
cpi_file (string) – The path of the cpi_table including cpi region

Returns:

df_combine – Return the merged table from county_table and cpi_table

Return type:

dataframe

Raises:

ValueError – If columns are not named properly in the input files.

sss.geoid.geoid_to_db(county_table, cpi_table, testing=False)

Read city data into data frame and insert it to database.

Parameters:

county_file (str) – The path of the county table including FIPS code
cpi_file (str) – The path of the cpi_table including cpi region
testing (bool) – If true, use the testing database rather than the default database

PUMA

Create PUMA table.

sss.puma.puma_crosswalk(path, year, nyc_wa_path=None)

Create a puma crosswalk with county and subcounty.

If the puma is in New England Area, use sub county(level 797) as sss_place(place). Otherwise the county is assigned to sss_place(place). For the states of Washington and New York City, some puma code area names are replaced by more meaningful names.

Parameters:

path (str) – path name of txt puma file
year (int) – year of the puma data collected
nyc_wa_path (str) – excel file path that create special areas in NYC and WA

Returns:

A crosswalked dataframe with variables for the database including county_fips, subcounty_fips, puma_code, population, and population weight.

Return type:

pandas.datafranme

Raises:

ValueError – If the nyc_wa_path is None and the path includes NY or WA data.

sss.puma.puma_to_db(path, year, nyc_wa_path=None, testing=False)

Put PUMA data into database.

Requires the 2021 PUMA .txt files found from https://www2.census.gov/geo/docs/reference/puma/ and the crosspath file “SSSplaces_NY&WA_PUMAcode.xlsx”.

Parameters:

path (str) – path name of the PUMA .txt file or folder containing puma .txt files to read into the database
year (int) – year of the puma data collected
nyc_wa_path (str) – excel file path of nyc and washington replacement name list
testing (bool) – If true, use the testing database rather than the default database

sss.puma.read_puma(path, year)

Read puma data as data frame.

The puma data stores by states in txt format.

Parameters:

path (str) – path name of txt puma file
year (int) – year of the puma data collected

Returns:

A dataframe built just from the puma text file with the different levels of fips, population size and other details.

Return type:

pandas.datafranme

City

Code to define and interact with the City table.

sss.city.add_city(path, year)

Read city data into data frame and prepare for database table.

Parameters:

path (str) – path name of city excel file
year (int) – year of the population data collected

Returns:

the returned dataframe has population size by city

Return type:

pandas.datafranme

Raises:

ValueError – If the state cannot be extracted from SSS_city.

sss.city.city_to_db(data_path, year, testing=False)

Read city data as data frame and insert it to database.

The data file needed is called “2020_PopulationDatabyCity_20220804_Ama.xlsx”

Parameters:

path (str) – path name of city excel file
year (int) – year of the population data collected
testing (bool) – If true, use the testing database rather than the default database

Report

Create the report table and functions to interact with table.

sss.report.add_one_entry_reportdb(year, state, analysis_type, cpi_month, cpi_year, update_date, update_person, testing=False)

Insert one record into report table.

Parameters:

year (int) – year of report
state (str) – name of state
analysis_type (str) – analysis type of sss, e.g. full, partial
cpi_month (str) – cpi month of the report, like May
cpi_year (int) – cpi year of the report
update_date (date) – update date, the format is date(2021,6,22)
update_person (str) – who update the report
testing (bool) – If true, use the testing database rather than the default database

sss.report.add_report(path)

Read report data into data frame and perprare for database report table.

Parameters:: path (str) – path name of report excel file
Returns:: the returned dataframe has report record
Return type:: pandas.datafranme

sss.report.delete_one_entry_reportdb(year, state, analysis_type, testing=False)

Delete one report entry.

Parameters:

year (int) – year of report
state (str) – name of state
analysis_type (str) – analysis type of sss, e.g. full, partial
testing (bool) – If true, use the testing database rather than the default database

sss.report.report_to_db(path, testing=False)

Insert report file to the report table.

The report file is named “Year_Type_SSS_CPI month year_20220715_DBu.xlsx”

Parameters:

path (str) – path name of report excel file
testing (bool) – If true, use the testing database rather than the default database