Function Documentation

Self-Sufficiency Data

Code to define and interact with tables for SSS data.

sss.sss_table.check_extra_columns(df: DataFrame)

Add three boolean columns to dataframe to indicate special cost breakdowns.

Specifically, this function checks if the sss dataframe contains additional cost breakdown for arpa, health care, and broadband_&_cell_phone. If these additional breakdowns are present in the dataframe, we fill each of the breakdown dataframe. If these breakdowns are not present, the new dataframes will be empty. We then update the boolean column to represent whether these breakdowns occur in other helper tables.

Parameters:

df (pandas.dataframe) – this is the dataframe

Returns:

  • df (pandas.datafranme) – the returned dataframe has additional columns these columns indicate whether there are any secondary tables

  • arpa (pandas.dataframe) – this dataframe contains the additional breakdown of arpa

  • health_care (pandas.dataframe) – this dataframe contains the additional breakdown of health care

  • miscellaneous (pandas.dataframe) – this dataframe contains the additional breakdown of miscellaneous

sss.sss_table.data_folder_to_database(data_path, testing=False)

Read path of the data, add data to SQL table.

Here, we are using a path that holds the data to loop through the SSS data. SSS data is found here: https://selfsufficiencystandard.org/state-data/. We create the pandas.dataframe from the file being read. Then we call a previous function to add boolean columns. We then create the SQL table.

Parameters:
  • data_folder (str) – path name of the folder or file that we want to read into the database

  • testing (bool) – If true, use the testing database rather than the default database

sss.sss_table.prepare_for_database(df)

Prepare pandas dataframe for database.

Specifically, this function takes in the dataframe containing data for the primary table. The dataframe is prepared by updating certain columns, adding a new weighted children column, and by dropping all duplicates. This functions is meant to update the dataframe so there are no issues transferring data into the database.

Parameters:

df (pandas.dataframe) – this is the sss dataframe

Returns:

df – the returned dataframe does not contain duplicates

Return type:

pandas.datafranme

sss.sss_table.read_file(file: str)

Read SSS data file into a Pandas dataframe.

Read the sheet of interest (family_type) and standardize the column names.

Parameters:

file (str) – file is the path of the file you want to add to the database

Returns:

  • df (pandas.datafranme) – the returned dataframe has columns similar to that of the primary table

  • file (str) – file name that was read into the dataframe

sss.sss_table.remove_state_year(state, year, testing=False)

Remove rows for a specific state and year.

Parameters:
  • state (str) – The state to remove from live database.

  • year (int) – The year to remove from live database.

  • testing (bool) – If true, use the testing database rather than the default database

sss.sss_table.update_columns(data_path, columns=None, testing=False)

Update the specified columns using data in the data_path.

Read in the data from the data_path and update only the specified columns. Assumes that the rows for this data already exist in the database.

Parameters:
  • data_path (str) – path name of the folder or file that we want to read into the database

  • columns (list of str) – List of column names to update

  • testing (bool) – If true, use the testing database rather than the default database

Geographical Identifiers

Code that defines the GeoID table.

sss.geoid.geo_identifier_creator(county_table, cpi_table)

Create geoidentifier info for database.

This function needs the following files: SSScounty-place-list_20220720.xlsx (county_table) StateAbbreviation_Regions_07192022_AKu.xlsx (cpi)

Parameters:
  • county_file (string) – The path of the county table including FIPS code

  • cpi_file (string) – The path of the cpi_table including cpi region

Returns:

df_combine – Return the merged table from county_table and cpi_table

Return type:

dataframe

Raises:

ValueError – If columns are not named properly in the input files.

sss.geoid.geoid_to_db(county_table, cpi_table, testing=False)

Read city data into data frame and insert it to database.

Parameters:
  • county_file (str) – The path of the county table including FIPS code

  • cpi_file (str) – The path of the cpi_table including cpi region

  • testing (bool) – If true, use the testing database rather than the default database

PUMA

Create PUMA table.

sss.puma.puma_crosswalk(path, year, nyc_wa_path=None)

Create a puma crosswalk with county and subcounty.

If the puma is in New England Area, use sub county(level 797) as sss_place(place). Otherwise the county is assigned to sss_place(place). For the states of Washington and New York City, some puma code area names are replaced by more meaningful names.

Parameters:
  • path (str) – path name of txt puma file

  • year (int) – year of the puma data collected

  • nyc_wa_path (str) – excel file path that create special areas in NYC and WA

Returns:

A crosswalked dataframe with variables for the database including county_fips, subcounty_fips, puma_code, population, and population weight.

Return type:

pandas.datafranme

Raises:

ValueError – If the nyc_wa_path is None and the path includes NY or WA data.

sss.puma.puma_to_db(path, year, nyc_wa_path=None, testing=False)

Put PUMA data into database.

Requires the 2021 PUMA .txt files found from https://www2.census.gov/geo/docs/reference/puma/ and the crosspath file “SSSplaces_NY&WA_PUMAcode.xlsx”.

Parameters:
  • path (str) – path name of the PUMA .txt file or folder containing puma .txt files to read into the database

  • year (int) – year of the puma data collected

  • nyc_wa_path (str) – excel file path of nyc and washington replacement name list

  • testing (bool) – If true, use the testing database rather than the default database

sss.puma.read_puma(path, year)

Read puma data as data frame.

The puma data stores by states in txt format.

Parameters:
  • path (str) – path name of txt puma file

  • year (int) – year of the puma data collected

Returns:

A dataframe built just from the puma text file with the different levels of fips, population size and other details.

Return type:

pandas.datafranme

City

Code to define and interact with the City table.

sss.city.add_city(path, year)

Read city data into data frame and prepare for database table.

Parameters:
  • path (str) – path name of city excel file

  • year (int) – year of the population data collected

Returns:

the returned dataframe has population size by city

Return type:

pandas.datafranme

Raises:

ValueError – If the state cannot be extracted from SSS_city.

sss.city.city_to_db(data_path, year, testing=False)

Read city data as data frame and insert it to database.

The data file needed is called “2020_PopulationDatabyCity_20220804_Ama.xlsx”

Parameters:
  • path (str) – path name of city excel file

  • year (int) – year of the population data collected

  • testing (bool) – If true, use the testing database rather than the default database

Report

Create the report table and functions to interact with table.

sss.report.add_one_entry_reportdb(year, state, analysis_type, cpi_month, cpi_year, update_date, update_person, testing=False)

Insert one record into report table.

Parameters:
  • year (int) – year of report

  • state (str) – name of state

  • analysis_type (str) – analysis type of sss, e.g. full, partial

  • cpi_month (str) – cpi month of the report, like May

  • cpi_year (int) – cpi year of the report

  • update_date (date) – update date, the format is date(2021,6,22)

  • update_person (str) – who update the report

  • testing (bool) – If true, use the testing database rather than the default database

sss.report.add_report(path)

Read report data into data frame and perprare for database report table.

Parameters:

path (str) – path name of report excel file

Returns:

the returned dataframe has report record

Return type:

pandas.datafranme

sss.report.delete_one_entry_reportdb(year, state, analysis_type, testing=False)

Delete one report entry.

Parameters:
  • year (int) – year of report

  • state (str) – name of state

  • analysis_type (str) – analysis type of sss, e.g. full, partial

  • testing (bool) – If true, use the testing database rather than the default database

sss.report.report_to_db(path, testing=False)

Insert report file to the report table.

The report file is named “Year_Type_SSS_CPI month year_20220715_DBu.xlsx”

Parameters:
  • path (str) – path name of report excel file

  • testing (bool) – If true, use the testing database rather than the default database