Self-Sufficiency Standard Database
Background
The Self-Sufficiency Standard(SSS) was created by the Center of Women’s Welfare (CWW) at the University of Washington as an alternative to the Official Poverty Measure (OPM). The Self-Sufficiency Standard data is spread across the CWW website and this repository creates a database to hold the Self-Sufficiency Standard data.
Computer setup
See directions here for detailed instructions for users who are not familiar with working with bash, git and python. These instructions cover installation and setup in more detail than the brief sections below.
Installation
Clone this repository using
git clone https://github.com/Center-for-Women-s-Welfare/SSS.git
,
change directories into the newly created SSS
folder and install the
package using pip install .
(including the dot). Note that this will
attempt to automatically install any missing dependencies. If you use
conda you might prefer to first install the dependencies as described in
Dependencies.
To install without dependencies, run pip install --no-deps
Dependencies
If you are using conda
to manage your environment, you may wish to
install the following packages before installing sss
:
Required:
alembic>=1.10
numpy>=1.21
openpyxl>=3.1.0,!=3.1.1
pandas>=1.5.0
pyxlsb>=1.0.8
setuptools_scm>=7.0.3
sqlalchemy>=1.4.16
If you want to do development on sss, in addition to the other dependencies you will also need the following packages:
pytest
pytest-cov
coverage
sphinx
pypandoc
One way to ensure you have all the needed packages is to use the
included sss.yaml
file to create a new environment that will contain
all the optional dependencies along with dependencies required for
testing and development (conda env create -f sss.yml
).
Alternatively, you can specify dev
when installing sss (as in
pip install .[dev]
) to install the packages needed for testing and
documentation development.
Tests
Uses the pytest
package to execute test suite. From the source sss
directory run pytest
or python -m pytest
.
Using the sss package
More detailed usage descriptions are here but brief descriptions are included below for the experienced user.
Developers wishing to modify the code and/or database schema should see the detailed documentation here
Database configuration file
A configuration file, located at ~/.sss/sss_config.json
, is needed
to define where the database file is located on your machine.
It should look like the following, with <<<path-to-dbfile>>>
replaced with the full path (including the file name) on your machine to
the database file and <<<path-to-test-dbfile>>>
replaced with the
full path (including a file name) on your machine to a location where a
test database file can be created (one reasonable option is a file named
test_sss.sqlite
inside the top-level folder for the sss package):
{
"default_db_file": "<<<path-to-dbfile>>>",
"test_db_file": "<<<path-to-test-dbfile>>>"
}
Creating the Database
In normal use, once you create the database, you will not need to do
this again. When testing, however, you may need to delete it (just
delete the sqlite file, with a file browser or with rm
in a
terminal) and re-make it.
To create the database, use the create_database.py
script, which
will create a new database file in the location specified in your
~/.sss/sss_config.json
file.
Inserting Data
To insert data into the database, use the data_to_primary.py
,
data_to_city.py
and data_to_puma.py
scripts. These scripts take
a file or folder containing the data to upload as an argument.
The
data_to_primary.py
script will insert Self-Sufficiency data into the database. It takes either an excel file or folder as an argument, if a folder is passed it will read in all the excel files in that folder.To have the full data (as of August 2022), it must be a folder containing 144 file of the SSS data from 2017-2022, excluding the following files NYC2018_SSS_Full.xlsx and NYC2021_SSS_Full.xlsx. These files are exlcuded because they contain duplicated information.
The
data_to_report.py
script will insert data about the SSS reports into the report table. It takes a single excel file in a specific format as input.The
data_to_geoid.py
script will insert data linking the SSS places to FIPS codes from the census and the CPI regions. It takes two excel files (one for the FIPS info and one for the CPI region info) in specific formats as input.The
data_to_puma.py
script will insert Public Use Microdata Area files from the census into the database. It can take a file or folder containing the puma files for multiple states as input. It also requires a single excel file containing the Washington state and New York City SSS place to census place mappings.The
data_to_city.py
script will insert data linking cities to SSS places with population into the database. It takes a single excel file in a specific format as input.