This is a simple Python script that extracts, transforms, and analyzes BEA (Bureau of Economic Analysis) data in order to compare geographies across a range of measures. For each geography, the output is a collection of similar geographies.
The purpose is to provide autosuggest functionality based upon similar geographies as an addition to regional information.
To determine similar geographies, this script relies on the k-nearest-neighbors implementation in the sklearn (scikit-learn) library.
This is done for two reasons:
- A minimum number of similar geographies is required (in this case 6), which K-Means and other clustering algorithms generally cannot guarantee, because cluster sizes are not fixed in advance.
- It is fairly well understood and replicable.
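As a rough illustration of the first point, the sketch below uses sklearn's `NearestNeighbors` on a single measure so that every geography gets exactly 6 neighbors. The geofips codes, the values, and the `N_SIMILAR` name are made up for the example and are not taken from `Comparator.py`, which may be organized differently.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical data: one measure for eight made-up geographies.
geofips = ["01000", "02000", "04000", "05000", "06000", "08000", "09000", "10000"]
values = np.array([10.0, 12.0, 11.0, 50.0, 52.0, 13.0, 9.0, 49.0]).reshape(-1, 1)

N_SIMILAR = 6  # minimum number of similar geographies required (assumed name)

# Ask for one extra neighbor: each point is normally its own nearest neighbor.
knn = NearestNeighbors(n_neighbors=N_SIMILAR + 1).fit(values)
distances, indices = knn.kneighbors(values)

similar = {}
for row, geo in enumerate(geofips):
    # Drop the geography itself, then keep the N_SIMILAR closest others.
    neighbors = [geofips[i] for i in indices[row] if i != row][:N_SIMILAR]
    similar[geo] = neighbors

print(similar["01000"])
```

Because the number of neighbors is a parameter of the query rather than a by-product of the fit, every geography ends up with the same number of suggestions, however the data happens to be distributed.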
I found this post by kevinzakka particularly useful for some more information on K-Nearest-Neighbor (KNN).
Note: This application of KNN is not perfect. You will notice that at no point is a training or test dataset created, and there are no goodness-of-fit tests. The primary reason is that I am intentionally overfitting this data and have no intention of making predictions or further classifications on incoming data. Also, because each measure is analyzed independently, normalization issues across measures are avoided.
Further, this analysis is a snapshot at a single point in time. I really enjoyed this presentation of KNN for time series (though it covers regression and some of the slides are messed up). It may be interesting in the future to do this analysis over a period of time instead.
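The point in the note about analyzing each measure independently can be sketched in plain Python. With a single measure, the neighbor ranking depends only on the order of absolute differences, so changing the unit of the measure does not change the result. The `nearest` helper and the figures below are hypothetical and only serve the illustration.

```python
def nearest(values, target, k):
    """Return the k keys whose value is closest to values[target]."""
    others = [key for key in values if key != target]
    others.sort(key=lambda key: abs(values[key] - values[target]))
    return others[:k]

# Hypothetical figures for four made-up geographies.
gdp = {"A": 100.0, "B": 110.0, "C": 300.0, "D": 95.0}
population = {"A": 5.0, "B": 1.0, "C": 5.5, "D": 9.5}

# Each measure is ranked on its own, so their scales never mix.
by_measure = {
    "gdp": nearest(gdp, "A", 2),
    "population": nearest(population, "A", 2),
}

# Rescaling a single measure leaves its neighbor ranking unchanged.
gdp_rescaled = {key: value / 1000 for key, value in gdp.items()}
same_ranking = nearest(gdp_rescaled, "A", 2) == nearest(gdp, "A", 2)
```

If several measures were combined into one distance instead, the measure with the largest numeric range would dominate unless the columns were normalized first.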
Requirements can be installed using the standard pip command:
pip install -r requirements.txt
This is written in Python 3.6, but also runs in Python 2.7.
The comparator can be run as:
python Comparator.py
An output CSV is created in the output directory with a schema of:
Field | Description |
---|---|
Rank | The ranking of nearest neighbor based on distance |
comparator | The geofips code of the similar geography at this rank |
comparator_name | The comparator's geographical name |
index | The geofips code of the geography being compared |
index_name | The name of the index geography |
code | The measure being compared e.g. GDP_SAN is State GDP in current dollars |
geo | The geographic scope of the row (MSA, State, or County) |
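As a sketch of how this output might feed the autosuggest use case, the snippet below reads rows with the schema above and returns the comparator names for one geography and measure, ordered by rank. The sample rows are invented, the `suggestions` helper is hypothetical, and it assumes `Rank` is stored as an integer; the real file in the output directory may differ.

```python
import csv
import io

# Invented sample rows that follow the schema above.
SAMPLE = """Rank,comparator,comparator_name,index,index_name,code,geo
2,02000,Alaska,01000,Alabama,GDP_SAN,State
1,05000,Arkansas,01000,Alabama,GDP_SAN,State
1,01000,Alabama,05000,Arkansas,GDP_SAN,State
"""

def suggestions(csv_file, index, code):
    """Return comparator names for one index geography and measure, best rank first."""
    rows = [
        row for row in csv.DictReader(csv_file)
        if row["index"] == index and row["code"] == code
    ]
    rows.sort(key=lambda row: int(row["Rank"]))  # assumes Rank is an integer
    return [row["comparator_name"] for row in rows]

names = suggestions(io.StringIO(SAMPLE), "01000", "GDP_SAN")
print(names)
```

To use it on the real output, pass an open file handle for the generated CSV in place of the `io.StringIO` object.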