Background
We generate the spatial.parcel table from a County geoDB that is supposed to have geometries for every parcel. Two scripts perform this extraction and transformation from the geoDB: one pulls the raw data, and the other cleans it up and saves it to spatial.parcel.
A comparison between iasworld.pardat (our source of truth for parcel data) and spatial.parcel reveals that they do not always contain an identical set of PIN10s. Run this query in the Athena console to see the counts over time:
with pins_in_pardat_not_county_geodb as (
    select
        par.taxyr as year,
        count(*) as pins_in_pardat_not_county_geodb
    from iasworld.pardat as par
    left join spatial.parcel as sp
        on substr(par.parid, 1, 10) = sp.pin10
        and par.taxyr = sp.year
    where par.cur = 'Y'
        and par.deactivat is null
        and sp.pin10 is null
        -- Exclude years that we know are missing from County geoDB
        and par.taxyr between '2000' and '2024'
    group by par.taxyr
),
pins_in_county_geodb_not_pardat as (
    select
        sp.year,
        count(*) as pins_in_county_geodb_not_pardat
    from spatial.parcel as sp
    left join iasworld.pardat as par
        on sp.pin10 = substr(par.parid, 1, 10)
        and sp.year = par.taxyr
        and par.cur = 'Y'
        and par.deactivat is null
    where par.parid is null
    group by sp.year
)
select *
from pins_in_pardat_not_county_geodb as par
full outer join pins_in_county_geodb_not_pardat as geodb
    using (year)
order by year
It's possible that these discrepancies are the result of irreducible messiness in one or both of the raw data sources, in which case we'll need to either seek help from the stakeholders who own those sources or geocode the missing parcels ourselves (see #720). However, both of those paths will require a lot of work, and before we pursue either of them I want to be confident that our own transformation code is not causing the discrepancies. The results of the query above make me suspect a bug in our code, since the variance in pins_in_pardat_not_county_geodb is much higher than the variance in pins_in_county_geodb_not_pardat.
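To make the two counts concrete, here is a minimal sketch of the same anti-join pattern (left join, then keep rows where the right side is null) run against toy data in SQLite. The table names `pardat` and `parcel` and all rows are made-up stand-ins for iasworld.pardat and spatial.parcel, and SQLite is used here only because it runs locally; Athena may behave differently in details.

```python
import sqlite3

# Toy stand-ins for iasworld.pardat and spatial.parcel; every row is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table pardat (parid text, taxyr text, cur text, deactivat text);
    create table parcel (pin10 text, year text);
    insert into pardat values
        ('01011000010000', '2016', 'Y', null),  -- has a shape
        ('01011000020000', '2016', 'Y', null),  -- missing from parcel
        ('01011000031001', '2016', 'Y', null),  -- two rows share one PIN10,
        ('01011000031002', '2016', 'Y', null),  -- both missing from parcel
        ('01011000040000', '2016', 'N', null);  -- not current, filtered out
    insert into parcel values
        ('0101100001', '2016'),
        ('0101100099', '2016');                 -- shape with no pardat row
""")

# pardat rows with no matching shape: count rows and distinct PIN10s.
pardat_not_geodb = conn.execute("""
    select par.taxyr as year,
           count(*) as n_rows,
           count(distinct substr(par.parid, 1, 10)) as n_pin10
    from pardat as par
    left join parcel as sp
        on substr(par.parid, 1, 10) = sp.pin10
        and par.taxyr = sp.year
    where par.cur = 'Y'
        and par.deactivat is null
        and sp.pin10 is null
    group by par.taxyr
""").fetchall()

# Shapes with no matching current, active pardat row.
geodb_not_pardat = conn.execute("""
    select sp.year, count(*) as n_rows
    from parcel as sp
    left join pardat as par
        on sp.pin10 = substr(par.parid, 1, 10)
        and sp.year = par.taxyr
        and par.cur = 'Y'
        and par.deactivat is null
    where par.parid is null
    group by sp.year
""").fetchall()

print(pardat_not_geodb)  # [('2016', 3, 2)]
print(geodb_not_pardat)  # [('2016', 1)]
```

In the toy data, count(*) reports 3 while the distinct PIN10 count is 2, because two parid values share a PIN10. If the real pardat has the same shape, the first column of the query above counts pardat rows rather than distinct PIN10s, which may be worth keeping in mind when reading its variance.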
Deliverable
Let's double-check the transformation script to confirm that we are not accidentally removing rows from the county geoDB that map to parcels in pardat. I'm imagining this will look like:
Query iasworld.pardat to get all of the PIN10s in 2016, the most recent year with the highest count of missing shapes (this isn't a step in the transformation script, you'll have to write it yourself)
process_parcel_file function for 2016 only
Once we're confident our code looks good, we can plan for a follow-up solution like #720.
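One possible way to structure the check described above is sketched here. The query string reuses the pardat filters from the comparison query to list 2016 PIN10s. `trace_dropped_pins` is a hypothetical helper, not part of the existing scripts, and the stage names are assumptions: the idea is to capture the set of PIN10s at each point in the transformation and report where expected ones disappear. How the query is run against Athena is left to whatever client you already use.

```python
from typing import Iterable, List, Set, Tuple

# Step 1 (assumed form): all current, active PIN10s in pardat for 2016,
# using the same filters as the comparison query.
PARDAT_PIN10_QUERY = """
select distinct substr(parid, 1, 10) as pin10
from iasworld.pardat
where taxyr = '2016'
    and cur = 'Y'
    and deactivat is null
"""


def trace_dropped_pins(
    expected: Iterable[str],
    stages: List[Tuple[str, Iterable[str]]],
) -> List[Tuple[str, Set[str]]]:
    """For each stage, in order, return the expected PIN10s that were
    still present after the previous stage but are absent from this one."""
    report = []
    surviving = set(expected)
    for name, pins in stages:
        present = surviving & set(pins)
        report.append((name, surviving - present))
        surviving = present
    return report


# Made-up example: one PIN10 is absent from the raw extract (an upstream
# gap) and one is lost during cleaning (a possible bug in our code).
expected = {"0101100001", "0101100002", "0101100003"}
stages = [
    ("raw geoDB extract", ["0101100001", "0101100002"]),
    ("after cleaning", ["0101100001"]),
]
report = trace_dropped_pins(expected, stages)
for name, dropped in report:
    print(name, sorted(dropped))
```

PIN10s dropped at the first stage were never in the raw geoDB, which points at the source data; PIN10s dropped at a later stage point at our transformation code.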