I don't think this is necessarily a bug, but it is something that caught me off guard until I thought it through, and could trip up other users, so maybe the solution is adding a bit of documentation.
In areal interpolation (not sure about other cases), if the source geometries contain duplicates or overlaps, the results are wrong. At least for categorical variables (I'm not sure what happens with intensive/extensive ones, but I suspect something similar), some percentages add up to more than 1. My sense is that this happens because more than one source geometry covers the same patch of land, so that patch is counted more than once. Again, this is what the method is expected to do and, arguably, it is a strange case (overlapping or duplicate source geometries are unusual), but maybe it's worth adding a line to the `source_df` documentation?
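To make the double counting concrete, here is a minimal pure-Python sketch. It is not tobler's code: it uses axis-aligned rectangles instead of real polygons, and it assumes a categorical share is computed as the intersected area of each category divided by the target's area. The helper names (`area`, `intersection_area`, `categorical_shares`) and the category labels are made up for illustration.

```python
# Rectangles are (xmin, ymin, xmax, ymax), a stand-in for real polygons.

def area(r):
    """Area of a rectangle."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection_area(a, b):
    """Area shared by two rectangles (0 if they only touch or are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def categorical_shares(sources, target):
    """Share of the target's area covered by each category.

    sources is a list of (rectangle, category) pairs. Every source is
    counted independently, so overlapping sources are counted twice.
    """
    shares = {}
    for rect, category in sources:
        piece = intersection_area(rect, target) / area(target)
        shares[category] = shares.get(category, 0.0) + piece
    return shares


target = (0, 0, 2, 1)

# Two sources that tile the target without overlapping.
planar = [((0, 0, 1, 1), "forest"), ((1, 0, 2, 1), "urban")]

# The same sources plus an exact duplicate of the first one.
with_duplicate = planar + [((0, 0, 1, 1), "forest")]

planar_shares = categorical_shares(planar, target)
dup_shares = categorical_shares(with_duplicate, target)

print(planar_shares, sum(planar_shares.values()))  # shares sum to 1.0
print(dup_shares, sum(dup_shares.values()))        # shares sum to 1.5
```

With the duplicate included, the "forest" share alone reaches 1.0, which is the kind of result that caught me off guard.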
> In areal interpolation (not sure about other cases), if the source geometries have duplicates or overlaps, the results are wrong.
Not quite. The validity depends on the question. If you've got data on, say, overlapping school districts (some private, some public) and you're sending average test scores to a smaller geometry, then the target geometry gets the area-weighted average of the overlapping polygons that cover it (which is what you want in this case). If that small polygon is covered entirely by two overlapping districts, one private and one public, then the target gets 50/50 shares.
If you've got an extensive variable with overlapping sources (and those overlaps are conceptually valid in the source data), then the overlapping sum is correct.
Non-planar geometries are something that can obviously surface a lot in interpolation problems, so I've thought a few times about including some sort of check. Ultimately, though, non-planarity is also a basic data check and something users need to understand about their data, so I've landed on the idea that folks should use https://github.com/sjsrey/geoplanar when they need to check their data.
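A small sketch of both cases, under stated assumptions: rectangles stand in for polygons, intensive values are weighted by each source's share of the total intersected area in the target, and extensive values are scaled by the fraction of each source that falls inside the target. This is an illustration of the reasoning, not tobler's implementation, and the function names and numbers are hypothetical.

```python
# Rectangles are (xmin, ymin, xmax, ymax), a stand-in for real polygons.

def area(r):
    """Area of a rectangle."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection_area(a, b):
    """Area shared by two rectangles (0 if they only touch or are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def interpolate_intensive(sources, target):
    """Area-weighted mean of the source values that intersect the target.

    Assumes at least one source intersects the target; otherwise the
    total intersected area is zero and the division fails.
    """
    pieces = [(intersection_area(rect, target), value) for rect, value in sources]
    total = sum(a for a, _ in pieces)
    return sum(a * v for a, v in pieces) / total


def interpolate_extensive(sources, target):
    """Sum of source values, each scaled by the fraction of the source in the target."""
    return sum(
        value * intersection_area(rect, target) / area(rect)
        for rect, value in sources
    )


target = (0, 0, 1, 1)

# Two overlapping districts (say one public, one private) that both
# cover the whole target. Values are average test scores (intensive).
scores = [((0, 0, 2, 2), 70.0), ((0, 0, 1, 2), 90.0)]
mean_score = interpolate_intensive(scores, target)
print(mean_score)  # 80.0: each district contributes a 50% share

# Same geometries with enrolment counts (extensive).
enrolment = [((0, 0, 2, 2), 400.0), ((0, 0, 1, 2), 100.0)]
total_enrolment = interpolate_extensive(enrolment, target)
print(total_enrolment)  # 150.0 = 400 * 1/4 + 100 * 1/2
```

Whether 80.0 and 150.0 are the "right" answers depends on whether the overlap is meaningful in the source data, which is the point above.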
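For a rough idea of what such a check involves, here is a pure-Python sketch that flags pairs of source geometries whose interiors intersect. It does not use geoplanar's API, which I'm not reproducing here; it assumes axis-aligned rectangles in place of polygons, and `find_overlaps` is a hypothetical helper. For real data, a spatial library is the better tool.

```python
from itertools import combinations

# Rectangles are (xmin, ymin, xmax, ymax), a stand-in for real polygons.

def intersection_area(a, b):
    """Area shared by two rectangles (0 if they only touch or are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def find_overlaps(rects):
    """Return index pairs (i, j), i < j, whose interiors intersect.

    Shared edges do not count as overlaps; exact duplicates do.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rects), 2)
        if intersection_area(a, b) > 0
    ]


sources = [
    (0, 0, 1, 1),      # 0
    (1, 0, 2, 1),      # 1: shares only an edge with 0
    (0.5, 0, 1.5, 1),  # 2: straddles 0 and 1
    (0, 0, 1, 1),      # 3: exact duplicate of 0
]

overlaps = find_overlaps(sources)
print(overlaps)  # [(0, 2), (0, 3), (1, 2), (2, 3)]
```

An empty result means the sources are planar in this simplified sense; a non-empty one tells you which pairs to inspect before interpolating.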
The `source_df` documentation in question: tobler/tobler/area_weighted/area_interpolate.py, line 221 in df0cbc6.
What do you think?