You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There is work going in for get_locations(locations, resolution) on issue #36 (PR #37) that checks incoming values against the filesystem to see if they might be a CSV file to load from.
This might be better off split into a separate get_locations_csv(file, resolution) function for a few reasons:
Speculatively calling Path(location_id).exists() on every incoming location ID might cause performance issues when a large number of locations are being requested
There's the possibility that files could exist that have the same name as a location ID string, leading to unexpected errors or erroneous results
Security-wise, it's good to guard against people probing a system via unexpected avenues. If for example I was calling this library function with input passed through from a web form, errors from something like getLocations(["CHC", "AKL, ".dockerenv"]) could tell an attacker whether the code was running in a Docker container.
A separate get_locations_csv(file_location, resolution) function would make interacting with the host file system more intentional. It also provides us with an opportunity to take in either a Path or a file-like object (with a type under typing.TextIO) to read from, which would allow users to read CSV data in from non-traditional sources (e.g. S3 buckets) as well as from their local filesystems.
The text was updated successfully, but these errors were encountered:
DrCuriosity
changed the title
Refactor nzshm_common.location.location.py around reading from CSV files
Proposal: refactor nzshm_common.location.location.py around reading from CSV files
Apr 12, 2024
There is work going in for
get_locations(locations, resolution)
on issue #36 (PR #37) that checks incoming values against the filesystem to see if they might be a CSV file to load from.This might be better off split into a separate
get_locations_csv(file, resolution)
function for a few reasons:Path(location_id).exists()
on every incoming location ID might cause performance issues when a large number of locations are being requestedgetLocations(["CHC", "AKL, ".dockerenv"])
could tell an attacker whether the code was running in a Docker container.A separate
get_locations_csv(file_location, resolution)
function would make interacting with the host file system more intentional. It also provides us with an opportunity to take in either a Path or a file-like object (with a type undertyping.TextIO
) to read from, which would allow users to read CSV data in from non-traditional sources (e.g. S3 buckets) as well as from their local filesystems.The text was updated successfully, but these errors were encountered: