Reading Shapefiles directly from zipfiles #75

Closed
dgleich opened this issue Sep 21, 2022 · 5 comments

@dgleich
Contributor

dgleich commented Sep 21, 2022

Many Shapefiles are distributed directly as zip files.

The routine below shows how to read them directly from the zip file without decompressing it to disk. I used this to read all 3000 zip files from the US road database.

This seems like it might be a useful feature to add to the library. If that's of interest, let me know, as there are a few different ways it could be integrated.

## Code to read shapefiles from zips
import ZipFile, Shapefile
function read_shp_from_zipfile(zipfile)
  r = ZipFile.Reader(zipfile)
  # Collect the .shp, .shx, .dbf, and .prj members of the archive
  shpdata, shxdata, dbfdata, prjdata = nothing, nothing, nothing, nothing
  for f in r.files
    fn = f.name
    lfn = lowercase(fn)
    if endswith(lfn, ".shp")
      # Buffer the geometry in memory so it can be read after the zip is closed
      shpdata = IOBuffer(read(f))
    elseif endswith(lfn, ".shx")
      shxdata = read(f, Shapefile.IndexHandle)
    elseif endswith(lfn, ".dbf")
      dbfdata = Shapefile.DBFTables.Table(IOBuffer(read(f)))
    elseif endswith(lfn, ".prj")
      prjdata = try
        Shapefile.GeoFormatTypes.ESRIWellKnownText(read(f, String))
      catch
        @warn "Projection file $zipfile/$lfn appears to be corrupted. `nothing` used for `crs`"
        nothing
      end
    end
  end
  close(r)
  @assert shpdata !== nothing "no .shp file found in $zipfile"
  shp = if shxdata !== nothing # we have shxdata/index
    read(shpdata, Shapefile.Handle, shxdata)
  else
    read(shpdata, Shapefile.Handle)
  end
  if prjdata !== nothing
    shp.crs = prjdata
  end
  # Note: assumes a .dbf member was found; dbfdata is `nothing` otherwise
  return Shapefile.Table(shp, dbfdata)
end
@visr
Member

visr commented Sep 21, 2022

Thanks for raising the issue and sharing the code. Zipped shapefiles are indeed becoming more common, and other software such as GDAL supports reading them directly. One alternative approach I can think of is using https://github.com/JuliaIO/TranscodingStreams.jl, where users supply the decompressor from CodecZlib. That way we avoid the JLL dependency while still making it easier to load from a compressed file (not just zipfiles).

Though since zipfiles are the most common case and the JLL dependency is small, directly depending on CodecZlib is perhaps also reasonable.

@dgleich
Contributor Author

dgleich commented Sep 21, 2022

So the simplest thing might be to set up Shapefile.jl to accept any iterable of file IOs where each file has a .name field, so you could call:

shp = Shapefile.Table(ZipFile.Reader("myfile.zip").files)

The ".files" object is really a Vector of IOs, so the generic input could be Vector{T} where T <: IO (but this doesn't always give a way to list filenames... hmm...)

This would avoid any dependencies, and still make it pretty easy to use.

It sounds like something similar might exist at some point for Tar files too.
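
A minimal sketch of the generic-input idea, using only Base Julia. The helper `group_by_extension` and the stand-in type `NamedBytes` are hypothetical and not part of Shapefile.jl or ZipFile.jl; the sketch only assumes that each element has a `.name` field and supports `read(f)`, as the entries of `ZipFile.Reader(...).files` do in the routine above.

```julia
# Hypothetical helper, not part of Shapefile.jl: collect the raw bytes of each
# shapefile component from any iterable whose elements have `.name` and `read`.
function group_by_extension(files; wanted = (".shp", ".shx", ".dbf", ".prj"))
    parts = Dict{String,Vector{UInt8}}()
    for f in files
        ext = lowercase(splitext(f.name)[2])  # "ROADS.SHP" gives ".shp"
        if ext in wanted
            parts[ext] = read(f)              # wrap in IOBuffer(...) to parse later
        end
    end
    return parts
end

# Stand-in for an archive entry so the sketch runs without any packages.
struct NamedBytes
    name::String
    data::Vector{UInt8}
end
Base.read(f::NamedBytes) = copy(f.data)

files = [NamedBytes("ROADS.SHP", UInt8[1, 2]), NamedBytes("readme.txt", UInt8[3])]
parts = group_by_extension(files)
```

Because the helper only relies on `.name` and `read`, the same loop would apply to entries from a tar reader or any other container that exposes those two things.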

@rafaqz
Member

rafaqz commented May 25, 2023

@dgleich if you ever want to open a PR with this change, it would be useful.

@asinghvi17
Member

This could also be implemented as an extension, with a nice error message saying that you have to load ZipFile.jl for this to work!
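
A rough sketch of that pattern, with hypothetical names throughout (`read_zipped` and `FakeZipReader` are illustrations, not Shapefile.jl's API): the main package defines a fallback method that raises the helpful error, and the extension adds a more specific method once the zip package is loaded. A stand-in type replaces the real reader so the example runs without any packages.

```julia
# Hypothetical names throughout; nothing here is Shapefile.jl's actual API.

# In the main package: a catch-all method that explains what to load.
function read_zipped(src)
    msg = "reading from $(typeof(src)) needs zip support; " *
          "load the zip package first, e.g. `import ZipFile`"
    throw(ArgumentError(msg))
end

# Stand-in for the reader type that the extension would depend on.
struct FakeZipReader
    path::String
end

# In the extension: a more specific method that takes over for that type.
read_zipped(r::FakeZipReader) = "would read shapefile from $(r.path)"

result = read_zipped(FakeZipReader("roads.zip"))
```

Dispatch picks the specific method when one exists, so users who have not loaded the extra package get the explanatory error instead of a bare `MethodError`.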

@asinghvi17
Member

Solved by #113
