Complete Refactor of Ulmo #109

Open
dharhas opened this issue Jun 8, 2015 · 8 comments

Comments

@dharhas
Contributor

dharhas commented Jun 8, 2015

I'm planning a complete refactor of ulmo with the following (major) target features:

  • Move to a plugin system for the services (using stevedore)
    • This will allow enforcement of a more consistent API (e.g. get_sites, get_stations, etc. would be harmonized, while still retaining flexibility for individual plugins).
    • Allow including closed plugins that are for non-public datasets
  • Consistently use Pandas DataFrames internally with options to serialize to Python Dicts and GeoJSON
  • Move duplicate/common functionality into a common place
  • Create a common caching system, i.e. expand the HDF5 cache ability that ulmo.usgs.nwis.hdf has to all services
  • Python 3 support (while retaining Python 2.7 compatibility) using the 'six' module

I will probably tag the release as 1.0 to indicate that it will break backwards compatibility.

If folks are interested in contributing to the refactor or wish to discuss these changes in more detail, please comment on this issue.

@emiliom
Contributor

emiliom commented Jun 9, 2015

I might be able to contribute, depending on the timing, etc. If nothing else, by posting this comment I'm adding myself to the notifications on discussions on this issue.

Quick question/comment: Are you considering using GeoPandas when the DataFrame has a spatial component? I've used GeoDataFrames a bit, but not enough to have a solid opinion regarding its maturity.

@dharhas
Contributor Author

dharhas commented Jun 9, 2015

I'm actually researching spatial indexes right now. I'm leaning away from GeoPandas since it depends on shapely and fiona, and hence on GEOS and GDAL, which are a pain to install cross-platform. I'm considering having a geometry column that is just an array of coords (Point/Line/Poly) to enable some simple bbox filtering.
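A minimal sketch of the geometry-column idea, assuming each feature stores its geometry as a plain list of (lon, lat) pairs instead of a shapely object. `bbox_of`, `intersects` and `filter_by_bbox` are hypothetical helper names. A feature is kept when its own bounding box overlaps the query box, which is a coarse test: a line's bounding box can overlap the query box even when the line itself does not.

```python
def bbox_of(coords):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of (lon, lat) pairs."""
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return min(lons), min(lats), max(lons), max(lats)


def intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap; touching edges count."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_by_bbox(features, query_bbox):
    """Keep features whose geometry's bounding box intersects query_bbox.

    Each feature is a dict whose 'geometry' key holds a list of (lon, lat)
    pairs: a Point is a one-element list, a line or polygon ring a longer one.
    Nothing here needs GEOS or GDAL.
    """
    return [f for f in features if intersects(bbox_of(f["geometry"]), query_bbox)]
```

For example, `filter_by_bbox(stations, (-106.6, 25.8, -93.5, 36.5))` would keep stations inside a box roughly covering Texas. A spatial index could later replace the linear scan without changing this interface.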

@emiliom
Contributor

emiliom commented Jun 9, 2015

Makes sense.

@dharhas
Contributor Author

dharhas commented Jul 23, 2015

@emiliom @wilsaj @nathanhilbert @cameronbracken

So, do folks have any preference between these two API approaches? I'm leaning towards b) but wanted to get some input.

a) Flat API. You pass service name and dataset name and any other parameters to each call:

stations = ulmo.get_features('usgs-nwis', 'iv', state='TX')
data = ulmo.get_data('usgs-nwis', 'iv', features='00824562', start='2014-01-01', parameters='00600')

This makes the API simpler and clearer to use but potentially less flexible. For example, not all services have something analogous to get_features (see usgs.eddn; we would have to raise a NotImplementedError there). It is also a bit more verbose to type, and the API would have to cover all the main use cases.
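A minimal sketch of how the flat API in option a) might dispatch, assuming a module-level registry that maps (service, dataset) pairs to plugin objects. `_REGISTRY`, `NwisIvPlugin`, `EddnPlugin` and the `'usgs-eddn'`/`'default'` names are hypothetical stand-ins, not existing ulmo classes. The point is that a service without a features concept has to raise NotImplementedError from the shared top-level call.

```python
class NwisIvPlugin:
    """Hypothetical stand-in for a USGS NWIS instantaneous-values plugin."""

    def get_features(self, **params):
        return [{"id": "00824562", "state": params.get("state")}]

    def get_data(self, **params):
        return {"features": params.get("features"), "values": []}


class EddnPlugin:
    """Hypothetical stand-in for a service with no station/feature listing."""

    def get_data(self, **params):
        return {"values": []}


_REGISTRY = {
    ("usgs-nwis", "iv"): NwisIvPlugin(),
    ("usgs-eddn", "default"): EddnPlugin(),
}


def _plugin(service, dataset):
    try:
        return _REGISTRY[(service, dataset)]
    except KeyError:
        raise ValueError("unknown service/dataset: %s/%s" % (service, dataset)) from None


def get_features(service, dataset, **params):
    """Flat entry point: every call names the service and dataset explicitly."""
    plugin = _plugin(service, dataset)
    if not hasattr(plugin, "get_features"):
        raise NotImplementedError("%s does not support get_features" % service)
    return plugin.get_features(**params)


def get_data(service, dataset, **params):
    return _plugin(service, dataset).get_data(**params)
```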

b) Load a plugin by specifying service and dataset, and then use that.

nwis = ulmo.load_service('usgs-nwis', 'iv')
stations = nwis.get_features(state='TX')
data = nwis.get_data(features='00824562', start='2014-01-01', parameters='00600')

This API is more flexible since each plugin could define its own API. We would have some base classes for similar plugins (e.g. timeseries, raster, etc.) to keep the API reasonably consistent across plugins.
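A minimal sketch of option b), assuming one abstract base class per data type (only time series here) that plugins subclass. `TimeseriesService`, `NwisIv`, `get_site_info` and the `_PLUGINS` mapping are hypothetical; in the planned design the lookup would go through stevedore rather than a hard-coded dict.

```python
from abc import ABC, abstractmethod


class TimeseriesService(ABC):
    """Hypothetical base class that fixes the common API for time-series plugins."""

    def __init__(self, dataset):
        self.dataset = dataset

    @abstractmethod
    def get_features(self, **params):
        """Return a list of feature (station) records."""

    @abstractmethod
    def get_data(self, features, start=None, parameters=None):
        """Return data for the given features."""


class NwisIv(TimeseriesService):
    """Hypothetical plugin; a real one would query the NWIS web service."""

    def get_features(self, state=None):
        return [{"id": "00824562", "state": state}]

    def get_data(self, features, start=None, parameters=None):
        return {"features": features, "start": start, "parameters": parameters}

    # Plugin-specific extra method that the base class does not require.
    def get_site_info(self, site):
        return {"id": site, "dataset": self.dataset}


_PLUGINS = {"usgs-nwis": NwisIv}


def load_service(service, dataset):
    """Instantiate the plugin class registered for `service`, bound to `dataset`."""
    return _PLUGINS[service](dataset)
```

The base class guarantees that every time-series plugin offers `get_features` and `get_data` (instantiating a plugin that skips one raises TypeError), while individual plugins stay free to add extra methods.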

@jirikadlec2
Contributor

How would you use b) with the CUAHSI WaterOneFlow / WaterML web services?

Would it be something like:

cuahsi = ulmo.load_service('cuahsi-his', 'http://hydroportal.cuahsi.org/GLEON_Sunapee/cuahsi_1_1.asmx')
stations = cuahsi.get_features()
data = cuahsi.get_data(features='GLEON_Sunapee:SUNAPEE', variable='GLEON_Sunapee:watertemp', method=9)

or would you consider the 'CUAHSI HIS Central' as a service and each of the HydroServers as a dataset?

@jirikadlec2
Contributor

Regarding Python 3 support, I suggest removing the dependency on suds. To my knowledge, the suds package only exists for Python 2, and ulmo doesn't really use it except for the CUAHSI WaterOneFlow module. One actively maintained replacement package to consider is PySimpleSOAP: https://pypi.python.org/pypi/PySimpleSOAP

@dharhas
Contributor Author

dharhas commented Jul 23, 2015

The API I am considering is:

ulmo.list_services -> Gets a list of services available to ulmo (e.g. nwis, cdec, etc.)
ulmo.list_datasets -> Gets a list of datasets available for a given service. (There might only be one)
ulmo.load_service -> Loads a specific service/dataset combo.

So I think 'CUAHSI HIS Central' would be the service and each HydroServer would be a dataset.
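A minimal sketch of these three calls, treating CUAHSI HIS Central as one service whose datasets are the individual HydroServer endpoints. The registry layout and the `CuahsiHis` and `Nwis` classes are hypothetical; the GLEON_Sunapee URL is the one from the question in this thread, and a real plugin would query HIS Central for the list of HydroServers instead of hard-coding it.

```python
class CuahsiHis:
    """Hypothetical plugin: one instance per HydroServer (dataset)."""

    # A real plugin would fetch this mapping from HIS Central at run time.
    DATASETS = {
        "GLEON_Sunapee": "http://hydroportal.cuahsi.org/GLEON_Sunapee/cuahsi_1_1.asmx",
    }

    def __init__(self, dataset):
        self.dataset = dataset
        self.url = self.DATASETS[dataset]


class Nwis:
    """Hypothetical NWIS plugin with instantaneous (iv) and daily (dv) datasets."""

    DATASETS = {"iv": None, "dv": None}

    def __init__(self, dataset):
        self.dataset = dataset


_SERVICES = {"cuahsi-his": CuahsiHis, "usgs-nwis": Nwis}


def list_services():
    """Names of all services ulmo knows about."""
    return sorted(_SERVICES)


def list_datasets(service):
    """Datasets offered by one service; often just a single entry."""
    return sorted(_SERVICES[service].DATASETS)


def load_service(service, dataset):
    """Return a plugin instance bound to one service/dataset combination."""
    if dataset not in _SERVICES[service].DATASETS:
        raise ValueError("%r is not a dataset of %r" % (dataset, service))
    return _SERVICES[service](dataset)
```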

The other change I'm considering is moving each service into its own repo as an extension, potentially with its own maintainer. The pattern would be similar to the extension system the 'flask' package uses, i.e. we would have a separate Python package, ulmo (with common functions and base classes), plus ulmo_cuahsi, ulmo_nwis, etc.

Advantages:

  • ulmo.list_services() would list which extensions were installed
  • individual service extensions could be maintained by folks who use them heavily and are available to contribute.
  • each extension can have its own dependencies. i.e. cuahsi-his requires suds but other services don't.
  • We could also pin extensions to explicit versions of ulmo. For example, if cuahsi-his is not py3 ready we could pin it to the last py2 version of ulmo.
  • you could easily write your own closed/internal plugins
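stevedore builds on setuptools entry points, so each extension package could advertise itself under a shared entry-point group, and `ulmo.list_services()` would simply enumerate that group. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+) instead of stevedore. The group name `ulmo.services` and the helper names are assumptions, not an existing ulmo convention.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group that each ulmo-<extension> package would register under,
# e.g. in its setup.py: entry_points={"ulmo.services": ["cuahsi-his = ulmo_cuahsi:CuahsiHis"]}
SERVICE_GROUP = "ulmo.services"


def _service_entry_points(group=SERVICE_GROUP):
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ objects support .select(group=...)
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a dict keyed by group


def list_services(group=SERVICE_GROUP):
    """Names of all installed service extensions."""
    return sorted(ep.name for ep in _service_entry_points(group))


def load_service_class(name, group=SERVICE_GROUP):
    """Import and return the plugin class an installed extension registered as `name`."""
    for ep in _service_entry_points(group):
        if ep.name == name:
            return ep.load()
    raise LookupError("no installed ulmo extension provides %r" % name)
```

With this design, installing or uninstalling an extension package is all it takes to change what `list_services()` reports; the core package never needs to know the extensions in advance.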

Disadvantage:

The main disadvantage of this approach is that you would now have to install several modules to get full functionality, but I guess we could make a meta-package that pulls in all supported plugins.

I don't have the bandwidth to support all the data sources, so distributing the load would help out enormously. @jirikadlec2, would you be available to convert from suds to PySimpleSOAP? I currently don't use HIS services much.

@dharhas
Contributor Author

dharhas commented Jul 23, 2015

I'm going to experiment with the approach of having a package called 'ulmo-common' and converting the services into extensions named 'ulmo-extensionname'.
