API for downloading files from dataunderground.org with Python #24

santisoler · 2021-04-16T13:21:40Z

santisoler
Apr 16, 2021
Maintainer

Slack Channel

has to be created

People

Santiago Soler

Feel free to join

Description

Dataunderground.org is an awesome place to find open data. It's great for discovering new datasets that might be helpful in our research, that can serve as real-world examples when teaching, or that can act as sample datasets for any piece of software we are developing.
Currently, the way to download a dataset from dataunderground.org is to open its website in a browser and manually download the file we want. The UI is very intuitive: all the important information about the dataset is displayed and the download links are easy to find. However, once we've spotted a very cool dataset we want to start working with, we have to repeat the manual download every time we change computers. For example, if a colleague wants to run the same notebook we've been developing, they have to manually download the dataset beforehand.

I was thinking about building a Python API that would let us download any dataset available on dataunderground.org directly from our Python scripts or Jupyter notebooks. All the backend machinery for downloading the files, caching them and running checksums already exists in Pooch, so this API would be very thin if we use Pooch as the main dependency.

For example, the API could be used like this:

import dataunderground

# Fetch a dataset from dataunderground.org
fname = dataunderground.fetch(id="...")

The function could work much like the Pooch.fetch method does: it downloads the dataset (if not already cached), runs a checksum and returns its path so we can then load it with the appropriate tool (Pandas, Xarray, LASIO, ObsPy, etc).
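To make the idea a bit more concrete, here is a rough sketch of how such a fetch() might sit on top of Pooch. Everything specific in it is an assumption for illustration: the REGISTRY dictionary, the lookup() helper, the "example-dataset" id and the example.org URL are all hypothetical, and how dataset ids would actually map to download links on dataunderground.org is still an open question. The only Pooch calls used are pooch.retrieve and pooch.os_cache.

```python
"""Sketch of a thin fetch() built on Pooch (all dataset names are hypothetical)."""

# Hypothetical registry: dataset id -> (download URL, expected hash).
# The entry below is a placeholder, not a real dataunderground.org record.
REGISTRY = {
    "example-dataset": (
        "https://example.org/data/example.csv",
        None,  # a real entry would carry something like "sha256:<hex digest>"
    ),
}


def lookup(id):
    """Return the (url, known_hash) pair registered for a dataset id."""
    try:
        return REGISTRY[id]
    except KeyError:
        raise ValueError(f"Unknown dataset id: {id!r}") from None


def fetch(id):
    """Download the dataset if it is not cached yet and return its local path."""
    # Imported here so the registry helpers can be used without Pooch installed.
    import pooch

    url, known_hash = lookup(id)
    return pooch.retrieve(
        url=url,
        known_hash=known_hash,  # None skips verification; a hash string enables it
        path=pooch.os_cache("dataunderground"),  # per-user cache directory
    )
```

With something like this, fetch("example-dataset") would return a path that can be passed straight to the loader of choice. The argument is called id only to match the usage proposed above; it shadows the Python builtin, so a name like dataset_id might be a safer choice.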

Goals

Create a thin API for downloading datasets from dataunderground.org
Write a fetch() function in the API that downloads the dataset, caches it, runs a checksum and returns its path.
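For anyone who wants to test the "runs a checksum" part of the second goal, this is roughly what that step amounts to. It is a standard-library-only illustration with hypothetical helper names (sha256_of, verify); Pooch already performs this kind of check internally and also exposes pooch.file_hash, so the real API would not need to reimplement it.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large datasets never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return the path if its SHA256 digest matches, otherwise raise ValueError."""
    actual = sha256_of(path)
    if actual != expected.lower():
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
    return path
```

A helper like this could be handy when checking that the hashes stored for each dataset still match what the site serves.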

Resources

Skills (if needed)

Because the heavy lifting is already done in Pooch, this project is very accessible. We would need people who could help write the API and its documentation. We would also need to test how it works with different types of datasets and to find bugs or mistakes in the documentation. Everyone is welcome, you don't need to be a developer guru to join!

santisoler · 2021-04-16T13:22:24Z

santisoler
Apr 16, 2021
Maintainer Author

@kwinkunks @EvanBianco you might be interested in this and surely could give us some insight about the details of dataunderground.org

2 replies

kwinkunks Apr 16, 2021
Maintainer

Definitely interested, I think it sounds awesome.

Have you seen Intake? I think it might solve part of the problem (or maybe an adjacent problem). I never got around to really digging into it, but check out the use-cases -- they sound awesome: https://intake.readthedocs.io/en/latest/use_cases.html

brendonhall Apr 21, 2021

This is a great idea. Ben Lasscock (@blasscoc) and I have experimented with some ideas like this for getting data for the FORCE contest:
https://github.com/brendonhall/FORCE-2020-Lithology/blob/master/notebooks/01-Log-Plot-MPL.ipynb
https://github.com/blasscoc/easy-as/blob/master/notebooks/Access%20Ichthys3D%20Seismic%20Data.ipynb

We've also messed around with an API for getting seismic data for the GSH contest:
https://github.com/VapeJordan/rss
