API for downloading files from dataunderground.org with Python #24
santisoler
started this conversation in
Projects
@kwinkunks @EvanBianco you might be interested in this, and you could surely give us some insight into the details of dataunderground.org.
Slack Channel
People
Feel free to join
Description
Dataunderground.org is an awesome place to find open data. It's great for discovering new datasets that might be helpful in our research, that can serve as real-world examples when teaching, or that can be used as sample datasets for any piece of software we are developing.
Currently, the only way to download a dataset from dataunderground.org is to open the website in a browser and manually download the file we want. The UI is very intuitive: all the important information about the dataset is displayed and the download links are easy to find. However, once we have spotted a very cool dataset we want to start working with, we have to repeat the manual download every time we change computers. For example, if a colleague wants to run the same notebook we've been developing, they have to manually download the dataset beforehand.
I was thinking about building a Python API that would allow us to download any dataset hosted on dataunderground.org directly from our Python scripts or Jupyter notebooks. All the backend implementation for downloading the files, caching them and running checksums is already available in Pooch, so this API would be very thin if we use Pooch as the main dependency.
For example, the API could be used like this:
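A minimal sketch of such a call, assuming a hypothetical fetch() wrapper built on pooch.retrieve and pooch.os_cache. The function name, its arguments, the cache folder name and the URL in the usage comment are all placeholders for illustration; none of them is an existing dataunderground.org API.

```python
from pathlib import Path


def fetch(url, known_hash=None, cache_dir=None):
    """Download a file once, verify it and return its local path.

    Hypothetical wrapper: the name and arguments are assumptions made for
    this sketch, not an existing API.
    """
    # Imported here so this module can be loaded even before Pooch is installed.
    import pooch

    if cache_dir is None:
        # Per-user cache folder; "dataunderground" is an assumed project name.
        cache_dir = pooch.os_cache("dataunderground")

    # pooch.retrieve downloads the file only if it is not cached yet and,
    # when known_hash is given (e.g. "sha256:..."), checks it against the file.
    # With known_hash=None the hash check is skipped.
    path = pooch.retrieve(url=url, known_hash=known_hash, path=cache_dir)
    return Path(path)


# Usage (the URL is a placeholder, not a real dataset):
# path = fetch("https://dataunderground.org/<path-to-file>.csv")
# data = pandas.read_csv(path)
```

Passing the full URL keeps the sketch independent of how dataunderground.org organises its datasets; a real implementation would more likely accept a dataset identifier and resolve the URL and hash itself.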
The function could work similarly to the Pooch.fetch method: it downloads the dataset (if not already cached), runs a checksum and returns its path, so we can then load it with a suitable tool (Pandas, Xarray, LASIO, ObsPy, etc.).
Goals
A fetch() function in the API that downloads the dataset, caches it, runs a checksum and returns its path.
Resources
Skills (if needed)
Because the heavy lifting is already done in Pooch, this project is very accessible. We need people who can help write the API and its documentation. We also need to test how it works with different types of datasets and to find bugs in the code or mistakes in the documentation. Everyone is welcome; you don't need to be a developer guru to join!