What Is Databuoy?


Sam edited this page Sep 3, 2015 · 15 revisions

Databuoy is a spreadsheet-backed data catalog that anyone can put online for free. It helps organizations compile a machine-readable data inventory while simultaneously creating a public website that presents it.

With Databuoy, this spreadsheet turns into this website.

What does it mean that Databuoy is spreadsheet-backed?

Databuoy draws its data from a publicly viewable spreadsheet, so updating the spreadsheet updates the website. The spreadsheet can be either a Google Sheet (which updates automatically) or a .csv file (which must be manually updated on GitHub). If you've got a compliant data.json file, Databuoy can also serve that.
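As a rough illustration of what "spreadsheet-backed" means, the sketch below reads catalog rows from CSV text and wraps them in a data.json-style structure. The column names (title, description, keyword), the sample rows, and the rows_to_catalog helper are all assumptions made for illustration. This is not Databuoy's actual code.

```python
import csv
import io
import json

# Hypothetical CSV export of a catalog spreadsheet; the column names are assumptions.
SAMPLE_CSV = """title,description,keyword
Bus Stops,Locations of all bus stops,"transit, buses"
Park Trees,Inventory of trees in city parks,"parks, trees"
"""


def rows_to_catalog(csv_text):
    """Turn CSV text into a data.json-style dict with one entry per row."""
    datasets = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        datasets.append({
            "title": row["title"].strip(),
            "description": row["description"].strip(),
            # Split the comma-separated keyword cell into a list of keywords.
            "keyword": [k.strip() for k in row["keyword"].split(",") if k.strip()],
        })
    return {"dataset": datasets}


catalog = rows_to_catalog(SAMPLE_CSV)
print(json.dumps(catalog, indent=2))
```

Each spreadsheet row becomes one dataset entry, which is why editing the spreadsheet is enough to change what the site shows.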

How does the website get online?

Databuoy uses GitHub Pages to automatically provide a free, publicly accessible website. By copying Databuoy's open-source code on GitHub ("forking" it) and pasting your spreadsheet's URL into the data_location file, you'll have a website at https://your_github_username.github.io/databuoy. You can even set up a custom domain name!

Does my spreadsheet need a particular format?

Yes! Just make a copy of this example spreadsheet, whose columns are based on the US Federal Government's Project Open Data Metadata Schema v1.1. Click here for more information on how to fill out the spreadsheet.
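To get a feel for what the schema expects, here is a minimal sketch that checks one row for the fields the Project Open Data Metadata Schema v1.1 marks as always required. It assumes the spreadsheet's column headers match the schema's field names, which may not be the case in the example spreadsheet. The missing_fields helper and the sample row are hypothetical.

```python
# Fields the Project Open Data Metadata Schema v1.1 lists as always required
# for a dataset. Assumption: spreadsheet column headers use these same names.
REQUIRED_FIELDS = [
    "title",
    "description",
    "keyword",
    "modified",
    "publisher",
    "contactPoint",
    "identifier",
    "accessLevel",
]


def missing_fields(row):
    """Return the required fields that are absent or blank in a row (a dict)."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical spreadsheet row with several required cells left empty.
row = {
    "title": "Bus Stops",
    "description": "Locations of all bus stops",
    "keyword": "transit",
    "modified": "",
    "accessLevel": "public",
}
print(missing_fields(row))
```

A check like this can be run over every row before publishing to catch cells that were left blank.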

Do datasets need to be online for them to be in Databuoy?

No! The schema allows for datasets that are not public, so people can know that a dataset exists even if they don't have access to its contents. That way, we can know that the police have a list_of_criminals_unfit_for_public_circulation.xls file, even if we can't see who's in it.
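In the Project Open Data schema, this is expressed through the accessLevel field, which takes the values "public", "restricted public", or "non-public". The sketch below shows how a catalog might list a non-public dataset alongside a public one. The sample entries and the listing_line helper are hypothetical and do not reflect how Databuoy itself renders entries.

```python
# Two hypothetical catalog entries: one downloadable, one listed without files.
datasets = [
    {
        "title": "Bus Stops",
        "accessLevel": "public",
        "distribution": [{"downloadURL": "https://example.com/bus_stops.csv"}],
    },
    {
        "title": "Internal Case Files",
        "accessLevel": "non-public",
    },
]


def listing_line(dataset):
    """Describe a dataset for the catalog, whether or not it can be downloaded."""
    has_download = any(
        "downloadURL" in dist for dist in dataset.get("distribution", [])
    )
    availability = "download available" if has_download else "listed only, no download"
    return f"{dataset['title']} [{dataset['accessLevel']}]: {availability}"


lines = [listing_line(d) for d in datasets]
for line in lines:
    print(line)
```

Both entries appear in the listing; only the public one points to a file.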

How do I start?

Check out our setup guides for developers and non-developers.