Include JSON in repository? #5

Open
jochakovsky opened this issue May 26, 2016 · 14 comments

Comments

@jochakovsky

http://data.okfn.org/data/core/country-list makes both CSV and JSON formats available for download, but only the CSV is directly available in this repository. Would it be possible to include the JSON in this repository as well? Thank you!

@rufuspollock
Member

@jochakovsky what's the exact use-case?

In general, we want to keep this as a clean tabular data package, which means CSV only.

However, we also want to support user needs, so it's good to know the requirement :-)

@Glutnix

Glutnix commented Oct 13, 2016

A popular package such as https://www.npmjs.com/package/country-list could really do with getting its dependencies from here.

@fannarsh

Currently the npm package country-list is getting the data from here, but it needs to convert it from CSV to JSON.

@rufuspollock
Member

@fannarsh really useful data point. How are you getting this data package? Are you submoduling it, pulling it directly from raw, or getting it from data.okfn.org/data/country-list? I note the latter already has a JSON version via the API, but it sounds like having JSON in the repository would be useful.

Let me know a bit more about what you'd like and let's see if we can get it working for you 😄

@fannarsh

Currently I'm pulling it once (per update) from raw, then converting it to JSON and storing it in my repo.
To be honest, whether the data is in CSV or JSON doesn't matter much to me, but of course it would be nicer and a little more convenient to be able to download JSON directly.
On the other hand, if you published the data as an npm package, I could simply require that package as a dependency and never worry about needing to update the data myself. That would also make it easier for other developers to include your data in other projects (if there are such developers that wouldn't like to use my module 😀 ).
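
For illustration, the CSV-to-JSON conversion step described above could look roughly like the sketch below. It is not the script the country-list npm package actually uses. The function names `parseCsvLine` and `csvToJson` and the sample rows are made up for this example, and it assumes a header row, comma delimiters, and no line breaks inside quoted fields.

```javascript
// Split one CSV line into fields, honouring double-quoted fields
// (so a name such as "Bolivia, Plurinational State of" stays in one piece).
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // a doubled quote inside a quoted field is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Turn CSV text with a header row into an array of plain objects.
function csvToJson(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const header = parseCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const values = parseCsvLine(line);
    const row = {};
    header.forEach((name, i) => {
      row[name] = values[i];
    });
    return row;
  });
}

// Hypothetical sample input, not the real data file.
const sample = 'Name,Code\nIceland,IS\n"Bolivia, Plurinational State of",BO\n';
console.log(JSON.stringify(csvToJson(sample), null, 2));
```

In a real setup the CSV text would be read from the downloaded file rather than from an inline string, and a dedicated CSV library would be a safer choice if the data ever contains multi-line fields.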

@rufuspollock
Member

@fannarsh that's really useful clarification.

And building this into an npm package is a really nice idea. If we did publish an npm package, I assume you'd want it as JSON, or would CSV work (or both)? And would it literally just be the raw data in there (plus datapackage.json)?

@fannarsh

I would just add the JSON, and I wouldn't bother adding the datapackage.json either, since it's my understanding that it's metadata about the CSV data/structure.
If I needed to make any edits to the data or do bigger work, I would probably clone the repo and work from that. The npm package itself would never be a source for doing actual editing work on the data.

@fannarsh

I added a pull request with an npm package definition that would be good enough for my use case.
#6

@rufuspollock
Member

@fannarsh I'm reviewing the PR, and thank you for this.

Just wondering atm whether we want this in this repo or in a separate repo. We may not want node stuff in here, but rather in a small separate repo.

@rufuspollock
Member

@fannarsh just to say I'm working on this - classic coder issue of trying to make something generic to generate these automatically. If I don't get this sorted soon I'll just take your version and post ...

@fannarsh

hehe, no sweat, I recognise that problem :)

@rufuspollock
Member

@fannarsh @jochakovsky we have something to check out now: an npm/node branch in this repo and a published package on npm.

https://github.com/datasets/country-list/tree/npm

https://www.npmjs.com/package/@datasets/country-list

It would be great to get your feedback and thoughts here, especially as going forward we are committed to doing this node packaging for more and more of the core datasets. For example:

  • What would be the best way to generate these node packages so they are useful to other folks in the node community, both end-user developers and other package maintainers? For example, should it be totally minimal, i.e. just JSON, or should it include a minimal API? (For the moment we've added a small API inspired by what you did @fannarsh.)
  • What other datasets are a priority for node packaging? Currency codes? Country flags?
  • Should we publish inside an npm org, e.g. @datasets as we did with this one, or is it better to have something like country-list-data?

Any other comments or thoughts warmly welcome.

Aside: we have never forgotten about this. It has just taken a crazy long time for various reasons, including some classic yak-shaving: we've been doing a major reboot of https://data.okfn.org and https://datahub.io/ -- which have merged together. Part of that is being able to do a lot of automation, ranging from generating the JSON to generating node packages from data packages ...

@fannarsh

fannarsh commented Sep 5, 2018

Hi @rufuspollock,

I like what you guys have done so far.

I think that you should keep the packages under the @datasets org, and I like the idea of providing a minimal API like you have done with @datasets/country-list.
However, I would like to see another package that would be data only. It could be named @datasets/country-list-data, so that I, for example, could require only the data and keep my package up to date in an easy way.
The @datasets/country-list package could even do the following:

// https://github.com/datasets/country-list/blob/npm/index.js
let countryList = require('@datasets/country-list-data');
// instead of
// let countryList = require('./data.json');

But that all depends on how you want to maintain the packages/repos.

If you would rather not release a pure @datasets/country-list-data package, then I would suggest adding a data export to the API so that I could access the data and use it directly.

rawData = countries.data()

Regarding other datasets, I would say bring them all on 😄 but in reality, workwise, it would maybe make sense to start off with *-data packages without the minimal API, since the API could take more time to figure out. If the *-data packages are out there, other developers can pick up the thread and do something useful/fun with the data.
And churning out packages containing the datasets could just be a question of right tooling.
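
To make the suggestion above concrete, here is a rough sketch of a thin API layered over a data-only module, including the proposed `data()` export. It is not the actual @datasets/country-list code. The `createCountryList`, `getName`, and `getCode` names are hypothetical, the `Name`/`Code` field names are an assumption about the data shape, and the inline array stands in for `require('@datasets/country-list-data')`, which is only a proposed package.

```javascript
// Stand-in for the proposed data-only package. In the proposal this would be
// require('@datasets/country-list-data'); that package name is hypothetical.
const countryData = [
  { Name: 'Iceland', Code: 'IS' },
  { Name: 'Norway', Code: 'NO' },
];

// Build a small lookup API on top of the raw rows.
function createCountryList(rows) {
  const nameByCode = new Map(rows.map((r) => [r.Code.toUpperCase(), r.Name]));
  const codeByName = new Map(rows.map((r) => [r.Name.toLowerCase(), r.Code]));
  return {
    // Case-insensitive lookups; both return undefined for unknown input.
    getName: (code) => nameByCode.get(String(code).toUpperCase()),
    getCode: (name) => codeByName.get(String(name).toLowerCase()),
    // The data export: returns copies so callers cannot mutate the source rows.
    data: () => rows.map((r) => ({ ...r })),
  };
}

const countries = createCountryList(countryData);
const rawData = countries.data();
console.log(countries.getName('is')); // Iceland
console.log(rawData.length); // 2
```

With this split, a downstream package that only needs the rows depends on the data module alone, while the lookup helpers remain an optional layer on top.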

@rufuspollock
Member

@fannarsh cool and really useful feedback. Would you like to contribute here? We could give you perms. Also, folks on our team like @zelima and @svetozarstojkovic can provide support and guidance 😄
