Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: data-govt-nz/schema
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: 1.1.0
Choose a base ref
...
head repository: data-govt-nz/schema
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: master
Choose a head ref
  • 4 commits
  • 12 files changed
  • 4 contributors

Commits on Feb 25, 2018

  1. Add CKAN API json to CSV exporter

    This feature allows an agency to export their current set of dataset metadata matching the data.govt.nz dcat schema.
    anotheredward authored and camfindlay committed Feb 25, 2018
    Copy the full SHA
    015dc9b View commit details
  2. 1.2.0

    camfindlay authored Feb 25, 2018
    Copy the full SHA
    a11148f View commit details

Commits on Aug 20, 2019

  1. bump version of package.json

    ebuckley authored and camfindlay committed Aug 20, 2019
    Copy the full SHA
    7e1c565 View commit details

Commits on Nov 20, 2019

  1. Update README.md

    shafquatnz authored Nov 20, 2019
    Copy the full SHA
    77e25e6 View commit details
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
# Data.json conversion tool

## 1.2.0
- Add CKAN CSV exporter tool

## 1.1.0
- Add Heroku support

## 1.0.0
- Convertion tool turned into web app

## 0.1.0
- Inital public release of the node conversion tool.
- Documentation provided in README.
3 changes: 1 addition & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -133,7 +133,7 @@ The data.json file should have ASCII or UTF-8 character encoding (as per the JSO

## Comparison with USA data.json schema

The NZ data.json format is based on both the [Data.gov.uk data.json](http://guidance.data.gov.uk/dcat_fields.html) and the [Project Open Data data.json schema](https://project-open-data.cio.gov/v1.1/schema/) used by the U.S. Federal Government. Both are derivatives of the [DCAT (data catalog vocabulary)](https://www.w3.org/TR/vocab-dcat/) schema standard.
The NZ data.json format is based on both the [Data.gov.uk data.json](https://guidance.data.gov.uk/publish_and_manage_data/harvest_or_add_data/harvest_data/dcat/#accepted-dcat-and-data-json-fields) and the [Project Open Data data.json schema](https://project-open-data.cio.gov/v1.1/schema/) used by the U.S. Federal Government. Both are derivatives of the [DCAT (data catalog vocabulary)](https://www.w3.org/TR/vocab-dcat/) schema standard.

There are a few conscious differences that are listed here for reference:

@@ -151,5 +151,4 @@ See [LICENSE.md](LICENSE.md)
See [CONTRIBUTING.md](CONTRIBUTING.md)

## Maintainer
- Cam Findlay <cam.findlay@dia.govt.nz>
- Data.govt.nz team <info@data.govt.nz>
41 changes: 41 additions & 0 deletions jsonToCsv/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# Convert a data.json to CSV

This module is intended to help new organizations onboard with automation by using their exsisting [data.govt](https://data.govt.nz) catalogue to populate a CSV.

Once you have the csv you can update your data.json file by using the CLI/Web interface of the CSV to DCAT converter.


## How to use this tool

**requirements**
node `8.9.4` and npm `5.6.0`

**Usage**

1. clone the repo, cd into the repo.
2. install the modules in this directory `npm install`.
3. run the script (exporting the ministry of health organization) `node jsonToCsv.js -a ministry-of-health`.

**Options**

```
Usage: jsonToCsv [options]
Tool for downloading an agency's existing data from data.govt in a format they can use to re-submit it via the schema tool.
Options:
-V, --version output the version number
-a, --agency [agency] CKAN agency identifier, eg: ministry-of-health
-f, --file [fileName] CSV filename to write to, eg: output.csv, defaults to: <agency-identifer>.csv
-r, --row [count] Number of records to be download, defaults to: 1000
-u, --url [url] ckan url, eg: https://catalogue.data.govt.nz, defaults to: https://catalogue.data.govt.nz
-h, --help output usage information
```

**Tests**
We have tests.
```
./node_modules/.bin/jest
```
59 changes: 59 additions & 0 deletions jsonToCsv/jsonToCsv.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
'use strict'
const jsonpath = require('jsonpath')
const fetch = require('node-fetch')
const json2csv = require('json2csv')
const flatten = require('lodash.flatten')
const uniq = require('lodash.uniq')
const fs = require('fs')
const program = require('commander')
const jsonResultToCsvRow = require('./lib/jsonResultToCsvRow')

program
.version('0.1.0')
.description('Tool for downloading an agency\'s existing data from data.govt in a format they can use to re-submit it via the schema tool.')
.option('-a, --agency [agency]', 'CKAN agency identifier, eg: ministry-of-health')
.option('-f, --file [fileName]', 'CSV filename to write to, eg: output.csv, defaults to: <agency-identifer>.csv')
.option('-r, --row [count]', 'Number of records to be download, defaults to: 1000')
.option('-u, --url [url]', 'ckan url, eg: https://catalogue.data.govt.nz, defaults to: https://catalogue.data.govt.nz')
.parse(process.argv)

const agencyIdentifier = program.agency

if (!agencyIdentifier) {
console.error('--agency CKAN agency identifier, eg: ministry-of-health, is a required parameter. try `node jsonToCsv.js --agency ministry-of-health`')
process.exit()
}
const host = program.url || 'https://catalogue.data.govt.nz'
const recordCount = program.row || 1000
const url = `${host}/api/action/package_search?fq=organization:${agencyIdentifier}&rows=${recordCount}`
const fileName = program.file || agencyIdentifier + '.csv'

convertOnlineReport()

async function convertOnlineReport () {
const response = await fetch(url)
const body = await response.json()

const data = mapObjects(body)
const fields = getHeaderFieldNames(data)
const csv = json2csv({data, fields})
fs.writeFileSync(`./${fileName}`, csv)
console.log('Finished, written to file.')
}

function mapObjects (jsonInput) {
const results = jsonpath.query(jsonInput, '$.result.results.*')
const mappedResults = results.map(jsonResultToCsvRow.map)
return mappedResults
}

function getHeaderFieldNames (data) {
const nestedFields = data.map(obj => Object.keys(obj))
const flattenedFields = flatten(nestedFields)
const uniqueFields = uniq(flattenedFields)
return uniqueFields
}

module.exports = {
convert: mapObjects
}
57 changes: 57 additions & 0 deletions jsonToCsv/lib/jsonResultToCsvRow.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
'use strict'
const jsonpath = require('jsonpath')
const mergeObjects = Object.assign

module.exports = {
map: mapObject
}

function mapObject (jsonInput) {
let mappedObject = {
title: jsonInput.title,
description: jsonInput.notes,
identifier: jsonInput.url,
licence: jsonInput.license_url,
keywords: getKeywords(jsonInput),
issued: jsonInput.issued,
modified: jsonInput.modified,
'publisher.name': jsonInput.author,
'publisher.mbox': jsonInput.author_email,
'contactPoint.fn': jsonInput.maintainer,
'contactPoint.hasPhone': jsonInput.maintainer_phone,
'contactPoint.hasEmail': jsonInput.maintainer_email,
landingPage: jsonInput.url,
updateFrequency: jsonInput.frequency_of_update,
theme: jsonInput.theme,
temporal: jsonInput.temporal,
spatial: jsonInput.spatial
}
const distributions = getDistributions(jsonInput)
mergeObjects(mappedObject, distributions)
return mappedObject
}

function getKeywords (jsonInput) {
const tags = jsonpath.query(jsonInput, '$.tags.*.display_name')
const keywords = tags.join(', ')
return keywords
}

function getDistributions (jsonInput) {
const distributions = jsonpath.query(jsonInput, '$.resources.*')
const mappedDistributions = distributions.map((val, i) => mapDistribution(val, i))
const result = {}
mappedDistributions.forEach(distribution => {
mergeObjects(result, distribution)
})
return result
}

function mapDistribution (distribution, i) {
return {
[`distribution.${i}.downloadURL`]: distribution.url,
[`distribution.${i}.format`]: distribution.format,
[`distribution.${i}.size`]: distribution.size,
[`distribution.${i}.title`]: distribution.name
}
}
Loading