-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INF-427: Download country/institution data in CSV format #56
Conversation
eb74fa4
to
c4dca76
Compare
Modified the API worker to pull the entity data from KV storage, parse the timeseries and repository data into a CSV like arrays and then zip them up for download when the link of /api/download/[entity.type]/[entity.id] is hit. The clickable download link for the zip on the COKI OA website for the specific country or institution appears as |
01ab518
to
634a894
Compare
6b944fa
to
03c4b62
Compare
Having to force flexsearch to stay at v0.7.2 as there are issues with type definitions in v0.7.3. When the issue has been fixed we can considering upgrading. nextapps-de/flexsearch#342 Also added necessary config so it will ignore library files when doing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Alex, thanks for the PR, it is looking good, the main feedback is about:
- Using the
MetadataLink
object for the download UI functionality. - Using routing features from itty-router so that you don't have to manually parse route parameters.
- And using a JSON to CSV converter library as it will reduce the amount of code that we have to write for conversion and handle things such as quoting when commas are in strings etc.
workers-api/src/types.ts
Outdated
@@ -105,3 +122,17 @@ export interface EntityRequest extends Request { | |||
}; | |||
query: {}; | |||
} | |||
|
|||
export interface dataRequest extends Request { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As dataRequest is an interface for an object rather than a function it should use CamelCase.
Also take a look at how the params for EntityRequest is defined so that you don't have to manually parse the entityType and id fields in downloadDataZipHandler
.
workers-api/src/downloadDataZip.ts
Outdated
export const parseEntityTimeseriesToCSV = async (entityData: Array<Year>) => { | ||
const dataToCSV = []; | ||
|
||
// Obtain headers from the Entity data. | ||
// Scan through all years for columns that exist. | ||
var numColumns = 0; | ||
var columnHeaders: Array<string> = ["year"]; | ||
for (const dataLine of entityData) { | ||
if (numColumns < Object.keys(dataLine.stats).length) { | ||
numColumns = Object.keys(dataLine.stats).length; | ||
for (const column in dataLine.stats) { | ||
columnHeaders.push(`${column}`); | ||
} | ||
} | ||
} | ||
dataToCSV.push(columnHeaders); | ||
|
||
// Transform data from the enitity object to csv like array | ||
var dataLineTimeSeries: Array<string> = [""]; | ||
for (const entityTimeSeriesData of entityData) { | ||
const statsArray = Object.keys(entityTimeSeriesData.stats).map(key => `${entityTimeSeriesData.stats[key]}`); | ||
dataLineTimeSeries = [`${entityTimeSeriesData.year}`].concat(statsArray); | ||
dataToCSV.push(dataLineTimeSeries); | ||
} | ||
|
||
// Add commas, line breaks and join into large string | ||
let csvContent: string = dataToCSV.map(e => e.join(",")).join("\n"); | ||
|
||
return csvContent; | ||
}; | ||
|
||
export const parseEntityRepositoriesToCSV = async (entityData: Array<Repository>) => { | ||
const dataToCSV = []; | ||
|
||
// Obtain headers from the Entity data. | ||
var numColumns = 0; | ||
var columnHeaders: Array<string> = []; | ||
for (const dataLine of entityData) { | ||
if (numColumns < Object.keys(dataLine).length) { | ||
numColumns = Object.keys(dataLine).length; | ||
for (const column in dataLine) { | ||
columnHeaders.push(`${column}`); | ||
} | ||
} | ||
} | ||
dataToCSV.push(columnHeaders); | ||
|
||
// Transform data from the enitity object to csv like array | ||
// Some of the repository names have commas in them and need to be removed | ||
for (const entityRepositories of entityData) { | ||
const repository = Object.keys(entityRepositories).map(key => `${entityRepositories[key]}`.replaceAll(",", "")); | ||
dataToCSV.push(repository); | ||
} | ||
|
||
// Add the commas and line breaks | ||
let csvContent: string = dataToCSV.map(e => e.join(",")).join("\n"); | ||
|
||
return csvContent; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there would be less code to maintain and it would be more robust to use a JSON to CSV library, e.g. https://juanjodiaz.github.io/json2csv/#/parsers/parser. You could use the Parser object.
It would be more robust because they already handle string quoting when there are commas in the repository names etc.
if (data != undefined) { | ||
var a = document.createElement("a"); | ||
a.href = window.URL.createObjectURL(data); | ||
a.download = `COKI_data_${entity.entity_type}_${entity.id}.zip`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we call the file Country name + COKI Dataset.zip, e.g "Mali COKI Dataset.zip"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Alex, the changes look great. It looks like Typescript support is coming for the json2csv package, so we can update it when that is released.
I've left a few smaller comments. I also need to do a few things, including making the icon, checking about the authors and license of the jszip library.
workers-api/package.json
Outdated
@@ -35,8 +37,11 @@ | |||
"typescript": "^4.4.4" | |||
}, | |||
"dependencies": { | |||
"flexsearch": "^0.7.21", | |||
"@json2csv/node": "^6.1.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be @json2csv/plainjs
rather than @json2csv/node
.
workers-api/package.json
Outdated
"itty-router": "^2.6.6", | ||
"jszip": "^3.10.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apache might be compatible with GPL, but GPL is not compatible with Apache, so I need to check with Cameron to make sure the dual licensing is fine.
81a6984
to
a6d46a0
Compare
Co-authored-by: Alex Massen-Hane <[email protected]> Co-authored-by: Jamie Diprose <[email protected]>
a6d46a0
to
692f613
Compare
Use fetch to grab the json from Cloudflare KV storage, parse it to CSV and allow it to be downloaded when user clicks the button.
Need to figure out a few kinks with how Cloudflare worker-api gives it's data.