Harvester 2.0: [1/4d] Create a DCATUS dataset on CKAN #4463

Closed
1 task
rshewitt opened this issue Sep 13, 2023 · 6 comments

@rshewitt
Contributor

rshewitt commented Sep 13, 2023

User Story

datagov wants the ability to create a DCATUS dataset using the CKAN API in the new harvesting repo

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN a valid DCATUS catalog
    AND a data.gov catalog-like CKAN system is running independently
    AND a CKAN dataset schema reflecting the information on a dataset page
    WHEN the "create dataset"-equivalent function is called
    THEN the DCATUS catalog will be transformed into the CKAN-compatible schema
    AND the transformed DCATUS catalog will be added to CKAN

Background

  • the current harvesting solution is being replaced. in the short term, the CKAN catalog is intended to persist, so the "0.1" harvester replacement needs to offer CKAN dataset creation.
  • we have a transformation function which can convert a dcatus catalog to the ckan format. the intention is to start fresh and make an improved version.
    • examples of unnecessary or overly custom code:
      • checking if a parent package id starts with a specific string. link
      • are we using schema version 1.0? link
        • it appears there are 15 packages which use 1.0. do we upgrade these packages to 1.1?
      • do we need save_object_error?
      • do we still need MAPPING and SKIP when we have MAPPING_V1_1 and SKIP_V1_1?

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • create a function which creates a DCATUS dataset via CKAN API
  • verify dataset exists
  • Document any bugs or strangeness with the dataset display on the UI, or any differences in the data via the API.
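
A minimal sketch of the first two steps, assuming the CKAN instance exposes the standard action API (`package_create`, `package_show`) and accepts an API token in the `Authorization` header. The helper names (`ckan_action`, `create_dataset`) and the `opener` parameter are hypothetical and only there so the call can be exercised without a live CKAN; this is not the harvester's actual implementation.

```python
import json
import urllib.request


def ckan_action(base_url, action, payload, api_key, opener=urllib.request.urlopen):
    """POST a JSON payload to a CKAN action endpoint and return its "result"."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
    # urlopen raises HTTPError on 4xx/5xx responses, so a disabled or
    # forbidden route surfaces as an exception here.
    with opener(request) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(f"CKAN {action} failed: {body.get('error')}")
    return body["result"]


def create_dataset(base_url, api_key, ckan_dataset, opener=urllib.request.urlopen):
    """Create the dataset, then read it back to verify that it exists."""
    created = ckan_action(base_url, "package_create", ckan_dataset, api_key, opener)
    return ckan_action(base_url, "package_show", {"id": created["id"]}, api_key, opener)
```

`create_dataset` returns the dataset as CKAN reports it after creation, which can then be compared against the input to spot differences in the data via the API.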

Resources

@rshewitt rshewitt added the H2.0/Harvest-General General Harvesting 2.0 Issues label Sep 13, 2023
@hkdctol hkdctol moved this to 📟 Sprint Backlog [7] in data.gov team board Sep 14, 2023
@rshewitt rshewitt self-assigned this Sep 15, 2023
@rshewitt rshewitt moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Sep 15, 2023
@rshewitt
Contributor Author

package_create route is disabled. is it possible to allow this for dev only?

@rshewitt
Contributor Author

branch

@rshewitt rshewitt moved this from 🏗 In Progress [8] to 📡 Blocked in data.gov team board Sep 18, 2023
@rshewitt rshewitt moved this from 📡 Blocked to 🏗 In Progress [8] in data.gov team board Sep 20, 2023
@rshewitt rshewitt moved this from 🏗 In Progress [8] to 👀 Needs Review [2] in data.gov team board Sep 20, 2023
@rshewitt
Contributor Author

rshewitt commented Sep 22, 2023

an mvp ckan dataset only contains information that is present on the dataset page. here's a list of the fields which appear on that page. the list excludes values calculated by CKAN (e.g. modified date, schema information, harvest source id, source hash).

  • title
  • owner_org
  • notes
  • accessLevel
  • license_id
  • publisher
    • value
  • maintainer
  • resources
    • description
    • url
    • name
  • tags
  • type
  • identifier
  • modified
  • bureauCode
  • programCode
  • publisher_hierarchy
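
A rough sketch of how a DCAT-US dataset record could be mapped onto most of the fields listed above. Several things here are assumptions rather than the branch's actual code: the function names (`slugify`, `as_text`, `dcatus_to_ckan`) are hypothetical, the non-core fields are stored as CKAN `extras` key/value pairs, the required CKAN `name` is derived from the title, and list values such as `bureauCode` are joined into a single string. `license_id`, `type`, and `publisher_hierarchy` are left out because their mapping is not described in this issue.

```python
import re


def slugify(title):
    """Derive a URL-safe dataset name (assumed rule: lowercase, digits, '-')."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:100]


def as_text(value):
    """CKAN extras hold strings, so join list values (assumed separator: ',')."""
    return ",".join(map(str, value)) if isinstance(value, list) else str(value)


def dcatus_to_ckan(dcatus, owner_org):
    """Map one DCAT-US dataset record to a CKAN package_create payload."""
    resources = [
        {
            "name": dist.get("title", ""),
            "description": dist.get("description", ""),
            "url": dist.get("downloadURL") or dist.get("accessURL", ""),
        }
        for dist in dcatus.get("distribution", [])
    ]
    # Fields with no core CKAN equivalent are carried as extras (assumption).
    extra_keys = ["accessLevel", "identifier", "modified", "bureauCode", "programCode"]
    extras = [
        {"key": key, "value": as_text(dcatus[key])}
        for key in extra_keys
        if key in dcatus
    ]
    if "publisher" in dcatus:
        extras.append({"key": "publisher", "value": dcatus["publisher"].get("name", "")})
    return {
        "name": slugify(dcatus["title"]),
        "title": dcatus["title"],
        "owner_org": owner_org,
        "notes": dcatus.get("description", ""),
        "maintainer": dcatus.get("contactPoint", {}).get("fn", ""),
        "tags": [{"name": keyword} for keyword in dcatus.get("keyword", [])],
        "resources": resources,
        "extras": extras,
    }
```

`owner_org` is passed in separately because it identifies the CKAN organization, which is not part of the DCAT-US record itself.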

@rshewitt rshewitt moved this from 👀 Needs Review [2] to 🏗 In Progress [8] in data.gov team board Sep 26, 2023
@rshewitt
Contributor Author

ckan catalog dev test dcatus catalog
Image

prod ckan dataset
Image

comparing the two screenshots shows that the datasets render similarly. the catalog dev dataset was created using the dcatus-to-ckan conversion function in this branch

@rshewitt rshewitt moved this from 🏗 In Progress [8] to 👀 Needs Review [2] in data.gov team board Sep 26, 2023
@rshewitt rshewitt moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Sep 27, 2023
@rshewitt
Contributor Author

include the remaining information that needs to be added to the transformed dataset ( dcatus-to-ckan )

@rshewitt rshewitt moved this from ✔ Done to 🏗 In Progress [8] in data.gov team board Sep 28, 2023
@rshewitt
Contributor Author

ckan schemas

@rshewitt rshewitt moved this from 🏗 In Progress [8] to 👀 Needs Review [2] in data.gov team board Oct 20, 2023
@rshewitt rshewitt moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Oct 20, 2023
@hkdctol hkdctol closed this as completed Oct 26, 2023
@hkdctol hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board Oct 26, 2023
@btylerburton btylerburton added H2.0/Load and removed H2.0/Harvest-General General Harvesting 2.0 Issues labels Jan 5, 2024
3 participants