Transform FGDC/CSDGM source to DCAT-US #4564

Open
7 tasks
btylerburton opened this issue Dec 19, 2023 · 3 comments
Labels
H2.0/Harvest-Transform Transform Logic for Harvesting 2.0

Comments

@btylerburton
Contributor

btylerburton commented Dec 19, 2023

User Story

In order to test another facet of the transformation step in our pipeline, the data.gov team would like to transform an FGDC/CSDGM WAF source to DCAT-US and push the result into CKAN.

This story depends on:

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN a harvest source with a CSDGM schema, stored in a WAF (Web Accessible Folder)
    WHEN I add that to our ETL pipeline with the appropriate harvest source config
    THEN I expect the mdTranslator to transform the source into valid DCAT-US datasets
    AND I expect those datasets to be pushed into a CKAN development instance without error.
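
As a first-pass illustration of what "valid DCAT-US" could mean in the THEN clause, here is a minimal sketch that checks only the always-required dataset fields from DCAT-US v1.1. The function name `missing_required_fields` and the sample record are hypothetical, and this is not the pipeline's actual validator; a real check would presumably validate against the full DCAT-US JSON schema.

```python
# Dataset-level fields that DCAT-US v1.1 marks as always required.
# Federal-only fields (bureauCode, programCode) are left out of this sketch.
REQUIRED_FIELDS = (
    "title",
    "description",
    "keyword",
    "modified",
    "publisher",
    "contactPoint",
    "identifier",
    "accessLevel",
)


def missing_required_fields(dataset: dict) -> list[str]:
    """Return the required fields that are absent or empty in a dataset."""
    return [field for field in REQUIRED_FIELDS if not dataset.get(field)]


# Hypothetical transformed record, for illustration only.
sample = {
    "title": "Example dataset",
    "description": "Illustrative record.",
    "keyword": ["example"],
    "modified": "2023-12-19",
    "publisher": {"name": "Example Agency"},
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane@example.gov"},
    "identifier": "example-001",
    "accessLevel": "public",
}

print(missing_required_fields(sample))
```

A record that passes this check can still fail full schema validation (for example on date or email formats), so it is only useful as a cheap early filter.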

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

No work planned outside the cloud.gov boundary.

Sketch

  • Configure a new harvest source pointing to a test WAF instance of CSDGM-standard datasets
  • Extract that WAF using the datagov-harvesting-logic WAF extract task
  • Transform the datasets into DCAT-US using the mdTranslator application
  • Validate that the transformation conforms to the DCAT-US standard
  • Push valid DCAT-US datasets into CKAN
  • Confirm that the integration is successful and idempotent by running a test harvest source through the pipeline multiple times.
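
The idempotency check in the last step can be sketched as follows. This is a self-contained illustration under stated assumptions, not the datagov-harvesting-logic implementation: the in-memory `catalog` dict stands in for CKAN, and `record_hash` and `harvest` are hypothetical names. The idea is that a second run over unchanged records should create and update nothing.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a stable hash of a record, independent of key order."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def harvest(records: list[dict], catalog: dict) -> dict:
    """Upsert records into the catalog by identifier and count each action."""
    counts = {"created": 0, "updated": 0, "unchanged": 0}
    for record in records:
        key = record["identifier"]
        digest = record_hash(record)
        existing = catalog.get(key)
        if existing is None:
            counts["created"] += 1
        elif existing["hash"] != digest:
            counts["updated"] += 1
        else:
            # Same content as last run: nothing to write.
            counts["unchanged"] += 1
            continue
        catalog[key] = {"hash": digest, "record": record}
    return counts


catalog = {}  # hypothetical stand-in for the CKAN instance
records = [
    {"identifier": "a", "title": "A"},
    {"identifier": "b", "title": "B"},
]

first = harvest(records, catalog)
second = harvest(records, catalog)
print(first)
print(second)
```

In this sketch the first run reports two created records and the second run reports two unchanged records, which is the behavior the repeated test harvest is meant to confirm.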
@btylerburton btylerburton added the H2.0/Harvest-General General Harvesting 2.0 Issues label Dec 19, 2023
@btylerburton btylerburton added H2.0/Harvest-Transform Transform Logic for Harvesting 2.0 and removed H2.0/Harvest-General General Harvesting 2.0 Issues labels Dec 19, 2023
@btylerburton
Contributor Author

Combined with this ticket #4565

@btylerburton
Contributor Author

Reopening because the DCAT-US writer is not yet available.

@btylerburton btylerburton reopened this Dec 19, 2023
@btylerburton btylerburton changed the title from "Transform FGDC source to DCAT-US" to "Transform FGDC/CSGM source to DCAT-US" Dec 19, 2023
@btylerburton btylerburton changed the title from "Transform FGDC/CSGM source to DCAT-US" to "Transform FGDC/CSDGM source to DCAT-US" Dec 19, 2023
@gujral-rei gujral-rei moved this to 📔 Product Backlog in data.gov team board Dec 21, 2023
@btylerburton btylerburton moved this from 📔 Product Backlog to 📟 Sprint Backlog [7] in data.gov team board Feb 29, 2024
@btylerburton btylerburton moved this from 📟 Sprint Backlog [7] to 📔 Product Backlog in data.gov team board Feb 29, 2024
@btylerburton btylerburton moved this from 📔 Product Backlog to Harvester 2.0 in data.gov team board May 2, 2024
@rshewitt
Contributor

Confirmed with johnathan that the FGDC reader is in good shape: "it’s one of the better ones".

Projects
Status: 📥 Queue