🍀 Document the FOUR Data Pipelines #4433
Comments
Note: TWO harvesting pipelines have been deprecated (I believe both of these are FGDC/ISO, but not sure):
Comment is in history. I deleted it to make the ticket cleaner. See diagrams below for the most up to date information.
Link to MD Translator spike: #4200
The diagrams in the comments above represent the core of the harvesting optimization problem: what happens when, which errors are not being captured, and which assumptions fail to hold. The next step is reviewing the code, abstracting it into meaningful chunks, testing the functionality, preserving the best parts, and fixing the broken parts. One of the cornerstones of implementing a new version of this code is the following requirement:
The controller diagram highlights some high-level abstractions for input/output definitions. However, for example, within the
User Story
In order to inform existing and new harvesting processes and procedures, the Data.gov Architect Team wants to document the FOUR pipelines that all harvesting travels through. These pipelines will either be (1) optimized in the current system or (2) used as input for building a better new system from the start.
Acceptance Criteria
WHEN I look at this ticket
THEN there is documentation about how the pipelines are structured with supporting details or insight into the complex intricacies
Background
Security Considerations (required)
...
Sketch
file json DCAT
DataJsonHarvester, DatasetHarvesterBase, HarvesterBase
file xml FGDC/ISO
GeoDataGovDocHarvester, DocHarvester, GeoDataGovHarvester, SpatialHarvester, HarvesterBase
file xml waf FGDC
GeoDataGovWAFHarvester, WAFHarvester, GeoDataGovHarvester, SpatialHarvester, HarvesterBase
api json ARCGIS
ArcGISHarvester, SpatialHarvester, HarvesterBase
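One way to read the four pipelines sketched above is as a lookup from a descriptor line to the harvester classes it involves. The snippet below is only a sketch under that reading: `descriptor`, `PIPELINES`, and `harvester_chain` are hypothetical names, and the class names are kept as plain strings copied from the sketch so the example runs without CKAN installed. The order of each tuple mirrors the sketch and is not a claim about the real inheritance order.

```python
from typing import Dict, Tuple


def descriptor(text: str) -> Tuple[str, ...]:
    """Normalize a descriptor line such as 'file json DCAT' into a lookup key."""
    return tuple(text.lower().split())


# Hypothetical registry: descriptor -> class names exactly as listed in the sketch.
PIPELINES: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    descriptor("file json DCAT"): (
        "DataJsonHarvester", "DatasetHarvesterBase", "HarvesterBase"),
    descriptor("file xml FGDC/ISO"): (
        "GeoDataGovDocHarvester", "DocHarvester", "GeoDataGovHarvester",
        "SpatialHarvester", "HarvesterBase"),
    descriptor("file xml waf FGDC"): (
        "GeoDataGovWAFHarvester", "WAFHarvester", "GeoDataGovHarvester",
        "SpatialHarvester", "HarvesterBase"),
    descriptor("api json ARCGIS"): (
        "ArcGISHarvester", "SpatialHarvester", "HarvesterBase"),
}


def harvester_chain(text: str) -> Tuple[str, ...]:
    """Return the class names documented for a descriptor, or raise ValueError."""
    key = descriptor(text)
    if key not in PIPELINES:
        raise ValueError(f"no pipeline documented for {text!r}")
    return PIPELINES[key]


if __name__ == "__main__":
    for key, chain in PIPELINES.items():
        print(" ".join(key), "->", ", ".join(chain))
```

Keying on the whole descriptor avoids deciding up front which token is the source, the format, or the specification, which the sketch itself leaves open for the WAF case.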
specification (dcat, fgdc, arcgis). The idea behind specifying these is to highlight areas of abstraction that make the code more relevant and reusable. A proposed NEW pipeline consists of
api json DCAT
for large DCAT data.json files that are unwieldy when processed as a single entity.
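To illustrate why a large data.json might be handled in pieces, here is a minimal sketch that splits the top-level `dataset` array of a DCAT catalog into fixed-size batches. The function names and the batch size are assumptions, not part of the proposal. The sketch still loads the whole catalog into memory first; an actual `api json DCAT` pipeline would presumably page through an API or use a streaming parser instead.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batches(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def dataset_batches(catalog: Dict[str, Any], size: int = 100) -> Iterator[List[Any]]:
    """Batch the records under the catalog's top-level 'dataset' key."""
    return batches(catalog.get("dataset", []), size)


if __name__ == "__main__":
    # Tiny stand-in for a real data.json catalog (hypothetical identifiers).
    catalog = json.loads(
        '{"dataset": [{"identifier": "a"}, {"identifier": "b"}, {"identifier": "c"}]}'
    )
    for batch in dataset_batches(catalog, size=2):
        print([record["identifier"] for record in batch])
```

Each batch can then be validated and imported on its own, so one malformed record need not block the rest of the catalog.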