Replace DCAT-US validator #3503
So... the current state of this is that the route works in the […]. The general flow steps are:
The problem is in the […]. @FuhuXia pointed me to the […]. The closest thing that is doing the validation is the […].
You're probably right; the validation is embedded in the harvester class.
What I didn't say is that the validation in the harvester logic does seem to be solid and has worked fine in the past, so we should be able to keep that logic.
Got it! That all makes sense to me. I like the idea of throwing out a bunch of stuff. My tentative plan:
🚧
Just for the record ➡️ I removed the […]
I'm going to start documenting and cleaning things up.
Okay. I didn't do much documentation (...yet). But here are the updates:
TODO...
"@type": "dcat:Catalog",
"describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
"conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
"@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld",
Also, do we want to change the route? @jbrown-xentity @FuhuXia @hkdctol @Jin-Sun-tts @btylerburton As it stands right now, the public route (not available yet) would be https://catalog.data.gov/pod/validate... do we want to make it something more meaningful?
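As a rough illustration, the catalog header values quoted earlier in the thread (`@type`, `describedBy`, `conformsTo`, `@context`) could be checked before any per-dataset validation runs. Below is a minimal Python sketch, assuming the catalog has already been parsed with `json.loads`. The helper `check_catalog_header` is hypothetical. Treating all four fields plus a `dataset` list as mandatory is an assumption of this sketch and is not necessarily what the extension enforces.

```python
import json

# Header values quoted in the thread for a DCAT-US v1.1 catalog.
EXPECTED_HEADER = {
    "@type": "dcat:Catalog",
    "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
    "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
    "@context": "https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld",
}


def check_catalog_header(catalog: dict) -> list[str]:
    """Return human-readable header problems (an empty list means none found).

    Hypothetical helper: requiring every EXPECTED_HEADER field is an
    assumption of this sketch, not a rule taken from ckanext-datajson.
    """
    problems = []
    for field, expected in EXPECTED_HEADER.items():
        if field not in catalog:
            problems.append(f"missing '{field}'")
        elif catalog[field] != expected:
            problems.append(f"'{field}' is {catalog[field]!r}, expected {expected!r}")
    # The datasets themselves live in a top-level "dataset" array.
    if not isinstance(catalog.get("dataset"), list):
        problems.append("'dataset' must be a list")
    return problems


if __name__ == "__main__":
    sample = json.loads('{"@type": "dcat:Catalog", "dataset": []}')
    for problem in check_catalog_header(sample):
        print(problem)
```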
Answering the questions in the ticket:
Per previous comments, the current implementation does work. The […]
Most of the old tests are useless now because they were testing that […]
The [GET] route displays an empty form page (with some helper text) asking for a URL. The [POST] route takes a single input ('url') and validates the file at that URL.
The page already exists, and it is not an API route.
There is no JSON output for the results. If we want that, it would be a new feature of […]
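The [GET]/[POST] split described above can be sketched without any framework as a plain WSGI app. This is not CKAN's Flask blueprint or the extension's actual view code. `validator_app` and `FORM_PAGE` are hypothetical names, and the step that fetches and validates the URL is left as a placeholder.

```python
import html
from urllib.parse import parse_qs

# Hypothetical form page: helper text plus a single 'url' input.
FORM_PAGE = (
    b"<p>Enter the URL of a DCAT-US data.json file to validate.</p>"
    b'<form method="post"><input name="url" type="url">'
    b'<button type="submit">Validate</button></form>'
)


def validator_app(environ, start_response):
    """Plain-WSGI sketch of the GET/POST behavior (not CKAN's blueprint)."""
    method = environ.get("REQUEST_METHOD", "GET")
    html_header = [("Content-Type", "text/html; charset=utf-8")]

    if method == "GET":
        # GET: show the empty form with helper text.
        start_response("200 OK", html_header)
        return [FORM_PAGE]

    if method == "POST":
        # POST: read the urlencoded body and require exactly one non-empty 'url'.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        urls = parse_qs(body).get("url", [])
        if len(urls) != 1 or not urls[0].strip():
            start_response("400 Bad Request", html_header)
            return [b"<p>Please provide exactly one URL.</p>"]
        url = urls[0].strip()
        # Placeholder: fetching and validating the file would happen here.
        start_response("200 OK", html_header)
        return [f"<p>Validating {html.escape(url)}</p>".encode("utf-8")]

    start_response("405 Method Not Allowed", html_header + [("Allow", "GET, POST")])
    return [b"<p>Method not allowed.</p>"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, validator_app).serve_forever()` serves it on port 8000. The sketch rejects anything other than exactly one non-empty `url`, which keeps the [POST] contract as narrow as the one described above.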
I would use […]
TODO:
Form is live at: https://catalog.data.gov/dcat-us/validator
Note that it failed to handle https://data.doi.gov/data.json. The dashboard was also inconsistent about whether it could handle that file, which contains around 30K records and is around 80 MB. So consider anything that size or larger to be untestable at this point, unless we really scale up.
We will make follow-up tickets to match the functionality of the dashboard and decide how important/useful those features are... Looks great!
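One way to fail fast on files like the ~80 MB data.doi.gov catalog is to cap how much the validator will download. Below is a minimal sketch, assuming the response object exposes `read(n)` in the same way `urllib.request.urlopen` responses do. `read_capped`, `FileTooLarge`, and the 80 MB limit are hypothetical choices, not settings from the extension.

```python
from typing import BinaryIO

# Hypothetical cap, set at the ~80 MB size mentioned above; not an extension setting.
MAX_BYTES = 80 * 1024 * 1024
CHUNK_SIZE = 64 * 1024


class FileTooLarge(Exception):
    """Raised when a remote file exceeds the configured size cap."""


def read_capped(stream: BinaryIO, max_bytes: int = MAX_BYTES) -> bytes:
    """Read the stream in chunks, raising as soon as max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > max_bytes:
            raise FileTooLarge(f"file exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)
```

For example, `with urllib.request.urlopen(url, timeout=30) as resp: data = read_capped(resp)` stops reading as soon as the cap is crossed. An oversized file then produces a clear error instead of exhausting the worker's memory.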
User Story
In order to ensure the dashboard can be deprecated, data.gov admins want a validator of DCAT-US files built into catalog (ckanext-datajson extension).
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
WHEN a URL to a DCAT-US file (or the file itself) is provided
THEN the user is given meaningful output showing which parts are valid and which are invalid
THEN my browser is redirected to the new validator location
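To make the "meaningful output" criterion concrete, a validator could report problems per dataset instead of returning a single pass/fail. The sketch below only checks whether each field is present. The field list is an assumed subset of DCAT-US v1.1 dataset properties; the v1.1 JSON schema is the authoritative source. `validate_datasets` is a hypothetical helper. The report is a plain dict, so it would also serialize directly with `json.dumps` if JSON output were ever added.

```python
import json

# Assumption: a subset of DCAT-US v1.1 dataset fields treated as required
# for this sketch; the v1.1 JSON schema is the authoritative list.
REQUIRED_DATASET_FIELDS = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)


def validate_datasets(catalog: dict) -> dict:
    """Build a JSON-serializable report of missing fields, per dataset."""
    datasets = catalog.get("dataset")
    if not isinstance(datasets, list):
        return {"total": 0, "valid": 0, "invalid": [],
                "catalog_errors": ["'dataset' must be a list"]}

    invalid = []
    for index, dataset in enumerate(datasets):
        if not isinstance(dataset, dict):
            invalid.append({"index": index, "errors": ["dataset entry is not an object"]})
            continue
        # Presence check only; value formats are not validated here.
        missing = [f for f in REQUIRED_DATASET_FIELDS if f not in dataset]
        if missing:
            invalid.append({
                "index": index,
                "title": dataset.get("title"),
                "errors": [f"missing '{f}'" for f in missing],
            })
    return {"total": len(datasets), "valid": len(datasets) - len(invalid),
            "invalid": invalid, "catalog_errors": []}


if __name__ == "__main__":
    sample = {"dataset": [{"title": "Example", "identifier": "ex-1"}]}
    print(json.dumps(validate_datasets(sample), indent=2))
```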
Background
Current dashboard validator exists here.
An API route already exists (but is never used/tested) to validate pod files here.
Security Considerations (required)
Need to ensure the SSRF protections currently in the dashboard are also in this implementation.
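Below is a minimal sketch of one common SSRF guard. It assumes the validator fetches user-supplied URLs server-side. The check allows only http(s) and rejects any host that resolves to a non-public address, such as loopback, private ranges, or link-local addresses like cloud metadata endpoints. This is not the dashboard's actual protection code. `is_safe_url` and `resolve_host` are hypothetical, and the `resolve` parameter exists only so the check can be tested without DNS.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def resolve_host(host: str, port: int) -> list[str]:
    """Return every IP address the host resolves to (IPv4 and IPv6)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_safe_url(url: str, resolve=resolve_host) -> bool:
    """Reject non-http(s) URLs and hosts that map to any non-public IP."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        # .port raises ValueError for out-of-range ports.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        addresses = resolve(parts.hostname, port)
    except (ValueError, OSError):  # socket.gaierror is an OSError
        return False
    if not addresses:
        return False
    for addr in addresses:
        # Strip any IPv6 zone suffix (e.g. "%eth0") before parsing.
        ip = ipaddress.ip_address(addr.split("%", 1)[0])
        if not ip.is_global:
            return False
    return True
```

A check like this covers only part of the problem. The HTTP client resolves the host again when it connects, so a DNS-rebinding attacker could return a different address the second time. Redirects would also bypass the check unless each hop is re-checked. Pinning the connection to the vetted address and disabling or re-validating redirects would close those gaps.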
Sketch
Would involve investigating the following:
… /pod/validate route?
… /pod/validate route provide?