Harden WAF ETL pipeline #4598
Comments
Processing reached 12 hours for the NOAA WAF, so I stopped it (the conclusion being: it's going to take a while). I duplicated our WAF test but added a new fixture with an updated URL. I didn't commit anything. Considering how long it was running, I didn't see the benefit of knowing exactly how much longer it would take. The bottleneck is requesting/downloading the documents. Requesting the initial page, parsing it with BeautifulSoup, and getting a list of all the anchors with a populated conclusion of test
JSON with list of all WAF URLs
Pausing on this. More discussion on WAF needed.
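For reference, the index step described in the first comment (take the initial page, parse it, collect every anchor that has an href) might look roughly like the sketch below. It is only an illustration, not the harvester's actual code: it uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra dependencies, and `AnchorCollector`, `list_waf_documents`, the `.xml` suffix filter, the sample HTML, and the `example.gov` URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag that has a non-empty href."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors with a missing or empty href
            self.hrefs.append(href)


def list_waf_documents(index_html, base_url, suffix=".xml"):
    """Return absolute URLs of linked documents whose href ends in `suffix`."""
    parser = AnchorCollector()
    parser.feed(index_html)
    return [
        urljoin(base_url, href)
        for href in parser.hrefs
        if href.lower().endswith(suffix)
    ]


# Hypothetical index page; a real WAF listing would be fetched over HTTP.
SAMPLE_INDEX = """
<html><body>
<a href="../">Parent Directory</a>
<a href="record-001.xml">record-001.xml</a>
<a href="record-002.xml">record-002.xml</a>
<a>no href</a>
</body></html>
"""

urls = list_waf_documents(SAMPLE_INDEX, "https://example.gov/waf/")
print(urls)
```

This step only produces the URL list and makes no requests itself. The per-document downloads that follow it are where the comment above places the bottleneck.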
User Story
In order to harvest WAF sources effectively and at scale, datagovteam would like to harden the current WAF ETL pipeline.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
GIVEN [a precondition]
[AND optionally another precondition]
WHEN [a triggering event] happens
THEN [a verifiable outcome]
[AND optionally another verifiable outcome]
Background
[Any helpful contextual notes or links to artifacts/evidence, if needed]
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch