[timebox: 5d] Catalog load- and stress-testing in cloud.gov #2701
Comments
Let's consider this as part of the cloud.gov work.
Updates:
Having recently done some tuning on Solr, we're now waiting for a Solr reindex before we conduct more tests.
To summarize the state of this issue:
Total ETA for Solr index: 59 hours
Update: I just realized that the current version of the solr-brokerpak doesn't have persistent storage enabled at the SolrCloud level, even though eks-brokerpak supports it now. So we'll have to restart the index once the correct Solr is deployed.
Here's a useful reference on optimizing Solr that I just rediscovered.
Concerning the most recent load testing, the data.gov team believes that our cloud.gov setup is resilient enough to migrate now and address two uncommon situations at a later time. We are currently achieving upwards of 20 sustained requests per second. With certain network-capable testing environments, we've achieved ~70 requests per second.

The two problems carrying forward relate to:
- Deep pagination Solr search issues (#3636 and #3642)
- Harvest source page issues (#3749)

Summary of load testing results:
- Sustained >30 RPM
User Story
In order to ensure sufficient capacity and performance for catalog, data.gov wants to spend up to 5 days running load- and stress-testing activities on the deployment in cloud.gov.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
THEN the sustainable request load is sufficient to handle typical traffic loads we've seen to catalog in the FCS environment
AND sufficient to handle routine peak traffic.
THEN I see a comfortable margin for spinning up more instances if needed.
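One way to make these criteria checkable is to compare the measured sustainable rate against the traffic levels seen in FCS. The following is only a sketch: the function name `capacity_margin` and the typical and peak figures in the usage line are hypothetical placeholders, not measurements from this issue. Only the 20 requests per second figure comes from the thread.

```python
def capacity_margin(sustained_rps: float, typical_rps: float, peak_rps: float) -> dict:
    """Compare a measured sustainable request rate against traffic levels.

    All three rates must use the same unit (for example requests per second).
    """
    if min(sustained_rps, typical_rps, peak_rps) <= 0:
        raise ValueError("all rates must be positive")
    return {
        # True when the deployment sustains typical traffic.
        "handles_typical": sustained_rps >= typical_rps,
        # True when the deployment sustains routine peak traffic.
        "handles_peak": sustained_rps >= peak_rps,
        # Ratio of sustained capacity to peak load; above 1.0 means headroom.
        "peak_headroom": sustained_rps / peak_rps,
    }


# Hypothetical traffic figures (5 typical, 10 peak), for illustration only.
result = capacity_margin(sustained_rps=20, typical_rps=5, peak_rps=10)
```

What counts as a "comfortable margin" is a team decision; the headroom ratio just gives a number to discuss.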
Background
See notes from the previous load-testing
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
By right-sizing our app deployments, we'll be less likely to suffer self-imposed DoS conditions when normal peak traffic happens. (Actual attack is a different concern.)
Sketch
Verify datasets with deep pagination are returning OK
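A minimal sketch of how that check might be scripted, using only the Python standard library. The `/dataset?page=N` URL pattern, the helper names, and the page numbers are assumptions for illustration; adjust them to whatever the deployment actually serves.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def page_url(base_url: str, page: int) -> str:
    """Build the URL for one page of the dataset listing (assumed URL pattern)."""
    return f"{base_url.rstrip('/')}/dataset?{urlencode({'page': page})}"


def http_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; report the code instead.
        return err.code


def failing_pages(
    base_url: str,
    pages: Iterable[int],
    fetch: Callable[[str], int] = http_status,
) -> List[Tuple[int, int]]:
    """Return (page, status) for every requested page that did not return 200."""
    failures = []
    for page in pages:
        status = fetch(page_url(base_url, page))
        if status != 200:
            failures.append((page, status))
    return failures
```

Calling `failing_pages(base_url, [1, 100, 1000, 5000])` against the deployment would list any deep pages that fail; an empty list means every sampled page returned OK. The `fetch` parameter is there so the logic can be exercised without network access.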