[timebox: 5d] Catalog load- and stress-testing in cloud.gov #2701

Closed
10 of 11 tasks
adborden opened this issue Jan 29, 2021 · 7 comments
Assignees
Labels: component/catalog (Related to catalog component playbooks/roles), Testing

Comments

@adborden
Contributor

adborden commented Jan 29, 2021

User Story

In order to ensure sufficient capacity and performance for catalog, data.gov wants to spend up to 5 days running load- and stress-testing activities on the deployment in cloud.gov.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • WHEN I perform load-testing on catalog
    THEN the sustainable request load is sufficient to handle typical traffic loads we've seen to catalog in the FCS environment
    AND sufficient to handle routine peak traffic.
  • WHEN I look at the available quota vs the memory in use
    THEN I see a comfortable margin for spinning up more instances if needed.
  • We're satisfied that any other immediate concerns that arose during the 5d are properly triaged.
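
The quota criterion above comes down to simple arithmetic. Here is a minimal sketch of that check; the function name `instance_headroom` and all the figures are hypothetical and are not taken from the actual cloud.gov org quota:

```python
def instance_headroom(quota_mb: int, in_use_mb: int, instance_mb: int) -> int:
    """Return how many extra instances of size instance_mb fit in the unused quota."""
    if instance_mb <= 0:
        raise ValueError("instance_mb must be positive")
    # Never report negative free memory if usage already exceeds the quota.
    free_mb = max(quota_mb - in_use_mb, 0)
    return free_mb // instance_mb


# Hypothetical figures for illustration only; substitute the real quota,
# current usage, and per-instance memory limit.
quota_mb = 40 * 1024
in_use_mb = 22 * 1024
instance_mb = 2 * 1024

extra = instance_headroom(quota_mb, in_use_mb, instance_mb)
print(f"room for {extra} more instances")
```

What counts as a "comfortable" margin is a team decision. The sketch only reports how many more instances would fit.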

Background

See notes from the previous load-testing

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
By right-sizing our app deployments, we'll be less likely to suffer self-imposed DoS conditions when normal peak traffic happens. (Actual attack is a different concern.)

Sketch

  • Implement a traffic profile in GSA/datagov-load-testing
  • Perform a baseline test with the current catalog.
  • Decide how many instances of catalog are sufficient for expected traffic levels
  • Run harvests and see that they don't fail
  • Run reindexes and see that they don't fail
  • Verify datasets with deep pagination are returning OK
  • Verify we can add/delete users
  • Verify Solr 504 Gateway Time-out #3636 is addressed
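
One way to think about the traffic profile in the first step is as a set of per-endpoint weights derived from observed request counts. The sketch below is illustrative and is not the code in GSA/datagov-load-testing; the helper names and the counts are hypothetical, and the endpoint names are borrowed from the results later in this thread:

```python
import random
from collections import Counter


def build_profile(request_counts):
    """Turn per-endpoint request counts into relative weights that sum to 1."""
    total = sum(request_counts.values())
    if total == 0:
        raise ValueError("no requests observed")
    return {name: count / total for name, count in request_counts.items()}


def sample_requests(profile, n, seed=0):
    """Draw n endpoint names at random, in proportion to their weights."""
    rng = random.Random(seed)
    names = list(profile)
    weights = [profile[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n))


# Hypothetical counts, e.g. as they might come out of an access log analysis.
observed = {"home": 500, "dataset_search": 300, "api-package-search": 200}
profile = build_profile(observed)
print(profile)
print(sample_requests(profile, 1000))
```

A load-testing tool would then issue requests in these proportions, so the synthetic traffic mix resembles what the production site sees.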
@adborden
Contributor Author

adborden commented Feb 8, 2021

Let's consider this as part of the cloud.gov work.

@mogul mogul changed the title from "Inventory load testing" to "[appname] load testing in cloud.gov" Sep 16, 2021
@nickumia-reisys nickumia-reisys self-assigned this Nov 17, 2021
@jbrown-xentity jbrown-xentity changed the title from "[appname] load testing in cloud.gov" to "Catalog load testing in cloud.gov" Nov 19, 2021
@nickumia-reisys
Contributor

nickumia-reisys commented Nov 22, 2021

Updates:

  • Recent access log analysis done
  • Minor optimizations to datagov-load-testing repo
  • Will perform complete load test once more data is populated in the cloud.gov environment.

@mogul
Contributor

mogul commented Jan 20, 2022

Having recently done some tuning on Solr, we're now waiting for a Solr reindex before we conduct more tests.

@mogul mogul changed the title from "Catalog load testing in cloud.gov" to "[timebox: 5d] Catalog load- and stress-testing in cloud.gov" Feb 8, 2022
@nickumia-reisys
Contributor

nickumia-reisys commented Mar 7, 2022

To summarize the state of this issue:

Total ETA for Solr Index: 59 hours
Estimated completed time: Wednesday, March 9th, morning

@nickumia-reisys
Contributor

Update: I just realized that the current version of the solr-brokerpak doesn't have persistent storage enabled at the SolrCloud level, even though eks-brokerpak supports it now. We'll have to restart the index once the correct Solr is deployed.

@mogul
Contributor

mogul commented Mar 18, 2022

Here's a useful reference on optimizing Solr that I just rediscovered.

@nickumia-reisys
Contributor

nickumia-reisys commented Mar 22, 2022

Concerning the most recent load testing, the data.gov team believes that our cloud.gov setup is resilient enough to migrate and then address two uncommon situations at a later time. Currently, we are achieving upwards of 20 sustained requests per second. With certain network-capable testing environments, we've achieved ~70 requests per second.

The two problems carrying forward relate to:

Deep pagination Solr search issues (#3636 and #3642)
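
A deep-pagination spot check could look roughly like the sketch below. It assumes the problem can be exercised through the CKAN action API's `package_search` endpoint with its `rows` and `start` parameters. The helper name, the offsets, and the placeholder host are hypothetical, and the sketch only builds the URLs without sending requests:

```python
from urllib.parse import urlencode


def deep_page_urls(base_url, rows=20, starts=(0, 1000, 10000, 100000)):
    """Build package_search URLs at increasingly deep result offsets."""
    endpoint = base_url.rstrip("/") + "/api/3/action/package_search"
    return [f"{endpoint}?{urlencode({'rows': rows, 'start': start})}" for start in starts]


# Placeholder host; point this at the catalog deployment under test.
for url in deep_page_urls("https://example.test/"):
    print(url)
```

Fetching each URL and recording the status code and latency would show at which offset responses start timing out.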

Harvest source page issues (#3749)

Summary of Load testing results:

Sustained >30 requests per second

Response time percentiles (approximated)
 Type     Name                                                                                  50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|--------------------------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 GET      api-group-list                                                                        130    130    140    150    280    300    310    330    330    330    330     55
 GET      api-organization-list                                                                 130    140    150    160    420    430    450    710    710    710    710     62
 GET      api-package-search                                                                    280    310    330    340    400    470    620    900   1900   5000   5300  23644
 GET      api-package-search-harvest                                                            190    220    230    250    300    310    320    610    610    610    610     63
 GET      api-package-show                                                                      200    220    240    250    300    360    450    610   1300   2100   3000  13043
 GET      dataset                                                                               810    920   1000   1100   1300   1500   1700   1900   2700   5600   5600   4636
 GET      dataset_search                                                                       2600   3100   3400   3600   4100   4700   5400   6000   9000  65000  65000   8271
 GET      datasets-home                                                                        2800   3200   3500   3700   4200   4700   5400   6100  14000  63000  63000   3750
 GET      group                                                                                2100   2500   2900   3100   3700   4200   4900   5500   9800  63000  63000  13953
 GET      groups-home                                                                           650    790    810    900   1300   1900   1900   2100   2100   2100   2100     52
 GET      home                                                                                 3000   3400   3600   3800   4400   4900   5600   6500  12000  64000  64000  16420
 GET      organization                                                                         2000   2500   2900   3100   3800   4500   5300   6000  10000  63000  64000  21376
 GET      organizations-home                                                                   1700   1900   2100   2200   2700   3000   3300   3700   3800   3800   3800    158
 GET      static_assets                                                                         170    210    210    220    240    260    310    350   1200   2700   4200  14950
--------|--------------------------------------------------------------------------------|---------|------|------|------|------|------|------|------|------|------|------|------|
 None     Aggregated                                                                            760   2100   2600   2800   3500   4100   4800   5400   8800  63000  65000 120433

 Name                                                                              # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 GET api-group-list                                                                    55     0(0.00%)  |     146     105     329     130  |    0.02    0.00
 GET api-organization-list                                                             62     0(0.00%)  |     182     113     705     130  |    0.02    0.00
 GET api-package-search                                                             23644     0(0.00%)  |     302     161    5337     280  |    6.57    0.00
 GET api-package-search-harvest                                                        63     0(0.00%)  |     211     149     608     190  |    0.02    0.00
 GET api-package-show                                                               13043     0(0.00%)  |     226     144    2979     200  |    3.62    0.00
 GET dataset                                                                         4636     0(0.00%)  |     885     413    5641     810  |    1.29    0.00
 GET dataset_search                                                                  8271     3(0.04%)  |    2719     186   64881    2600  |    2.30    0.00
 GET datasets-home                                                                   3750     0(0.00%)  |    3078    1609   63476    2800  |    1.04    0.00
 GET group                                                                          13953     0(0.00%)  |    2240     505   62959    2100  |    3.88    0.00
 GET groups-home                                                                       52     0(0.00%)  |     796     490    2062     650  |    0.01    0.00
 GET home                                                                           16420     1(0.01%)  |    3196     586   64159    3000  |    4.56    0.00
 GET organization                                                                   21376    11(0.05%)  |    2090     298   64382    2000  |    5.94    0.00
 GET organizations-home                                                               158     0(0.00%)  |    1787    1043    3813    1700  |    0.04    0.00
 GET static_assets                                                                  14950     0(0.00%)  |     180     100    4215     170  |    4.15    0.00
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregated                                                                        120433    15(0.01%)  |    1492     100   64881     760  |   33.46    0.00
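
A few derived figures can be read off the Aggregated row. The sketch below only reuses numbers from the tables above, and it assumes response times are in milliseconds. The in-flight estimate applies Little's law (throughput × average response time) and assumes the test ran at a roughly steady rate, so treat it as an approximation:

```python
# Figures copied from the "Aggregated" row of the summary table above.
total_requests = 120433
total_failures = 15
throughput_rps = 33.46
avg_response_ms = 1492  # assumed to be milliseconds

# Approximate length of the test run, in seconds.
duration_s = total_requests / throughput_rps

# Share of requests that failed, as a percentage.
failure_rate_pct = 100 * total_failures / total_requests

# Little's law: average requests in flight = arrival rate * time in system.
in_flight = throughput_rps * (avg_response_ms / 1000)

print(f"duration: about {duration_s / 60:.0f} minutes")
print(f"failure rate: {failure_rate_pct:.3f}%")
print(f"average requests in flight: about {in_flight:.0f}")
```

By this arithmetic the run lasted about an hour, with roughly 50 requests in flight on average.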

@hkdctol hkdctol closed this as completed Mar 31, 2022
@hkdctol hkdctol added this to the Sprint 20220331 milestone Apr 14, 2022
@nickumia-reisys nickumia-reisys added component/catalog Related to catalog component playbooks/roles Testing labels Oct 7, 2023
@nickumia-reisys nickumia-reisys moved this to 🗄 Closed in data.gov team board Oct 7, 2023