Refactor Load and Performance Doc #2140

Merged
6 commits merged on Mar 16, 2017

Member

@rachelwhitton rachelwhitton commented Jan 26, 2017

Closes #1851
Replaces #2109

Effect

PR includes the following changes:

  • Rewrite of performance test documentation

Todo:

  • Peer review from another EOM (@aeligature?) and/or @ari-gold
  • Additional copy review from @alexfornuto following peer review

###Warning {.info}
We do not recommend load testing on the Live environment if the site has already launched because you risk overwhelming your live site and causing downtime.
</div>
Note the start time for the test. As the test executes, keep a close eye on [log files](/docs/logs). Make note of any errors and warnings that appear during the test so you can fix them.
Member Author

@rachelwhitton rachelwhitton Jan 26, 2017

@bentekwork Which log files should I watch? Do they differ based on the type of test being run?

3. Determine how much load to apply for your test.

* **Performance Tests**: Smaller loads should suffice, as you should be able to see transactional bottlenecks with 10-20 concurrent users.
* **Load Tests**: Determine how many concurrent users the site is expected to serve based on historical analytics for the site. Identify the peak hourly sessions and average session duration, then do some math: `hourly_sessions / (60 / average_duration) = Concurrent Users`
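
As a worked instance of the formula above, here is a minimal Python sketch. It assumes `average_duration` is measured in minutes (the formula divides 60 by it, which suggests minutes per hour), and the traffic figures are made up for illustration.

```python
def concurrent_users(hourly_sessions: float, average_duration_minutes: float) -> float:
    """Estimate concurrent users as hourly_sessions / (60 / average_duration)."""
    if average_duration_minutes <= 0:
        raise ValueError("average session duration must be positive")
    # How many back-to-back sessions of this length fit into one hour.
    sessions_per_hour_per_user = 60 / average_duration_minutes
    return hourly_sessions / sessions_per_hour_per_user


# Hypothetical figures: 6,000 peak hourly sessions, 3-minute average session.
print(concurrent_users(6000, 3))  # 300.0
```

With these example numbers, each "slot" can serve 20 three-minute sessions per hour, so 6,000 sessions need roughly 300 simultaneous users.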
Member Author

@rachelwhitton rachelwhitton Jan 26, 2017

@bentekwork How do I determine load to apply in the test after calculating concurrent users?

Contributor

@adamedgmond adamedgmond left a comment

Seems thorough on first and second reading.

Contributor

@ari-gold ari-gold left a comment

Great update! This is coming along nicely. Added a few comments. Would be happy to review again.

### Performance Testing
Performance testing is the process in which you measure an application's response time to proactively expose bottlenecks. These tests should be regularly executed as part of routine maintenance. Additionally, you should run these tests before any load testing. If your application is not performing well, then you can be assured that the load test will not go well.

The scope of performance tests should be limited to the application itself on a development environment (Dev or [Multidev](/docs/multidev)) without caching. This will give you an honest look into your application and show exactly how uncached requests will perform. You can bypass cache by [setting the `no-cache` HTTP headers](/docs/cache-control) in responses.
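
One way to confirm that a response really bypasses cache is to inspect its `Cache-Control` header before trusting the timings. The Python sketch below is an illustration only: the helper name `is_uncacheable` is hypothetical, and it checks just the standard `no-cache`, `no-store`, `private`, `max-age`, and `s-maxage` directives rather than anything platform-specific.

```python
def is_uncacheable(cache_control: str) -> bool:
    """Return True if a Cache-Control value keeps shared caches from reusing the response as-is."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().lower().partition("=")
        if name:
            directives[name] = value.strip('"')
    # Any of these directives stops a shared cache from serving a stored copy unchecked.
    if directives.keys() & {"no-cache", "no-store", "private"}:
        return True
    # For shared caches, s-maxage takes precedence over max-age when both are present.
    ttl = directives.get("s-maxage", directives.get("max-age"))
    return ttl == "0"


print(is_uncacheable("no-cache, must-revalidate"))  # True
print(is_uncacheable("public, max-age=900"))        # False
```

You would pass in the header value from whatever HTTP client you use to fetch a page on the environment under test.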
Contributor

Offer alternatives to bypass cache by setting a no-cache header? How about just disabling cache completely on Dev/Multidev during testing through Drupal/WordPress Admin UI?

Contributor

@obicke obicke Aug 16, 2017

My understanding is that the Dev environment has a default time-to-live of zero, which implies no caching, but that things like Pantheon Advanced Page Cache may override this with a non-zero value. While a no-cache header may help, that may depend on when it gets executed. Suggesting that users disable caching via the UI is an option, with an emphasis on remembering to re-enable it prior to pushing to prod.

### Load Testing
Load testing is the process in which you apply requests to your site that will represent the most load that your site will face once it is live. This test will ensure that the site can withstand the peak traffic spikes after launch. This test should be done on the Live environment before the site has launched, after performance testing.

If your site is already live, then you should run load tests on the Test environment. Keep in mind that the Test environment has one application container, while Live environments on sites with a service level of Business and above can have multiple application containers serving the site. So try to run a proportionate amount of traffic based on how many containers you currently have on your Live environment.
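
To illustrate what "a proportionate amount of traffic" could mean, the sketch below divides the Live target evenly across containers. It assumes load is spread evenly and that you already know the Live container count; the function name and all numbers are hypothetical.

```python
import math


def scaled_load_for_single_container(live_concurrent_users: int,
                                     live_containers: int,
                                     target_containers: int = 1) -> int:
    """Scale a Live concurrent-user target down to an environment with fewer containers."""
    if live_containers < 1 or target_containers < 1:
        raise ValueError("container counts must be at least 1")
    # Assumption: traffic is distributed evenly across application containers.
    per_container = live_concurrent_users / live_containers
    return math.ceil(per_container * target_containers)


# Hypothetical: Live must handle 300 concurrent users across 4 containers.
print(scaled_load_for_single_container(300, 4))  # 75
```

Rounding up errs on the side of slightly more load than strictly proportional.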
Contributor

Offer concrete example with math?

Contributor

The EOM team is the best source for the algorithm we use.

3. Determine how much load to apply.

* **Performance Tests**: Smaller loads should suffice, as you should be able to see transactional bottlenecks with 10-20 concurrent users.
* **Load Tests**: Determine how many concurrent users the site is expected to serve based on historical analytics for the site. Identify the peak hourly sessions and average session duration, then do some math: `hourly_sessions / (60 / average_duration) = Concurrent Users`
Contributor

Let's reiterate difference between load test on Live vs non-live, and include app containers in calculation for scenario.

Contributor

Load tests should not be run on Test, rather performance test can/should be run there. In terms of providing formulas, it is complicated by the fact that to run "proportionate amount of traffic" on Test involves knowing the number of appservers on Live, which clients can't determine on their own (other than asking Support, or looking at New Relic, which will include decommissioned appservers for some time).


Finally, review the **Error analytics** tab in New Relic. PHP errors often indicate huge performance bottlenecks. If you have errors, fix them.

### Calculating Load Capacity After Launch
Contributor

How can we highlight this scenario? And flesh it out with concrete example explaining how to collect RPM and response time from New Relic?

## Load vs Performance Testing
Before you start, it's important to understand the difference between load and performance testing and know when to use each.
### Performance Testing
Performance testing is the process in which you measure an application's response time to proactively expose bottlenecks. These tests should be regularly executed as part of routine maintenance. Additionally, you should run these tests before any load testing. If your application is not performing well, then you can be assured that the load test will not go well.
Contributor

Why should these tests be run regularly as part of routine maintenance? To ensure performance doesn't degrade with a code or configuration change?

Contributor

@obicke obicke Aug 16, 2017

In general, I'd favor suggesting that clients:

  • refer to New Relic reports regularly to identify improvements or degradation of performance,
  • perform performance tests occasionally to proactively expose potential bottlenecks and to identify opportunities for optimization, and
  • perform load tests in advance of anticipated major-traffic events, or prior to launching sites after major overhauls, remembering to provide enough time to fix any issues identified.

Contributor

Added some of these notions.

* [JMeter](http://jmeter.apache.org)
* [Locust](http://locust.io/)

 The Pantheon onboarding team uses Locust, an open source load testing tool. Locust makes it easy to build out test scripts, and it allows you to crawl the site instead of using predefined URLs. Crawling the site has the added benefit of loading every page that is linked to anywhere on the site. This exposes edge-case performance bottlenecks that would have gone undetected under tests with predefined URLs.
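
The idea of crawling rather than using a fixed URL list can be sketched with the Python standard library alone. This is not Locust code and not the onboarding team's script; it only illustrates the link-discovery step, and the class name, function name, and example hostnames are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute, de-duplicated links that stay on the same host as page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop fragments so /about and /about#team match.
        absolute = urljoin(page_url, href).split("#")[0]
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


sample = ('<a href="/about">About</a> '
          '<a href="https://other.example/x">External</a> '
          '<a href="/about#team">Team</a>')
print(same_site_links("https://dev-mysite.example/", sample))
# ['https://dev-mysite.example/about']
```

A load-testing tool would then request each discovered URL and repeat the extraction on the responses, which is how pages missing from a predefined list end up being exercised.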
Contributor

"makes it easy" -- link to example script?

Contributor

The EOM team should be asked to update this section.


 Ultimately, it doesn't matter which tool you use as long as you test your site properly. Be sure to allow for any authenticated traffic as well as anonymous.
Contributor

"Be sure to allow for any authenticated traffic as well as anonymous" - Not sure we should just assert this in passing. Load testing authenticated users can be difficult.

Contributor

I agree that authenticated user testing is a complex task and thus the generic statement should be along the lines of "It is important for Load Testing to test against the anticipated traffic patterns of the site, both in terms of traffic volume and authenticated/anonymous proportion. Note that testing authenticated workflows is considerably more complex requiring more time, skills and iterations."

Contributor

I edited


3. Determine how much load to apply.

* **Performance Tests**: Smaller loads should suffice, as you should be able to see transactional bottlenecks with 10-20 concurrent users.
Contributor

Why 10-20? A single request can give you all you need, no?
We should explain how to use tools like Google Dev Tools for website performance optimization or at least link to resources like:
https://hpbn.co/
https://www.udacity.com/course/website-performance-optimization--ud884

Contributor

IMO, you want to generate more than single request to tease out potential bottlenecks.

Also, I know that we have a Quicksilver example that uses a free loader.io account to automatically run this level of test on each push to the Test environment. Not only does this result in automated testing procedures, it provides a standard profile that you can see in New Relic. Here's a related link, but we need a better one: pantheon-systems/quicksilver-examples#110

Contributor

IMO, this is good to go (i.e. no edits needed). A separate issue should be created, if/when we want to include reference to the loader.io Quicksilver example.


High performance is the ability to deliver a page in under a second; scalability is the ability to deliver that page in under a second for many requests. It's important to understand the difference between these two dimensions and that there are trade-offs between performance and scalability.

## Verify Varnish is Working
Contributor

Seems like verifying Varnish is working is still important before doing a load test? Maybe this can be more concise?

Contributor

Is this still the case now that Global CDN is in place?

@rachelwhitton
Member Author

We're going to deploy this as an iterative improvement and circle back to address suggestions not implemented here. See #2251 to track

@rachelwhitton rachelwhitton merged commit 63a07f0 into master Mar 16, 2017
@alexfornuto alexfornuto deleted the bentekwork-load-test-1851 branch October 13, 2017 14:50
7 participants