
Create self-scaling EC2 AMI WPT Instance #310

Closed
pmeenan opened this issue Sep 24, 2014 · 20 comments
Comments

@pmeenan
Contributor

pmeenan commented Sep 24, 2014

The initial use case is to support the grunt-perfbudget project but this could be really useful for getting people started with private instances.

Some rough thoughts:

  • Instance would fetch the latest stable UI code and config on startup
  • Config would include definitions for all available AMI locations
  • On startup, read user data config information and populate settings
  • When a test is submitted for a location, if no agents are available it will automatically spin up an instance for the given location (limit to 1 instance for now)
  • Hourly check for idle agents and terminate them automatically (terminate just under the hour mark, since EC2 bills by the full hour)
  • Auto-update UI and agent code (git pull on cron?)
  • Optionally archive test results to S3 so they can persist beyond the server's lifetime (optionally provide a TTL on the archived items)
  • Allow for the UI instance to self-terminate if it hasn't been used for an extended period.
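The idle-reaper idea above can be sketched as a simple decision function. Everything here (function name, parameter names, thresholds) is illustrative rather than the actual WPT implementation; it just captures the "terminate just under the hour" rule, since EC2 billed by the full hour at the time.

```python
# Illustrative sketch of the hourly idle-agent check described above.
# Names and defaults are made up; WPT's real logic lives in the server code.

def should_terminate(idle_minutes, minutes_into_billing_hour,
                     idle_threshold=10, billing_window_start=50):
    """Decide whether to reap an idle test agent.

    Since EC2 billed by the full hour, an idle agent is only worth
    terminating near the end of the hour it has already paid for.
    """
    return (idle_minutes >= idle_threshold
            and minutes_into_billing_hour >= billing_window_start)
```

A cron job running hourly would evaluate this for each agent and terminate the ones that qualify.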

Config items that would need to be passed through user data:

  • EC2 key and secret
  • S3 bucket name for archiving tests (optional)
  • S3 archive TTL (optional, defaults to never expiring)
  • Idle time before terminating web UI instance (optional - defaults to 24 hours?)
  • Idle time before terminating test agents (optional - defaults to 10 minutes, checked hourly)
  • Headless flag (prevents tests from being submitted through the UI) (optional - defaults to false?)
  • Instance size for test agents (optional - defaults to c3.medium)
  • Instance type (optional - defaults to spot)
  • Max spot price (optional - defaults to $0.30?)
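Assembled, the user-data payload could be a simple key=value list along these lines. The key names below are illustrative guesses mapped from the list above, not the final parameter names (the Server AMI docs would be the source of truth):

```
ec2_key=AKIAXXXXXXXXXXXXXXXX
ec2_secret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
archive_s3_bucket=my-wpt-archive
archive_ttl_days=30
server_idle_hours=24
agent_idle_minutes=10
headless=1
agent_instance_size=c3.medium
agent_instance_type=spot
max_spot_price=0.30
```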

I'm sure I'm missing a bunch but those are the basics and shouldn't be too hard to pull off.

@sergeychernyshev

Sounds like a long-awaited project. Can it be an AWS CloudFormation thing where you just ask "give me a copy of WebPageTest" and get it?

@pmeenan
Contributor Author

pmeenan commented Sep 25, 2014

Potentially. I was going to build most of the scaling myself so you'd just spin up the server and get it all anyway. Another way to do it might be to publish the location queue lengths as custom metrics and use the auto-scaling directly to manage the instances.
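The custom-metrics alternative would mean publishing each location's queue depth to CloudWatch and letting an Auto Scaling policy react to it. A minimal sketch, assuming made-up metric and namespace names (the boto3 call itself is commented out since it needs AWS credentials):

```python
# Build the CloudWatch MetricData entry for one location's queue depth.
# Namespace/metric/dimension names here are illustrative, not WPT's.

def queue_metric(location, queue_length):
    return {
        "MetricName": "TestQueueLength",
        "Dimensions": [{"Name": "Location", "Value": location}],
        "Value": float(queue_length),
        "Unit": "Count",
    }

# Published periodically, e.g. from cron:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="WPT", MetricData=[queue_metric("us-east-1_IE9", 350)])
# An Auto Scaling policy on TestQueueLength would then add/remove agents.
```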

@sergeychernyshev

I see, so you just want to let people use your farm of servers?

@pmeenan
Contributor Author

pmeenan commented Sep 25, 2014

Oh, no - quite the opposite.

The one server that they spin up would know how to launch agents as needed using their ec2 key. The key needs to be passed in to the agent anyway for S3 archiving and I already have code to manage ec2 instances so it should be easy enough to hook up.

If you spin up an instance of the "WPT Web Server" ami it would know how to launch and terminate instances in all of the locations as needed by itself. As far as users go they would just need to launch the one instance and pass a few config parameters through user data.

@tkadlec
Contributor

tkadlec commented Sep 29, 2014

👍

Talked to a few people who all agreed this would be a nice thing to have for perfbudget. Plenty of value outside of that, as well. This fills a nice gap between the public instance and installed private instances, I think.

@ceeaspb

ceeaspb commented Oct 2, 2014

It would be good if the AMI could be built from scratch (ISO) using packer.io. That way the same build scripts could generate EC2 images (AMIs) as well as images for other platforms, like Vagrant boxes for private instances. See http://www.packer.io/docs/templates/builders.html

If the scripts were structured reasonably, then both Ubuntu and CentOS, for example, could be supported without too much problem (he says...). I think you are intending that the web server would be Linux and the agent Windows?

For completeness, create a VM build for the Node.js agent to assist support for tests from Linux/devices.

Maybe what I am talking about is a pre-ticket in a build chain, starting with Packer; then this ticket remains AWS/EC2-specific.
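A minimal Packer template for the EC2 side might look like this. All values are placeholders (the source AMI, script name, and AMI naming are assumptions), and a VirtualBox/Vagrant builder could sit alongside the `amazon-ebs` one to cover private instances from the same provisioning scripts:

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-xxxxxxxx",
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu",
      "ami_name": "wpt-server-{{timestamp}}"
    }
  ],
  "provisioners": [
    { "type": "shell", "script": "install-wpt-server.sh" }
  ]
}
```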

@pmeenan
Contributor Author

pmeenan commented Oct 20, 2014

OK, an initial AMI is available that can auto-scale a single test agent for any of the regions where there are AMIs.

https://github.com/WPO-Foundation/webpagetest/blob/master/docs/EC2/Server%20AMI.md

I did some testing with the server on a micro instance and it was super-responsive so I think it will be a great low-cost and easy alternative for private instances. The lag for a new test agent coming online is a little annoying (5-10 minutes) but should be manageable.

The server auto-updates from trunk hourly and automatically pulls the latest agents from the public WPT so it should automatically pick up any fixes and features as we enhance it.

@etcook

etcook commented Oct 20, 2014

@pmeenan Kudos on this.

@etcook

etcook commented Oct 22, 2014

Currently, only one node per region is supported, correct?

@pmeenan
Contributor Author

pmeenan commented Oct 22, 2014

Technically one node per "location", so you could have separate IE 9 and IE 10 nodes running in a given region. Supporting scaling higher is on the list to address, but given the 5-10 minute lag in spinning them up it will need a bit of tuning. Historically I've used a configurable divisor to determine how many test machines to spin up (up to a fixed limit). With a divisor of 100, for example, if a given location has a work queue with 350 test runs it would spin up 3 nodes.

I'm also thinking about how to specify a minimum for a specific location so you could keep X running at all times, ready to do work immediately, with the ability to burst higher for large workloads. Right now it's tuned very much for reducing costs.
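The divisor scheme described above can be sketched in a few lines. Names and defaults here are illustrative (this is not the actual WPT server code); it just matches the example of a 350-run queue with a divisor of 100 yielding 3 nodes, plus the minimum/maximum clamping being discussed:

```python
# Rough sketch of the divisor-based scaling described above.

def agents_needed(queued_runs, divisor=100, minimum=0, maximum=10):
    """One agent per <divisor> queued test runs, clamped to [minimum, maximum]."""
    if queued_runs <= 0:
        return minimum  # nothing queued: fall back to the warm-pool floor
    wanted = max(1, queued_runs // divisor)  # 350 // 100 -> 3, as in the example
    return max(minimum, min(maximum, wanted))
```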

Feedback on expected usage patterns would be really helpful:

  • Amount of testing
  • Distribution across locations (usually using just one location for budgeting, or global testing)
  • Cost vs. turnaround-time tradeoffs (need results within a minute and cost is not an issue, etc.)

Using on-demand pricing, keeping an instance running 24x7 would be $50-$100/mo depending on the instance size, so it might be worthwhile for a good number of people. I just don't have a good sense for where the cost sensitivities are, since the current API on the public instance is free (but quite limited).


@rarkins

rarkins commented Nov 21, 2014

For people wanting to use this on AWS for small private use and keep it running 24/7, can the whole thing run on a single EC2 instance, or must there be at least two?

@pmeenan
Contributor Author

pmeenan commented Nov 21, 2014

Not with this image (which is Linux). You could take the agent AMI (Windows) and install the server code and get an all-in-one AMI but it's probably not worth the effort.

The web server can run on a micro instance without any issues and you'll probably want a medium instance for the test agents. Running 24x7 the micro instance should still be REALLY inexpensive. Decoupling the two also lets you kill test agents if they get into a bad state or change what agents you use without having to make any server changes.

@etcook

etcook commented Nov 21, 2014

@pmeenan I didn't see your previous message for some reason. We're actually looking to set up a scalable system that can bring up instances, process through potentially 100,000 entries or more, and then scale them down. As it stands, there's no reasonable way to spin up, let's say, 40 instances long enough to run through a list and then bring them down again, because the cost at those levels is truly unsustainable for this project.

@rarkins Reasonably, you need two. However, the Linux "controller" image can run on a micro instance. Only the test servers need more resources.

@pmeenan
Contributor Author

pmeenan commented Nov 21, 2014

@etcook would a configurable auto-scale limit give you what you need? I already have some logic that does it for testing that I do but it works best for large batches where it spins up the instances as the queue grows and when the queue is done it shuts them down.

The logic I have doesn't do so well at figuring out that you need a consistent X number of machines (where X is lower than the top limit) for sustained testing though.

I could have a "keep X at a minimum for these locations" and "Scale any location up to Y as needed for big batches" fairly easily (could implement it next week).

@etcook

etcook commented Nov 21, 2014

@pmeenan Absolutely. In fact, I'd be ok with something even simpler than that. Just a hard # of instances to start up, which we can then bring down once the tests are complete.

Have you seen how hirefire handles this kind of instancing?

Many thanks for your work. This is huge.

@pmeenan
Contributor Author

pmeenan commented Nov 26, 2014

ok, I pushed code updates that allow you to specify per-location min/max instance counts. You can also specify a scaling factor where it will spin up a new instance for every X tests in the queue (it defaults to 100).

The updates will work with the existing AMI, but I also rolled out an updated AMI that uses beanstalkd for all of the low-priority test queues. If you're anticipating 100,000 tests at a time you're going to want to use that one (the normal queues start to perform poorly once they reach 10k+ entries).
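For anyone configuring this, the per-location settings might be expressed along these lines. The key names below are illustrative examples, not verbatim; check the EC2 server documentation for the real ones:

```ini
; Illustrative scaling settings (key names are examples, not verbatim)
EC2.ScaleFactor=100          ; spin up one agent per 100 queued tests
EC2.us-east-1_IE10.min=1     ; keep one IE10 agent warm at all times
EC2.us-east-1_IE10.max=5     ; burst ceiling for large batches
```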

@adamwintle

Is there a master AMI of this in the Singapore region, or only us-east-1 for now?

@pmeenan
Contributor Author

pmeenan commented Jan 20, 2015

Just in us-east-1, but it should be trivial to copy. If you can't make a copy of the us-east one, let me know and I can copy it directly to different regions.


@adamwintle

OK, got it. For anyone else wondering: you can right-click any AMI and copy it to another region :)

@pmeenan
Contributor Author

pmeenan commented Feb 19, 2015

Closing this as done. The AMI is now available in all regions and any enhancements can be treated as regular WPT issues.
