
Create self-scaling EC2 AMI WPT Instance #310

Closed
pmeenan opened this issue Sep 24, 2014 · 20 comments
Comments

@pmeenan
Contributor

pmeenan commented Sep 24, 2014

The initial use case is to support the grunt-perfbudget project but this could be really useful for getting people started with private instances.

Some rough thoughts:

  • Instance would fetch the latest stable UI code and config on startup
  • Config would include definitions for all available AMI locations
  • On startup, read user data config information and populate settings
  • When a test is submitted for a location, if no agents are available it will automatically spin up an instance for the given location (limit to 1 instance for now)
  • Hourly check for idle agents and terminate them automatically (terminate just under the hour mark, since EC2 bills by the full hour)
  • Auto-update UI and agent code (git pull on cron?)
  • Optionally archive test results to S3 so they can persist beyond the server's lifetime (optionally provide a TTL on the archived items)
  • Allow for the UI instance to self-terminate if it hasn't been used for an extended period.
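The idle-reaper idea above can be sketched as a simple decision function. Everything here (function name, parameter names, thresholds) is illustrative rather than the actual WPT implementation; it just captures the "terminate just under the hour" rule, since EC2 billed by the full hour at the time.

```python
# Illustrative sketch of the hourly idle-agent check described above.
# Names and defaults are made up; WPT's real logic lives in the server code.

def should_terminate(idle_minutes, minutes_into_billing_hour,
                     idle_threshold=10, billing_window_start=50):
    """Decide whether to reap an idle test agent.

    Since EC2 billed by the full hour, an idle agent is only worth
    terminating near the end of the hour it has already paid for.
    """
    return (idle_minutes >= idle_threshold
            and minutes_into_billing_hour >= billing_window_start)
```

A cron job running hourly would evaluate this for each agent and terminate the ones that qualify.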

Config items that would need to be passed through user data:

  • EC2 key and secret
  • S3 bucket name for archiving tests (optional)
  • S3 archive TTL (optional, defaults to never expiring)
  • Idle time before terminating web UI instance (optional - defaults to 24 hours?)
  • Idle time before terminating test agents (optional - defaults to 10 minutes, checked hourly)
  • Headless flag (prevents tests from being submitted through the UI) (optional - defaults to false?)
  • Instance size for test agents (optional - defaults to c3.medium)
  • Instance type (optional - defaults to spot)
  • Max spot price (optional - defaults to $0.30?)
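Assembled, the user-data payload could be a simple key=value list along these lines. The key names below are illustrative guesses mapped from the list above, not the final parameter names (the Server AMI docs would be the source of truth):

```
ec2_key=AKIAXXXXXXXXXXXXXXXX
ec2_secret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
archive_s3_bucket=my-wpt-archive
archive_ttl_days=30
server_idle_hours=24
agent_idle_minutes=10
headless=1
agent_instance_size=c3.medium
agent_instance_type=spot
max_spot_price=0.30
```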

I'm sure I'm missing a bunch but those are the basics and shouldn't be too hard to pull off.

@sergeychernyshev

Sounds like a long-awaited project. Can it be an AWS CloudFormation thing where you just ask "give me a copy of WebPageTest" and get it?

@pmeenan
Contributor Author

pmeenan commented Sep 25, 2014

Potentially. I was going to build most of the scaling myself so you'd just spin up the server and get it all anyway. Another way to do it might be to publish the location queue lengths as custom metrics and use the auto-scaling directly to manage the instances.
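The custom-metrics alternative would mean publishing each location's queue depth to CloudWatch and letting an Auto Scaling policy react to it. A minimal sketch, assuming made-up metric and namespace names (the boto3 call itself is commented out since it needs AWS credentials):

```python
# Build the CloudWatch MetricData entry for one location's queue depth.
# Namespace/metric/dimension names here are illustrative, not WPT's.

def queue_metric(location, queue_length):
    return {
        "MetricName": "TestQueueLength",
        "Dimensions": [{"Name": "Location", "Value": location}],
        "Value": float(queue_length),
        "Unit": "Count",
    }

# Published periodically, e.g. from cron:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="WPT", MetricData=[queue_metric("us-east-1_IE9", 350)])
# An Auto Scaling policy on TestQueueLength would then add/remove agents.
```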

@sergeychernyshev

I see, so you just want to let people use your farm of servers?

@pmeenan
Contributor Author

pmeenan commented Sep 25, 2014

Oh, no - quite the opposite.

The one server that they spin up would know how to launch agents as needed using their ec2 key. The key needs to be passed in to the agent anyway for S3 archiving and I already have code to manage ec2 instances so it should be easy enough to hook up.

If you spin up an instance of the "WPT Web Server" ami it would know how to launch and terminate instances in all of the locations as needed by itself. As far as users go they would just need to launch the one instance and pass a few config parameters through user data.

@tkadlec
Contributor

tkadlec commented Sep 29, 2014

👍

Talked to a few people who all agreed this would be a nice thing to have for perfbudget. Plenty of value outside of that, as well. This fills a nice gap between the public instance and installed private instances, I think.

@ceeaspb

ceeaspb commented Oct 2, 2014

It would be good if the AMI could be built from scratch (ISO) using packer.io. That way the same build scripts could generate EC2 images (AMIs) as well as images for other platforms, like Vagrant boxes for private instances. See http://www.packer.io/docs/templates/builders.html

If the scripts were structured reasonably, then both Ubuntu and CentOS, for example, could be supported without too much problem (he says...). I think you are intending that the web server would be Linux and the agent Windows?

For completeness, create a VM build for the Node.js agent to assist support for tests from Linux/devices.

Maybe what I am talking about is a pre-ticket in a build chain, starting with Packer; then this ticket remains AWS/EC2-specific.
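A minimal Packer template for the EC2 side might look like this. All values are placeholders (the source AMI, script name, and AMI naming are assumptions), and a VirtualBox/Vagrant builder could sit alongside the `amazon-ebs` one to cover private instances from the same provisioning scripts:

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-xxxxxxxx",
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu",
      "ami_name": "wpt-server-{{timestamp}}"
    }
  ],
  "provisioners": [
    { "type": "shell", "script": "install-wpt-server.sh" }
  ]
}
```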

@pmeenan
Contributor Author

pmeenan commented Oct 20, 2014

OK, an initial AMI is available that can auto-scale a single test agent for any of the regions where there are AMIs.

https://github.com/WPO-Foundation/webpagetest/blob/master/docs/EC2/Server%20AMI.md

I did some testing with the server on a micro instance and it was super-responsive so I think it will be a great low-cost and easy alternative for private instances. The lag for a new test agent coming online is a little annoying (5-10 minutes) but should be manageable.

The server auto-updates from trunk hourly and automatically pulls the latest agents from the public WPT so it should automatically pick up any fixes and features as we enhance it.

@etcook

etcook commented Oct 20, 2014

@pmeenan Kudos on this.

@etcook

etcook commented Oct 22, 2014

Currently, only one node per region is supported, correct?

@pmeenan
Contributor Author

pmeenan commented Oct 22, 2014

Technically one node per "location", so you could have separate IE 9 and IE 10 nodes running in a given region. Supporting scaling higher is on the list to address, but given the 5-10 minute lag in spinning them up it will need a bit of tuning. Historically I've used a configurable divisor to determine how many test machines to spin up (up to a fixed limit). With a divisor of 100, for example, if a given location has a work queue with 350 test runs it would spin up 3 nodes.

I'm also thinking about how to specify a minimum for a specific location so you could keep X running at all times, ready to do work immediately, with the ability to burst higher for large workloads. Right now it's tuned very much for reducing costs.
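The divisor scheme described above can be sketched in a few lines. Names and defaults here are illustrative (this is not the actual WPT server code); it just matches the example of a 350-run queue with a divisor of 100 yielding 3 nodes, plus the minimum/maximum clamping being discussed:

```python
# Rough sketch of the divisor-based scaling described above.

def agents_needed(queued_runs, divisor=100, minimum=0, maximum=10):
    """One agent per <divisor> queued test runs, clamped to [minimum, maximum]."""
    if queued_runs <= 0:
        return minimum  # nothing queued: fall back to the warm-pool floor
    wanted = max(1, queued_runs // divisor)  # 350 // 100 -> 3, as in the example
    return max(minimum, min(maximum, wanted))
```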

Feedback on expected usage patterns would be really helpful:

  • Amount of testing
  • Distribution across locations (usually using just one location for budgeting, or global testing)
  • Cost vs. turnaround-time tradeoffs (need results within a minute and cost is not an issue, etc.)

Using on-demand pricing, keeping an instance running 24x7 would be $50-$100/mo depending on the instance size, so it might be worthwhile for a good number of people. I just don't have a good sense for where the cost sensitivities are, since the current API on the public instance is free (but quite limited).


@rarkins

rarkins commented Nov 21, 2014

For people wanting to use this on AWS for small private use and keep it running 24/7, can the whole thing run on a single EC2 instance, or must there be at least two?

@pmeenan
Contributor Author

pmeenan commented Nov 21, 2014

Not with this image (which is Linux). You could take the agent AMI (Windows) and install the server code and get an all-in-one AMI but it's probably not worth the effort.

The web server can run on a micro instance without any issues and you'll probably want a medium instance for the test agents. Running 24x7 the micro instance should still be REALLY inexpensive. Decoupling the two also lets you kill test agents if they get into a bad state or change what agents you use without having to make any server changes.

@etcook

etcook commented Nov 21, 2014

@pmeenan I didn't see your previous message for some reason. We're actually looking to set up a scalable system that can bring up instances, process through potentially 100,000 entries or more, and then scale them down. As it stands, there's no reasonable way to spin up, let's say, 40 instances long enough to run through a list and then bring them down again, because the cost at those levels is truly unsustainable for this project.

@rarkins Reasonably, you need two. However, the Linux "controller" image can run on a micro instance. Only the test servers need more resources.

@pmeenan
Contributor Author

pmeenan commented Nov 21, 2014

@etcook would a configurable auto-scale limit give you what you need? I already have some logic that does it for testing that I do but it works best for large batches where it spins up the instances as the queue grows and when the queue is done it shuts them down.

The logic I have doesn't do so well at figuring out that you need a consistent X number of machines (where X is lower than the top limit) for sustained testing though.

I could have a "keep X at a minimum for these locations" and "Scale any location up to Y as needed for big batches" fairly easily (could implement it next week).

@etcook

etcook commented Nov 21, 2014

@pmeenan Absolutely. In fact, I'd be ok with something even simpler than that. Just a hard # of instances to start up, which we can then bring down once the tests are complete.

Have you seen how hirefire handles this kind of instancing?

Many thanks for your work. This is huge.

@pmeenan
Contributor Author

pmeenan commented Nov 26, 2014

ok, I pushed code updates that allow you to specify per-location min/max instance counts. You can also specify a scaling factor where it will spin up a new instance for every X tests in the queue (it defaults to 100).

The updates will work with the existing AMI, but I also rolled out an updated AMI that uses beanstalkd for all of the low-priority test queues. If you're anticipating 100,000 tests at a time you're going to want to use that one (the normal queues start to perform poorly once they reach 10k+ entries).
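For anyone configuring this, the per-location settings might be expressed along these lines. The key names below are illustrative examples, not verbatim; check the EC2 server documentation for the real ones:

```ini
; Illustrative scaling settings (key names are examples, not verbatim)
EC2.ScaleFactor=100          ; spin up one agent per 100 queued tests
EC2.us-east-1_IE10.min=1     ; keep one IE10 agent warm at all times
EC2.us-east-1_IE10.max=5     ; burst ceiling for large batches
```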

@adamwintle

Is there a master AMI of this in the Singapore region, or only us-east-1 for now?

@pmeenan
Contributor Author

pmeenan commented Jan 20, 2015

Just in us-east-1, but it should be trivial to copy. If you can't make a copy of the us-east one, let me know and I can copy it directly to different regions.


@adamwintle

OK, got it. For anyone else wondering: you can right-click any AMI and copy it to another region :)

@pmeenan
Contributor Author

pmeenan commented Feb 19, 2015

Closing this as done. The AMI is now available in all regions and any enhancements can be treated as regular WPT issues.
