This repository has been archived by the owner on Jul 16, 2020. It is now read-only.

resource estimation #101

Open
tpepper opened this issue May 5, 2016 · 1 comment

Comments


tpepper commented May 5, 2016

User workloads request some amount of resource, e.g. 4 vCPUs or 8 GB of RAM. For long-lived or frequently run workloads, comparing the requested amount against actual usage lets us establish trends and act on them for better cloud performance. For example:

The workload may not actually use all of the requested resource, in which case knowing this trend enables us to more successfully overcommit.

The workload may use all of the requested resource, in which case the user could be informed that allocating more resource may allow their workload to run more efficiently.
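The two cases above could be told apart with something like the following minimal sketch. It is not ciao code: the function name `classify_usage`, the verdict labels, and the `low`/`high` thresholds are all hypothetical, chosen only to illustrate comparing requested against observed usage.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class UsageTrend:
    requested: float   # amount the user asked for (e.g. GB of RAM or vCPUs)
    mean_used: float   # average of the observed samples
    peak_used: float   # largest observed sample
    verdict: str       # "underused", "saturated", or "matched"


def classify_usage(requested: float, samples: Sequence[float],
                   low: float = 0.5, high: float = 0.95) -> UsageTrend:
    """Compare requested resource against observed usage samples.

    Thresholds are illustrative assumptions, not tuned values:
    - "underused": even the peak stays at or below `low` * requested,
      so the gap is a candidate for overcommit.
    - "saturated": the mean sits at or above `high` * requested,
      so the user might benefit from a larger allocation.
    """
    if requested <= 0:
        raise ValueError("requested must be positive")
    if not samples:
        raise ValueError("need at least one usage sample")

    avg = mean(samples)
    peak = max(samples)

    if peak <= low * requested:
        verdict = "underused"
    elif avg >= high * requested:
        verdict = "saturated"
    else:
        verdict = "matched"

    return UsageTrend(requested, avg, peak, verdict)


if __name__ == "__main__":
    # An 8 GB request that never exceeds 3 GB of real usage.
    print(classify_usage(8.0, [2.0, 2.5, 3.0]).verdict)
    # A 4 vCPU request that is pinned near its limit.
    print(classify_usage(4.0, [3.9, 4.0, 4.0]).verdict)
```

Using the peak for the "underused" test is a deliberately conservative choice in this sketch: a workload is only treated as an overcommit candidate if it never came close to its request during the observed window.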

We currently report per-workload resource usage over time from launcher to controller, but we do not analyze that data, do not feed any analysis into user-facing information, do not use the data to influence scheduler placement, and do not use it to trigger opportunistic actions at the launcher level (e.g. there are many technologies available to proactively implement QoS or opportunistically reclaim unused resources).


tpepper commented May 5, 2016

As with issue #100, we likely should add an optional config parameter for memory overcommit. CPU is renewable, and overcommitting it is mostly non-fatal (i.e. things run, just slower). Memory is not renewable in the same way, but workloads only page in memory on use, so an 8 GB VM will not necessarily consume all of its 8 GB. Tracking workloads over time lets us measure real versus requested resource usage, and a RAM overcommit knob would allow the gap of unused resource to be more safely overcommitted to other workloads. The risk with RAM overcommit is workloads failing when paging fails, or workloads running from swap instead of RAM, with the horrible performance that comes with it. In other words, RAM overcommit without the feedback loop of guidance from a resource estimation analysis system (and without any evacuation/migration support) is dangerous.
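As a rough illustration of how measured usage could inform such a knob, here is a minimal sketch. Everything in it is an assumption for discussion: `suggested_ratio`, `admissible`, the `headroom` safety margin, and the `cap` are hypothetical names and values, not existing ciao configuration or scheduler code.

```python
from typing import Sequence


def suggested_ratio(requested_gb: Sequence[float], peak_used_gb: Sequence[float],
                    headroom: float = 0.25, cap: float = 2.0) -> float:
    """Derive a RAM overcommit ratio from observed workload history.

    ratio = total requested / (total observed peak * (1 + headroom)),
    clamped to [1.0, cap]. A result of 1.0 means no overcommit.
    `headroom` and `cap` are illustrative safety margins.
    """
    if len(requested_gb) != len(peak_used_gb) or not requested_gb:
        raise ValueError("need matching, non-empty request and usage lists")
    total_requested = sum(requested_gb)
    total_peak = sum(peak_used_gb)
    if total_peak <= 0:
        # No measured usage: fall back to no overcommit rather than guess.
        return 1.0
    ratio = total_requested / (total_peak * (1.0 + headroom))
    return max(1.0, min(cap, ratio))


def admissible(host_ram_gb: float, committed_gb: float, request_gb: float,
               overcommit_ratio: float = 1.0) -> bool:
    """Return True if a new request fits under host RAM scaled by the ratio."""
    if overcommit_ratio < 1.0:
        raise ValueError("overcommit_ratio must be >= 1.0")
    return committed_gb + request_gb <= host_ram_gb * overcommit_ratio


if __name__ == "__main__":
    # Two 8 GB VMs that each peaked at 4 GB of real usage.
    ratio = suggested_ratio([8.0, 8.0], [4.0, 4.0])
    print(ratio)
    # A 64 GB host with 60 GB already committed, asked for 8 GB more.
    print(admissible(64.0, 60.0, 8.0))           # without overcommit
    print(admissible(64.0, 60.0, 8.0, ratio))    # with the derived ratio
```

The sketch defaults to a ratio of 1.0 whenever there is no usage history, which mirrors the point above: without the feedback loop from resource estimation, overcommitting RAM is the dangerous choice.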

@kaccardi kaccardi added the P2 label Jun 6, 2016
@amyleeland amyleeland modified the milestones: Sprint 2, Sprint 3 Jun 9, 2016
@tpepper tpepper modified the milestones: Later (2017), Sprint 2 Jul 19, 2016