
feat: RFC Implementation Supporting AWS On-Demand Capacity Reservations #1263

Closed

Conversation

tvonhacht-apple (Contributor)

Needed for RFC aws/karpenter-provider-aws#5716.
Helps the RFC implementation in karpenter-provider-aws: aws/karpenter-provider-aws#6198.

Description

RFC: aws/karpenter-provider-aws#5716

How was this change tested?

Running in personal EKS cluster

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 17, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tvonhacht-apple
Once this PR has been reviewed and has the lgtm label, please assign ellistarn for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot (Contributor)

Hi @tvonhacht-apple. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 17, 2024
@coveralls

Pull Request Test Coverage Report for Build 9123804018

Details

  • 6 of 6 (100.0%) changed or added relevant lines in 1 file are covered.
  • 9 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.05%) to 78.395%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/test/expectations/expectations.go | 2 | 94.68% |
| pkg/controllers/disruption/drift.go | 2 | 89.29% |
| pkg/scheduling/requirements.go | 2 | 97.98% |
| pkg/controllers/disruption/helpers.go | 3 | 90.83% |

Totals Coverage Status
Change from base Build 9115796606: -0.05%
Covered Lines: 8262
Relevant Lines: 10539

💛 - Coveralls

@sftim

sftim commented May 23, 2024

In the PR description, I recommend mentioning AWS On-Demand Capacity Reservations (not just ODCR).

@tvonhacht-apple tvonhacht-apple changed the title feat: RFC Implementation Supporting ODCR feat: RFC Implementation Supporting AWS On-Demand Capacity Reservations May 23, 2024
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2024
@k8s-ci-robot (Contributor)

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

1 similar comment

```diff
@@ -294,6 +294,16 @@ func mapCandidates(proposed, current []*Candidate) []*Candidate {
 // on an instance type. If the instance type has a spot offering available, then it uses the spot offering
 // to get the launch price; else, it uses the on-demand launch price
 func worstLaunchPrice(ofs []cloudprovider.Offering, reqs scheduling.Requirements) float64 {
+	if reqs.Get(v1beta1.CapacityTypeLabelKey).Has("capacity-reservation") {
```
Member
Should we also update our cloudprovider types.go file to include an integer count of the available instances in the offering? Given that we are going to schedule using that count value, we don't want to continually overshoot the capacity that we think we can schedule to the capacity-reservation offering.
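
A minimal sketch of what that could look like, assuming a hypothetical `ReservationCapacity` field and `Reserve` helper rather than karpenter's actual `cloudprovider` types:

```go
package cloudprovider

// Offering describes where and how an instance type can be launched.
// The ReservationCapacity field and Reserve method are hypothetical
// illustrations of the suggestion above, not the real karpenter types.
type Offering struct {
	CapacityType string
	Zone         string
	Price        float64

	// ReservationCapacity is the number of instances still available in
	// the backing capacity reservation. It is only meaningful for
	// capacity-reservation offerings; spot/on-demand are unconstrained.
	ReservationCapacity int
}

// Reserve consumes one slot from the offering, returning false when the
// scheduler has already committed all reserved instances and must fall
// back to another offering instead of overshooting the reservation.
func (o *Offering) Reserve() bool {
	if o.CapacityType != "capacity-reservation" {
		return true // unconstrained capacity types have no count to track
	}
	if o.ReservationCapacity == 0 {
		return false
	}
	o.ReservationCapacity--
	return true
}
```

Tracking the count on the offering itself would let the scheduler decrement it as it commits nodes and fall back to on-demand or spot once the reservation is exhausted, rather than overshooting.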

```diff
@@ -294,6 +294,16 @@ func mapCandidates(proposed, current []*Candidate) []*Candidate {
 // on an instance type. If the instance type has a spot offering available, then it uses the spot offering
 // to get the launch price; else, it uses the on-demand launch price
 func worstLaunchPrice(ofs []cloudprovider.Offering, reqs scheduling.Requirements) float64 {
```
Member

We should also think about how this interacts with our current scheduling behavior and capacity reservation prioritization. Today, we continually pack pods as tightly as we can onto bigger and bigger instance types. This can cause us to rule out capacity reservations that would be cheaper to use if we supported a combination of spot, on-demand, and ODCR capacity. This probably works fine if we are just trying to build ODCR prioritization with on-demand/spot fallback over the same instance type sets. But the second you extend the set of instance types to a mix of those that have ODCR offerings and those that don't, you get odd modeling during scheduling: overpacking pods can rule out ODCR capacity and leave behind only spot and on-demand.

Considering the reserved capacity is effectively free (it is already paid for whether or not it is used), we could build affordances so that we always choose a free offering when one is available and stop packing pods once doing so would push us outside the free ODCR offering, e.g. create a new node instead of moving to a bigger instance type; see the sketch below.

Something to think about as we consider the design options here.
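
As a sketch of that affordance, under the assumption (from the discussion above) that an unexhausted reservation is treated as zero-cost, offering selection could look something like this; the `offering` struct and `cheapestLaunchable` helper are hypothetical, not karpenter's actual code:

```go
package scheduling

import "sort"

// offering is a simplified stand-in for cloudprovider.Offering; the
// fields and the zero-price treatment of reservations are assumptions
// for illustration, not karpenter's actual implementation.
type offering struct {
	capacityType        string
	price               float64
	reservationCapacity int
}

// effectivePrice treats an unexhausted capacity reservation as free,
// since the reservation is already paid for whether or not it is used.
func effectivePrice(o offering) float64 {
	if o.capacityType == "capacity-reservation" && o.reservationCapacity > 0 {
		return 0
	}
	return o.price
}

// cheapestLaunchable picks the offering to launch against, preferring
// reserved capacity over spot and on-demand by sorting on effective
// price rather than list price.
func cheapestLaunchable(ofs []offering) (offering, bool) {
	if len(ofs) == 0 {
		return offering{}, false
	}
	sorted := append([]offering(nil), ofs...)
	sort.Slice(sorted, func(i, j int) bool {
		return effectivePrice(sorted[i]) < effectivePrice(sorted[j])
	})
	return sorted[0], true
}
```

Because reserved capacity sorts ahead of any priced offering under this model, the scheduler would launch an additional node against the reservation before consolidating onto a bigger, unreserved instance type.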

@jonathan-innis (Member)

/assign jonathan-innis


This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2024
@tvonhacht-apple tvonhacht-apple deleted the feature/odcr branch July 3, 2024 22:25
Labels
- cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
- lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)
- needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
- needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
- size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)