feat: RFC Implementation Supporting ODCR #6198
base: main
Conversation
✅ Deploy Preview for karpenter-docs-prod canceled.
force-pushed from 5c02862 to 6867ca8
force-pushed from acd61c4 to 9cdb574
pkg/providers/amifamily/resolver.go
Outdated
@@ -154,12 +157,63 @@ func (r Resolver) Resolve(ctx context.Context, nodeClass *v1beta1.EC2NodeClass,
		maxPods: int(instanceType.Capacity.Pods().Value()),
	}
})

zones := scheduling.NewNodeSelectorRequirementsWithMinValues(nodeClaim.Spec.Requirements...).Get(v1.LabelTopologyZone)
capacityReservations := []v1beta1.CapacityReservation{}
I think we can handle this within the `if` below; we don't need to create this variable beforehand.
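For illustration, a sketch of the suggested restructuring, with the filtering logic elided as in the discussion:

```go
// Sketch: the slice only exists inside the branch that uses it, instead of
// being declared ahead of time for all capacity types.
if capacityType == "capacity-reservation" {
	capacityReservations := []v1beta1.CapacityReservation{}
	for _, capacityReservation := range nodeClass.Status.CapacityReservations {
		// ... append only the reservations matching the requested zones ...
		capacityReservations = append(capacityReservations, capacityReservation)
	}
	// ... resolve launch parameters from capacityReservations ...
}
```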
force-pushed from 96951d1 to 5471bc4
pkg/cloudprovider/cloudprovider.go
Outdated
@@ -96,6 +96,9 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *corev1beta1.NodeC
}
instance, err := c.instanceProvider.Create(ctx, nodeClass, nodeClaim, instanceTypes)
if err != nil {
	if cloudprovider.IsInsufficientCapacityError(err) {
If we already get an ICE error back, we shouldn't have to wrap the error again; when we do a check down the line, it should be able to identify that the error is an ICE error, so long as one of the wrapped errors is.
from my testing that wasn't working, as it becomes a normal error; see line 102, where it's just a string at that point
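To make the disagreement concrete: Go's `errors.Is`/`errors.As` see through errors wrapped with `%w`, but not through errors flattened into strings with `%s`/`%v`. A standalone sketch (not Karpenter's actual error types):

```go
package main

import (
	"errors"
	"fmt"
)

var errICE = errors.New("insufficient capacity")

func main() {
	// Wrapping with %w preserves the chain, so errors.Is still detects ICE.
	wrapped := fmt.Errorf("creating instance: %w", errICE)
	fmt.Println(errors.Is(wrapped, errICE)) // true

	// Formatting with %s flattens the error to a plain string, which matches
	// the "it's just a string at that point" behavior described above.
	flattened := fmt.Errorf("creating instance: %s", errICE)
	fmt.Println(errors.Is(flattened, errICE)) // false
}
```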
pkg/providers/amifamily/resolver.go
Outdated
zones := scheduling.NewNodeSelectorRequirementsWithMinValues(nodeClaim.Spec.Requirements...).Get(v1.LabelTopologyZone)
capacityReservations := []v1beta1.CapacityReservation{}
if capacityType == "capacity-reservation" {
If we select a capacity reservation NodeClaim, should we just best-effort the NodeClaim launch here and then have it get deleted with an ICE error from Fleet if there isn't any available capacity? The next iteration of GetInstanceTypes should have the updated capacity reservation availability, so we shouldn't try and launch with the same offering again on the second attempt.
the benefit of this is that it becomes visible in nodeclaims that something is happening?
> the benefit of this is that it becomes visible in nodeclaims that something is happening
More that we made the decision to launch it so we should probably follow it through. I get that you are trying to be a little smarter in that we know that the launch won't succeed most likely, but, in general, I'd like to avoid introducing too much modality into the code around specific logic like this.
I'm also thinking about this more and realizing that if we don't find an available capacity reservation for this launch earlier in our Create call (there's a filterInstanceTypes that should look for available offerings) then I'd expect us to return an ICE there. If we have already gotten past that point in code, it seems like that we are safe to proceed with the launch (though we may still need to see exactly which instance type offerings have availability with ODCR when we are creating the launch templates, but I would expect this to come through the GetInstanceTypes() and not be coming through the EC2NodeClass status)
pkg/providers/amifamily/resolver.go
Outdated
zones := scheduling.NewNodeSelectorRequirementsWithMinValues(nodeClaim.Spec.Requirements...).Get(v1.LabelTopologyZone)
capacityReservations := []v1beta1.CapacityReservation{}
if capacityType == "capacity-reservation" {
	for _, capacityReservation := range nodeClass.Status.CapacityReservations {
lo.Filter?
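Sketched out, the suggestion might look like this; the predicate is an assumption about what the original loop body checks:

```go
// Sketch of the lo.Filter rewrite of the loop above. The zone-membership
// check and the AvailabilityZone field are assumptions, not the actual
// resolver code.
capacityReservations := lo.Filter(nodeClass.Status.CapacityReservations,
	func(cr v1beta1.CapacityReservation, _ int) bool {
		return zones.Has(cr.AvailabilityZone) // hypothetical predicate
	})
```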
pkg/providers/amifamily/resolver.go
Outdated
		return nil, cloudprovider.NewInsufficientCapacityError(fmt.Errorf("trying to resolve capacity-reservation but no available capacity reservations available"))
	}
}

for params, instanceTypes := range paramsToInstanceTypes {
When we group `paramsToInstanceTypes` above, should we also group these with the capacity reservations in mind? This would allow us to keep the same logic on L178 where we iterate through the params and instance types, and I think this may work better as well, since there should only be specific instance types that are valid for a given capacity reservation. A runnable sketch of the idea follows.
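For illustration, a minimal runnable sketch of grouping with the reservation as part of the key; the types and data here are stand-ins, not the actual resolver structures:

```go
package main

import "fmt"

// instanceType is a stand-in for the resolver's real instance type data.
type instanceType struct {
	name          string
	amiID         string
	reservationID string
}

// groupKey extends the launch-template grouping with the capacity
// reservation id, so instance types valid for different ODCRs never
// share a group.
type groupKey struct {
	amiID         string
	reservationID string
}

func main() {
	instanceTypes := []instanceType{
		{"m5.large", "ami-1", "cr-aaa"},
		{"m5.xlarge", "ami-1", "cr-aaa"},
		{"c5.large", "ami-1", "cr-bbb"},
	}
	groups := map[groupKey][]instanceType{}
	for _, it := range instanceTypes {
		k := groupKey{amiID: it.amiID, reservationID: it.reservationID}
		groups[k] = append(groups[k], it)
	}
	// Each group now maps to exactly one capacity reservation, so the
	// existing params -> instance types iteration can stay unchanged.
	for k, its := range groups {
		fmt.Println(k, len(its))
	}
}
```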
@@ -71,6 +71,12 @@ cat << EOF > controller-policy.json
	"Resource": "arn:${AWS_PARTITION}:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${CLUSTER_NAME}",
	"Sid": "EKSClusterEndpointLookup"
},
{
	"Effect": "Allow",
	"Action": "eks:DescribeCapacityReservations",
"Action": "eks:DescribeCapacityReservations", | |
"Action": "ec2:DescribeCapacityReservations", |
done
@@ -48,6 +48,7 @@ var (
	"UnfulfillableCapacity",
	"Unsupported",
	"InsufficientFreeAddressesInSubnet",
	"ReservationCapacityExceeded",
When we return this type of ICE error, should we short-circuit and update the capacity reservation that we launched with in-place, so we don't have to wait for another iteration of the capacity reservation polling to update the instance availability?
If we didn't want to directly update this to 0, we could also use this as a trigger to re-call DescribeCapacityReservations, since we know that something has changed since we made the launch decision originally.
I think setting it to zero at this time would be ok; it will eventually refetch anyway, but this would let other evaluations happening at the same time get it immediately too.
Yep, I guess it all sort-of comes out in the wash. I'm a little skeptical that we can model this with the existing ICE cache on its own. Minimally, we need to update the ICE cache to reflect the fact that the ICE is only for that particular capacity reservation. Without that, we'd either not try the OD instance in that AZ at all OR we wouldn't try other capacity reservations assigned to that instance type in that zone. Either way, we aren't really reacting correctly to what the CreateFleet call is telling us here.
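A minimal sketch of what a reservation-scoped unavailability entry could look like, assuming a go-cache-style store like the existing ICE cache; the key layout and names are illustrative, not the actual unavailableofferings package:

```go
package main

import (
	"fmt"
	"time"

	"github.com/patrickmn/go-cache"
)

// key scopes an unavailability entry to a specific capacity reservation,
// so exhausting one ODCR does not block the plain on-demand offering (or
// other ODCRs) for the same instance type and zone.
func key(instanceType, zone, reservationID string) string {
	return fmt.Sprintf("%s:%s:%s", instanceType, zone, reservationID)
}

func main() {
	c := cache.New(time.Minute, time.Minute)
	// Mark only the exhausted reservation unavailable...
	c.SetDefault(key("m5.large", "us-west-2a", "cr-12345678"), struct{}{})
	// ...while the on-demand offering in the same zone stays usable.
	_, odBlocked := c.Get(key("m5.large", "us-west-2a", ""))
	fmt.Println("on-demand blocked:", odBlocked) // false
}
```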
force-pushed from b7e2828 to 0f3ab47
force-pushed from b8512cf to 1089159
pkg/apis/v1/ec2nodeclass.go
Outdated
// Reservation. This ensures that only permitted instances can use
// the reserved capacity.
// +optional
Type string `json:"type,omitempty"`
Should this be an enum, and we validate through kubebuilder OpenAPI that this is only one of the valid values? See https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/apis/v1beta1/nodepool.go#L76 for an example of this.
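For reference, a sketch of the kubebuilder marker approach from the linked NodePool example, assuming the valid value set mirrors EC2's InstanceMatchCriteria values:

```go
// Sketch only: the enum values {open, targeted} are an assumption based on
// EC2's InstanceMatchCriteria, not confirmed field values from this PR.
type CapacityReservation struct {
	// Type constrains which instances can consume the reserved capacity;
	// the generated OpenAPI schema rejects any other value at admission.
	// +kubebuilder:validation:Enum:={open,targeted}
	// +optional
	Type string `json:"type,omitempty"`
}
```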
pkg/apis/v1/ec2nodeclass_status.go
Outdated
// * limited - The Capacity Reservation expires automatically at a specified
// date and time.
// +required
EndDateType string `json:"endDateType"`
Can this just be implied from the presence or lack of the end date time?
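A sketch of the implied-field idea, using the `unlimited`/`limited` values documented in the surrounding diff:

```go
package main

import (
	"fmt"
	"time"
)

// endDateType derives what EC2 calls EndDateType from the presence of the
// end date itself, which is the simplification being suggested here.
func endDateType(endDate *time.Time) string {
	if endDate == nil {
		return "unlimited" // no end date: reservation persists until cancelled
	}
	return "limited" // reservation expires at the given date and time
}

func main() {
	t := time.Now().Add(24 * time.Hour)
	fmt.Println(endDateType(nil)) // unlimited
	fmt.Println(endDateType(&t))  // limited
}
```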
pkg/apis/v1/ec2nodeclass_status.go
Outdated
// launch instances into it. The Capacity Reservation's state changes to expired
// when it reaches its end date and time.
// +optional
EndDate *string `json:"endDate,omitempty"`
Suggested change:
-EndDate *string `json:"endDate,omitempty"`
+EndDate *time.Time `json:"endDate,omitempty"`
Can this map to the actual go type?
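Worth noting alongside the suggestion: Kubernetes API types conventionally use `metav1.Time` rather than a raw `time.Time`, since it carries the JSON (de)serialization behavior the API machinery expects. A sketch of that variant (field placement assumed):

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type CapacityReservation struct {
	// EndDate is the date and time at which the Capacity Reservation
	// expires; metav1.Time provides API-machinery-compatible serialization.
	// +optional
	EndDate *metav1.Time `json:"endDate,omitempty"`
}
```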
	offering.Requirements.Add(scheduling.NewRequirement(v1beta1.LabelTopologyZoneID, v1.NodeSelectorOpIn, subnet.ZoneID))
}
// TODO: tvonhacht
offering.Requirements.Add(scheduling.NewRequirement(v1beta1.LabelCapactiyReservationID, v1.NodeSelectorOpIn, capacityReservation.ID))
How do we make sure that we don't create duplicate offerings for multiple ODCRs that have the same zone and instance type?
offering.Requirements.Add(scheduling.NewRequirement(v1beta1.LabelCapactiyReservationID, v1.NodeSelectorOpIn, capacityReservation.ID))
log.FromContext(ctx).WithValues("instanceType", *instanceType.InstanceType, "capacityReservation", capacityReservation.ID, "offering", offering).V(0).Info("offering for capacity reservation and instanceType")
offerings = append(offerings, offering)
instanceTypeOfferingAvailable.With(prometheus.Labels{
I wonder if we punt on metrics for capacity reservations for instance type availability/price right now. This gets trickier since there's now no longer a concept of "global" availability and price but now availability is dictated on a per NodePool-basis, since the EC2NodeClass dictates the offerings that are surfaced for each one. Maybe we need to add a NodePool dimension to make this more accurate, but this is going to increase the cardinality even more than we already have (and this is a pretty high cardinality metric to begin with)
	capacityTypeLabel: capacityType,
	zoneLabel: zone,
}).Set(float64(lo.Ternary(available, 1, 0)))
instanceTypeOfferingPriceEstimate.With(prometheus.Labels{
Same comment here. Maybe worth punting on the metric since it's going to make things more challenging to reason about
@@ -64,6 +64,19 @@ var (
		zoneLabel,
	},
)
instanceTypeOfferingAvailableCapacityReservation = prometheus.NewGaugeVec(
I wonder if we punt on this too. Ideally, we can just model this through our standard offering availability.
@@ -58,43 +58,58 @@ type Provider interface {
	DeleteAll(context.Context, *v1beta1.EC2NodeClass) error
	InvalidateCache(context.Context, string, string)
	ResolveClusterCIDR(context.Context) error
	GetCapacityReservationID(launchTemplateName string) *string
I wonder if, rather than relying on a launch template lookup, we can just know the map of instance type/az -> capacity reservation id during launch, and then know that if a given instance type/az was chosen, it is a particular capacity reservation id. This might reduce cross-provider look-ups and additional dependencies.
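A runnable sketch of that idea: record the reservation id against the (instance type, zone) pair at launch time, then consult that map instead of asking the launch template provider afterwards. Names here are illustrative, not the actual provider types:

```go
package main

import "fmt"

// offeringKey identifies a launch decision by instance type and zone.
type offeringKey struct {
	instanceType string
	zone         string
}

func main() {
	// Populated during launch, when the reservation is chosen.
	reservations := map[offeringKey]string{
		{"m5.large", "us-west-2a"}: "cr-12345678",
	}
	// Later, the reservation id is known from the offering alone, with no
	// launch-template lookup needed.
	if id, ok := reservations[offeringKey{"m5.large", "us-west-2a"}]; ok {
		fmt.Println("launched into capacity reservation", id)
	}
}
```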
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
force-pushed from 8a3d6bf to 8b2eb11
force-pushed from 8b2eb11 to 28e98c0
This is a collaboration on implementing RFC #5716, Supporting ODCRs.
Progress
TODOs
- `instanceMatchCriteria` vs `type`
Description
Supporting associating ODCRs to an EC2NodeClass:
- Add a new field `capacityReservationSelectorTerms` to `EC2NodeClass`. This follows closely (but does not implement all fields) how EC2 DescribeCapacityReservations can filter; `instanceMatchCriteria` is called `type`.
- Add a new label `karpenter.k8s.aws/capacity-reservation-id` to the NodeClaim/Node. In order for Karpenter (and admins) to understand if a NodeClaim/Node is part of an ODCR, Karpenter adds the annotation `karpenter.k8s.aws/capacity-reservation-id` containing the Capacity Reservation id (for example `cr-12345678`).
Karpenter will perform validation against the spec to ensure there isn't any violation prior to creating the LaunchTemplates.
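For illustration, a hedged sketch of what a selector term might look like, mirroring how subnet and security group selector terms are modeled; the actual field set in this PR may differ:

```go
// Sketch only: field names and shape are assumptions patterned after the
// existing SubnetSelectorTerm/SecurityGroupSelectorTerm types.
type CapacityReservationSelectorTerm struct {
	// ID selects a specific Capacity Reservation, e.g. cr-12345678.
	// +optional
	ID string `json:"id,omitempty"`
	// Tags selects reservations by resource tags, ANDed together.
	// +optional
	Tags map[string]string `json:"tags,omitempty"`
}
```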
How was this change tested?
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.