
feat: improve gpu-provisioner based on sigs.k8s.io/karpenter #185

Merged
Fei-Guo merged 1 commit into main from upgrade-to-node-claim on Dec 9, 2024


rambohe-ch
Collaborator

  1. upgrade CRD: Machine to NodeClaim
  2. update aws/karpenter-core to sigs.k8s.io/karpenter
  3. add webhook for v1beta1.NodeClaim and v1.NodeClaim conversion
  4. add instance garbage collection controller for cleaning up leaked cloud provider instances and nodes.
  5. remove unused files like sku, pricing, instancetype, etc.
  6. improve nodeclaim launch error handling: if the returned error is InvalidParameterError, LocationRestrictionError or InsufficientCapacityError, the [nodeclaim launch] controller publishes a warning event and then deletes the nodeclaim. These errors are not recoverable, so retrying agent pool creation is unnecessary.
  7. add unit test cases


codecov-commenter commented Dec 8, 2024

Codecov Report

Attention: Patch coverage is 66.99029% with 34 lines in your changes missing coverage. Please review.

Project coverage is 71.26%. Comparing base (46e16e0) to head (a00cb7c).

Files with missing lines              Patch %   Lines
pkg/cloudprovider/cloudprovider.go    55.73%    21 missing and 6 partials ⚠️
pkg/providers/instance/instance.go    82.50%    5 missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #185       +/-   ##
===========================================
+ Coverage   57.26%   71.26%   +13.99%     
===========================================
  Files           5        4        -1     
  Lines         454      435       -19     
===========================================
+ Hits          260      310       +50     
+ Misses        185      107       -78     
- Partials        9       18        +9     


rambohe-ch force-pushed the upgrade-to-node-claim branch from 7f97f84 to 3304c0b on December 8, 2024, 12:07
rambohe-ch force-pushed the upgrade-to-node-claim branch from 3304c0b to da1b0a0 on December 8, 2024, 12:12
rambohe-ch force-pushed the upgrade-to-node-claim branch from da1b0a0 to 27b96e1 on December 8, 2024, 12:18
rambohe-ch force-pushed the upgrade-to-node-claim branch from 27b96e1 to 5f2d0cc on December 8, 2024, 12:35
rambohe-ch force-pushed the upgrade-to-node-claim branch from 5f2d0cc to 6f10082 on December 8, 2024, 12:50
rambohe-ch force-pushed the upgrade-to-node-claim branch from 6f10082 to ad329ee on December 9, 2024, 02:55
rambohe-ch force-pushed the upgrade-to-node-claim branch from ad329ee to 4e26221 on December 9, 2024, 04:00
rambohe-ch force-pushed the upgrade-to-node-claim branch from 4e26221 to a7a8f5b on December 9, 2024, 05:05
rambohe-ch force-pushed the upgrade-to-node-claim branch from a7a8f5b to 9acc2b7 on December 9, 2024, 05:47
rambohe-ch force-pushed the upgrade-to-node-claim branch from 9acc2b7 to f739e3a on December 9, 2024, 06:54
rambohe-ch changed the title from "feat: improve gpu-provisioner based on sigs.k8s.io/karpenter" to "[WIP]feat: improve gpu-provisioner based on sigs.k8s.io/karpenter" on Dec 9, 2024
1. upgrade CRD: Machine to NodeClaim
2. update aws/karpenter-core to sigs.k8s.io/karpenter
3. add webhook for v1beta1.NodeClaim and v1.NodeClaim conversion
4. add instance garbage collection controller for cleaning up leaked cloud provider instances and nodes.
5. remove unused files like sku, pricing, instancetype, etc.
6. improve nodeclaim garbage collection: if a node has not been ready for more than 10 minutes, it is considered crashed, and its nodeclaim is deleted to trigger creation of a new node.
7. add unit test cases

Signed-off-by: rambohe-ch <rambohe.ch@gmail.com>
rambohe-ch force-pushed the upgrade-to-node-claim branch from f739e3a to a00cb7c on December 9, 2024, 11:40
rambohe-ch changed the title from "[WIP]feat: improve gpu-provisioner based on sigs.k8s.io/karpenter" to "feat: improve gpu-provisioner based on sigs.k8s.io/karpenter" on Dec 9, 2024
Fei-Guo merged commit 6899cba into main on Dec 9, 2024
6 of 7 checks passed
Fei-Guo deleted the upgrade-to-node-claim branch on December 9, 2024, 23:08