WIP: The great split #353

Open
wants to merge 3 commits into
base: main

Conversation

sebhoss
Member

@sebhoss sebhoss commented Nov 30, 2024

In the past, this project had to deal with various size-related issues. In the very first version I was building with all features enabled and quickly realized that my computer does not have enough memory. I solved this by running commands like cargo check in a loop against each feature individually instead of against the entire project with all features enabled. This worked for a while. However, as more custom resources were added and the number of features grew, we ran into a crates.io limit that nowadays allows only 300 features per crate. Originally, each version in each custom resource group had its own feature; this was changed so that each group maps to one feature and each version in that group is one module. Additionally, I contacted the crates.io team to increase the feature limit for this project, so that it can continue to grow past 300 custom resource groups. Lately, we have been running into the 10 MiB per crate size limit of crates.io. Again, I contacted the crates.io team, who again were very helpful and quickly increased the size limit. However, both the feature and size limits will eventually be reached again, since the Kubernetes community is very active and creates more custom resources all the time.

The goal here is to come up with a solution that allows this project to grow indefinitely. The rather simple idea in this PR is to split all custom resource groups into their own crates. Each version within a group continues to be a module within that crate. Since each group typically contains only a handful of resources, I do not see any size-related problems happening in the future (although it's possible in theory). Each crate will be versioned and released individually, and the release schedules are spread across an entire day so that we won't hammer the crates.io API with hundreds of crates at the same time.

Open questions:

  • Should we introduce the feature-per-version from the original version of this project again? I think this can help users, since groups often expose the same custom resource in various versions but users are typically only using one version. By introducing features, users can make sure that they will only ever use the version they want to use and not accidentally use something else.
  • Is the crates.io team ok with this? We would be releasing 400+ crates and that number will only ever go up. Besides the high number of crates and the potential API bottleneck, I'm concerned about name squatting, since crates.io does not have a namespace/group concept. My idea here is to prefix each crate with something that does not exist yet on crates.io and hope for the best 🤷 (the current implementation uses the kcr_ prefix, which stands for kube-custom-resource). The crates.io team has asked me in the past to limit the number of releases, since the high number of features in the original crate caused too much strain on their API.

Feedback is highly welcome here. I'm pinging @w3irdrobot, @waynr, @banool, @brendanhay and @orf since you either pushed changes to this repo or otherwise interacted with it in the past, and I'm assuming you are using this project. Likewise @clux, since he created kopium and maintains kube-rs. I'm going to contact the crates.io team by mail since I don't know any of their usernames on GitHub (feel free to ping them here if you do).

Ignore the build failures - this is WIP. If you want to see what this currently looks like, open https://github.com/metio/kube-custom-resources-rs/tree/split, since the diff view in this PR will be completely useless 😅

Signed-off-by: Sebastian Hoß <[email protected]>
@sebhoss sebhoss added the bug, documentation, enhancement, help wanted, and github_actions labels Nov 30, 2024
@sebhoss sebhoss self-assigned this Nov 30, 2024

orf commented Dec 2, 2024

This sounds awesome @sebhoss!

Should we introduce the feature-per-version from the original version of this project again

I think that's a good idea - it seems sensible, and can reduce the work the compiler has to do as well?

On that note, this should significantly reduce compilation time for this crate. Even with only a couple of features selected, it was pretty heavyweight!

A couple of notes:

  • you can do custom-resources/* in the root workspace, so you don't need to maintain that big list of packages
  • we could use workspace dependencies rather than duplicate them? This is:
dependencies.workspace = true

in the custom resource TOML files, then add:

[workspace.dependencies]
schemars = { version = "~0" }
serde = { version = "~1" }
serde_json = { version = "~1" }
k8s-openapi = { version = "~0" }
kube = { version = "~0", features = ["derive"] }

to the workspace Cargo.toml file.

If that doesn't work, you can do it individually:

[dependencies]
schemars = { workspace = true }
serde = { workspace = true }
...

You can also do the same for package attributes:

[package]
authors.workspace = true
description.workspace = true
...

This would prevent the need to re-generate code when any of the workspace metadata changes?


orf commented Dec 2, 2024

When it comes to releasing the crates, is this really an issue? We won't be releasing all the crates every time, only when the code within changes?

If so, then this issue only applies to the first release - and we can mitigate that by just initially releasing slowly.


orf commented Dec 2, 2024

Last comment/idea, sorry for the spam!

Rather than version/release each crate individually and per resource, we could group them by namespace instead?

I.e., rather than have ~40 *_k8s_aws crates, we could have a single crate for k8s_aws with all resources inside?

From a UX point of view, you're more likely to use CRDs in these "groups", so you probably want all of the AWS resources available to you.

This could be grouped by the last two parts of the CRD? So sqs.services.k8s.aws would become k8s.aws?

But maybe this isn't always safe, idk. At the very least it might make sense for some specific hard-coded list of groups? AWS in particular?

Member Author

sebhoss commented Dec 2, 2024

Thanks for the feedback - I'll adjust the workspace related stuff as per your comment in my next push ❤️

Wrt. releasing crates: I'm not sure if that will be an issue, but I don't want to be a bad citizen on crates.io, so spreading releases out a bit won't hurt? My current implementation just calculates the hash of the group name and then takes it modulo 24 and 60 to get the hour and minute of the release.
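For illustration, the scheduling idea could be sketched roughly as follows. This is not the code from the branch: the FNV-1a hash and the names fnv1a and release_slot are assumptions, picked only because FNV-1a gives the same result on every platform and Rust version. The sketch also divides by 24 before taking the minute, an assumed tweak so that the minute is not fully determined by the same low bits as the hour.

```rust
/// FNV-1a 64-bit hash. Assumption: any stable hash would do; this one is
/// used here only because its output does not change between Rust releases.
fn fnv1a(input: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in input.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Maps a custom resource group name to an (hour, minute) release slot.
/// The same group always lands in the same slot.
fn release_slot(group: &str) -> (u32, u32) {
    let hash = fnv1a(group);
    let hour = (hash % 24) as u32;
    // Hypothetical tweak: drop the part already used for the hour first.
    let minute = ((hash / 24) % 60) as u32;
    (hour, minute)
}
```

Calling release_slot("sqs.services.k8s.aws") returns the same pair on every run, so the result could be written into a per-crate cron expression once and stay stable.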

I like the idea of grouping crates even further to reduce the total number of crates! Not sure if we can use the last two segments of their group though, e.g. https://github.com/metio/kube-custom-resources-rs/tree/split/crd-catalog/aws sometimes uses amazon.com or k8s.aws and even some special cases like karpenter.sh. I thought about using the GitHub/GitLab organization name to group crates (e.g. just aws for the previous example), however there are some special cases like https://github.com/metio/kube-custom-resources-rs/tree/split/crd-catalog/aws-controllers-k8s which should probably belong to the aws group as well or https://github.com/metio/kube-custom-resources-rs/tree/split/crd-catalog/apache which contains resources for multiple disjointed projects and it might be weird to group them together? That said, the crd-catalog is manually maintained anyway, so manually grouping the aws and aws-controllers-k8s orgs into aws isn't really a showstopper. Likewise, we can manually split the apache org into their individual projects.
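To make the problem with the "last two segments" rule concrete, here is a small sketch. The function name last_two_segments is hypothetical and not part of the project; the group names are the ones mentioned in this thread.

```rust
/// Returns the last two dot-separated segments of a CRD group name,
/// or the whole name when it has fewer than two segments.
fn last_two_segments(group: &str) -> String {
    // rsplitn walks from the right: [last, second_last, everything_else]
    let parts: Vec<&str> = group.rsplitn(3, '.').collect();
    match parts.as_slice() {
        [last, second, ..] => format!("{second}.{last}"),
        _ => group.to_string(),
    }
}
```

With this rule, sqs.services.k8s.aws maps to k8s.aws, cloudwatch.aws.amazon.com maps to amazon.com, and karpenter.sh stays karpenter.sh, so three AWS-related groups would end up in three different crates. That is why a manually maintained mapping looks like the safer option.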

If we go down this road, how would this work with features though? Would we have one feature for each version in each group in each project? Using https://github.com/metio/kube-custom-resources-rs/tree/split/crd-catalog/aws/amazon-cloudwatch-agent-operator/cloudwatch.aws.amazon.com/v1alpha1 as an example, would we have a feature called cloudwatch_aws_amazon_com_v1alpha1 in the aws crate? Should we add cloudwatch_aws_amazon_com as well in case someone wants to enable all versions of that group at once?
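As a sketch of how such feature names could be derived, assuming every character that is not an ASCII letter or digit is replaced by an underscore (the helper names sanitize, group_feature and version_feature are made up for this example and do not come from the code generator):

```rust
/// Turns a group or version string into a Cargo-feature-friendly identifier.
/// Assumption: anything outside [a-z0-9] becomes '_'.
fn sanitize(segment: &str) -> String {
    segment
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_lowercase()
            } else {
                '_'
            }
        })
        .collect()
}

/// Feature that would enable all versions of a group.
fn group_feature(group: &str) -> String {
    sanitize(group)
}

/// Feature that would enable a single version of a group.
fn version_feature(group: &str, version: &str) -> String {
    format!("{}_{}", sanitize(group), sanitize(version))
}
```

For the example above, version_feature("cloudwatch.aws.amazon.com", "v1alpha1") yields cloudwatch_aws_amazon_com_v1alpha1 and group_feature("cloudwatch.aws.amazon.com") yields cloudwatch_aws_amazon_com; the group-level feature could then simply list all of its version features.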


orf commented Dec 2, 2024

Hmm, on second thought perhaps it's best to not group them. It would work for larger groups of packages, but for smaller ones it would be confusing to add a package then enable a feature with the same name?

It would also increase update churn - you're not going to be sure your specific CRDs are updated when you update a package, whereas with a single package per CRD you can be?

Signed-off-by: Sebastian Hoß <[email protected]>
Signed-off-by: Sebastian Hoß <[email protected]>

clux commented Dec 10, 2024

Hey! I think this is a good idea as well. Fewer mega crates that do everything. Everywhere I go I end up inlining kopium output into some vendor-equivalent directory, and this would help with that. Group names make sense to me, provided you don't run into some crate name length limit.

My expectation on load is that, given how you've set it up (only publish when there are commits touching the directory), it'll likely end up being less load altogether on crates.io, as there will be less stuff released, but maybe there's some common overhead you incur. I definitely see much bigger offenders in the crates ecosystem in the build queue every now and then though (like repos releasing 100+ crates at every update unconditionally; those seem to get deprioritised in the queue).

Maybe the calendar versioning (masquerading as semver) that you inject with sed should be avoided in favor of something like cargo-release or cargo-workspaces with a blanket minor release - if it's easy to install in CI via something like the taik-e/release-action. It's not like the numbers are going to be super meaningful either way, but people do write GitHub Actions logic around auto-merging based on whether or not something is a major version, and this would make "one major every year" rather meaningless.

3 participants