Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIP for git based dependencies #321

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
114 changes: 114 additions & 0 deletions hips/hip-00NN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
---
hip: "00NN"
title: "Support Git protocol for installing chart dependencies"
authors: [ "Jeff Valore (@rally25rs)", "yxxhero <[email protected]>", "Dominykas Blyžė <[email protected]>", "George Jenkins <[email protected]>" ]
created: "2021-10-05"
type: "feature"
status: "draft"
---

## Abstract

Currently, Helm supports installing dependencies from various registries, however that requires the charts to be published there. There are use cases which are cumbersome when registries are involved, for example:

- private charts for organizations/individuals who do not have the capacity to maintain the infrastructure required to run a private registry
- testing work-in-progress charts in their umbrella charts

## Motivation

There are existing ways to achieve installation of charts without using a registry, however they are not very user-friendly and require additional tooling.

- A Helm chart repository is effectively an `index.yaml` with links to downloads. Maintaining such a repository does create a burden of scripts and automation (e.g. Github Actions). This is not always feasible for smaller projects. It also does not really offer an obvious and readily available way of testing pre-releases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: OCI support does alleviate some of these issues with needing to maintain a chart repository. But doesn't really help for e.g the "testing" use case below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, and Github itself supports OCI (although I haven't tried publishing charts into it...) - it would be great if there was a shared workflow or a standardized Github action to publish Helm charts for maintainers to use...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or a standardized Github action to publish Helm charts for maintainers to use

Good point. I will double check nothing exists, and add an issue on Helm if not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/appany/helm-oci-chart-releaser/tree/main -- there does appear to be at least this (community based) action

- For the testing use cases, charts can be packaged using `helm package`, however this does introduce manual steps and requires extra work to replicate in CI/CD scenarios.
- There are several plugins available to solve this problem ([`aslafy-z/helm-git`](https://github.com/aslafy-z/helm-git), [`diwakar-s-maurya/helm-git`](https://github.com/diwakar-s-maurya/helm-git), [`sagansystems/helm-github`](https://github.com/sagansystems/helm-github)), however using plugins requires additional setup, which may not always be feasible (esp. in more complex team structures and cluster setups with advanced tooling, e.g. ArgoCD). An additional drawback is that the `Chart.yaml` does not provide a way to specify the plugin requirements, which leaves it up to the consumer to figure this out.

## Rationale

Installing dependencies from git is an established pattern in other ecosystems even when they are registry-based, e.g. npm (Node.js), pipenv (Python), bundler (Gem) have this option - it would make sense to have the behavior replicated in Helm.

At least the npm and the pip ecosystems have already established a syntax of `vcs+protocol` for defining the dependency source, so it should be familiar to some users.

One alternative to consider would be to exclude defining the vcs/protocol, similar to what Go does, esp. given that Helm is built using Go. This does limit the flexibility somewhat - while Go does allow adding a VCS qualifier at the end of the URL (allowing future support for other VCSs), it does not allow specifying the protocol, which means that the users might have to override the default protocol in their VCS configuration.

## Specification

The `Chart.yaml` should support the following format for `dependencies`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One curiosity I think we need to think about. Is for the "testing" use case, a user might want to refer to a git repository as a dependency. But they probably want to refer to a registry/OCI repository for non-testing.

The workflow of manually updating dependencies post-testing is possible of course. But it can be awkward. UX could be improved later too of course.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've definitely experienced this problem when using npm. In the end, we wrote tooling which does the dependency overrides in CI (via params and conventions), so that we don't have to commit things like this into git, as well as validators to fail release builds in case they find git based dependencies.

Would it make sense to print a warning during helm dep build / helm dep up? I don't think it's done for the file:// based ones, although the use cases are probably different.

helm package could also pay attention to that and require a flag, although I'm not sure if the current implementation even validates what's under chart/* before doing the work? I just deleted a file inside the tgz under charts and it didn't flinch.

The "How to teach this" section does mention that the docs should explicitly recommend registries over git - probably a place for some notes there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to print a warning during helm dep build / helm dep up?

I think this is more of a lint scenario. But curious to what others thoughts might be...

To me, it is not explicit that a git dependency can not be used for production. But that you probably should not (especially not a mutable reference).

OTOH, we could decide to be cautious I think. e.g. helm push/package could refuse to publish a chart with a git dependency. And then folk can argue if that should be relaxed. Once we allow something, the horse has bolted, and we can't rescind that behavior (even if we later think it is a terrible idea).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that a linting rule should print a warning - adding that to the HIP: c16a922

I'm unconvinced about failing to package/push. I can see how a warning could be useful, but having to use a flag to publish might break people's CI scenarios? Is there an established way to solicit more feedback on this?

Copy link
Member

@gjenkins8 gjenkins8 Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unconvinced about failing to package/push. I can see how a warning could be useful, but having to use a flag to publish might break people's CI scenarios?

My sole thought is that it is easy to be cautious initially, and error on publish/push with git dependencies. Then upon feedback/later review open that publishing/pushing with git dependencies. If Helm initially does the reverse: and allows publishing with git dependencies, then removing that functionality later isn't easily possible (it is a "one-way door").

Is there an established way to solicit more feedback on this?

Lets see what other reviewers of this HIP think. The above for me is an option, not a requirement.


```
dependencies:
- name: "<dependency name>"
repository: "git[+<subprotocol>]://<hostname>[:<port>][:][/]<path>[#subdirectory=<path-inside-the-repository>]"
version: "<commit-ish>"
```
where:
- `<subprotocol>` is a protocol supported by `git clone` (e.g. `ssh`, `http`, `https`, `file`, etc).
- `<commit-ish>` is an existing reference (SHA hash, tag or branch name) on the repo.
- `<path-inside-the-repository>` is the folder where the chart we want installed is located. If not specified, it defaults to the root of the repository.
- Implementation note: this has potential for path traversal based security bugs - it needs to be validated and prevented.

Note that `git clone` supports having the `username` and `password` in the repository URL. The implementation of this feature should explicitly forbid that to prevent accidental credential leakage. It should throw an error if the URL contains a `username` or `password`.

For example:

```
dependencies:
- name: "jenkins"
repository: "git+https://github.com/jenkinsci/helm-charts.git#subdirectory=charts/jenkins"
version: "main"
```

### Installation

When Helm is installing a dependency from git, it should:

- create a temporary directory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the process that other similar systems follow? (go modules, npm, pipenv, etc. And the mentioned plugins?) In that, we should probably copy prior-art here. Rather than re-discovering already solved problems?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know for certain that npm uses a temporary folder for git clone during the npm install process (but it also has other special behaviors around git based deps, that are not applicable to Helm).

I checked the ~/go/pkg folder and it does not seem to have the .git folders there, so at least that implies that some sort of an intermediate method must have been used to perform the cloning?

A quick scan of pipenv code also implies a temp folder: https://github.com/pypa/pipenv/blob/da751e1f5568c72e013af9d977ad54c26e924d0e/pipenv/utils/dependencies.py#L749 (although they also have a special behavior for git+file which might be interesting to learn from).


If the question is about the overall approach - I think at least npm follows a very similar approach, from what I've seen - the end result after cloning does go through their npm pack (I think) process, because the final output in the node_modules folder looks the same way as if it had been npm published to the registry.


I'm not familiar enough with the process and the implementation on the Helm side just yet, but it might make sense to do a helm package instead of treating it as if it's a file:// dependency - not sure about any additional steps it performs. It seems the original implementation PR does it's own tarball build, not sure why just yet - still reviewing it.

- clone the repo at the specified branch/tag into the temp dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume, this will executable mean executing git as a subprocess? Which does also mean git chart dependencies will presume/require the existence of a working git installation.

I think worth calling this out explicitly (beyond the mention in the "security implications" section). While it might be kinda obvious, it is also an important detail of course.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be included in the CLI docs, and should be mentioned in the website docs - is it worth mentioning this explicitly in the HIP itself?

Copy link
Member

@gjenkins8 gjenkins8 Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think worth calling out in the HIP. That Helm will require a working git installation to invoke (via subprocess) in order for a Helm to utilize a charts git dependencies without error.

And that last part is also kinda important I guess. A non-functional/missing git installation would cause Helm to error (plenty of folk use contaieners, with minimal dependencies ie. no git installed. And often without git credentials). The HIP IMHO should detail that Helm would need to error in this case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the note: 4fc29f1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should also do git clone --branch <version> --depth=1? (or whatever the best options for a minimal as possible clone are). But see note about copying prior-art :).

Additionally, submodules come to mind. If the target git repository uses submodules, should these be supported? (how?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the submodules be left out of the initial implementation? If anything - that would make the initial delivery faster.

Good point on shallow cloning. Added a note: 0f6cd2f

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the submodules be left out of the initial implementation?

IMHO sure (as long as it is documented). I guess 99%+ of use cases would be satisfied without submodules, so implementation/support could happily be deferred.

- Helm will require a working git installation to invoke (via subprocess) in order for Helm to utilize chart's git dependencies. Helm will throw an error if git is not installed or misconfigured (e.g. credentials are not set up for private repositories).
- for performance reasons, a shallow clone of just the latest commit of a specific branch should be performed (i.e. `git clone --depth 1 --branch <commit-ish> --single-branch --no-tags <repo-url> <temp-dir>`)
- treat the cloned git repo similar to a `file:///path/to/temp/dir` style requirement; use `chart.LoadDir` to load that directory (which in turn applied the logic for filtering the files through `.helmignore`) and archives it to `charts/`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should double check the packaging behavior will work correctly here. Helm in places assumes (iirc) that charts with the same (semver) version, represent the same artifact. Probably charts fetched from a git branch will have changes, but retain the same semver version. And this assumption would be invalid.

Copy link
Author

@dominykas dominykas Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any clues for specifics? Happy to go look, but the code base has the string "version" in a lot of places and a quick scan did not bring results 😅

I can see it maybe might make a difference for deployment/rollback, but that's probably not applicable when used as a dependency?

It would also have to include the registry for comparison, would it not? There's also the checksum in the lock file to validate sameness? And probably the same issue would apply to file:// dependencies already? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me investigate, and update with some links.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(still need to look here)

- delete the temp dir

### Linting

`helm lint` should print a warning when a chart contains a git-based dependency, primarily because git references are mutable.

## Backwards compatibility

This is backwards compatible from Helm perspective - the existing formats for `dependencies` are still supported.

Charts that start using the new format will effectively be changing their minimum required Helm version, i.e. they would be introducing breaking changes and should bump their major version.

## Security implications
Copy link
Author

@dominykas dominykas Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git allows symlinks that can point to anywhere. Helm resolves symlinks and packages up the file contents with a printed warning. Is that an additional security risk?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a risk to me. Ideally Helm already might not package symlinks (I didn't check)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helm does support resolving symlinks during helm package: helm/helm#8540 - seems that will be done by design, and it should print a warning.

So does helm template ..

However the symlinks are ignored if they exist inside the tarball itself (i.e. if you craft the tarball manually, and include a symlink, then it will not be followed) - I hope that's by design too 😁

I've switched the code in the implementation to effectively use the same code path as helm package (primarily to make sure that if you're installing a chart from git, it would be have as close as possible to it have been helm packaged and published into a registry). This means that symlinks get followed and resolved during helm dep up, but you do get a warning:

walk.go:74: found symbolic link in path: /var/folders/j3/8m3y8q4176118dng0hvdqfh00000gp/T/helm-git-3323594338/templates/secret.yaml resolves to /Users/Dominykas_Blyze/devel/experiments/helm-test/templates/secret.yaml. Contents of linked file included and used

Given that helm package's warning was deemed to be enough - I figure this is also enough? I will add an explicit note under "security implications" to explain this - and possibly that should be left at that?


Pulling the dependencies from git may introduce additional attack surfaces, as it would need to rely on an implementation of `git` (most likely the official `git` executable), and there have been recent vulnerabilities disclosed, including Remote Code Execution (RCE).

This is something that needs to be taken into account in security conscious environments and might need to be documented for the end users. Users with high security requirements, should probably avoid using the feature and instead rely on a registry.

## How to teach this

- The documentation should note the security caveat listed above
- The documentation should provide the recommendation to prefer registries to git, if possible
- The documentation should note the implications of git being mutable with a recommendation of pinning to specific hashes
- The documentation could list the examples for various git protocols, but mention that Helm supports whatever `git clone` supports

## Reference implementation

Multiple implementation attempts available for [a discarded earlier draft of a related HIP](https://github.com/helm/community/pull/214):

- [helm/helm#11258](https://github.com/helm/helm/pull/11258)
- [helm/helm#9482](https://github.com/helm/helm/pull/9482)
- [helm/helm#6734](https://github.com/helm/helm/pull/6734)

## Rejected ideas

- An [earlier draft for solving the issue](https://github.com/helm/community/pull/214) suggested using URLs like `git://https://...`. There were comments about that approach in the reference implementations, with the suggestion to use the conventions which are already established in other ecosystems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, the established convention (as mentioned appears to be e.g git+https://...


## Open issues

N/A

## References

- [`npm install` documentation](https://docs.npmjs.com/cli/v10/commands/npm-install) (covers the `git+[protocol]` format)
- [Python packaging documentation on version specifiers](https://packaging.python.org/en/latest/specifications/version-specifiers/) (covers the `VCS+protocol` format)
- [bundler documentation on installing gems from git repositories](https://bundler.io/guides/git.html)