Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yarn: Forbid zero installs workflow #379

Merged
merged 3 commits into from
Nov 20, 2023

Conversation

eskultety
Copy link
Member

@eskultety eskultety commented Nov 3, 2023

The concept of zero installs is inherently flawed in that one has to either store binary ZIP files or an exploded node-modules dependency tree inside their project repository. While the former may sound like an acceptable use case from cachi2's hermetic build POV because Yarn provides us with a way to check their integrity, the latter surely doesn't (in fact we refuse projects having node-modules in the repository).

That said, even with the former case some dependencies may specify post-install scripts which will cause such a dependency to be unpacked and expanded under .yarn/cache/unplugged which in simple terms is nothing else than again an expanded node-modules tree which we have to treat the same way as we do for NPM - reject it - because such zero installs would not work without these unplugged packages (if any dependency needed to be unpacked) and at the same time we cannot verify them reliably.

Note that even if one uses the 'immutablePatterns' [1] YarnRc configuration
option with **/.yarn/unplugged/** or even **/.yarn/** Yarn somehow ignores the pattern unless the whole .yarn/unplugged/<unzipped_package>/node_modules/<package> subdirectory of a given unplugged package tree was completely removed in which case Yarn finally notices and throws an immutable cache error:

        The checksum for **/.yarn/unplugged/**/* has been modified by this
        install, which is explicitly forbidden

The whole glob pattern handling in Yarn happens here:

Given that Yarn doesn't provide us with a mechanism to verify integrity of the unplugged packages we're facing essentially the same problem as with node_modules handling in NPM which we explicitly forbid. Needless to say that Yarn also provides a backwards compatible way with the NPM ecosystem by creating the node_modules directory if the nodeLinker setting in the RC config was set to node-modules and hence the check detecting zero-installs workflow was updated accordingly to reflect the different configuration values for the setting.

[1] https://v3.yarnpkg.com/configuration/yarnrc#immutablePatterns

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

@eskultety
Copy link
Member Author

eskultety commented Nov 3, 2023

Note that this PR was rebased on top of #377 and #376 in order for tests to pass and to be testable locally, but naturally conflicts will arise with other PRs on review. Also, only the last 2 commits are relevant, the rest is just the rebase.

Also, the fact I used an RFC prefix to spark a discussion on this topic and did so rather than marking this as a draft since it looks like the concept of drafts is used to mark something as NOT READY for review in this project which isn't the case here.

@eskultety eskultety force-pushed the forbid-node-modules branch from eccce8a to 20e0ecd Compare November 9, 2023 11:26
@eskultety eskultety changed the title RFC: yarn: Forbid zero installs workflow yarn: Forbid zero installs workflow Nov 9, 2023
@eskultety
Copy link
Member Author

since RFC:

  • rebased on top of current main
  • added a getter/setter for the nodeLinker RC option (as a result of current HEAD changes)
  • added a couple of unit tests for the zero-installs detection

@eskultety eskultety force-pushed the forbid-node-modules branch from 20e0ecd to 95c63ab Compare November 9, 2023 11:29
@ben-alkov ben-alkov self-requested a review November 9, 2023 14:43
Copy link
Contributor

@taylormadore taylormadore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach LGTM

cachi2/core/package_managers/yarn/main.py Outdated Show resolved Hide resolved
cachi2/core/package_managers/yarn/main.py Show resolved Hide resolved
@ben-alkov
Copy link
Member

If it were me, and I weren't planning to ever merge this (I assume that's the case, given that the usual PR template wasn't used), I'd open a PR against my fork for the team to comment on, which could then be turned into a PR against 'main' in this repo once it was actually ready.

P.S. We usually use 'draft' exactly to mean "this is ready to review, but NOT merge", e.g. to get early comments on design etc.

@eskultety
Copy link
Member Author

If it were me, and I weren't planning to ever merge this (I assume that's the case, given that the usual PR template wasn't used)

I do plan on merging this, sorry, my bad for messing this up when dropping the RFC by not putting the PR template back in, fixed now!

I'd open a PR against my fork for the team to comment on, which could then be turned into a PR against 'main' in this repo once it was actually ready.

Note that by doing ^that in upstream context you're decreasing the exposure of your proposed changes to a selected group of people who may be tracking your fork (or notify them via private means) instead of immediately getting input from all interested parties in the community, so again, it was my mistake for not putting the PR template back in, but other than that all RFC discussions ought to happen within the project's main repo if it uses PR/MR based workflow or on the official mailing list if the project uses mail-based workflow simply because personal forks in most cases are just a playground to experiment (e.g. with Git<forge> features) and to fine-tune one's changes against the main repo.

@brunoapimentel
Copy link
Contributor

It LGTM on a first review. I think we can rebase this now since #376 was merged, and it's probably easier to merge this even before #377 comes in.

A project can make use of zero installs a couple of different ways:
    - offline cache vs
    - exploded node_modules tree
The value of 'nodeLinker' [1] can help us in determining if a project
uses any of the above and hence whether a project does use zero installs
or not.

We're mainly interested in the getter and the setter is only introduced
for consistency reasons as we likely won't need to set this when
overriding user Yarn configuration.

[1] https://v3.yarnpkg.com/configuration/yarnrc#nodeLinker

Signed-off-by: Erik Skultety <[email protected]>
Although Yarn docs [1] suggests usage of the 'pnp' linker mode combined
with offline mirrors in shape of .yarn/cache directory populated with
ZIP archives, the page also mentions alternative way of achieving the
same thing - expanded 'node_modules' file tree.
The thing with node_modules directory is that there are 2 distinct ways
of appearing in the repository:
    1) using the 'pnpm' nodeLinker install setting
    2) using the 'node-modules' nodeLinker install setting

Why is the nodeLinker setting relevant at all? Because for local cachi2
usage where the environment, and more specifically the git repository
might not be completely in a clean and pristine condition, in other
words the user may have some forgotten node_modules directory in the
repository which isn't even checked in and we'd still fail a build
request. Therefore, the linker setting can tell us what we should be
looking at when evaluating whether a given project does or doesn't use
zero-install workflow and modify the check accordingly.

[1] https://v3.yarnpkg.com/features/zero-installs

Signed-off-by: Erik Skultety <[email protected]>
@eskultety
Copy link
Member Author

It LGTM on a first review. I think we can rebase this now since #376 was merged, and it's probably easier to merge this even before #377 comes in.

rebased on top of main in the meantime.

Copy link
Contributor

@taylormadore taylormadore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

The concept of zero installs, i.e. no install needed (git clone is
sufficient), is inherently flawed for a number of reasons:
    - taking over maintenance (by the means of manual updates) of a
      project's dependencies by baking their sources in to the given
      project's repository
    - creating unnecessary bloat (often in form of binary formats) in
      the repository
    - moving the trust in package contents from the official packaging
      tooling and official public registries to a given project which
      doesn't really solve the biggest security problem of many public
      packaging repositories - unvetted contents

just to mention a few. In context of Yarn what the above would mean is
checking in dependencies' ZIP files into the repository. While that may
sound like an acceptable use case since Yarn can verify integrity of
the ZIP archives, some dependencies (due to e.g. post-install scripts)
may end up being unpacked into a .yarn/unplugged directory, effectively
creating an exploded node_modules/ dependency tree hierarchy inside the
repository which would be needed for the zero install use case to work.
However, we would have to employ a complex methodology (still
preventing arbitrary code execution) of reliably verifying such
dependencies in order to produce an accurate SBOM. Since we already
reject projects containing 'node_modules' directory inside the
repository for NPM, we can use it as a precedent here.

The whole situation would be different if Yarn provided a mechanism to
verify integrity of 'unplugged' contents the same way it does it for
ZIP files, but unfortunately it doesn't [1].

As a result of this patch some test variants involving the zero-install
use case which no longer applies have been adjusted accordingly and
dedicated test cases dealing with zero installs were added.

[1] Even if one sets the 'immutablePatterns' [2] YarnRc configuration
option to something like '**/.yarn/unplugged/**' Yarn doesn't seem to
care about the glob pattern unless the whole

    .yarn/unplugged/<unzipped_package>/node_modules/<package>

subdirectory of a given unplugged package tree would end up being
removed in which case Yarn finally notices and throws a immutable cache
error:

    The checksum for **/.yarn/unplugged/**/* has been modified by this
    install, which is explicitly forbidden

[2] https://v3.yarnpkg.com/configuration/yarnrc#immutablePatterns

Signed-off-by: Erik Skultety <[email protected]>
Copy link
Contributor

@brunoapimentel brunoapimentel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@eskultety
Copy link
Member Author

/retest

@eskultety eskultety added this pull request to the merge queue Nov 20, 2023
Merged via the queue into containerbuildsystem:main with commit 0a91337 Nov 20, 2023
15 checks passed
@eskultety eskultety deleted the forbid-node-modules branch November 23, 2023 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants