Copying node is slow #831

Open
elibarzilay opened this issue Feb 6, 2025 · 8 comments

@elibarzilay
Contributor

Copying a big directory is slow.
Using symlinks is much lighter, but obviously there are too many issues.

I have two suggestions:

First, cp -l creates a hard link, which is of course much faster, and the result is very close to a plain cp so there shouldn't be any issues. For example, a very lightweight change would be to set CP="cp -l" and use that instead of all cps. (CP_OPTIONS is better, but maybe that's too big of a change.)
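To make the first suggestion concrete, here is a minimal sketch comparing a hard-link copy with a plain copy. It runs entirely in a throwaway temp directory; the `cache`, `linked`, and `copied` names and the fake `node` file are placeholders for illustration, not paths that n uses.

```shell
# Sketch: compare a hard-link copy with a plain copy (all paths are temp placeholders).
work=$(mktemp -d)
mkdir -p "$work/cache/bin"
echo 'fake node binary' > "$work/cache/bin/node"

CP="cp -l"                              # the override suggested above
$CP -R "$work/cache" "$work/linked"     # hard links: no file data is copied
cp -R "$work/cache" "$work/copied"      # plain copy, for comparison

# `-ef` is true when both names refer to the same inode on the same device.
if [ "$work/cache/bin/node" -ef "$work/linked/bin/node" ]; then linked_shares=yes; else linked_shares=no; fi
if [ "$work/cache/bin/node" -ef "$work/copied/bin/node" ]; then copied_shares=yes; else copied_shares=no; fi
echo "linked shares inode: $linked_shares, copied shares inode: $copied_shares"
```

The directory layout of both results is the same; only the hard-linked tree shares file data with the source.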


Second, having a symlink based thing is nice not only in being fast, but also in keeping things more organized, since the downloaded stuff is kept in a single place. But of course adding that properly is expensive + risky so unlikely to be desirable.

A possible way to get that with minimal changes could be:

  • Add one single symlink to the current version, something like /usr/local/n/versions/node/current -> 22.13.1. That should be just a one-line thing to create/set it before copying. This effectively adds a proper place that remembers the current version directly rather than node --version.
  • Add some --no-copy flag (better: envvar (better: both)) to completely skip copying (so just the above symlink is changed). This is probably also a very small change.

This means that I can maintain my own bin/* symlinks to current/bin/*, and others if I want to.
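A rough sketch of what the two bullets amount to, using a temp directory in place of /usr/local and empty fake version directories. The second version number is arbitrary, and none of this is existing n behaviour.

```shell
# Sketch only: a temp dir stands in for /usr/local, and the version dirs are fakes.
prefix=$(mktemp -d)
versions="$prefix/n/versions/node"
mkdir -p "$versions/22.13.1/bin" "$versions/20.18.2/bin" "$prefix/bin"
touch "$versions/22.13.1/bin/node" "$versions/20.18.2/bin/node"

# Point "current" at a version. -n replaces an existing link instead of descending into it.
ln -sfn 22.13.1 "$versions/current"
first=$(readlink "$versions/current")

# A user-maintained bin symlink that goes through "current".
ln -sfn "$versions/current/bin/node" "$prefix/bin/node"

# Switching versions retargets one symlink; bin/node follows without being touched.
ln -sfn 20.18.2 "$versions/current"
second=$(readlink "$versions/current")
echo "current: $first, then $second"
```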

Now, this looks like a weird and overly specific feature, but a good way to think about it is that n has functionality to (A) get some node version, and more functionality to (B) install it. This suggestion makes it possible to use (A) without (B). (IOW: use install() without activate().)

This is very close to using --download, the only difference is that it's still doing the symlink change. A tiny change would be to rename $DOWNLOAD to a $MODE internally, so it can have three values for downloading only, or also doing the symlink, or also doing the whole thing.
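As a purely hypothetical illustration of the three-valued mode idea: the mode names and step names below are made up for this sketch and do not exist in n.

```shell
# Hypothetical sketch: neither these mode names nor the step names exist in n.
steps_for_mode() {
  case "$1" in
    download) echo "download" ;;             # fetch into the cache only
    link)     echo "download link" ;;        # also update the "current" symlink
    install)  echo "download link copy" ;;   # also copy into the install prefix
    *)        echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

for mode in download link install; do
  echo "$mode: $(steps_for_mode "$mode")"
done
```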


Again, I can do PRs for both of these if it sounds reasonable. I'm guessing that they will be very small overall.

@shadowspawn
Collaborator

shadowspawn commented Feb 6, 2025

cp -l is an interesting idea, but significantly changes the result. The cached and installed hierarchies would no longer be independent. I don't think many operations would modify the files, as opposed to overwriting, so might not notice in normal use.

Also, hard links do not work across partitions, and the cache destination and install destination may be on different partitions.
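One way to cope with the cross-partition case is to try hard links first and fall back to a real copy when linking fails. This is a sketch under that assumption, not how n copies files; `copy_tree` and the temp paths are made-up names.

```shell
# Sketch: try hard links first, fall back to a real copy (e.g. on a cross-device error).
copy_tree() {
  src=$1
  dst=$2
  mkdir -p "$dst"
  if cp -l -R "$src/." "$dst/" 2>/dev/null; then
    echo "linked"
  else
    cp -R "$src/." "$dst/" && echo "copied"
  fi
}

work=$(mktemp -d)
mkdir -p "$work/cache/bin"
echo 'fake node binary' > "$work/cache/bin/node"
method=$(copy_tree "$work/cache" "$work/install")
echo "method: $method"
```

On a single filesystem this reports `linked`; across partitions the first `cp` fails and the plain copy runs instead.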


> Now, this looks like a weird and overly specific feature

I appreciate that you recognise switching to symlinks would be major and unlikely, but adding a partial feature to make it possible to opt-in is... a bit weird and overly specific. 😆

For interest, my recollection is this is similar to how fnm is implemented. There is a current-ish folder that fnm modifies symlinks in.

@shadowspawn
Collaborator

The two supported ways of avoiding or lowering the copy cost are:

  1. use exec or run or which to use a cached node without installing
  2. skip copying npm. This makes a big difference! This fits particularly nicely into workflows where you usually run latest npm anyway, or are using a different package manager.

@shadowspawn
Collaborator

For my own interest I did a quick check, and --preserve makes less of a difference than I remembered. Significant improvement, but the rest of the install does still take real time.

% time n 22.13.1
     copying : node/22.13.1
   installed : v22.13.1 (with npm 10.9.2)
n 22.13.1  0.18s user 1.09s system 45% cpu 2.797 total
% time n 22.13.1
     copying : node/22.13.1
   installed : v22.13.1 (with npm 10.9.2)
n 22.13.1  0.18s user 1.07s system 46% cpu 2.676 total

% time n --preserve 22.13.1 
     copying : node/22.13.1
   installed : v22.13.1
n --preserve 22.13.1  0.09s user 0.44s system 29% cpu 1.815 total
% time n --preserve 22.13.1
     copying : node/22.13.1
   installed : v22.13.1
n --preserve 22.13.1  0.09s user 0.44s system 28% cpu 1.853 total
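
For a rough sense of scale, the wall-clock totals above can be averaged and compared. This only redoes the arithmetic on the four timings shown; two runs each is a small sample.

```shell
# Mean wall-clock time of the two plain runs vs the two --preserve runs above.
reduction=$(awk 'BEGIN {
  full     = (2.797 + 2.676) / 2   # plain install
  preserve = (1.815 + 1.853) / 2   # with --preserve
  printf "%.0f", (1 - preserve / full) * 100
}')
echo "--preserve cut wall-clock time by about ${reduction}%"
```

That works out to roughly a third less time, with about 1.8 seconds remaining.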

@elibarzilay
Contributor Author

> hard links do not work across partitions

Yeah, that's a good point. Personally, I would still use it (I'd test one cp -l and see if the copy was made). The thing is that modifying files, either by truncating (would affect both) or by re-creating (affects just one side) should not be done at all in a properly set up environment -- as in having the directory owned by root. If anything, I'd worry more about re-created files getting the directories out of sync, but with a plain copy that still happens (and worse: changes to the installed directory would get nuked on the next "activate").

But that's up to you :)
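
The truncate versus re-create distinction can be seen with two hard-linked names in a temp directory. The `cached` and `installed` file names are placeholders for the two sides.

```shell
# Sketch: how in-place writes and re-creation differ for hard-linked files.
work=$(mktemp -d)
echo original > "$work/cached"
ln "$work/cached" "$work/installed"       # two names, one inode

echo truncated > "$work/installed"        # `>` truncates and rewrites the same inode
after_truncate=$(cat "$work/cached")      # the cached name sees the change

rm "$work/installed"                      # re-create: unlink, then write a new inode
echo recreated > "$work/installed"
after_recreate=$(cat "$work/cached")      # the cached name is unaffected this time
echo "after truncate: $after_truncate, after re-create: $after_recreate"
# prints "after truncate: truncated, after re-create: truncated"
```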


Side note (related to the below): I think that --download and $DOWNLOAD are bad names. The obvious meaning I'd guess is that it enables downloading whereas, IIUC, it actually just prevents activation.


Finally, I'm actually interested in the symlink thing not only because of the performance issue, but also because I very much prefer that approach, since it keeps things very tidy. (I never have issues with preserving npm, since I'd rather use whatever I get with node.) In fact, I got to n after using my own script for a long time which did this simple symlink thing, and switched to n for the much better way of figuring out what to download.

I realize that using exec or run would be better, but that's impractical since I want to write scripts that don't require n to run. (I know that it is possible to have a node script that would do that, using shell black magic (a node script that looks for the real node by removing its directory from $PATH and then looking for the node exe) -- but that's a tall order...)

I also considered some wrapper script (eg, some nn script that calls n) -- but the problem there is that I don't know which version gets activated by n, and I don't see a reliable way to get that. If I had just a symlink, I could have invoked n with --download and use the directory, but I don't see that.

...BUT..., looking at the new --cleanup thing, I'm not sure that such a symlink would make sense since the cached directory could just be removed.

I actually thought for a while that the only way out would be a fork, but then I realized that there's a nice way to do this which requires a VERY small change that I will do in a followup PR.

@shadowspawn
Collaborator

The big functional difference with running versions of node from separate folders, whether using symlinks or jumpers, is the impact on global packages. I have had one try at a lightweight way of using symlinks but I abandoned it because it wouldn't be just an under-the-hood change. I see other managers have commands and behaviour related to managing global packages like nvs migrate.

> but the problem there is that I don't know which version gets activated by n, and I don't see a reliable way to get that.

n which lts?
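
A wrapper could build on that along these lines. This is a sketch that assumes `n which` prints the path of the cached node binary, ending in `/bin/node`; `node_dir_for` is a made-up helper name.

```shell
# Sketch: resolve the cached directory for a version via `n which`.
# Assumes `n which <version>` prints <dir>/bin/node for a cached version.
node_dir_for() {
  node_path=$(n which "$1") || return 1
  dirname "$(dirname "$node_path")"       # strip the trailing /bin/node
}

# Usage (needs n on PATH and the version already cached):
#   node_dir_for lts
```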

> I actually thought for a while that the only way out would be a fork, but then I realized that there's a nice way to do this which requires a VERY small change that I will do in a followup PR.

I will be interested to see, but not optimistic it will be accepted. 😄

@shadowspawn
Collaborator

shadowspawn commented Feb 8, 2025

> Side note (related to the below): I think that --download and $DOWNLOAD are bad names. The obvious meaning I'd guess is that it enables downloading whereas, IIUC, it actually just prevents activation.

The first usage of --download was to (mis)use implicit install to just download by skipping activation. --download then got used again more appropriately to enable downloads for run et al. I tidied up the public facing usage in #821 but minimised the code churn and kept the old behaviour for a while. So the code uses it two ways, one historical and one recent.

The documented usage is to enable downloading, as you expected.

@shadowspawn
Collaborator

My musing from thinking about links during a walk is n link <version> to update symlinks in the "link" folder. That makes it separate from install. (I still think probably better to leave link approach to the many other version managers, but it is enticing!)

elibarzilay added a commit to elibarzilay/n that referenced this issue Feb 9, 2025
This is the tiny change that I talked about in tj#831. The actual change
is both the extra line, plus a vague highlevel decision to keep this
output intact in face of future change.

Looks like a very small change (which I promised :) and also likely to
be accepted since it's a harmless one line of output --- BUT --- with
this, I can use `n` to manage the downloading + caching of various
versions, but use my own system for actually "activating" it.

In my use case, I'd write a small script that uses
`n <ver> --quiet --download` to just have some version available. With
this PR in place, I can parse the output and look for `^ *(mk)?dir : `.
That gives me the directory which I will then use as a `/usr/local/node`
symlink (with `/usr/local/bin/*` symlinks into that main `node`
symlink).

At a higher level, this makes it possible to use the download-and-cache
functionality of `n` with any way of using the results. This could open
up more uses in the future to have more alternative activations.

I specifically think now that some disconnected `n link <ver>` would be
an awkward addition, since it would look like a side-thing that is
bolted on, since it would be disconnected from the main way of
activating a version. That's why I originally thought about something
along the lines of adding a "mode" thing, but as I said in that thread,
that would be a major change. Especially with things like `--cleanup`
that won't fit with it.

At a minimum, it would make it easy to play with alternative
activations before adding it to the main code.
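
The parsing step described in the commit message might look like the sketch below. The sample text is made up to match the `^ *(mk)?dir : ` pattern quoted above; the real output of n, with or without that PR, may differ.

```shell
# Sketch: extract the directory from log-style output.
# The sample lines are invented to fit the pattern from the commit message.
sample='some unrelated line
     mkdir : /usr/local/n/versions/node/22.13.1'
node_dir=$(printf '%s\n' "$sample" | sed -nE 's/^ *(mk)?dir : //p')
echo "$node_dir"

# Intended real-world use, per the commit message (not run here):
#   n 22.13.1 --quiet --download | sed -nE 's/^ *(mk)?dir : //p'
```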
@elibarzilay
Contributor Author

> n which lts

Well, originally, I wanted to hook into "the currently activated version", so which doesn't help. More generally, there's no concept of a "currently activated version", but rather "the version that node currently shows". My #833 PR is giving that up, and is basically similar to using n --quiet --download <ver>; n which <ver> except that there's no need to run the whole thing twice so it's very lightweight.

(Re n link <ver> -- I commented there about it...)
