Remove the strict requirement for an independent storage backend #584

Closed
considerate opened this issue Jan 15, 2023 · 9 comments

@considerate

I'm considering creating a registry for purenix. I have looked at #354 and I think that the simplest solution for purenix would be to just create a separate registry repository given the small number of supported packages rather than adding a backend field to each package in the registry.

I think it's feasible for me to create a registry with metadata and a corresponding registry index with cached manifests. However, I wouldn't want to replicate all the work in this repository and I would like to avoid setting up a separate storage backend.

However, as it stands, the registry is tightly coupled to a single storage backend because the PublishedMetadata only contains the fields hash, bytes, and publishedTime.

If the PublishedMetadata had an alternative representation that uniquely declares how to fetch the package from the original location instead there would be no requirement for the separate storage backend.
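To make the proposal concrete, here is a minimal sketch of what such an alternative representation could look like. Only the fields hash, bytes, and publishedTime come from the existing metadata; the `RemoteRelease` variant, its `git`/`ref` fields, and the helper function are assumptions for illustration, not part of the registry.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StoredRelease:
    """Current shape: the tarball lives in the registry's storage backend."""
    hash: str            # hash of the stored tarball
    bytes: int           # size of the stored tarball
    published_time: str  # publishedTime in the registry metadata


@dataclass
class RemoteRelease:
    """Hypothetical alternative: fetch the source from its original location."""
    git: str  # repository URL
    ref: str  # commit the release is pinned to


# A release is described by exactly one of the two representations.
PublishedMetadata = Union[StoredRelease, RemoteRelease]


def needs_storage_backend(meta: PublishedMetadata) -> bool:
    """Only releases described by hash/bytes/publishedTime need stored tarballs."""
    return isinstance(meta, StoredRelease)
```

With a shape like this, a registry made up only of `RemoteRelease` entries would never touch a storage backend, at the cost of depending on the original repositories staying available.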

I understand that not having a separate storage backend would imply issues with disappearing packages that the main registry aims to avoid. However, for alternative registries like the one for purenix and maybe one for purerl I think loosening this constraint and having the package manager fetch the source from their original locations is preferable.

@f-f
Member

f-f commented Jan 15, 2023

Hi Viktor! I hope you are doing well 🙂

The "alternate backends" corner of the Registry is underimplemented right now, as we chose to prioritise other things first in order to keep the initial scope manageable. However, I would say that the main idea of where to go (and perhaps a clearer vision of where not to go with it) is nailed down. I'll try to reason through it here below.

> If the PublishedMetadata had an alternative representation that uniquely declares how to fetch the package from the original location instead there would be no requirement for the separate storage backend.
> I understand that not having a separate storage backend would imply issues with disappearing packages that the main registry aims to avoid

In my opinion the main reasons why a central Registry is useful to us right now are:

  1. it provides a central repository of names, i.e. the authoritative key-value store pointing from short names (e.g. prelude) to source blobs (tarballs in the context of the new registry, pointers to specific GitHub refs in the Bower world)
  2. it stores the source blobs from above and guarantees their integrity, so that people's builds do not break and are as secure as we can guarantee (this has been an issue in the past, and it's the main improvement that the new Registry offers over the Bower setup)

Accepting the proposal here would mean disposing of point (2) above, which is not the experience that we'd like to offer to the Registry users (i.e. if we were fine with the tradeoff that the Bower registry offered then we wouldn't be doing all this work in the first place)

As you note this situation was discussed in #354, which in hindsight I find a confusing read, as a lot of the context was talked about but never written down in there.
The main conclusion that we reached there is that the "least wrong" way to support packages targeted to backends that are not the default one likely involves "package aliases", i.e. you publish a package under a certain name, and declare somewhere "this replaces package X when using backend Y". This situation has a few unspecified corner cases, so I don't think we are ready to move forward with that just yet.
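As a rough illustration of the "package aliases" idea, the lookup could be sketched as below. This is not a specified design: the alias table, the replacement package name, and the function are all hypothetical, and the unspecified corner cases mentioned above are not handled.

```python
# Hypothetical alias table: (backend, original package) -> replacement package.
# The replacement name below is made up for illustration.
ALIASES = {
    ("purenix", "prelude"): "purenix-prelude",
}


def resolve_package(name: str, backend: str, aliases=ALIASES) -> str:
    """Return the package to use for `name` when compiling with `backend`.

    Falls back to the original name when no alias is declared.
    """
    return aliases.get((backend, name), name)
```

Packages without a declared alias resolve to themselves, so the default backend would be unaffected by such a table.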

The other important conclusion that we reached at the time was "alternate backends are doing just fine with using package sets, no hurry in using the Registry for them" (which is what allowed us to stash this away for later).
In fact if you are using an alternate backend today, you're probably using package sets and nothing else - this is how we use purerl at work, we're the main consumers of it, and we're doing just fine with that setup.

Your first sentence above is "I'm considering creating a registry for purenix", and I ask: do you need one? Is that a simpler/more effective setup than just using a package set?

If you are using the new spago with the spago.yaml files, then you are using package sets that look like this one.
The way you point to this package set in spago looks like this:

workspace:
  set:
    registry: 9.0.0

however, you can also reference it by URL:

workspace:
  set:
    url: https://raw.githubusercontent.com/purescript/registry/main/package-sets/9.0.0.json
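The two forms above point at the same file, so the relation between them can be sketched as a small function. This is an illustration only: the URL pattern is inferred from the example above, and the function name and dictionary layout are assumptions, not Spago's actual implementation.

```python
# Base URL inferred from the example above (an assumption, not a documented API).
REGISTRY_SETS = "https://raw.githubusercontent.com/purescript/registry/main/package-sets"


def package_set_url(workspace: dict) -> str:
    """Return the URL of the package set referenced by a `workspace` section."""
    set_cfg = workspace["set"]
    if "url" in set_cfg:
        # An explicit URL is used as-is, so it can point anywhere.
        return set_cfg["url"]
    # A registry version is expanded to the file hosted in the Registry repo.
    return f"{REGISTRY_SETS}/{set_cfg['registry']}.json"


print(package_set_url({"set": {"registry": "9.0.0"}}))
```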

Now, these package sets don't have to come from the Registry repo. You can point Spago at any URL, and it's able to read a bunch of different formats (there's unfortunately not a single line of documentation on this stuff yet, but on the other hand we're not even in alpha yet), so in the case of an alternate backend, you could alias packages and the package set file would look something like this:

{
  "version": "9.0.0",
  "compiler": "0.15.4",
  "published": "2023-01-04",
  "packages": {
    "prelude": {
      "git": "https://github.com/purenix-org/purescript-prelude",
      "ref": "0a991d6422d5d57650955fab8468f7af82dba944"
    },
    "bifunctors": "6.0.0",
    ...
  }
}

Specifying git and ref is all you need if the package you're fetching has a spago.yaml, otherwise you'd also specify the dependencies array much like you'd need in the current Spago's package set format, and if the source of the dependency is not in the root then you can specify a subdir (example here).
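A consumer of such a file has to handle both kinds of entry: a plain version string and an object with git and ref. The sketch below shows one way to normalise them, using the example above with the elided packages left out. The function name and the shape of the returned records are assumptions for illustration, not Spago's code.

```python
import json

# The example package set from above, without the elided entries.
PACKAGE_SET = json.loads("""
{
  "version": "9.0.0",
  "compiler": "0.15.4",
  "published": "2023-01-04",
  "packages": {
    "prelude": {
      "git": "https://github.com/purenix-org/purescript-prelude",
      "ref": "0a991d6422d5d57650955fab8468f7af82dba944"
    },
    "bifunctors": "6.0.0"
  }
}
""")


def normalise_entry(entry):
    """Turn a package set entry into a uniform record."""
    if isinstance(entry, str):
        # A bare string is a version to fetch from the Registry.
        return {"source": "registry", "version": entry}
    return {
        "source": "git",
        "git": entry["git"],
        "ref": entry["ref"],
        # Optional: only needed when the package is not in the repo root.
        "subdir": entry.get("subdir"),
        # Optional: only needed when the package has no spago.yaml.
        "dependencies": entry.get("dependencies"),
    }


records = {name: normalise_entry(e) for name, e in PACKAGE_SET["packages"].items()}
```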


If the above makes some sense, then I suppose that we could do something to help the authors of alternate package sets keep them up to date, much like we automatically keep the official package sets up to date.
The registry uses this script, and I think we could make it generic enough (and/or encase it in a github action) so that one could use it to keep any package set up to date. (cc @thomashoneyman)

@considerate
Author

Hello Fabrizio! Long time no see! Thank you, I'm doing great! How about you?

{
  "version": "9.0.0",
  "compiler": "0.15.4",
  "published": "2023-01-04",
  "packages": {
    "prelude": {
      "git": "https://github.com/purenix-org/purescript-prelude",
      "ref": "0a991d6422d5d57650955fab8468f7af82dba944"
    },
    "bifunctors": "6.0.0",
    ...
  }
}

I really like the structure of that last example. I just thought it wouldn't be supported, because the type of a PackageSet only allows Version as the value in the Map.

Is it reasonable for me to add a PR where I extend the PackageSet type to also allow remote locations?

@MonoidMusician
Contributor

I believe the intent is for Spago to support extended package sets with custom packages, but the registry will only release package sets of packages from the registry.

Side note: can we rename the set: config key to package-set: in the new Spago? It's really confusing to read it in isolation, since it's not clear what it is referring to (is it a verb, a noun? setting what / a set of what?). Happy to create a PR to spago with backwards-compat there.

@considerate
Author

@MonoidMusician

My current use case is purescript2nix, a small script to build purescript packages in nix using purs directly while fetching the dependencies from the registry.

I wouldn’t want to implement the extended package sets into purescript2nix without updating the registry because it would be in direct violation of the spec.

> packages, which is an object in which keys are PackageNames and values are Versions

@f-f
Member

f-f commented Jan 16, 2023

@considerate I'm doing great as well, thanks 😊

As @MonoidMusician said, the upstream package sets that we host in the Registry are bound to use only packages that come from inside the Registry, hence the limitation in the type you linked.

However, any other package set doesn't have to have this limitation, and in fact Spago can understand many more different formats. The type that we use for that is here (and in addition to this new format we also support the current package set format).
The Registry spec that you link is not a general spec for package sets, but only the specification for the format that the Registry will use - other package sets are allowed to use any format as long as the package manager can understand it. We'll include a similar spec in Spago's docs documenting the accepted formats.
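The distinction can be made mechanical: a package set fits the Registry's format only when every value under packages is a version. The check below is a sketch under the assumption that a version is three dot-separated numbers; the function name and the regular expression are illustrative and not taken from the Registry code.

```python
import re

# Assumed version shape: MAJOR.MINOR.PATCH, digits only.
VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def registry_violations(package_set: dict) -> list:
    """Return the names of packages whose value is not a plain version."""
    return sorted(
        name
        for name, value in package_set["packages"].items()
        if not (isinstance(value, str) and VERSION.match(value))
    )


extended = {
    "packages": {
        "prelude": {
            "git": "https://github.com/purenix-org/purescript-prelude",
            "ref": "0a991d6422d5d57650955fab8468f7af82dba944",
        },
        "bifunctors": "6.0.0",
    }
}
print(registry_violations(extended))
```

An empty result means the set could be expressed in the Registry's own format; any name listed needs one of the extended formats that the package manager understands.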

@f-f
Member

f-f commented Jan 16, 2023

@MonoidMusician and I'm on board with renaming the set key!
Let's go with package_set for consistency with the rest of the Yaml format.
And no need to be backwards compatible, no one is supposed to be using this yet 😄

@considerate
Author

considerate commented Jan 17, 2023

Thank you for the comments @MonoidMusician @f-f.

> The Registry spec that you link is not a general spec for package sets, but only the specification for the format that the Registry will use - other package sets are allowed to use any format as long as the package manager can understand it.

I see. I'm wary of each package manager specifying the format of package sets, as this might lead to divergence and incompatibilities between the package managers. However, I understand that it's not really my call to make. Hopefully the registry spec can at least serve as a least common denominator and a minimal requirement of what each package manager should support.

In the short term I'll implement the RemotePackageSet format from spago in purescript2nix. I already have something similar for the extra_packages in the spago.yaml file, so it's probably not going to be difficult. My objections are of a more philosophical nature, and my reservations stem from considering the long-term interactions between what package managers should do and what the registry provides. If each package manager has its own format, there's no single source of truth for what a purescript package set is.

@considerate
Author

Closing this, since the original issue of requiring a storage backend does not seem to apply.

@f-f
Member

f-f commented Jan 19, 2023

> I see. I'm wary of each package manager specifying the format of package sets, as this might lead to divergence and incompatibilities between the package managers. However, I understand that it's not really my call to make. Hopefully the registry spec can at least serve as a least common denominator and a minimal requirement of what each package manager should support.

We have tried to keep the Registry spec footprint to a minimal size, as we'll have to support anything that we introduce for a very long time (due to our forwards-compatibility guarantee).
Given this situation, it's preferable to leave package set concerns to package-manager-driven convention.
We hope that, with Spago being the suggested package manager, any other package manager will standardise on its accepted formats, while still being open to innovation.


3 participants