
PineAPPL file format and backwards compatibility issues #83

Closed
cschwan opened this issue Oct 29, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@cschwan
Contributor

cschwan commented Oct 29, 2021

Backwards compatibility

First, let's define backwards compatibility:

Grid::read must be able to read every PineAPPL grid generated with a released version of PineAPPL. Released versions are those listed on the Releases page.

PineAPPL file format

PineAPPL doesn't have a dedicated file format, but instead relies on serde for (de)serialization and on bincode for actually writing bytes to and from files. This has the disadvantage that, for the sake of backwards compatibility, no struct with the #[derive(Deserialize, Serialize)] attribute may ever be changed; the only flexibility is adding further variants to enums. That's why there are multiple versions of some structs, as V1 and V2 variants.
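To see why only appending enum variants is safe, here is a std-only sketch that imitates the positional layout bincode 1.x uses by default, as far as I understand it: no field names, fixed-width little-endian integers, and an enum written as a `u32` variant index followed by the variant's fields. The types `KindOld`/`KindNew` and all functions are hypothetical and not part of PineAPPL; the encoding is written by hand so the sketch doesn't depend on the serde and bincode crates.

```rust
use std::convert::TryInto;

// Hypothetical enum as serialized by an older release.
#[allow(dead_code)]
enum KindOld {
    Empty,
    Lagrange { nodes: u32 },
}

// The same enum in a newer release: a variant is appended at the end, so the
// variant indices 0 and 1 keep their meaning.
#[derive(Debug, PartialEq)]
enum KindNew {
    Empty,
    Lagrange { nodes: u32 },
    Ntuple { entries: u32 },
}

// Write the variant index as a little-endian u32, followed by the fields.
fn encode_old(kind: &KindOld) -> Vec<u8> {
    let mut out = Vec::new();
    match kind {
        KindOld::Empty => out.extend_from_slice(&0u32.to_le_bytes()),
        KindOld::Lagrange { nodes } => {
            out.extend_from_slice(&1u32.to_le_bytes());
            out.extend_from_slice(&nodes.to_le_bytes());
        }
    }
    out
}

// Read a little-endian u32 at `pos` and advance `pos`.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let end = *pos + 4;
    let chunk = bytes.get(*pos..end).ok_or("unexpected end of input")?;
    *pos = end;
    Ok(u32::from_le_bytes(chunk.try_into().unwrap()))
}

// The newer reader still decodes everything the older writer produced.
fn decode_new(bytes: &[u8]) -> Result<KindNew, String> {
    let mut pos = 0;
    match read_u32(bytes, &mut pos)? {
        0 => Ok(KindNew::Empty),
        1 => Ok(KindNew::Lagrange { nodes: read_u32(bytes, &mut pos)? }),
        2 => Ok(KindNew::Ntuple { entries: read_u32(bytes, &mut pos)? }),
        tag => Err(format!("unknown variant index {}", tag)),
    }
}

// A struct that gained a second field: reading the 4 bytes written by the old
// version now fails (or, if more data follows, silently misreads it).
fn decode_struct_with_new_field(bytes: &[u8]) -> Result<(u32, u32), String> {
    let mut pos = 0;
    let first = read_u32(bytes, &mut pos)?;
    let second = read_u32(bytes, &mut pos)?;
    Ok((first, second))
}
```

Because nothing in the byte stream says which field or struct version follows, the reader has to know the exact layout in advance; this is what the file version proposed below would provide.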

Requirements inevitably change, and design mistakes have been and will be made. To mention a few examples:

  • The MoreMembers enum was added to support a BinRemapper. This struct essentially supersedes BinLimits, which only supports contiguous one-dimensional distributions (the right limit of each bin is the left limit of the next bin). BinRemapper supports, at least in principle, an arbitrary number of dimensions and also normalizations that are not necessarily tied to bin sizes. Yet another struct, BinInfo, is needed to abstract the differences between the two, as shown in Grid::bin_info.
  • Furthermore, the MoreMembers enum is needed for metadata, which was previously missing. As a result, the methods Grid::key_values and Grid::key_values_mut return an Option, depending on whether the Grid has metadata or not.
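The difference between the two bin representations can be illustrated with heavily simplified, hypothetical versions of the types involved (the real BinLimits, BinRemapper and BinInfo have different members and methods). The sketch shows that contiguous one-dimensional limits only need n + 1 edges for n bins, whereas a remapper stores limits per bin and dimension together with explicit normalizations, and how an enum over references can hide that difference.

```rust
/// Hypothetical stand-in for `BinLimits`: one dimension, contiguous bins,
/// stored as n + 1 edges for n bins (at least one edge is assumed).
struct BinLimits {
    edges: Vec<f64>,
}

/// Hypothetical stand-in for `BinRemapper`: every bin has its own
/// (left, right) pair in each dimension and an explicit normalization.
struct BinRemapper {
    dimensions: usize,
    limits: Vec<(f64, f64)>, // bins * dimensions entries, bin-major
    normalizations: Vec<f64>,
}

/// Abstraction over both representations, in the spirit of `BinInfo`.
enum BinInfo<'a> {
    Limits(&'a BinLimits),
    Remapper(&'a BinRemapper),
}

impl<'a> BinInfo<'a> {
    fn bins(&self) -> usize {
        match self {
            BinInfo::Limits(l) => l.edges.len() - 1,
            BinInfo::Remapper(r) => r.normalizations.len(),
        }
    }

    fn dimensions(&self) -> usize {
        match self {
            BinInfo::Limits(_) => 1,
            BinInfo::Remapper(r) => r.dimensions,
        }
    }

    /// Limits of `bin` as one (left, right) pair per dimension.
    fn bin_limits(&self, bin: usize) -> Vec<(f64, f64)> {
        match self {
            // contiguous: the right limit of `bin` is the left limit of `bin + 1`
            BinInfo::Limits(l) => vec![(l.edges[bin], l.edges[bin + 1])],
            BinInfo::Remapper(r) => {
                r.limits[bin * r.dimensions..(bin + 1) * r.dimensions].to_vec()
            }
        }
    }

    /// For `BinLimits` the normalization is the bin width; the remapper
    /// stores arbitrary normalizations.
    fn normalizations(&self) -> Vec<f64> {
        match self {
            BinInfo::Limits(l) => l.edges.windows(2).map(|w| w[1] - w[0]).collect(),
            BinInfo::Remapper(r) => r.normalizations.clone(),
        }
    }
}
```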

Planned changes

To make file handling more flexible and to support different designs without sacrificing backwards compatibility, we need to implement a few changes:

  1. We need a file header and a file version. The file header precedes the remaining data and can be as simple as the byte string ['P', 'i', 'n', 'e', 'A', 'P', 'P', 'L']. This is needed to let Grid::read detect whether a grid can be deserialized immediately or must first be decompressed. The file version, on the other hand, lets us determine exactly how the read is performed.
  2. Depending on the file version, the read method of the corresponding struct is called, followed by upgrade, which converts the grid from that specific version to the latest one. The upgrade method must also be offered by the CLI so that one can batch-convert grids to the newest version.
  3. At some point we might have different versions of the Grid struct in the crate, possibly as pineappl::grid::v0::Grid, pineappl::grid::v1::Grid and so forth, and a type definition pineappl::grid::Grid for the most recent version.
  4. As soon as a new file version is released, all previous file versions should be considered deprecated, and at some point older versions can be removed. Backwards compatibility is ensured by the fact that crates.io always has all versions of the CLI, which we can use to upgrade grids in a bootstrapping fashion (install the latest version that still supports the file format that needs to be upgraded, upgrade to the most recent version it supports, etc.). This could and should probably be automated.
  5. To make this work, the supported file versions need to be documented, ideally in the upgrade subcommand of the CLI itself as error messages (something along the lines of error: tried to upgrade grid with file version 0. You need pineappl 0.5.0 to upgrade this version).
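Steps 1, 2 and 5 could be sketched as follows. The magic bytes are the ones proposed above; storing the file version as eight little-endian bytes right after them, the constants OLDEST_SUPPORTED and LATEST_VERSION, and all function names are assumptions for illustration. The release returned for file version 0 simply mirrors the example error message.

```rust
use std::convert::TryInto;

/// Magic bytes proposed in the issue; they precede the remaining data.
const MAGIC: &[u8; 8] = b"PineAPPL";

/// Hypothetical range of file versions this build can read.
const OLDEST_SUPPORTED: u64 = 1;
const LATEST_VERSION: u64 = 1;

#[derive(Debug, PartialEq)]
enum Detected<'a> {
    /// Header present: the file version and the bytes that follow it.
    Versioned { version: u64, payload: &'a [u8] },
    /// No header: the caller should try to decompress and detect again.
    NoHeader,
}

/// Step 1: check for the magic bytes and extract the file version.
fn detect<'a>(bytes: &'a [u8]) -> Result<Detected<'a>, String> {
    if !bytes.starts_with(MAGIC) {
        return Ok(Detected::NoHeader);
    }
    let raw = bytes.get(8..16).ok_or("truncated file header")?;
    let version = u64::from_le_bytes(raw.try_into().unwrap());
    Ok(Detected::Versioned { version, payload: &bytes[16..] })
}

/// Hypothetical lookup of the last release that can still read `version`.
fn release_for(version: u64) -> Option<&'static str> {
    match version {
        0 => Some("0.5.0"), // the example from step 5
        _ => None,
    }
}

/// Steps 2 and 5: accept readable versions, otherwise explain what to do.
fn check_version(version: u64) -> Result<(), String> {
    if version > LATEST_VERSION {
        Err(format!("error: file version {} is newer than this pineappl supports", version))
    } else if version >= OLDEST_SUPPORTED {
        Ok(()) // here the matching `read` and then `upgrade` would be called
    } else if let Some(release) = release_for(version) {
        Err(format!(
            "error: tried to upgrade grid with file version {}. You need pineappl {} to upgrade this version",
            version, release
        ))
    } else {
        Err(format!("error: unsupported file version {}", version))
    }
}
```

Returning `NoHeader` instead of an error keeps decompression a separate concern: the caller can decompress and call `detect` again on the result.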
@cschwan cschwan added the enhancement New feature or request label Oct 29, 2021
@cschwan cschwan added this to the v0.6.0 milestone Oct 29, 2021
@cschwan cschwan self-assigned this Oct 29, 2021
@cschwan
Contributor Author

cschwan commented Oct 29, 2021

As a first step, we might simply consider every version released so far as file version 0. Starting with v0.6.0, we should explicitly write the file version.
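A minimal writer/reader pair for this transition might look like this. It assumes the header layout sketched for step 1 (magic bytes followed by the version as eight little-endian bytes, itself an assumption) and that the bytes have already been decompressed; anything without the header is treated as file version 0. `write_grid` and `file_version` are hypothetical names.

```rust
use std::convert::TryInto;
use std::io::{self, Write};

const MAGIC: &[u8; 8] = b"PineAPPL";

/// From v0.6.0 on: write the magic bytes, the file version, then the payload.
fn write_grid<W: Write>(mut writer: W, version: u64, payload: &[u8]) -> io::Result<()> {
    writer.write_all(MAGIC)?;
    writer.write_all(&version.to_le_bytes())?;
    writer.write_all(payload)
}

/// Grids written before v0.6.0 carry no header and count as file version 0.
/// `bytes` must already be decompressed.
fn file_version(bytes: &[u8]) -> u64 {
    match bytes.get(8..16) {
        Some(raw) if bytes.starts_with(MAGIC) => u64::from_le_bytes(raw.try_into().unwrap()),
        _ => 0,
    }
}
```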

@cschwan
Contributor Author

cschwan commented Oct 29, 2021

Commit 3b88b4c adds the upgrade subcommand.

@alecandido
Member

> 4. As soon as a new file version is released all previous file versions should be considered deprecated, and at some point older versions can be removed. Backwards compatibility is ensured by the fact that crates.io always has all versions of the CLI, which we can use to upgrade grids in bootstrap kind of way (install the latest version that still supports the file format that needs to be upgraded, upgrade to most recent version supported, etc.). This could and should probably be automated.

You could consider declaring that a fixed number of older versions is always supported.
E.g. you could support just one older version, which can be upgraded to the newer one. Then, if there are grids that survive multiple releases without being updated, the user can always download the intermediate releases and do the upgrades one by one.

If occasionally it isn't overly complicated to maintain several older versions, you can do so, but it's not strictly required.
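Automating this bootstrapping, including a fixed support window like the one suggested here, could start by computing which releases to install. In this sketch the `Release` table is made up (names and version numbers are placeholders, not real PineAPPL releases); each step picks the newest release that can still read the current file version and upgrades as far as that release goes.

```rust
/// Hypothetical description of one CLI release: the oldest file version it
/// can read and the file version it writes after upgrading.
struct Release {
    name: &'static str,
    reads_from: u32,
    writes: u32,
}

/// Releases to run `upgrade` with, in order, to bring a grid from `version`
/// to the newest file version in `releases`.
fn upgrade_path(releases: &[Release], mut version: u32) -> Result<Vec<&'static str>, String> {
    let latest = releases.iter().map(|r| r.writes).max().ok_or("no releases given")?;
    let mut path = Vec::new();
    while version < latest {
        // among the releases that can read `version` and make progress,
        // take the one that writes the newest file version
        let step = releases
            .iter()
            .filter(|r| r.reads_from <= version && version < r.writes)
            .max_by_key(|r| r.writes)
            .ok_or_else(|| format!("no release can read file version {}", version))?;
        path.push(step.name);
        version = step.writes;
    }
    Ok(path)
}
```

With a window of exactly one older version, the path visits every intermediate release; supporting more versions per release simply lets `max_by_key` skip some of them.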

@cschwan
Contributor Author

cschwan commented Oct 29, 2021


Yes, that's what I meant by 'bootstrapping'. It's a well-known problem for GCC, which needs a C++ compiler to build 😄.

@alecandido
Member

alecandido commented Oct 29, 2021

You're definitely right, then just keep going :)

@cschwan
Contributor Author

cschwan commented Nov 3, 2021

Preliminary code to support file format changes is in commit d9897bc. If you find yourself unable to read new grids, make sure the CAPI/CLI/Python API is up to date.

@cschwan
Contributor Author

cschwan commented Jan 8, 2022

Here's what I'd like to change in a newer version:

  • merge the types BinInfo and BinRemapper into BinLimits. The reason for keeping them separate is purely historical (see above), and merging them will make parts of the code much simpler
  • merge Mmv3 into Grid, which means that metadata will always be present, making metadata-related code much shorter
  • change the Order's members from u32 to u8 and add another member to support #98 (Add support for additional couplings)
  • remove most of the Subgrid types and keep only the ones we use
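The effect of the second item can be illustrated with simplified, hypothetical types (the real Grid has many more members; `GridOld`, `GridNew` and the `title_*` helpers are made-up names): as long as metadata lives in an optional extension, readers must handle `None` and writers must create the extension first, and both disappear once it is a plain member.

```rust
use std::collections::HashMap;

/// Before: metadata sits in an optional extension member.
struct GridOld {
    more_members: Option<HashMap<String, String>>,
}

impl GridOld {
    fn key_values(&self) -> Option<&HashMap<String, String>> {
        self.more_members.as_ref()
    }

    fn set_key_value(&mut self, key: &str, value: &str) {
        // the extension has to be created first if it does not exist yet
        self.more_members
            .get_or_insert_with(HashMap::new)
            .insert(key.to_owned(), value.to_owned());
    }
}

/// After: metadata is a plain member and always present (possibly empty).
struct GridNew {
    metadata: HashMap<String, String>,
}

impl GridNew {
    fn key_values(&self) -> &HashMap<String, String> {
        &self.metadata
    }

    fn set_key_value(&mut self, key: &str, value: &str) {
        self.metadata.insert(key.to_owned(), value.to_owned());
    }
}

/// Reading one entry: the old version needs an extra `?` for the extension.
fn title_old(grid: &GridOld) -> Option<&str> {
    grid.key_values()?.get("title").map(String::as_str)
}

fn title_new(grid: &GridNew) -> Option<&str> {
    grid.key_values().get("title").map(String::as_str)
}
```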

@alecandido
Member

Just to get an idea: can you tell which subgrid types we're still using, and where?
I know a couple of them (maybe a bit more), but I'm definitely puzzled about the others...

@cschwan
Contributor Author

cschwan commented Jan 8, 2022

Here's an overview:

  • EmptySubgridV1: keep, this is needed to optimize empty grids
  • ImportOnlySubgridV1: remove
  • ImportOnlySubgridV2: keep, is more general than ImportOnlySubgridV1, supports grids where the factorization scale is different from the renormalization scale
  • LagrangeSubgridV1: remove
  • LagrangeSubgridV2: keep, is more general than LagrangeSubgridV1, supports DIS
  • LagrangeSparseSubgridV1: remove, was never used; was supposed to give a better memory footprint than LagrangeSubgridV{1,2} while filling the grid with a MC, but that was never a problem with Madgraph5
  • NtupleSubgridV1: remove, this saves N-tuples so that there's no interpolation error, but its space requirements make it impractical

@cschwan
Contributor Author

cschwan commented Feb 19, 2022

I'm closing this, as the original proposal is supported in v0.5.0. Further development should be discussed in #118.

@cschwan cschwan closed this as completed Feb 19, 2022

2 participants