Add tar/asm.IterateHeaders #71

Merged 1 commit on Sep 26, 2024

mtrmac (Contributor) commented Sep 11, 2024

This allows reading the metadata contained in tar-split without expensively recreating the whole tar stream including full contents.

We have two use cases for this:

  • In a situation where tar-split is distributed along with a separate metadata stream, ensuring that the two are exactly consistent
  • Reading the tar headers allows a cheap consistency check of on-disk layers: verifying that the files exist with the expected sizes, without reading their full contents.

This can be implemented outside of this repo, but it's not ideal:

  • The function necessarily hard-codes some assumptions about how tar-split determines the boundaries of SegmentType/FileType entries (or, indeed, whether it uses FileType entries at all). That's best maintained directly beside the code that creates this.
  • The ExpectedPadding() value is not currently exported, so the consumer would have to heuristically guess where the padding ends.
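
For context, tar stores file contents in 512-byte blocks, so in the simple case the padding after a file follows from its size. The sketch below shows only that arithmetic; `paddingFor` is a hypothetical name, not the library's unexported function, and the value tar-split tracks internally may cover cases this does not:

```go
package main

import "fmt"

// blockSize is the tar block size: data is stored in 512-byte blocks.
const blockSize = 512

// paddingFor is a hypothetical helper: it returns the number of zero bytes
// that follow size bytes of file content to reach the next block boundary.
func paddingFor(size int64) int64 {
	return (blockSize - size%blockSize) % blockSize
}

func main() {
	for _, size := range []int64{0, 1, 512, 513} {
		fmt.Printf("size=%d padding=%d\n", size, paddingFor(size))
	}
}
```

A consumer outside the repo would have to re-derive this and hope it matches how tar-split actually split the stream, which is the fragility the bullet above describes.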

Cc: @kwilczynski

Signed-off-by: Miloslav Trmač <[email protected]>
@kwilczynski

@mtrmac, this is very nice! Thank you for exposing this!

There was a use case I had where exposing FileType and Payload (so the CRC64) would be useful. But I don't know if this is something we would like to do and what the complexity of this would be.

@kwilczynski

/approve
/lgtm

vbatts (Owner) left a comment

Interesting use-case. Thanks for that.

@vbatts vbatts merged commit 93a41cf into vbatts:main Sep 26, 2024
5 checks passed
vbatts (Owner) commented Sep 27, 2024

and i've tagged release v0.11.6

mtrmac (Contributor, Author) commented Sep 27, 2024

@vbatts Thanks!

@mtrmac mtrmac deleted the iterate branch September 27, 2024 17:39
mtrmac (Contributor, Author) commented Sep 27, 2024

> There was a use case I had where exposing FileType and Payload (so the CRC64) would be useful. But I don't know if this is something we would like to do and what the complexity of this would be.

The current code simply ignores FileType, but it already assumes that there is only one tar header per SegmentType; so changing that to expect the two to interleave regularly, and to collect the other data, seems easy enough to do.
