Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode unpadded length of payload #375

Closed
wants to merge 3 commits into from
Closed

Conversation

nothingmuch
Copy link
Collaborator

@nothingmuch nothingmuch commented Oct 23, 2024

Although BIP 174 PSBTs are self terminating [edit: actually they aren't, see below nvm keylen=0 is syntactically invalid, not just semantically c.f. https://github.com/bitcoin/bips/blob/master/bip-0174.mediawiki#cite_note-1], storing the length in the encrypted payload avoid any behavioral dependency on a PSBT parser ignoring any trailing data (the NULL byte padding).

This is encoded as a BOLT 1 TLV1 record with type 0 (and therefore also a valid TLV stream), facilitating forward compatibility with BIP 77 extensions that might not necessarily signal receiver capabilities in the BIP 21 URI. This implementation si,mply ignores any trailing data which would contain any subsequent TLV records with a type larger than 0.

Since payloads are at most 7168 bytes (including the overhead), the unpadded plaintext length will always fit in 16 bits. For very small payloads this value is possibly less than 253 (0xfd), resulting in an 8 bit length.

Footnotes

  1. https://github.com/lightning/bolts/blob/master/01-messaging.md#type-length-value-format

@nothingmuch nothingmuch force-pushed the TLV branch 3 times, most recently from e1b17fe to ed0555c Compare October 23, 2024 20:27
@DanGould
Copy link
Contributor

In another chat, you presented the problem this solves to me as follows:

BIP 174 specifies that the last element, the last kvp in an output map, is terminated by a 0x00
without parsing the PSBT, the unpadded length that includes this terminator can't really be determined

Encrypt message a with plaintext "....\x00" and "...\x00\x00" result in the same ciphertext

and

For implementers, if their psbt parser rejects instead of ignoring trailing nulls, they would currently need to find the end by dropping the additional trailing nulls after the first one

Do we know of any PSBT parsing implementations that suffer this behavior that does not follow the specification? i.e. rejecting instead of ignoring trailing nulls

I'm going to suggest simpler ways to solve this. Potentially stupid:

  • determine the unpadded payload length by reading bytes from the end of the buffer to the start until a non-0x00 byte is found. Yes, this depends on a PSBT being the last element of the plaintext payload, but is that a real limitation in the v2 payload?
  • Should the plaintext payload just be versioned to allow us to depend on behavior like e.g. "PSBT is the last element of the payload"

Help me understand whether or not adding the complexity of BOLT 1 TLV encoding is a strict necessity or if it's just a nice-to-have to prevent a theoretical problem

@nothingmuch
Copy link
Collaborator Author

nothingmuch commented Oct 24, 2024

Do we know of any PSBT parsing implementations that suffer this behavior that does not follow the specification? i.e. rejecting instead of ignoring trailing nulls

BIP 174 (and 370) doesn't say anything about ignoring trailing null bytes, only to ignore (but preserve) unknown fields.

Strictly speaking NULL bytes are interpretable, since 4 null bytes are a valid key value pair in BIP 174 consisting of type 0 and empty key and value, then if there's 4n trailing bytes this adds n keypairs, with duplicate (empty) keys, so that's in violation of the spec.

If there's a number of trailing bytes that isn't a multiple of 4, then i'm not sure what happens.

  • determine the unpadded payload length by reading bytes from the end of the buffer to the start until a non-0x00 byte is found. Yes, this depends on a PSBT being the last element of the plaintext payload, but is that a real limitation in the v2 payload?

I don't think this is simpler compared to encoding the length, there are edge cases. For example, if the last keypair in the output map has a value that ends in a null byte, there would need to be two null bytes at the end of the PSBT or it'll be truncated.

  • Should the plaintext payload just be versioned to allow us to depend on behavior like e.g. "PSBT is the last element of the payload"

That's the point of making this a valid TLV, the type can be something other than 0,

Help me understand whether or not adding the complexity of BOLT 1 TLV encoding is a strict necessity or if it's just a nice-to-have to prevent a theoretical problem

The specification I'm suggesting in BIP 77 is: The first byte must be 0, followed by a BOLT-1 BigSize encoded length, followed by that many bytes. Trailing bytes must be ignored

So not actually a TLV stream as far as BIP 77 is concerned, just forwards compatible with TLV stream.

The simpler approach is to specify that it's always prefixed with a signed 16 bit value which must be positive (if it's negative then that's an incompatible version, i.e. top bit is reserved as a type bit effectively).

I prefer the TLV approach, either BOLT-1 or the <key> encoding in BIP 174 (only difference is endianness, I chose big endian/BOLT-1 since it's more clearly documented and I thought I saw other big-endian numerical value encoding but I must be confusing with some other codebase)

@DanGould
Copy link
Contributor

Do we know of any PSBT parsing implementations that suffer this behavior that does not follow the specification? i.e. rejecting instead of ignoring trailing nulls

The answer appears to be no, and it's just a strict interpretation of the BIP174 spec that's caught your attention.

I'm inclined to close this wontfix since it seems like client PSBT parsers could address this on their own.

Have you been bitten by a problem like this before or was it just a careful reading of the spec that brought it up?

payjoin/src/hpke.rs Outdated Show resolved Hide resolved
payjoin/src/hpke.rs Outdated Show resolved Hide resolved
@DanGould
Copy link
Contributor

DanGould commented Oct 24, 2024

@nothingmuch help me understand

If bitcoin-hpke was modified to retain the underlying in-place interface
then this code could be further simplified so that there is only one
PADDED_MESSAGE_BYTES length buffer shared by all steps, which would also
save a copy step.

Can you add more color to what this modification would look like? Are you talking about serializing the pubkey in certain message types by default so that we wouldn't need to implement that in encrypt_message_a?

We've already got a bitcoin-hpke update in this next release, so the time for breaking changes and updating deps is now

payjoin/src/hpke.rs Outdated Show resolved Hide resolved
@nothingmuch
Copy link
Collaborator Author

Can you add more color to what this modification would look like? Are you talking about serializing the pubkey in certain message types by default so that we wouldn't need to implement that in encrypt_message_a?

No, simpler: In both encrypt_message_{a,b}, first allocate a [u8; PADDED_MESSAGE_BYTES] (s the code in this PR does), then borrow mutable slice for the ciphertext + authentication tag, write the plaintext into that, and then do AEAD encryption inplace instead of returning a separate Vec<u8> that is then copied into the target Vec<u8>.

The only motivation is that it makes the bounds checking a little easier to reason about IMO. From a performance standpoint this is a silly micro-optimization, so not a justification by any means.

FWIW, if you're cACK on length tagging and I should improve this PR, I also just learned that Cursor usage is unnecessary for writing, since &mut [u8] implements the Write trait.


For the public record, I made two mistakes in this PR that Dan pointed out in chat:

  1. PSBTs are self terminating after all, since empty keys are syntactically invalid, I missed that and mistakenly concluded those would need to be rejected as duplicate keys
  2. Null termination and binary safety are irrelevant (I forgot the additional layer of encoding, so the plaintext can be treated as a NULL terminated text string).

Therefore the only remaining justifications are:

  1. making x = decrypt(encrypt(x)), general/vague arguments for length encoded strings as opposed to terminated ones (but even in a memory unsafe language such as C there still would be a fixed maximum length, so arguably this is not a memory safety issue)

  2. forward compatibility with BIP 77 extensions that do not signal the extension
    explicitly in the BIP 21 URI. this probably doesn't affect the multiparty transaction scenario since that is harder to make backwards compatible and would use explicit signalling and domain separation anyway.

Although BIP 174 PSBTs are self terminating, storing the length in the
encrypted payload avoid any behavioral dependency on a PSBT parser
ignoring any trailing data (the NULL byte padding).

This is encoded as a BOLT 1 TLV[^1] record with type 0 (and therefore
also a valid TLV stream), facilitating forward compatibility with BIP 77
extensions that might not necessarily signal receiver capabilities in
the BIP 21 URI. This implementation si,mply ignores any trailing data
which would contain any subsequent TLV records with a type larger than
0.

Since payloads are at most 7168 bytes (including the overhead), the
unpadded plaintext length will always fit in 16 bits. For very small
payloads this value is possibly less than 253 (0xfd), resulting in an 8
bit length.

[^1]: https://github.com/lightning/bolts/blob/master/01-messaging.md#type-length-value-format
@nothingmuch nothingmuch force-pushed the TLV branch 2 times, most recently from 35c7d50 to c3d58ac Compare October 24, 2024 21:16
@nothingmuch
Copy link
Collaborator Author

rebased to unbreak the newly merged test

This change enforces that messages have a uniform length at the type
level.

If bitcoin-hpke was modified to retain the underlying in-place interface
then this code could be further simplified so that there is only one
PADDED_MESSAGE_BYTES length buffer shared by all steps, which would also
save a copy step.
- constants instead of magic numbers
- `&mut [u8]` implements Write, no need for Cursor
Copy link
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only issue is the unused PADDED_MESSAGE_BYTES import

Comment on lines +36 to +37
#[cfg(feature = "v2")]
use crate::hpke::PADDED_MESSAGE_BYTES;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears not to be used anywhere

@nothingmuch
Copy link
Collaborator Author

Closing this since we agreed that this change is not necessary, will followup with BIP 77 PRs to clarify why it isn't necessary after finishing some other stuff.

Are the changes from Vec<u8> to &[u8; <constant>] still desirable on parameters and return types? I would be happy to redo only that as a separate PR that does not change functionality after I finish the higher priority stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants