Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String encoding for return values #185

Merged
merged 10 commits into from
Nov 25, 2024
Merged

String encoding for return values #185

merged 10 commits into from
Nov 25, 2024

Conversation

virgil-serbanuta
Copy link
Member

No description provided.

@virgil-serbanuta virgil-serbanuta marked this pull request as ready for review November 11, 2024 19:15
@yanliu18
Copy link
Member

It looks good to me. But It see the changes different than @Robertorosmaninho 's latest changes to the ulm-virgil branch.
I might request Roberto to take a look at this PR too.

Important reminder: Do not touch or make changes to ulm-virgil branch unless approved by @Robertorosmaninho.

@virgil-serbanuta virgil-serbanuta requested review from Robertorosmaninho and sskeirik and removed request for ACassimiro November 12, 2024 19:49
Copy link

@sskeirik sskeirik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me in general.

I left a few comments about names/code clarity --- I didn't find any bugs --- though given that this was my first read through, I may not have understood everything.

If nobody else chimes in with issues, I will approve.

Comment on lines 27 to 28
// assumes that bufferId points to an empty buffer.
syntax NonEmptyStatementsOrError ::= encodeStatements(bufferId: Identifier, values: EncodeValues) [function, total]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function confused me when I first saw it --- I interpreted encodeStatements to be a function that consumes a list of Statements and encodes them (a similar remark applies to #encodeStatements).

My first suggestion is codegenValuesEncoder; happy to hear other thoughts/suggestions as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +20 to +22
if prefix_size != 32_u256 {
fail();
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this prefix is supposed to be an integer equal to the length of the string. If so, is that a property worth validating?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more complicated. See this for more details: https://docs.soliditylang.org/en/latest/abi-spec.html but, as I understand it, the summary is something like this:

First, values are encoded in two bytes objects (called the "prefix" and the "suffix"), which are concatenated at the end to produce the full encoding. Fixed length values (such as int256) are encoded fully in the prefix (their suffix part is empty). Variable length values (such as strings) have both prefix and suffix parts. The prefix part is equal to the-final-length-of-the-prefix + the-length-of-the-suffix-before-the-current-value (I assume that this makes it easy to find the start of the suffix part of the value). The suffix part contains the length of the value followed by the value bytes.

So, prefix_size here is the prefix encoding of the string, which, for a single encoded string, should be the total length of the prefix. It can only be 32, i.e. the number of bytes needed for any fixed size int with at most 256 bytes.

The :: bytes_hooks :: decode_str call below will receive the suffix, and it will read the length, then it will extract that many bytes, and so on.

@@ -34,7 +34,13 @@ module ULM-PREPROCESSING-EVENTS
// but the last one are indexed. We should handle generic events.
=> #ulmPreprocessEvent
( Method
, appendParamToBytes(data, last(Param, Params))
, encodeStatements
( data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more descriptive variable name than data would help readability slightly, in my opinion.

My first suggestion is logPayload; that's just my preference though, I am happy to hear other suggestions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is log_buffer better?

Comment on lines +18 to +20
b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x20" +
b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05" +
b"Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
Copy link

@sskeirik sskeirik Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding of how encoded values work, this seems like it has an extra value in it?

I expect that the endpoint returns a string, where a str is encoded as follows:

| 32: str.len() | 32x: str payload |

where:

  1. the notation | n: payload | describes a byte field of length n with the given payload
  2. x is chosen to be the smallest value of x such that 32x > str.len().

I don't understand where the initial value x20 is coming from.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the answer here: #185 (comment) , but a single string is encoded like this (when part of a request, response, log entry, and probably other stuff):

| 32: prefix-length | 32: str.len() | 32x: str payload |
|-------prefix------|-------------suffix---------------|

@@ -178,6 +179,23 @@ module ULM-TEST-EXECUTION

rule (ulmCancel ~> expect_cancel) => .K

syntax Bytes ::= concat(BytesList) [function, total]
rule concat(.BytesList) => b""
rule concat(B:Bytes + Bs:BytesList) => B +Bytes concat(Bs)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nitpick, so feel free take it or leave it 😄

To me, + always stands for a verb. When you see it, it means that a function (or perhaps machine instruction) is being invoked --- whereas here, + is used as a data constructor, describing the shape of a value.

Given that feeling, my preference is for a generic infix operator, e.g. ||| or $ or #, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not look that good there, but it may make the tests look reasonable, IMO, see this example (I'm not sure if it was obvious where this list is being used):
https://github.com/Pi-Squared-Inc/rust-demo-semantics/pull/185/files#diff-bee3b842fdd29b8f22d8f3ab54cd51ab3d687b3a0a16e848ef877c5c4c20305fR18-R20

So, if you think that "$" would be better both here and in the tests, I'll change it.

@yanliu18
Copy link
Member

It looks good to me in general.

I left a few comments about names/code clarity --- I didn't find any bugs --- though given that this was my first read through, I may not have understood everything.

If nobody else chimes in with issues, I will approve.

Hi @sskeirik your review comments are great.
I did went through the code changes yesterday and is OK to approve it.
But there is one more step I need to do which is bring up the tests in BYOL and SWAP demo to make sure the changes work with front-end.
However, I do not have the SWAP demo testing script.

So, please go ahead of approving it. I will create an issue of testing those demos (also need to confirm with @yzhang90 if it is still required to work with the front end demo) in future (if I am not doing it, I shall leave instructions of how it can be done).

And thank you @virgil-serbanuta .

@virgil-serbanuta
Copy link
Member Author

[...] But there is one more step I need to do which is bring up the tests in BYOL and SWAP demo to make sure the changes work with front-end. However, I do not have the SWAP demo testing script.

I think that the byol demo works only on the ulm-virgil branch, and there are some things in that branch which are not in main yet (I'm working on it). If you want to test this specific change, I should try to rebase ulm-virgil on top of this (someone else may also do this, but there will be merge/rebase conflicts).

@yanliu18
Copy link
Member

[...] But there is one more step I need to do which is bring up the tests in BYOL and SWAP demo to make sure the changes work with front-end. However, I do not have the SWAP demo testing script.

I think that the byol demo works only on the ulm-virgil branch, and there are some things in that branch which are not in main yet (I'm working on it). If you want to test this specific change, I should try to rebase ulm-virgil on top of this (someone else may also do this, but there will be merge/rebase conflicts).

No worries, Virgil.
There have been a lot of shortcuts we made for the demo, similar to other semantics.
I think Yi is working on plans to deal with those shortcuts and decide which way we should go.
But integration testing is needed, @yzhang90 has created an issue here. There is no further changes required from you at this stage. But will let you know when integration testing strategy is discussed.

@virgil-serbanuta virgil-serbanuta merged commit 4b974ad into main Nov 25, 2024
3 checks passed
@virgil-serbanuta virgil-serbanuta deleted the str-encoding branch November 25, 2024 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants