-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String encoding for return values #185
Conversation
It looks good to me. But It see the changes different than @Robertorosmaninho 's latest changes to the Important reminder: Do not touch or make changes to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me in general.
I left a few comments about names/code clarity --- I didn't find any bugs --- though given that this was my first read through, I may not have understood everything.
If nobody else chimes in with issues, I will approve.
// assumes that bufferId points to an empty buffer. | ||
syntax NonEmptyStatementsOrError ::= encodeStatements(bufferId: Identifier, values: EncodeValues) [function, total] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this function confused me when I first saw it --- I interpreted encodeStatements
to be a function that consumes a list of Statement
s and encodes them (a similar remark applies to #encodeStatements
).
My first suggestion is codegenValuesEncoder
; happy to hear other thoughts/suggestions as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
if prefix_size != 32_u256 { | ||
fail(); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this prefix is supposed to be an integer equal to the length of the string. If so, is that a property worth validating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more complicated. See this for more details: https://docs.soliditylang.org/en/latest/abi-spec.html but, as I understand it, the summary is something like this:
First, values are encoded in two bytes objects (called the "prefix" and the "suffix"), which are concatenated at the end to produce the full encoding. Fixed length values (such as int256) are encoded fully in the prefix (their suffix part is empty). Variable length values (such as strings) have both prefix and suffix parts. The prefix part is equal to the-final-length-of-the-prefix + the-length-of-the-suffix-before-the-current-value
(I assume that this makes it easy to find the start of the suffix part of the value). The suffix part contains the length of the value followed by the value bytes.
So, prefix_size
here is the prefix encoding of the string, which, for a single encoded string, should be the total length of the prefix. It can only be 32
, i.e. the number of bytes needed for any fixed size int with at most 256 bytes.
The :: bytes_hooks :: decode_str
call below will receive the suffix, and it will read the length, then it will extract that many bytes, and so on.
@@ -34,7 +34,13 @@ module ULM-PREPROCESSING-EVENTS | |||
// but the last one are indexed. We should handle generic events. | |||
=> #ulmPreprocessEvent | |||
( Method | |||
, appendParamToBytes(data, last(Param, Params)) | |||
, encodeStatements | |||
( data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more descriptive variable name than data
would help readability slightly, in my opinion.
My first suggestion is logPayload
; that's just my preference though, I am happy to hear other suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is log_buffer
better?
b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x20" + | ||
b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05" + | ||
b"Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding of how encoded values work, this seems like it has an extra value in it?
I expect that the endpoint returns a string, where a str is encoded as follows:
| 32: str.len() | 32x: str payload |
where:
- the notation
| n: payload |
describes a byte field of lengthn
with the given payload x
is chosen to be the smallest value ofx
such that32x > str.len()
.
I don't understand where the initial value x20 is coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the answer here: #185 (comment) , but a single string is encoded like this (when part of a request, response, log entry, and probably other stuff):
| 32: prefix-length | 32: str.len() | 32x: str payload |
|-------prefix------|-------------suffix---------------|
@@ -178,6 +179,23 @@ module ULM-TEST-EXECUTION | |||
|
|||
rule (ulmCancel ~> expect_cancel) => .K | |||
|
|||
syntax Bytes ::= concat(BytesList) [function, total] | |||
rule concat(.BytesList) => b"" | |||
rule concat(B:Bytes + Bs:BytesList) => B +Bytes concat(Bs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nitpick, so feel free take it or leave it 😄
To me, +
always stands for a verb. When you see it, it means that a function (or perhaps machine instruction) is being invoked --- whereas here, +
is used as a data constructor, describing the shape of a value.
Given that feeling, my preference is for a generic infix operator, e.g. |||
or $
or #
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not look that good there, but it may make the tests look reasonable, IMO, see this example (I'm not sure if it was obvious where this list is being used):
https://github.com/Pi-Squared-Inc/rust-demo-semantics/pull/185/files#diff-bee3b842fdd29b8f22d8f3ab54cd51ab3d687b3a0a16e848ef877c5c4c20305fR18-R20
So, if you think that "$" would be better both here and in the tests, I'll change it.
Hi @sskeirik your review comments are great. So, please go ahead of approving it. I will create an issue of testing those demos (also need to confirm with @yzhang90 if it is still required to work with the front end demo) in future (if I am not doing it, I shall leave instructions of how it can be done). And thank you @virgil-serbanuta . |
I think that the byol demo works only on the ulm-virgil branch, and there are some things in that branch which are not in |
b0ffa7f
to
81535c2
Compare
No worries, Virgil. |
10f243a
to
d40e15d
Compare
No description provided.