String encoding for return values #185

virgil-serbanuta · 2024-11-08T22:42:20Z

No description provided.

yanliu18 · 2024-11-12T03:54:44Z

It looks good to me. But I see that the changes differ from @Robertorosmaninho's latest changes to the ulm-virgil branch.
I may ask Roberto to take a look at this PR too.

Important reminder: Do not touch or make changes to ulm-virgil branch unless approved by @Robertorosmaninho.

ulm-semantics/main/preprocessing/endpoints.md

ulm-semantics/main/hooks/bytes.md

sskeirik

It looks good to me in general.

I left a few comments about names/code clarity --- I didn't find any bugs --- though given that this was my first read through, I may not have understood everything.

If nobody else chimes in with issues, I will approve.

sskeirik · 2024-11-19T20:09:38Z

ulm-semantics/main/encoding/syntax.md

+    // assumes that bufferId points to an empty buffer.
+    syntax NonEmptyStatementsOrError ::= encodeStatements(bufferId: Identifier, values: EncodeValues)  [function, total]


The name of this function confused me when I first saw it --- I interpreted encodeStatements to be a function that consumes a list of Statements and encodes them (a similar remark applies to #encodeStatements).

My first suggestion is codegenValuesEncoder; happy to hear other thoughts/suggestions as well.

sskeirik · 2024-11-19T21:39:32Z

tests/ulm-contracts/test_helpers.rs

+  if prefix_size != 32_u256 {
+      fail();
+  };


If I understand correctly, this prefix is supposed to be an integer equal to the length of the string. If so, is that a property worth validating?

It's more complicated. See this for more details: https://docs.soliditylang.org/en/latest/abi-spec.html but, as I understand it, the summary is something like this:

First, values are encoded in two bytes objects (called the "prefix" and the "suffix"), which are concatenated at the end to produce the full encoding. Fixed length values (such as int256) are encoded fully in the prefix (their suffix part is empty). Variable length values (such as strings) have both prefix and suffix parts. The prefix part is equal to the-final-length-of-the-prefix + the-length-of-the-suffix-before-the-current-value (I assume that this makes it easy to find the start of the suffix part of the value). The suffix part contains the length of the value followed by the value bytes.

So, prefix_size here is the prefix encoding of the string, which, for a single encoded string, should be the total length of the prefix. It can only be 32, i.e. the number of bytes needed for any fixed-size int with at most 256 bits.

The ::bytes_hooks::decode_str call below will receive the suffix: it reads the length, then extracts that many bytes, and so on.
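A minimal sketch of the prefix/suffix layout described above, for the single-string case only. The names `word` and `encode_single_string` are hypothetical and are not part of this PR; the sketch assumes lengths fit in 64 bits and follows the Solidity ABI rule that the payload is zero-padded up to a multiple of 32 bytes.

```rust
/// Writes `n` as a 32-byte big-endian word (hypothetical helper, not from the PR).
fn word(n: usize) -> [u8; 32] {
    let mut w = [0u8; 32];
    // Only the low 8 bytes can be non-zero, since `n` fits in 64 bits.
    w[24..].copy_from_slice(&(n as u64).to_be_bytes());
    w
}

/// Encodes one string as: prefix (one offset word) followed by
/// suffix (one length word, then the payload zero-padded to a multiple of 32).
fn encode_single_string(s: &str) -> Vec<u8> {
    let payload = s.as_bytes();
    let padded_len = (payload.len() + 31) / 32 * 32;
    let mut out = Vec::with_capacity(64 + padded_len);
    // Prefix: with a single value, the suffix starts right after the
    // 32-byte prefix, so the offset is always 32.
    out.extend_from_slice(&word(32));
    // Suffix: length of the string, then its bytes.
    out.extend_from_slice(&word(payload.len()));
    out.extend_from_slice(payload);
    // Zero-pad the payload up to the next multiple of 32 bytes.
    out.resize(64 + padded_len, 0);
    out
}
```

Under these assumptions, `encode_single_string("Hello")` yields 96 bytes: a word ending in `\x20`, a word ending in `\x05`, and `Hello` followed by 27 zero bytes.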

sskeirik · 2024-11-19T21:52:26Z

ulm-semantics/main/preprocessing/events.md

@@ -34,7 +34,13 @@ module ULM-PREPROCESSING-EVENTS
            // but the last one are indexed. We should handle generic events.
            => #ulmPreprocessEvent
                ( Method
-                , appendParamToBytes(data, last(Param, Params))
+                , encodeStatements
+                    ( data


A more descriptive variable name than data would help readability slightly, in my opinion.

My first suggestion is logPayload; that's just my preference though, I am happy to hear other suggestions.

Is log_buffer better?

sskeirik · 2024-11-19T22:16:27Z

tests/ulm-with-contract/types.str-raw.run

+    b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x20" +
+    b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05" +
+    b"Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"


Based on my understanding of how encoded values work, this seems like it has an extra value in it?

I expect that the endpoint returns a string, where a str is encoded as follows:

| 32: str.len() | 32x: str payload |

where:

the notation | n: payload | describes a byte field of length n with the given payload

x is chosen to be the smallest value of x such that 32x >= str.len().

I don't understand where the initial value x20 is coming from.

See the answer here: #185 (comment) , but a single string is encoded like this (when part of a request, response, log entry, and probably other stuff):

| 32: prefix-length | 32: str.len() | 32x: str payload |
|-------prefix------|-------------suffix---------------|
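Reading that layout back can be sketched as follows. This is an illustration only: `read_word` and `decode_single_string` are hypothetical names, not the `decode_str` hook from this PR, and the sketch assumes a single encoded string whose prefix word must therefore equal 32.

```rust
/// Reads the 32-byte big-endian word starting at `at` as a usize.
/// Returns None if the input is too short or the value does not fit.
fn read_word(bytes: &[u8], at: usize) -> Option<usize> {
    let w = bytes.get(at..at.checked_add(32)?)?;
    // The high 24 bytes must be zero for the value to fit in 64 bits.
    if w[..24].iter().any(|&b| b != 0) {
        return None;
    }
    let mut low = [0u8; 8];
    low.copy_from_slice(&w[24..]);
    let n = u64::from_be_bytes(low);
    if n > usize::MAX as u64 { None } else { Some(n as usize) }
}

/// Decodes `| prefix-length | str.len() | padded payload |` into a String.
fn decode_single_string(bytes: &[u8]) -> Option<String> {
    // Prefix: offset of the suffix, which is 32 for a single string.
    let offset = read_word(bytes, 0)?;
    if offset != 32 {
        return None;
    }
    // Suffix: the length word, then that many payload bytes; padding is ignored.
    let len = read_word(bytes, offset)?;
    let start = offset.checked_add(32)?;
    let payload = bytes.get(start..start.checked_add(len)?)?;
    String::from_utf8(payload.to_vec()).ok()
}
```

Returning `None` on a prefix other than 32 plays the same role as the `fail()` branch in the test helper quoted earlier in this review.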

sskeirik · 2024-11-19T22:52:57Z

ulm-semantics/test/execution.md

@@ -178,6 +179,23 @@ module ULM-TEST-EXECUTION

    rule (ulmCancel ~> expect_cancel) => .K

+    syntax Bytes ::= concat(BytesList)  [function, total]
+    rule concat(.BytesList) => b""
+    rule concat(B:Bytes + Bs:BytesList) => B +Bytes concat(Bs)


This is a nitpick, so feel free to take it or leave it 😄

To me, + always stands for a verb. When you see it, it means that a function (or perhaps machine instruction) is being invoked --- whereas here, + is used as a data constructor, describing the shape of a value.

Given that feeling, my preference is for a generic infix operator, e.g. ||| or $ or #, etc.

This may not look that good there, but it may make the tests look reasonable, IMO, see this example (I'm not sure if it was obvious where this list is being used):
https://github.com/Pi-Squared-Inc/rust-demo-semantics/pull/185/files#diff-bee3b842fdd29b8f22d8f3ab54cd51ab3d687b3a0a16e848ef877c5c4c20305fR18-R20

So, if you think that "$" would be better both here and in the tests, I'll change it.

yanliu18 · 2024-11-20T01:34:53Z

It looks good to me in general.

I left a few comments about names/code clarity --- I didn't find any bugs --- though given that this was my first read through, I may not have understood everything.

If nobody else chimes in with issues, I will approve.

Hi @sskeirik, your review comments are great.
I went through the code changes yesterday and am OK with approving them.
But there is one more step I need to do which is bring up the tests in BYOL and SWAP demo to make sure the changes work with front-end.
However, I do not have the SWAP demo testing script.

So, please go ahead and approve it. I will create an issue for testing those demos in the future (I also need to confirm with @yzhang90 whether it is still required to work with the front-end demo). If I am not the one doing it, I will leave instructions on how it can be done.

And thank you @virgil-serbanuta .

virgil-serbanuta · 2024-11-20T13:16:31Z

[...] But there is one more step I need to do which is bring up the tests in BYOL and SWAP demo to make sure the changes work with front-end. However, I do not have the SWAP demo testing script.

I think that the byol demo works only on the ulm-virgil branch, and there are some things in that branch which are not in main yet (I'm working on it). If you want to test this specific change, I should try to rebase ulm-virgil on top of this (someone else may also do this, but there will be merge/rebase conflicts).

yanliu18 · 2024-11-21T03:19:56Z

[...] But there is one more step I need to do which is bring up the tests in BYOL and SWAP demo to make sure the changes work with front-end. However, I do not have the SWAP demo testing script.

I think that the byol demo works only on the ulm-virgil branch, and there are some things in that branch which are not in main yet (I'm working on it). If you want to test this specific change, I should try to rebase ulm-virgil on top of this (someone else may also do this, but there will be merge/rebase conflicts).

No worries, Virgil.
We made a lot of shortcuts for the demo, similar to other semantics.
I think Yi is working on plans to deal with those shortcuts and decide which way we should go.
But integration testing is needed; @yzhang90 has created an issue here. No further changes are required from you at this stage, but I will let you know when the integration testing strategy is discussed.

virgil-serbanuta requested review from ACassimiro and yanliu18 November 11, 2024 19:14

virgil-serbanuta marked this pull request as ready for review November 11, 2024 19:15

yanliu18 requested a review from Robertorosmaninho November 12, 2024 03:54

Robertorosmaninho reviewed Nov 12, 2024

ulm-semantics/main/preprocessing/endpoints.md (outdated)

ulm-semantics/main/hooks/bytes.md (outdated)

virgil-serbanuta requested review from Robertorosmaninho and sskeirik and removed request for ACassimiro November 12, 2024 19:49

sskeirik reviewed Nov 19, 2024

yanliu18 mentioned this pull request Nov 20, 2024

Integrated testing with front-end demo #190

Open

virgil-serbanuta force-pushed the str-encoding branch from b0ffa7f to 81535c2 Compare November 20, 2024 13:18

virgil-serbanuta requested a review from sskeirik November 20, 2024 15:44

virgil-serbanuta added 10 commits November 22, 2024 19:30

String encoding for return values

7a25875

cleanup

bd3767d

cleanup

28bf950

tmp

72dc6b6

Fix tests

f14a42c

Remove unneeded identifier

384aef2

Renamee encodeStatements to codegenValuesEncoder

8254fa2

Rename data to log_buffer

5f4b61d

More documentation

65ac93e

Remove spaces

d40e15d

virgil-serbanuta force-pushed the str-encoding branch from 10f243a to d40e15d Compare November 22, 2024 17:31

sskeirik approved these changes Nov 25, 2024

Robertorosmaninho approved these changes Nov 25, 2024

virgil-serbanuta merged commit 4b974ad into main Nov 25, 2024
3 checks passed

virgil-serbanuta deleted the str-encoding branch November 25, 2024 20:49
