Call data coding #167

ACassimiro · 2024-10-20T23:04:54Z

This PR introduces the encoding and decoding of call data following the Ethereum standard. This call data can be used to trigger function calls in our Rust-lite semantics.

virgil-serbanuta · 2024-10-21T13:08:41Z

ukm-semantics/main/decoding/decoder.md

+    imports INT
+
+    rule decodeCallData(D:Bytes) => 
+            UKMDecodedCallData1(decodeFunctionSignature(substrBytes(D, 0, 8)), decodeArguments(loadArgumentsFromHash(substrBytes(D, 0, 8)), substrBytes(D, 8, lengthBytes(D)), .List) )


Needs "requires lengthBytes(D) >=Int 8"
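
The guard matters because the rule slices the first 8 bytes unconditionally. A minimal Python sketch of the same idea, assuming the 8-byte selector width used in this diff; `split_call_data` and `SELECTOR_LEN` are hypothetical names for illustration, not part of the semantics:

```python
from typing import Optional, Tuple

# Selector width used by decodeCallData in the diff above (assumption carried over).
SELECTOR_LEN = 8


def split_call_data(data: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Split call data into (selector, arguments); return None if it is too short."""
    if len(data) < SELECTOR_LEN:
        # Mirrors the suggested side condition: the rule must not apply here.
        return None
    return data[:SELECTOR_LEN], data[SELECTOR_LEN:]
```

Without the length check, a short input would silently produce a truncated selector in Python; in K the unguarded rule would instead apply to inputs for which the slice is not well defined.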

yanliu18 · 2024-10-22T03:38:27Z

Since we lack encoded call data generated from opgeth-KEVM, I am sharing the ABI specification from Solidity, assuming they followed the spec exactly (as @yzhang90 mentioned before).
https://docs.soliditylang.org/en/develop/abi-spec.html

The reason is that I was a bit confused while reading the code: where the bytes are sliced, the lengths may not be correct.
For example:

Function Selector

rust-demo-semantics/ukm-semantics/main/decoding/decoder.md

Line 17 in 4175b9d

    
                       UKMDecodedCallData1(decodeFunctionSignature(substrBytes(D, 0, 8)), decodeArguments(loadArgumentsFromHash(substrBytes(D, 0, 8)), substrBytes(D, 8, lengthBytes(D)), .List) )

Explanation: The function selector should be 4 bytes.

Reference: The implementation in the IMP language:
https://github.com/Pi-Squared-Inc/SIMPLE-ulm/blob/830993591ca374fa28f459f027198963ab2930b9/simple-ulm.md?plain=1#L296

Number of bytes used to represent a data type:

rust-demo-semantics/ukm-semantics/main/decoding/decoder.md

Line 59 in 4175b9d

    
                       ListItem( convertKBytesToPtrValue (T, Bytes2Int ( substrBytes(D, 0, sizeOfType(T)), BE, Unsigned ) ) ) L )

rust-demo-semantics/ukm-semantics/main/decoding/decoder.md

Line 70 in 4175b9d

rule sizeOfType(u32) => 32

Explanation: sizeOfType(u32) should be 4 (= 32/8).

Suggestion: Maybe we could rename sizeOfType to byteSizeOfType, to distinguish it (maybe) from the sizes we use in K when converting Int to Bytes.

Reference: The IMP language implementation has only one integer type, u256 (written int in its syntax), of length 32.

https://github.com/Pi-Squared-Inc/SIMPLE-ulm/blob/830993591ca374fa28f459f027198963ab2930b9/simple-ulm.md?plain=1#L323

Please correct me if my understanding is incorrect.
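
To make the bits-versus-bytes distinction concrete, here is a small Python sketch of a byte-oriented size function. The name `byte_size_of_type` follows the rename suggested above; the table of types is an illustrative assumption, not the list actually supported by the PR:

```python
# Bit widths of some unsigned integer types (illustrative assumption).
BIT_WIDTHS = {"u8": 8, "u16": 16, "u32": 32, "u64": 64, "u128": 128, "u256": 256}


def byte_size_of_type(type_name: str) -> int:
    """Return the width of a value of the given type in bytes, not bits."""
    return BIT_WIDTHS[type_name] // 8
```

Note that this is the width of the value itself. The Solidity ABI pads every static argument to a 32-byte slot, so the encoded slot can be wider than the value it carries.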

ACassimiro · 2024-10-23T04:15:54Z

@yanliu18 thank you for your comments. You are correct regarding the sizing of the types, and that should be addressed as of the latest commits. Regarding the size of the function selector, I'm aware of the ABI specification you referenced, but for some reason I could only generate correct signatures (matched against signatures generated using this website) when using 8 bytes. This approach has also been used for the encoding of function selectors in KEVM (sample reference).

yanliu18 · 2024-10-23T04:40:16Z

@ACassimiro thank you for addressing them.
As for the length of 8 for the function selector, it might make sense: although the selector is defined to be 4 bytes, it is represented by a byte string of length 8, as we see from the example. The IMP semantics implementation could be wrong. Overall, I believe KEVM is more credible than IMP.

If my guess is correct, should every 32-byte value then be represented as a byte string of length 64?

Please proceed with your PR and the tests while I look around for answers.
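
One possible reading of the 4-versus-8 discrepancy discussed above is that the selector is still 4 bytes, but it is being handled as its hex text, which takes two characters per byte. The Python sketch below illustrates that reading only; it uses the well-known selector of `transfer(address,uint256)` as a constant rather than computing it, and it is not a statement about what KEVM or this PR actually does:

```python
# Well-known selector of transfer(address,uint256), written as raw bytes.
selector = bytes.fromhex("a9059cbb")

# The same selector written as hex text: two characters per byte.
hex_text = selector.hex()

# Hex text stored one ASCII character per byte, doubling the length.
ascii_bytes = hex_text.encode("ascii")

# One 32-byte ABI word holding the integer 5, and its hex text.
word = (5).to_bytes(32, "big")
word_hex = word.hex()
```

Under this reading the raw selector has length 4 and its hex text has length 8, and likewise a 32-byte word has a 64-character hex form.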

virgil-serbanuta · 2024-10-23T14:34:15Z

tests/ukm-with-contract/endpoints.1.run


 call_contract 12345;
-return_value;
+return_value; 


Change not needed

virgil-serbanuta

I made a lot of comments below. I would merge this as-is for now (or, perhaps, fix whatever is easy to fix, but that can take more time than expected), then, after the full encoding+decoding thing works, I would go back and discuss and/or fix the comments I made.

virgil-serbanuta · 2024-10-23T17:35:46Z

ukm-semantics/main/decoding/decoder.md

@@ -0,0 +1,6 @@
+```k


Maybe delete the entire file?

virgil-serbanuta · 2024-10-23T17:50:29Z

ukm-semantics/main/encoding/encoder.md

+
+    // Function signature encoding
+    rule encodeFunctionSignature(FuncName:String, RL:List, "") => 
+            encodeFunctionSignature("", RL:List, FuncName +String "(") [priority(40)]


This would be clearer if you added a helper function, called either encodeFunctionSignatureHelper or #encodeFunctionSignature, so that you would need only two arguments for both encodeFunctionSignature and encodeFunctionSignatureHelper:

rule encodeFunctionSignature(FuncName:String, RL:List) => encodeFunctionSignatureHelper(RL:List, FuncName +String "(")
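
The shape of that refactoring can be sketched in Python: a two-argument entry point that seeds an accumulator, and a two-argument helper that consumes the parameter list. The function names are hypothetical, and the comma-separated `name(type1,type2)` format is assumed from the Solidity ABI's canonical signature form rather than read from the PR's rules:

```python
from typing import List


def encode_function_signature(func_name: str, param_types: List[str]) -> str:
    """Entry point: start the accumulator with 'name(' and delegate to the helper."""
    return _encode_function_signature_helper(param_types, func_name + "(")


def _encode_function_signature_helper(remaining: List[str], acc: str) -> str:
    """Append parameter types to the accumulator, then close the parenthesis."""
    if not remaining:
        return acc + ")"
    head, *tail = remaining
    separator = "," if tail else ""  # no trailing comma after the last type
    return _encode_function_signature_helper(tail, acc + head + separator)
```

Splitting the function this way removes the empty-string sentinel arguments, so callers can no longer pass an inconsistent accumulator.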

virgil-serbanuta · 2024-10-23T17:51:53Z

ukm-semantics/main/encoding/encoder.md

+    rule encodeFunctionSignature("", ListItem(FuncParam:String) .List, FS) => 
+                encodeFunctionSignature("", .List, FS +String FuncParam ) 
+
+    rule encodeFunctionSignature("", .List, FS) => String2Bytes(substrString(Keccak256(String2Bytes(FS  +String ")")), 0, 8)) 


Add a TODO to encode this properly? Also, it may be worth extracting the substrString(Keccak256(String2Bytes(FS)), 0, 8) thing to a function that takes a String and produces a String or Bytes (as opposed to taking a StringOrError as below) (perhaps with an encodeAsBytes(...) on top of it) and then use it here and in the rules below.
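
A sketch of the suggested extraction, in Python: one function that takes a signature string and returns the selector bytes, so every caller shares the same slicing logic. The hash is passed in as a parameter because the Python standard library has no Keccak-256 (`hashlib.sha3_256` is the NIST SHA-3 variant and gives different digests); `selector_from_signature` and `keccak256_hex` are hypothetical names, and the hash is assumed to return a hex string as the rule above appears to expect:

```python
from typing import Callable


def selector_from_signature(
    signature: str, keccak256_hex: Callable[[bytes], str]
) -> bytes:
    """Hash the signature and keep the first 8 hex characters, as ASCII bytes.

    Mirrors String2Bytes(substrString(Keccak256(String2Bytes(FS)), 0, 8)):
    the result is the hex text of the first 4 digest bytes, not the raw bytes.
    """
    digest_hex = keccak256_hex(signature.encode("utf-8"))
    return digest_hex[:8].encode("ascii")
```

Keeping the hex-text behavior in one place also gives the TODO a single spot to change if the selector is later emitted as 4 raw bytes.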

virgil-serbanuta · 2024-10-23T17:57:17Z

ukm-semantics/main/preprocessing/endpoints.md

@@ -174,7 +176,7 @@ module UKM-PREPROCESSING-ENDPOINTS
                            | signatureType(Type)  [function, total]

    rule methodSignature(S:String, Ps:NormalizedFunctionParameterList)


methodSignature is a total function, but it evaluates to encodeFunctionSignatureAsString, which is a partial function. You should probably mark encodeFunctionSignatureAsString as total and also define it for the error case (it should also return StringOrError, not just String).
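
The total-function pattern can be sketched in Python with an explicit error value: every input yields either a string or an `Error`, and the caller propagates the error instead of getting stuck. The `Error` class, the function names, and the type mapping are illustrative assumptions, not the PR's definitions:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Error:
    message: str


StringOrError = Union[str, Error]

# Illustrative mapping from source types to signature type names (assumption).
KNOWN_TYPES = {"u32": "uint32", "u64": "uint64", "bool": "bool"}


def signature_type(type_name: str) -> StringOrError:
    """Total: unknown types produce an Error value rather than no result."""
    if type_name in KNOWN_TYPES:
        return KNOWN_TYPES[type_name]
    return Error(f"unsupported type: {type_name}")


def method_signature(name: str, param_types: List[str]) -> StringOrError:
    """Total: the first Error from a parameter is returned unchanged."""
    encoded = []
    for type_name in param_types:
        result = signature_type(type_name)
        if isinstance(result, Error):
            return result
        encoded.append(result)
    return f"{name}({','.join(encoded)})"
```

Because both functions are defined on every input, a caller that is itself total never depends on a case the callee does not cover.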

virgil-serbanuta · 2024-10-23T18:19:05Z

ukm-semantics/main/encoding/syntax.md


    syntax UKMInstruction ::= "ukmEncodePreprocessedCell"

+    syntax Bytes ::= encodeCallData (String, List, List) [function] //Function name, argument types, argument list
+                   | encodeFunctionSignature (String, List, String) [function]
+                   | encodeFunctionSignature (StringOrError) [function]


Is this used anywhere?

ACassimiro

The encoding functions? They are being triggered by the production that mocks the encoding of values. The UKMInstruction defined above wasn't modified. It was a part of the module before I started altering it. If not needed, I'll remove it.

virgil-serbanuta

OK, my guess is that encodeFunctionSignature (StringOrError) is not being used. A link pointing to a call site would be nice.

ACassimiro

It's true, I should have linked their usage. The encodeFunctionSignature (StringOrError) is used for encoding methodSignature, which returns a StringOrError, on endpoints (ref).

encodeCallData is called when mocking the call data encoding, and encodeFunctionSignature (String, List, String) is triggered by encodeCallData (ref).

virgil-serbanuta

I thought that methodSignature uses encodeFunctionSignatureAsString(StringOrError), not encodeFunctionSignature (StringOrError).

ACassimiro

That is true. I confused the two methods. If you don't have any objections, I'll double-check whether we need encodeFunctionSignature (StringOrError) and remove it if we don't.

ACassimiro

I have removed it. I've also removed the productions related to it after testing. It was, indeed, not being used.

virgil-serbanuta · 2024-10-23T18:22:28Z

ukm-semantics/main/encoding/syntax.md

+                   | encodeFunctionSignature (String, List, String) [function]
+                   | encodeFunctionSignature (StringOrError) [function]
+                   | encodeFunctionParams (List, List, Bytes) [function]
+                   | convertToKBytes ( Value , String ) [function]


Having partial functions raises all sorts of problems, some of which are philosophical and some of which are practical (e.g., it's easy to create a bug when a total function evaluates to a partial one without checking that the partial function is actually defined for that input; I can provide more details). I would suggest using total functions unless you are sure that you need partial ones.

virgil-serbanuta · 2024-10-23T18:32:32Z

ukm-semantics/test/execution.md


    syntax ExecutionItem  ::= "mock" "CallData"
                            | "mock" "Caller"
                            | "mock" UkmHook UkmHookResult
                            | "list_mock" UkmHook UkmHookResult
+                            | "mock" "EncodeCallData"


This does not mock anything, so I would call it simply "encode_call_data" or something similar. The same is probably true for the line below.

virgil-serbanuta · 2024-10-23T18:40:01Z

ukm-semantics/test/execution.md

    imports RUST-EXECUTION-TEST-PARSING-SYNTAX
    imports UKM-HOOKS-UKM-SYNTAX
+    imports BYTES-SYNTAX
+
+    syntax UKMTestTypeHolder ::= "ptr_holder" KItem [strict]


Is it possible to put something more concrete than KItem here, something like Expression, perhaps? Note that below you are just taking a KItem from the stack and passing it here, so that also does not help with figuring out what happens. My current guess is that the stack contains a ptr(Int) that is then put into ptr_holder and then evaluated to a ptrValue(...), so Expression would work here, but I'm currently not sure.

The same question for value_holder.

FWIW, if you actually need to use something as general as KItem, then this is fine.

ACassimiro

I have tried simplifying it to an Expression, but this leads to an error such as Inner Parser: Sort of variable V inferred as greatest lower bound of [Expression, Value], but candidate bounds are incomparable: [Bool, String].

Given that this is something that we're using for the execution of our tests, I'd prefer to revisit this once the proper parsing of the function signature is addressed in my next PR, so I'm adding this as a TODO.

virgil-serbanuta · 2024-10-23T19:04:21Z

ukm-semantics/test/execution.md

+         <test-stack> ListItem(P) L:List => L </test-stack>
+    rule <k> ptr_holder ptrValue(_, V) => value_holder V ... </k>
+
+    rule <k> hold_list_values_from_test_stack => list_ptrs_holder L ~> list_values_holder .List ... </k>


I would implement this as:

    rule hold_list_values_from_test_stack => holdStuff(.List)
    rule <k> (.K => V) ~> holdStuff(_) ... </k>
         <test-stack> (ListItem(V) => .List) ... </test-stack>
    rule ptrValue(_, V) ~> holdStuff(L) => holdStuff(ListItem(V) L)

There is another implementation where you convert the stack to a list of Ptr, then you automatically evaluate that, something like:

    rule <k> hold_list_values_from_test_stack => list_values_holder_or_error ptrListToValueList(listToPtrList(Stack), Values) ... </k>
         <test-stack> Stack => .List </test-stack>
         <values> Values </values>
    rule list_values_holder_or_error Values:ValueList => list_values_holder valuesListToList(Values)

You would have to either write a valuesListToList function or you would need to make list_values_holder take a ValueList (honestly, I think that making it take a ValueList is better anyway).

Probably the best solution would be to make some kind of expression list (I'm not sure which one), put it into a constructor, then evaluate the expression list with heating/cooling, but I'm not going to figure out how that should work unless you are actually interested in that. One example would be to create a tuple, i.e., (ptr(_), ptr(_), ..., ptr(_)), and have that evaluated automatically (with heating/cooling) to a tuple(ValueList). Or something like that.
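
The second alternative above (convert the whole stack of pointers to values in one step, or fail) can be sketched in Python. This illustrates the control flow only: `drain_stack_to_values` is a hypothetical name, pointers are modeled as integers, and the values cell is modeled as a dictionary:

```python
from typing import Dict, List, Optional


def drain_stack_to_values(
    stack: List[int], values: Dict[int, object]
) -> Optional[List[object]]:
    """Resolve every pointer on the stack and empty the stack on success.

    Returns None, leaving the stack untouched, if any pointer is unknown.
    """
    resolved = []
    for ptr in stack:
        if ptr not in values:
            return None
        resolved.append(values[ptr])
    stack.clear()
    return resolved
```

Doing the lookup as a single function keeps the error case explicit, which is the role of list_values_holder_or_error in the rules sketched above.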

ACassimiro

Added as a TODO. I'll be solving this together with the other issues raised for ukm-semantics/test/execution.md.

virgil-serbanuta · 2024-10-23T19:05:37Z

ukm-semantics/test/execution.md

+
+    syntax UKMTestTypeHolder ::= "ptr_holder" KItem [strict]
+                                | "value_holder" KItem
+                                | "list_ptrs_holder" List


Would a PtrList work here, and a ValueList below?

ACassimiro

Added as a TODO. I'll be solving this together with the other issues raised for ukm-semantics/test/execution.md.

ACassimiro added 6 commits October 16, 2024 23:34

Adding encoding capabilities

fe926fa

Enabling encoding of endpoints at preprocessing stage

c1cb9a3

Enabling call data arguments decoding

0be8974

Fixing minor issue with decoding arguments

f1b24dc

Properly returning an object representing the decoded call data

580676d

Enabling function calls from decoded call data

4175b9d

virgil-serbanuta reviewed Oct 21, 2024

yanliu18 reviewed Oct 22, 2024

ACassimiro added 5 commits October 22, 2024 19:05

Modifying the decoding approach to comply with Ethereum's

f3b27b1

Merge branch 'main' into call-data-coding

434e61a

Fixing minor issue from merge and adding the blockchain plugin folder…

f344860

… to search path

Use lowercase type names in function signature

ffbb0cf

Adapting tests to properly encode call data

6d075ef

virgil-serbanuta added 4 commits October 23, 2024 17:28

Fix build

6a140b2

Add random dependencies

f1caec9

Fix dependencies

4f39b2f

Fix dependencies

73d9494

virgil-serbanuta reviewed Oct 23, 2024

ACassimiro added 2 commits October 23, 2024 14:16

Code cleanup

60a540a

Code cleanup

b8ca609

virgil-serbanuta approved these changes Oct 23, 2024

Addressing review

1bdbb7d

ACassimiro marked this pull request as ready for review October 23, 2024 23:34

ACassimiro merged commit 236a8de into main Oct 23, 2024
3 checks passed

ACassimiro deleted the call-data-coding branch October 23, 2024 23:34

virgil-serbanuta mentioned this pull request Oct 24, 2024

Call data encoding and decoding #171

Merged

ACassimiro mentioned this pull request Oct 26, 2024

Implement value encoding and decoding #142

Closed
