Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call data coding #167

Merged
merged 18 commits into from
Oct 23, 2024
Merged

Call data coding #167

merged 18 commits into from
Oct 23, 2024

Conversation

ACassimiro
Copy link
Collaborator

This PR introduces the encoding and decoding of call data following the Ethereum standard. This call data can be used to trigger function calls in our Rust-lite semantics.

imports INT

rule decodeCallData(D:Bytes) =>
UKMDecodedCallData1(decodeFunctionSignature(substrBytes(D, 0, 8)), decodeArguments(loadArgumentsFromHash(substrBytes(D, 0, 8)), substrBytes(D, 8, lengthBytes(D)), .List) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs "requires lengthBytes(D) >=Int 8"

@yanliu18
Copy link
Member

yanliu18 commented Oct 22, 2024

Since we lack of encoded call data generated from opgeth-KEVM, I am sharing the specification from Solidity, assuming they followed the spec exactly. (@yzhang90 mentioned before.)
https://docs.soliditylang.org/en/develop/abi-spec.html

The reason is I am a bit confused while reading the code, in terms of slicing the bytes, that the length may not be correct.
For examples,

  1. Function Selector

UKMDecodedCallData1(decodeFunctionSignature(substrBytes(D, 0, 8)), decodeArguments(loadArgumentsFromHash(substrBytes(D, 0, 8)), substrBytes(D, 8, lengthBytes(D)), .List) )

Explain: The function selector should be 4 bytes;

Reference: The implementation in the IMP language:
https://github.com/Pi-Squared-Inc/SIMPLE-ulm/blob/830993591ca374fa28f459f027198963ab2930b9/simple-ulm.md?plain=1#L296

  1. Length of bytes to represent of a data type:

ListItem( convertKBytesToPtrValue (T, Bytes2Int ( substrBytes(D, 0, sizeOfType(T)), BE, Unsigned ) ) ) L )

Explain: The sizeOfType(u32) should be 4 = 32/8.

Suggest: Maybe we could change the function name of sizeOfType to byteSizeOfType to differentiate (maybe) with the sizes we used in K to convert into to bytes.

Reference: The implementation in IMP language, they only have int type u256 whose syntax is int, of length 32.

https://github.com/Pi-Squared-Inc/SIMPLE-ulm/blob/830993591ca374fa28f459f027198963ab2930b9/simple-ulm.md?plain=1#L323

Please correct me if my understanding is incorrect.

@ACassimiro
Copy link
Collaborator Author

@yanliu18 thank you for your comments. You are correct regarding the sizing of the types, and that should have been addressed after the latest commits. Regarding the size of the function selector, I'm aware of the ABI specification you referenced, but for some reason, I could only generate correct signatures (matched against signatures generated using this website) after using 8 bytes. This approach also has been used in the encoding of function selectors on KEVM (sample reference).

@yanliu18
Copy link
Member

@ACassimiro thank you for addressing them.
As for length 8 for the function selector, it might make sense that although it is defined to be 4 bytes, it was represented by a byte string of length 8 as we see from example. The IMP semantic implementation could be wrong. Overall, I believe KEVM has more credit over IMP.

If my guesses are correct, then every 32 bytes data should be represented as 64 length byte string?

Please proceed with your PR with the tests while I'll look around for answers.

tests/ukm-contracts/bytes_hooks.rs Show resolved Hide resolved

call_contract 12345;
return_value;
return_value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change not needed

Copy link
Member

@virgil-serbanuta virgil-serbanuta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a lot of comments below. I would merge this as-is for now (or, perhaps, fix whatever is easy to fix, but that can take more time than expected), then, after the full encoding+decoding thing works, I would go back and discuss and/or fix the comments I made.

@@ -0,0 +1,6 @@
```k
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe delete the entire file?


// Function signature encoding
rule encodeFunctionSignature(FuncName:String, RL:List, "") =>
encodeFunctionSignature("", RL:List, FuncName +String "(") [priority(40)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more clear if you would add a helper function, called either encodeFunctionSignatureHelper or #encodeFunctionSignature so you would need only two arguments for both encodeFunctionSignature and encodeFunctionSignatureHelper:

    rule encodeFunctionSignature(FuncName:String, RL:List) => 
            encodeFunctionSignatureHelper(RL:List, FuncName +String "(")

rule encodeFunctionSignature("", ListItem(FuncParam:String) .List, FS) =>
encodeFunctionSignature("", .List, FS +String FuncParam )

rule encodeFunctionSignature("", .List, FS) => String2Bytes(substrString(Keccak256(String2Bytes(FS +String ")")), 0, 8))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO to encode this properly? Also, it may be worth extracting the substrString(Keccak256(String2Bytes(FS)), 0, 8) thing to a function that takes a String and produces a String or Bytes (as opposed to taking a StringOrError as below) (perhaps with an encodeAsBytes(...) on top of it) and then use it here and in the rules below.

@@ -174,7 +176,7 @@ module UKM-PREPROCESSING-ENDPOINTS
| signatureType(Type) [function, total]

rule methodSignature(S:String, Ps:NormalizedFunctionParameterList)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

methodSignature is a total function, but is evaluated to encodeFunctionSignatureAsString which is a partial function. You should probably mark it as total and also evaluate it on the error case (it should also return StringOrError, not just String).


syntax UKMInstruction ::= "ukmEncodePreprocessedCell"

syntax Bytes ::= encodeCallData (String, List, List) [function] //Function name, argument types, argument list
| encodeFunctionSignature (String, List, String) [function]
| encodeFunctionSignature (StringOrError) [function]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding functions? They are being triggered by the production that mocks the encoding of values. The UKMInstruction defined above wasn't modified. It was a part of the module before I started altering it. If not needed, I'll remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, my guess is that encodeFunctionSignature (StringOrError) is not being used. A link pointing to a call place would be nice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its true, I should have linked the usage of them. The encodeFunctionSignature (StringOrError) is used for encoding methodSignature, which returns a StringOrError, on endpoints (ref).

encodeCallData is called when mocking the call data encoding, and encodeFunctionSignature (String, List, String) is triggered by encodeCallData (ref).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that methodSignature uses encodeFunctionSignatureAsString(StringOrError), not encodeFunctionSignature (StringOrError)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true. I've confused the two methods. If you don't have any objections, I'll double-check if we need encodeFunctionSignature (StringOrError) and remove it if we don't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed it. I've also removed the productions related to it after testing. It, indeed, wasn't being used.

| encodeFunctionSignature (String, List, String) [function]
| encodeFunctionSignature (StringOrError) [function]
| encodeFunctionParams (List, List, Bytes) [function]
| convertToKBytes ( Value , String ) [function]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having partial functions raises all sorts of problems, some of which are philosophical, some of which are practical (e.g. it's easy to create a bug like when evaluating a total function to a partial one without checking that the partial function is actually defined for that input; I can provide more details). I would suggest using total functions unless you are sure that you need partial ones.


syntax ExecutionItem ::= "mock" "CallData"
| "mock" "Caller"
| "mock" UkmHook UkmHookResult
| "list_mock" UkmHook UkmHookResult
| "mock" "EncodeCallData"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not mock anything, so I would call it simply "encode_call_data" or something similar. The same is probably true for the line below.

imports RUST-EXECUTION-TEST-PARSING-SYNTAX
imports UKM-HOOKS-UKM-SYNTAX
imports BYTES-SYNTAX

syntax UKMTestTypeHolder ::= "ptr_holder" KItem [strict]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to put something more concrete than KItem here, something like Expression, perhaps? Note that below you are just taking a KItem from the stack and passing it here, so that also does not help with figuring out what happens. My current guess is that the stack contains a ptr(Int) that is then put into ptr_holder and then evaluated to a ptrValue(...), so Expression would work here, but I'm currently not sure.

The same question for value_holder.

FWIW, if you actually need to use something as general as KItem, then this is fine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried simplifying it to an Expression, but this is leading to some error such as Inner Parser: Sort of variable V inferred as greatest lower bound of [Expression, Value], but candidate bounds are incomparable: [Bool, String].

Given that this is something that we're using for the execution of our tests, I'd prefer to revisit this once the proper parsing of the function signature is addressed in my next PR, so I'm adding this as a TODO.

<test-stack> ListItem(P) L:List => L </test-stack>
rule <k> ptr_holder ptrValue(_, V) => value_holder V ... </k>

rule <k> hold_list_values_from_test_stack => list_ptrs_holder L ~> list_values_holder .List ... </k>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would implement this as:

rule hold_list_values_from_test_stack => hold_stuff(.List)
rule <k> (.K => V) ~> holdStuff(_) ... </k>
    <test_stack> (ListItem(V) => .List) ... </test_stack>
rule ptrValue(_, V) ~> holdStuff(L) => holdStuff(ListItem(V) L)

There is another implementation where you convert the stack to a list of Ptr, then you automatically evaluate that, something like:

    rule <k> hold_list_values_from_test_stack => list_values_holder_or_error ptrListToValueList(listToPtrList(Stack), Values) ... </k>
        <test-stack> Stack => .List </test-stack>
        <values> Values </values>
   rule list_values_holder_or_error Values:ValueList => list_values_holder valuesListToList(Values)

You would have to either write a valuesListToList function or you would need to make list_values_holder take a ValueList (honestly, I think that making it take a ValueList is better anyway).

Probably the best solution would be to make some kind of expression list (I'm not sure which one), put it into a constructor, then evaluate the expression list with heating/cooling, but I'm not going to figure out how that should work unless you are actually interested in that. One example would be to create a tuple, i.e., (ptr(_), ptr(_), ..., ptr(_)), and have that evaluated automatically (with heating/cooling) to a tuple(ValueList). Or something like that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as a TODO. I'll be solving this together with the other issues raised for ukm-semantics/test/execution.md.


syntax UKMTestTypeHolder ::= "ptr_holder" KItem [strict]
| "value_holder" KItem
| "list_ptrs_holder" List
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a PtrList work here, and a ValueList below?

Copy link
Collaborator Author

@ACassimiro ACassimiro Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as a TODO. I'll be solving this together with the other issues raised for ukm-semantics/test/execution.md.

@ACassimiro ACassimiro marked this pull request as ready for review October 23, 2024 23:34
@ACassimiro ACassimiro merged commit 236a8de into main Oct 23, 2024
3 checks passed
@ACassimiro ACassimiro deleted the call-data-coding branch October 23, 2024 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants