Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ refactor(gentest,base_types): Improved architecture, implement EthereumTestBaseModel, EthereumTestRootModel #901

Merged
merged 6 commits into from
Nov 6, 2024

Conversation

raxhvl
Copy link
Contributor

@raxhvl raxhvl commented Oct 18, 2024

🗒️ Description

This PR ensures data from gentest to generated tests reaches in a type-safe manner.

The Gentest module generates a test script in three steps:

  1. The CLI entrypoint function captures the user input required for generating the test.
  2. This input is used to generate additional context required for creating tests.
  3. Finally, using this context, a formatted pytest source code is generated and written to disk.

The process of generating context and rendering templates will share the same logic across different kinds of tests. These are created as modules: (cli, test_context_provider, and source_code_generator).

graph TD;
    A[CLI Entry Point] -->|Captures User Input| B[Test Context Provider]
    B -->|Generates Context| C[Source Code Generator]
    C -->|Generates Formatted Pytest Code| D[Disk]
Loading

I am using the same data type when passing data from the provider to the generated source code. This keeps the behavior predictable; Pydantic will raise errors if the RPC call returns unexpected values even before we run the test. I'm leveraging __repr__ values to print the data into the source code.

Finally, the code is formatted using the same formatter (Black) used elsewhere. The existing formatting configurations are applied, so any change in configuration should automatically reflect in the files generated as well. Formatting would become essential when reading contract bytecode, especially with the plans to convert bytecode to python objects (see #862).

This effort makes writing new kinds of gentests easier, requiring only the creation of new test providers and templates.

🔗 Related Issues

closes #859

✅ Checklist

  • All: Set appropriate labels for the changes.
  • All: Considered squashing commits to improve commit history.
  • All: Added an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • Tests: All converted JSON/YML tests from ethereum/tests have been added to converted-ethereum-tests.txt.
  • Tests: A PR with removal of converted JSON/YML tests from ethereum/tests have been opened.
  • Tests: Included the type and version of evm t8n tool used to locally execute test cases: e.g., ref with commit hash or geth 1.13.1-stable-3f40e65.
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it a lot, really clean 🌟 Thanks so much for all the thought you're putting into this.

One comment about the name of the jinja2 filter (cool feature, btw). Feel free to either change it and merge, or simply merge as-is. Thanks!

Also like the idea of applying black formatting on the output.

src/cli/gentest/source_code_generator.py Show resolved Hide resolved
src/cli/gentest/source_code_generator.py Show resolved Hide resolved
src/cli/gentest/cli.py Outdated Show resolved Hide resolved
src/cli/gentest/cli.py Show resolved Hide resolved
@raxhvl
Copy link
Contributor Author

raxhvl commented Oct 23, 2024

I think the output format of the generated test could be made more human readable. Which is why the PR is still in draft.

image

For example, we should ensure addresses are represented as hex and large numbers like v,r,s to be perhaps literal hex numbers. So that it is consistent with manually written tests. How about:

Field Representation in Tests
Address hex string
Hash hex string
Number hex number literal
Code Opcode class?

I think the best way to achieve this is to override __repr__ in individual base types. Perhaps like this:

class HexNumber(Number):
    """
    Class that helps represent an hexadecimal numbers in tests.
    """

+   def __repr__(self):
+       return f"0x{123}"

    def __str__(self) -> str:
        """
        Returns the string representation of the number.
        """
        return self.hex()

I would like you to weigh in on the effect of this across the codebase.

@danceratopz
Copy link
Member

danceratopz commented Oct 23, 2024

I think the output format of the generated test could be made more human readable. Which is why the PR is still in draft.

image

For example, we should ensure addresses are represented as hex and large numbers like v,r,s to be perhaps literal hex numbers. So that it is consistent with manually written tests. How about:
Field Representation in Tests
Address hex string
Hash hex string
Number hex number literal
Code Opcode class?

I think the best way to achieve this is to override __repr__ in individual base types. Perhaps like this:

class HexNumber(Number):
    """
    Class that helps represent an hexadecimal numbers in tests.
    """

+   def __repr__(self):
+       return f"0x{123}"

    def __str__(self) -> str:
        """
        Returns the string representation of the number.
        """
        return self.hex()

I would like you to weigh in on the effect of this across the codebase.

I'll set off with a broader topic, so we have the full picture. Bear with me!

All right, so getting nice representations would also really help us with the Test Case Reference docs, see the bytes objects in the test case parameters tables, for example here: https://ethereum.github.io/execution-spec-tests/main/tests/prague/eip6110_deposits/test_deposits/test_deposit.html

For the objects defined and only used in the test cases (which aren't part of EEST's types libraries), adding a __repr__ would help and seems like the way to go.

For the objects that we do define in our test libraries, we get nice representations (of most objects, at least) when we write them as string to the fixture JSON files. And, instead of just adding __repr__ to all these classes, perhaps we can re-use the pydantic magic. I will try to get back to this later today, but if you'd like to play around in the meantime, try this:

Get some fixtures:

uv run fill tests/istanbul/

Then play around in ipython (https://ethereum.github.io/execution-spec-tests/main/dev/interactive_usage/)

uvx  --with-editable . ipython

for example:

import rich

from ethereum_test_fixtures.file import Fixtures
from ethereum_test_base_types import to_json

fixtures = Fixtures.from_file("fixtures/state_tests/istanbul/eip1344_chainid/chainid/chainid.json", fixture_format=None)
fixtures.keys()
fixtures['tests/istanbul/eip1344_chainid/test_chainid.py::test_chainid[fork_Istanbul-state_test]']
fixtures['tests/istanbul/eip1344_chainid/test_chainid.py::test_chainid[fork_Istanbul-state_test]'].env

rich.print(fixtures['tests/istanbul/eip1344_chainid/test_chainid.py::test_chainid[fork_Shanghai-state_test]'].env)

to_json(fixtures['tests/istanbul/eip1344_chainid/test_chainid.py::test_chainid[fork_Shanghai-state_test]'].env.fee_recipient)

to_json uses pydantic's model_dump():

def to_json(
input: BaseModel | RootModel | AnyStr | List[BaseModel | RootModel | AnyStr],
) -> Any:
"""
Converts a model to its json data representation.
"""
if isinstance(input, list):
return [to_json(item) for item in input]
elif isinstance(input, (BaseModel, RootModel)):
return input.model_dump(mode="json", by_alias=True, exclude_none=True)
else:
return str(input)

Just be aware that fill serializes from the test spec format (e.g. StateTest src/ethereum_test_specs/state.py to (state) Fixture src/ethereum_test_fixtures/state.py) and when we load the fixture from file above we're not creating a StateTest, we're creating a state test Fixture object. Quite a lot of magic happens when we serialize, for example, removing objects that are None; removing objects don't belong in state test fixture, but do in a blockchain test fixture.

I can take a deeper look myself and happy to discuss to brainstorm, but if you have time and you're interested, have play around!

@raxhvl
Copy link
Contributor Author

raxhvl commented Oct 23, 2024

I get your point. The to_json method respects Pydantic attributes (like alias) and filters out None values, while your overrides of __str__ produce nicely formatted JSON.

However, to_json outputs JSON, not valid Python code like __repr__. We need a way to combine the best of both without duplicating effort.

Let me know if my understanding is correct. Meanwhile, let me think this through.

@danceratopz
Copy link
Member

I get your point. The to_json method respects Pydantic attributes (like alias) and filters out None values, while your overrides of __str__ produce nicely formatted JSON.

However, to_json outputs JSON, not valid Python code like __repr__. We need a way to combine the best of both without duplicating effort.

Let me know if my understanding is correct. Meanwhile, let me think this through.

Yup, that's right.

Yes, unfortunately pydantic doesn't deserialize to Python. Calling to_json on each individual field (the only way I can think of doing it at the mo) makes little sense. So, as we'll only need __repr__() for these base classes (I think), it'll have minimal impact to add them.

Perhaps @marioevz can chime in, if he has a better idea? 🙏

Sorry for the detour and that I couldn't give you a better answer earlier @raxhvl.

@raxhvl
Copy link
Contributor Author

raxhvl commented Oct 29, 2024

I have been marinating on this. I believe your intuition is correct: We should implement a unified serialization pipeline. This approach not only eliminates duplicate (tedious) efforts but also ensures that the framework produces consistent data outputs, whether for fixtures or gentests. Fundamentally, gentest should be as close to manual tests as possible.

Our goal is to serialize a Pydantic model to python code. After trying out few things, I have a solution.

We can override __repr__args__ in the root class, which feeds into Pydantic's __repr__. This will ensure that JSON serialization and __repr__ are consistent.

Here is a full example:

from pydantic import BaseModel, Field
from typing import List


class RecipeBaseModel(BaseModel):
   
    def __repr_args__(self):
       # Skip tags
        attrs_names = self.model_dump(mode="json", exclude={"tags"}).keys()
        attrs = ((s, getattr(self, s)) for s in attrs_names)
        return [(a, v) for a, v in attrs if v is not None]


# Example child model
class Ingredient(RecipeBaseModel):
    name: str
    quantity: str
    tags: List[str]


# Example nested child model
class Recipe(RecipeBaseModel):
    title: str
    ingredients: List[Ingredient]


# Example usage
ingredient1 = Ingredient(name="Tomato", quantity="2 cups", tags=["vegetable", "fresh"])
ingredient2 = Ingredient(name="Basil", quantity="1 bunch", tags=["herb", "fresh"])
recipe = Recipe(title="Tomato Basil Salad", ingredients=[ingredient1, ingredient2])

print(repr(ingredient1))  # Ingredient(name='Tomato', quantity='2 cups')
print(
    repr(recipe)
)  # Recipe(title='Tomato Basil Salad', ingredients=[Ingredient(name='Tomato', quantity='2 cups'), Ingredient(name='Basil', quantity='1 bunch')])

I know this will take some effort to implement correctly. But I think incentive to unify serialization will go down as the code diverges because of the refactoring involved.

Let me know if we should pursue this or override __repr__ everywhere for now.

@danceratopz
Copy link
Member

I have been marinating on this. I believe your intuition is correct: We should implement a unified serialization pipeline. This approach not only eliminates duplicate (tedious) efforts but also ensures that the framework produces consistent data outputs, whether for fixtures or gentests. Fundamentally, gentest should be as close to manual tests as possible.

Our goal is to serialize a Pydantic model to python code. After trying out few things, I have a solution.

We can override __repr__args__ in the root class, which feeds into Pydantic's __repr__. This will ensure that JSON serialization and __repr__ are consistent.

Here is a full example:

from pydantic import BaseModel, Field
from typing import List


class RecipeBaseModel(BaseModel):
   
    def __repr_args__(self):
       # Skip tags
        attrs_names = self.model_dump(mode="json", exclude={"tags"}).keys()
        attrs = ((s, getattr(self, s)) for s in attrs_names)
        return [(a, v) for a, v in attrs if v is not None]


# Example child model
class Ingredient(RecipeBaseModel):
    name: str
    quantity: str
    tags: List[str]


# Example nested child model
class Recipe(RecipeBaseModel):
    title: str
    ingredients: List[Ingredient]


# Example usage
ingredient1 = Ingredient(name="Tomato", quantity="2 cups", tags=["vegetable", "fresh"])
ingredient2 = Ingredient(name="Basil", quantity="1 bunch", tags=["herb", "fresh"])
recipe = Recipe(title="Tomato Basil Salad", ingredients=[ingredient1, ingredient2])

print(repr(ingredient1))  # Ingredient(name='Tomato', quantity='2 cups')
print(
    repr(recipe)
)  # Recipe(title='Tomato Basil Salad', ingredients=[Ingredient(name='Tomato', quantity='2 cups'), Ingredient(name='Basil', quantity='1 bunch')])

I know this will take some effort to implement correctly. But I think incentive to unify serialization will go down as the code diverges because of the refactoring involved.

Let me know if we should pursue this or override __repr__ everywhere for now.

Thanks @raxhvl for taking the time to look so deeply into this! This looks like a great approach to unify our serialization and think we can start adding __repr__args__ to our types (or to a new class that the types inherit from).

If I understand correctly, the code paths for serializing the JSON fixture files would remain unchanged? But now we use json.dumps() within the __repr_args__ method to achieve compatibility? In this case, this doesn't seem like a huge change to the codebase (and certainly not a breaking change). But perhaps I've misunderstood how intrusive this is. If I'm right, you could try adding these methods upfront to achieve your goal for gentest now which would allow us to better evaluate the change.

@danceratopz
Copy link
Member

As we discussed, whenever you have time, feel free to just go ahead and add either:

  1. __repr_args__ to a new base class in https://github.com/ethereum/execution-spec-tests/blob/9a109298161db2cf5986bde0781a4d28d3ccf44f/src/ethereum_test_base_types/pydantic.py
  2. Or to the classes you explicitly need for gentest.

As we discussed, doing 1. would be cleaner and probably easier, but 2.'s fine if you'd like to just get a proof of concept. :)

Thanks again for looking into this. Another advantage of unifying pydantic with this serialization to Python is that all the fields that are None will be correctly skipped in the generated test module. I'm referring to the None fields in your screenshot here:
https://github.com/ethereum/execution-spec-tests/blob/9a109298161db2cf5986bde0781a4d28d3ccf44f/src/ethereum_test_base_types/pydantic.py

@raxhvl
Copy link
Contributor Author

raxhvl commented Nov 2, 2024

The new serialization works well

I have added a new base class, here is how it fits in:

Warning

This is outdated. See updated implementation.

classDiagram
   BaseModel <|-- EthereumTestBaseModel
   EthereumTestBaseModel <|-- CopyValidateModel
   CopyValidateModel <|-- CamelModel

   class EthereumTestBaseModel{
    * serialize()
    * __repr_args__()
   }
Loading

The serialize method is a thin wrapper over model_dump which __repr_args__ consumes. The __repr__ logic is working fine. It ignores None fields, supports custom representation if required, and falls back to __str__ representation.

I tried to fill the newly generated test and it succeeds.

 uv run gentest 0xa41f343be7a150b740e5c939fa4d89f3a2850dbe21715df96b612fc20d1906be tests/paris/test_0xa41f.py

 uv run fill tests/paris/test_0xa41f.py                                                                      

=== 6 passed, 6 warnings in 5.21s ===

Should we update to_json?

I need your feedback on a change I would like to see but is not directly related to gentest. Currently model_dump is used both within the new serialize method and within to_json.

return input.model_dump(mode="json", by_alias=True, exclude_none=True)

We could replace this with input.serialize() so that serialization is at one place but I'm not sure if all classes does infact inherrit EthereumTestBaseModel. So the code is now duplicated at two places.

We should NOT modify pre-state in the tests

This seems quite unusual to me. Ideally, the pre-state tracer should provide the state for all accounts prior to the transaction's execution.

if address == self.transaction.sender:
state_str += f"{pad}nonce={self.transaction.nonce},\n"
else:

While the sender's nonce does get updated, that's just one of several side effects resulting from the transaction. Changing the state within the test interferes with the EVM - a big red flag 🟥 . We shouldn't be updating the state ourselves; that's the EVM's responsibility. The test should focus solely on comparing the state transition against the specification.

@raxhvl raxhvl marked this pull request as ready for review November 2, 2024 15:58
Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @raxhvl, this is looking very clean!

Can you rebase on main, this will get some important CI fixes, that will help getting this merged more quickly :)

Few comments:

  1. Yes, I think we should replace model_dump in to_json. You can verify the fixtures locally to ensure that this doesn't break anything by using the hasher cli. Fill the tests twice: Once with main and once with raxhvl:feat/gentest to separate, clean directories using --output=fixtures-raxhvl, for example. Use -n auto to enable parallelism when filling. Then:
uv run hasher fixtures-main/ > hasher-main.txt
uv run hasher fixtures-raxhv/l > hasher-raxhvl.txt
diff fixtures-main.txt hasher-raxhvl.txt
  1. Completely agree that we shouldn't modify any pre-state. I'm not sure why this was originally added.
  2. Side note: Formatting the generated code using black is a great idea, just be aware that black will get switched out to ruff soonish, see feat(all): add ruff as default linter and additional clean up #922. So don't invest too much more time with black - switching to ruff as it stands in this PR should be very easy.

I'd love to get @marioevz's feedback on this. If you'd like to do point 1. up front and check the fixtures, go for it and request a review from him please. If you'd like to get feedback first, we can request one now 🙂

@raxhvl
Copy link
Contributor Author

raxhvl commented Nov 4, 2024

  • Rebased on top of main

re: black to ruff migration.

The code uses cli to format code as long as ruff has a cli interface to format. The migration is trivial (<2 lines of code)

It would be nice if Mario reviews this once we are done.

Copy link
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work so far, thanks!

Just a single comment (the other comment about the fork name is not really something that we need to do right now).

src/ethereum_test_base_types/pydantic.py Outdated Show resolved Hide resolved
@raxhvl
Copy link
Contributor Author

raxhvl commented Nov 5, 2024

All models are now using the same serialization logic. We have two types of custom models: one for BaseModel and one for RootModel. The customization for both of these models is handled through a shared mixin.

classDiagram
   BaseModel <|-- EthereumTestBaseModel
   RootModel <|-- EthereumTestRootModel
   ModelCustomizationsMixin <|-- EthereumTestBaseModel
   ModelCustomizationsMixin <|-- EthereumTestRootModel

   class ModelCustomizationsMixin{
    serialize()
    __repr_args__()
   }
Loading

I have tested this thoroughly, but please review it with care as this change impacts EVERY Pydantic model in the codebase. I'm especially unsure of EthereumTestRootModel override.

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really impressive, great addition! Thanks again @raxhvl for taking the time to find such an elegant solution for this!

@danceratopz
Copy link
Member

I have tested this thoroughly, but please review it with care as this change impacts EVERY Pydantic model in the codebase. I'm especially unsure of EthereumTestRootModel override.

Hey @raxhvl, in regards to this, it looks good to me, I've asked @marioevz if he'd also like to take a look. I diff'd the fixtures locally and got an exact match.

Copy link
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for putting so much thought and care into this!

@marioevz marioevz changed the title ♻️ refactor(gentest): Improved architecture ♻️ refactor(gentest,base_types): Improved architecture, implement EthereumTestBaseModel, EthereumTestRootModel Nov 6, 2024
@marioevz marioevz merged commit 026af18 into ethereum:main Nov 6, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

✨(gentest): ensure the generated python test module passes tox checks
3 participants