Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abi analyzer #540

Merged
merged 1 commit into from
Dec 10, 2023
Merged

abi analyzer #540

merged 1 commit into from
Dec 10, 2023

Conversation

Andrew7234
Copy link
Collaborator

@Andrew7234 Andrew7234 commented Oct 6, 2023

Part of #872

Adds an analyzer to re-parse event body, tx data, and tx revert reason using known abis.

Note: does not require a reindex

@pro-wh
Copy link
Collaborator

pro-wh commented Oct 6, 2023

👍

@Andrew7234 Andrew7234 force-pushed the andrew7234/abi-analyzer branch from ac4641a to d113244 Compare October 6, 2023 23:52
@Andrew7234 Andrew7234 force-pushed the andrew7234/abi-analyzer branch from d113244 to 99117df Compare November 7, 2023 05:32
@Andrew7234 Andrew7234 force-pushed the andrew7234/abi-analyzer branch from b4350c5 to 62a0a5c Compare November 14, 2023 20:36
@Andrew7234 Andrew7234 marked this pull request as ready for review November 15, 2023 08:00
Copy link
Contributor

@mitjat mitjat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks good, only one substantial comment around error handling.

Sorry for the late reply; I did do an early pass of the PR, but I see now that it remained unsubmitted. It doesn't change anything though.

-- For evm.Call transactions, we store both the name of the function and
-- the function paramters.
ADD COLUMN evm_fn_name TEXT,
ADD COLUMN evm_fn_params JSONB,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With events, there was some ugliness around anonymous fields. Fns can have anonymous params, and errors can presumably have anonymous fields (I'm not 100% sure, not checking right now). Can JSONB handle those or should this and error_params be a list of (k, v) tuples?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha, looks like this JSONB will always be a list, and the items of the list will be just the values v of the (k, v) tuples I was talking about. So to fully render/understand the params, you need to JOIN the abi. I think that's fair. Can you add a little comment to document the structure of this JSON please?

Copy link
Collaborator Author

@Andrew7234 Andrew7234 Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit*: I misread which line this comment was on - yes, for transactions it's a json list of values only. Will add the comment

-- original comment (about how event args are processed)
Hm actually the JSONB will have a list of (k,v) tuples, where k is the param name and v is the param value. It shouldn't need to be JOINed with the abi since we do that during event parsing already.

From my understanding the mapping of event param names to values should be 1:1. Warren's ParseEvent extracts the input values whether they're stored in the topics or in the event data, which should be exhaustive. We fetch param names from the Event Inputs, which should also be exhaustive since go eth will autogenerate names for unnamed args.

Copy link
Collaborator Author

@Andrew7234 Andrew7234 Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, after seeing the evm_fn_params for some nftrout calls, it's much easier to interpret as a list of structured args similar to what we do for events.

[{"name": "_taskIds", "value": [55, 56, 57, 58, 59, 60, 61]
, "evm_type": "uint256[]"}, {"name": "_proof", "value": "", "evm_type": "bytes"}, {"name": "_report", "value": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAABoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAmAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACwAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAMgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADtiYWZ5cmVpZ3gyZXA1Mm43NHV5dXhxeTR1ZjVvNHl1NGI3ZmRzcHc1d2wzdmx4ZG1mdm1nZWRjeTdqaQAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADtiYWZ5cmVpY2l5cHVocDdneTJoeXB3bGJmbTd3Z2VsNGdoZmpyb2ZxeGFxem9lYWF4cGd3aW4zNG03ZQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAADtiYWZ5cmVpY29jdnUzaTVkMnVpcjJta3pjaXRvZmlvdjYzd3Bja3hnZ2pzdXUyZWRjdmZsdnJub2J2cQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADtiYWZ5cmVpZGZ3Z3FzNHhnNDRr
cjJqamV0a2w2aDQ1ZHJtcm41M3ZpYzRzZDZtdWxlcGtkenpvdmJ4bQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADtiYWZ5cmVpYWk3cHNqdTZ5NWxhZnJtb3Fzb25jZDQ2dzZ5a2ZqenZ1cG
dsNG9penBhNG5uc241aTI2dQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADtiYWZ5cmVpaGJkazM2cmQ0emx6b3I0NmJ1Y3JxZXpnbGZsY3piaTIyaXBheWd5NXZuaTZtcXVxZncyNAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADtiYWZ5cmVpYmM1dGxzM2R6cWprM2RiN3lsbmZ6bzNldHY3d241c2NsMmM3NXlpc2xtYnZxNGZ3dDI0aQAAAAAA", "evm_type": "bytes"}]

storage/migrations/22_runtime_abi.up.sql Outdated Show resolved Hide resolved
analyzer/queries/queries.go Outdated Show resolved Hide resolved
analyzer/queries/queries.go Outdated Show resolved Hide resolved
analyzer/queries/queries.go Outdated Show resolved Hide resolved
analyzer/queries/queries.go Outdated Show resolved Hide resolved
analyzer/queries/queries.go Outdated Show resolved Hide resolved
analyzer/evmabibackfill/evm_abi_backfill.go Outdated Show resolved Hide resolved
storage/migrations/02_runtimes.up.sql Outdated Show resolved Hide resolved
@Andrew7234 Andrew7234 force-pushed the andrew7234/abi-analyzer branch from 62a0a5c to 45fad28 Compare November 28, 2023 07:14
Copy link
Contributor

@mitjat mitjat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I realized there's another gotcha around events that might in require a little more performance testing/tweaking. Sorry we didn't think of it before. But in general, getting there :)

analyzer/queries/queries.go Outdated Show resolved Hide resolved
Comment on lines 943 to 950
FROM abi_contracts
JOIN chain.runtime_transactions as txs ON
txs.runtime = abi_contracts.runtime AND
txs.to = abi_contracts.addr AND
txs.method = 'evm.Call'
JOIN chain.runtime_events as evs ON
evs.runtime = abi_contracts.runtime AND
evs.tx_hash = txs.tx_hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on today's slack conversation with Warren, we actually want to avoid joining on the txs because the contract that was the target of tx is not necessarily also the contract that emitted the event. The contract that emitted the event is in the event's body.address. So something like

Suggested change
FROM abi_contracts
JOIN chain.runtime_transactions as txs ON
txs.runtime = abi_contracts.runtime AND
txs.to = abi_contracts.addr AND
txs.method = 'evm.Call'
JOIN chain.runtime_events as evs ON
evs.runtime = abi_contracts.runtime AND
evs.tx_hash = txs.tx_hash
FROM abi_contracts
JOIN chain.runtime_events as evs ON
evs.type = 'evm.log' AND
evs.runtime = abi_contracts.runtime AND
decode(evs.body->>'address', 'base64') = eth_preimage(abi_contracts.contract_address)

That might require an index on body.address (IDK if it's needed, and/or if it would help), or maybe even a generated column for decode(evs.body->>'address', 'base64') and then an index on top of that. See how it goes with the simplest approach first :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm it's slow, even with a generated column. Seems like the eth_preimage fn is the bottleneck - will try to just join over address_preimages instead

oasisindexer=*> explain analyze
WITH abi_contracts AS (
      SELECT
        runtime,
        contract_address AS addr,
        abi
      FROM chain.evm_contracts
      WHERE
        runtime = 'emerald' AND
        abi IS NOT NULL
    )
    SELECT
      abi_contracts.addr,
      abi_contracts.abi,
      evs.round,
      evs.tx_index,
      evs.body
    FROM abi_contracts
    JOIN chain.runtime_events as evs ON
      evs.type = 'evm.log' AND
      evs.runtime = abi_contracts.runtime AND
      evm_contract_address_eth = eth_preimage(abi_contracts.addr)
    LIMIT 1000;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..181689.03 rows=1000 width=982) (actual time=42717.383..61984.504 rows=197 loops=1)
   ->  Nested Loop  (cost=0.00..2232049.74 rows=12285 width=982) (actual time=42717.381..61984.447 rows=197 loops=1)
         Join Filter: (evs.evm_contract_address_eth = eth_preimage(evm_contracts.contract_address))
         Rows Removed by Join Filter: 326507
         ->  Seq Scan on runtime_events evs  (cost=0.00..1580270.46 rows=614246 width=215) (actual time=15.804..55033.975 rows=46672 loops=1)
               Filter: ((runtime = 'emerald'::runtime) AND (type = 'evm.log'::text))
               Rows Removed by Filter: 20543893
         ->  Materialize  (cost=0.00..678.53 rows=4 width=807) (actual time=0.000..0.001 rows=7 loops=46672)
               ->  Seq Scan on evm_contracts  (cost=0.00..678.51 rows=4 width=807) (actual time=1.088..2.911 rows=7 loops=1)
                     Filter: ((abi IS NOT NULL) AND (runtime = 'emerald'::runtime))
                     Rows Removed by Filter: 7988
 Planning Time: 3.994 ms
 Execution Time: 61984.703 ms
(13 rows)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 is simple to use but not very efficient. That said, I thought it wouldn't matter here because we have so few verified contracts. But the query plan doesn't choose to go contract by contract, trying to find matching events for each contract :(

Copy link
Collaborator Author

@Andrew7234 Andrew7234 Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to join on chain.address_preimages. Also added an index on chain.runtime_events (runtime, type), which makes the query performant enough. (note that the query below is for 1000; in practice we'll use a limit of <100))

Edit*: With a limit of 100 it takes ~80ms, though some of the benefit may be due to caching

oasisindexer=*> explain analyze
WITH abi_contracts AS (
      SELECT
        runtime,
        contract_address AS addr,
        abi
      FROM chain.evm_contracts
      WHERE
        runtime = 'emerald' AND
        abi IS NOT NULL
    )
    SELECT
      abi_contracts.addr,
      abi_contracts.abi,
      evs.round,
      evs.tx_index,
      evs.body
    FROM abi_contracts
    join chain.address_preimages as preimages on abi_contracts.addr = preimages.address JOIN chain.runtime_events as evs ON
      evs.type = 'evm.log' AND
      evs.runtime = abi_contracts.runtime AND
      decode(body->>'address', 'base64') = preimages.address_data
    LIMIT 1000;
                                                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10154.37..1170949.61 rows=114 width=982) (actual time=257.164..2266.097 rows=197 loops=1)
   ->  Gather  (cost=10154.37..1170949.61 rows=114 width=982) (actual time=257.162..2266.052 rows=197 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Hash Join  (cost=9154.37..1169938.21 rows=48 width=982) (actual time=1015.169..2243.789 rows=66 loops=3)
               Hash Cond: (decode((evs.body ->> 'address'::text), 'base64'::text) = preimages.address_data)
               ->  Parallel Bitmap Heap Scan on runtime_events evs  (cost=8442.09..1167623.43 rows=256322 width=183) (actual time=5.136..2204.427 rows=15564 loops=3)
                     Recheck Cond: ((runtime = 'emerald'::runtime) AND (type = 'evm.log'::text))
                     Heap Blocks: exact=7413
                     ->  Bitmap Index Scan on ix_runtime_events_type  (cost=0.00..8288.29 rows=615173 width=0) (actual time=9.769..9.769 rows=46692 loops=1)
                           Index Cond: ((runtime = 'emerald'::runtime) AND (type = 'evm.log'::text))
               ->  Hash  (cost=712.23..712.23 rows=4 width=828) (actual time=5.272..5.274 rows=7 loops=3)
                     Buckets: 1024  Batches: 1  Memory Usage: 14kB
                     ->  Nested Loop  (cost=0.41..712.23 rows=4 width=828) (actual time=1.554..5.236 rows=7 loops=3)
                           ->  Seq Scan on evm_contracts  (cost=0.00..678.51 rows=4 width=807) (actual time=1.501..5.043 rows=7 loops=3)
                                 Filter: ((abi IS NOT NULL) AND (runtime = 'emerald'::runtime))
                                 Rows Removed by Filter: 7988
                           ->  Index Scan using address_preimages_pkey on address_preimages preimages  (cost=0.41..8.43 rows=1 width=68) (actual time=0.024..0.024 rows=1 loops=21)
                                 Index Cond: ((address)::text = (evm_contracts.contract_address)::text)
 Planning Time: 1.216 ms
 Execution Time: 2266.211 ms
(21 rows)

Side note: The generated column for the event address ended up being not needed, but will keep in mind for future performance testing.

analyzer/queries/queries.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mitjat mitjat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

It's unfortunate that our e2e (or other) tests don't currently cover sourcify, and so cannot test this. I assume you tested locally. Luck favors the brave 🙄

@Andrew7234 Andrew7234 force-pushed the andrew7234/abi-analyzer branch from d28ba58 to a116f77 Compare December 8, 2023 03:52
update db schema

initial version of abi analyzer

don't parse result

fix import cycle

proper argument unwrapping

rename migration

nit

update db schema

address comments

address more comments

nit

address comments

address comments

add startup logic

testing fixes

rename migration

nits

nits

enable emerald abi in regression tests (no-op)
@Andrew7234 Andrew7234 force-pushed the andrew7234/abi-analyzer branch from c6824be to c8ce02e Compare December 8, 2023 22:56
@Andrew7234 Andrew7234 merged commit c1b673b into main Dec 10, 2023
6 checks passed
@Andrew7234 Andrew7234 deleted the andrew7234/abi-analyzer branch December 10, 2023 05:02
@Andrew7234 Andrew7234 mentioned this pull request Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants