SIMD-0057: Events #57
Conversation
proposals/0057-events.md
actual application. It also causes unnecessary overhead in replay as a new
execution context needs to be created, for something as trivial as a log
message. There is a potential security risk in this system in that external
programs could spoof anchor events (and the program emitting them) with CPI
The spoofing risk was mitigated by adding in a PDA event authority that must sign the self CPI
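For anyone unfamiliar with the mechanism, here is a minimal C sketch of that mitigation, assuming Anchor-style CPI events; the seed string, account handling, and function name are illustrative, not part of this proposal:

```
#include <solana_sdk.h>

// Sketch: the program invokes itself and passes a PDA "event authority"
// as a required signer. Only the program itself can produce the PDA
// signature via invoke_signed, so external programs cannot spoof its
// events. Seed string and names are illustrative assumptions.
#define EVENT_AUTHORITY_SEED "__event_authority"

uint64_t emit_via_self_cpi(SolPubkey *program_id,
                           SolAccountInfo *event_authority,
                           uint8_t bump,
                           uint8_t *event_data, uint64_t event_data_len) {
  SolAccountMeta metas[] = {
      // is_writable = false, is_signer = true: the event authority PDA
      // must sign, which only this program can arrange.
      {event_authority->key, false, true},
  };
  const SolInstruction ix = {program_id, metas, 1, event_data,
                             event_data_len};
  const SolSignerSeed seeds[] = {
      {(const uint8_t *)EVENT_AUTHORITY_SEED,
       sizeof(EVENT_AUTHORITY_SEED) - 1},
      {&bump, 1},
  };
  const SolSignerSeeds signer = {seeds, SOL_ARRAY_SIZE(seeds)};
  return sol_invoke_signed(&ix, event_authority, 1, &signer, 1);
}
```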
just to confirm, does this require to add another account per program that wants to emit logs to the transaction?
Yep, but it’s only 34 bytes. (0 data, only pubkey + flags)
I fully support the development of more robust indexing solutions + persistent event data, but it's hard for me to see a strict benefit in the near-term future of native events over CPI events (which is admittedly a hack). From a resource allocation standpoint, this feels like large engineering lift + ecosystem buy-in lift for insufficient gain IMO. No opposition if a native solution gets prioritized and implemented with 5x improvement to existing devex.
proposals/0057-events.md
```
#include <sol/emit.h>
void sol_emit_(/* r1 */ uint64_t len,
               /* r2 */ const char *message);
```
How is the "Discriminator" expressed?
Also it seems better that the program specify the addresses it wants to associate with the event no? This would be part of the price, and allow the indexing backends to only persist the addresses that actually matter instead of requiring every address in the transaction.
discriminator: not at all, i thought we could get by without supporting it explicitly in the standard. clients can decide to have an index on the program id + first n (1..32) bytes for performance reasons, but that's an implementation detail.
indexing backends anyway have a gsfa index; an additional discriminator -> signature index can be used for index joins. do you think that's unrealistic?
sorry, what's a gsfa?
> clients can decide to have an index on the program id + first n (1..32) bytes for performance reasons, but that's an implementation detail.

i think leaving this as an implementation detail means we could then expect to see different behaviours across rpc endpoints, but that's the problem with sol_log() that this proposal is attempting to solve though
gsfa = getSignaturesForAddress
imo this is very application specific; i would expect the program IDL to declare to the indexing provider which parts of the event (for each event type) need indexing and hence which queries can have improved latency. the difference here is that the data query API is defined, which is important for compatibility, so same query bytes -> same result bytes. latency is still dependent on the rpc provider; it's not something you could enforce via spec.
we could add the discriminator ofc. but i thought at this point it's better to just leave it out of the spec to keep stuff lightweight and implement an 8 byte prefix index in the reference client.
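a minimal sketch of that convention, purely illustrative (none of these names would be part of the spec):

```
#include <stdint.h>

// Client-side convention only: the syscall itself would not define a
// discriminator; indexers could treat the first 8 bytes of the payload
// as one. All names here are illustrative assumptions.
typedef struct {
  uint8_t discriminator[8]; // e.g. first 8 bytes of a hash of the event name
  uint8_t body[];           // program-specific event data
} event_payload;

// An indexer could then maintain a secondary index keyed on
// (program_id, discriminator) -> transaction signature, and join it with
// the existing gSFA (getSignaturesForAddress) index.
```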
> that this proposal is attempting to solve though
just want to comment again on this:
the proposal solves provider-specific truncation and the lack of a query API schema; basically it enforces data consistency across providers.
it does not guarantee reliable query performance, similar to how the SQL spec can't enforce query performance.
proposals/0057-events.md
```
#include <sol/emit.h>
void sol_emit_(/* r1 */ uint64_t len,
               /* r2 */ const char *message);
```
@mschneider OCD thing - In other syscalls, pointer comes first and length afterwards. Could we swap arguments here?
sure
The events should have a sequence number so that I can "subscribe" to an event stream and restart at a specific sequence number. This would match up with the event behavior for aptos, sui, and (via a hack or two) algorand.
@jumpsiegel wrong thread to post the comment, but I second that
the events on stream and when queried via rpc call include a sequence number. just to clarify, would it be enough to be able to send a request to backfill missing events (once you notice a discontinuity)? i find stream semantics for historic events hard to reason about. curious to see the APIs you mentioned, could you drop me the links @jumpsiegel ?
@jumpsiegel i added a way to query events directly and included the sequence number for that purpose. Due to the way solana randomizes block entry execution, there is some weirdness around how events can change their order inside the response; I clarified that. Please let me know if this is sufficient.
I must admit, I am confused by this statement. Does this mean different validators might have events with different sequence numbers? I.e., the sequence number is non-deterministic and not part of consensus. The effect of this would be that if I change event providers (validators) while going through a load balancer, the sequence numbers would change and I would find myself scrambling to figure out what I have missed?
Yes, that's the current behaviour of gSFA. @mvines mentioned it's possible to clean up, so I'll update the proposal accordingly. I think it's a huge improvement to change this part.
in general sequence numbers can still change due to block re-orgs. i updated the proposal text, would appreciate a final review and ideally thumbs-up, so we can get started on implementation :)
proposals/0057-events.md
The response is sorted by the following priorities:

1. slots ascending
2. signature order for transactions inside a block
Rather than trying to order by signature here, there is a single natural ordering of how transactions are placed in entries and then in a block. This order is common across the cluster (solana-ledger-tool outputs it). During replay, the entry transactions are intentionally shuffled randomly, but if that shuffling didn't exist then the entire cluster would replay entries the same way.
Oh, I wasn't aware. If we can keep that ordering alive across replay, indexing becomes even easier. Will adapt.
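for illustration, the resulting ordering could be expressed like this (field names are assumptions, not part of the proposal text):

```
#include <stdint.h>

// Sketch of the ordering discussed above: slot ascending, then the
// natural entry/transaction order within the block rather than
// signature order.
typedef struct {
  uint64_t slot;           // slot the event's transaction landed in
  uint64_t index_in_block; // position in the block's entry ordering
} event_position;

static int event_position_cmp(const event_position *a,
                              const event_position *b) {
  if (a->slot != b->slot)
    return a->slot < b->slot ? -1 : 1;
  if (a->index_in_block != b->index_in_block)
    return a->index_in_block < b->index_in_block ? -1 : 1;
  return 0;
}
```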
proposals/0057-events.md
Bank deltas are currently hashed via sha256; the sha256 syscall charges
85 + 0.5 CU / byte. blake3, a hash roughly 2x faster to compute, would be
sufficient as well, hence 100 + 0.25 * len CU would be a good approximation
Current best implementations of BLAKE3 are significantly slower than SHA-256 below 4kB
https://forum.solana.com/t/blake3-slower-than-sha-256-for-small-inputs/829
I have an experimental rewrite that makes BLAKE3 fast again, but that won't get backported to Rust anytime soon.
Therefore, I think we should use the SHA256 pricing instead.
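For reference, a quick side-by-side of the two pricing formulas quoted in this thread (integer CU approximations):

```
#include <stdint.h>

// sha256 syscall pricing quoted above: 85 + 0.5 CU per byte.
// Proposed blake3-based pricing:       100 + 0.25 CU per byte.
static uint64_t sha256_cost(uint64_t len)   { return 85 + len / 2; }
static uint64_t proposed_cost(uint64_t len) { return 100 + len / 4; }

// The curves cross at len = 60 bytes (both 115 CU); for a typical
// 512-byte event, sha256_cost(512) = 341 CU vs proposed_cost(512) = 228 CU.
```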
Ack.
I would rather bring these fixes to the existing log mechanism instead of adding a new log facility with a different name.
I like some important fixes this SIMD proposes, but we should be able to bring those to the current log mechanism:
- Removing truncation
- Supporting binary logs (no more UTF-8 validation)
- Accounting storage cost in fees
Supporting two log mechanisms is more complicated for the developers of programs, frontends, API infrastructure, and core developers. Retiring the old log mechanism is not an option, since it's unreasonable to expect all programs to adopt the new event API (some are immutable)
Keeping a unified log mechanism could look like so:
- Add "log v2" core and RPC APIs
- Expand the functionality of the log collector to support both binary (v2) and text logs (v1/v2)
- Enforce the fee changes onto log v1
- Ensure that all messages are visible via either API
Just a drive-by comment (sorry!) since someone asked about this. If we reused the existing log facility, how would RPC be able to tell whether something is an event or a dumb log? Would it filter/match all logs? That seems terrible perf-wise.
Previously: https://forum.solana.com/t/srfc-00004-native-events-program/32. cc/ @ngundotra
```
#include <sol/emit.h>
void sol_emit_(/* r1 */ const char *message,
               /* r2 */ uint64_t len);
```
const uint8_t *message, unless it's really only characters
actually why not an array of slices, that saves the bpf program from having to concatenate
this looks like sol_log_data() with some RPC magic; why not reuse sol_log_data()?
In fact, sol_log_data() was introduced for events (with Solang)
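For comparison, a hedged sketch of how the existing slice-based call is used; the signature is reproduced from memory and should be treated as approximate:

```
#include <solana_sdk.h>

// Existing interface takes an array of byte slices, so the program does
// not need to concatenate before logging. Approximate shape:
//   typedef struct { const uint8_t *addr; uint64_t len; } SolBytes;
//   void sol_log_data(SolBytes *fields, uint64_t fields_len);

void log_two_part_event(void) {
  uint8_t discriminator[8] = {0}; // illustrative payload parts
  uint8_t body[3] = {1, 2, 3};
  SolBytes fields[] = {
      {discriminator, sizeof(discriminator)},
      {body, sizeof(body)},
  };
  sol_log_data(fields, SOL_ARRAY_SIZE(fields));
}
```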
change of CU cost because the size restriction is no longer a client-side decision. sol_log_ truncates for programs that log a lot of data, depending on the validator configuration
### Geyser client
we really should not be adding new interfaces in geyser that are also broken for non-rust languages.
we eventually need to fix the old ones, but imo we shouldn't add new ones that are also broken - please make the new Event structs ffi-safe.
had a look at this PR anza-xyz/agave#2125 and updated the events struct, lmk if that works for you
proposals/0057-events.md
event payloads but most events are <512b in length. This cost enforces
theoretical limits on the amount of logs per transaction & block:
Limit per 200k CU: 800kB
Limit per 48M CU: 200MB
It seems odd to me that this limit, which is per block, is higher than the amount of state growth we allow per block.
That to me implies that the CU cost is too low. Why would we let events be emitted faster than our state growth if infrastructure has to store both?
Happy to go with double the cost per byte, if that simplifies the process moving forward.
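For reference, the quoted limits follow directly from the 0.25 CU/byte term: 200,000 CU / 0.25 CU per byte = 800,000 bytes (~800 kB per transaction), and 48,000,000 CU / 0.25 CU per byte = 192,000,000 bytes (~200 MB per block). Doubling the price to 0.5 CU/byte would halve both, to ~400 kB and ~96 MB.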
In order to log an event, the SVM runtime must implement a new syscall
`sol_emit_`. The runtime must charge 100 + 0.25 * len CU per invocation to
account for the potential cost of including a blake3 hash of all log messages
in consensus.
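A hedged usage sketch under that pricing; the payload and wrapper name are made up, and the argument order follows the pointer-first revision discussed earlier in the thread:

```
#include <sol/emit.h>

// Illustrative wrapper: emit a small fixed payload and note the CU cost
// the proposed formula would charge.
void emit_fill_event(void) {
  const uint8_t payload[16] = {0}; // e.g. 8-byte discriminator + body
  sol_emit_((const char *)payload, sizeof(payload));
  // Charged 100 + 0.25 * 16 = 104 CU under the proposed pricing.
}
```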
I don't see a section specifying how this would be included in consensus, what's the plan here?
There is no concrete plan yet. I wanted to leave that open for a future SIMD, as the roadmap towards light clients is not clear yet.
With this being outside of consensus, I'm not sure how this is different than logs wrt forcing operators to save the data. For RPCs, sure they'll likely need to save it to serve users. But for normal validator operators, couldn't they just drop the logs after execution, or even patch logic to not emit events at all? Even RPCs, if they didn't want to offer this service, could drop them. I guess what I'm missing is how this is different than logs where we truncate, since I'm not seeing anything that prevents me from truncating. With my current reading, this effectively seems like a specific way to do logging.
Because we have an opportunity to do better than logs here, I wanted to ask if we can offer guarantees that an event hasn't been modified. Let's imagine that I was building some event-driven system that listened for events on the network (listing/oracle/order/etc) and took some action involving a non-trivial amount of value. I would want to know that the event was:
1. actually emitted by the program it claims to come from (not spoofed)
2. delivered to me with its content unmodified

Regarding the existing userspace implementation of events atop self-CPI, @jarry-xiao mentioned that the ‘spoofing risk was mitigated by adding in a PDA event authority that must sign the self CPI.’ That's good. For the next point though – for the same reason that RPCs are free to truncate logs, I don't think there's anything to stop a malicious RPC from modifying the content of the log in a way that the consumer can't detect. I don't have any ideas of how to do this. Normally I'd produce an HMAC and include it in the content of the event data itself, but at the point where the program is emitting the event, there's nothing secret available to include in the HMAC, and at the point where the signers are signing the transaction they don't know what event data will be emitted. Anyone have any thoughts?
@steveluscher Regarding log authentication, there's a separate proposal (SIMD-0064) to hash transaction execution metadata into the bank hash. We could include logs (or events) there too. Then, using an inclusion proof, you'd be able to verify the existence and origin of a log message against a bank hash. The bank hash can be partially verified by looking at votes from a handful of validators.
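To make the inclusion-proof step concrete, here is a minimal sketch assuming a binary Merkle tree over per-event hashes; the hash function, domain separation, and tree layout are all assumptions rather than anything SIMD-0064 has fixed:

```
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t bytes[32]; } hash32;

// Stand-in for whatever hash the final design settles on.
extern void hash_concat(const uint8_t left[32], const uint8_t right[32],
                        hash32 *out);

// Walk from a leaf (the hash of one event) up to the root, combining with
// the sibling hashes from the proof; the event is included iff the
// reconstructed root matches the root committed in the bank hash.
bool verify_inclusion(hash32 leaf, const hash32 *siblings,
                      const bool *leaf_is_right, uint64_t depth,
                      const hash32 *committed_root) {
  for (uint64_t i = 0; i < depth; i++) {
    hash32 next;
    if (leaf_is_right[i])
      hash_concat(siblings[i].bytes, leaf.bytes, &next);
    else
      hash_concat(leaf.bytes, siblings[i].bytes, &next);
    leaf = next;
  }
  return memcmp(leaf.bytes, committed_root->bytes, 32) == 0;
}
```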
@ripatel-fd I had a read-through of SIMD-64; please let me know if I correctly summarized it. I also have a couple of questions:
Propose a new event logging mechanism that is reliable and efficient.
Browser clients should be able to receive emitted events with minimal latency.
3rd-party indexers should make it possible to horizontally scale access to
historic events.