SIMD-0057: Events #57
Conversation
proposals/0057-events.md
actual application. It also causes unnecessary overhead in replay as a new
execution context needs to be created, for something as trivial as a log
message. There is a potential security risk in this system in that external
programs could spoof anchor events (and the program emitting them) with CPI
The spoofing risk was mitigated by adding in a PDA event authority that must sign the self CPI
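For anyone unfamiliar with the mechanism, here is a minimal C sketch of that mitigation, assuming Anchor-style CPI events; the seed string, account handling, and function name are illustrative, not part of this proposal:

```
#include <solana_sdk.h>

// Sketch: the program invokes itself and passes a PDA "event authority"
// as a required signer. Only the program itself can produce the PDA
// signature via invoke_signed, so external programs cannot spoof its
// events. Seed string and names are illustrative assumptions.
#define EVENT_AUTHORITY_SEED "__event_authority"

uint64_t emit_via_self_cpi(SolPubkey *program_id,
                           SolAccountInfo *event_authority,
                           uint8_t bump,
                           uint8_t *event_data, uint64_t event_data_len) {
  SolAccountMeta metas[] = {
      // is_writable = false, is_signer = true: the event authority PDA
      // must sign, which only this program can arrange.
      {event_authority->key, false, true},
  };
  const SolInstruction ix = {program_id, metas, 1, event_data,
                             event_data_len};
  const SolSignerSeed seeds[] = {
      {(const uint8_t *)EVENT_AUTHORITY_SEED,
       sizeof(EVENT_AUTHORITY_SEED) - 1},
      {&bump, 1},
  };
  const SolSignerSeeds signer = {seeds, SOL_ARRAY_SIZE(seeds)};
  return sol_invoke_signed(&ix, event_authority, 1, &signer, 1);
}
```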
just to confirm, does this require to add another account per program that wants to emit logs to the transaction?
Yep, but it’s only 34 bytes. (0 data, only pubkey + flags)
I fully support the development of more robust indexing solutions + persistent event data, but it's hard for me to see a strict benefit in the near-term future of native events over CPI events (which is admittedly a hack). From a resource allocation standpoint, this feels like large engineering lift + ecosystem buy-in lift for insufficient gain IMO. No opposition if a native solution gets prioritized and implemented with 5x improvement to existing devex.
proposals/0057-events.md
```
#include <sol/emit.h>
void sol_emit_(/* r1 */ uint64_t len,
               /* r2 */ const char *message);
```
How is the "Discriminator" expressed?
Also it seems better that the program specify the addresses it wants to associate with the event no? This would be part of the price, and allow the indexing backends to only persist the addresses that actually matter instead of requiring every address in the transaction.
discriminator: not at all, i thought we could get by without supporting it explicitly in the standard. clients can decide to have an index on the program id + first n (1..32) bytes for performance reasons, but that's an implementation detail.
indexing backends anyway have a gsfa index; an additional discriminator -> signature index can be used for index joins. do you think that's unrealistic?
sorry, what's a gsfa?
> clients can decide to have an index on the program id + first n (1..32) bytes for performance reasons, but that's an implementation detail.

i think leaving this as an implementation detail means we could then expect to see different behaviours across rpc endpoints, but that's the problem with sol_log() that this proposal is attempting to solve though
gsfa = getSignaturesForAddress
imo this is very application specific; i would expect the program IDL to declare to the indexing provider which parts of the event (for each event type) need indexing and hence which queries can have improved latency. the difference here is that the data query API is defined, which is important for compatibility, so same query bytes -> same result bytes. latency is still dependent on the rpc provider; it's not something you could enforce via spec.
we could add the discriminator ofc. but i thought at this point it's better to just leave it out of the spec to keep stuff lightweight and implement an 8 byte prefix index in the reference client.
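a minimal sketch of that convention, purely illustrative (none of these names would be part of the spec):

```
#include <stdint.h>

// Client-side convention only: the syscall itself would not define a
// discriminator; indexers could treat the first 8 bytes of the payload
// as one. All names here are illustrative assumptions.
typedef struct {
  uint8_t discriminator[8]; // e.g. first 8 bytes of a hash of the event name
  uint8_t body[];           // program-specific event data
} event_payload;

// An indexer could then maintain a secondary index keyed on
// (program_id, discriminator) -> transaction signature, and join it with
// the existing gSFA (getSignaturesForAddress) index.
```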
> that this proposal is attempting to solve though
just want to comment again on this:
the proposal solves provider-specific truncation and the lack of a query API schema; basically it enforces data consistency across providers.
it does not guarantee reliable query performance, similar to how the SQL spec can't enforce query performance.
proposals/0057-events.md
```
#include <sol/emit.h>
void sol_emit_(/* r1 */ uint64_t len,
               /* r2 */ const char *message);
```
@mschneider OCD thing - In other syscalls, pointer comes first and length afterwards. Could we swap arguments here?
sure
The events should have a sequence number so that I can "subscribe" to an event stream and restart at a specific sequence number. This would match up with the event behavior for aptos, sui, and (via a hack or two) algorand.
@jumpsiegel wrong thread to post the comment, but I second that
the events on stream and when queried via rpc call include a sequence number. just to clarify, would it be enough to be able to send a request to backfill missing events (once you notice a discontinuity)? i find stream semantics for historic events hard to reason about. curious to see the APIs you mentioned, could you drop me the links @jumpsiegel ?
@jumpsiegel i added a way to query events directly and included the sequence number for that purpose. Due to the way solana randomizes block entry execution, there is some weirdness around how events can change their order inside the response; I clarified that. Please let me know if this is sufficient.
I must admit, I am confused by this statement. Does this mean different validators might have events with different sequence numbers? I.e., the sequence number is non-deterministic and not part of consensus. The effect of this would be that if I change event providers (validators) while going through a load balancer, the sequence numbers would change and I would find myself scrambling to figure out what I have missed?
Yes, that's the current behaviour of gSFA. @mvines mentioned it's possible to clean up, so I'll update the proposal accordingly. I think it's a huge improvement to change this part.
in general sequence numbers can still change due to block re-orgs. i updated the proposal text, would appreciate a final review and ideally thumbs-up, so we can get started on implementation :)
proposals/0057-events.md
The response is sorted by the following priorities:

1. slots ascending
2. signature order for transactions inside a block
Rather than trying to order by signature here, there is a single natural ordering of how transactions are placed in entries and then in a block. This order is common across the cluster (solana-ledger-tool outputs it). During replay, the entry transactions are intentionally shuffled randomly, but if that shuffling didn't exist then the entire cluster would replay entries the same way.
Oh, I wasn't aware. If we can keep that ordering alive across replay, indexing becomes even easier. Will adapt.
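for illustration, the resulting ordering could be expressed like this (field names are assumptions, not part of the proposal text):

```
#include <stdint.h>

// Sketch of the ordering discussed above: slot ascending, then the
// natural entry/transaction order within the block rather than
// signature order.
typedef struct {
  uint64_t slot;           // slot the event's transaction landed in
  uint64_t index_in_block; // position in the block's entry ordering
} event_position;

static int event_position_cmp(const event_position *a,
                              const event_position *b) {
  if (a->slot != b->slot)
    return a->slot < b->slot ? -1 : 1;
  if (a->index_in_block != b->index_in_block)
    return a->index_in_block < b->index_in_block ? -1 : 1;
  return 0;
}
```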
proposals/0057-events.md
Bank deltas are currently hashed via sha256; the sha256 syscall charges
85 + 0.5 CU / byte. blake3, a hash roughly 2x faster to compute, would be
sufficient as well, hence 100 + 0.25 * len CU would be a good approximation
Current best implementations of BLAKE3 are significantly slower than SHA-256 below 4kB
https://forum.solana.com/t/blake3-slower-than-sha-256-for-small-inputs/829
I have an experimental rewrite that makes BLAKE3 fast again, but that won't get backported to Rust anytime soon.
Therefore, I think we should use the SHA256 pricing instead.
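For reference, a quick side-by-side of the two pricing formulas quoted in this thread (integer CU approximations):

```
#include <stdint.h>

// sha256 syscall pricing quoted above: 85 + 0.5 CU per byte.
// Proposed blake3-based pricing:       100 + 0.25 CU per byte.
static uint64_t sha256_cost(uint64_t len)   { return 85 + len / 2; }
static uint64_t proposed_cost(uint64_t len) { return 100 + len / 4; }

// The curves cross at len = 60 bytes (both 115 CU); for a typical
// 512-byte event, sha256_cost(512) = 341 CU vs proposed_cost(512) = 228 CU.
```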
Ack.
I would rather bring these fixes to the existing log mechanism instead of adding a new log facility with a different name.
I like some important fixes this SIMD proposes, but we should be able to bring those to the current log mechanism:
- Removing truncation
- Supporting binary logs (no more UTF-8 validation)
- Accounting storage cost in fees
Supporting two log mechanisms is more complicated for the developers of programs, frontends, API infrastructure, and core developers. Retiring the old log mechanism is not an option, since it's unreasonable to expect all programs to adopt the new event API (some are immutable)
Keeping a unified log mechanism could look like so:
- Add "log v2" core and RPC APIs
- Expand the functionality of the log collector to support both binary (v2) and text logs (v1/v2)
- Enforce the fee changes onto log v1
- Ensure that all messages are visible via either API
Just a drive-by comment (sorry!) since someone asked about this. If we reused the existing log facility, how would RPC be able to tell whether something is an event or a dumb log? Would it filter/match all logs? That seems terrible perf-wise.
Previously: https://forum.solana.com/t/srfc-00004-native-events-program/32. cc/ @ngundotra
```
#include <sol/emit.h>
void sol_emit_(/* r1 */ const char *message,
               /* r2 */ uint64_t len);
```
const uint8_t *message, unless it's really only characters
actually why not an array of slices, that saves the bpf program from having to concatenate
this looks like sol_log_data() with some RPC magic; why not reuse sol_log_data()?
In fact, sol_log_data() was introduced for events (with Solang)
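For comparison, a hedged sketch of how the existing slice-based call is used; the signature is reproduced from memory and should be treated as approximate:

```
#include <solana_sdk.h>

// Existing interface takes an array of byte slices, so the program does
// not need to concatenate before logging. Approximate shape:
//   typedef struct { const uint8_t *addr; uint64_t len; } SolBytes;
//   void sol_log_data(SolBytes *fields, uint64_t fields_len);

void log_two_part_event(void) {
  uint8_t discriminator[8] = {0}; // illustrative payload parts
  uint8_t body[3] = {1, 2, 3};
  SolBytes fields[] = {
      {discriminator, sizeof(discriminator)},
      {body, sizeof(body)},
  };
  sol_log_data(fields, SOL_ARRAY_SIZE(fields));
}
```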
change of CU cost because the size restriction is no longer a client-side decision. sol_log_ truncates for programs that log a lot of data, depending on the validator configuration
### Geyser client
we really should not be adding new interfaces in geyser that are also broken for non-rust languages.
we eventually need to fix the old ones, but imo we shouldn't add new ones that are also broken - please make the new Event structs ffi-safe.
had a look at this PR anza-xyz/agave#2125 and updated the events struct, lmk if that works for you
proposals/0057-events.md
event payloads but most events are <512b in length. This cost enforces
theoretical limits on the amount of logs per transaction & block:
Limit per 200k CU: 800kB
Limit per 48M CU: 200MB
It seems odd to me that this limit, which is per block, is higher than the amount of state growth we allow per block.
That to me implies that the CU cost is too low. Why would we let events be emitted faster than our state growth if infrastructure has to store both?
Happy to go with double the cost per byte, if that simplifies the process moving forward.
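For reference, the quoted limits follow directly from the 0.25 CU/byte term: 200,000 CU / 0.25 CU per byte = 800,000 bytes (~800 kB per transaction), and 48,000,000 CU / 0.25 CU per byte = 192,000,000 bytes (~200 MB per block). Doubling the price to 0.5 CU/byte would halve both, to ~400 kB and ~96 MB.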
In order to log an event, the SVM runtime must implement a new syscall
`sol_emit_`. The runtime must charge 100 + 0.25 * len CU per invocation to
account for the potential cost of including a blake3 hash of all log messages
in consensus.
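A hedged usage sketch under that pricing; the payload and wrapper name are made up, and the argument order follows the pointer-first revision discussed earlier in the thread:

```
#include <sol/emit.h>

// Illustrative wrapper: emit a small fixed payload and note the CU cost
// the proposed formula would charge.
void emit_fill_event(void) {
  const uint8_t payload[16] = {0}; // e.g. 8-byte discriminator + body
  sol_emit_((const char *)payload, sizeof(payload));
  // Charged 100 + 0.25 * 16 = 104 CU under the proposed pricing.
}
```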
I don't see a section specifying how this would be included in consensus, what's the plan here?
There is no concrete plan yet. I wanted to leave that open for a future SIMD, as the roadmap towards light clients is not clear yet.
With this being outside of consensus, I'm not sure how this is different than logs wrt forcing operators to save the data. For RPCs, sure they'll likely need to save it to serve users. But for normal validator operators, couldn't they just drop the logs after execution, or even patch logic to not emit events at all? Even RPCs, if they didn't want to offer this service, could drop them. I guess what I'm missing is how this is different than logs where we truncate, since I'm not seeing anything that prevents me from truncating. With my current reading, this effectively seems like a specific way to do logging.
Because we have an opportunity to do better than logs here, I wanted to ask if we can offer guarantees that an event hasn't been modified. Let's imagine that I was building some event-driven system that listened for events on the network (listing/oracle/order/etc) and took some action involving a non-trivial amount of value. I would want to know that the event was:
1. actually emitted by the program it claims to come from (not spoofed)
2. delivered to me with its content unmodified

Regarding the existing userspace implementation of events atop self-CPI, @jarry-xiao mentioned that the ‘spoofing risk was mitigated by adding in a PDA event authority that must sign the self CPI.’ That's good. For the next point though – for the same reason that RPCs are free to truncate logs, I don't think there's anything to stop a malicious RPC from modifying the content of the log in a way that the consumer can't detect. I don't have any ideas of how to do this. Normally I'd produce an HMAC and include it in the content of the event data itself, but at the point where the program is emitting the event, there's nothing secret available to include in the HMAC, and at the point where the signers are signing the transaction they don't know what event data will be emitted. Anyone have any thoughts?
@steveluscher Regarding log authentication, there's a separate proposal (SIMD-0064) to hash transaction execution metadata into the bank hash. We could include logs (or events) there too. Then, using an inclusion proof, you'd be able to verify the existence and origin of a log message against a bank hash. The bank hash can be partially verified by looking at votes from a handful of validators.
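To make the inclusion-proof step concrete, here is a minimal sketch assuming a binary Merkle tree over per-event hashes; the hash function, domain separation, and tree layout are all assumptions rather than anything SIMD-0064 has fixed:

```
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t bytes[32]; } hash32;

// Stand-in for whatever hash the final design settles on.
extern void hash_concat(const uint8_t left[32], const uint8_t right[32],
                        hash32 *out);

// Walk from a leaf (the hash of one event) up to the root, combining with
// the sibling hashes from the proof; the event is included iff the
// reconstructed root matches the root committed in the bank hash.
bool verify_inclusion(hash32 leaf, const hash32 *siblings,
                      const bool *leaf_is_right, uint64_t depth,
                      const hash32 *committed_root) {
  for (uint64_t i = 0; i < depth; i++) {
    hash32 next;
    if (leaf_is_right[i])
      hash_concat(siblings[i].bytes, leaf.bytes, &next);
    else
      hash_concat(leaf.bytes, siblings[i].bytes, &next);
    leaf = next;
  }
  return memcmp(leaf.bytes, committed_root->bytes, 32) == 0;
}
```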
@ripatel-fd I had a read-through of SIMD-64; please let me know if I correctly summarized it. I also have a couple of questions:
Propose a new event logging mechanism that is reliable and efficient.
Browser clients should be able to receive emitted events with minimal latency.
3rd-party indexers should make it possible to horizontally scale access to
historic events.