Add cluster info on geyser #46

godmodegalactus · 2024-03-04T14:46:19Z

Problem

See issue solana-labs#32934
Summary of Changes

This PR is a POC to add cluster node's info to the Geyser plugin notification.

The LegacyContactInfo received on GOSSIP network are forwarded with the Geyser interface to a client subscriber.

Summary of Changes

the entry point in crds.rs to notify LegacyContactInfo updates and removals,
the Geyser plugin interface update to add LegacyContactInfo notification and notification entry point.
the processing of the updates by the Geyser_plugin_manager.

A GRPC Yellowstone plugin update has been done to test the process and see the notified data volume. You can see the PR here.

An example of client can be found here. It subscribes to the Cluster info node notification and call from time to time the RPC get_cluster_nodes to compare with the notified data.
Early results

The test has been done on Testnet cluster with the Solana version 1.17.5 and Yellowstone plugin 1.11.0+solana.1.17.5.
Comparison with RPC call

It takes a few second for the plugin to get the majority of the Testnet cluster nodes and after 30s there's less than 10 nodes in error when comparing with the RPC call (around 3000 nodes in the cluster).

After a few minutes, Geyser and RPC has the same list of cluster's nodes. From time to time they can have some difference (a few nodes) from one part or the other depending on the notification time propagation of update or removal.
Data volume

I use Tcpdump to monitor the packet send between the validator grpc server and the client to build the get_cluster_nodes RPC call nodes list.

This graphic is an example of network activity and bandwidth use. The volume of data transmitted is between 0,5 Mbits/s and 3 Mbits/s.

I've done some monitoring of the network connection between the grpc server and the client for the cluster node info notification.

This is the graphic of the data transferred between the validator grpc server and the client to build the get_cluster_nodes RPC call nodes list.

Fixes #

add cluster notifier registration and test Add cluster node remove geyzer notification rebase on master and correct PR comments fix rebase, fmt, clippy

lijunwangs · 2024-04-23T02:25:27Z

@godmodegalactus -- sorry for getting to this late. Can you address the conflicts?

gregcusack · 2024-11-05T18:52:25Z

why do we need to stream contact info for every gossip update? could we get away with only streaming contact info when an actual ContactInfo CrdsValue is received? notify_clusterinfo_remove() would ensure nodes are removed when they become stale.

also, do we need to stream contact info when a CrdsValue fails to insert into the table? feels like we shouldn't need to report that since it is a failed updated.

godmodegalactus · 2024-11-05T19:09:41Z

why do we need to stream contact info for every gossip update? could we get away with only streaming contact info when an actual ContactInfo CrdsValue is received? notify_clusterinfo_remove() would ensure nodes are removed when they become stale.

also, do we need to stream contact info when a CrdsValue fails to insert into the table? feels like we shouldn't need to report that since it is a failed updated.

Yeah i think you are right.

I will pick up this PR again and try to resolve all the issues.

gregcusack · 2024-11-05T19:30:35Z

gossip/src/crds/geyser_plugin_utils.rs

+                if let CrdsData::LegacyContactInfo(ref cluster_info) = value.value.data {
+                    let notifier = &clusterinfo_update_notifier.read().unwrap();
+                    notifier.notify_clusterinfo_update(cluster_info);


oh jk I see you already set it up to update only on reception of LegacyContactInfo. but ya probably don't need to send an update if the contact info fails to insert

behzadnouri · 2024-11-11T00:45:24Z

gossip/src/crds.rs

+        //notify geyser interface
+        self.notify_clusterinfo_update(self.table.get(&gossip_label));
+        ret


Crds::insert requires an exclusive lock on the crds table.
So we shouldn't put the notification code here.

Instead somewhere else, e.g. here:
https://github.com/anza-xyz/agave/blob/7629a152d/gossip/src/cluster_info.rs#L1618
we need to read locks crds table and obtain new entries since last time using a crds::Cursor.
similar to the code here:
https://github.com/anza-xyz/agave/blob/7629a152d/gossip/src/crds_gossip_push.rs#L184-L190

alternatively somewhere in geyser can pull new contact-infos every once in a while or when needed.
for example in turbine we cache contact-infos for 5 seconds and if 5 seconds have passed we pull again all contact-infos from gossip:
https://github.com/anza-xyz/agave/blob/7629a152d/turbine/src/cluster_nodes.rs#L287

behzadnouri · 2024-11-11T00:48:15Z

geyser-plugin-interface/src/geyser_plugin_interface.rs

+#[derive(Debug, Clone, PartialEq, Eq)]
+#[repr(C)]
+/// Information about a node  in  the cluster.
+pub struct ReplicaClusterInfoNode {


What happens to this struct if we change gossip ContactInfo, e.g. add a new socket for a new protocol?

ya so that is the goal of doing something with an FFI interface. See PR: #3434. We could modify that PR and choose to only expose a pointer to a ContactInfo struct, making it much easier to update ContactInfo in the future. More details here: #3434 (comment)

godmodegalactus · 2024-11-11T12:30:15Z

@gregcusack @lijunwangs @behzadnouri #3434 PR superseeds this PR, or should I start working on this PR ?

behzadnouri · 2024-11-11T13:54:51Z

@gregcusack @lijunwangs @behzadnouri #3434 PR superseeds this PR, or should I start working on this PR ?

We can follow up on #3434
Please close this PR if you don't want to work on it anymore. thanks.

mergify bot added community need:merge-assist labels Mar 4, 2024

mergify bot requested a review from a team March 4, 2024 14:55

Cluster info geyzer notification firs impl

897e0bc

add cluster notifier registration and test Add cluster node remove geyzer notification rebase on master and correct PR comments fix rebase, fmt, clippy

godmodegalactus force-pushed the add_cluster_info_on_geyser branch from c15fe52 to 897e0bc Compare March 11, 2024 13:01

lijunwangs self-requested a review April 23, 2024 02:25

gregcusack reviewed Nov 5, 2024

View reviewed changes

behzadnouri reviewed Nov 11, 2024

View reviewed changes

behzadnouri mentioned this pull request Nov 11, 2024

stream crds node updates via geyser #3434

Draft

godmodegalactus closed this Nov 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cluster info on geyser #46

Add cluster info on geyser #46

godmodegalactus commented Mar 4, 2024

lijunwangs commented Apr 23, 2024

gregcusack commented Nov 5, 2024

godmodegalactus commented Nov 5, 2024

gregcusack Nov 5, 2024 •

edited

Loading

behzadnouri Nov 11, 2024

behzadnouri Nov 11, 2024 •

edited

Loading

behzadnouri Nov 11, 2024

gregcusack Nov 11, 2024

godmodegalactus commented Nov 11, 2024

behzadnouri commented Nov 11, 2024

Add cluster info on geyser #46

Add cluster info on geyser #46

Conversation

godmodegalactus commented Mar 4, 2024

Problem

Summary of Changes

lijunwangs commented Apr 23, 2024

gregcusack commented Nov 5, 2024

godmodegalactus commented Nov 5, 2024

gregcusack Nov 5, 2024 • edited Loading

Choose a reason for hiding this comment

behzadnouri Nov 11, 2024

Choose a reason for hiding this comment

behzadnouri Nov 11, 2024 • edited Loading

Choose a reason for hiding this comment

behzadnouri Nov 11, 2024

Choose a reason for hiding this comment

gregcusack Nov 11, 2024

Choose a reason for hiding this comment

godmodegalactus commented Nov 11, 2024

behzadnouri commented Nov 11, 2024

gregcusack Nov 5, 2024 •

edited

Loading

behzadnouri Nov 11, 2024 •

edited

Loading