Use RDMA instead of TCP for state transfer #157
Another option: create an empty object with a volatile flag ready=false. Now allow the join to complete. Launch a thread that uses RDMA to fetch the state using the existing logic. (Re)initialize all state variables, set ready=true.
Access state via getters that busy-wait on ready==true.
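A minimal sketch of that pattern follows; the class and function names are hypothetical stand-ins, not Derecho's actual types, and std::atomic&lt;bool&gt; is used in place of a plain volatile flag so the cross-thread hand-off is well-defined:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a replicated object whose state arrives after the join completes.
class LazyReplicatedObject {
    std::atomic<bool> ready{false};   // atomic rather than volatile, for well-defined visibility
    std::vector<char> state;          // serialized state, filled in by the fetch thread
    std::thread fetcher;

    // Stand-in for the existing RDMA-based transfer logic.
    static std::vector<char> fetch_state_over_rdma() { return {}; }

public:
    // Called once the join has completed: fetch the state in the background.
    void start_state_fetch() {
        fetcher = std::thread([this] {
            state = fetch_state_over_rdma();   // (re)initialize all state variables
            ready.store(true, std::memory_order_release);
        });
    }

    // Getters busy-wait until the background transfer has finished.
    const std::vector<char>& get_state() const {
        while(!ready.load(std::memory_order_acquire)) { /* spin until ready */ }
        return state;
    }

    ~LazyReplicatedObject() {
        if(fetcher.joinable()) fetcher.join();
    }
};
```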
On May 6, 2020, at 8:36 PM, Edward Tremel wrote:
State-transfer operations in Derecho (i.e. the transfer of a serialized Replicated Object from one node to another) currently use a set of TCP sockets that are stored in ViewManager. Originally we used this design because state transfer was part of the initial setup of adding a node to a group, and we wouldn't add the new node into SST or RDMC until after it had finished joining the group. However, now that we can create peer-to-peer RDMA connections independently of the main RDMC/SST multicast groups, the state transfer operation could be done with a peer-to-peer RDMA connection.
There are two options for how we could implement this:
1. Use the existing P2PConnections in RPCManager to send each Replicated Object. This would be easier for transferring state between existing members that have been re-assigned from one subgroup/shard to another, but would require some refactoring in order to work for new members. Right now, a P2PConnection to a new member is set up in RPCManager's new_view_callback, which does not get called until after state transfer is complete (at the end of ViewManager::finish_view_change). Also, for both new and existing members, ViewManager would need to have access to RPCManager in order to get the P2PConnections, or the P2PConnections object would have to be shared between them (in the way the tcp_connections used to be shared).
2. Create a new P2PConnection, or something similar, that only exists during the state transfer process between nodes that need to do state transfer. This might introduce some redundancy and add the overhead of another RDMA queue pair, but it would also be less destructive to our existing setup of RPCManager and P2PConnections. It would also allow us to customize the P2P RDMA connection that we use for state transfer to have a different buffer size than the one we use for peer-to-peer RPC messages, which might be necessary if the Replicated Objects we're transferring are much larger than the maximum message size for an RPC.
I think option 2 is slightly better, but since I didn't implement P2PConnections, I'm not confident in that opinion. Whatever we decide, though, this seems like a good opportunity for speeding up the process of restarting or joining a group, since transferring a lot of state over a TCP socket is an obvious bottleneck.
(Note that this issue is a specialization of #118)
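For concreteness, here is a rough sketch of the shape option 2 could take. StateTransferConnection and everything inside it are hypothetical placeholders with stubbed bodies, not Derecho's actual P2PConnection API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of option 2: a connection object that exists only for the duration of a
// state transfer. The bodies are stubs standing in for real queue-pair management.
class StateTransferConnection {
    uint32_t remote_id;
    std::size_t buffer_size;

public:
    // buffer_size can be chosen per transfer, independently of the P2P RPC message
    // size, so a very large serialized Replicated Object fits in one window.
    StateTransferConnection(uint32_t remote_node_id, std::size_t transfer_buffer_size)
            : remote_id(remote_node_id), buffer_size(transfer_buffer_size) {
        // real version: register a buffer and create an RDMA queue pair to remote_id
    }

    void send_object(const std::vector<uint8_t>& serialized_object) {
        // real version: RDMA-write the serialized object into the remote buffer
        (void)serialized_object;
    }

    std::vector<uint8_t> receive_object() {
        // real version: wait for the remote write to complete and copy the bytes out
        return {};
    }

    ~StateTransferConnection() {
        // real version: tear down the queue pair so nothing lingers after the view change
    }
};
```

Because such an object would be constructed and destroyed entirely within the state-transfer step, ViewManager could use it without needing access to RPCManager's P2PConnections, which is the main attraction of option 2 over option 1.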
etremel added a commit that referenced this issue on May 14, 2020:
Since it's not likely that we will completely eliminate TCP sockets from ViewManager any time soon (see issues #118 and #157), we should at least make our usage of TCP less confusing. The port named "rpc_port" in all of our configuration files is actually not used for RPC operations at all, but for transferring Views and object state between nodes during a view change. Renaming this port will make it clear that there is no RPC activity going over TCP.
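For illustration, the kind of change this implies in a derecho.cfg-style file; the old option name comes from the commit message, while the new name and the port value shown here are only assumed examples, since the commit message does not state them:

```
[DERECHO]
# Before: despite the name, this port carried no RPC traffic; it was used to
# transfer Views and object state over TCP during a view change.
# rpc_port = 50078
# After (illustrative rename and arbitrary example value):
state_transfer_port = 50078
```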
Issue seems to be resolved at this point, with the work Edward did on v2.0.
Actually, this is not resolved yet in version 2.0. As you can see in the latest version of Group::receive_objects, we still receive state data over a TCP socket.
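For context, here is an illustrative sketch of what receiving serialized object state over a TCP socket generally looks like; this is not the actual body of Group::receive_objects, just an approximation of the pattern this issue proposes replacing with RDMA:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>
#include <unistd.h>   // read()

// Illustrative only: a size header followed by the serialized payload.
static std::vector<uint8_t> receive_serialized_object(int socket_fd) {
    auto read_exactly = [&](void* dest, std::size_t bytes) {
        auto* out = static_cast<uint8_t*>(dest);
        std::size_t received = 0;
        while(received < bytes) {
            ssize_t n = read(socket_fd, out + received, bytes - received);
            if(n <= 0) throw std::runtime_error("socket closed during state transfer");
            received += static_cast<std::size_t>(n);
        }
    };
    uint64_t object_size = 0;
    read_exactly(&object_size, sizeof(object_size));   // size header first
    std::vector<uint8_t> buffer(object_size);
    read_exactly(buffer.data(), buffer.size());        // then the whole serialized object
    return buffer;                                       // caller deserializes from this buffer
}
```

Replacing this TCP path with a peer-to-peer RDMA transfer is the part of this issue that remains open as of v2.0.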