Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from mdns to using the kaboodle peer mesh #48

Merged
merged 11 commits into from
Mar 28, 2023
Merged

Conversation

ceejbot
Copy link
Contributor

@ceejbot ceejbot commented Mar 24, 2023

The agent and the cli now both attempt to join a network of peers, established via kaboodle, and use that network instead of mdns to discover relevant services.

The most interesting implementation details are in the file utils/src/mesh.rs. This file includes some wrapper traits that probably belong in Kaboodle itself, since they were designed to be re-usable and to encode the responsibilities of any Kaboodle implementation: KaboodleMesh and KaboodlePeer. There are implementations plus some serval-specific functions in the structs ServalMesh and PeerMetadata. (All names are negotiable. I already don't like my initial choices.)

The peer metadata struct gets a little complicated because I attempted to provide some future-proofing via a way to version it. It's encoded via binencode, with a version envelope wrapper to allow us to make newer agents that change what gets packed into identity able to interoperate with older agents, and let older agents ignore newer ones. I welcome improvements to this somewhat ad-hoc design.

Agent roles are now an enum, ServalRoles. The strum crate provides Display and FromString implementations. This adds some safety and validity-checking to what had been some magic strings.

We knew we'd stop blocking in the cli some day, and this is that day: the cli is now fully async.

Finally, and most noisily, corrected the spelling of "Wasm" everywhere. Apologies for that.

@ceejbot ceejbot requested review from shinypb and jkbecker March 24, 2023 00:41
utils/src/mesh.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@shinypb shinypb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I think it makes sense to move your helpers into the kaboodle repo once we're confident that they're the correct shape (I have no reason to doubt they are, but keeping them here is maximally flexible for the time being).

I don't have any substantial feedback. I have a few friendly nitpicky suggestions and conversation openers below. Shippit!

utils/src/mesh.rs Outdated Show resolved Hide resolved
@@ -171,14 +168,20 @@ async fn main() -> Result<()> {
);
}

let mut roles: Vec<String> = Vec::new();
if blob_path.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't really matter, but if you change this to if let Some(blob_path) = blob_path { then you can get rid of the :? in the format string and the output will look nicer.

// TODO: this is actually fallible
impl KaboodlePeer for PeerMetadata {
fn from_identity(address: SocketAddr, encoded: Vec<u8>) -> Self {
// TODO: this is actually fallible; when might it fail?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data is coming in over unreliable transport, so it's possible for a bit to get flipped and lead to a corrupted encoded payload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider shipping a CRC with it?

@@ -22,35 +23,40 @@ pub async fn relay_request(
service_name: &str,
source_instance_id: &Uuid,
) -> Result<Response, ServalError> {
let node_info = discover_service(service_name).await.map_err(|err| {
log::warn!("proxy_unavailable_services failed to find a node offering the service; service={service_name}; err={err:?}");
let mesh = MESH.get().expect("Peer network not initialized!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Thinking aloud 🤔 ) Is it worth promoting this to ServalError, or (since this happening would be a Programming Error™) is the catastrophic nature of the .expect preferable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catastrophic nature of the expect is desirable: this is a crash-and-restart situation. It would be programmer error, most likely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will enjoy living to regret this statement.

let host = info.get_addresses().iter().next().unwrap(); // unwrap is safe because discover_service will never return a service without addresses
let port = info.get_port();
let target_instance_id = peer.name();
let Some(socket_addr) = peer.address() else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am making this comment in the wrong place, but why does peer.address() return an Option`?

Copy link
Contributor Author

@ceejbot ceejbot Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns an option because the underlying peers() function in kaboodle returns a vec of (Option<SocketAddr>, Vec<u8>)! I should probably filter that list to remove any peer that doesn't have an address, which would simplify the code in the agent.

cli/src/main.rs Outdated
env_var_override_name: &str,
) -> Result<String> {
if let Ok(override_url) = std::env::var(env_var_override_name) {
async fn maybe_find_peer(role: &ServalRole, override_var: &str) -> Result<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on initializing the mesh elsewhere and passing a reference to it into this function? (Or does the CLI only ever need to find a peer a single time?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that some commands wouldn't need to join the mesh, and joining is expensive in terms of time, and it is one-shot. But if we come across a case where it isn't, then we'd refactor. It's fine either way.

@@ -13,6 +14,8 @@ pub trait KaboodleMesh {

/// Create a new entry for a Kaboodle peer network and add ourselves to the mesh.
async fn start(&mut self) -> Result<(), KaboodleError>;
/// Remove this peer from the mesh.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total nit but maybe consider Remove ourselves from the mesh so it's more obvious that there is no way to remove forcibly an other peer from the mesh.

@@ -320,7 +319,7 @@ async fn main() -> Result<()> {
SERVAL_NODE_URL.lock().unwrap().replace(baseurl);

match args.cmd {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just .await? this match statement rather than doing it in all of the arms?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that would look cleaner - would it be functionally equivalent?

Copy link
Contributor Author

@ceejbot ceejbot Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it would not be functionally equivalent if we ever have a non-async command handler. I was preserving the option.

let metadata = PeerMetadata::new(format!("{}:{}", host, port), roles, None);
let metadata = PeerMetadata::new(
Uuid::new_v4().to_string(),
format!("http://{}:{}", host, port),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a thing for this PR, but it would be ideal to start passing around SocketAddrs rather than host and port, because the way you stringify host is different for IPv4 and IPv6 (which needs [brackets] around the IP).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we track this in an issue? Sounds like a boring but potentially large footprint refactor? Or should this just be done when tackling #50 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a lot of this work in a commit to this branch today, so I think it's mostly done? I'll review.

} else {
Err(anyhow!("Unable to locate a peer with the {role} role"))
};

mesh.stop().await?;
mesh.stop().await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this finally made it through my skull and brings up a very interesting architectural consideration.

Nothing needs to change right now, but it's worth discussion:

Up until now, I had never contemplated the thought that an instance of the serval CLI tool would have to join the mesh to do any work. That totally works, of course, but it seems a bit odd for something so ephemeral to create so much work for the mesh—everybody is going to have to update their known peers list, and it will take things a while to settle down—all for a peer who will vanish a few seconds later.

From the perspective of the mesh, a better design would be for the serval CLI tool to connect to an instance of serval-agent running on localhost. If none is running on localhost, joining the mesh to find a node to talk to would be the fallback. Perhaps we could store all of this in $TMP or /var/run or whatever?

let metadata = PeerMetadata::new(
Uuid::new_v4().to_string(),
format!("http://{}:{}", host, port),
roles,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably doesn’t actually matter, and we certainly don't have to change anything right now, but should we encode the roles in the identity payload as bit masks rather than as strings? The identity payload is included in every single kaboodle packet, so little optimizations add up.

Surely every conceivable role we'd have would fit into a single byte, and since we have versioning (brilliant idea, by the way—I would not have thought of that) we can expand it if we ever need more than 8 roles.

If you agree, I can make a GitHub issue so we don't lose the idea.

Copy link
Contributor Author

@ceejbot ceejbot Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at how bincode is actually encoding the roles :D It does a very nice job. Well-chosen crate there, colleague!

@ceejbot ceejbot force-pushed the ceej/kit-and-kaboodle branch from 860aa90 to 3b47d9a Compare March 24, 2023 18:36
let metadata = PeerMetadata::new(format!("{}:{}", host, port), roles, None);
let metadata = PeerMetadata::new(
Uuid::new_v4().to_string(),
format!("http://{}:{}", host, port),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we track this in an issue? Sounds like a boring but potentially large footprint refactor? Or should this just be done when tackling #50 ?

owo-colors = "3.5.0"
prettytable = "0.10.0"
reqwest = { version = "0.11.13", default-features = false, features = ["blocking", "deflate", "brotli", "json", "multipart", "stream", "rustls-tls"] }
reqwest = { version = "0.11.13", default-features = false, features = ["deflate", "brotli", "json", "multipart", "stream", "rustls-tls"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'll definitely wait til this is merged before I continue work on #43. I'm doing some blocking reqwests there that need to be refactored...

@@ -320,7 +319,7 @@ async fn main() -> Result<()> {
SERVAL_NODE_URL.lock().unwrap().replace(baseurl);

match args.cmd {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that would look cleaner - would it be functionally equivalent?

// TODO: this is actually fallible
impl KaboodlePeer for PeerMetadata {
fn from_identity(address: SocketAddr, encoded: Vec<u8>) -> Self {
// TODO: this is actually fallible; when might it fail?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider shipping a CRC with it?

ceejbot added 10 commits March 27, 2023 15:27
We need to encode some information about nodes in the identity
payload, so I took a stab at writing a wrapper that hides the
machinery from the agent. I also sketch out a way to handle versioning
this payload, but I'm not happy with it yet.
Also, replaced the MDNS usage in the agent with kaboodle usage.
It runs and joins a mesh of one! I haven't replaced peer searches
yet, however.
oh and also probably mostly made the cli join the peer mesh
and try to run jobs over it. does not yet work because of
something slightly surprising involving switching to async main.
Possibly. Have not yet investigated.
No blocking. Only tackling.
This allows job-running to work! SHIP IT.
Also, start generalizing how we find peers with services.
@ceejbot ceejbot force-pushed the ceej/kit-and-kaboodle branch from 79deae8 to 30dfa0a Compare March 27, 2023 22:28
@ceejbot ceejbot merged commit fc7b26a into main Mar 28, 2023
@ceejbot ceejbot deleted the ceej/kit-and-kaboodle branch March 28, 2023 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants