Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autonat: infinity loop on dial_back error #5816

Closed
stormshield-frb opened this issue Jan 15, 2025 · 3 comments · Fixed by #5848
Closed

autonat: infinity loop on dial_back error #5816

stormshield-frb opened this issue Jan 15, 2025 · 3 comments · Fixed by #5848

Comments

@stormshield-frb
Copy link
Contributor

Summary

The code of perform_dial_back goes into an infinity loop when an error occurs. I did found out where the error is coming from but I am not sure on how it should be fixed.

fn perform_dial_back(
stream: libp2p_swarm::Stream,
) -> impl futures::Stream<Item = io::Result<IncomingNonce>> {
let state = State {
stream,
oneshot: None,
};
futures::stream::unfold(state, |mut state| async move {
if let Some(ref mut receiver) = state.oneshot {
match receiver.await {
Ok(Ok(())) => {}
Ok(Err(e)) => return Some((Err(e), state)),
Err(_) => {
return Some((
Err(io::Error::new(io::ErrorKind::Other, "Sender got cancelled")),
state,
));
}
}
if let Err(e) = protocol::dial_back_response(&mut state.stream).await {
return Some((Err(e), state));
}
return None;
}
let nonce = match protocol::recv_dial_back(&mut state.stream).await {
Ok(nonce) => nonce,
Err(err) => {
return Some((Err(err), state));
}
};
let (sender, receiver) = oneshot::channel();
state.oneshot = Some(receiver);
Some((Ok(IncomingNonce { nonce, sender }), state))
})
}

The problem is that, when a result is received from receiver.await, if this result is an error, then the stream fires an Err event but don't do anything to either reset the oneshot channel that was just used, or to terminate the stream.

In my opinion, there is two possible solutions:

1. Terminate the stream

When receiver.await returns Err (meaning the channel is closed), we return None effectively closing the stream.

match receiver.await {
    Ok(Ok(())) => {}
    Ok(Err(e)) => return Some((Err(e), state)),
    Err(_) => return None, // <----------
}
2. Reset the oneshot channel

We take the state.oneshot, which will garantee that a receiver can only be polled once.

 if let Some(ref mut receiver) = state.oneshot.take() { // <----------
   ...
}

I am not sure on what is the expected behaviour in such cases, that is why, @umgefahren, if you could look at this and tell me what you think it would be amazing. I will open a PR as soon as I know what is the expected behaviour: stop performing dial_back when any error occurs, or if an error occurs, restart from the beginning.

Expected behavior

Do not got into an infinity loop.

Actual behavior

Goes into an infinity loop.

Relevant log output

2025-01-14T07:17:22.333743Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: InvalidData, error: "Received unexpected nonce: 3883418905447951235 from Qm....VC" }
2025-01-14T07:17:22.333782Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.333811Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.333837Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.333864Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.333890Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.333917Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.333943Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.333969Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.333996Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334023Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334050Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334076Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334102Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334129Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334155Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334181Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334207Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334234Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334265Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334294Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334320Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334346Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334372Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334402Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334430Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334456Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334483Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334509Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334541Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }
2025-01-14T07:17:22.334567Z DEBUG libp2p_autonat::v2::client::handler::dial_back: Dial back handler failed with: Custom { kind: Other, error: "Sender got cancelled" }

Possible Solution

No response

Version

Master.

Would you like to work on fixing this bug?

Yes

@elenaf9
Copy link
Contributor

elenaf9 commented Jan 31, 2025

Thank you for debugging and reporting this issue!

I will open a PR as soon as I know what is the expected behaviour: stop performing dial_back when any error occurs, or if an error occurs, restart from the beginning.

We should stop performing dial_back and close the connection. There is no mechanism in the protocol spec to signal the remote that it should send us a dial_back request again (i.e. to "restart"). The only mechanism that we have is to close the stream. Also, as far as I can tell this scenario can only happen if the remote tries a dial-back with an invalid nonce, which is a protocol violation anyway.

1. Terminate the stream

When receiver.await returns Err (meaning the channel is closed), we return None effectively closing the stream.

match receiver.await {
    Ok(Ok(())) => {}
    Ok(Err(e)) => return Some((Err(e), state)),
    Err(_) => return None, // <----------
}

Sounds good to me.

@stormshield-frb
Copy link
Contributor Author

Thanks @elenaf9 !!

I have talked to Hannes and he said he will look at this issue in the next days. So, out of politeness, I will wait for him to confirm since its his code. If he does not have the time to do it by monday or tuesday, we will do this solution. I think it makes sense 😊

Thank you very much nonetheless.

@umgefahren
Copy link
Contributor

@elenaf9 is right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants