Multiple nodes in the LC network without a public IP #788

Open
cortze opened this issue Feb 13, 2025 · 0 comments

Description

ProbeLab has been publishing weekly network health reports for Avail’s LC DHT network at probelab.io since January of this year. The reports cover a variety of metrics and are intended to alert engineering teams (and others) to unexpected behaviour of network nodes.

One thing we noticed is an increased number of no_ip_address errors, which has persisted for at least the last three weeks; see the example from Week 4, 2025 (screenshot below). This error happens when a peer publishes only non-reachable IP addresses to the network. For example, it is common for clustered peers in data centers to be assigned private addresses, which reduces communication latency within the cluster. As a result, our crawler, which only tries to connect to public IPs, reports the no_ip_address error seen in the chart when it can’t find a reachable IP for such a peer.
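The classification described above can be sketched roughly as follows. This is an illustration only, not the crawler's actual code: the function names `public_ips` and `classify_peer`, the label `dialable`, and the simplified multiaddr parsing are all assumptions made for this example. Only the `no_ip_address` label comes from the report.

```python
import ipaddress

def public_ips(multiaddrs):
    """Return the publicly routable IPs found in a list of multiaddr strings.

    Hypothetical helper: real multiaddr parsing handles many more protocols.
    """
    found = []
    for maddr in multiaddrs:
        parts = maddr.strip("/").split("/")
        # Look at each adjacent pair of components; an "ip4" or "ip6"
        # component is followed by the address itself.
        for proto, value in zip(parts, parts[1:]):
            if proto not in ("ip4", "ip6"):
                continue
            try:
                ip = ipaddress.ip_address(value)
            except ValueError:
                continue  # malformed address, ignore it
            # is_global is False for private, loopback, link-local, etc.
            if ip.is_global:
                found.append(str(ip))
    return found

def classify_peer(multiaddrs):
    """Label a peer 'no_ip_address' if it advertises no public IP at all."""
    return "dialable" if public_ips(multiaddrs) else "no_ip_address"
```

Under these assumptions, a peer advertising only `/ip4/10.0.0.1/tcp/37000` would be labelled `no_ip_address`, while a peer that additionally advertises a public address would still be dialed.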

Performance Implications

Although the increased number of errors doesn’t seem to be critical or to be causing other issues in the network, it still consumes resources and increases RTTs as nodes attempt to discover peers or connect to unreachable addresses.

One possible effect of these "unreachable" nodes is a reduction in the DHT’s overall performance. With 27% of the DHT servers marked as 'unreachable', cell availability may decline over time. Public DHTs rely on a replication factor to adapt their routing to the network's node-churn dynamics. If this replication factor is effectively reduced early on, it could interfere with or delay the retrieval of DA cells. Furthermore, since fewer nodes store and serve the data in the network, the nodes providing popular DA cells may become overloaded, potentially leading to congestion in the network.
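To give a rough sense of scale, the effect on the replication factor can be estimated with a back-of-the-envelope calculation. The sketch below assumes that replicas land on DHT servers independently of whether those servers are reachable, and uses k = 20 purely as an illustrative replication factor; the value actually configured in the Avail LC network is not stated here. Both function names are hypothetical.

```python
def expected_reachable_replicas(k, unreachable_fraction):
    """Expected number of replicas stored on reachable servers.

    Assumes each of the k replicas independently lands on an
    unreachable server with probability `unreachable_fraction`.
    """
    return k * (1.0 - unreachable_fraction)

def prob_all_replicas_unreachable(k, unreachable_fraction):
    """Probability that every one of the k replicas is unreachable,
    under the same independence assumption."""
    return unreachable_fraction ** k

if __name__ == "__main__":
    k = 20            # illustrative replication factor (assumption)
    unreachable = 0.27  # share of unreachable DHT servers from the report
    print(expected_reachable_replicas(k, unreachable))
    print(prob_all_replicas_unreachable(k, unreachable))
```

With these assumed numbers, roughly 14.6 of 20 replicas would sit on reachable servers in expectation, so the effective replication factor starts out about a quarter lower before any churn occurs.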

Flagging this issue, as it’s something that the engineering teams might want to look into and correct.

CC: @sh3ll3x3c @jakubcech
