Nodes with same RECEPTOR_NODE_ID can exist on the same mesh #159

Open
elyezer opened this issue Mar 9, 2020 · 0 comments
Labels
bug Something isn't working

elyezer commented Mar 9, 2020

When starting a receptor node, the RECEPTOR_NODE_ID environment variable can be set and receptor will use it as the node ID. The problem is that nothing validates that the specified node ID is not already in use by another node in the mesh.

When more than one node has the same ID, message routing does not work as expected: a message may be routed to the wrong node, which can lead to message loss.
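The missing validation could look roughly like the sketch below. It is not receptor's code: `NodeRegistry`, `DuplicateNodeIDError`, and the connection labels are hypothetical names made up for illustration, and it assumes a single node that can see every ID it is asked to register. A real mesh would need to check IDs learned from the rest of the mesh as well as directly connected peers.

```python
class DuplicateNodeIDError(Exception):
    """Raised when a joining node claims an ID that is already in use."""


class NodeRegistry:
    """Hypothetical registry of node IDs known to one node (not receptor's API)."""

    def __init__(self):
        self._connections = {}  # node ID -> connection label

    def register(self, node_id, connection):
        # Refuse the new connection if a different one already owns this ID.
        existing = self._connections.get(node_id)
        if existing is not None and existing != connection:
            raise DuplicateNodeIDError(
                f"node ID {node_id!r} is already used by {existing!r}"
            )
        self._connections[node_id] = connection

    def unregister(self, node_id, connection):
        # Only the owning connection may release the ID.
        if self._connections.get(node_id) == connection:
            del self._connections[node_id]


registry = NodeRegistry()
shared_id = "15477521-bcc0-446d-abc3-e3d80d57ec6b"
registry.register(shared_id, "ping-a")
try:
    registry.register(shared_id, "ping-b")
    rejected = False
except DuplicateNodeIDError as exc:
    rejected = True
    print(f"rejected: {exc}")
```

Rejecting the second connection outright is only one option; logging a warning and refusing to route for the duplicate would be another.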

This is easy to reproduce. First, start a three-node mesh:

$ poetry run receptor --debug --node-id=controller -d /tmp/controller controller --listen=receptor://127.0.0.1:9999
$ poetry run receptor --debug --node-id=node-a -d /tmp/node-a node --listen=receptor://127.0.0.1:9998 --peer=receptor://localhost:9999
$ poetry run receptor --debug --node-id=node-b -d /tmp/node-b node --listen=receptor://127.0.0.1:9997 --peer=receptor://localhost:9998

The above will start a mesh where controller -> node-a -> node-b. Then run two ping commands in parallel, one pinging node-a and the other pinging node-b. Use the controller node as the peer for both ping commands and set the same RECEPTOR_NODE_ID for both:

$ export RECEPTOR_NODE_ID="15477521-bcc0-446d-abc3-e3d80d57ec6b"

$ poetry run receptor -d /tmp/ping-a ping --peer=receptor://127.0.0.1:9999 --delay 0 --count 10 node-a  
{"initial_time": "2020-03-06T18:27:28.124886", "response_time": "2020-03-06 18:27:28.158935", "active_work": []}
{"initial_time": "2020-03-06T18:27:28.130734", "response_time": "2020-03-06 18:27:28.164155", "active_work": []}
{"initial_time": "2020-03-06T18:27:28.138053", "response_time": "2020-03-06 18:27:28.179573", "active_work": []}
{"initial_time": "2020-03-06T18:27:28.146100", "response_time": "2020-03-06 18:27:28.192977", "active_work": []}
{"initial_time": "2020-03-06T18:27:28.158007", "response_time": "2020-03-06 18:27:28.206066", "active_work": []}
{"initial_time": "2020-03-06T18:27:28.162428", "response_time": "2020-03-06 18:27:28.211155", "active_work": []}
WARNING 2020-03-06 13:27:28,237  receptor Received response to acccd167-2361-4b85-8d86-655bf1c70489 but no record of sent message.
WARNING 2020-03-06 13:27:28,247  receptor Received response to 2624d5ed-6815-481a-b16a-ef13844087ac but no record of sent message.
WARNING 2020-03-06 13:27:28,252  receptor Received response to 2248820e-0009-42f6-91b6-93cce83ef8df but no record of sent message.
WARNING 2020-03-06 13:27:28,256  receptor Received response to ebd639c8-1d23-4757-8270-247a1b357257 but no record of sent message.
^C

$ export RECEPTOR_NODE_ID="15477521-bcc0-446d-abc3-e3d80d57ec6b"

$ poetry run receptor -d /tmp/ping-b ping --peer=receptor://127.0.0.1:9999 --delay 0 --count 10 node-b  
{"initial_time": "2020-03-06T18:27:28.120859", "response_time": "2020-03-06 18:27:28.156398", "active_work": []}
{"initial_time": "2020-03-06T18:27:28.125913", "response_time": "2020-03-06 18:27:28.172401", "active_work": []}
{"initial_time": "2020-03-06T18:27:28.133262", "response_time": "2020-03-06 18:27:28.188726", "active_work": []}
WARNING 2020-03-06 13:27:28,207  receptor Received response to 9762e4e9-3fba-414c-9e4f-496ca0382634 but no record of sent message.
{"initial_time": "2020-03-06T18:27:28.140208", "response_time": "2020-03-06 18:27:28.205756", "active_work": []}
WARNING 2020-03-06 13:27:28,232  receptor Received response to 0e5de5d0-9c3e-40e0-b0db-3672c1b3399b but no record of sent message.
WARNING 2020-03-06 13:27:28,244  receptor Received response to 327ec7dc-d779-451e-988d-ba01e99bb36d but no record of sent message.
WARNING 2020-03-06 13:27:28,250  receptor Received response to c7547678-997e-4c38-8b29-d243d9e0ca4b but no record of sent message.
{"initial_time": "2020-03-06T18:27:28.163389", "response_time": "2020-03-06 18:27:28.237886", "active_work": []}
{"initial_time": "2020-03-06T18:27:28.170925", "response_time": "2020-03-06 18:27:28.246019", "active_work": []}
^C

Note the WARNING messages in the output of both ping commands. Because both ping processes used the same node ID, the router delivered some responses to the process that had not sent the corresponding request.
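A toy model of this failure is sketched below. It is not receptor's routing code: `Pinger` and `Router` are hypothetical classes, and the router is assumed to keep exactly one route per node ID with the last connection winning. The logs above show warnings on both sides, so the real behavior is evidently less one-sided than this; the sketch only illustrates why a response that reaches the wrong process produces the "no record of sent message" warning.

```python
class Pinger:
    """Hypothetical stand-in for one `receptor ping` process."""

    def __init__(self, name, node_id):
        self.name = name
        self.node_id = node_id
        self.sent = set()    # message IDs this process is still waiting on
        self.warnings = []

    def send(self, message_id):
        self.sent.add(message_id)

    def receive(self, message_id):
        if message_id in self.sent:
            self.sent.discard(message_id)
        else:
            self.warnings.append(
                f"Received response to {message_id} but no record of sent message."
            )


class Router:
    """Toy router: one route per node ID, so a duplicate ID overwrites the route."""

    def __init__(self):
        self.routes = {}

    def connect(self, pinger):
        self.routes[pinger.node_id] = pinger  # assumption: last writer wins

    def deliver_response(self, recipient_id, message_id):
        self.routes[recipient_id].receive(message_id)


shared_id = "15477521-bcc0-446d-abc3-e3d80d57ec6b"
ping_a = Pinger("ping-a", shared_id)
ping_b = Pinger("ping-b", shared_id)

router = Router()
router.connect(ping_a)
router.connect(ping_b)  # silently replaces ping-a's route

ping_a.send("msg-1")
ping_b.send("msg-2")
# Both responses are addressed to the shared ID, so both reach ping-b.
router.deliver_response(shared_id, "msg-1")
router.deliver_response(shared_id, "msg-2")

print(ping_b.warnings)  # one warning, for msg-1
print(ping_a.sent)      # msg-1 is never answered from ping-a's point of view
```

In this model ping-a's request is effectively lost, which matches the message loss described in the report.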

All the above is summarized by the following:

[image: receptor-issue-same-node-id]

@elyezer elyezer added the bug Something isn't working label Mar 9, 2020
@matburt matburt added this to the 1.0 Release milestone May 13, 2020