Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[websocket] parser gets out of sync when data payload not received within timeout (IDFGH-13952) #680

Open
3 tasks done
bryghtlabs-richard opened this issue Oct 25, 2024 · 4 comments
Assignees
Labels
Status: Opened Issue is new

Comments

@bryghtlabs-richard
Copy link
Contributor

Answers checklist.

  • I have read the documentation for esp-protocols components and the issue is not addressed there.
  • I have updated my esp-protocols branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

When reading websocket transport payload data, if the payload data is not received within the read timeout, partial data is pulled from the stream, but the remaining payload data is never pulled from the stream, so the stream parser is then offset. The next time we try to read a message, it parses garbage because the data is misaligned. In this example, the websocket timeout is set to 25 seconds. WiFi router throttles to 5mbps download, 2mbps upload, 20% packet loss.

I (33478) SIOC: parseJsonEvent:778 msgRx len:1637
E (58614) transport_base: esp_tls_conn_read error, errno=No more processes   # 25 seconds later (1x WebSocket timeout), error occurs
E (58614) transport_ws: Error read data(0)
E (58614) transport_ws: Error reading payload data
I (60090) SIOC: did not understand WEBSOCKET_EVENT_DATA op_code:12 # Try to read another message, but parse garbage. Not RFC6445 frame-type
W (60094) SIOC: WsOpcode_Continuation  # Unexpected frame type for this protocol
W (60094) SIOC: WsOpcode_Continuation # Unexpected frame type for this protocol
I (60094) SIOC: did not understand WEBSOCKET_EVENT_DATA op_code:4 # not RFC6445 frame-type
I (60102) SIOC: did not understand WEBSOCKET_EVENT_DATA op_code:13 # not RFC6445 frame-type
I (60110) SIOC: did not understand WEBSOCKET_EVENT_DATA op_code:4 # not RFC6445 frame-type
I (60114) SIOC: got binary length: 97 fin=0 # Unexpected frame type for this protocol
E (60118) SIOC: WsOpcode_Text received without FIN flag, unimplemented, dropping. text: Chessbrah?","img" #misaligned Fragment of actual message
E (60138) SIOC: WsOpcode_Text received without FIN flag, unimplemented, dropping. text:png","countryCode":"ca","is_enabled":true,"is_ #misaligned fragment of message

Often the stream eventually closes itself, but rarely a message fragment is received as valid JSON.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Oct 25, 2024
@github-actions github-actions bot changed the title [websocket] parser gets out of sync when data payload not received within timeout [websocket] parser gets out of sync when data payload not received within timeout (IDFGH-13952) Oct 25, 2024
@david-cermak
Copy link
Collaborator

parser gets out of sync when data payload

Do you mean the parser that is called upon receiving WEBSOCKET_EVENT_DATA from the ws client? That is the application code? Need to take the total data length and the actual offset into consideration:

int payload_len; /*!< Total payload length, payloads exceeding buffer will be posted through multiple events */
int payload_offset; /*!< Actual offset for the data associated with this event */

@bryghtlabs-richard
Copy link
Contributor Author

bryghtlabs-richard commented Oct 29, 2024

It's possible I was mistaken. I was also working on updating to v5.1.4(without #679 fix) that day, and this may have been #679 again. But I'm not certain it was. Separately, we might expand the transport_ws logging to include the error codes on failure, as I have done in my sandbox. Edit: I'm pretty sure this is a real bug, separate from #679, as the printout includes logging the esp-error I had only added with my fix.

Do you mean the parser that is called upon receiving WEBSOCKET_EVENT_DATA from the ws client? That is the application code? Need to take the total data length and the actual offset into consideration:

That's what I would expect to happen. What I am afraid happened is:

  1. ws_read() is called
  2. ws_read() calls ws_read_header() which reads the entire header, header describes a payload.
  3. ws_read() calls rlen = ws_read_payload(), which times out - we know this from "Error read data(0)"
  4. ws_read(), seeing rlen <= 0, reset ws->frame_state.bytes_remaining = 0, we know this from "Error reading payload data". But bytes_remaining = 0 is the wrong response to rlen == 0, because there is one whole payload of bytes remaining in the underlying transport.
  5. The next call to ws_read(), seeing rlen == 0 with payload in the underlying transport, will parse the underlying transport payload as headers, causing the spew of non-RFC6445 frame-types

@david-cermak
Copy link
Collaborator

If we get ws_read() == 0 any time during payload reception, it should signify a timeout error. In such cases, we should abort the connection and notify the server to close it. I'll review the relevant code to handle these edge cases.

@bryghtlabs-richard
Copy link
Contributor Author

Sounds good to me, network_timeout_ms is documented to Abort network operation, and I think that's what gets passed to transport_ws's ws_read().

I would like to mention that if a client would prefers to read data more often than network_timeout_ms, we might want a different read_timeout_ms that doesn't teardown the connection, but I don't currently have that need, so I won't request it here.

Thank you for taking a look at this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Opened Issue is new
Projects
None yet
Development

No branches or pull requests

3 participants