-
Notifications
You must be signed in to change notification settings - Fork 485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make stdout and stderr handle decode errors gracefully #505
Make stdout and stderr handle decode errors gracefully #505
Conversation
… replacement char; add tests
|
|
|
Do we know why it is happening? Can we improve the encoding in envd or are we doing something improper while encoding? |
My question is basically—why are there undecodable chars in the stdout/stderr? Is this because of the command output or are we messing something up in the envd? |
Essentially when reading data from a network socket, the client might encounter a scenario where a multi-byte UTF-8 character is split across network reads. UTF-8 uses multiple bytes to represent characters with codepoints above 127, and these characters can be interrupted mid-sequence when the buffer is filled. |
Should this happen here when the messages we are getting from envd are actually structured and contain multiple fields in addition to the stdout/stderr? |
With that in mind, should we fix this when reading the messages here ( E2B/packages/python-sdk/e2b_connect/client.py Line 395 in 5f022cc
|
IIUC you don't get to control where the split happens when reading byte stream from the network, but at the same time im surprised this is not a solved problem given the ubiquity of utf8 encoded byte array streaming |
That sounds like a separate PR. This PR looks good to me! |
Description
The
sbx.commands.run
method can generate astdout
with badly encodedUTF-8
bytes, which can cause this kind of errors:see more here: https://linear.app/e2b/issue/E2B-1291/unicodedecodeerror-in-python-sdk-when-using-curl-command
To address this we update the
decode
error handler to replace with the unicode replacement char.Test
poetry run pytest -s -n 4 -k "tests/sync/sandbox_sync/commands/test_run"