SFTP OOM Error #726

Closed
sullivan-sean opened this issue Dec 10, 2024 · 3 comments

Comments

@sullivan-sean

My profiler is reporting very large memory allocations (many GB) from asyncssh's compression.py, coming from a few concurrent read calls on relatively small files.

At the time of the OOM, there was no space left on the remote device.

[profiler screenshot: large memory allocations attributed to asyncssh compression.py]

I'll try to reproduce, but my hunch is that memory isn't freed properly if the compress call succeeds and the flush call fails here:

```python
try:
    return self._comp.compress(data) + \
           self._comp.flush(zlib.Z_SYNC_FLUSH)
except zlib.error:  # pragma: no cover
    return None
```

Are you open to a PR that frees any allocation or explicitly deletes the zlib.compressobj on failure here?
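
For concreteness, here's a rough sketch of the kind of change I have in mind, assuming the surrounding class holds the compressor as self._comp (the class name and structure here are illustrative, not the actual asyncssh internals):

```python
import zlib
from typing import Optional

class _Compressor:
    """Illustrative stand-in for the asyncssh zlib compressor."""

    def __init__(self) -> None:
        self._comp = zlib.compressobj()

    def compress(self, data: bytes) -> Optional[bytes]:
        try:
            return self._comp.compress(data) + \
                   self._comp.flush(zlib.Z_SYNC_FLUSH)
        except zlib.error:  # pragma: no cover
            # Proposed change: drop the reference on failure so the zlib
            # state (history window, pending output buffers) becomes
            # collectable right away instead of lingering until the
            # connection object itself is torn down.
            self._comp = None
            return None
```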

@ronf
Owner

ronf commented Dec 10, 2024

A few comments/questions:

  • On modern networks, enabling compression actually slows down transfers, in addition to using a lot of additional CPU on the sender. AsyncSSH now disables compression by default (as of 2.18.0), though it will still allow it if the remote system requires it. Independent of this memory issue, you might want to try without compression (see the sketch after this list) and see if that actually performs better than with compression enabled.

  • Do you know why you are getting compression failures? That should basically never happen. The compression call should work no matter what the data is, and the decompression should only fail if the data bytes were tampered with, but with the end-to-end compression that should never happen unless you are connecting to a buggy (or malicious) SSH peer. In this case, the issue is supposedly with compression, so I don't see how that would ever return None.

  • Any single compression object should not need much memory. It's basically just its history and Huffman table, which should be a small number of KB, or at most a few tens of KB. You'd need hundreds of thousands of connections for this allocation to grow to multiple GB, unless something very wrong is happening in the compress function.

  • If there was a compression failure, it should trigger a fatal error and immediately close the connection, which should make the compression object eligible for deletion. While the garbage collector might not kick in immediately, it shouldn't be too long before it did, especially if the amount of allocated memory is extremely large.

  • Are you actually seeing failures in the compress call? If so, have you tried to dig in and see what specific error you're getting in the exception?
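
For reference, here is a minimal sketch of trying the transfer with compression disabled, assuming asyncssh's compression_algs option accepts the standard 'none' algorithm name (the host and path below are placeholders):

```python
import asyncio
import asyncssh

async def fetch(host: str, path: str) -> bytes:
    # Offer only the 'none' compression algorithm so no zlib state
    # is created for this connection.
    async with asyncssh.connect(host, compression_algs=['none']) as conn:
        async with conn.start_sftp_client() as sftp:
            async with sftp.open(path, 'rb') as f:
                return await f.read()

data = asyncio.run(fetch('sftp.example.com', '/remote/file.bin'))
print(len(data), 'bytes read without compression')
```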

@sullivan-sean
Author

I haven't reproduced this yet, but this all makes sense. I'll come back with more comprehensive logging, etc. if I can put together a minimal reproduction, probably something along the lines of the sketch below.
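
The shape of the reproduction I have in mind is roughly the following: several concurrent SFTP reads over a single connection with compression forced on (the host, paths, and the 'zlib@openssh.com' algorithm name are assumptions on my part):

```python
import asyncio
import asyncssh

async def read_one(sftp: asyncssh.SFTPClient, path: str) -> int:
    # Read one remote file fully and return its size.
    async with sftp.open(path, 'rb') as f:
        return len(await f.read())

async def main() -> None:
    # Force compression on so the compress()/flush() path is exercised.
    async with asyncssh.connect('sftp.example.com',
                                compression_algs=['zlib@openssh.com']) as conn:
        async with conn.start_sftp_client() as sftp:
            paths = [f'/remote/file-{i}.bin' for i in range(8)]
            print(await asyncio.gather(*(read_one(sftp, p) for p in paths)))

asyncio.run(main())
```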

@sullivan-sean
Author

This is more likely an issue with our profiling than with asyncssh.
