SFTP OOM Error #726

Closed
sullivan-sean opened this issue Dec 10, 2024 · 3 comments

Comments

@sullivan-sean

My profiler is reporting very large memory allocations (many GB) from asyncssh's compression.py, coming from a few concurrent read calls on relatively small files.

At the time of the OOM, there was no space left on the remote device.

[profiler screenshot: large memory allocations attributed to asyncssh compression.py]

I'll try to reproduce, but my hunch is that memory isn't freed properly if the compress call succeeds and the flush call fails here:

```python
try:
    return self._comp.compress(data) + \
           self._comp.flush(zlib.Z_SYNC_FLUSH)
except zlib.error:  # pragma: no cover
    return None
```

Are you open to a PR that frees any allocation or explicitly deletes the zlib.compressobj on failure here?
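
For concreteness, here's a rough sketch of the kind of change I have in mind, assuming the surrounding class holds the compressor as self._comp (the class name and structure here are illustrative, not the actual asyncssh internals):

```python
import zlib
from typing import Optional

class _Compressor:
    """Illustrative stand-in for the asyncssh zlib compressor."""

    def __init__(self) -> None:
        self._comp = zlib.compressobj()

    def compress(self, data: bytes) -> Optional[bytes]:
        try:
            return self._comp.compress(data) + \
                   self._comp.flush(zlib.Z_SYNC_FLUSH)
        except zlib.error:  # pragma: no cover
            # Proposed change: drop the reference on failure so the zlib
            # state (history window, pending output buffers) becomes
            # collectable right away instead of lingering until the
            # connection object itself is torn down.
            self._comp = None
            return None
```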

@ronf
Owner

ronf commented Dec 10, 2024

A few comments/questions:

  • On modern networks, enabling compression actually slows down transfers, in addition to using a lot of additional CPU on the sender. AsyncSSH now disables compression by default (as of 2.18.0), though it will still allow it if the remote system requires it. Independent of this memory issue, you might want to try without compression (see the sketch after this list) and see if that actually performs better than with compression enabled.

  • Do you know why you are getting compression failures? That should basically never happen. The compression call should work no matter what the data is, and the decompression should only fail if the data bytes were tampered with, but with the end-to-end compression that should never happen unless you are connecting to a buggy (or malicious) SSH peer. In this case, the issue is supposedly with compression, so I don't see how that would ever return None.

  • Any single compression object should not need much memory. It's basically just its history and Huffman table, which should be a small number of KB, or at most a few tens of KB. You'd need hundreds of thousands of connections for this allocation to grow to multiple GB, unless something very wrong is happening in the compress function.

  • If there was a compression failure, it should trigger a fatal error and immediately close the connection, which should make the compression object eligible for deletion. While the garbage collector might not kick in immediately, it shouldn't be too long before it did, especially if the amount of allocated memory is extremely large.

  • Are you actually seeing failures in the compress call? If so, have you tried to dig in and see what specific error you're getting in the exception?
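
For reference, here is a minimal sketch of trying the transfer with compression disabled, assuming asyncssh's compression_algs option accepts the standard 'none' algorithm name (the host and path below are placeholders):

```python
import asyncio
import asyncssh

async def fetch(host: str, path: str) -> bytes:
    # Offer only the 'none' compression algorithm so no zlib state
    # is created for this connection.
    async with asyncssh.connect(host, compression_algs=['none']) as conn:
        async with conn.start_sftp_client() as sftp:
            async with sftp.open(path, 'rb') as f:
                return await f.read()

data = asyncio.run(fetch('sftp.example.com', '/remote/file.bin'))
print(len(data), 'bytes read without compression')
```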

@sullivan-sean
Author

I haven't reproduced this yet, but this all makes sense. I'll come back with more comprehensive logging, etc. if I can put together a minimal reproduction, probably something along the lines of the sketch below.
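
The shape of the reproduction I have in mind is roughly the following: several concurrent SFTP reads over a single connection with compression forced on (the host, paths, and the 'zlib@openssh.com' algorithm name are assumptions on my part):

```python
import asyncio
import asyncssh

async def read_one(sftp: asyncssh.SFTPClient, path: str) -> int:
    # Read one remote file fully and return its size.
    async with sftp.open(path, 'rb') as f:
        return len(await f.read())

async def main() -> None:
    # Force compression on so the compress()/flush() path is exercised.
    async with asyncssh.connect('sftp.example.com',
                                compression_algs=['zlib@openssh.com']) as conn:
        async with conn.start_sftp_client() as sftp:
            paths = [f'/remote/file-{i}.bin' for i in range(8)]
            print(await asyncio.gather(*(read_one(sftp, p) for p in paths)))

asyncio.run(main())
```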

@sullivan-sean
Author

This is more likely an issue with our profiling than with asyncssh.
