There are already a few tickets here about large memory consumption, but they don't clarify the reason for this.
I probably don't understand exactly what the tool is doing that causes it to need so much RAM (more than 20 GB in our case).
We run azcopy sync every day (Azure Files -> Local Dir). The total number of files in storage is about 7 million, of which about 50k files are deleted and added per day (the rest of the files are immutable).
As far as I can tell, azcopy sync collects the full list of files from the source and maps it to the target folder. I did a rudimentary test, and I managed to store such a list using only 2 GB versus the 20 GB that azcopy consumes. I should note that I didn't apply any optimizations; I simply built a hashmap with the full file paths as keys (as UTF-16 strings) and 64 bytes of arbitrary per-file information as values (enough to store dates, attributes and more).
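For reference, here is a minimal sketch of the kind of test I mean (written in Go with UTF-8 paths rather than UTF-16; the path shapes and counts are made up, so the exact numbers will differ):

```go
package main

// Sketch of the rudimentary test: build a hashmap of ~7 million file paths,
// each mapped to 64 bytes of per-file metadata, and print how much heap
// such an index occupies on its own.

import (
	"fmt"
	"runtime"
)

// meta reserves 64 bytes per file - enough for dates, attributes, etc.
type meta [64]byte

func main() {
	const numFiles = 7_000_000

	index := make(map[string]meta, numFiles)
	for i := 0; i < numFiles; i++ {
		// Synthetic path of a plausible length (~45 bytes).
		path := fmt.Sprintf("share/dir%04d/sub%04d/file-%09d.dat", i%1000, i%10000, i)
		index[path] = meta{}
	}

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("entries: %d, heap in use: %.2f GB\n",
		len(index), float64(ms.HeapInuse)/(1<<30))
}
```

On my rough arithmetic, 7 million entries at roughly 150-200 bytes each (path, string header, 64-byte value, map overhead) lands in the low single-digit GB range, which is why the 20 GB figure surprised me.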
What else does azcopy sync do that requires so much extra memory?
Perhaps a consistent snapshot of the storage is being maintained, so it's more complicated than I assume? If so, would it be possible to have a synchronization mode that allows some data to change during the copying process but consumes far less memory?
Maybe it caches a lot of data during copying before writing it to the target dir? If so, is it possible to make that cache smaller?
In your case, sync is caching the list of source/dest file pairs to determine which files to transfer, but it is also using memory for the downloads that are in progress.
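If the in-flight downloads are a significant part of this, my understanding from the docs is that the buffer memory can be capped with the AZCOPY_BUFFER_GB environment variable, and the number of parallel transfers with AZCOPY_CONCURRENCY_VALUE. For example (placeholder URL and values, adjust to your share and machine):

```
AZCOPY_BUFFER_GB=4 AZCOPY_CONCURRENCY_VALUE=16 azcopy sync "https://<account>.file.core.windows.net/<share>?<SAS>" "/local/dir"
```

Lowering these trades transfer throughput for a smaller memory footprint; it would not change the size of the file-pair index itself.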