Another question on RAM consumption #2867

Open
ArtemAvramenko opened this issue Nov 15, 2024 · 1 comment
ArtemAvramenko commented Nov 15, 2024

There are already a few tickets here about large memory consumption, but they don't clarify the reason for this.

I probably don't understand exactly what the tool is doing that makes it need so much RAM to run (more than 20 GB in our case).

We run azcopy sync every day (Azure Files -> Local Dir). The storage holds about 7 million files in total, of which about 50k are deleted and added per day (the rest are immutable).

As far as I can tell, azcopy sync collects the full list of files from the source and maps it to the target folder. I did a rudimentary test and managed to store this list in only 2 GB, versus the 20 GB that azcopy consumes. Note that I didn't use any optimizations: I just built a hashmap with the full file paths as keys (as UTF-16 strings) and 64 bytes of arbitrary information as values (enough to store dates, attributes, and much more).
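To put the two figures side by side, here is a rough back-of-envelope sketch of the per-file memory budget they imply. It uses only the numbers quoted above (about 7 million files, 2 GB vs 20 GB, 64-byte payload, UTF-16 paths). The function names are made up for illustration, decimal gigabytes are an assumption, and hashmap overhead is ignored, so treat the results as rough bounds rather than measurements.

```python
# Back-of-envelope per-file memory budget, using only figures from the post.
FILE_COUNT = 7_000_000   # "about 7 million" files
GB = 10**9               # assumption: decimal gigabytes


def bytes_per_file(total_gb, file_count=FILE_COUNT):
    """Average number of bytes of RAM available per file entry."""
    return total_gb * GB / file_count


def implied_avg_path_chars(total_gb, payload_bytes=64, file_count=FILE_COUNT):
    """Upper bound on the average path length in UTF-16 code units
    (2 bytes each), assuming the rest of the budget holds only the
    fixed payload and ignoring any hashmap overhead."""
    return (bytes_per_file(total_gb, file_count) - payload_bytes) / 2


hashmap_budget = bytes_per_file(2)    # roughly 286 bytes per file
azcopy_budget = bytes_per_file(20)    # roughly 2857 bytes per file
```

So the hashmap test spends on the order of 300 bytes per file, while the observed azcopy footprint works out to nearly 3 KB per file, which is the gap the question is about.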

What else does azcopy sync do that it needs so much extra memory?

Perhaps a consistent snapshot of the storage is provided, so it's much more complicated than I'm assuming? If so, would it be possible to have a synchronization mode that tolerates some data changing during the copy but consumes many times less memory?

Maybe it caches a lot of data during copying before putting it in the target dir? If so, is it possible to make the cache smaller?

@gapra-msft
Member

Hi @ArtemAvramenko, please see this page for information on how to limit memory usage: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-optimize#optimize-memory-use

In your case, sync caches the list of source/destination file pairs to determine which files to transfer, but it also uses memory for the downloads in progress.
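As a minimal sketch of what the linked page suggests: the `AZCOPY_BUFFER_GB` environment variable limits how much memory AzCopy uses for buffering transfers. The helper below only builds the command and environment for a daily sync; the helper name, the 4 GB value, and the URL and path in the usage note are placeholders, not recommendations. As far as I understand, this setting affects transfer buffering only, so it may not reduce the memory used for the cached file list.

```python
import os


def azcopy_sync_invocation(source_url, dest_dir, buffer_gb, base_env=None):
    """Build the command line and environment for `azcopy sync` with a
    buffer cap. Hypothetical helper: it does not run anything itself."""
    # Start from the current environment unless the caller supplies one.
    env = dict(os.environ if base_env is None else base_env)
    # AZCOPY_BUFFER_GB is expressed in gigabytes.
    env["AZCOPY_BUFFER_GB"] = str(buffer_gb)
    cmd = ["azcopy", "sync", source_url, dest_dir]
    return cmd, env
```

To actually run it, pass the result to `subprocess.run(cmd, env=env, check=True)`, for example with `azcopy_sync_invocation("https://<account>.file.core.windows.net/<share>?<SAS>", "/data/mirror", 4)`.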
