Suboptimal performance for large writes #215

Open · ANahr opened this issue Nov 16, 2024 · 2 comments
Labels: performance (issues or enhancements related to the performance characteristics of the project)

ANahr commented Nov 16, 2024

I have tested V3 with some larger writes and there seems to be some problematic performance behavior. Even when just writing (not producing any free sectors in between), performance degrades rapidly.

I have made a benchmark (using the Microsoft.IO.RecyclableMemoryStreamManager package, but that does not affect the general problem). It creates some storages and some streams in each storage: mostly small streams plus one large one, approx. 2.5 GB in total final size.

Running this on my machine takes several minutes; when switching to a FileStream I could not even wait for it to finish.

using OpenMcdf;
using System.Diagnostics;

byte[] randomBuffer = new byte[10000];
var stopwatch = Stopwatch.StartNew();

var manager = new Microsoft.IO.RecyclableMemoryStreamManager();
var baseStream = new Microsoft.IO.RecyclableMemoryStream(manager);

using var rootStorage = RootStorage.Create(baseStream);
for (int k = 0; k < 20; k++)
{
    var store = rootStorage.CreateStorage("TestStorage" + k);
    for (int i = 0; i < 100; i++)
    {
        using var stream = store.CreateStream("TestStream" + i);

        var to = i == 0 ? 10000 : 1;
        for (int j = 0; j < to; j++)
            stream.Write(randomBuffer, 0, 10000);
    }
}

Console.WriteLine($"Elapsed: {stopwatch.Elapsed}");

Virtually all of the time (99.9+%) is spent in FatEnumerator.MoveNext().
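
To illustrate the shape of the problem (just a sketch in plain C#, not OpenMcdf's actual code): if every appended sector requires re-enumerating the chain from its head to find the current tail, writing N sectors costs on the order of N²/2 link lookups in total.

using System;
using System.Collections.Generic;

class FatChainWalkDemo
{
    const int EndOfChain = -1;

    static void Main()
    {
        // fat[i] = index of the next sector in the chain, or EndOfChain
        var fat = new List<int>();
        int head = EndOfChain;
        long linkLookups = 0;

        for (int sector = 0; sector < 20_000; sector++)
        {
            fat.Add(EndOfChain);

            if (head == EndOfChain)
            {
                head = sector;
                continue;
            }

            // Naive tail search: walk the whole chain for every appended sector.
            int tail = head;
            while (fat[tail] != EndOfChain)
            {
                tail = fat[tail];
                linkLookups++;
            }
            fat[tail] = sector;
        }

        // Roughly N^2 / 2 lookups (~200 million for 20,000 sectors); a cached tail
        // or last-position hint would make each append O(1) instead.
        Console.WriteLine($"Link lookups for 20,000 appended sectors: {linkLookups:N0}");
    }
}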

jeremy-visionaid (Collaborator) commented

@ANahr Thanks for testing that out! I haven't yet tried profiling anything beyond 1 MB streams, simply because that is what the original benchmarks for 2.3/2.4 covered. In my use case the files are also large, but the individual streams are only a few megabytes (and I use V4 with 4 KB sectors).

So, the FAT chain extension isn't optimized at all for long chains. The good news is that it should be reasonably straightforward to improve. I'm mostly working on improving the test coverage right now, but I can probably have a go at improving large-stream write performance in the next few days.
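
To sketch the kind of improvement I have in mind (illustrative only, not the actual OpenMcdf implementation): cache the tail of the chain being extended so that each appended sector is linked in O(1), instead of re-walking the FAT from the chain head on every append.

using System;
using System.Collections.Generic;

class FatChain
{
    const int EndOfChain = -1;

    readonly List<int> fat = new();   // fat[i] = index of the next sector, or EndOfChain
    int head = EndOfChain;
    int tail = EndOfChain;            // cached tail avoids walking the chain on append

    public int Append()
    {
        int sector = fat.Count;
        fat.Add(EndOfChain);

        if (head == EndOfChain)
            head = sector;
        else
            fat[tail] = sector;       // link the old tail directly to the new sector

        tail = sector;
        return sector;
    }
}

class FatChainDemo
{
    static void Main()
    {
        var chain = new FatChain();
        for (int i = 0; i < 1_000_000; i++)
            chain.Append();           // linear time overall, regardless of chain length
        Console.WriteLine("Appended 1,000,000 sectors without re-walking the chain.");
    }
}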

jeremy-visionaid added the performance label Nov 17, 2024
jeremy-visionaid self-assigned this Nov 17, 2024
jeremy-visionaid (Collaborator) commented Nov 18, 2024

@ANahr Just thinking about this one... Although I haven't checked against the 2.4 branch, I modified your benchmark to compare against Microsoft's reference implementation as a baseline:

    static void MultiStorageAndStreamWrite()
    {
        int storageCount = 8;
        int streamCount = 8;
        int writeCount = 1024;
        byte[] buffer = new byte[32 * 512];

        Microsoft.IO.RecyclableMemoryStreamManager manager = new ();
        Microsoft.IO.RecyclableMemoryStream baseStream = new(manager);
        baseStream.Capacity = 2 * (storageCount * buffer.Length * writeCount + storageCount * (streamCount - 1) * buffer.Length);

        using var rootStorage = RootStorage.Create(baseStream);
        for (int k = 0; k < storageCount; k++)
        {
            Console.WriteLine($"Creating Storage {k}");
            Storage storage = rootStorage.CreateStorage($"TestStorage{k}");
            for (int i = 0; i < streamCount; i++)
            {
                using CfbStream stream = storage.CreateStream($"TestStream{i}");

                int to = i == 0 ? writeCount : 1;
                for (int j = 0; j < to; j++)
                    stream.Write(buffer, 0, buffer.Length);
            }
        }
    }

    static void MultiStorageAndStreamWriteBaseline()
    {
        int storageCount = 8;
        int streamCount = 8;
        int writeCount = 1024;
        byte[] buffer = new byte[32 * 512];
        int capacity = 2 * (storageCount * buffer.Length * writeCount + storageCount * (streamCount - 1) * buffer.Length);

        using var rootStorage = StructuredStorage.Storage.CreateInMemory(capacity);
        for (int k = 0; k < storageCount; k++)
        {
            Console.WriteLine($"Creating Storage {k}");
            var storage = rootStorage.CreateStorage($"TestStorage{k}");
            for (int i = 0; i < streamCount; i++)
            {
                using var stream = storage.CreateStream($"TestStream{i}");

                int to = i == 0 ? writeCount : 1;
                for (int j = 0; j < to; j++)
                    stream.Write(buffer, 0, buffer.Length);
            }
        }
    }

For OpenMcdf 3.0:

Creating Storage 0
Creating Storage 1
Creating Storage 2
Creating Storage 3
Creating Storage 4
Creating Storage 5
Creating Storage 6
Creating Storage 7
Elapsed: 00:00:00.4910917

And for V4: 00:00:00.1447701

For Microsoft Structured Storage (baseline):

Creating Storage 0
Creating Storage 1
Creating Storage 2
Creating Storage 3
Creating Storage 4
Creating Storage 5
Creating Storage 6
Creating Storage 7
Elapsed: 00:01:28.1054421

It seems like performance is actually already pretty good compared to the baseline. It's of course possible there's some issue with the wrapper (though the profiler shows 99.9% of the time is spent in Windows.Win32.System.Com.IStream.Write, and since almost all of that time is inside the p/invokes rather than the wrapper itself, it seems likely to be representative). Perhaps it's more that you're writing a really large number of sectors (~5 million), i.e. my initial thought is that this might be more a matter of expectations than an actual performance problem...
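
(As a rough sanity check on that number: V3 files use 512-byte sectors, so ~2.5 GB works out to about 2,500,000,000 / 512 ≈ 4.9 million sectors, each of which has to be allocated and chained in the FAT.)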

I'm just curious whether there was an actual problem for your use case, or whether it's just slower than you expected/hoped for (i.e. are you doing something like this on 2.x and finding it faster there)?

Looking at the implementation again, I think there is still some room for improvement, but I suspect the bottleneck is more likely around searching the FAT for free sectors than around extending the chain itself. Without caching, creating a new stream often means the whole FAT is searched to find an initial free sector, and that obviously takes progressively longer as the FAT grows.
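
Roughly the kind of caching I mean (an illustrative sketch, not OpenMcdf's actual code): keep a hint of where the last free sector was found and resume the search from there, so the FAT is scanned once overall rather than once per allocation.

using System;
using System.Collections.Generic;

class FatWithFreeHint
{
    const int Free = -2;
    const int EndOfChain = -1;

    readonly List<int> fat = new();
    int freeHint;                       // index to resume free-sector searches from

    public int AllocateSector()
    {
        // Resume from the hint instead of index 0; amortized, this visits each
        // FAT entry once instead of once per allocation.
        for (int i = freeHint; i < fat.Count; i++)
        {
            if (fat[i] == Free)
            {
                fat[i] = EndOfChain;
                freeHint = i + 1;
                return i;
            }
        }

        // No free entry found: grow the FAT by one sector.
        fat.Add(EndOfChain);
        freeHint = fat.Count;
        return fat.Count - 1;
    }

    public void FreeSector(int index)
    {
        fat[index] = Free;
        if (index < freeHint)
            freeHint = index;           // move the hint back so freed sectors are found again
    }
}

class FreeHintDemo
{
    static void Main()
    {
        var fat = new FatWithFreeHint();
        for (int i = 0; i < 1_000_000; i++)
            fat.AllocateSector();
        Console.WriteLine("Allocated 1,000,000 sectors with a single forward scan overall.");
    }
}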
