Suboptimal performance for large writes #215

Open · ANahr opened this issue Nov 16, 2024 · 2 comments
Labels: performance (issues or enhancements related to the performance characteristics of the project)

ANahr commented Nov 16, 2024

I have tested V3 with some larger writes and there seems to be some problematic performance behavior. Even when just writing (not producing any free sectors in between), performance degrades rapidly.

I have made a benchmark (using the Microsoft.IO.RecyclableMemoryStreamManager package, but that does not affect the general problem). It creates some storages and some streams in each storage: mostly small streams plus one large one, approx. 2.5 GB in total final size.

Running this on my machine takes several minutes; when switching to a FileStream I could not even wait for it to finish.

using OpenMcdf;
using System.Diagnostics;

byte[] randomBuffer = new byte[10000];
var stopwatch = Stopwatch.StartNew();

var manager = new Microsoft.IO.RecyclableMemoryStreamManager();
var baseStream = new Microsoft.IO.RecyclableMemoryStream(manager);

using var rootStorage = RootStorage.Create(baseStream);
for (int k = 0; k < 20; k++)
{
    var store = rootStorage.CreateStorage("TestStorage" + k);
    for (int i = 0; i < 100; i++)
    {
        using var stream = store.CreateStream("TestStream" + i);

        var to = i == 0 ? 10000 : 1;
        for (int j = 0; j < to; j++)
            stream.Write(randomBuffer, 0, 10000);
    }
}

Console.WriteLine($"Elapsed: {stopwatch.Elapsed}");

Virtually all of the time (99.9+%) is spent in FatEnumerator.MoveNext().
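
To illustrate the shape of the problem (just a sketch in plain C#, not OpenMcdf's actual code): if every appended sector requires re-enumerating the chain from its head to find the current tail, writing N sectors costs on the order of N²/2 link lookups in total.

using System;
using System.Collections.Generic;

class FatChainWalkDemo
{
    const int EndOfChain = -1;

    static void Main()
    {
        // fat[i] = index of the next sector in the chain, or EndOfChain
        var fat = new List<int>();
        int head = EndOfChain;
        long linkLookups = 0;

        for (int sector = 0; sector < 20_000; sector++)
        {
            fat.Add(EndOfChain);

            if (head == EndOfChain)
            {
                head = sector;
                continue;
            }

            // Naive tail search: walk the whole chain for every appended sector.
            int tail = head;
            while (fat[tail] != EndOfChain)
            {
                tail = fat[tail];
                linkLookups++;
            }
            fat[tail] = sector;
        }

        // Roughly N^2 / 2 lookups (~200 million for 20,000 sectors); a cached tail
        // or last-position hint would make each append O(1) instead.
        Console.WriteLine($"Link lookups for 20,000 appended sectors: {linkLookups:N0}");
    }
}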

jeremy-visionaid (Collaborator) commented

@ANahr Thanks for testing that out! I haven't yet tried profiling anything beyond 1 MB streams, simply because that is what the original benchmarks for 2.3/2.4 covered. In my use case the files are also large, but the individual streams are only a few megabytes (and I use V4 with 4 KB sectors).

So, the FAT chain extension isn't optimized at all for long chains. The good news is that it should be reasonably straightforward to improve. I'm mostly working on improving the test coverage right now, but I can probably have a go at improving large-stream write performance in the next few days.
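
To sketch the kind of improvement I have in mind (illustrative only, not the actual OpenMcdf implementation): cache the tail of the chain being extended so that each appended sector is linked in O(1), instead of re-walking the FAT from the chain head on every append.

using System;
using System.Collections.Generic;

class FatChain
{
    const int EndOfChain = -1;

    readonly List<int> fat = new();   // fat[i] = index of the next sector, or EndOfChain
    int head = EndOfChain;
    int tail = EndOfChain;            // cached tail avoids walking the chain on append

    public int Append()
    {
        int sector = fat.Count;
        fat.Add(EndOfChain);

        if (head == EndOfChain)
            head = sector;
        else
            fat[tail] = sector;       // link the old tail directly to the new sector

        tail = sector;
        return sector;
    }
}

class FatChainDemo
{
    static void Main()
    {
        var chain = new FatChain();
        for (int i = 0; i < 1_000_000; i++)
            chain.Append();           // linear time overall, regardless of chain length
        Console.WriteLine("Appended 1,000,000 sectors without re-walking the chain.");
    }
}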

jeremy-visionaid added the performance label Nov 17, 2024
jeremy-visionaid self-assigned this Nov 17, 2024
jeremy-visionaid (Collaborator) commented Nov 18, 2024

@ANahr Just thinking about this one... Although I haven't checked against the 2.4 branch, I modified your benchmark to compare against Microsoft's reference implementation as a baseline:

    static void MultiStorageAndStreamWrite()
    {
        int storageCount = 8;
        int streamCount = 8;
        int writeCount = 1024;
        byte[] buffer = new byte[32 * 512];

        Microsoft.IO.RecyclableMemoryStreamManager manager = new ();
        Microsoft.IO.RecyclableMemoryStream baseStream = new(manager);
        baseStream.Capacity = 2 * (storageCount * buffer.Length * writeCount + storageCount * (streamCount - 1) * buffer.Length);

        using var rootStorage = RootStorage.Create(baseStream);
        for (int k = 0; k < storageCount; k++)
        {
            Console.WriteLine($"Creating Storage {k}");
            Storage storage = rootStorage.CreateStorage($"TestStorage{k}");
            for (int i = 0; i < streamCount; i++)
            {
                using CfbStream stream = storage.CreateStream($"TestStream{i}");

                int to = i == 0 ? writeCount : 1;
                for (int j = 0; j < to; j++)
                    stream.Write(buffer, 0, buffer.Length);
            }
        }
    }

    static void MultiStorageAndStreamWriteBaseline()
    {
        int storageCount = 8;
        int streamCount = 8;
        int writeCount = 1024;
        byte[] buffer = new byte[32 * 512];
        int capacity = 2 * (storageCount * buffer.Length * writeCount + storageCount * (streamCount - 1) * buffer.Length);

        using var rootStorage = StructuredStorage.Storage.CreateInMemory(capacity);
        for (int k = 0; k < storageCount; k++)
        {
            Console.WriteLine($"Creating Storage {k}");
            var storage = rootStorage.CreateStorage($"TestStorage{k}");
            for (int i = 0; i < streamCount; i++)
            {
                using var stream = storage.CreateStream($"TestStream{i}");

                int to = i == 0 ? writeCount : 1;
                for (int j = 0; j < to; j++)
                    stream.Write(buffer, 0, buffer.Length);
            }
        }
    }

For OpenMcdf 3.0:

Creating Storage 0
Creating Storage 1
Creating Storage 2
Creating Storage 3
Creating Storage 4
Creating Storage 5
Creating Storage 6
Creating Storage 7
Elapsed: 00:00:00.4910917

And for V4: 00:00:00.1447701

For Microsoft Structured Storage (baseline):

Creating Storage 0
Creating Storage 1
Creating Storage 2
Creating Storage 3
Creating Storage 4
Creating Storage 5
Creating Storage 6
Creating Storage 7
Elapsed: 00:01:28.1054421

It seems like performance is actually already pretty good compared to the baseline. It's of course possible there's some issue with the wrapper (though the profiler shows 99.9% of the time is spent in Windows.Win32.System.Com.IStream.Write, and since almost all of that time is inside the p/invokes rather than the wrapper itself, it seems likely to be representative). Perhaps it's more that you're writing a really large number of sectors (~5 million), i.e. my initial thought is that this might be more a matter of expectations than an actual performance problem...
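
(As a rough sanity check on that number: V3 files use 512-byte sectors, so ~2.5 GB works out to about 2,500,000,000 / 512 ≈ 4.9 million sectors, each of which has to be allocated and chained in the FAT.)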

I'm just curious whether there was an actual problem for your use case, or whether it's just slower than you expected/hoped for (i.e. are you doing something like this on 2.x and finding it faster there)?

Looking at the implementation again, I think there is still some room for improvement, but I suspect the bottleneck is more likely around searching the FAT for free sectors than around extending the chain itself. Without caching, creating a new stream often means the whole FAT is searched to find an initial free sector, and that obviously takes progressively longer as the FAT grows.
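
Roughly the kind of caching I mean (an illustrative sketch, not OpenMcdf's actual code): keep a hint of where the last free sector was found and resume the search from there, so the FAT is scanned once overall rather than once per allocation.

using System;
using System.Collections.Generic;

class FatWithFreeHint
{
    const int Free = -2;
    const int EndOfChain = -1;

    readonly List<int> fat = new();
    int freeHint;                       // index to resume free-sector searches from

    public int AllocateSector()
    {
        // Resume from the hint instead of index 0; amortized, this visits each
        // FAT entry once instead of once per allocation.
        for (int i = freeHint; i < fat.Count; i++)
        {
            if (fat[i] == Free)
            {
                fat[i] = EndOfChain;
                freeHint = i + 1;
                return i;
            }
        }

        // No free entry found: grow the FAT by one sector.
        fat.Add(EndOfChain);
        freeHint = fat.Count;
        return fat.Count - 1;
    }

    public void FreeSector(int index)
    {
        fat[index] = Free;
        if (index < freeHint)
            freeHint = index;           // move the hint back so freed sectors are found again
    }
}

class FreeHintDemo
{
    static void Main()
    {
        var fat = new FatWithFreeHint();
        for (int i = 0; i < 1_000_000; i++)
            fat.AllocateSector();
        Console.WriteLine("Allocated 1,000,000 sectors with a single forward scan overall.");
    }
}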
