introducing BufferRecyclerPool #1061

Closed
wants to merge 1 commit into from

mariofusco
Contributor

This pull request is a work in progress and is not intended to be merged as it stands. Its main purpose is to start a discussion and to propose a first tentative fix for the problem reported here.

In particular, this tries to implement solution 1 suggested by @franz1981. Solution 2 would have the advantage of being the least invasive for Jackson (and would probably guarantee better performance), but it is not possible today, and even in the future it is very unlikely that the JDK team will agree to make the per-carrier-thread cache generally available, since it would break some of the encapsulation of virtual threads.

Also, following what @cowtowncoder suggested in this comment, I'm trying to better define the lifecycle of JsonParser and JsonGenerator by making their close methods also close the underlying IOContext. The IOContext is now the only owner of its BufferRecycler: instead of receiving a BufferRecycler instance as a constructor argument, it borrows one directly from the new ObjectPool and puts it back into the pool when it is closed.

This solution of course relies on the close methods of JsonParser and JsonGenerator always being called correctly. I haven't found a place where this doesn't happen; on the contrary, I found at least one situation where close is called twice, without any apparent valid reason.
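The ownership rule described above can be illustrated with a minimal sketch. It is not the code of this pull request: `SketchRecycler`, `SketchPool` and `SketchContext` are hypothetical stand-ins for BufferRecycler, the pool and IOContext, and the pool is passed in explicitly only to keep the example self-contained. The point it shows is that a context borrows on construction and that a repeated close returns the recycler only once.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for BufferRecycler; only object identity matters here.
final class SketchRecycler { }

// Hypothetical stand-in for the pool: borrow() reuses an idle instance or creates one.
final class SketchPool {
    private final ConcurrentLinkedDeque<SketchRecycler> idle = new ConcurrentLinkedDeque<>();

    SketchRecycler borrow() {
        SketchRecycler r = idle.pollFirst();
        return (r != null) ? r : new SketchRecycler();
    }

    void offer(SketchRecycler r) { idle.offerFirst(r); }

    int idleCount() { return idle.size(); }
}

// Hypothetical stand-in for IOContext: the sole owner of its recycler.
final class SketchContext implements AutoCloseable {
    private final SketchPool pool;
    private final SketchRecycler recycler;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    SketchContext(SketchPool pool) {
        this.pool = pool;
        this.recycler = pool.borrow();   // borrowed here, never injected
    }

    SketchRecycler recycler() { return recycler; }

    @Override
    public void close() {
        // Only the first close() hands the recycler back; later calls are no-ops,
        // so a double close cannot put the same instance into the pool twice.
        if (closed.compareAndSet(false, true)) {
            pool.offer(recycler);
        }
    }
}
```

Because the stand-in context implements AutoCloseable, callers could wrap it in try-with-resources, which is one way to make sure close is reached on every path.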

The ObjectPool implementation is extremely naive, and it is the main reason why this pull request has to be considered just a draft. However, it only has the borrow and offer methods, so I believe it could easily be replaced by any real-world, battle-tested, high-performance implementation. To this end I'd prefer to avoid reinventing the wheel and to reuse one of the object pool implementations already available, like the one in JCTools suggested by @franz1981 or one of the many alternatives listed here, even though not all of them are really an option because some also rely internally on ThreadLocals in some way.
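To make the "only borrow and offer" point concrete, here is a hedged sketch of how small such a contract can be. The names `SimplePool` and `DequePool` are invented for this example and are not the types in the pull request; the implementation is a deliberately naive, unbounded pool built on the JDK's ConcurrentLinkedDeque.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Supplier;

// Hypothetical two-method contract mirroring the borrow/offer pair described above.
interface SimplePool<T> {
    T borrow();
    void offer(T object);
}

// Naive, unbounded implementation on top of a lock-free deque.
final class DequePool<T> implements SimplePool<T> {
    private final ConcurrentLinkedDeque<T> idle = new ConcurrentLinkedDeque<>();
    private final Supplier<T> factory;

    DequePool(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T borrow() {
        // LIFO order: the most recently returned object is handed out first.
        T pooled = idle.pollFirst();
        return (pooled != null) ? pooled : factory.get();
    }

    @Override
    public void offer(T object) {
        idle.offerFirst(object);
    }
}
```

With a contract this narrow, swapping in another implementation would only require an adapter exposing the same two methods.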

So here are the first questions: is it OK for jackson-core to depend directly on one of those libraries providing a production-ready object pool, or do we need/want an implementation directly inside the Jackson codebase? Do you know any of those libraries/implementations, or have a specific preference for any of them? /cc @cowtowncoder @pjfanning

Finally, I also made a first attempt to check the performance of this solution, regardless of the poor object pool implementation, using a benchmark that I'm pasting at the end of this comment. Running this benchmark against the current master branch, I obtained the following results:

Benchmark                                                                           (parallelTasks)  (useVirtualThreads)   Mode  Cnt         Score        Error   Units
JacksonMultithreadWriteVanilla.writePojoMediaItem                                              1000                 true  thrpt    5       509.060 ±      6.795   ops/s
JacksonMultithreadWriteVanilla.writePojoMediaItem                                              1000                false  thrpt    5      1402.935 ±     56.546   ops/s

while rerunning it with this pull request I got:

Benchmark                                                                           (parallelTasks)  (useVirtualThreads)   Mode  Cnt        Score       Error   Units
JacksonMultithreadWriteVanilla.writePojoMediaItem                                              1000                 true  thrpt    5      680.704 ±    14.860   ops/s
JacksonMultithreadWriteVanilla.writePojoMediaItem                                              1000                false  thrpt    5     1279.282 ±     4.998   ops/s

As expected, the native-thread execution suffers from the object pool implementation performing worse than the current ThreadLocal-based one, while the run on virtual threads is already faster with my version, so I'm confident of being on the right path to a good solution once the object pooling problem is properly addressed.
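For reference, the size of the two effects can be derived from the scores above with a small calculation. The sketch below only restates the reported ops/s figures; the class and method names are made up for this example. It works out to roughly a 34% gain with virtual threads and roughly a 9% loss with native threads, ignoring the reported error margins.

```java
final class ThroughputDelta {
    // Relative change of `after` versus `before`; 0.10 means 10% more ops/s.
    static double relativeChange(double before, double after) {
        return (after - before) / before;
    }

    public static void main(String[] args) {
        // Scores (ops/s) copied from the two JMH runs above: master first, this PR second.
        System.out.printf("virtual threads: %+.1f%%%n", 100 * relativeChange(509.060, 680.704));
        System.out.printf("native threads:  %+.1f%%%n", 100 * relativeChange(1402.935, 1279.282));
    }
}
```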

Any remark or suggestion to improve this (including any concern about why this solution couldn't work) is warmly welcome. The code of the benchmark I implemented and used follows:

package com.fasterxml.jackson.perf.json;

import tools.jackson.jr.ob.JSON;
import com.fasterxml.jackson.perf.model.MediaItem;
import com.fasterxml.jackson.perf.model.MediaItems;
import com.fasterxml.jackson.perf.util.NopOutputStream;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

@State(value = Scope.Benchmark)
@Fork(1)
public class JacksonMultithreadWriteVanilla {
    private static final JSON json = JSON.std;

    private static final MediaItem item = MediaItems.stdMediaItem();

    @Param({"true", "false"})
    private boolean useVirtualThreads;

    @Param({"1000"})
    private int parallelTasks;

    private Consumer<Runnable> runner;

    @Setup
    public void setup() {
        this.runner = createRunner(useVirtualThreads);
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void writePojoMediaItem(Blackhole bh) throws Exception {
        CountDownLatch countDown = new CountDownLatch(parallelTasks);

        for (int i = 0; i < parallelTasks; i++) {
            runner.accept(() -> {
                bh.consume(write(item, json));
                countDown.countDown();
            });
        }

        try {
            countDown.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    protected final int write(Object value, JSON writer) {
        NopOutputStream out = new NopOutputStream();
        writer.write(value, out);
        return out.size();
    }

    private Consumer<Runnable> createRunner(boolean useVirtualThreads) {
        if (useVirtualThreads) {
            return Thread::startVirtualThread;
        } else {
            return Executors.newWorkStealingPool()::execute;
        }
    }
}


@Override
public void close() {
if (closed.compareAndSet(false, true)) {

Is this supposed to be concurrent? If not, and it only needs to be "ordered", you can use a plain boolean.

Contributor Author

For now I implemented it in a concurrent way, but I agree that this constraint can be relaxed if not necessary.

*
* @return BufferRecycler instance to use
*/
public BufferRecycler _getBufferRecycler()
Member

It's unlikely that we will accept having any public APIs removed - you will ideally need to keep them and possibly deprecate them.

Otherwise, this PR will need to be retargeted at master and aimed at the 3.0 release, which has no release date set for it.

Member

I see this is targeted at master - but be aware, it could be years before 3.0 is released.

Contributor Author

I indeed sent this PR against master, as I expect this to be ready and merged for the 3.0 release. I don't think that we could keep alive both this method and the solution based on the new object pool at the same time, because using one will break the assumptions made by the other. That said my understanding was that the methods and fields starting with an underscore were intended only for internal use and not considered part of the public API, am I missing something?

Contributor Author

@mariofusco mariofusco Jul 18, 2023

> I see this is targeted at master - but be aware, it could be years before 3.0 is released.

Ok, if so I guess this won't be acceptable for our usage purposes in Quarkus. Should I retarget this against 2.16 branch? Also can you clarify the naming convention around method names starting with an underscore? Is it safe to assume that they are not part of the public API?

Member

public means public. @cowtowncoder may consider allowing an exception - he's the BDFL.

Since it follows the underscore naming convention, you might say it is public but carries no guarantee against being removed at short notice (it is not up to me to make this call though).

@@ -99,6 +101,8 @@ public class IOContext
*/
protected char[] _nameCopyBuffer;

private volatile AtomicBoolean closed = new AtomicBoolean(false);

As noted in a later comment, an atomic type may not be needed at all. If it really is needed, prefer AtomicIntegerFieldUpdater (sadly there is no boolean variant, unless using VarHandle, which is probably not available on the base supported JDK version), or at least make the field final and not volatile itself.

The suggestion for the field updater is due to the number of IOContext instances alive, to reduce the footprint.
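A minimal sketch of the field-updater variant mentioned above, assuming a closed flag encoded as an int (0 for open, 1 for closed). The class name `UpdaterClosedFlag` is hypothetical. The footprint argument is that the updater is a single static object shared by all instances, so each instance carries only an int field instead of a reference to a separate AtomicBoolean object.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical holder showing the pattern; not the IOContext from the PR.
final class UpdaterClosedFlag {
    // One static updater for the whole class, created from inside the class
    // so that it may access the private field.
    private static final AtomicIntegerFieldUpdater<UpdaterClosedFlag> CLOSED =
            AtomicIntegerFieldUpdater.newUpdater(UpdaterClosedFlag.class, "closed");

    // The updater requires a volatile int field: 0 = open, 1 = closed.
    private volatile int closed;

    // Returns true only for the first caller that flips the flag.
    boolean closeOnce() {
        return CLOSED.compareAndSet(this, 0, 1);
    }
}
```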

Member

Not quite sure how one could reduce footprint as IOContexts are independent and independently handled?

@cowtowncoder
Member

Ok, first of all, thank you for offering this, @mariofusco! I need time to actually read the draft, hoping to get to that soon.

Some comments:

  1. Yes, as per @pjfanning, changes to public API (aside from additions) can't really go in 2.x. And while I have high hopes to get back to 3.0 dev after 2.16, there's now a longish history of postponement.
  2. Adding a dependency on object pooling etc. from jackson-core: fairly strong no; embedding code is possible as long as it's bounded in size. We have had a bit of mixed success wrt the Fast Double Parsing effort (although I think we eventually found a way that works ok).
  3. I'd be +1 for incremental refactoring of things in 2.x, minor version by minor version, but this is probably rather slow for anyone expecting to use things in the near future (realistically getting up to 2 minor revisions per year). That is/was my hope anyway, with some preparatory steps to be able to allow pluggability by, say, 2.17 or 2.18.

@Override
public void close() {
if (closed.compareAndSet(false, true)) {
BufferRecyclerPool.offerBufferRecycler(_bufferRecycler);
Member

If the final modifier were removed, we could perhaps use the null-ness of _bufferRecycler itself (i.e. only offer if not null).
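As a rough illustration of that idea, the sketch below lets a non-final recycler field double as the "still open" marker. `NullingContext` and the plain `Deque` used as a pool are hypothetical stand-ins, and the example assumes close is not called concurrently; it is not thread-safe as written.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical context whose recycler field doubles as the "still open" flag.
final class NullingContext implements AutoCloseable {
    private final Deque<Object> pool;   // stand-in for the real pool
    private Object recycler;            // non-final so close() can clear it

    NullingContext(Deque<Object> pool) {
        this.pool = pool;
        Object pooled = pool.pollFirst();
        this.recycler = (pooled != null) ? pooled : new Object();
    }

    @Override
    public void close() {
        Object r = recycler;
        if (r != null) {          // null means "already closed"
            recycler = null;      // clear first so a repeated close does nothing
            pool.offerFirst(r);
        }
    }
}
```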


public class BufferRecyclerPool {

private static final ObjectPool<BufferRecycler> pool = ObjectPool.newLockFreePool(BufferRecycler::new);
Member

Ok; since this is now global pool (across all factories), there definitely should be maximum size. But then again, having such setting would lead to need for configuration.
And static singletons are generally a bad idea and not suitable for use by (low-level) libraries.

In fact, now that I think about this, I think pooling (or default implementation at any rate) really needs to be segmented by factory instance -- no buffers should be shared (by default impl) across TokenStreamFactory instances.

return new LockFreeObjectPool<>(factory, destroyer);
}

class LockFreeObjectPool<T> implements ObjectPool<T> {
Member

I think that any pool to use should have maximum size, as a general principle.
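One possible shape for a size-capped pool, as a sketch only: `BoundedPool` is a hypothetical name, and the JDK's ArrayBlockingQueue is used here purely for brevity. That queue is lock-based, so this is not a drop-in equivalent of the lock-free pool in the diff. When the pool is full, a returned object is simply not retained and is left to the garbage collector.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Supplier;

// Hypothetical pool that never retains more than maxSize idle objects.
final class BoundedPool<T> {
    private final ArrayBlockingQueue<T> idle;
    private final Supplier<T> factory;

    BoundedPool(int maxSize, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxSize);
        this.factory = factory;
    }

    T borrow() {
        // Non-blocking poll: fall back to a fresh object when nothing is idle.
        T pooled = idle.poll();
        return (pooled != null) ? pooled : factory.get();
    }

    // Returns false when the pool is already full; the object is then dropped.
    boolean offer(T object) {
        return idle.offer(object);
    }

    int idleCount() {
        return idle.size();
    }
}
```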

@cowtowncoder
Member

Ok so... first, apologies if I misread code. I tried to go over it with limited amount of time so I may well have misunderstood something.

But with that, I think my main concern is that it seems that buffer recycling would now become global across all TokenStreamFactory instances.
Ideally (IMO) recycling would -- by default, at least -- be limited to all parsers/generators constructed by a single factory. While there are pros and cons for single global pool vs smaller per-factory ones, I feel that scoping by Jackson should always be by factories (JsonFactory, ObjectMapper) to keep allocations and contention bounded.
(I do not mind ability to provide other implementations, with different segmentation).

With per-factory recycling/pooling, configurability could then also be exposed on per-factory basis -- otherwise any configurability would have to be global (static), affecting all use by all frameworks.

To do this, I think the factory would need to hold on to a pool of BufferRecyclers: create one on construction and pass it on to IOContext.
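A sketch of what per-factory scoping could look like, under the assumption that each factory creates its own pool at construction and hands it to every context it builds. All type names here (`ScopedRecycler`, `ScopedPool`, `ScopedContext`, `ScopedFactory`) are hypothetical and do not correspond to Jackson classes.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical stand-in for BufferRecycler.
final class ScopedRecycler { }

// Hypothetical pool; one instance per factory, nothing static.
final class ScopedPool {
    private final ConcurrentLinkedDeque<ScopedRecycler> idle = new ConcurrentLinkedDeque<>();

    ScopedRecycler borrow() {
        ScopedRecycler r = idle.pollFirst();
        return (r != null) ? r : new ScopedRecycler();
    }

    void offer(ScopedRecycler r) { idle.offerFirst(r); }
}

// Hypothetical stand-in for IOContext: receives the pool of the factory that built it.
final class ScopedContext implements AutoCloseable {
    private final ScopedPool pool;
    private ScopedRecycler recycler;

    ScopedContext(ScopedPool pool) {
        this.pool = pool;
        this.recycler = pool.borrow();
    }

    ScopedRecycler recycler() { return recycler; }

    @Override
    public void close() {
        if (recycler != null) {       // return to the owning factory's pool only once
            pool.offer(recycler);
            recycler = null;
        }
    }
}

// Hypothetical stand-in for a TokenStreamFactory: owns the pool for its contexts.
final class ScopedFactory {
    private final ScopedPool pool = new ScopedPool();

    ScopedContext createContext() {
        return new ScopedContext(pool);
    }
}
```

With this layout, a recycler released by a context of one factory can only be picked up again by contexts of that same factory, and any size limit or other setting could live on the factory rather than in a static.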

Does this make sense?

@cowtowncoder
Member

I suspect this should be closed for now; we should be able to merge #1061 up to master once it gets resolved.

@mariofusco
Contributor Author

> I suspect this should be closed for now; we should be able to merge #1061 up to master once it gets resolved.

Yes, I will close this pull request. I will keep working on the 2.16 branch and will forward-port that work to master once it is finalized and merged.
