
Contention in TypeNameIdResolver.idFromClass() #2556

Closed
CodingFabian opened this issue Nov 29, 2019 · 16 comments

@CodingFabian
Contributor

We are using Jackson 2.9.10 and we have many threads serializing Java objects onto kafka.

Under high load (cough, Black Friday), we see several threads blocking on the type map contained in TypeNameIdResolver:

downstream-7 id=121 state=BLOCKED
    - waiting to lock <0x1a5caa08> (a java.util.HashMap)
     owned by downstream-4 id=118
    at com.fasterxml.jackson.databind.jsontype.impl.TypeNameIdResolver.idFromClass(TypeNameIdResolver.java:95)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeNameIdResolver.idFromValue(TypeNameIdResolver.java:83)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase.idFromValue(TypeSerializerBase.java:82)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase._generateTypeId(TypeSerializerBase.java:67)
    at com.fasterxml.jackson.databind.jsontype.impl.TypeSerializerBase.writeTypePrefix(TypeSerializerBase.java:45)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeWithType(BeanSerializerBase.java:600)
    at com.fasterxml.jackson.databind.ser.std.CollectionSerializer.serializeContents(CollectionSerializer.java:147)
    at com.fasterxml.jackson.databind.ser.std.CollectionSerializer.serialize(CollectionSerializer.java:107)
    at com.fasterxml.jackson.databind.ser.std.CollectionSerializer.serialize(CollectionSerializer.java:25)
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
    at com.fasterxml.jackson.module.afterburner.ser.ObjectMethodPropertyWriter.serializeAsField(ObjectMethodPropertyWriter.java:40)
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:719)
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:155)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:400)
    at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1392)
    at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1120)
    at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsBytes(ObjectWriter.java:1017)

I see that there is a synchronized block here:
https://github.com/FasterXML/jackson-databind/blob/master/src/main/java/com/fasterxml/jackson/databind/jsontype/impl/TypeNameIdResolver.java#L99

Could that be changed? Or should we somehow use several TypeNameIdResolver instances (and how would one do that)?

@CodingFabian
Contributor Author

One solution we are considering is to copy() the ObjectMapper instance before obtaining a writer (which we then put in a ThreadLocal).
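The per-thread writer pattern above can be sketched with ThreadLocal.withInitial. A minimal sketch, assuming Jackson is not on the classpath here: a counted stand-in object plays the role of the expensive mapper.copy()/writer creation, and the class and method names below are otherwise hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PerThreadWriter {
    // Tracks how many "writers" were built; stands in for the expensive
    // copy of the mapper (real Jackson code would be along the lines of
    // ThreadLocal.withInitial(() -> MAPPER.copy().writer())).
    static final Set<Object> created = ConcurrentHashMap.newKeySet();

    static Object expensiveCopy() {
        Object writer = new Object();
        created.add(writer);
        return writer;
    }

    // Each thread lazily builds and caches its own instance, so no lock
    // is ever contended on the hot serialization path.
    static final ThreadLocal<Object> WRITER =
            ThreadLocal.withInitial(PerThreadWriter::expensiveCopy);

    public static void main(String[] args) throws InterruptedException {
        created.clear();
        Thread t1 = new Thread(() -> WRITER.get());
        Thread t2 = new Thread(() -> WRITER.get());
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Two threads touched WRITER, so exactly two instances were built.
        if (created.size() != 2) throw new AssertionError(created.size());
    }
}
```

The tradeoff is memory: each per-thread copy carries its own (de)serializer caches, which is exactly the cost discussed below.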

@cowtowncoder
Member

Results should quickly get cached, so even with synchronization it would seem odd for this to become a bottleneck; my first suspicion is usually that something else is not getting reused.
In this case, the scope of TypeNameIdResolver is per-property (or possibly per-polymorphic-class).

I would not recommend copying the ObjectMapper, as that is as expensive an operation as fully creating a new mapper instance, and having a separate copy per thread is rather heavyweight (due to the caching of all (de)serializers).

However: I can change the synchronization itself to reduce its scope, so that the (usually successful) lookup and the other processing (introspection if a cached id is not found, addition to the lookup map) are not within the same sync block, with a separate sync for put'ing the new value.
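A minimal sketch of that narrowed locking (a hypothetical stand-in, not the actual jackson-databind code): the common hit path holds the lock only for a single get(), the potentially expensive id computation runs outside any lock, and a separate short sync covers the put.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitSyncResolver {
    private final Map<Class<?>, String> typeToId = new HashMap<>();

    public String idFromClass(Class<?> cls) {
        String id;
        // Short sync #1: just the lookup, which is the common case.
        synchronized (typeToId) {
            id = typeToId.get(cls);
        }
        if (id == null) {
            // Expensive work happens with no lock held; two threads may
            // compute the same id concurrently, which is harmless because
            // the result is deterministic.
            id = computeId(cls);
            // Short sync #2: just the put.
            synchronized (typeToId) {
                typeToId.put(cls, id);
            }
        }
        return id;
    }

    // Hypothetical stand-in for Jackson's annotation introspection.
    private String computeId(Class<?> cls) {
        return cls.getSimpleName();
    }
}
```

Note that threads still serialize on the lock even for the hit path; the change only shortens how long each one holds it.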

My main concern is just that I do not have any good way to test this, so if there were a way to verify the improvement somehow (like running under a profiler), that'd be great.

But I can make a change to be released in 2.10.2, I think; I have made similar changes to synchronized scope in the past, with good results.

@CodingFabian
Contributor Author

The ThreadLocal copies gave us a 10% throughput improvement. We have plenty of memory, so that was a valuable tradeoff.

Not sure if we can make a good benchmark, but now that we know where the threads block, we should be able to compare versions with our load tests.

@CodingFabian
Contributor Author

Turning _idToType and _typeToId into ConcurrentHashMaps with computeIfAbsent should probably fix this.
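Sketched out (a hypothetical stand-in, with a class-simple-name lookup standing in for the real id introspection), that proposal would look like:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentResolver {
    private final ConcurrentHashMap<Class<?>, String> typeToId =
            new ConcurrentHashMap<>();

    public String idFromClass(Class<?> cls) {
        // Lock-free on the hit path; the mapping function is invoked at
        // most once per key, even when many threads miss concurrently.
        return typeToId.computeIfAbsent(cls, c -> c.getSimpleName());
    }
}
```

Note that computeIfAbsent requires Java 8, which matters for the 2.x baseline discussed below.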

@cowtowncoder cowtowncoder modified the milestones: 2.10.1, 2.10.2 Dec 1, 2019
@cowtowncoder
Member

jackson-databind cannot use Java 8 methods until 3.0, so unfortunately computeIfAbsent is not available.
But I think the change I am checking in will reduce or remove the issue.

@cowtowncoder cowtowncoder changed the title Contention in TypeNameIdResolver Contention in TypeNameIdResolver.idFromClass() Dec 1, 2019
@cowtowncoder
Member

With the fix in place, it should be possible to build (or use from Sonatype OSS) a 2.10.2-SNAPSHOT to see if the change resolves the thread blocking.

I am still somewhat surprised as to how this could have resulted in performance problems: after type ids are resolved once, they should stick, unless ObjectMapper instances were discarded, or deserializers not cached. Or something...

@CodingFabian
Contributor Author

We will check it. But you are right; that's why I proposed the ConcurrentHashMap. I think in our case the successful reads just block (on the synchronized block that is still in place). We see around 50k ObjectMapper.writeValue calls per second.

@cowtowncoder
Member

That rate does explain why it could matter (impressive that polymorphic handling can scale like that, as it seems it could add quite a bit of overhead).
But the part I do not get is not that resolution gets called often enough to matter, but rather that the "miss" case -- id not yet resolved -- would occur often enough to matter. It should only be hit once per type id, fill the Map, and then never be touched again.

Alternatively, I could be mistaken and it actually is about the contended get(): if so, then ConcurrentHashMap would be the logical next step.
Even if computeIfAbsent() can't yet be used (with 2.x, anyway -- and with 3.0 it might make sense to avoid it to minimize block length), a ConcurrentHashMap would be more efficient with a plain get/put pair too, I assume.
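On a pre-Java-8 baseline, the same idea can indeed be expressed as a get/putIfAbsent pair on a ConcurrentHashMap. A hedged sketch (hypothetical names, with a class-simple-name lookup standing in for the real introspection):

```java
import java.util.concurrent.ConcurrentHashMap;

public class GetPutResolver {
    private final ConcurrentHashMap<Class<?>, String> typeToId =
            new ConcurrentHashMap<>();

    public String idFromClass(Class<?> cls) {
        String id = typeToId.get(cls);   // lock-free read, the common case
        if (id == null) {
            // Stand-in for the expensive introspection; may run in several
            // threads at once on a cold cache, which is harmless.
            String computed = cls.getSimpleName();
            // putIfAbsent keeps the first writer's value so all threads
            // end up agreeing on one id per class.
            String prev = typeToId.putIfAbsent(cls, computed);
            id = (prev != null) ? prev : computed;
        }
        return id;
    }
}
```

Unlike computeIfAbsent, this may compute the id redundantly on a miss, but the hot read path takes no lock at all.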

So I guess results from benchmark would be interesting. :)

@CodingFabian
Contributor Author

Terve @cowtowncoder,

I am currently travelling, so please take my benchmarks with a grain of salt. I will re-run them when I have a stable power connection.

Our code uses Afterburner, so I included that in my benchmarks.
As you will see in the results, your change does very little. That is because the Map already contains the information and no lookup miss occurs; it really is the contention on get().
Also, I need to correct myself: the lookups are probably around 500k/s in our case, as our POJO has about 10 entries in the list.

I exchanged the map for a ConcurrentHashMap and got a reproducible 10% speedup (about the same as we get with our ThreadLocal ObjectWriter).

Find my benchmark in this gist:
https://gist.github.com/CodingFabian/4d3612a256628fd25521f4946389f133

I will make a PR to illustrate my CHM change (although it's really straightforward: just wrap the maps in the constructor).

@CodingFabian
Contributor Author

For 2.10.1:


Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  9417713.874 ±(99.9%) 1518796.826 ops/s [Average]
  (min, avg, max) = (8736898.377, 9417713.874, 9710435.409), stdev = 394426.713
  CI (99.9%): [7898917.047, 10936510.700] (assumes normal distribution)

Full Output:

# JMH version: 1.22
# VM version: JDK 1.8.0_192, OpenJDK 64-Bit Server VM, 25.192-b01
# VM invoker: /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention

# Run progress: 0.00% complete, ETA 00:01:00
# Fork: 1 of 1
# Warmup Iteration   1: 8151895.492 ops/s
Iteration   1: 9710435.409 ops/s
Iteration   2: 9569245.825 ops/s
Iteration   3: 8736898.377 ops/s
Iteration   4: 9430753.815 ops/s
Iteration   5: 9641235.944 ops/s


Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  9417713.874 ±(99.9%) 1518796.826 ops/s [Average]
  (min, avg, max) = (8736898.377, 9417713.874, 9710435.409), stdev = 394426.713
  CI (99.9%): [7898917.047, 10936510.700] (assumes normal distribution)


# Run complete. Total time: 00:01:00

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                             Mode  Cnt        Score         Error  Units
ContentionBenchmark.benchContention  thrpt    5  9417713.874 ± 1518796.826  ops/s

@CodingFabian
Contributor Author

For your 2.10.2-SNAPSHOT change:

Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  9537120.750 ±(99.9%) 200726.195 ops/s [Average]
  (min, avg, max) = (9468759.138, 9537120.750, 9593482.627), stdev = 52127.956
  CI (99.9%): [9336394.554, 9737846.945] (assumes normal distribution)

Full Output:

# JMH version: 1.22
# VM version: JDK 1.8.0_192, OpenJDK 64-Bit Server VM, 25.192-b01
# VM invoker: /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention

# Run progress: 0.00% complete, ETA 00:01:00
# Fork: 1 of 1
# Warmup Iteration   1: 7751979.386 ops/s
Iteration   1: 9468759.138 ops/s
Iteration   2: 9496704.427 ops/s
Iteration   3: 9565790.804 ops/s
Iteration   4: 9593482.627 ops/s
Iteration   5: 9560866.753 ops/s


Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  9537120.750 ±(99.9%) 200726.195 ops/s [Average]
  (min, avg, max) = (9468759.138, 9537120.750, 9593482.627), stdev = 52127.956
  CI (99.9%): [9336394.554, 9737846.945] (assumes normal distribution)


# Run complete. Total time: 00:01:00


Benchmark                             Mode  Cnt        Score        Error  Units
ContentionBenchmark.benchContention  thrpt    5  9537120.750 ± 200726.195  ops/s

@CodingFabian
Contributor Author

For your 2.10.2-SNAPSHOT change + my CHM change:

Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  10298811.073 ±(99.9%) 602139.284 ops/s [Average]
  (min, avg, max) = (10138689.645, 10298811.073, 10554062.979), stdev = 156373.660
  CI (99.9%): [9696671.790, 10900950.357] (assumes normal distribution)
# JMH version: 1.22
# VM version: JDK 1.8.0_192, OpenJDK 64-Bit Server VM, 25.192-b01
# VM invoker: /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 1 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 4 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention

# Run progress: 0.00% complete, ETA 00:01:00
# Fork: 1 of 1
# Warmup Iteration   1: 9178392.125 ops/s
Iteration   1: 10554062.979 ops/s
Iteration   2: 10290009.714 ops/s
Iteration   3: 10215555.017 ops/s
Iteration   4: 10138689.645 ops/s
Iteration   5: 10295738.012 ops/s


Result "com.fasterxml.jackson.benchmark.ContentionBenchmark.benchContention":
  10298811.073 ±(99.9%) 602139.284 ops/s [Average]
  (min, avg, max) = (10138689.645, 10298811.073, 10554062.979), stdev = 156373.660
  CI (99.9%): [9696671.790, 10900950.357] (assumes normal distribution)


# Run complete. Total time: 00:01:00


Benchmark                             Mode  Cnt         Score        Error  Units
ContentionBenchmark.benchContention  thrpt    5  10298811.073 ± 602139.284  ops/s

@CodingFabian
Contributor Author

So even the worst result of the ConcurrentHashMap run is about 5% faster than the best run with synchronized.

CodingFabian added a commit to CodingFabian/jackson-databind that referenced this issue Dec 1, 2019
@CodingFabian
Contributor Author

PR is here: #2557

Thanks for a) your hard work on this framework, b) being so super reactive, and c) considering each and every one of the many issues/PRs that are reported.

@CodingFabian
Contributor Author

Still looking for a better public API to benchmark this with, as the memory allocation for the result makes it a bit noisy.

@cowtowncoder
Member

Fix merged; made a slight change to avoid copying (and had to dig to remember why the deserialization case needed a non-null map in one edge case, wrt #1198).
