feat(java): jit support for chunk based map serialization #2027

Merged

Conversation

chaokunyang
Collaborator

@chaokunyang chaokunyang commented Jan 26, 2025

What does this PR do?

This PR adds JIT support for chunk-based map serialization. The generated code supports all kinds of map serialization:

  • final map key and value field type
  • polymorphic map key and value field type
  • nested map key and value type

This PR also removes the old map serialization protocol code.

The new chunk-based protocol reduces serialized size by up to 2.3x.

data:

stringMap: {"k1": "v1", "k2": "v2", ..., "k10": "v10"}
intMap: {1:2, 2:4, 3: 6, ..., 10: 20}

new protocol:

stringMapBytes 68
stringKVStructBytes 69
intMapBytes 28
intKVStructBytes 29

old protocol:

stringMapBytes 104
stringKVStructBytes 87
intMapBytes 64
intKVStructBytes 47

It also improves performance by 20%.
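
The size figures above can be turned into per-case ratios with a few lines of Java. This is only an arithmetic check on the numbers listed in this description; the class name `SizeRatios` and its layout are made up for illustration and are not part of Fury.

```java
// Illustrative only: recomputes old/new size ratios from the byte counts listed above.
final class SizeRatios {
  // Ratio of old-protocol size to new-protocol size; larger means a bigger saving.
  static double ratio(int oldBytes, int newBytes) {
    return (double) oldBytes / newBytes;
  }

  public static void main(String[] args) {
    String[] names = {"stringMapBytes", "stringKVStructBytes", "intMapBytes", "intKVStructBytes"};
    int[] oldSizes = {104, 87, 64, 47}; // old protocol, from the PR description
    int[] newSizes = {68, 69, 28, 29};  // new chunk-based protocol, from the PR description
    for (int i = 0; i < names.length; i++) {
      System.out.printf(
          "%s: %d -> %d bytes (%.2fx)%n",
          names[i], oldSizes[i], newSizes[i], ratio(oldSizes[i], newSizes[i]));
    }
  }
}
```

The largest ratio comes from `intMapBytes` (64 bytes down to 28), which is where the "up to 2.3x" figure comes from.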

Related issues

Closes #925

#2025

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

chunk-jmh-result.csv
nochunk-jmh-result.csv


@chaokunyang chaokunyang marked this pull request as draft January 26, 2025 15:24
@chaokunyang chaokunyang marked this pull request as ready for review January 27, 2025 11:21
@chaokunyang chaokunyang force-pushed the jit_for_chunk_based_map_serialization branch from 2907327 to ca1483d Compare January 27, 2025 16:41
@chaokunyang
Collaborator Author

For the following struct with a map field:

public class SimpleMapFieldStruct {
  public Map<String, String> map;
}

this PR generates code like:

package org.apache.fury.serializer.collection;

import org.apache.fury.Fury;
import org.apache.fury.builder.Generated.GeneratedObjectSerializer;
import org.apache.fury.memory.MemoryBuffer;
import org.apache.fury.resolver.ClassInfo;
import org.apache.fury.resolver.ClassInfoHolder;
import org.apache.fury.resolver.ClassResolver;
import org.apache.fury.resolver.NoRefResolver;
import org.apache.fury.serializer.Serializer;
import org.apache.fury.serializer.StringSerializer;

public final class MapSerializersTest_SimpleMapFieldStructFuryCodec_0 extends GeneratedObjectSerializer {

  private final NoRefResolver refResolver;
  private final ClassResolver classResolver;
  private final StringSerializer strSerializer;
  private Fury fury;
  private ClassInfo mapClassInfo;
  private final StringSerializer stringSerializer;
  private final ClassInfoHolder map3ClassInfoHolder;

  public MapSerializersTest_SimpleMapFieldStructFuryCodec_0(Fury fury, Class classType) {
      super(fury, classType);
      this.fury = fury;
      fury.getClassResolver().setSerializerIfAbsent(classType, this);
  
      org.apache.fury.resolver.RefResolver refResolver0 = fury.getRefResolver();
      refResolver = ((NoRefResolver)refResolver0);
      classResolver = fury.getClassResolver();
      strSerializer = fury.getStringSerializer();
      mapClassInfo = classResolver.nilClassInfo();
      stringSerializer = ((StringSerializer)classResolver.getRawSerializer(java.lang.String.class));
      map3ClassInfoHolder = classResolver.nilClassInfoHolder();
  }

  private AbstractMapSerializer writeMapClassInfo(java.util.Map map1, MemoryBuffer memoryBuffer) {
      ClassResolver classResolver = this.classResolver;
      Class value = mapClassInfo.getCls();
      Class cls = map1.getClass();
      if ((value != cls)) {
          mapClassInfo = classResolver.getClassInfo(cls);
      }
      classResolver.writeClass(memoryBuffer, mapClassInfo);
      Serializer serializer = mapClassInfo.getSerializer();
      return ((AbstractMapSerializer)serializer);
  }

  private void writeFields(org.apache.fury.serializer.collection.MapSerializersTest.SimpleMapFieldStruct simpleMapFieldStruct1, MemoryBuffer memoryBuffer1) {
      StringSerializer stringSerializer = this.stringSerializer;
      java.util.Map map0 = simpleMapFieldStruct1.map;
      if ((map0 == null)) {
          memoryBuffer1.writeByte(((byte)-3));
      } else {
          memoryBuffer1.writeByte(((byte)0));
          AbstractMapSerializer abstractMapSerializer = this.writeMapClassInfo(map0, memoryBuffer1);
          if (abstractMapSerializer.supportCodegenHook()) {
              java.util.Map map2 = abstractMapSerializer.onMapWrite(memoryBuffer1, map0);
              if ((!map2.isEmpty())) {
                  java.util.Iterator iterator = map2.entrySet().iterator();
                  java.util.Map.Entry entry = (java.util.Map.Entry)iterator.next();
                  while ((entry != null)) {
                    entry = abstractMapSerializer.writeNullChunkKVNoRef(memoryBuffer1, entry, iterator, stringSerializer, stringSerializer);
                    if ((entry != null)) {
                        String key = (String)entry.getKey();
                        String value0 = (String)entry.getValue();
                        memoryBuffer1.writeInt16(((short)-1));
                        int chunkSizeOffset = memoryBuffer1.writerIndex() - 1;
                        memoryBuffer1.putByte((chunkSizeOffset - 1), 36);
                        int chunkSize = 0;
                        while (true) {
                          if (((key == null) || (value0 == null))) {
                              break;
                          }
                          stringSerializer.write(memoryBuffer1, key);
                          stringSerializer.write(memoryBuffer1, value0);
                          chunkSize = (chunkSize + 1);
                          if ((chunkSize == 255)) {
                              break;
                          }
                          if (iterator.hasNext()) {
                              entry = ((java.util.Map.Entry)iterator.next());
                              key = ((String)entry.getKey());
                              value0 = ((String)entry.getValue());
                          } else {
                              entry = null;
                              break;
                          }
                        }
                        memoryBuffer1.putByte(chunkSizeOffset, chunkSize);
                    }
                  }
              }
          } else {
              abstractMapSerializer.write(memoryBuffer1, map0);
          }
      }
  }

  private void readFields(MemoryBuffer memoryBuffer2, org.apache.fury.serializer.collection.MapSerializersTest.SimpleMapFieldStruct simpleMapFieldStruct2) {
      StringSerializer strSerializer = this.strSerializer;
      StringSerializer stringSerializer = this.stringSerializer;
      if ((memoryBuffer2.readByte() != ((byte)-3))) {
          Serializer serializer0 = classResolver.readClassInfo(memoryBuffer2, map3ClassInfoHolder).getSerializer();
          AbstractMapSerializer mapSerializer = (AbstractMapSerializer)serializer0;
          Object object1;
          if (mapSerializer.supportCodegenHook()) {
              java.util.Map map4 = mapSerializer.newMap(memoryBuffer2);
              int size = mapSerializer.getAndClearNumElements();
              if ((size == 0)) {
                  return;
              }
              int chunkHeader = memoryBuffer2.readUnsignedByte();
              while ((size > 0)) {
                long sizeAndHeader = mapSerializer.readJavaNullChunk(memoryBuffer2, map4, chunkHeader, size, stringSerializer, stringSerializer);
                chunkHeader = ((int)(sizeAndHeader & 255));
                size = ((int)(sizeAndHeader >>> 8));
                int chunkSize0 = memoryBuffer2.readUnsignedByte();
                for (int i = 0; i < chunkSize0; i+=1) {
                  String string = strSerializer.readCharsString(memoryBuffer2);
                  String string1 = strSerializer.readCharsString(memoryBuffer2);
                  map4.put(string, string1);
                  size = (size - 1);
                }
                if ((size > 0)) {
                    chunkHeader = memoryBuffer2.readUnsignedByte();
                }
              }
              object1 = mapSerializer.onMapRead(map4);
          } else {
              Object object = mapSerializer.read(memoryBuffer2);
              object1 = object;
          }
          
          simpleMapFieldStruct2.map = ((java.util.Map)object1);
      } else {
          simpleMapFieldStruct2.map = null;
      }
  }

  @Override public final void write(MemoryBuffer buffer, Object obj) {
      org.apache.fury.serializer.collection.MapSerializersTest.SimpleMapFieldStruct simpleMapFieldStruct3 = (org.apache.fury.serializer.collection.MapSerializersTest.SimpleMapFieldStruct)obj;
      this.writeFields(simpleMapFieldStruct3, buffer);
  }

  @Override public final Object read(MemoryBuffer buffer) {
      org.apache.fury.serializer.collection.MapSerializersTest.SimpleMapFieldStruct simpleMapFieldStruct4 = new org.apache.fury.serializer.collection.MapSerializersTest.SimpleMapFieldStruct();
      refResolver.reference(simpleMapFieldStruct4);
      this.readFields(buffer, simpleMapFieldStruct4);
      return simpleMapFieldStruct4;
  }

}
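
The core write loop in the generated code (reserve a header byte and a size byte, write up to 255 entries, then back-patch the size) can be hard to see through the codegen noise. Below is a minimal, self-contained sketch of that idea using `java.nio.ByteBuffer`. It is not Fury's wire format: the class `ChunkedMapSketch`, the length-prefixed UTF-8 string encoding, and the leading entry count are assumptions made for illustration, null keys and values are not handled, and the header value 36 is simply copied from the generated code above without interpreting its bits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of chunked map encoding; not Fury's actual implementation.
final class ChunkedMapSketch {
  static final int MAX_CHUNK_SIZE = 255; // chunk size must fit in one unsigned byte
  static final int HEADER = 36;          // value copied from the generated code above

  static void writeString(ByteBuffer buf, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    buf.putInt(bytes.length); // assumed encoding: 4-byte length prefix
    buf.put(bytes);
  }

  static String readString(ByteBuffer buf) {
    byte[] bytes = new byte[buf.getInt()];
    buf.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  static void write(ByteBuffer buf, Map<String, String> map) {
    buf.putInt(map.size()); // assumed: total entry count written up front
    Iterator<Map.Entry<String, String>> it = map.entrySet().iterator();
    while (it.hasNext()) {
      buf.put((byte) HEADER);            // chunk header byte
      int chunkSizeOffset = buf.position();
      buf.put((byte) 0);                 // placeholder for the chunk size
      int chunkSize = 0;
      while (chunkSize < MAX_CHUNK_SIZE && it.hasNext()) {
        Map.Entry<String, String> entry = it.next();
        writeString(buf, entry.getKey());
        writeString(buf, entry.getValue());
        chunkSize++;
      }
      buf.put(chunkSizeOffset, (byte) chunkSize); // back-patch the real size
    }
  }

  static Map<String, String> read(ByteBuffer buf) {
    int remaining = buf.getInt();
    Map<String, String> map = new LinkedHashMap<>();
    while (remaining > 0) {
      int header = buf.get() & 0xFF;
      if (header != HEADER) {
        throw new IllegalStateException("Unexpected chunk header: " + header);
      }
      int chunkSize = buf.get() & 0xFF; // read the size as an unsigned byte
      for (int i = 0; i < chunkSize; i++) {
        String key = readString(buf);
        String value = readString(buf);
        map.put(key, value);
        remaining--;
      }
    }
    return map;
  }
}
```

To try it, write a map into a sufficiently large `ByteBuffer`, call `flip()`, and pass the buffer to `read`. A map with more than 255 entries is split into several chunks, each with its own header and size byte.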

@theweipeng theweipeng self-requested a review January 28, 2025 16:14
@chaokunyang chaokunyang merged commit e952b63 into apache:main Jan 28, 2025
39 checks passed
Successfully merging this pull request may close these issues.

[Java][Protocol] Chunk by chunk predictive map serialization protocol
3 participants