Commit

docs(java): update java serialization schema compatibility doc (#2047)
## What does this PR do?

Update the Java serialization schema compatibility doc.

## Related issues

## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

chaokunyang authored Feb 7, 2025
1 parent 5ca0c49 commit 1bb794b
Showing 1 changed file with 124 additions and 63 deletions.
`docs/guide/java_serialization_guide.md`: 187 changes (124 additions and 63 deletions)
@@ -102,7 +102,7 @@ public class Example {
| `compressLong` | Enables or disables long compression for smaller size. | `true` |
| `compressString` | Enables or disables string compression for smaller size. | `false` |
| `classLoader` | The classloader should not be updated; Fury caches class metadata. Use `LoaderBinding` or `ThreadSafeFury` for classloader updates. | `Thread.currentThread().getContextClassLoader()` |
| `compatibleMode` | Type forward/backward compatibility config. Also related to the `checkClassVersion` config. `SCHEMA_CONSISTENT`: the class schema must be consistent between the serialization peer and the deserialization peer. `COMPATIBLE`: the class schema can differ between the serialization peer and the deserialization peer; they can add/delete fields independently. [See more](#class-inconsistency-and-class-version-check). | `CompatibleMode.SCHEMA_CONSISTENT` |
| `checkClassVersion` | Determines whether to check the consistency of the class schema. If enabled, Fury writes the `classVersionHash` and checks schema consistency against it during deserialization. It is automatically disabled when `CompatibleMode#COMPATIBLE` is enabled. Disabling is not recommended unless you can ensure the class won't evolve. | `false` |
| `checkJdkClassSerializable` | Enables or disables checking of `Serializable` interface for classes under `java.*`. If a class under `java.*` is not `Serializable`, Fury will throw an `UnsupportedOperationException`. | `true` |
| `registerGuavaTypes` | Whether to pre-register Guava types such as `RegularImmutableMap`/`RegularImmutableList`. These types are not public API, but seem pretty stable. | `true` |
@@ -125,7 +125,7 @@ public class Example {
Single thread fury:

```java
Fury fury = Fury.builder()
.withLanguage(Language.JAVA)
// enable reference tracking for shared/circular references.
// Disabling it gives better performance if there are no duplicate references.
@@ -137,14 +137,14 @@ Fury fury=Fury.builder()
// enable async multi-threaded compilation.
.withAsyncCompilation(true)
.build();
byte[] bytes = fury.serialize(object);
System.out.println(fury.deserialize(bytes));
```

Thread-safe fury:

```java
ThreadSafeFury fury = Fury.builder()
.withLanguage(Language.JAVA)
// enable reference tracking for shared/circular references.
// Disabling it gives better performance if there are no duplicate references.
@@ -160,10 +160,45 @@ ThreadSafeFury fury=Fury.builder()
// enable async multi-threaded compilation.
.withAsyncCompilation(true)
.buildThreadSafeFury();
byte[] bytes = fury.serialize(object);
System.out.println(fury.deserialize(bytes));
```

### Handling Class Schema Evolution in Serialization

In many systems, the schema of a class used for serialization may change over time. For instance, fields within a class
may be added or removed. When serialization and deserialization processes use different versions of jars, the schema of
the class being deserialized may differ from the one used during serialization.

By default, Fury serializes objects using the `CompatibleMode.SCHEMA_CONSISTENT` mode. This mode assumes that the
deserialization process uses the same class schema as the serialization process, minimizing payload overhead.
However, if there is a schema inconsistency, deserialization will fail.

If the schema is expected to change and deserialization must still succeed (i.e., schema forward/backward
compatibility is required), users must configure Fury to use `CompatibleMode.COMPATIBLE`. This can be done using the
`FuryBuilder#withCompatibleMode(CompatibleMode.COMPATIBLE)` method.
In this compatible mode, deserialization can handle schema changes such as missing or extra fields, allowing it to
succeed even when the serialization and deserialization processes have different class schemas.

Here is an example of creating Fury to support schema evolution:

```java
Fury fury = Fury.builder()
.withCompatibleMode(CompatibleMode.COMPATIBLE)
.build();

byte[] bytes = fury.serialize(object);
System.out.println(fury.deserialize(bytes));
```

This compatible mode involves serializing class metadata into the serialized output. Despite Fury's use of
sophisticated compression techniques to minimize overhead, there is still some additional space cost associated with
class metadata.
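To make that overhead concrete, here is a toy sketch; it is not Fury's wire format or API, and `SchemaEvolutionSketch` and its methods are hypothetical names. A positional encoding writes only values, as a schema-consistent mode can afford to; a tagged encoding also writes each field name, standing in for the class metadata a compatible mode needs. The tagged form is larger, but a reader with a different field list can still decode it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration only (hypothetical, not Fury's format): values-only vs. name-tagged fields. */
final class SchemaEvolutionSketch {
  private SchemaEvolutionSketch() {}

  /** Writes only the values; the reader must know the exact field order and count. */
  static byte[] writePositional(Map<String, Integer> fields) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int value : fields.values()) {
        out.writeInt(value);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Writes a field count, then (name, value) pairs, so a reader can match fields by name. */
  static byte[] writeTagged(Map<String, Integer> fields) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(fields.size());
      for (Map.Entry<String, Integer> entry : fields.entrySet()) {
        out.writeUTF(entry.getKey()); // the "metadata" that costs extra space
        out.writeInt(entry.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /**
   * Decodes a tagged payload into the reader's own field list: fields the writer did not
   * send keep the default 0, and fields the reader does not declare are skipped.
   */
  static Map<String, Integer> readTagged(byte[] data, String... readerFields) {
    Map<String, Integer> result = new LinkedHashMap<>();
    for (String name : readerFields) {
      result.put(name, 0);
    }
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        int value = in.readInt();
        if (result.containsKey(name)) {
          result.put(name, value);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }
}
```

For a writer with fields `f1`, `f2` and a reader expecting `f1`, `f3`, `readTagged` takes `f1` from the payload, leaves `f3` at `0`, and drops `f2`, mirroring the missing/extra field handling described above.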

To further reduce metadata costs, Fury introduces a class metadata sharing mechanism, which allows the metadata to be
sent to the deserialization process only once. For more details, please refer to the [Meta Sharing](#MetaSharing)
section.

### Smaller size

`FuryBuilder#withIntCompressed`/`FuryBuilder#withLongCompressed` can be used to compress int/long for smaller size.
@@ -184,9 +219,9 @@ For long compression, fury support two encoding:
- Otherwise write as 9 bytes: `| 0b1 | little-endian 8bytes long |`
- Fury PVL(Progressive Variable-length Long) Encoding:
- The first bit in every byte indicates whether there is a next byte. If it is set, the next byte is read as well;
  reading continues until a byte whose first bit is unset.
- Negative numbers are converted to positive numbers by `(v << 1) ^ (v >> 63)` to reduce the cost of small negative
  numbers.
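The PVL rules above can be sketched as a zigzag transform followed by a variable-length encoding. This is a minimal sketch, assuming "first bit" means the high bit of each byte and that each byte carries 7 payload bits in little-endian order (as in protobuf varints); Fury's actual writer may differ in details such as the maximum encoded length. `VarLongSketch` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of zigzag + variable-length long encoding (not Fury's exact layout). */
final class VarLongSketch {
  private VarLongSketch() {}

  /** Maps small negative numbers to small positive ones: 0->0, -1->1, 1->2, -2->3, ... */
  static long zigzag(long v) {
    return (v << 1) ^ (v >> 63);
  }

  /** Inverse of zigzag. */
  static long unzigzag(long u) {
    return (u >>> 1) ^ -(u & 1);
  }

  static byte[] encode(long value) {
    long u = zigzag(value);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Emit 7 bits at a time, setting the high bit while more bits remain.
    while ((u & ~0x7FL) != 0) {
      out.write((int) ((u & 0x7F) | 0x80));
      u >>>= 7;
    }
    out.write((int) u); // last byte: high bit unset
    return out.toByteArray();
  }

  static long decode(byte[] bytes) {
    long u = 0;
    int shift = 0;
    for (byte b : bytes) {
      u |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break; // high bit unset: this was the last byte
      }
      shift += 7;
    }
    return unzigzag(u);
  }
}
```

Running it shows the trade-off described next: small magnitudes such as `0`, `-1` or `63` take 1 byte, while `Long.MAX_VALUE` needs 10 bytes in this sketch, more than the 8 bytes of a raw `long`.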

If a `long` number usually can't be represented in fewer bytes, the compression won't achieve a good enough result,
@@ -199,22 +234,18 @@ space savings.
Deep copy example:

```java
Fury fury = Fury.builder().withRefCopy(true).build();
SomeClass a = xxx;
SomeClass copied = fury.copy(a);
```

To make fury's deep copy ignore circular and shared references, disable reference copy. In this mode, an object that
is referenced multiple times within one object graph is copied into different objects in a single `Fury#copy` call.

```java
Fury fury = Fury.builder().withRefCopy(false).build();
SomeClass a = xxx;
SomeClass copied = fury.copy(a);
```
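To illustrate what reference tracking changes during a deep copy, here is a toy sketch over a hypothetical `Node` graph; it is not Fury's implementation. With tracking, an identity map from original to copy keeps a node that is reachable twice as one shared copy and lets cycles terminate; without it, every visit produces a new object, and a cyclic graph would recurse without end.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy deep copy, contrasting tracked vs. untracked references. */
final class DeepCopySketch {
  private DeepCopySketch() {}

  static final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
      this.name = name;
    }
  }

  /** trackRefs=true behaves like reference-preserving copy; false copies every visit anew. */
  static Node copy(Node root, boolean trackRefs) {
    Map<Node, Node> seen = trackRefs ? new IdentityHashMap<Node, Node>() : null;
    return copyNode(root, seen);
  }

  private static Node copyNode(Node node, Map<Node, Node> seen) {
    if (seen != null) {
      Node existing = seen.get(node);
      if (existing != null) {
        return existing; // already copied: reuse the same copy
      }
    }
    Node clone = new Node(node.name);
    if (seen != null) {
      seen.put(node, clone); // register before recursing so cycles resolve
    }
    for (Node child : node.children) {
      clone.children.add(copyNode(child, seen));
    }
    return clone;
  }
}
```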

### Implement a customized serializer
@@ -257,8 +288,8 @@ class FooSerializer extends Serializer<Foo> {
Register serializer:

```java
Fury fury = getFury();
fury.registerSerializer(Foo.class, new FooSerializer(fury));
```

### Security & Class Registration
@@ -279,9 +310,9 @@ Note that class registration order is important, serialization and deserialization peer
should have the same registration order.

```java
Fury fury = xxx;
fury.register(SomeClass.class);
fury.register(SomeClass1.class, 200);
```
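Why the order matters can be seen with a toy registry that hands out sequential IDs, roughly like registering without an explicit ID. `RegistrySketch` is hypothetical and does not reflect Fury's actual ID ranges: if the writer and the reader register the same two classes in different orders, the ID written for one class resolves to the other class on the reader side. Passing explicit IDs, as in `fury.register(SomeClass1.class, 200)`, removes that dependence on order.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical auto-incrementing class registry (not Fury's ClassResolver). */
final class RegistrySketch {
  private final Map<Class<?>, Integer> idByClass = new HashMap<>();
  private final Map<Integer, Class<?>> classById = new HashMap<>();
  private int nextId;

  /** Assigns the next sequential ID, so the result depends on registration order. */
  int register(Class<?> cls) {
    int id = nextId++;
    idByClass.put(cls, id);
    classById.put(id, cls);
    return id;
  }

  int idOf(Class<?> cls) {
    return idByClass.get(cls);
  }

  Class<?> classOf(int id) {
    return classById.get(id);
  }
}
```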

If you invoke `FuryBuilder#requireClassRegistration(false)` to disable class registration check,
@@ -290,19 +321,20 @@ allowed
for serialization. For example, you can allow classes starting with `org.example.*` by:

```java
Fury fury = xxx;
fury.getClassResolver().setClassChecker(
(classResolver, className) -> className.startsWith("org.example."));
```

```java
AllowListChecker checker = new AllowListChecker(AllowListChecker.CheckLevel.STRICT);
ThreadSafeFury fury = new ThreadLocalFury(classLoader -> {
Fury f = Fury.builder().requireClassRegistration(true).withClassLoader(classLoader).build();
f.getClassResolver().setClassChecker(checker);
checker.addListener(f.getClassResolver());
return f;
});
checker.allowClass("org.example.*");
```

Fury also provides `org.apache.fury.resolver.AllowListChecker`, an allowed/disallowed list based checker, to
@@ -360,30 +392,30 @@ forward/backward compatibility automatically.
// // share meta across serialization.
// .withMetaContextShare(true)
// Not thread-safe fury.
MetaContext context = xxx;
fury.getSerializationContext().setMetaContext(context);
byte[] bytes = fury.serialize(o);
// Not thread-safe fury.
MetaContext context = xxx;
fury.getSerializationContext().setMetaContext(context);
fury.deserialize(bytes);

// Thread-safe fury
fury.setClassLoader(beanA.getClass().getClassLoader());
byte[] serialized = fury.execute(
f -> {
f.getSerializationContext().setMetaContext(context);
return f.serialize(beanA);
}
);
// thread-safe fury
fury.setClassLoader(beanA.getClass().getClassLoader());
Object newObj = fury.execute(
f -> {
f.getSerializationContext().setMetaContext(context);
return f.deserialize(serialized);
}
);
```
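The idea behind meta sharing can be sketched with a toy writer-side context; the names are hypothetical and this is not Fury's `MetaContext` or wire format. The first time a type is written, its full description (type name and field names) goes into the output and is remembered; later messages written with the same context carry only a small index. A reader would need a matching context that sees the messages in the same order, which this sketch does not model.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of sending class metadata only once per shared context. */
final class MetaShareSketch {
  private MetaShareSketch() {}

  /** Per-stream context; like a MetaContext, it is not meant to be shared across threads. */
  static final class Context {
    final Map<String, Integer> sentTypes = new HashMap<>();
  }

  static byte[] writeTypeHeader(Context ctx, String typeName, List<String> fieldNames) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      Integer index = ctx.sentTypes.get(typeName);
      if (index != null) {
        out.writeBoolean(false); // metadata already sent: refer to it by index
        out.writeShort(index);
      } else {
        int newIndex = ctx.sentTypes.size();
        out.writeBoolean(true); // first occurrence: send the full description
        out.writeShort(newIndex);
        out.writeUTF(typeName);
        out.writeShort(fieldNames.size());
        for (String field : fieldNames) {
          out.writeUTF(field);
        }
        ctx.sentTypes.put(typeName, newIndex);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```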

### Deserialize non-existent classes
@@ -404,10 +436,10 @@ Fury support mapping object from one type to another type.
> Notes:
>
> 1. This mapping performs a deep copy: all mapped fields are serialized into binary and
>    then deserialized from that binary into the other type.
> 2. All struct types must be registered with the same ID; otherwise Fury cannot map to the correct struct type.
>    Be careful when you use `Fury#register(Class)`, because fury allocates an auto-grown ID, which might be
>    inconsistent if you register classes in a different order across Fury instances.

```java
public class StructMappingExample {
@@ -460,12 +492,12 @@ the binary are generated by jdk serialization, you use following pattern to make
then upgrade serialization to fury asynchronously through a rolling upgrade:

```java
if (JavaSerializer.serializedByJDK(bytes)) {
  ObjectInputStream objectInputStream = xxx;
  return objectInputStream.readObject();
} else {
  return fury.deserialize(bytes);
}
```
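For reference, JDK serialization streams produced by `ObjectOutputStream` start with `ObjectStreamConstants.STREAM_MAGIC` (`0xACED`) followed by `STREAM_VERSION` (`5`), both written as big-endian shorts. A check like `JavaSerializer.serializedByJDK` can plausibly be built on that header; the hypothetical `JdkStreamSketch` below shows such a check, but it is not Fury's implementation, and a header check is a heuristic rather than a guarantee.

```java
import java.io.ObjectStreamConstants;

/** Hypothetical header check for JDK-serialized bytes (not Fury's JavaSerializer). */
final class JdkStreamSketch {
  private JdkStreamSketch() {}

  static boolean looksLikeJdkSerialized(byte[] bytes) {
    if (bytes == null || bytes.length < 4) {
      return false;
    }
    // Reassemble the first two big-endian shorts written by ObjectOutputStream.
    short magic = (short) (((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF));
    short version = (short) (((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF));
    return magic == ObjectStreamConstants.STREAM_MAGIC
        && version == ObjectStreamConstants.STREAM_VERSION;
  }
}
```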

### Upgrade fury
@@ -482,18 +514,18 @@ serialized data
using code like the following to keep binary compatibility:

```java
MemoryBuffer buffer = xxx;
buffer.writeVarInt32(2);
fury.serialize(buffer, obj);
```

Then for deserialization, you need:

```java
MemoryBuffer buffer = xxx;
int furyVersion = buffer.readVarInt32();
Fury fury = getFury(furyVersion);
fury.deserialize(buffer);
```
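The version-prefix pattern above can be sketched without Fury: the writer prepends a format version, and the reader selects a decoder for that version. This minimal sketch uses a fixed 4-byte `int` instead of `writeVarInt32` for simplicity, and the decoder map is a hypothetical stand-in for a `getFury(version)`-style lookup of shaded Fury versions.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical version-prefixed framing with per-version decoders. */
final class VersionedFrameSketch {
  private VersionedFrameSketch() {}

  /** Prepends a 4-byte big-endian version to the payload. */
  static byte[] frame(int version, byte[] payload) {
    return ByteBuffer.allocate(4 + payload.length).putInt(version).put(payload).array();
  }

  /** Reads the version, then hands the remaining bytes to the matching decoder. */
  static <T> T read(byte[] framed, Map<Integer, Function<byte[], T>> decoders) {
    ByteBuffer buffer = ByteBuffer.wrap(framed);
    int version = buffer.getInt();
    Function<byte[], T> decoder = decoders.get(version);
    if (decoder == null) {
      throw new IllegalStateException("No decoder for format version " + version);
    }
    byte[] payload = new byte[buffer.remaining()];
    buffer.get(payload);
    return decoder.apply(payload);
  }
}
```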

`getFury` is a method that loads the corresponding fury version; you can shade and relocate different versions of fury to different
@@ -520,9 +552,38 @@ consistent between serialization and deserialization.

### Deserialize POJO into another type

Fury allows you to serialize one POJO and deserialize it into a different POJO. Because the two POJOs have different
schemas, you must configure Fury with `CompatibleMode` set to `org.apache.fury.config.CompatibleMode.COMPATIBLE`.

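```java
import org.apache.fury.Fury;
import org.apache.fury.ThreadSafeFury;
import org.apache.fury.config.CompatibleMode;

public class DeserializeIntoType {
  static class Struct1 {
    int f1;
    String f2;

    public Struct1(int f1, String f2) {
      this.f1 = f1;
      this.f2 = f2;
    }
  }

  static class Struct2 {
    int f1;
    String f2;
    double f3;
  }

  static ThreadSafeFury fury = Fury.builder()
    .withCompatibleMode(CompatibleMode.COMPATIBLE)
    // Class registration is required by default; it is disabled here only to keep the
    // example short. Register the classes (or configure a class checker) in real code.
    .requireClassRegistration(false)
    .buildThreadSafeFury();

  public static void main(String[] args) {
    Struct1 struct1 = new Struct1(10, "abc");
    byte[] data = fury.serializeJavaObject(struct1);
    // Fields present in both classes (f1, f2) are filled from the payload; f3 keeps its default value.
    Struct2 struct2 = (Struct2) fury.deserializeJavaObject(data, Struct2.class);
  }
}
```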

### Use wrong API for deserialization

If you serialize an object by invoking `Fury#serialize`, you should invoke `Fury#deserialize` for deserialization