Releases: avro-kotlin/avro4k
v2.1.1
What's Changed
- Add migration from AvroEncodeFormat.Binary example by @rutkowskij in #247
- deps: Upgrade to avro 1.11.4, resolving CVEs by @Chuckame in #266
New Contributors
- @rutkowskij made their first contribution in #247
Full Changelog: v2.1.0...v2.1.1
v2.1.0
What's Changed
- feat: Allow writing object-container files from blocking and async/suspend contexts by @Chuckame in #257
Experimental breaking change: in `AvroObjectContainer`, `encodeToStream` (including its extensions) has been replaced by `openWriter`, which returns a writer to write elements to the container, instead of taking a Sequence, which made it difficult to write from coroutines. Don't forget to close the stream to flush the buffered elements.
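As a rough illustration of the writer-based flow described above, here is a minimal sketch. It assumes that `openWriter` has a reified overload taking only the output stream, that the returned writer is `Closeable`, and that it exposes a `writeValue` method; the `Project` class and the `writeProjects` helper are hypothetical names used only for this example.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import java.io.ByteArrayOutputStream

// Hypothetical model, only for illustration
@Serializable
data class Project(val name: String, val language: String)

// Writes all the projects into an in-memory object container file.
// Assumption: openWriter<T>(stream) and writer.writeValue(value) exist with these shapes.
@OptIn(ExperimentalSerializationApi::class)
fun writeProjects(projects: List<Project>): ByteArray {
    val output = ByteArrayOutputStream()
    output.use { stream ->
        AvroObjectContainer.openWriter<Project>(stream).use { writer ->
            // Elements can be written one by one, from any blocking or suspending caller
            projects.forEach { writer.writeValue(it) }
        }
        // Leaving both `use` blocks closes the writer and the stream, flushing buffered elements
    }
    return output.toByteArray()
}
```

Because elements are pushed one at a time rather than pulled from a Sequence, the same writer can be fed from a coroutine collecting a flow, as long as it is closed at the end.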
Full Changelog: v2.0.0...v2.1.0
v2.0.0
Introduction of v2
Avro4k was created back in 2019. Over the following 5 years, a lot of work went into Avro generic records and schema generation.
Recently, kotlinx-serialization and Kotlin shipped big releases that improved many things (features, performance, better APIs). The JSON API of kotlinx-serialization is well designed, so we tried to replicate its simplicity.
A big focus has been put on making Avro4k more lenient, to simplify developers' lives and improve adoption.
I hope this major release will make Avro easier to use, even more so in pure Kotlin 🚀
As a side note, we may implement our own plugins to generate data classes and schemas, so stay tuned!
Highlights and Breaking changes
Performances & benchmark
Long story
Well... Building a like-for-like benchmark is complicated, as v2 adds a lot of features and fixes compared to v1. The following benchmark is not fully representative, as it does not compare all the features.
We will compare an easy use case: encoding and decoding a simple data class with all the primitive types, a String and a list of strings:
```kotlin
import kotlinx.serialization.Serializable

// Data class used by the benchmark
@Serializable
data class SimpleDataClass(
    val bool: Boolean,
    val byte: Byte,
    val short: Short,
    val int: Int,
    val long: Long,
    val float: Float,
    val double: Double,
    val string: String,
    val bytes: ByteArray,
)
```
The benchmark was executed on a MacBook Air M2 in a single-threaded environment.
Avro4k v2 (direct binary) is much faster than v1 (generic records), and is now also faster than Jackson and the standard Apache Avro library (using reflection). It has not been compared against SpecificRecord yet.
Encoding Performance
Version | Encoding (ops/s) | Relative Difference (%) |
---|---|---|
Avro4k v1 (generic records) | 109 327 | 0% |
Jackson | 134 774 | +23% |
Avro4k v2 (generic records) | 190 365 | +74% |
Apache avro ReflectData (direct binary) | 332 438 | +204% |
Avro4k v2 (direct binary) | 459 751 | +321% 🚀 |
Decoding Performance
Version | Decoding (ops/s) | Relative Difference (%) |
---|---|---|
Avro4k v1 (generic records) | 67 825 | 0% |
Jackson | 71 146 | +5% |
Avro4k v2 (generic records) | 114 511 | +69% |
Apache avro ReflectData (direct binary) | 151 287 | +123% |
Avro4k v2 (direct binary) | 174 063 | +157% 🚀 |
Migration guide
As a lot of APIs, classes, packages and more have changed, here is the migration guide. Don't hesitate to file an issue if something is missing!
Needs Kotlin 2.0.0 and kotlinx.serialization 1.7.0
You need at least Kotlin 2.0.0 and kotlinx.serialization 1.7.0 to use Avro4k v2.0.0+ (the version matrix is in the README), as there are breaking changes in the kotlinx-serialization plugin and library (which are released in tandem with the Kotlin version).
More information here: kotlinx-serialization v1.7.0
ExperimentalSerializationApi
Since the API changed deeply, all the new functions, properties, classes and annotations that are annotated with `ExperimentalSerializationApi` will show a warning, as they could change at any moment. Those members will be un-annotated after a few releases once they have proven their stability 🪨
You may see a lot of `ExperimentalSerializationApi` warnings, as everything has been reworked. The common APIs may stabilize more quickly, so they could be un-annotated in the next minor release. The more complex or less used APIs could be un-annotated later.
To suppress this warning, you may opt in to the experimental serialization API. It is advised not to opt in globally through the compiler arguments, to avoid surprises when using experimental features 😅
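A local opt-in could look like the following sketch. The `Order` class and `encodeOrder` function are hypothetical, and whether a given Avro4k call is actually annotated as experimental depends on the version you use; the point is only that the `@OptIn` is scoped to one declaration instead of the whole module.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable

// Hypothetical model, only for illustration
@Serializable
data class Order(val id: Long, val label: String)

// The opt-in only covers this function, so any other use of an
// experimental API elsewhere in the project still raises a warning.
@OptIn(ExperimentalSerializationApi::class)
fun encodeOrder(order: Order): ByteArray = Avro.encodeToByteArray(order)

@OptIn(ExperimentalSerializationApi::class)
fun decodeOrder(bytes: ByteArray): Order = Avro.decodeFromByteArray<Order>(bytes)
```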
Warning
Removing any API annotated with `ExperimentalSerializationApi` won't be considered a breaking change with regard to the semver standard, so given a version `A.B.C`, only the minor `B` number will be incremented, not the major `A`.
Direct binary serialization
Before, serializing Avro with Avro4k went through a generic step that first converted the data classes to generic maps, and then passed this generic data to the Apache Avro library.
Now, encoding to and decoding from binary is done directly, which greatly improved performance (see the Performances & benchmark section).
Note
We will keep supporting generic data serialization until there is a solution for Kafka schema registry serialization (a future avro4k module to be created), but it may be removed in the future to simplify the avro4k library, as it is not really a serialization but rather a conversion.
Support anything to encode and decode at root level
Before, we were only able to encode and decode `GenericRecord`. No primitives, no arrays, no value classes, just generic records.
Now, there is no need to wrap your value in a record: you can serialize nearly everything and generate the corresponding schema!
This includes any data class, enum, sealed interface or class, value class, primitive value or contextual serializer 🚀
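To make root-level encoding more concrete, here is a small sketch that round-trips a primitive, a list and a value class without wrapping them in a record. The `UserId` class and the `roundTrip*` helpers are hypothetical; the sketch assumes the reified `encodeToByteArray`, `decodeFromByteArray` and `schema` extensions are available from the `com.github.avrokotlin.avro4k` package.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.Serializable
import org.apache.avro.Schema

// Hypothetical value class, only for illustration
@JvmInline
@Serializable
value class UserId(val value: String)

// A primitive encoded directly at the root level
fun roundTripInt(value: Int): Int =
    Avro.decodeFromByteArray<Int>(Avro.encodeToByteArray(value))

// A collection encoded directly at the root level
fun roundTripList(values: List<String>): List<String> =
    Avro.decodeFromByteArray<List<String>>(Avro.encodeToByteArray(values))

// A value class encoded directly at the root level
fun roundTripUserId(id: UserId): UserId =
    Avro.decodeFromByteArray<UserId>(Avro.encodeToByteArray(id))

// The corresponding schema can be generated for the same root types
fun listSchema(): Schema = Avro.schema<List<String>>()
```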
Totally new API
The previous API required a good understanding of its internals to be used correctly, especially when working with InputStream and OutputStream.
There are now different entrypoints for different purposes:
- `Avro`: the main entrypoint to generate schemas, and to encode and decode in the Avro format. This is the pure raw Avro format without anything else around it.
- `AvroObjectContainer`: the entrypoint to encode Avro data files, following the official spec, and using `Avro` for each value serialization.
- `AvroSingleObject`: the entrypoint for encoding a single object prefixed with the schema fingerprint, following the official spec, and also using `Avro` for value serialization.
Warning
`Avro.encodeToByteArray` now encodes in pure binary Avro. If you still need to encode in the object container format as in v1 (the DATA format), you have to use `AvroObjectContainer`.
Implicit nulls by default
Previously, decoding failed when a nullable field was missing from the writer schema.
Now, every such nullable field is decoded as `null` instead of failing. To opt out of this feature, configure your `Avro` instance with `implicitNulls = false`.
It is enabled by default to simplify the use of Avro4k and make it more lenient, for better adoption.
Implicit empty maps, collections and arrays by default
Previously, decoding failed when a map or collection-like field was missing from the writer schema.
Now, such a field is decoded as an empty collection instead of failing (an empty map, list, array or set, depending on the field type). To opt out of this feature, configure your `Avro` instance with `implicitEmptyCollections = false`.
It is enabled by default to simplify the use of Avro4k and make it more lenient, for better adoption.
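For reference, a stricter configuration that opts out of both implicit defaults might look like this sketch. The `strictAvro` name is hypothetical; the two flags are the ones named above, set through the `Avro { ... }` builder.

```kotlin
import com.github.avrokotlin.avro4k.Avro

// A stricter instance: a nullable or collection-like field missing from the
// writer schema makes decoding fail instead of falling back to null or empty.
val strictAvro = Avro {
    implicitNulls = false
    implicitEmptyCollections = false
}
```

Such an instance is then used in place of the default `Avro` object wherever the strict behavior is wanted.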
Lenient
The Apache Avro library is strict regarding types and closely follows the Avro spec. For example, a Kotlin float is written as an Avro float, while it can be decoded as a float or a double.
Avro4k pushes leniency further: a float can be written and read as a float, a double, a string, an int or a long in Avro.
A type matrix is available in the README.
No more reflection
Thanks to this little change, there is absolutely no more reflection, which allows you to use Android or GraalVM AOT native compilation (not tested, but it should work; let us know!).
Unified & cleaned annotations
- `AvroJsonProp` has been merged into `AvroProp`: the JSON content is automatically detected, so any non-JSON content is handled as a string
- `AvroAliases` has been merged into `AvroAlias`: it now takes a `vararg` parameter, so you can pass as many aliases as you want using the same annotation
- `AvroInline` has been removed in favor of Kotlin's native `value class`
- `AvroEnumDefault` is now applied directly on the default enum member
- `ScalePrecision` has been renamed to `AvroDecimal` to unify annotations under a common prefix. Also, the `decimal`'s `scale` and `precision` no longer have defaults
- `AvroNamespace` and `AvroName` have been replaced by the native kotlinx-serialization `SerialName` annotation
- `AvroStringable` has been added to easily force a field type to be inferred as a string (this works for all the primitive types and the built-in logical types)
- `AvroFixed` now only applies to compatible types (ByteArray, String, decimal logical type); annotating other types does nothing
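The annotation changes listed above could be combined as in the sketch below. All class and field names (`CustomerId`, `Status`, `Invoice`) are hypothetical, and the exact annotation parameters are assumptions based on the descriptions in this list (for example, that `BigDecimal` fields are declared with `@Contextual` and that `AvroAlias` can be placed on a class), so check them against the README of the version you use.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.Contextual
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import java.math.BigDecimal

// A Kotlin value class replaces the removed AvroInline annotation
@JvmInline
@Serializable
value class CustomerId(val value: String)

@Serializable
enum class Status {
    // AvroEnumDefault goes directly on the default enum member
    @AvroEnumDefault
    UNKNOWN,
    ACTIVE,
}

// SerialName replaces AvroNamespace + AvroName;
// AvroAlias accepts several aliases in a single annotation
@Serializable
@SerialName("a.custom.namespace.Invoice")
@AvroAlias("OldInvoice", "LegacyInvoice")
data class Invoice(
    val customer: CustomerId,
    // scale and precision must now be given explicitly
    @Contextual
    @AvroDecimal(scale = 2, precision = 10)
    val amount: BigDecimal,
    // forces this Int field to be inferred as a string in the schema
    @AvroStringable
    val count: Int,
    val status: Status,
)

fun invoiceSchema() = Avro.schema<Invoice>()
```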
Only ByteArray is now handled as BYTES
Previously, all collection-like types of bytes were handled as BYTES.
Now, only ByteArray is handled as BYTES, and the other collection-like types of bytes...
v2.0.0-RC7
What's Changed
- Add dependabot by @Chuckame in #227
- docs: Fix avro version in docs by @Chuckame in #226
- handle nullable bytearrays and add null values in benchmark by @Chuckame in #228
- feat: Add duration logical type by @Chuckame in #233
- fix: Only handle ByteArrays as bytes or fixed, and collection of Byte as arrays of int by @Chuckame in #234
- fix: No more automatic padding for fixed type by @Chuckame in #235
- fix: Update docs by @Chuckame in #237
- feat: Add @AvroStringable by @Chuckame in #236
- feat: Remove AvroDecimal defaults by @Chuckame in #238
Full Changelog: v2.0.0-RC6...v2.0.0-RC7
v2.0.0-RC6
What's Changed
- fix: Removed AvroNamespaceOverride as it was not fully implemented by @Chuckame in #224
- Improve benchmark by @Chuckame in #225
Full Changelog: v2.0.0-RC5...v2.0.0-RC6
v2.0.0-RC5
v2.0.0-RC2
Introduction of v2
Avro4k was created back in 2019. Over the following 5 years, a lot of work went into Avro generic records and schema generation.
Recently, kotlinx-serialization and Kotlin shipped big releases that improved many things (features, performance, better APIs). The JSON API of kotlinx-serialization is well designed, so we tried to replicate its simplicity.
A big focus has been put on making Avro4k more lenient, to simplify developers' lives and improve adoption.
I hope this major release will make Avro easier to use, even more so in pure Kotlin 🚀
As a side note, we may implement our own plugins to generate data classes and schemas, so stay tuned!
Highlights and Breaking changes
Needs Kotlin 2.0.0 and kotlinx.serialization 1.7.0-RC
You need at least Kotlin 2.0.0 and kotlinx.serialization 1.7.0-RC to use Avro4k v2 (the version matrix is in the README), as there are breaking changes in the kotlinx-serialization plugin and library (which are released in tandem with the Kotlin version).
More information here: kotlinx-serialization v1.7.0-RC
ExperimentalSerializationApi
Since the API changed deeply, all the new functions, properties, classes and annotations that are annotated with `ExperimentalSerializationApi` will show a warning, as they could change at any moment. Those members will be un-annotated after a few releases once they have proven their stability 🪨
To suppress this warning, you may opt in to the experimental serialization API. It is advised not to opt in globally through the compiler arguments, to avoid surprises when using experimental features 😅
Direct binary serialization
Before, serializing Avro with Avro4k went through a generic step that first converted the data classes to generic maps, and then passed this generic data to the Apache Avro library.
Now, encoding to and decoding from binary is done directly, which greatly improved performance (see the Performances & benchmark section).
Note
We will keep supporting generic data serialization until there is a solution for Kafka schema registry serialization (a future avro4k module to be created), but it will be removed in the future to simplify the avro4k library, as it is not really a serialization but rather a conversion.
Support anything to encode and decode at root level
Now, there is no need to wrap your value in a record: you can serialize nearly everything and generate the corresponding schema!
This includes any data class, enum, sealed interface, value class, primitive or contextual value 🚀
Totally new API
The previous API required a good understanding of its internals to be used correctly, especially when working with InputStream and OutputStream.
There are now different entrypoints for different purposes:
- `Avro`: the main entrypoint to generate schemas, and to encode and decode the Avro format. This is the pure raw Avro format without anything else.
- `AvroObjectContainerFile`: the entrypoint to encode Avro data files, following the official spec, and using `Avro` for each value serialization.
- `AvroSingleObject`: the entrypoint for encoding a single object prefixed with the schema fingerprint, following the official spec, and also using `Avro` for value serialization.
Here are some examples of the changes:
Pure avro serialization (no specific format, no prefix, no magic byte, just pure avro binary)
```kotlin
// Previously
val bytes = Avro.default.encodeToByteArray(TheDataClass.serializer(), TheDataClass(...))
Avro.default.decodeFromByteArray(TheDataClass.serializer(), bytes)

// Now
val bytes = Avro.encodeToByteArray(TheDataClass(...))
Avro.decodeFromByteArray<TheDataClass>(bytes)
```
Generic data serialization (convert a Kotlin data class to a GenericRecord to then be handled by a `GenericDatumWriter` in Avro)
```kotlin
// Previously
val genericRecord: GenericRecord = Avro.default.toRecord(TheDataClass.serializer(), TheDataClass(...))
Avro.default.fromRecord(TheDataClass.serializer(), genericRecord)

// Now
val genericData: Any? = Avro.encodeToGenericData(TheDataClass(...))
Avro.decodeFromGenericData<TheDataClass>(genericData)
```
Configure the `Avro` instance
```kotlin
// Previously
val avro = Avro(
    AvroConfiguration(
        namingStrategy = SnakeCaseNamingStrategy,
        implicitNulls = true,
    ),
    SerializersModule {
        contextual(CustomSerializer())
    }
)

// Now
val avro = Avro {
    fieldNamingStrategy = FieldNamingStrategy.SnakeCase
    implicitNulls = true
    serializersModule = SerializersModule {
        contextual(CustomSerializer())
    }
}
```
Changing the name of a record
```kotlin
// Previously
@AvroName("TheName")
@AvroNamespace("a.custom.namespace")
data class TheDataClass(...)

// Now
@SerialName("a.custom.namespace.TheName")
data class TheDataClass(...)
```
Writing an avro object container file with a custom field naming strategy
```kotlin
// Previously
Files.newOutputStream(Path("/your/file.avro")).use { outputStream ->
    Avro(AvroConfiguration(namingStrategy = SnakeCaseNamingStrategy))
        .openOutputStream(TheDataClass.serializer()) { encodeFormat = AvroEncodeFormat.Data(CodecFactory.snappyCodec()) }
        .to(outputStream)
        .write(TheDataClass(...))
        .write(TheDataClass(...))
        .write(TheDataClass(...))
        .close()
}

// Now
val dataSequence = sequenceOf(
    TheDataClass(...),
    TheDataClass(...),
    TheDataClass(...),
)
val avro = Avro { fieldNamingStrategy = FieldNamingStrategy.SnakeCase }
Files.newOutputStream(Path("/your/file.avro")).use { outputStream ->
    AvroObjectContainerFile(avro)
        .encodeToStream(dataSequence, outputStream) {
            codec(CodecFactory.snappyCodec())
            // you can also add your metadata!
            metadata("myProp", 1234L)
            metadata("a string metadata", "hello")
        }
}
```
Warning
Migration guide: WIP
Implicit nulls by default
Previously, decoding failed when nothing was decoded for a nullable field.
Now, it decodes `null` instead of failing. To opt out of this feature, configure your `Avro` instance with `implicitNulls = false`.
It is enabled by default to simplify the use of Avro4k and make it more lenient, for better adoption.
Lenient
The Apache Avro library is strict regarding types and closely follows the Avro spec. For example, a Kotlin float can be written and read as a float or a double in Avro.
Avro4k pushes leniency further: a float can be written and read as a float, a double, a string, an int or a long in Avro.
A type matrix is available in the README.
No more reflection
Thanks to this little change, there is absolutely no more reflection, which allows using Android or GraalVM AOT native compilation (requires kotlinx-serialization 1.7.0).
Unified & cleaned annotations
Some numbers: 4 annotations out of 12 have been removed!
- `AvroJsonProp` has been merged into `AvroProp`: the JSON content is automatically detected, so any non-JSON content is handled as a string
- `AvroAliases` has been merged into `AvroAlias`: it now takes a `vararg` parameter, so you can pass as many aliases as you want using the same annotation
- `AvroInline` has been removed in favor of Kotlin's native `value class`
- `AvroEnumDefault` is now applied directly on the default enum member
- `ScalePrecision` has been renamed to `AvroDecimal` to keep a common prefix
- `AvroNamespace` and `AvroName` have been replaced by the native kotlinx-serialization `SerialName` annotation
- `AvroNamespaceOverride` has been created to allow replacing the namespace of a field schema (⚠️ this annotation is not stable and can disappear at any moment)
Caching
All schemas are cached using a `WeakIdentityHashMap`, allowing the GC to remove cache entries when available memory is low.
Some other expensive internal parts are also cached for quicker encoding and decoding.
Performances & benchmark
Warning
WIP
What's Changed
- fix: Assume kotlin.Pair as a normal data class instead of an union by @Chuckame in #174
- feat!: No more reflection and customizable logical types by @Chuckame in #175
- feat: Add support for decoding with avro aliases by @Chuckame in #177
- Generalize encoding/decoding tests (#168) by @Chuckame in #179
- chore: Add spotless with ktlint + editorconfig by @Chuckame in #180
- feat: Support kotlin's value classes by @Chuckame in #183
- feat: Revamp naming strategy and related annotations by @Chuckame in #182
- feat: Merge ScalePrecision to AvroDecimalLogicalType by @Chuckame in #191
- chore: Upgrade github actions and use standard gradle actions by @Chuckame in #192
- feat: revamp the schema generation by @Chuckame in #190
- feat: New Avro entrypoint by @Chuckame in...
v1.10.1
What's Changed
- fix(annotations): Set the `@Language` value to `JSON` by @Chuckame in #157
- feat(aliases): Merge AvroAliases annotation to AvroAlias by @Chuckame in #156
- Updated apache org.apache.avro:avro to resolve CVE-2023-39410 by @TNijman1990 in #172
- chore(build): Replace buildSrc by gradle's versionCatalogs by @Chuckame in #173
- Added support for decoding with avro alias by @trdw in #171
- Generalize encoding/decoding tests by @thake in #168
- fix: Allow encoding null array items or null map values by @Chuckame in #197
New Contributors
- @TNijman1990 made their first contribution in #172
- @trdw made their first contribution in #171
Full Changelog: v1.10.0...v1.10.1
v1.10.0
v1.9.0
What's Changed
- feat: Set default to null when the field type is nullable (activable by the configuration) by @Chuckame in #140
- Bump snappy-java version by @AdamBlance in #146
New Contributors
- @AdamBlance made their first contribution in #146
Full Changelog: v1.8.0...v1.9.0