-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: add variant builder implementation #11857
base: main
Are you sure you want to change the base?
Conversation
f8f8e8f
to
d806c66
Compare
d806c66
to
16210a0
Compare
@rdblue, @RussellSpitzer Please help review. Thanks. |
16210a0
to
d4f1d0e
Compare
@@ -20,6 +20,10 @@ | |||
|
|||
/** An variant array value. */ | |||
public interface VariantArray extends VariantValue { | |||
default int numElements() { | |||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has VariantArray
been in a release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checked. It's not in any release. What is the question for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been added so I just rebased so we don't need this anymore.
*/ | ||
package org.apache.iceberg.variants; | ||
|
||
public class VariantSizeLimitException extends RuntimeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there an exception here? Do we need to limit the size of variants that are being constructed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we had the discussion that we will not limit the size. Let me remove that.
private final VariantMetadata metadata; | ||
private final VariantValue value; | ||
|
||
public VariantImpl(byte[] metadata, byte[] value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variants should use ByteBuffer
to avoid copying problems from byte[]
. That's why the current implementation is built around ByteBuffer
.
*/ | ||
package org.apache.iceberg.variants; | ||
|
||
public class VariantConstants { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other constants in Variants
. Why add a new class for these? Should we standardize where they live?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. You intend to place the constants in Variants? That works.
|
||
public static final int MAX_DECIMAL4_PRECISION = 9; | ||
public static final int MAX_DECIMAL8_PRECISION = 18; | ||
public static final int MAX_DECIMAL16_PRECISION = 38; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do all of these constants need to be public? What is the intended use outside of this package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored to VariantBuilderBase since they are only used there.
Preconditions.checkArgument( | ||
json != null && !json.isEmpty(), "Input JSON string cannot be null or empty."); | ||
|
||
try (JsonParser parser = new JsonFactory().createParser(json)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are existing utilities in JSONUtil
for a lot of the boilerplate around parsing and generating JSON. Is there a reason not to use the same approach here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to parse the JSON token by token so we can encode them in binary. Seems we only have the version to parse the JSON string into a full JSONNode and then create the object.
/** | ||
* Attempts to parse a JSON number as a decimal and write it. The input must meet the following | ||
* criteria: - Be in a valid decimal format (integer with an optional '.'). - Not in scientific | ||
* notation. - Fit within the precision and scale limits of decimal types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: lines seem to have been accidentally wrapped.
* | ||
* @param value The numeric value to append. | ||
*/ | ||
protected void writeNumericInternal(long value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this duplicate all of the serialization that is already implemented in PrimitiveWrapper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added writePrimitive
() in ByteBufferWrapper
which will call the serialization in PrimitiveWrapper
.
dataSize, fieldOffsetSize, offsetStart + numElements * fieldOffsetSize); | ||
} | ||
|
||
protected ByteBufferWrapper getBuffer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Iceberg does not use get
in accessor names and discourages its use in general. If there's a more helpful verb then it is generally better to use it. Otherwise we usually omit get
.
* @param value The numeric value to write. | ||
* @param numBytes The number of bytes to write (e.g., 2 for short, 4 for int, 8 for long). | ||
*/ | ||
void writeLittleEndianUnsigned(long value, int numBytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These utilities already exist. Why add them a second time? Are the existing ones incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to work on the resizable ByteBuffer and just implement the logic. But should be able to reuse. Let me update.
9cba102
to
86bf18b
Compare
fa34686
to
3ad94d3
Compare
3ad94d3
to
38e6f2f
Compare
This PR adds the implementation to build a variant for primitive, object and array.
The usage:
Build from Json
VariantBuilder.parseJson(json_string);
build a primitive
Nested arrays and objects can be supported as well. Please see the test cases for the usage. Compared to build from JSON string, we also support some additional types such as timestamp, date.
Part of: #10392