-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: add hashcode cache in StructType #11764
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -723,6 +723,9 @@ public int hashCode() { | |
|
||
public static class StructType extends NestedType { | ||
private static final Joiner FIELD_SEP = Joiner.on(", "); | ||
private static final int NO_HASHCODE = Integer.MIN_VALUE; | ||
|
||
private transient int hashCode = NO_HASHCODE; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for making this transient, when I saw this PR that's the one aspect I wanted to make sure was the case. We really don't want any weird issues that will happen in a distributed setting where the cached hashcode on one struct type is different than the hashcode for the same struct type that's on a different node (which can easily happen since hashcodes across JVMS can be different) Making it transient will avoid all those kinds of issues. |
||
|
||
public static StructType of(NestedField... fields) { | ||
return of(Arrays.asList(fields)); | ||
|
@@ -824,7 +827,10 @@ public boolean equals(Object o) { | |
|
||
@Override | ||
public int hashCode() { | ||
return Objects.hash(NestedField.class, Arrays.hashCode(fields)); | ||
if (hashCode == NO_HASHCODE) { | ||
hashCode = Objects.hash(NestedField.class, Arrays.hashCode(fields)); | ||
} | ||
return hashCode; | ||
Comment on lines
+830
to
+833
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can this have a multi-threaded access ? if yes can we have double check locking ? to avoid recompute There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this scenario, there is no multi-threaded access, but the method I think the main purpose of this cache is to reduce a significant amount of redundant computation. Introducing additional complexity to completely avoid redundant computation might not be necessary, as even with multi-threaded access, the redundant computation would only occur a few times (up to the number of threads), which should be negligible. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, I don't think it's worth the complexity of double checked locking to avoid a little bit of redundant computation in the multi-threaded case. |
||
} | ||
|
||
private List<NestedField> lazyFieldList() { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally going to suggest we follow the same pattern we do in
CharSequenceWrapper
where we have two fields, the hashCode and a booleanhashIsZero
flag. This way in case the hashCode is actually zero we don't recompute it.In the current implementation, we would recompute the hashCode if it's actually Integer.MIN_VALUE but arguably not worth the complexity to handle that for this case.