Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TTL support for compressedBucketWrite fn #2930

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

jacoblukose
Copy link

@jacoblukose jacoblukose commented Sep 30, 2024

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

(#200 - Link your issue number here: You can write "Fixes #XXX". Please use the proper keyword so that the issue gets closed automatically. See https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue
Any of the following keywords can be used: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved)

Description

This PR enables to initialize ZkBucketDataAccessor class with option to configure TTL on znodes created. To support the same, underlying base ZkBaseDataAccessor class methods are modified to accept ttl as a param.

  • Added ability to initialized ZkBucketDataAccessor with TTL configured
  • If TTL enabled, for zk operations ZkBucketDataAccessor will use PERSISTENT_WITH_TTL accessoption.
  • Updated ZkBaseDataAccessor class to accept ttl as a param in doUpdate(), setChildren() methods.

Tests

  • Unit test added for changes made to ZkBaseDataAccessor class
  • Unit test added for changes made to ZkBucketDataAccessor class

Changes that Break Backward Compatibility (Optional)

This shouldn't break existing func.

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@jacoblukose jacoblukose marked this pull request as ready for review September 30, 2024 16:47
Copy link
Contributor

@junkaixue junkaixue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests

@jacoblukose jacoblukose marked this pull request as draft October 1, 2024 06:48
@jacoblukose
Copy link
Author

Please add tests

@junkaixue regarding tests, as the change is related to enablement of native ttl func while znode creation, I dont think the ZkTestBase based inmemory zk can be used to test the same as it doesnt support TTL is what I understand from reading the code. If so, is the advice for the changes to write an integration test? cc: @GrantPSpencer

@junkaixue
Copy link
Contributor

You can start a local zk instead of inmemory zk. At least, my bottom line is to have a unit test.

@jacoblukose
Copy link
Author

You can start a local zk instead of inmemory zk. At least, my bottom line is to have a unit test.

There is already a unit test file TestZkBucketDataAccessor but its leveraging ZkTestBase which starts in memory zk. I will need guidance on how to write unit test given the in memory zk doesnt support ttl.

@@ -206,7 +213,7 @@ public <T extends HelixProperty> boolean compressedBucketWrite(String rootPath,
buckets.add(binaryMetadata);

// Do an async set to ZK
boolean[] success = _zkBaseDataAccessor.setChildren(paths, buckets, AccessOption.PERSISTENT);
boolean[] success = _zkBaseDataAccessor.setChildren(paths, buckets, AccessOption.PERSISTENT, _znodeTTLms);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we using ttl, accessOption also need to be updated to use PERSISTENT_WITH_TTL

@jacoblukose jacoblukose marked this pull request as ready for review October 9, 2024 16:30
@jacoblukose
Copy link
Author

@junkaixue @GrantPSpencer could you folks help do a second pass.

thanks!

@jacoblukose
Copy link
Author

@junkaixue @GrantPSpencer could you folks help review thanks again!

@GrantPSpencer
Copy link
Contributor

My comments have been addressed 👍
Will leave it to committers for approval now
As a side note, your PR CI may intermittently fail due to flaky tests in the CI. You can see a list of these in the issues tab . Here's an example: #2945
If this occurs, note the flaky test that failed in a comment

@jacoblukose
Copy link
Author

jacoblukose commented Oct 29, 2024

Test which failed is a flaky test: #2795.

From https://github.com/apache/helix/actions/runs/11349924518/job/32019361349?pr=2930

Run .github/scripts/printTestResult.sh
[info] ./helix-core/target/surefire-reports/TestSuite.txt: Tests run: 1449, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 6,802.569 s <<< FAILURE! - in TestSuite
Error:  Test failed: testInconsistentStateEventProcessing(org.apache.helix.integration.spectator.TestRoutingTableProviderFromCurrentStates)  Time elapsed: 7.092 s  <<< FAILURE!
[info] ./zookeeper-api/target/surefire-reports/TestSuite.txt: Tests run: 99, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 231.62 s - in TestSuite
[info] ./helix-common/target/surefire-reports/TestSuite.txt: Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.287 s - in TestSuite
[info] ./metrics-common/target/surefire-reports/TestSuite.txt: Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.275 s - in TestSuite
[info] ./metadata-store-directory-common/target/surefire-reports/TestSuite.txt: Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.932 s - in TestSuite
[info] ./meta-client/target/surefire-reports/TestSuite.txt: Tests run: 71, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 675.25 s - in TestSuite

@jacoblukose
Copy link
Author

@junkaixue @xyuanlu please review/approve. all comments are addressed.

* ttl is only used when creating new znode, hence if znode is already created with a ttl, further
* update operations will not update the znode ttl even if ttl is provided in the options
*/
public AccessResult doUpdate(String path, DataUpdater<T> updater, int options, long ttl) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally helper functions with this naming convention should be private.

Copy link
Author

@jacoblukose jacoblukose Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked other fns (ex: doCreate, doSet and also the other invocation of doUpdate) and they are having public as the scope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally if the fuc is not used outside class, we would set as private.
Also more importantly, it is a bit odd to allow user to set ttl but not actually update ttl in update. If user provide a new ttl value and the node already exit, this function will return RetCode.OK without actually change anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I said earlier there are several instances of doCreate, doSet, doUpdate which are declared public. I followed the similar convention. I am not familiar with the helix code base, this is my first PR. If you can guide me what exact change to make here, happy to correct it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on @xyuanlu.

I can understand TTL for create. But why we need TTL for update? The right operation is to let the user delete the node and recreate with a TTL.

I am not confortable with this feature. Before we have strong justification. I dont think we should let this in.

@jacoblukose It would be good to figure out why you would like to support this.

Copy link
Contributor

@xyuanlu xyuanlu Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this might not be the best approach, but I am OK as long as @junkaixue feels OK. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not buying this. As I said, when a node is created, we should not update its property and change it from a persistent node to TTL node.

If you need to do this, then it should be the application logic to make that smoothly migrate from Non TTL to TTL. Also, since this is the API support Helix. I dont see any use cases in HELIX to leverage this.

Accessor is Helix internal thing. If you would like to leverage for company specific feature and not necessary help Helix, please fork the repo and implement it internally.

Copy link
Author

@jacoblukose jacoblukose Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xyuanlu thanks for reponding and on the ack. It will be helpful if you can also callout alternative approach here to improve the craft if you thinks thats seen comprised. Happy to correct it.

@junkaixue thanks for responding, would you be kind enough to highlight the code part where the said concern: I am not buying this. As I said, when a node is created, we should not update its property and change it from a persistent node to TTL node. is. doUpdate fn is not going to update ttl for an existing node, only when underlying create is called, it leverage ttl passed if any. And also, if you see, public facing doUpdate doesnt have ttl option exposed. It would be great if you can suggest an alternative on how this need to handled as well.

Also I think there is some misunderstanding, the usecase is not to change znodes created with persistent mode to migrate to use ttl mode. Idea is to just expand the scope of ZkBucketDataAccessor to leverage zk ttl feature when the same is being initialized.

Copy link
Contributor

@xyuanlu xyuanlu Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I think more time is needed when design the ZK update API to keep consistency. I dont have an solid idea in mind how. But I think the fundamental rule is keep the behavior simple, consistent and not ambiguous.

  • The Update API has a TTL as parameter, but the updated ZNode may or may not have the passed in TTL value, is not a consistent behavior. However, I don't have an idea of how to achieve this, I think this is the reason why we don't have update with TTL in the first place. More effort is needed when design this - if ever needed.

Copy link
Author

@jacoblukose jacoblukose Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @xyuanlu the only reason why doUpdate need a ttl value to be passed in is because the current code is using a doCreate fn inside the doUpdate. If there is no doCreate call made within doUpdate then there is no need to pass in the ttl as well, reason being, ttl can only be enforced during creation of znode.

the related q is, is it a correct behaviour on the existing doUpdate fn code to call a doCreate internally? If not, this should be removed, and application need to handle it by issuing an explicit doCreate call. if its correct, then comes to the challenge on how to pass in ttl value to the internal doCreate fn inside doUpdate

@@ -233,7 +241,7 @@ public <T extends HelixProperty> boolean compressedBucketWrite(String rootPath,
}
};
if (!_zkBaseDataAccessor.update(rootPath + "/" + LAST_SUCCESSFUL_WRITE_KEY,
Copy link
Author

@jacoblukose jacoblukose Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note(myself): I think this update() calls also need to be changed to either create if first time and then update for later updates. Or move to doUpdate fn with ttl support, based on how the PR discussion concludes.

// 1. Increment lastWriteVersion using DataUpdater
ZkBaseDataAccessor.AccessResult result = _zkBaseDataAccessor.doUpdate(
rootPath + "/" + LAST_WRITE_KEY, lastWriteVersionUpdater, AccessOption.PERSISTENT);
Copy link
Author

@jacoblukose jacoblukose Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note(myself): based on how pr discussion concludes, we might need to use explicit create for first time and then update calls from second time onwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants