Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reaper endpoints: Async Repair Endpoint #358

Merged
merged 17 commits into from
Aug 31, 2023

Conversation

Miles-Garnsey
Copy link
Member

@Miles-Garnsey Miles-Garnsey commented Aug 22, 2023

This PR modifies the existing repair method in NodeOpsProvider to accept arguments suitable for Reaper. When complete, it should also create a new v2 endpoint to accept new repairs and return a repairID which can later be queried for status updates.

Fixes issue
https://github.com/riptano/mission-control/issues/506

@github-actions
Copy link

No linked issues found. Please add the corresponding issues in the pull request description.
Use GitHub automation to close the issue when a PR is merged

@Miles-Garnsey
Copy link
Member Author

Miles-Garnsey commented Aug 22, 2023

@emerkle826 this is draft still, but I have a few questions and thought we might best collaborate with some concrete code to look at.

Questions:

  1. Can the management API RPC mechanism handle polymporphic functions? We might need one which accepts a RingRange and one which accepts beginToken and endToken arguments, since you can repair non-contiguous ranges via the former. I know we've encountered issues with this in the past with the OpenAPI generator, so wanted to see what the best way to tackle it was.
  2. Can you pass custom types into the RPC methods? The current methods seem to use only simple native Java types. Do they need to be registered as UDTs to make this work, or can you use Java types directly?
  3. If we can't use polymorphism here to adapt to which signature we want to call, would it be reasonable to pass nulls instead?

The only place this matters is in the definition of the ranges, since the existing function just takes a start and end token, while the new function needs to take a list of start and end tokens (which is realised as a RingRange, but we could potentially use a List<Tuple> or List<List<BigInteger>> in the worst case).

@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch from 975be5d to ecb59f1 Compare August 22, 2023 08:04
@emerkle826
Copy link
Contributor

  1. Can the management API RPC mechanism handle polymporphic functions? We might need one which accepts a RingRange and one which accepts beginToken and endToken arguments, since you can repair non-contiguous ranges via the former. I know we've encountered issues with this in the past with the OpenAPI generator, so wanted to see what the best way to tackle it was.

The RPC calls don't handle polymorphism (see here). RPC Methods have to have a unique RPC name, so we can't even register the same method with 2 different RPC names to handle a polymorphic situation.

@emerkle826
Copy link
Contributor

2. Can you pass custom types into the RPC methods? The current methods seem to use only simple native Java types. Do they need to be registered as UDTs to make this work, or can you use Java types directly?

I think we can, but I'm not sure. To do so may require enhancements to some of the RPC serialization code.

@emerkle826
Copy link
Contributor

3. If we can't use polymorphism here to adapt to which signature we want to call, would it be reasonable to pass nulls instead?

Yes, but I think in this case it might be better to have the API side of things determine which Range type is being requested and then build a specific RPC call for a RingRange or a start/end token. Unless I'm missing something, it seems like we will have 2 RPC methods on the Agent side, and the API/server side will handle the request and determine which of the RPC methods to call.

Comment on lines 16 to 18
private final BigInteger start;
private final BigInteger end;

Copy link
Contributor

@emerkle826 emerkle826 Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue:
Something @olim7t just figured out is that model objects on the Agent side of things likely have to have their attributes as public in order for the Object serialization to work. See the code here:
https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/feature/get-tables/management-api-agent-3.x/src/main/java/com/datastax/mgmtapi/rpc/ObjectSerializer3x.java#L61

@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch from ecb59f1 to 30e2606 Compare August 23, 2023 07:31
@Miles-Garnsey
Copy link
Member Author

Got confused about where the beginToken/endToken vs RingRange thing was happening. It was in Reaper, not mgmt API. So we can stick with having a single repair method. That solves one problem.

I've revamped the existing RPC method, and I have the old Resource calling it now.

@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch from 30e2606 to 3c9aa74 Compare August 23, 2023 08:50
@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch from 3c9aa74 to d82219b Compare August 23, 2023 08:55
@Miles-Garnsey
Copy link
Member Author

@emerkle826 can you give me a hand with this when you're in? I'm getting the below statement about illegally formatted files from CI:

Error:  To fix formatting errors, run "mvn com.coveo:fmt-maven-plugin:format"
Error:  Non complying file: /home/runner/work/management-api-for-apache-cassandra/management-api-for-apache-cassandra/management-api-agent-common/src/main/java/com/datastax/mgmtapi/rpc/models/RingRange.java
Error:  Non complying file: /home/runner/work/management-api-for-apache-cassandra/management-api-for-apache-cassandra/management-api-agent-common/src/main/java/com/datastax/mgmtapi/NodeOpsProvider.java
Error:  Failed to execute goal com.coveo:fmt-maven-plugin:2.9:check (default) on project datastax-mgmtapi-agent-common: Found 2 non-complying files, failing build -> [Help 1]
Error:  

But when I run the suggested command it doesn't appear to change any files, or come up with any further information about what is wrong with the formatting. Do you know what I'm supposed to do with this?

@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch 2 times, most recently from a59b202 to d631cf6 Compare August 24, 2023 01:03
@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch 2 times, most recently from 282f382 to b1bd97f Compare August 24, 2023 07:12
@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch from b1bd97f to da8b860 Compare August 24, 2023 07:14
@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch 5 times, most recently from 91a928a to 17d298f Compare August 24, 2023 08:52
@Miles-Garnsey
Copy link
Member Author

Miles-Garnsey commented Aug 25, 2023

@olim7t can Erik and I get some feedback from you on the topic of Optionals? The context is that in the NodeOpsProvider, we want to avoid adding keys to the repair options map under several circumstances:

  1. When the option is passed in as null.
  2. When the option is an empty collection.
  3. When the option is the zero value of the type.

This is because we want the NodeOpsProvider logic to service v0-v3 endpoints, and some options are unavailable in the v0/v1 endpoints. We want to avoid adding entries with null values, empty collections, etc. into the options map in this case since we don't know how Cassandra will interpret them.

To facilitate this, I have the new options being passed as Optional<> to the NodeOpsProvider right now, and in the resources layers we will munge the data to retain existing logic and pass empty Optionals when the endpoint doesn't support those options.

But Erik has rightly noted that the calls over the RPC channel aren't type checked, so wrapping things in Optional<> may not be hugely valuable. He would prefer to keep null checks inside the NodeOpsProvider and consolidate emptiness into null values in the resources layer.

Neither of us have a strong feeling on this - do you have any thoughts?

@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch from f47f74a to ce1b848 Compare August 25, 2023 04:07
@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch from ce1b848 to f29c16b Compare August 25, 2023 04:08
@RpcParam(name = "full") Boolean full,
@RpcParam(name = "notifications") boolean notifications)
@RpcParam(name = "notifications") boolean notifications,
@RpcParam(name = "repairParallelism") Optional<RepairParallelism> repairParallelism,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue:
There are 2 issues here:

  • RepairParallelism and Optional are not included in the various com.datastax.mgmtapi.rpc.GenericSerializer classes
  • RepairParallelism is a class that exists in Cassandra (org.apache.cassandra.repair.RepairParallelism) and in the Management API server code (com.datastax.mgmtapi.resources.v2.models.RepairParallelism)

We could add the classes to the GenericSerializers, but that's a slippery slope. Right now, it supports Java primitives and basic collection types.

The second issue is more of a confusing thing. The v2 model object looks a lot like the Cassandra object, but they are not the same. And in the Resource code, we are not sending the Cassandra object, we are sending the V2 model object. So this will never deserialize correctly since we are expecting the Cassandra object in this method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second issue is more of a confusing thing. The v2 model object looks a lot like the Cassandra object, but they are not the same. And in the Resource code, we are not sending the Cassandra object, we are sending the V2 model object. So this will never deserialize correctly since we are expecting the Cassandra object in this method.

Woops... Sorry, I should have spotted that one :(

Wouldn't we be better off adding a generic serializer for the collections interface to make things more generic? I have to be honest, I do not like the way management API is not using the type system very much throughout. I understand some of that is unavoidable due to the way the RPC calls work, but if we can't even serialise our own types, that is unfortunate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I need to understand why we aren't just serializing straight to json or using some off the shelf serialization here - was there a reason for writing our own serializer?

@RpcParam(name = "notifications") boolean notifications,
@RpcParam(name = "repairParallelism") Optional<RepairParallelism> repairParallelism,
@RpcParam(name = "datacenters") Optional<Collection<String>> datacenters,
@RpcParam(name = "associatedTokens") Optional<List<RingRange>> associatedTokens,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue:
We likely have the same GenericSerialization issue for RingRanges as we do for RepairParallelism above

@olim7t
Copy link
Contributor

olim7t commented Aug 25, 2023

We want to avoid adding entries with null values, empty collections, etc. into the options map in this case since we don't know how Cassandra will interpret them.

I haven't checked every implementation, but looking at 3.x's RepairOption.parse, it looks pretty safe. It's just looking up known options in the map, if there are a few extra unknown ones, they should be ignored.

@emerkle826 emerkle826 force-pushed the feature/repair-endpoint branch from 4aef006 to 1c018cc Compare August 25, 2023 20:31
@Miles-Garnsey
Copy link
Member Author

I haven't checked every implementation, but looking at 3.x's RepairOption.parse, it looks pretty safe. It's just looking up known options in the map, if there are a few extra unknown ones, they should be ignored.

It isn't so much about unknown options, it is more about setting the option values to null or empty collections. @olim7t

@Miles-Garnsey Miles-Garnsey force-pushed the feature/repair-endpoint branch from 1c018cc to 16f701a Compare August 28, 2023 04:10
Miles-Garnsey and others added 2 commits August 30, 2023 11:42
… serialization (#366)

* Remove Optional and other objects not handled by RPC serialization

---------

Co-authored-by: Erik Merkle <[email protected]>
* Add cancelAllRepairs endpoint and requisite NodeOps.

---------

Co-authored-by: Erik Merkle <[email protected]>
@Miles-Garnsey Miles-Garnsey marked this pull request as ready for review August 31, 2023 00:12
Copy link
Contributor

@emerkle826 emerkle826 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for merge

@Miles-Garnsey Miles-Garnsey merged commit 148df59 into feature/client-gen Aug 31, 2023
@Miles-Garnsey Miles-Garnsey deleted the feature/repair-endpoint branch August 31, 2023 01:44
emerkle826 pushed a commit that referenced this pull request Sep 15, 2023
* Allow `repair()` RPC method to take more arguments (as required by Reaper). Get the old NodeOpsResources class calling the new RPC method signature.

---------

Co-authored-by: Erik Merkle <[email protected]>

* Cancel repairs endpoint (#368)

* Add cancelAllRepairs endpoint and requisite NodeOps.
emerkle826 pushed a commit that referenced this pull request Sep 15, 2023
* Allow `repair()` RPC method to take more arguments (as required by Reaper). Get the old NodeOpsResources class calling the new RPC method signature.

---------

Co-authored-by: Erik Merkle <[email protected]>

* Cancel repairs endpoint (#368)

* Add cancelAllRepairs endpoint and requisite NodeOps.
@Miles-Garnsey
Copy link
Member Author

This fixed #405

emerkle826 pushed a commit that referenced this pull request Oct 12, 2023
* Allow `repair()` RPC method to take more arguments (as required by Reaper). Get the old NodeOpsResources class calling the new RPC method signature.

---------

Co-authored-by: Erik Merkle <[email protected]>

* Cancel repairs endpoint (#368)

* Add cancelAllRepairs endpoint and requisite NodeOps.
emerkle826 pushed a commit that referenced this pull request Oct 12, 2023
* Allow `repair()` RPC method to take more arguments (as required by Reaper). Get the old NodeOpsResources class calling the new RPC method signature.

---------

Co-authored-by: Erik Merkle <[email protected]>

* Cancel repairs endpoint (#368)

* Add cancelAllRepairs endpoint and requisite NodeOps.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants