Add ElastiCache L2 RFC #464
This looks really great! I have a couple of questions just to clarify some
things, but I think the implementation looks pretty solid.
Can you also add a more advanced example for both types of replication groups?
Seeing how all the properties play together would help.
## Constructs
### Defining a RedisReplicationGroup
There are a couple of things that I think we need to explore:
- `UseOnlineResharding` seems to have a lot to take into consideration. Should this be enabled by default? If so, how should we handle some of these?
  - `NumNodeGroups` & `NodeGroupConfiguration`: The docs state that "As a best practice, when you create a replication group in a stack template, include an ID for each node group you specify". It seems like we should require the ID if they specify a configuration (or maybe we can generate one).
  - `NodeGroupConfiguration.Slots`: "When you use an UseOnlineResharding update policy to update the number of node groups without interruption, ElastiCache evenly distributes the keyspaces between the specified number of slots. This cannot be updated later. Therefore, after updating the number of node groups in this way, you should remove the value specified for the Slots property of each NodeGroupConfiguration from the stack template, as it no longer reflects the actual values in each node group." (docs)
On this, I lack a deep enough understanding of ElastiCache to make the best trade-offs. E.g. for `UseOnlineResharding`, there is a warning that you should update `NumNodeGroups` and `NodeGroupConfiguration` only in isolation, which might not be obvious when using AWS CDK.
I agree on making IDs required and will add it to the RFC. Seems to be a no-brainer.
The last point also speaks against setting `UseOnlineResharding` by default.
After looking into this, I would actually suggest that we make `useOnlineResharding` a required property for now. This way, introducing a default value in a later PR is a non-breaking change that can be done with close support from someone with deep ElastiCache understanding.
If you actually have deep enough ElastiCache understanding (or can bring in an expert who has), I'm also open to just implementing what you discussed, though I would prefer not to delay the implementation too much because of this.
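The compatibility argument above (required now, optional with a default later) can be sketched in plain TypeScript. The interface names and the `resolveOnlineResharding` helper are hypothetical and only illustrate the typing; they are not the RFC's API.

```typescript
// Hypothetical props shapes, for illustration only; not the RFC's actual API.

// Version 1: the property is required, so every caller states its intent.
interface ReplicationGroupPropsV1 {
  readonly numNodeGroups: number;
  readonly useOnlineResharding: boolean;
}

// Version 2: the property becomes optional and falls back to a default.
interface ReplicationGroupPropsV2 {
  readonly numNodeGroups: number;
  readonly useOnlineResharding?: boolean;
}

// Resolves the effective value. The default is passed in because the thread
// has not yet settled on one at this point in the discussion.
function resolveOnlineResharding(
  props: ReplicationGroupPropsV2,
  defaultValue: boolean,
): boolean {
  return props.useOnlineResharding ?? defaultValue;
}

// A caller written against V1 is still a valid V2 value, so relaxing the
// property from required to optional does not break existing code.
const existingCaller: ReplicationGroupPropsV1 = {
  numNodeGroups: 2,
  useOnlineResharding: false,
};
const stillValid: ReplicationGroupPropsV2 = existingCaller;
```

The reverse direction (optional first, required later) would break callers that omitted the property, which is why starting with a required property keeps the default open.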
Just saw that `NodeGroupConfiguration` is currently marked as a prop that is not part of the first implementation anyway, so I would only add the information on making the ID required if we actually go deeper into this part. Otherwise, I would leave this for a future RFC or implementation discussion.
The default behavior should be to use online resharding. It increases availability and allows you to easily add and remove shards from your cluster. Offline resharding is necessary when you want to do more than just add or remove shards.
@dorser if this is enabled by default and someone who doesn't know about the details updates NumNodeGroups and other properties at the same time, what would happen? The docs caution against this, and I would aim for an L2 construct to be usable without deep understanding of the service details.
Similarly, would we need to handle `NodeGroupConfiguration.Slots` differently in that case to align with the recommended approach of deleting it from the template after initial deployment?
I'm not sure how to proceed on this discussion. @dorser @corymhall what do you think?
I would suggest that by default you omit the node group configuration slots and always set `UseOnlineResharding`. This is a bit of legacy in our service: we initially launched with only `OfflineResharding`. Customers cared way more about online resharding than they did about configuring slots, so when we added online resharding we didn't allow custom slot configurations. If someone REALLY wants to set slots, they can use the L1 constructs, but my guess is that is a very small number of users.
@dbartholomae based on @madolson's comment, let's have `UseOnlineResharding` as the default.
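As a rough illustration of the outcome of this thread (online resharding on by default, slots never emitted), here is a minimal sketch that renders a partial CloudFormation resource as a plain object. The `ClusterProps` interface and `renderReplicationGroup` function are hypothetical names, other required resource properties are left out, and a real L2 construct would wrap the L1 construct instead of building the template by hand.

```typescript
// Hypothetical L2-style props; names are illustrative, not the RFC's final API.
interface ClusterProps {
  readonly numNodeGroups?: number;        // the thread notes CloudFormation defaults this to 1
  readonly useOnlineResharding?: boolean; // assumed default: true, as agreed above
}

// Partial shape of the CloudFormation resource this sketch emits.
// Other required properties are omitted for brevity.
interface ReplicationGroupResource {
  Type: "AWS::ElastiCache::ReplicationGroup";
  Properties: { NumNodeGroups: number };
  UpdatePolicy?: { UseOnlineResharding: boolean };
}

function renderReplicationGroup(props: ClusterProps = {}): ReplicationGroupResource {
  const online = props.useOnlineResharding ?? true;
  const resource: ReplicationGroupResource = {
    Type: "AWS::ElastiCache::ReplicationGroup",
    // NodeGroupConfiguration (and therefore Slots) is deliberately never emitted;
    // users who need custom slots would drop down to the L1 construct.
    Properties: { NumNodeGroups: props.numNodeGroups ?? 1 },
  };
  if (online) {
    // The update policy is a resource attribute, not a property.
    resource.UpdatePolicy = { UseOnlineResharding: true };
  }
  return resource;
}
```

Calling `renderReplicationGroup({ numNodeGroups: 3 })` yields a resource with the update policy set, while `renderReplicationGroup({ useOnlineResharding: false })` leaves the attribute out entirely.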
Pull request has been modified.
@corymhall quick bump :)
@dbartholomae sorry for the delay! Everything looks really good and I'm ready to approve. I'll send an email to the team notifying them that this has entered final comments period. If no blocking issues are raised I'll merge this next week.
Thanks! I've fixed the markdown linting and reached out for community feedback both in the Slack and in the related issues.
A `RedisClusterReplicationGroup` is similar to a `RedisReplicationGroup`, but
with multiple shards. The documentation calls them "Redis (cluster mode enabled)".
The main difference is that a `RedisClusterReplicationGroup` requires a
`numNodeGroups` to be set to a value of 2 or higher. Only `RedisClusterReplicationGroup`s
`numNodeGroups` can be set to 1 or higher. In CloudFormation it defaults to 1 and I think we should keep it that way.
But at `1`, it is a `RedisReplicationGroup`, while at `2` or higher, it is a `RedisClusterReplicationGroup`, which accept different props. It doesn't make sense to allow `1` for a `RedisClusterReplicationGroup`.
Don't get me wrong, you most likely know way more about ElastiCache than me, I'm just trying to wrap my head around this :)
@dorser @corymhall how do we proceed here? I currently see no need for change, but I might be missing something.
A `RedisClusterReplicationGroup` could have a single shard; it's allowed in the API. What is different is the API on the data plane protocol.
@madolson are you agreeing with @dbartholomae then?
@corymhall Yes
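The two positions in this thread differ only in the lower bound for `numNodeGroups`: 1 (what the API allows) or 2 (what the RFC text requires). A minimal sketch of the check, with the bound left as a parameter, could look like this. `validateNumNodeGroups` is a hypothetical helper, not part of the RFC.

```typescript
// Hypothetical helper. The lower bound is a parameter because the thread
// discusses both 1 (allowed by the API) and 2 (required by the RFC text).
function validateNumNodeGroups(numNodeGroups: number, minimum: number): void {
  if (!Number.isInteger(numNodeGroups) || numNodeGroups < minimum) {
    throw new Error(
      `numNodeGroups must be an integer >= ${minimum}, got ${numNodeGroups}`,
    );
  }
}
```

With `minimum` set to 2, a single-shard configuration is rejected at synthesis time; with 1, it passes and the choice between the two constructs rests on the data plane protocol alone.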
Will there be support for memcached too?
This RFC does not cover memcached, but should leave a way ahead to add memcached later on.
I've updated the RFC based on the comments. Since this has been open for comments for some time now, I would like to bring the RFC to a close. There are three open conversations left.
@dbartholomae great job on this! Sorry it took so long!!
Thanks! I'll start working on an implementation maybe already this week :)
@dbartholomae how's it going? Are you able to make any progress on this one?
@evgenyka Unfortunately I didn't get to start with this yet, as it isn't highest priority for me right now.
This is a request for comments about an RFC to add L2 constructs to ElastiCache. See #456 for
additional details.
APIs will be signed off by @corymhall.
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license