Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ElastiCache L2 RFC #464

Merged
merged 6 commits into from
Feb 27, 2023
Merged

Add ElastiCache L2 RFC #464

merged 6 commits into from
Feb 27, 2023

Conversation

dbartholomae
Copy link
Contributor

This is a request for comments about an RFC to add L2 constructs to ElastiCache. See #456 for
additional details.

APIs will be signed off by @corymhall.


By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license

Copy link
Contributor

@corymhall corymhall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great! I have a couple of questions just to clarify some
things, but I think the implementation looks pretty solid.

Can you also add a more advanced example for both types of replication groups?
Seeing how all the properties play together would help.

text/0456-elasticache-l2.md Outdated Show resolved Hide resolved
text/0456-elasticache-l2.md Show resolved Hide resolved

## Constructs

### Defining a RedisReplicationGroup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of things that I think we need to explore:

  • UseOnlineResharding seems to have a lot to take into consideration
    Should this be enabled by default? If so, how should we handle some of these?

    • NumNodeGroups & NodeGroupConfiguration
      The
      docs
      state that "As a best practice, when you create a replication group in a stack
      template, include an ID for each node group you specify". It seems like we
      should require the ID if they specify a configuration (or maybe we can
      generate one).

    • NodeGroupConfiguration.Slots
      "When you use an UseOnlineResharding update policy to update the number of node
      groups without interruption, ElastiCache evenly distributes the keyspaces
      between the specified number of slots. This cannot be updated later. Therefore,
      after updating the number of node groups in this way, you should remove the
      value specified for the Slots property of each NodeGroupConfiguration from the
      stack template, as it no longer reflects the actual values in each node group.
      For more information" docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this, I am missing deep enough understanding of ElastiCache to make the best trade-offs. E.g. for UseOnlineResharding, there is a warning that you should update NumNodeGroups and NodeGroupConfiguration only in isolation, which might not be obvious when using AWS CDK.

I agree on making ids required and will add it to the RFC. Seems to be a no-brainer.

The last point also speaks against setting UseOnlineResharding by default.

After looking into this, I would actually suggest that we make useOnlineResharding a required property for now. This way, introducing a default value in a later PR is a non-breaking change that can be done with close support from someone with deep ElastiCache understanding.
If you actually have deep enough ElastiCache understanding (or can bring in an expert who has), I'm also open to just implementing what you discussed. Though I would be prefer to not delay the implementation too much based on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw that NodeGroupConfiguration currently is marked as a prop that is not part of the first implementation anyways, so I would only add the information on making the id required if we actually go deeper into this part anyways. Otherwise, I would leave this up for a future RFC or implementation discussion.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behavior should be to use online resharding. It increases availability and allows you to easily add and remove shards from your cluster. Offline resharding is necessary when you want to do more than just add or remove shards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dorser if this is enabled by default and someone who doesn't know about the details updates NumNodeGroups and other properties at the same time, what would happen? The docs caution against this, and I would aim for an L2 construct to be usable without deep understanding of the service details.
Similarly, would we need to handle NodeGroupConfiguration.Slots differently in that case to align with the recommended approach of deleting it from the template after initial deployment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to proceed on this discussion. @dorser @corymhall what do you think?

Copy link

@madolson madolson Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that by default you omit the node group configuration slots and you always set the UseOnlineResharding. This is a bit of legacy about our service, but we initially launched only with OfflineResharding. Customers cared way more about online resharding than they did about configuring slots, so when we added online resharding we didn't allow custom slot configurations. If someone REALLY wants to set slots, they can use the L1 constructs, but my guess is that is a very small number of users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbartholomae based on @madolson let's have UseOnlineResharding as the default.

text/0456-elasticache-l2.md Outdated Show resolved Hide resolved
text/0456-elasticache-l2.md Show resolved Hide resolved
@mergify mergify bot dismissed corymhall’s stale review November 16, 2022 18:53

Pull request has been modified.

@dbartholomae
Copy link
Contributor Author

@corymhall quick bump :)

Copy link
Contributor

@corymhall corymhall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbartholomae sorry for the delay! Everything looks really good and I'm ready to approve. I'll send an email to the team notifying them that this has entered final comments period. If no blocking issues are raised I'll merge this next week.

@dbartholomae dbartholomae changed the title Add first RFC draft Add ElastiCache L2 RFC Dec 9, 2022
@dbartholomae
Copy link
Contributor Author

Thanks! I've fixed the markdown linting and reached out for community feedback both in the Slack and in the related issues.

A `RedisClusterReplicationGroup` is similar to a `RedisReplicationGroup`, but
with multiple shards. The documentation calls them "Redis (cluster mode enabled)".
The main difference is that a `RedisClusterReplicationGroup` requires a
`numNodeGroups` to be set to a value of 2 or higher. Only `RedisClusterReplicationGroup`s
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numNodeGroups can be set to 1 or higher. In CloudFormation it defaults to 1 and I think we should keep it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But at 1, it is a RedisReplicationGroup, while at 2 or higher, it is a RedisClusterReplicationGroup, which accept different props. It doesn't make sense to allow 1 for a RedisClusterReplicationGroup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't get me wrong, you most likely know way more about ElastiCache than me, I'm just trying to wrap my head around this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dorser @corymhall how do we proceed here? I currently see no need for change, but I might be missing something.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RedisClusterReplicationGroup could have a single shard, it's allowed in the API. What is different is the API on the dataplane protocol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madolson are you agreeing with @dbartholomae then?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

text/0456-elasticache-l2.md Show resolved Hide resolved
text/0456-elasticache-l2.md Show resolved Hide resolved
text/0456-elasticache-l2.md Outdated Show resolved Hide resolved

## Constructs

### Defining a RedisReplicationGroup
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behavior should be to use online resharding. It increases availability and allows you to easily add and remove shards from your cluster. Offline resharding is necessary when you want to do more than just add or remove shards.

text/0456-elasticache-l2.md Show resolved Hide resolved
text/0456-elasticache-l2.md Show resolved Hide resolved
Copy link

@edisongustavo edisongustavo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be support for memcached too?

@dbartholomae
Copy link
Contributor Author

Will there be support for memcached too?

This RFC does not cover memcached, but should leave a way ahead to add memcached later on.

@dbartholomae
Copy link
Contributor Author

I've updated the RFC based on the comments. Since this has been a bit of time now for comments, I would like to get the RFC to a close. There are three open conversations left.

@corymhall
Copy link
Contributor

@dbartholomae great job on this! Sorry it took so long!!

@corymhall corymhall merged commit ad94145 into aws:master Feb 27, 2023
@dbartholomae dbartholomae deleted the patch-1 branch March 5, 2023 08:16
@dbartholomae
Copy link
Contributor Author

Thanks! I'll start working on an implementation maybe already this week :)

@evgenyka
Copy link

@dbartholomae how's going? Are you able to make any progress on this one?

@dbartholomae
Copy link
Contributor Author

@evgenyka Unfortunately I didn't get to start with this yet, as it isn't highest priority for me right now.

@evgenyka evgenyka added the l2-request request for new L2 construct label Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
l2-request request for new L2 construct
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants