chore: unsubscribe pub/sub connections after cluster migration #4529

Merged: 8 commits, Feb 24, 2025

kostasrim
Contributor

After a slot migration, we should unsubscribe all connections from the affected channels. This PR adds that support.

  • unsubscribe pub/sub connections for migrated slots
  • add test

resolves #4517
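
The behaviour described above can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: `SubscriberMap`, `KeySlot`, and `EvictMigratedChannels` are hypothetical names, and the slot function assumes the Redis Cluster convention (CRC16/XMODEM of the key modulo 16384), ignoring hash tags for brevity.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// CRC16/XMODEM (polynomial 0x1021, initial value 0), computed bit by bit.
inline uint16_t Crc16(std::string_view data) {
  uint16_t crc = 0;
  for (unsigned char byte : data) {
    crc ^= static_cast<uint16_t>(byte << 8);
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ 0x1021)
                           : static_cast<uint16_t>(crc << 1);
    }
  }
  return crc;
}

// Hypothetical helper: slot of a channel name (assumption: no {hash tag} handling).
inline uint16_t KeySlot(std::string_view channel) {
  return Crc16(channel) % 16384;
}

// Hypothetical stand-in for the subscriber table: channel -> connection ids.
using SubscriberMap = std::unordered_map<std::string, std::vector<int>>;

// Removes every channel whose slot is in `migrated_slots` and returns the
// connection ids that were subscribed to those channels.
inline std::vector<int> EvictMigratedChannels(SubscriberMap& channels,
                                              const std::set<uint16_t>& migrated_slots) {
  std::vector<int> evicted;
  for (auto it = channels.begin(); it != channels.end();) {
    if (migrated_slots.count(KeySlot(it->first)) > 0) {
      evicted.insert(evicted.end(), it->second.begin(), it->second.end());
      it = channels.erase(it);  // erase returns the next valid iterator
    } else {
      ++it;
    }
  }
  return evicted;
}
```

A caller would pass the set of slots that left this node and then notify each returned connection that its subscription was dropped.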

@kostasrim kostasrim self-assigned this Jan 29, 2025
@kostasrim kostasrim changed the base branch from main to kpr1 January 29, 2025 20:20
@kostasrim
Contributor Author

Needs a little cleanup, which I will handle tomorrow.

Comment on lines 629 to 631
auto* channel_store = ServerState::tlocal()->channel_store();
auto deleted = SlotSet(deleted_slots);
channel_store->UnsubscribeAfterClusterSlotMigration(deleted);
Contributor

Can we do this inside DeleteSlots?

Contributor Author

Sure!

@@ -223,6 +223,16 @@ size_t ConnectionContext::UsedMemory() const {
  return facade::ConnectionContext::UsedMemory() + dfly::HeapSize(conn_state);
}

void ConnectionContext::Unsubscribe(std::string_view channel) {
  auto* sinfo = conn_state.subscribe_info.get();
  sinfo->channels.erase(channel);
Contributor

Add a DCHECK that the erase result is > 0.
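
A minimal sketch of the suggested check, assuming the channel set's `erase(key)` returns the number of removed elements, as the standard associative containers do. `SubscribeInfo` here is a simplified stand-in for the real connection state, and `assert` stands in for `DCHECK` so the example builds without extra dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Simplified stand-in for the connection's subscription state.
struct SubscribeInfo {
  std::unordered_set<std::string> channels;
};

// Removes `channel` and checks, in debug builds only, that the connection
// was actually subscribed to it. Returns the number of erased entries.
inline std::size_t Unsubscribe(SubscribeInfo& sinfo, std::string_view channel) {
  const std::size_t erased = sinfo.channels.erase(std::string(channel));
  assert(erased > 0 && "connection was not subscribed to this channel");
  return erased;
}
```

The check costs nothing in release builds and catches a channel store and a connection that disagree about who is subscribed.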

Comment on lines +356 to +374
cb.update_mu.lock();
auto* store = cb.most_recent.load(memory_order_relaxed);

// Deep copy, we will remove channels
auto* target = new ChannelStore::ChannelMap{*store->channels_};

for (auto key : ops_) {
  auto it = target->find(key);
  freelist_.push_back(it->second.Get());
  target->erase(it);
  continue;
}

// Prepare replacement.
auto* replacement = new ChannelStore{target, store->patterns_};

// Update control block and unlock it.
cb.most_recent.store(replacement, memory_order_relaxed);
cb.update_mu.unlock();
Contributor

Use lock_guard.

Contributor Author

I usually prefer lock_guard with a scope {}. Here, however, I need to release the lock on line 367 but still use the variables defined inside the locked scope (store) at the end of this function. To avoid declaring them in an outer scope, I would rather lock and unlock manually.
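
For reference, one way to reconcile both positions is `std::unique_lock`: it is RAII like `lock_guard`, but it can be released early with `unlock()`, so variables declared while the lock is held stay in scope afterwards. The sketch below is an illustration under simplified assumptions, not the code in this PR; `Store`, `ControlBlock`, and `ReplaceStore` are hypothetical names.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified store: just a set of channel names.
struct Store {
  std::set<std::string> channels;
};

// Hypothetical control block: writers serialize on update_mu.
struct ControlBlock {
  std::mutex update_mu;
  std::atomic<Store*> most_recent{nullptr};
};

// Publishes a copy of the current store without `removed` and returns the
// previous store so the caller can retire it once it is no longer in use.
inline Store* ReplaceStore(ControlBlock& cb, const std::vector<std::string>& removed) {
  std::unique_lock<std::mutex> lk(cb.update_mu);
  Store* store = cb.most_recent.load();

  // Deep copy, then drop the removed channels from the copy only.
  auto* replacement = new Store(*store);
  for (const auto& channel : removed) {
    replacement->channels.erase(channel);
  }

  cb.most_recent.store(replacement);
  lk.unlock();  // early release; `store` is still in scope below

  // Work that should not run under the lock would go here.
  return store;
}
```

If an exception is thrown before `unlock()`, the `unique_lock` destructor still releases the mutex, which a manual `lock()`/`unlock()` pair does not guarantee.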

// queued SubscribeMaps in the freelist are no longer in use.
shard_set->pool()->AwaitFiberOnAll([this](unsigned idx, util::ProactorBase*) {
  ServerState::tlocal()->UnsubscribeSlotsAndUpdateChannelStore(
      ops_, ChannelStore::control_block.most_recent.load(memory_order_relaxed));
Contributor

Why do you need to read ChannelStore::control_block.most_recent if you already have it a couple of lines above?

Contributor

Maybe this code should also be executed under the lock?

Contributor Author

> why do you need to read ChannelStore::control_block.most_recent if you already have it a couple of lines above

This is how our channel store works: via a read-copy-update (RCU) operation. In practice, a thread copies the channel store locally, updates the copy, and finally propagates the new channel store to all proactors. That's why you see ChannelStore::control_block.most_recent.load(memory_order_relaxed).

We do not need any lock here. This is how it works. For more info see channel_store.h; it has a bunch of comments. I agree it looks weird, but these are the semantics 🤷
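
The read-copy-update pattern described in this reply can be sketched roughly as follows. This is a simplified illustration, not the contents of channel_store.h: `RcuChannels`, `ChannelMap`, and `ReclaimRetired` are hypothetical names, and a single atomic pointer with acquire/release ordering stands in for propagating the new store to every proactor.

```cpp
#include <atomic>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Simplified stand-in: channel name -> subscriber count.
using ChannelMap = std::map<std::string, int>;

class RcuChannels {
 public:
  RcuChannels() : current_(new ChannelMap) {}
  ~RcuChannels() { delete current_.load(); }

  // Read: take the current snapshot without locking.
  // A published snapshot is never modified again.
  const ChannelMap* Snapshot() const {
    return current_.load(std::memory_order_acquire);
  }

  // Update: copy the snapshot, mutate the copy, publish the copy.
  // The mutex serializes writers only; readers never take it.
  void Update(const std::function<void(ChannelMap&)>& mutate) {
    std::lock_guard<std::mutex> lk(update_mu_);
    const ChannelMap* old = current_.load(std::memory_order_relaxed);
    auto* copy = new ChannelMap(*old);
    mutate(*copy);
    current_.store(copy, std::memory_order_release);
    // Readers may still hold `old`, so retire it instead of deleting it now.
    freelist_.emplace_back(old);
  }

  // Frees retired snapshots. Call only when no reader can still hold one.
  void ReclaimRetired() {
    std::lock_guard<std::mutex> lk(update_mu_);
    freelist_.clear();
  }

 private:
  std::mutex update_mu_;
  std::atomic<const ChannelMap*> current_;
  std::vector<std::unique_ptr<const ChannelMap>> freelist_;
};
```

The freelist mirrors the one in the diff above: an old snapshot is only safe to delete once every reader has switched to the new pointer, which is why re-reading the most recent pointer after the update is part of the pattern rather than redundant work.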

Base automatically changed from kpr1 to main February 19, 2025 09:39
@kostasrim kostasrim merged commit 692967a into main Feb 24, 2025
15 checks passed
@kostasrim kostasrim deleted the kpr3 branch February 24, 2025 16:54
Successfully merging this pull request may close these issues.

Evict subscribed sharded pub/sub on cluster migrations