Kafka connect distributed #12960

Draft: wants to merge 5 commits into base: main

niksaveliev (Collaborator) commented Dec 25, 2024

No description provided.

github-actions bot commented Dec 25, 2024

2024-12-25 00:57:28 UTC Pre-commit check linux-x86_64-release-asan for ee21fbe has started.
2024-12-25 00:57:41 UTC Artifacts will be uploaded here
2024-12-25 01:00:09 UTC ya make is running...
🟢 2024-12-25 01:00:15 UTC Tests successful.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
0 0 0 0 0 0

🟢 2024-12-25 01:00:20 UTC Build successful.

github-actions bot commented Dec 25, 2024

2024-12-25 00:58:11 UTC Pre-commit check linux-x86_64-relwithdebinfo for ee21fbe has started.
2024-12-25 00:58:22 UTC Artifacts will be uploaded here
2024-12-25 01:00:48 UTC ya make is running...
🟢 2024-12-25 01:00:53 UTC Tests successful.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
0 0 0 0 0 0

🟢 2024-12-25 01:00:59 UTC Build successful.

github-actions bot commented Dec 25, 2024

2024-12-25 07:24:15 UTC Pre-commit check linux-x86_64-release-asan for e106f83 has started.
2024-12-25 07:24:28 UTC Artifacts will be uploaded here
2024-12-25 07:26:58 UTC ya make is running...
🟢 2024-12-25 07:27:04 UTC Tests successful.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
0 0 0 0 0 0

🟢 2024-12-25 07:27:09 UTC Build successful.

github-actions bot commented Dec 25, 2024

2024-12-25 07:26:28 UTC Pre-commit check linux-x86_64-relwithdebinfo for e106f83 has started.
2024-12-25 07:26:39 UTC Artifacts will be uploaded here
2024-12-25 07:28:59 UTC ya make is running...
🟢 2024-12-25 07:29:05 UTC Tests successful.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
0 0 0 0 0 0

🟢 2024-12-25 07:29:10 UTC Build successful.



void TKafkaBalancerActor::Handle(NKqp::TEvKqp::TEvCreateSessionResponse::TPtr& ev, const TActorContext& ctx) {
const TString createSessionError = "Failed to create KQP session";
Collaborator

Also take the error text from the Issues inside the response.

bool TKqpTxHelper::HandleCreateSessionResponse(NKqp::TEvKqp::TEvCreateSessionResponse::TPtr& ev, const TActorContext&) {
const auto& record = ev->Get()->Record;

if (record.GetYdbStatus() != Ydb::StatusIds::SUCCESS) {
Collaborator

Add logging here.

void TKafkaBalancerActor::Handle(NKqp::TEvKqp::TEvQueryResponse::TPtr& ev, const TActorContext& ctx) {
const TString kqpQueryError = "KQP query error";
if (ev->Cookie != KqpReqCookie) {
KAFKA_LOG_CRIT("Unexpected cookie in TEvQueryResponse");
Collaborator

Is CRIT really the right level? Can this situation actually occur?


const auto& record = ev->Get()->Record;
auto status = record.GetYdbStatus();
auto kafkaErr = KqpStatusToKafkaError(status);
Collaborator

Fill kqpQueryError from the status and the issues attached to it.
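
A minimal sketch of what this could look like, assuming the response exposes a status name and a list of issue messages. `TQueryOutcome` and `BuildKqpErrorText` are hypothetical stand-ins for illustration, not types or helpers from the PR or from the KQP API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the status and issues carried by a KQP response.
struct TQueryOutcome {
    std::string Status;               // status name, e.g. "ABORTED"
    std::vector<std::string> Issues;  // human-readable issue messages
};

// Builds one error string from the status and every attached issue,
// instead of using a fixed "KQP query error" text.
std::string BuildKqpErrorText(const TQueryOutcome& outcome) {
    std::string text = "KQP query error, status: " + outcome.Status;
    for (std::size_t i = 0; i < outcome.Issues.size(); ++i) {
        text += (i == 0 ? ", issues: " : "; ");
        text += outcome.Issues[i];
    }
    return text;
}
```

The same helper could serve the create-session failure path, so both error messages carry the underlying cause.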

if (kafkaErr != EKafkaErrors::NONE_ERROR) {
switch (RequestType) {
case JOIN_GROUP:
SendJoinGroupResponseFail(ctx, CorellationId, kafkaErr, kqpQueryError);
Collaborator

Refactor this code: add a SendResponseFail() function and put the switch on RequestType inside it. Otherwise the same switch is copy-pasted at every failure site.
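
A rough sketch of the suggested shape, with stand-in enums and a string return value so that it runs outside the actor system. The enum definitions and `FailText` are assumptions made for illustration; in the actor, each case would presumably call the existing per-request sender (such as SendJoinGroupResponseFail) instead of returning text.

```cpp
#include <string>

// Stand-ins for the PR's types; the definitions here are assumptions.
enum class ERequestType { JoinGroup, SyncGroup, Heartbeat };
enum class EKafkaErrors : int { UNKNOWN_SERVER_ERROR = -1, NONE_ERROR = 0, INVALID_REQUEST = 42 };

// Hypothetical helper: formats the failure that a real sender would put on the wire.
static std::string FailText(const char* response, EKafkaErrors error, const std::string& message) {
    return std::string(response) + " error " + std::to_string(static_cast<int>(error)) + ": " + message;
}

// Single entry point: the switch on the request type lives here only.
std::string SendResponseFail(ERequestType type, EKafkaErrors error, const std::string& message) {
    switch (type) {
        case ERequestType::JoinGroup: return FailText("JoinGroupResponse", error, message);
        case ERequestType::SyncGroup: return FailText("SyncGroupResponse", error, message);
        case ERequestType::Heartbeat: return FailText("HeartbeatResponse", error, message);
    }
    // Reached only if a new request type is added without a matching case.
    return FailText("UnknownResponse", EKafkaErrors::UNKNOWN_SERVER_ERROR, message);
}
```

Callers then pass only the error and the message, and a newly added request type needs one new case rather than one per call site.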

LIMIT 1;
)";

const TString INSERT_MEMBER_AND_SELECT_MASTER = R"(
Collaborator

The code uses INSERT_MEMBER_AND_SELECT_MASTER_QUERY, but this constant has no _QUERY suffix.
Also, check that visibility of the transaction's own changes is enabled. What if the minimum join time belongs to the member inserted by this request?

$Generation,
$MemberId,
CurrentUtcDateTime(),
CurrentUtcDateTime() + Interval("PT5S"),
Collaborator

WTF? Pass all dates as parameters. No hard-coded 5 seconds in YQL.
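
One possible shape for this, sketched under assumptions: both timestamps are computed in the actor from a timeout supplied by the client and then bound as query parameters. `THeartbeatWindow`, `MakeHeartbeatWindow`, and the parameter names mentioned in the comments are hypothetical, and the binding to the YDB parameter builder is left out.

```cpp
#include <chrono>

using TTimePoint = std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds>;

// Hypothetical pair of values to bind as parameters (for example $LastHeartbeat
// and $HeartbeatDeadline) instead of calling CurrentUtcDateTime() inside the query.
struct THeartbeatWindow {
    TTimePoint LastHeartbeat;
    TTimePoint Deadline;
};

// Computes both timestamps from a single "now", so the query sees consistent
// values and the timeout is no longer a literal in the YQL text.
THeartbeatWindow MakeHeartbeatWindow(TTimePoint now, std::chrono::milliseconds sessionTimeout) {
    return THeartbeatWindow{now, now + sessionTimeout};
}
```

Taking `now` as an argument also makes the logic easy to test with a fixed clock.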

if (IsMaster) {
auto wakeup = std::make_unique<TEvents::TEvWakeup>(0);
ctx.ActorSystem()->Schedule(
TDuration::Seconds(WAKE_UP_DELAY_SECONDS),
Collaborator

Why a constant? Get this value from the client.


void TKafkaBalancerActor::Handle(NKqp::TEvKqp::TEvCreateSessionResponse::TPtr& ev, const TActorContext& ctx) {
const TString createSessionError = "Failed to create KQP session";
if (!Kqp->HandleCreateSessionResponse(ev, ctx)) {
Collaborator

If the error is TLI, maybe you should restart the action from the beginning? Responding with an error is also a possible solution, though. Create an issue for this.

github-actions bot commented Feb 5, 2025

🔴 Unable to merge your PR into the base branch. Please rebase or merge it with the base branch.

DECLARE $Database AS Utf8;
DECLARE $Master AS Utf8;

INSERT INTO `/Root/.metadata/kafka_consumer_groups`
Collaborator

Change /Root to the correct database, or remove it. Check how WriteSession does this when choosing a partition for a write.

$Generation,
$State,
$Database,
CurrentUtcDateTime(),
Collaborator

Move Timestamp into parameters please.

SET
state = $State,
generation = $Generation,
last_heartbeat_time = CurrentUtcDateTime(),
Collaborator

All values must be in parameters.

if (status == Ydb::StatusIds::SUCCESS) {
return EKafkaErrors::NONE_ERROR;
}
return EKafkaErrors::UNKNOWN_SERVER_ERROR;
Collaborator

Will there be further work here? Or is it impossible to translate such errors into specific Kafka errors?
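
For illustration only, a finer-grained translation might look like the sketch below. The enums are stand-ins and the specific pairings are assumptions, not a mapping taken from the PR or mandated by the Kafka protocol; which Kafka code best fits each status would need to be decided by the authors.

```cpp
// Stand-in enums; the real code uses Ydb::StatusIds and the PR's EKafkaErrors.
enum class EYdbStatus { SUCCESS, TIMEOUT, OVERLOADED, UNAVAILABLE, ABORTED };
enum class EKafkaErrors { NONE_ERROR, UNKNOWN_SERVER_ERROR, REQUEST_TIMED_OUT, COORDINATOR_NOT_AVAILABLE };

// Hypothetical mapping: statuses with an obvious Kafka counterpart get a
// specific code, everything else still falls back to UNKNOWN_SERVER_ERROR.
EKafkaErrors KqpStatusToKafkaError(EYdbStatus status) {
    switch (status) {
        case EYdbStatus::SUCCESS:
            return EKafkaErrors::NONE_ERROR;
        case EYdbStatus::TIMEOUT:
            return EKafkaErrors::REQUEST_TIMED_OUT;
        case EYdbStatus::OVERLOADED:
        case EYdbStatus::UNAVAILABLE:
            return EKafkaErrors::COORDINATOR_NOT_AVAILABLE;
        default:
            return EKafkaErrors::UNKNOWN_SERVER_ERROR;
    }
}
```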


void TKafkaBalancerActor::Handle(NMetadata::NProvider::TEvManagerPrepared::TPtr&, const TActorContext& ctx) {
TablesInited++;
if (TablesInited == 2) {
Collaborator

Magic number here. Please replace it with a named constant to improve readability.
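
A small sketch of the change, assuming the literal 2 stands for the number of metadata tables the actor waits for. `TABLES_TO_INIT_COUNT` and `TInitTracker` are hypothetical names used only for this example.

```cpp
#include <cstdint>

// Assumption: the actor must see one "prepared" event per metadata table.
constexpr std::uint32_t TABLES_TO_INIT_COUNT = 2;

// Hypothetical stand-in for the counting logic inside the actor.
struct TInitTracker {
    std::uint32_t TablesInited = 0;

    // Returns true once every expected table has reported readiness.
    bool OnManagerPrepared() {
        ++TablesInited;
        return TablesInited == TABLES_TO_INIT_COUNT;
    }
};
```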

HandleHeartbeatResponse(ev, ctx);
break;
default:
KAFKA_LOG_CRIT("Unknown RequestType in TEvCreateSessionResponse");
Collaborator

Maybe we should add this log line to every other switch as well? That would catch the case where we add a new request type but forget to handle it in one of the switches.

auto& record = ev->Get()->Record;
auto& resp = record.GetResponse();
if (resp.GetYdbResults().empty()) {
outGroupExists = false;
Collaborator

Maybe return a composite object with outGroupExists, outGeneration, outState, outMasterId, and outTtl fields instead of modifying the arguments? Out-parameters feel like an antipattern to me (in Java they certainly are; in C++ I'm not sure). Ready to discuss.

The returned struct could have a meaningful name such as GroupStatus (if I understood the code correctly). That would greatly improve readability.

Collaborator

I found myself searching the code for what the state variable means in this context, and I couldn't find it. The new domain object (say, GroupStatus) could also carry a comment on every field to make the code easier to navigate.
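
A sketch of how such an object could look, assuming the query returns at most one row per group. `TGroupStatus`, `TGroupRow`, and `ParseGroupStatus` are hypothetical names, and the row type stands in for the YDB result set that the real parser reads.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of the group table.
struct TGroupRow {
    std::uint64_t Generation;
    std::uint64_t State;
    std::string MasterId;
    std::uint64_t Ttl;
};

// Hypothetical domain object replacing the out-parameters.
struct TGroupStatus {
    bool Exists = false;           // whether a row for the group was found
    std::uint64_t Generation = 0;  // generation stored for the group
    std::uint64_t State = 0;       // numeric group state, e.g. GROUP_STATE_SYNC
    std::string MasterId;          // member id of the current master
    std::uint64_t Ttl = 0;         // TTL value as stored in the table
};

// Returns std::nullopt when the response is malformed (more than one row),
// which takes the place of the former "return false".
std::optional<TGroupStatus> ParseGroupStatus(const std::vector<TGroupRow>& rows) {
    if (rows.size() > 1) {
        return std::nullopt;
    }
    TGroupStatus status;
    if (rows.empty()) {
        return status;  // Exists stays false
    }
    status.Exists = true;
    status.Generation = rows[0].Generation;
    status.State = rows[0].State;
    status.MasterId = rows[0].MasterId;
    status.Ttl = rows[0].Ttl;
    return status;
}
```

The same pattern would apply to ParseAssignments, which could return an optional string instead of filling a reference argument.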


bool TKafkaBalancerActor::ParseAssignments(
NKqp::TEvKqp::TEvQueryResponse::TPtr ev,
TString& assignments)
Collaborator

Same here. I'd suggest returning the assignments string as the function's return value. Ready to discuss.

member.MemberId = mId;
member.MetaStr = meta;
member.Metadata = member.MetaStr;
TBuffer buffer(member.Metadata.value().data() + sizeof(TKafkaVersion), member.Metadata.value().size_bytes() - sizeof(TKafkaVersion));
Collaborator

How is this buffer used here?

if (!ParseCheckStateAndGeneration(ev, groupExists, generation, state, masterId, groupTtl) ||
!groupExists || generation != GenerationId || state != GROUP_STATE_SYNC) { //
SendSyncGroupResponseFail(ctx, CorrelationId,
EKafkaErrors::UNKNOWN_SERVER_ERROR,
Collaborator

Maybe INVALID_REQUEST is better here, because its error message suggests that the user check the broker logs.

Comment on lines 710 to 735
KqpReqCookie++;
NYdb::TParamsBuilder params;
params.AddParam("$ConsumerGroup").Utf8(GroupId).Build();
params.AddParam("$Database").Utf8(Kqp->DataBase).Build();
params.AddParam("$Generation").Uint64(GenerationId).Build();
params.AddParam("$State").Uint64(GROUP_STATE_WORKING).Build();

if (SyncGroupRequestData->Assignments.size() == 0) {
SendSyncGroupResponseFail(ctx, CorrelationId, EKafkaErrors::INVALID_REQUEST);
PassAway();
return;
}

auto& assignmentList = params.AddParam("$Assignments").BeginList();
for (auto& assignment: SyncGroupRequestData->Assignments) {


assignmentList.AddListItem()
.BeginStruct()
.AddMember("MemberId").Utf8(assignment.MemberId.value())
.AddMember("Assignment").String(TString(assignment.Assignment.value().data(),
assignment.Assignment.value().size()))
.EndStruct();
}
assignmentList.EndList().Build();
Kqp->SendYqlRequest(UPSERT_ASSIGNMENTS_AND_SET_WORKING_STATE, params.Build(), KqpReqCookie, ctx);
Collaborator

I'd suggest extracting the building of the KQP request and sending it to KQP into a separate method (something like SendUpsertAssignmentsAndSetWorkingStateKqpRequest) to improve readability.

*/

static const TString SUPPORTED_ASSIGN_STRATEGY = "roundrobin";
Collaborator

Reminder: don't forget to deprecate this constant before committing (or to support a feature-flag-enabled migration).

ReadSessionActorId = ctx.RegisterWithSameMailbox(CreateKafkaReadSessionActor(Context, 0));

void HandleMessage(const TRequestHeaderData* header, const TMessagePtr<TJoinGroupRequestData>& message, const TActorContext& /*ctx*/) {
if (ReadSessionActorId) {
Collaborator

Reminder: don't forget to add a feature flag dependency here.


3 participants