-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
issue-2969: add hive reconnect time counter #2970
base: main
Are you sure you want to change the base?
Conversation
} else if (msg->ClientId == HiveClientId && ReconnectStartTime) | ||
{ | ||
if (ReconnectPipeCounter) { | ||
ReconnectPipeCounter->Add( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
этот счётчик показывает суммарное время всех реконнектов к Hive?
точно ли это полезная информация?
может считать среднее время реконнекта?
или перцентили
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Реконнект всего один(hive-то один и pipe client по факту тоже только один) при разрыве. Мы откладываем время только после восстановления соединения. Не вижу смысла в гистограммах или перцентилях пока
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Но по факту да, суммарное время реконнектов за 15 секунд
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Нам это нужно что тыкать этим в YDB, когда hive слишком долго поднимается. Обсуждали с ними, такой метрики им достаточно
@@ -307,6 +340,12 @@ void THiveProxyActor::HandleConnectionError( | |||
|
|||
for (const auto& actorId: states->Actors) { | |||
auto clientId = ClientCache->Prepare(ctx, hive); | |||
if (!HiveClientId) { | |||
HiveClientId = clientId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
зачем нужен HiveClientId?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Чтобы понять что текущий сдох после разрыва соединения. Судя по коду UnboundClientCache если клиент получает сообщение о разьединении то его просто убивают
@@ -177,14 +177,28 @@ class THiveProxyActor final | |||
|
|||
const ui64 TenantHiveTabletId; | |||
|
|||
const NMonitoring::TDynamicCounterPtr CountersRoot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
просто Counters
@@ -177,14 +177,28 @@ class THiveProxyActor final | |||
|
|||
const ui64 TenantHiveTabletId; | |||
|
|||
const NMonitoring::TDynamicCounterPtr CountersRoot; | |||
NMonitoring::TDynamicCounters::TCounterPtr ReconnectPipeCounter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HiveReconnectTimeCounter
@@ -177,14 +177,28 @@ class THiveProxyActor final | |||
|
|||
const ui64 TenantHiveTabletId; | |||
|
|||
const NMonitoring::TDynamicCounterPtr CountersRoot; | |||
NMonitoring::TDynamicCounters::TCounterPtr ReconnectPipeCounter; | |||
ui64 ReconnectStartTime = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HiveReconnectStartTime
@@ -216,6 +239,13 @@ void THiveProxyActor::HandleConnect( | |||
auto error = MakeKikimrError(msg->Status, TStringBuilder() | |||
<< "Connect to hive " << hive << " failed"); | |||
HandleConnectionError(ctx, error, hive, true); | |||
} else if (msg->ClientId == HiveClientId && ReconnectStartTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
зачем понадобился HiveClientId, почему не хватает проверки ReconnectStartTime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Чтобы подстраховаться от возможных гонок
namespace NCloud::NStorage { | ||
|
||
//////////////////////////////////////////////////////////////////////////////// | ||
|
||
NActors::IActorPtr CreateHiveProxy(THiveProxyConfig config); | ||
NActors::IActorPtr CreateHiveProxy( | ||
THiveProxyConfig config, | ||
NMonitoring::TDynamicCounterPtr CountersRoot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
counters
if (config.FallbackMode) { | ||
return std::make_unique<THiveProxyFallbackActor>(std::move(config)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
копипаста
HiveReconnectTimeCounter->Add( | ||
CyclesToDuration(GetCycleCount() - HiveReconnectStartTime).MicroSeconds()); | ||
} | ||
HiveReconnectStartTime = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HIveReconnectStartCycles
for (int retries = 0; retries < 5 && !hiveLockRequests; ++retries) { | ||
// Pipe to hive may take a long time to connect | ||
// Wait until hive receives the lock request | ||
runtime.DispatchEvents(TDispatchOptions(), TDuration::Seconds(1)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
лучше вечный цикл
#2969