autoscaler interval log #2009

Merged
merged 93 commits into from
Nov 11, 2024
93 commits
118eacd
autoscaler interval log
RonShvarz Sep 16, 2024
4465ff3
updated gitflow deprecated download-artifact@v2
RonShvarz Sep 16, 2024
f2fd79c
added actual task # added
RonShvarz Sep 18, 2024
371cf31
task executor log message
Adir111 Sep 19, 2024
e19cd25
decorator for logs
RonShvarz Sep 19, 2024
9a18fe5
Merge branch 'streaming-logs-scaleupspeed' of https://github.com/kube…
RonShvarz Sep 19, 2024
d273dc3
added algo name + count of algos.
Adir111 Sep 19, 2024
0958ac4
Merge branch 'streaming-logs-scaleupspeed' of https://github.com/kube…
Adir111 Sep 19, 2024
2ce2b7d
removed redundant logging
Adir111 Sep 19, 2024
2ed33d8
fixed total calculation
Adir111 Sep 19, 2024
3353686
removed underscore
Adir111 Sep 19, 2024
2fe9c2a
total sum
Adir111 Sep 19, 2024
7713088
removed total algorithms logging
Adir111 Sep 19, 2024
d35b4df
log required cutoff by maxReplicasPerTick
RonShvarz Sep 26, 2024
01e3acd
Merge branch 'streaming-logs-scaleupspeed' of https://github.com/kube…
RonShvarz Sep 26, 2024
70eef09
round trip based secondary scale
RonShvarz Oct 7, 2024
e864e5f
use round trip and not duration.
RonShvarz Oct 9, 2024
0cdd16f
wait for both round trip and reqrate
RonShvarz Oct 9, 2024
6c4e8ba
scaling logic changed
Adir111 Oct 9, 2024
f321fe3
init value
Adir111 Oct 10, 2024
8eb42d9
updated scale logic
Adir111 Oct 10, 2024
6681108
removed unused
Adir111 Oct 10, 2024
dd35564
no need to scale if value remained the same
Adir111 Oct 10, 2024
eb6758f
changed to array
Adir111 Oct 10, 2024
3c48e2e
updated scaling logic
Adir111 Oct 13, 2024
216d509
logging has been added again
Adir111 Oct 13, 2024
6b69b81
added scaling to 0
Adir111 Oct 13, 2024
e34ace3
fixed 0 not being proccessed
Adir111 Oct 13, 2024
b2f15f3
added doc
Adir111 Oct 13, 2024
5c5da31
corrected if
Adir111 Oct 13, 2024
910fe21
refactor
Adir111 Oct 13, 2024
ca4f04d
clean-up un-used codes
Adir111 Oct 13, 2024
66d9681
jsdoc added
Adir111 Oct 13, 2024
4b6fa1b
unused code
Adir111 Oct 13, 2024
fd6785e
unused code
Adir111 Oct 13, 2024
0d84181
updated tests
Adir111 Oct 14, 2024
4149436
removed redundent conditions
Adir111 Oct 14, 2024
c97937a
additional condition
Adir111 Oct 14, 2024
85d671f
changed to config value
Adir111 Oct 14, 2024
e18b654
Merge branch 'master' into streaming-logs-scaleupspeed
Adir111 Oct 15, 2024
a6a285b
removed timeout (added for a check)
Adir111 Oct 27, 2024
01b6573
Merge branch 'streaming-logs-scaleupspeed' of https://github.com/kube…
Adir111 Oct 27, 2024
0a9bdb7
changed scale up condition
Adir111 Oct 27, 2024
ee39cc4
scale up amount changed
Adir111 Oct 27, 2024
ad737de
changed scale up condition
Adir111 Oct 27, 2024
429349e
undo last change
Adir111 Oct 27, 2024
14d4c5b
added logging
Adir111 Oct 27, 2024
0f5c454
logging
Adir111 Oct 27, 2024
41f83d2
added _ since prop didnt exist
Adir111 Oct 28, 2024
2df2e53
redundent, being handled in auto-scaler
Adir111 Oct 28, 2024
82b8258
changed condition to scale up
Adir111 Oct 28, 2024
f199b32
added logging
Adir111 Oct 28, 2024
497ff2c
added logging
Adir111 Oct 28, 2024
8902851
removed =
Adir111 Oct 28, 2024
8ea71bc
removed unused code
Adir111 Oct 28, 2024
ac485c6
removed unused code
Adir111 Oct 28, 2024
9663106
avg of round trip (array)
Adir111 Oct 28, 2024
fb21a0b
removed total logging
Adir111 Oct 28, 2024
3e15311
fixed bug
Adir111 Oct 29, 2024
6529dad
added dynamic max size to fixed-window
Adir111 Oct 30, 2024
da32099
wip
Adir111 Oct 30, 2024
6be15d6
fix problematic value
Adir111 Oct 30, 2024
241f37f
logging
Adir111 Oct 30, 2024
2cf2ee1
logging fixed
Adir111 Oct 30, 2024
c66cdaa
fixed error
Adir111 Oct 30, 2024
1a4f3a0
fix
Adir111 Oct 30, 2024
ca6066e
not needed, changed back
Adir111 Oct 30, 2024
4e9f382
Now not scaling down in case there is queue
Adir111 Oct 30, 2024
4c3d230
removed for checking
Adir111 Oct 30, 2024
18c29a2
undo last check
Adir111 Oct 30, 2024
ac3a091
undo for check
Adir111 Oct 30, 2024
1cf629b
removed for checking
Adir111 Oct 31, 2024
99c6840
corrected config access
Adir111 Oct 31, 2024
d49ce2e
added config parameter for debugging purposes
Adir111 Oct 31, 2024
898cbf2
unused
Adir111 Oct 31, 2024
574fa16
removed unused
Adir111 Oct 31, 2024
f6f647d
fixed
Adir111 Oct 31, 2024
a349926
fix
Adir111 Oct 31, 2024
bd690d8
wip
Adir111 Oct 31, 2024
4a05bce
logging
Adir111 Oct 31, 2024
2a97aaa
fix config logic
Adir111 Oct 31, 2024
601a276
fixed logging
Adir111 Oct 31, 2024
87c695e
updated config value
Adir111 Oct 31, 2024
f4f7fab
updated config value
Adir111 Oct 31, 2024
57c33f4
revert window change update
Adir111 Oct 31, 2024
6d3398c
queue empty when less then 1 sec
Adir111 Oct 31, 2024
db0b5fb
fixed config debug check
Adir111 Oct 31, 2024
35b273c
removed old logging used for checking
Adir111 Oct 31, 2024
9116720
fixed reaching undefined value
Adir111 Oct 31, 2024
7f7160c
removed (used for debugging)
Adir111 Nov 3, 2024
9b2d70c
changed window size
Adir111 Nov 3, 2024
a41fe36
not needed
Adir111 Nov 7, 2024
cd35992
Merge branch 'master' into streaming-logs-scaleupspeed
Adir111 Nov 11, 2024
9 changes: 9 additions & 0 deletions core/task-executor/lib/reconcile/reconciler.js
@@ -682,6 +682,15 @@ const reconcile = async ({ algorithmTemplates, algorithmRequests, workers, jobs,
             required: 0
         };
     }
+    const _created = reconcileResult[algorithmName].created;
+    const _skipped = reconcileResult[algorithmName].skipped;
+    const { paused, resumed, required } = reconcileResult[algorithmName];
+    const total = _created + _skipped + paused + resumed + required;
+    if (total !== 0) {
+        log.info(`CYCLE: task-executor: algo: ${algorithmName} created: ${_created},
+        skipped: ${_skipped}, paused: ${paused},
+        resumed: ${resumed}, required: ${required}.`);
+    }
     reconcileResult[algorithmName].active = ws.count;
 });
 return reconcileResult;
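The reconciler change above logs a per-algorithm summary only when at least one counter is non-zero. A minimal sketch of that idea follows, assuming the counters are plain numbers; `formatCycleLog` is a hypothetical helper, not part of the PR. It also builds the message by concatenation, because a template literal that spans several source lines keeps the line breaks and indentation inside the logged string.

```javascript
// Hypothetical helper (not in the PR): returns the cycle message for one
// algorithm, or null when every counter is zero so the caller can skip logging.
const formatCycleLog = (algorithmName, counters) => {
    const { created = 0, skipped = 0, paused = 0, resumed = 0, required = 0 } = counters;
    const total = created + skipped + paused + resumed + required;
    if (total === 0) {
        return null;
    }
    // Concatenation keeps the message on a single line in the log output.
    return `CYCLE: task-executor: algo: ${algorithmName} created: ${created}, `
        + `skipped: ${skipped}, paused: ${paused}, `
        + `resumed: ${resumed}, required: ${required}.`;
};

// Sample inputs, made up for illustration.
const busy = formatCycleLog('algo-a', { created: 2, skipped: 1, paused: 0, resumed: 0, required: 3 });
const idle = formatCycleLog('algo-b', { created: 0, skipped: 0, paused: 0, resumed: 0, required: 0 });

if (busy) {
    console.log(busy);
}
if (idle) {
    console.log(idle); // not reached: all counters are zero
}
```

Returning `null` for the all-zero case keeps the decision to log in one place, which is one way to avoid a line per idle algorithm on every cycle.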
9 changes: 1 addition & 8 deletions core/worker/config/main/config.base.js
@@ -32,17 +32,10 @@ config.streaming = {
            minTimeNonStatsReport: formatters.parseInt(process.env.AUTO_SCALER_NON_STATS_REPORT, 10000),
        },
        scaleUp: {
            replicasExtra: formatters.parseInt(process.env.AUTO_SCALER_EXTRA_REPLICAS, 0.35),
            maxScaleUpReplicasPerNode: formatters.parseInt(process.env.AUTO_SCALER_MAX_REPLICAS, 1000),
            maxScaleUpReplicasPerTick: formatters.parseInt(process.env.AUTO_SCALER_MAX_REPLICAS_PER_SCALE, 10),
            replicasOnFirstScale: formatters.parseInt(process.env.AUTO_SCALER_REPLICAS_FIRST_SCALE, 1),
            minTimeToCleanUpQueue: formatters.parseInt(process.env.AUTO_SCALER_MIN_TIME_CLEAN_QUEUE, 30),
        },
        scaleDown: {
            tolerance: formatters.parseInt(process.env.AUTO_SCALER_SCALE_DOWN_TOLERANCE, 0.4),
            minTimeIdleBeforeReplicaDown: formatters.parseInt(process.env.AUTO_SCALER_MIN_TIME_WAIT_REPLICA_DOWN, 60000),
            minQueueSizeBeforeScaleDown: formatters.parseInt(process.env.AUTO_SCALER_MIN_QUEUE_SIZE_BEFORE_SCALE_DOWN, 0),
            minTimeQueueEmptyBeforeScaleDown: formatters.parseInt(process.env.AUTO_SCALER_MIN_TIME_QUEUE_EMPTY, 60000),
            minTimeToCleanUpQueue: formatters.parseInt(process.env.AUTO_SCALER_MIN_TIME_CLEAN_QUEUE, 30), // seconds
        },
        scaleIntervention: {
            throttleMs: formatters.parseInt(process.env.SCALE_INTERVENTION_LOG_THROTTLE_TIME, 200)
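The config hunk above reads each autoscaler setting from an environment variable with a numeric default, and the added `// seconds` comment marks `minTimeToCleanUpQueue` as seconds while neighbouring values such as `60000` look like milliseconds. The sketch below shows that pattern under stated assumptions: `parseIntEnv` is a hypothetical stand-in for `formatters.parseInt` (whose real behaviour is not shown in the diff), and `env` is a made-up sample object rather than `process.env`.

```javascript
// Hypothetical stand-in for an env-var integer parser with a default value.
// It is not the formatters.parseInt used by the project.
const parseIntEnv = (value, defaultValue) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? defaultValue : parsed;
};

// Sample environment, made up for illustration.
const env = { AUTO_SCALER_MAX_REPLICAS_PER_SCALE: '5' };

const scaleUp = {
    maxScaleUpReplicasPerNode: parseIntEnv(env.AUTO_SCALER_MAX_REPLICAS, 1000),
    maxScaleUpReplicasPerTick: parseIntEnv(env.AUTO_SCALER_MAX_REPLICAS_PER_SCALE, 10),
};

// The value is configured in seconds; convert once where milliseconds are needed.
const minTimeToCleanUpQueueSec = parseIntEnv(env.AUTO_SCALER_MIN_TIME_CLEAN_QUEUE, 30);
const minTimeToCleanUpQueueMs = minTimeToCleanUpQueueSec * 1000;

console.log(scaleUp, minTimeToCleanUpQueueMs);
```

Carrying the unit in the variable name (`Sec`, `Ms`) is one way to keep mixed-unit settings from being compared directly.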
5 changes: 5 additions & 0 deletions core/worker/lib/streaming/core/metrics.js
@@ -1,11 +1,16 @@
 const { mean } = require('@hkube/stats');
+const Logger = require('@hkube/logger');

 const _calcRate = (list) => {
     let first = list[0];
     if (list.length === 1) {
         first = { time: first.time - 2000, count: 0 };
     }
     const last = list[list.length - 1];
+
+    const log = Logger.GetLogFromContainer();
+    log.info(`STATISTICS: first value: ${first.count}, last value: ${last.count}, time diff: ${last.time - first.time} ms`);
+
     const timeDiff = (last.time - first.time) / 1000;
     const countDiff = last.count - first.count;
     let rate = 0;
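The hunk above is cut off right after `let rate = 0;`, so the rest of `_calcRate` is not visible. The following is a sketch of how a first/last rate over a window of `{ time, count }` samples could be completed, assuming the rate is the count difference divided by the elapsed seconds and stays at zero when no time has elapsed. `calcRateSketch` and the sample values are illustrative, not the project's code.

```javascript
// Assumed reporting interval, taken from the 2000 ms offset in the diff.
const REPORT_INTERVAL_MS = 2000;

// Sketch only: the division and the zero-time guard are assumptions,
// because the original function body is truncated in the diff.
const calcRateSketch = (list) => {
    if (list.length === 0) {
        return 0;
    }
    let first = list[0];
    if (list.length === 1) {
        // With a single sample, pretend the counter was 0 one interval earlier.
        first = { time: first.time - REPORT_INTERVAL_MS, count: 0 };
    }
    const last = list[list.length - 1];
    const timeDiffSec = (last.time - first.time) / 1000;
    const countDiff = last.count - first.count;
    return timeDiffSec > 0 ? countDiff / timeDiffSec : 0;
};

// 40 more items over 4 seconds.
const windowRate = calcRateSketch([{ time: 1000, count: 10 }, { time: 5000, count: 50 }]);
// A single sample of 6 items is spread over the assumed 2-second interval.
const singleRate = calcRateSketch([{ time: 10000, count: 6 }]);

console.log(windowRate, singleRate);
```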
31 changes: 15 additions & 16 deletions core/worker/lib/streaming/core/scaler.js
@@ -40,6 +40,7 @@ class Scaler {
         this._status = SCALE_STATUS.IDLE;
         this._startInterval();
         this._minStatelessCount = minStatelessCount;
+        this._isQueueEmpty = true;
     }

     stop() {
@@ -58,38 +59,37 @@ class Scaler {
             return;
         }

-        let pendingUp = false;
         this._status = SCALE_STATUS.IDLE;
         const unScheduledAlgorithm = await this._getUnScheduledAlgorithm();

         if (unScheduledAlgorithm) {
             this._status = `${SCALE_STATUS.UNABLE_SCALE} ${unScheduledAlgorithm.message}`;
-            pendingUp = true;
         }
         else {
             const queue = await this._getQueue();
             if (queue) {
                 this._status = SCALE_STATUS.PENDING_QUEUE;
-                pendingUp = true;
             }
         }

         const currentSize = this._getCurrentSize();
-        const shouldScaleUp = pendingUp ? false : this._shouldScaleUp(currentSize);
-        const shouldScaleDown = this._shouldScaleDown(currentSize);
+        const shouldScaleUp = this._shouldScaleUp(currentSize);
+        const shouldScaleDown = this._isQueueEmpty && this._shouldScaleDown(currentSize);

         if (shouldScaleUp) {
-            const required = this._required - currentSize;
+            const required = this._required - this._desired;
             const replicas = Math.min(required, this._maxScaleUpReplicasPerTick);
+            log.info(`CYCLE: worker shouldScaleUp required: ${required}, replicas: ${replicas}, desired: ${this._desired}, currentSize: ${currentSize}`);
             const scaleTo = replicas + currentSize;
-            this._desired = this._required;
+            this._desired += replicas;
             this._lastScaleUpTime = Date.now();
             this._status = SCALE_STATUS.SCALING_UP;
             this._scaleUp({ replicas, currentSize, scaleTo });
         }
         if (shouldScaleDown) {
             const replicas = currentSize - this._required;
             const scaleTo = this._required;
+            log.info(`CYCLE: worker shouldScaleDown scaleTo: ${scaleTo}, replicas: ${replicas}, desired: ${this._desired}, currentSize: ${currentSize}`);
             this._desired = this._required;
             this._lastScaleDownTime = Date.now();
             this._status = SCALE_STATUS.SCALING_DOWN;
@@ -109,14 +109,17 @@ class Scaler {
         return this._status;
     }

-    updateRequired(required) {
-        this._scale = true;
-        this._required = Math.min(required, this._maxScaleUpReplicasPerNode);
+    updateRequired(required, isQueueEmpty) {
+        this._isQueueEmpty = isQueueEmpty;
+        if (required !== this._required) {
+            this._scale = true;
+            this._required = Math.min(required, this._maxScaleUpReplicasPerNode);
+        }
     }

     _shouldScaleUp(currentSize) {
         let shouldScaleUp = false;
-        if (currentSize < this._required
+        if (currentSize < this._required && this._desired < this._required
             && (!this._lastScaleDownTime || Date.now() - this._lastScaleDownTime > this._minTimeBetweenScales)) {
             if (this._desired <= currentSize) {
                 shouldScaleUp = true;
@@ -140,11 +143,7 @@ class Scaler {

     _shouldScaleDown(currentSize) {
         let shouldScaleDown = false;
-        let limitScaleDown = false;
-        if ((this.minStatelessCount > 0)) {
-            limitScaleDown = (this._minStatelessCount >= this._required);
-        }
-        if (currentSize > this._required && !limitScaleDown
+        if (currentSize > this._required
             && (!this._lastScaleUpTime || Date.now() - this._lastScaleUpTime > this._minTimeBetweenScales)) {
             if (this._desired >= currentSize) {
                 shouldScaleDown = true;
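The scaler change above makes scale-up incremental: the gap is measured from `_desired` rather than from the current size, each tick adds at most `maxScaleUpReplicasPerTick`, and `_desired` grows by the amount actually requested instead of jumping straight to `_required`. The sketch below shows only that stepping behaviour. `ScalerSketch` is a hypothetical, simplified class; it omits the scale-down path, the queue-empty check and the `minTimeBetweenScales` cooldown that the real `Scaler` applies.

```javascript
// Simplified, hypothetical model of the incremental scale-up logic.
class ScalerSketch {
    constructor({ maxScaleUpReplicasPerTick, maxScaleUpReplicasPerNode }) {
        this.maxPerTick = maxScaleUpReplicasPerTick;
        this.maxPerNode = maxScaleUpReplicasPerNode;
        this.required = 0;
        this.desired = 0;
    }

    updateRequired(required) {
        this.required = Math.min(required, this.maxPerNode);
    }

    // Returns the scale-up request for this tick, or null when nothing should be done.
    tick(currentSize) {
        const shouldScaleUp = currentSize < this.required
            && this.desired < this.required
            && this.desired <= currentSize; // wait until earlier requests have materialised
        if (!shouldScaleUp) {
            return null;
        }
        const missing = this.required - this.desired;
        const replicas = Math.min(missing, this.maxPerTick);
        this.desired += replicas;
        return { replicas, scaleTo: currentSize + replicas };
    }
}

const scaler = new ScalerSketch({ maxScaleUpReplicasPerTick: 10, maxScaleUpReplicasPerNode: 1000 });
scaler.updateRequired(25);

// currentSize per tick is made up: the second tick runs before any replica has started.
const steps = [0, 0, 10, 20, 25].map((currentSize) => scaler.tick(currentSize));
console.log(JSON.stringify(steps));
```

In this run the 25 required replicas are requested as 10, 10 and 5, and the tick where the current size has not yet caught up with `desired` issues no request.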
8 changes: 4 additions & 4 deletions core/worker/lib/streaming/core/statistics.js
@@ -58,11 +58,11 @@ class Statistics {

     _createStatData({ maxSize }) {
         return {
-            requests: new FixedWindow(maxSize),
+            requests: new FixedWindow(maxSize), // Brings one value every 2 seconds, meaning for a window_size of 10 we will consider reports of last 20 seconds.
             responses: new FixedWindow(maxSize),
-            durations: new FixedWindow(maxSize),
-            grossDurations: new FixedWindow(maxSize),
-            queueDurations: new FixedWindow(maxSize),
+            durations: new FixedWindow(maxSize * 10), // Brings window_size values every 2 seconds, so for a window_size multiplied by 10 we will consider values that occured in the last 20 seconds.
+            grossDurations: new FixedWindow(maxSize * 10),
+            queueDurations: new FixedWindow(maxSize * 10),
         };
     }
 }
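The comments in the statistics hunk above explain the window arithmetic: request counts arrive one per 2-second report, so a window of 10 covers 20 seconds, while durations arrive many per report, so their window is ten times larger to cover the same 20 seconds. A small sketch of that reasoning follows. `FixedWindowSketch` is a hypothetical stand-in for the project's `FixedWindow`, and the figures (window size 10, 10 durations per report) are assumptions read from those comments.

```javascript
// Hypothetical stand-in for FixedWindow: keeps only the newest maxSize items.
class FixedWindowSketch {
    constructor(maxSize) {
        this.maxSize = maxSize;
        this.items = [];
    }

    add(value) {
        this.items.push(value);
        if (this.items.length > this.maxSize) {
            this.items.shift(); // drop the oldest item
        }
    }
}

const REPORT_INTERVAL_SEC = 2; // assumed, from the "every 2 seconds" comment
const maxSize = 10; // assumed window size
const DURATIONS_PER_REPORT = 10; // assumed: each report carries maxSize durations

const requests = new FixedWindowSketch(maxSize);
const durations = new FixedWindowSketch(maxSize * 10);

// Simulate 15 reports (30 seconds of traffic).
for (let report = 0; report < 15; report += 1) {
    requests.add({ report });
    for (let i = 0; i < DURATIONS_PER_REPORT; i += 1) {
        durations.add({ report });
    }
}

// Both windows end up covering the same number of reports.
const requestsSpanSec = requests.items.length * REPORT_INTERVAL_SEC;
const durationsSpanSec = (durations.items.length / DURATIONS_PER_REPORT) * REPORT_INTERVAL_SEC;
console.log(requestsSpanSec, durationsSpanSec);
```

With the original `FixedWindow(maxSize)` for durations, the same traffic would retain only the durations of the latest report, which is the mismatch the larger window addresses.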