Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait replication sync command #91

Merged
merged 9 commits into from
Jan 25, 2024
Merged

Conversation

kirillgarbar
Copy link
Contributor

@kirillgarbar kirillgarbar commented Jan 23, 2024

Wait replication sync script is moved to ch-monitoring since it is basically a call to ch-monitoring command inside.
It might be worth moving it to chadmin.

I also not sure how we should implement a call of a click command inside another command, the most direct way to do it is to call a command with ctx.invoke() or ctx.forward(), but the docs suggest it is not a recommended way to do this. That's why I moved replication-lag command to a serapate method for a while.

@option(
"-t",
"--timeout",
type=int,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to use TimeSpanParamType

Usage example:

@option(
"-g",
"--guard-interval",
"--to-time",
"to_time",
default=DEFAULT_GUARD_INTERVAL,
type=TimeSpanParamType(),
help=("End of inspecting interval in human-friendly format."),
)

Comment on lines 28 to 29
default=3 * 24 * 60 * 60,
help="Max amount of time to wait, in seconds. Default is 30 days.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default is 3 days, right? Not 30.

And default values should be automatically added to help messages. So it's not required to write them manually.

help="Max amount of time to wait, in seconds. Default is 30 days.",
)
@pass_context
def wait_replication_sync_command(ctx, status, pause, timeout):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to move the command to chadmin and estimate_replication_lag function from ch_replication_lag.py somewhere to https://github.com/yandex/ch-tools/blob/master/ch_tools/common

Comment on lines +142 to +158
When we execute query on clickhouse01
"""
SYSTEM STOP FETCHES
"""
And we execute query on clickhouse02
"""
INSERT INTO test.table_01 SELECT number FROM numbers(100)
"""
And we sleep for 5 seconds
And we execute command on clickhouse01
"""
ch-monitoring replication-lag -w 4
"""
Then we get response contains
"""
1;
"""
Copy link
Contributor Author

@kirillgarbar kirillgarbar Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the same test that detects replication lag for replication-sync command, but default timeout for replication-lag is 300 seconds. This would require to sleep for 300 seconds and will slow tests a lot.
Maybe we should pass args for replication-lag to replication-sync as discussed above to avoid this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense

@kirillgarbar
Copy link
Contributor Author

kirillgarbar commented Jan 24, 2024

Also I discovered that running a chadmin in tests causes import exception. The reason is urllib3 version > 2 is installed in clickhouse container despite the fact that it is locked to < 1.27 in lock file.
To fix this locally I had to add RUN pip install --force-reinstall -v "urllib3==1.26.5" to clickhouse docker file.

@aalexfvk
Copy link
Contributor

Also I discovered that running a chadmin in tests causes import exception. The reason is urllib3 version > 2 is installed in clickhouse container despite the fact that it is locked to < 1.27 in lock file. To fix this locally I had to add RUN pip install --force-reinstall -v "urllib3==1.26.5" to clickhouse docker file.

At first sight, we can pin requests = "<2.30.0" in pyproject.toml as a workaround

@aalexfvk aalexfvk merged commit 1746a7a into yandex:main Jan 25, 2024
15 checks passed
@kirillgarbar kirillgarbar deleted the replication-sync branch January 30, 2024 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants