feat(tests): End-to-end integration for upkeep retry #112
else:
    time.sleep(1)
    continue
If not all tasks are processed, should we have a maximum time we'll wait for completion?
We should! Added a timeout for taskworkers.
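For illustration, a bounded wait of the kind discussed above could look roughly like the sketch below. The helper name `wait_until` and its parameters are hypothetical and are not taken from this PR; it only shows the general pattern of replacing an unbounded `time.sleep(1)` loop with one that gives up after a deadline.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 1.0,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` seconds elapse.

    Hypothetical helper, not part of this PR. Returns True if the
    predicate succeeded, False if the deadline passed first.
    """
    # monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))
```

A worker loop could then call something like `wait_until(all_tasks_processed, timeout_s=60)` and fail the test when it returns `False`, where `all_tasks_processed` stands in for whatever completion check the test uses.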
def check_num_tasks_written(consumer_config: dict) -> int:
    attach_db_stmt = f"ATTACH DATABASE '{consumer_config['db_path']}' AS {consumer_config['db_name']};\n"
    query = f"""SELECT count(*) as count FROM {consumer_config['db_name']}.inflight_taskactivations;"""
    con = sqlite3.connect(consumer_config["db_path"])
    cur = con.cursor()
    cur.executescript(attach_db_stmt)
    rows = cur.execute(query).fetchall()
    count = rows[0][0]
    return count
This function shows up in a few integration tests. Perhaps we should have a module of common integration test helpers?
Good catch, lifted the function and consumer configs to helpers.py
Looks good to me.
What does this test do?
This test checks the integrity of the retry mechanism implemented in the upkeep thread of taskbroker. An initial batch of messages is produced to Kafka, each with a set number of retries in its retry policy. The taskworkers then fetch each task and update its status to retry. At each interval, the upkeep thread collects these tasks and re-produces them to Kafka. This process continues until all tasks have been retried the specified number of times.
How does it accomplish this?
The test starts N taskworkers and a consumer in separate threads. Synchronization events are used to tell the taskworkers when to start processing and when to shut down. A shared data structure called TaskRetriedCounter, guarded by a mutex, keeps a global count of every task retried. Finally, this total is validated alongside the number of times each individual task was retried.
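The coordination described above can be sketched as follows. This is only an illustration under stated assumptions: the real `TaskRetriedCounter` interface is not shown in this conversation, so the methods `increment`, `total`, and `snapshot`, and the driver `run_workers`, are hypothetical. The sketch also simulates retries in memory rather than going through Kafka or the upkeep thread.

```python
import threading
from collections import defaultdict


class TaskRetriedCounter:
    """Mutex-guarded retry counter (hypothetical interface)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._retries: dict[str, int] = defaultdict(int)
        self._total = 0

    def increment(self, task_id: str) -> int:
        # The lock makes the read-modify-write atomic across workers.
        with self._lock:
            self._retries[task_id] += 1
            self._total += 1
            return self._retries[task_id]

    def total(self) -> int:
        with self._lock:
            return self._total

    def snapshot(self) -> dict[str, int]:
        with self._lock:
            return dict(self._retries)


def run_workers(num_workers: int, task_ids: list[str], retries_per_task: int) -> TaskRetriedCounter:
    """Simulate N workers that each retry their share of tasks."""
    counter = TaskRetriedCounter()
    start = threading.Event()  # signals workers to begin processing

    def worker(my_tasks: list[str]) -> None:
        start.wait()
        for _ in range(retries_per_task):
            for task_id in my_tasks:
                counter.increment(task_id)

    threads = [
        threading.Thread(target=worker, args=(task_ids[i::num_workers],))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    start.set()
    for t in threads:
        t.join(timeout=10)  # bounded wait so a stuck worker cannot hang the run
    return counter
```

The final validation then mirrors the two checks in the description: the global total should equal the number of tasks times the retry count, and every individual task in `snapshot()` should have been retried exactly the specified number of times.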