feat(tests): End-to-end integration for upkeep retry #112

Merged
merged 10 commits into from
Jan 10, 2025

enochtangg
Member

What does this test do?

This test checks the integrity of the retry mechanism implemented in the upkeep thread of taskbroker. An initial batch of messages is produced to Kafka, each with a set number of retries in its retry policy. The taskworkers then fetch the tasks and update each task's status to retry. On each upkeep interval, the upkeep thread collects these tasks and re-produces them to Kafka. This process continues until every task has been retried the specified number of times.

How does it accomplish this?

The test starts N taskworkers and a consumer in separate threads. Synchronization events are used to instruct the taskworkers when to start processing and when to shut down. A shared data structure called TaskRetriedCounter, guarded by a mutex, keeps a global count of every task retried. Finally, this total is validated alongside the number of times each individual task was retried.
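
To illustrate the coordination described above, here is a minimal sketch, not the code from this PR. Only the name TaskRetriedCounter comes from the description; its fields, the `run_workers` function, and the in-memory queue standing in for Kafka and taskbroker are assumptions made for illustration.

```python
import queue
import threading


class TaskRetriedCounter:
    """Mutex-guarded tally of retries (hypothetical shape, not the PR's class)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.total = 0
        self.per_task: dict[str, int] = {}

    def record(self, task_id: str) -> None:
        # The lock makes the two updates atomic with respect to other workers.
        with self._lock:
            self.total += 1
            self.per_task[task_id] = self.per_task.get(task_id, 0) + 1


def run_workers(num_workers: int, task_ids: list[str], max_retries: int) -> TaskRetriedCounter:
    counter = TaskRetriedCounter()
    start = threading.Event()      # tells workers when to begin processing
    shutdown = threading.Event()   # tells workers when to exit

    # Stand-in for the broker: one queue entry per (task, retry attempt).
    work: queue.Queue = queue.Queue()
    for task_id in task_ids:
        for _ in range(max_retries):
            work.put(task_id)

    def worker() -> None:
        start.wait()
        while not shutdown.is_set():
            try:
                task_id = work.get(timeout=0.05)
            except queue.Empty:
                continue
            counter.record(task_id)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    start.set()
    work.join()       # returns once every queued retry has been recorded
    shutdown.set()
    for t in threads:
        t.join()
    return counter
```

With this shape, the final validation is two checks: `counter.total` should equal the number of tasks times the retry limit, and every value in `counter.per_task` should equal the retry limit.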

@enochtangg enochtangg requested a review from a team as a code owner January 9, 2025 16:58
Comment on lines +94 to +96
else:
    time.sleep(1)
    continue
Member

In the event that not all tasks are processed, should we have a max time we'll wait for completion?

Member Author

We should! Added a timeout for taskworkers.
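
One way to bound a polling loop like the one under review is a deadline check. The sketch below is illustrative only and is not the change made in this PR; the `wait_until` helper and its parameters are hypothetical names.

```python
import time
from typing import Callable


def wait_until(done: Callable[[], bool], timeout_s: float, poll_interval_s: float = 1.0) -> bool:
    """Poll done() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    """
    # time.monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if done():
            return True
        # Never sleep past the deadline.
        remaining = deadline - time.monotonic()
        time.sleep(min(poll_interval_s, max(0.0, remaining)))
    # One last check so a condition met during the final sleep is not missed.
    return done()
```

A worker loop could call this with a predicate such as "all expected tasks have been processed" and fail the test when it returns False, rather than sleeping indefinitely.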

Comment on lines 170 to 178
def check_num_tasks_written(consumer_config: dict) -> int:
    # Attach the consumer's SQLite file under its configured schema name.
    attach_db_stmt = f"ATTACH DATABASE '{consumer_config['db_path']}' AS {consumer_config['db_name']};\n"
    query = f"""SELECT count(*) as count FROM {consumer_config['db_name']}.inflight_taskactivations;"""
    con = sqlite3.connect(consumer_config["db_path"])
    try:
        cur = con.cursor()
        cur.executescript(attach_db_stmt)
        # count(*) always yields exactly one row with one column.
        return cur.execute(query).fetchone()[0]
    finally:
        # Close the connection so the test does not leak file handles.
        con.close()
Member

This function shows up in a few integration tests. Perhaps we should have a module of common integration test helpers?

Member Author

Good catch, lifted the function and consumer configs to helpers.py.

Member

@markstory markstory left a comment

Looks good to me.

@enochtangg enochtangg merged commit 65debdf into main Jan 10, 2025
10 checks passed
@enochtangg enochtangg deleted the retry-and-dql-integration-test branch January 10, 2025 21:46
@enochtangg enochtangg linked an issue Jan 13, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Add integration test for retry + dlq tasks
3 participants