Graceful handling and notification of errors #191

Merged (10 commits) on Aug 11, 2024

@AshishA26 (Contributor) commented Jul 20, 2024

worker_manager.py has been updated to recover from a worker crash. The code checks whether all workers in each manager are alive. If a worker is dead, the manager logs the error and restarts the worker: it terminates and joins the dead worker, drains the input queues, and creates and starts a new worker.
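A minimal Python sketch of that recovery loop follows. It is not the code from this PR. `WorkerManager`, `check_and_restart_dead_workers`, `drain_queue` and the `context` parameter are hypothetical names. The sketch assumes each worker is a `multiprocessing` process and that each input queue raises `queue.Empty` from `get_nowait()`, which holds for both `queue.Queue` and `multiprocessing.Queue`.

```python
import logging
import multiprocessing as mp
import queue
from typing import Any, Callable, List, Optional

logger = logging.getLogger(__name__)


def drain_queue(q: Any) -> None:
    """Discard items currently retrievable from q (hypothetical helper).

    Works with queue.Queue and multiprocessing.Queue, both of which raise
    queue.Empty from get_nowait(). For multiprocessing.Queue this is
    best-effort: items still in its feeder thread may arrive afterwards.
    """
    while True:
        try:
            q.get_nowait()
        except queue.Empty:
            return


class WorkerManager:
    """Hypothetical manager owning `count` identical worker processes."""

    def __init__(
        self,
        target: Callable[..., Any],
        args: tuple,
        input_queues: List[Any],
        count: int,
        context: Optional["mp.context.BaseContext"] = None,
    ) -> None:
        self._target = target
        self._args = args
        self._input_queues = input_queues
        # Use the platform's default start method unless a context is given.
        self._ctx = context if context is not None else mp.get_context()
        self._workers = [self._create_worker() for _ in range(count)]

    def _create_worker(self) -> "mp.process.BaseProcess":
        return self._ctx.Process(target=self._target, args=self._args)

    def start_workers(self) -> None:
        for worker in self._workers:
            worker.start()

    def check_and_restart_dead_workers(self) -> bool:
        """Restart every worker that is no longer alive.

        Must be called after start_workers(). Returns True if all workers
        were alive, False if any were restarted.
        """
        all_alive = True
        for i, worker in enumerate(self._workers):
            if worker.is_alive():
                continue
            all_alive = False
            logger.error(
                "Worker %d (pid %s) died with exit code %s; restarting",
                i, worker.pid, worker.exitcode,
            )
            # Terminate and join the dead worker, as the PR description describes.
            worker.terminate()
            worker.join()
            # Drain the input queues so the replacement starts from empty queues.
            for q in self._input_queues:
                drain_queue(q)
            replacement = self._create_worker()
            replacement.start()
            self._workers[i] = replacement
        return all_alive
```

A caller would invoke `start_workers()` once and then `check_and_restart_dead_workers()` periodically from its main loop. Calling the check before `start_workers()` would fail, because `terminate()` cannot be called on a process that was never started.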

@Xierumeng (Collaborator) left a comment

Reviewed.

utilities/workers/worker_manager.py: outdated, resolved
utilities/workers/worker_manager.py: outdated, resolved
@AshishA26 requested a review from Xierumeng July 25, 2024 01:36
@AshishA26 force-pushed the notification_of_errors branch from c648676 to b94413f July 25, 2024 03:01
@Xierumeng (Collaborator) left a comment

Reviewed.

utilities/workers/worker_manager.py: outdated, resolved
utilities/workers/worker_manager.py: outdated, resolved
utilities/workers/worker_manager.py: outdated, resolved
@AshishA26 requested a review from Xierumeng August 6, 2024 22:45
@Xierumeng (Collaborator) left a comment

Reviewed.

log.txt: outdated, resolved
utilities/workers/worker_manager.py: outdated, resolved
utilities/workers/worker_manager.py: outdated, resolved
@AshishA26 force-pushed the notification_of_errors branch from a408ca7 to 05901b9 August 11, 2024 01:58
@AshishA26 requested a review from Xierumeng August 11, 2024 01:59
@Xierumeng (Collaborator) left a comment

Reviewed.

utilities/workers/worker_manager.py: outdated, resolved
utilities/workers/worker_manager.py: outdated, resolved
@AshishA26 requested a review from Xierumeng August 11, 2024 02:13
@Xierumeng (Collaborator) left a comment

Reviewed.

utilities/workers/worker_manager.py: outdated, resolved
utilities/workers/worker_manager.py: outdated, resolved
utilities/workers/worker_manager.py: outdated, resolved
@Xierumeng (Collaborator) left a comment

Approved.

@AshishA26 merged commit f867134 into main Aug 11, 2024
1 check passed
@AshishA26 deleted the notification_of_errors branch August 11, 2024 02:43
2 participants