Upgrade the locking system #1024

Open
wants to merge 2 commits into
base: master

judahrand
Contributor

@judahrand judahrand commented Oct 6, 2022

Problem

We have been hitting deadlocks because the locks held by pidfile never expire. If a process is SIGKILLed (for example, by Kubernetes), it never gets the chance to release its lock, and the stale lock causes a deadlock.

Proposed changes

The alternative locking library sherlock lets us set a file-based lock similar to the pidfile one, but with a number of advantages:

  • The locks expire. We hold the lock only for a short time (configured to 30 seconds) and renew it whenever it is close to expiring. If the process is ever killed, the lock expires after at most 30 seconds.
  • The 'owner' of the lock is a UUID rather than a PID. This matters in Kubernetes, where the main process in each container typically has PID 1, so every instance thinks it holds the lock. This was previously tolerable only because PipelineWise also checked the log file for a .running extension. With the new locking that check is no longer needed and has been removed.
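
To make the two points above concrete, here is a minimal standard-library sketch of an expiring file lock with a UUID owner. It is not sherlock's API and not the code in this PR: the class name ExpiringFileLock, its methods and the injectable clock parameter are hypothetical names chosen for illustration, and the check-then-write step is deliberately simplified.

```python
import json
import os
import tempfile
import time
import uuid


class ExpiringFileLock:
    """Illustrative only: a file lock with an expiry time and a UUID owner."""

    def __init__(self, path, expire=30.0, clock=time.time):
        self.path = path
        self.expire = expire            # seconds before an unrenewed lock lapses
        self.clock = clock              # injectable so expiry can be tested
        self.owner = str(uuid.uuid4())  # unique per instance, unlike a PID

    def _read(self):
        try:
            with open(self.path) as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return None

    def _write(self):
        record = {"owner": self.owner, "expires_at": self.clock() + self.expire}
        # Write to a temp file and rename so readers never see a partial record.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(record, f)
        os.replace(tmp, self.path)

    def _held_by_other(self):
        record = self._read()
        return (
            record is not None
            and record["owner"] != self.owner
            and record["expires_at"] > self.clock()
        )

    def acquire(self):
        """Take the lock if it is free, expired, or already ours."""
        if self._held_by_other():
            return False
        # NOTE: check-then-write is not atomic; a real library must close this race.
        self._write()
        return True

    def renew(self):
        """Extend our lock; False if another owner holds it or the file is gone."""
        record = self._read()
        if record is None or record["owner"] != self.owner:
            return False
        self._write()
        return True

    def release(self):
        """Remove the lock file, but only if we are still the recorded owner."""
        record = self._read()
        if record is not None and record["owner"] == self.owner:
            os.remove(self.path)
```

In this sketch a SIGKILLed holder simply stops calling renew(), so another instance can acquire the lock once expires_at has passed. Two instances that both run as PID 1 still get distinct owner values.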

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • I have read the CONTRIBUTING doc
  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@judahrand
Contributor Author

@Samira-El This locking mechanism is much more robust than the PIDFile system used previously. Could you please take a look? I think this is a seriously good addition.
