Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: multiple mutexes and semaphores #13358

Merged
merged 6 commits into from
Sep 19, 2024
Merged

feat: multiple mutexes and semaphores #13358

merged 6 commits into from
Sep 19, 2024

Conversation

Joibel
Copy link
Member

@Joibel Joibel commented Jul 17, 2024

Fixes #5022
Fixes #11859

This is mostly backwards compatible. It is minorly breaking only in as much as semaphore + mutex at the same time will now actually take both, whereas before it ignored the mutex.

Motivation

As a workflows users I would like to be able to use more than one sync option in a workflow or template.

Modifications

Added mutexes and semaphores to the synchronization block in workflow types as lists. Legacy mutex and semaphore kept in and continue to work (will be added to the mutexes/semaphores list) but marked for deprecation.

Logic changed to acquire a full list of all synchronization items (syncItem) when TryAcquire and Release from sync manager.

TryAcquire from sync_manager calls a new checkAcquire method, whilst holding the sync_manager internal golang sync.Mutex to check if they are all acquirable before acquiring them. This honors priority.

TryAcquire also returns the failed lock name in the case of failure to acquire to reduce the number of exported package sync methods and complexity outside of sync.

The interface to the workflow status block has not changed.

Reduced the exported types and methods from the sync module as far as possible (lower case instead of upper case).

Removed the internal golang sync mutex from PrioritySemaphore as all methods are called from sync_manager.go when under it's own lock. The alternative was unnecessarily complicating the calls inside semaphore.go. This is faster, but less clean in case someone wants to reuse the PrioritySemaphore type.

Renamed the manager in sync_manager.go where used: cm->sm and concurrencyManager->syncManager.

Synchronization documentation updated -- EDIT: split to #13393

Verification

New and updated unit tests.
New and updated e2e smoke tests.

Stacked PR

This is stacked on top of #13393 now.

Signed-off-by: Alan Clucas [email protected]

@Joibel Joibel force-pushed the priority-mutex-removal branch from e662199 to 1f1bb7a Compare July 17, 2024 13:05
@Joibel Joibel marked this pull request as draft July 17, 2024 13:19
@Joibel Joibel force-pushed the multiple-semaphores branch from 6f34440 to 3070c90 Compare July 17, 2024 13:35
@Joibel Joibel added area/parallelism `parallelism` for the Controller, Workflows, or templates prioritized-review For members of the Sustainability Effort labels Jul 17, 2024
@Joibel Joibel marked this pull request as ready for review July 17, 2024 13:51
@terrytangyuan
Copy link
Member

@jessesuen Would you like to review this as well since you might be relying on the existing behavior of the synchronization feature?

docs/synchronization.md Outdated Show resolved Hide resolved
docs/synchronization.md Outdated Show resolved Hide resolved
@Joibel
Copy link
Member Author

Joibel commented Jul 18, 2024

@jessesuen Would you like to review this as well since you might be relying on the existing behavior of the synchronization feature?

It shouldn't be a breaking change for existing users.

@Joibel Joibel force-pushed the multiple-semaphores branch from 09a3526 to f17a0b6 Compare July 18, 2024 10:08
@agilgur5
Copy link

agilgur5 commented Jul 22, 2024

@Joibel could you please re-format your PR descriptions? I think they're line-breaking at 80 chars or something, which depending on the width of the GH description, can sometimes cause a double line-break or otherwise look weird. It also makes quoting look strange too, as other comments are not at that line width. I manually re-formatted a few of your other PRs, but this one is quite long

I can review this as I had worked briefly on these, but I'm not promising any timeline as I already have too many things on my plate (including all the metrics PRs, and docs/UI/Server and some others are often falling on my plate too 😕 most features have docs portions too...)
EDIT: This one is substantially smaller, and I have familiarity with the exact feature and portion of the codebase, and so I can get to it before all the metrics ones. I'm just going to push those further back then.

Copy link

@agilgur5 agilgur5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly backwards compatible. It is minorly breaking only in as much as semaphore + mutex at the same time will now actually take both, whereas before it ignored the mutex.

Are we considering this breaking? The silent ignore is a pure bug to me, where the user intent was to do both, but you'd find out after that didn't happen.

I was actually thinking that would be its own fix PR that would be backported to older versions, whereas the feature to have multiple ofc would not be backported

Copy link

@agilgur5 agilgur5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version specifiers are needed everywhere. Also the parallelism doc should be split into its own PR as it is very backport worthy.

docs/synchronization.md Outdated Show resolved Hide resolved
pkg/apis/workflow/v1alpha1/workflow_types.go Outdated Show resolved Hide resolved
docs/parallelism.md Outdated Show resolved Hide resolved
Base automatically changed from priority-mutex-removal to main July 23, 2024 19:59
@Joibel Joibel force-pushed the multiple-semaphores branch from f17a0b6 to e26f133 Compare July 24, 2024 08:07
@Joibel
Copy link
Member Author

Joibel commented Jul 25, 2024

This is mostly backwards compatible. It is minorly breaking only in as much as semaphore + mutex at the same time will now actually take both, whereas before it ignored the mutex.

Are we considering this breaking? The silent ignore is a pure bug to me, where the user intent was to do both, but you'd find out after that didn't happen.

I was actually thinking that would be its own fix PR that would be backported to older versions, whereas the feature to have multiple ofc would not be backported

These are intertwined features in my implementation. One semaphore and one mutex is just a special case of multiple synchronization items. The bulk of this PR is necessary to make that case work, and I'd consider it too invasive to backport.

I consider the Alternatively wording sufficient to say it was not supposed to work before, hence should not be backported.

from a workflow or template within a workflow. Alternatively, users can

I'm not fussed whether we call this breaking though.

Joibel added a commit that referenced this pull request Jul 25, 2024
Split from #13358

This is a rewrite of the docs for synchronization and parallelism,
split from the multiple mutexes and semaphores code. This PR is
suitable for backporting to 3.5 and 3.4.

* New markdown document on parallelism:
  - locks and parallelism will generally have different use cases. I
    think it is better not to have them on the same page so you don't
    think you could use one instead of the other
  - As far as I know more pages doesn't cost us anything.
  - priority for parallelism documented
  - paralellism in workflow and template spec better explained

* Synchronisation adds:
  - information on queuing
  - better explain whay a mutex is
  - explain behaviour in case of specifying both semaphore and mutex

Conform to style guide and 1 sentence per line

Signed-off-by: Alan Clucas <[email protected]>
@Joibel Joibel marked this pull request as draft July 25, 2024 10:52
Joibel added a commit that referenced this pull request Jul 25, 2024
Split from #13358

This is a rewrite of the docs for synchronization and parallelism,
split from the multiple mutexes and semaphores code. This PR is
suitable for backporting to 3.5 and 3.4.

* New markdown document on parallelism:
  - locks and parallelism will generally have different use cases. I
    think it is better not to have them on the same page so you don't
    think you could use one instead of the other
  - As far as I know more pages doesn't cost us anything.
  - priority for parallelism documented
  - paralellism in workflow and template spec better explained

* Synchronisation adds:
  - information on queuing
  - better explain whay a mutex is
  - explain behaviour in case of specifying both semaphore and mutex

Conform to style guide and 1 sentence per line

Signed-off-by: Alan Clucas <[email protected]>
@Joibel Joibel force-pushed the multiple-semaphores branch from e26f133 to 2b06b8a Compare July 25, 2024 11:01
@Joibel Joibel changed the base branch from main to sync-docs July 25, 2024 11:02
@Joibel Joibel changed the title feat!: multiple mutexes and semaphores feat: multiple mutexes and semaphores Jul 25, 2024
@agilgur5
Copy link

The bulk of this PR is necessary to make that case work, and I'd consider it too invasive to backport.

Gotcha. Yea when I examined the issue it could not be done easily either (whereas multi-semaphore was more straightforward).

In that case, I am wondering whether we should have a separate "fix" to backport that errors out when both are provided. Since this feature will only land in 3.6 and the silent ignore in earlier versions is very confusing/surprising as it stands

@agilgur5 agilgur5 added area/mutex-semaphore and removed area/parallelism `parallelism` for the Controller, Workflows, or templates labels Jul 25, 2024
@Joibel
Copy link
Member Author

Joibel commented Jul 25, 2024

In that case, I am wondering whether we should have a separate "fix" to backport that errors out when both are provided. Since this feature will only land in 3.6 and the silent ignore in earlier versions is very confusing/surprising as it stands

Yes, this would be nice if it got done.

Joibel added a commit that referenced this pull request Aug 28, 2024
Split from #13358

This is a rewrite of the docs for synchronization and parallelism,
split from the multiple mutexes and semaphores code. This PR is
suitable for backporting to 3.5 and 3.4.

* New markdown document on parallelism:
  - locks and parallelism will generally have different use cases. I
    think it is better not to have them on the same page so you don't
    think you could use one instead of the other
  - As far as I know more pages doesn't cost us anything.
  - priority for parallelism documented
  - paralellism in workflow and template spec better explained

* Synchronisation adds:
  - information on queuing
  - better explain whay a mutex is
  - explain behaviour in case of specifying both semaphore and mutex

Conform to style guide and 1 sentence per line

Signed-off-by: Alan Clucas <[email protected]>
Base automatically changed from sync-docs to main September 3, 2024 20:35
@Joibel Joibel force-pushed the multiple-semaphores branch 2 times, most recently from 7618198 to cda4b03 Compare September 4, 2024 13:33
@Joibel Joibel requested a review from agilgur5 September 4, 2024 14:46
@agilgur5 agilgur5 added this to the v3.6.0 milestone Sep 5, 2024
@Joibel Joibel force-pushed the multiple-semaphores branch 2 times, most recently from dc81fbf to 97f7b03 Compare September 11, 2024 13:17
@JPZ13
Copy link
Member

JPZ13 commented Sep 12, 2024

Hey @agilgur5 - have @Joibel's additional commits addressed your change request? We need to get this one moving for a customer

@sarabala1979
Copy link
Member

I had discuss with @Joibel. I liked the Approach and approved. @agilgur5 is aready reviewing the code indetail. Please approved it once comments are address

@Joibel Joibel force-pushed the multiple-semaphores branch 2 times, most recently from 86e3800 to cf67c5e Compare September 17, 2024 10:22
Joibel and others added 4 commits September 18, 2024 11:31
Fixes #5022
Fixes #11859

This is mostly backwards compatbile. It is minorly breaking only in as
much as semaphore + mutex at the same time will now require both,
whereas before it ignored the mutex.

As a workflows users I would like to be able to use more than one sync
option in a workflow or template.

Added `mutexes` and `semaphores` to the `synchronization` block in
workflow types as lists. Legacy `mutex` and `semaphore` kept in and
continue to work (will be added to the mutexes/semaphores list) but
marked for deprecation.

Logic changed to acquire a full list of all synchronization
items (`syncItem`) when `TryAcquire` and `Release` from sync manager.

`TryAcquire` from sync_manager` calls a new `checkAcquire` method,
whilst holding the sync_manager internal golang `sync.Mutex` to check
if they are all acquirable before acquiring them. This honors
priority.

`TryAcquire` also returns the failed lock name in the case of failure
to acquire to reduce the number of exported `package sync` methods and
complexity outside of `sync`.

The interface to the workflow status block has not changed.

Reduced the exported types and methods from the sync module as far as
possible (lower case instead of upper case).

Removed the internal golang sync mutex from `PrioritySemaphore` as all
methods are called from `sync_manager.go` when under it's own
lock. The alternative was unnecessarily complicating the calls inside
semaphore.go. This is faster, but less clean in case someone wants to
reuse the PrioritySemaphore type.

Renamed the manager in `sync_manager.go` where used: `cm`->`sm` and
`concurrencyManager`->`syncManager`.

New and updated unit tests.
New and updated e2e smoke tests.

Signed-off-by: Alan Clucas <[email protected]>
Co-authored-by: Anton Gilgur <[email protected]>
Signed-off-by: Alan Clucas <[email protected]>
@Joibel Joibel force-pushed the multiple-semaphores branch from cf67c5e to 8e26e16 Compare September 18, 2024 10:32
Signed-off-by: Alan Clucas <[email protected]>
@agilgur5 agilgur5 dismissed their stale review September 19, 2024 01:23

Dismissing my review to unblock this. We can iterate on the docs later

@agilgur5
Copy link

agilgur5 commented Sep 19, 2024

I haven't been feeling great the past week, so dismissed my review to unblock this. Can iterate more on docs later as I do have some thoughts on them, but haven't quite had the brainpower to review properly

@isubasinghe
Copy link
Member

/retest

1 similar comment
@isubasinghe
Copy link
Member

/retest

@Joibel Joibel merged commit 4e88752 into main Sep 19, 2024
27 checks passed
@Joibel Joibel deleted the multiple-semaphores branch September 19, 2024 14:27
@agilgur5
Copy link

agilgur5 commented Sep 22, 2024

I haven't been feeling great the past week, so dismissed my review to unblock this. Can iterate more on docs later as I do have some thoughts on them, but haven't quite had the brainpower to review properly

I wrote up a follow-up docs PR in #13645 which also addresses the immediate user confusion with versions per #13644 et al

(I also suddenly started feeling much better on Friday, hence all my contributions this weekend. maybe the new meds have kicked in 🤷 🤞)

@agilgur5
Copy link

In that case, I am wondering whether we should have a separate "fix" to backport that errors out when both are provided. Since this feature will only land in 3.6 and the silent ignore in earlier versions is very confusing/surprising as it stands

Yes, this would be nice if it got done.

I have a WIP PR for this in #13669. Might want someone here to take that over as some of the logic would be desirable to transfer to 3.6+ as well. It would also be a good place to add deprecation warnings in the logs in 3.6+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/mutex-semaphore prioritized-review For members of the Sustainability Effort
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mutex ignored if using semaphore at same top level Support for multi lock or semaphores
8 participants