[v8.0] Match all the things #7907

Open

wants to merge 1 commit into base: rel-v8r0 from match-all-the-things
Conversation

@chrisburr chrisburr (Member) commented Nov 22, 2024

In LHCbDIRAC we found that the matcher was struggling when there are many jobs running at a single site. This was caused by the JobDB being slow to execute `select Type, count(*) from Jobs where Site = %(site)s group by Type`. Every 10 seconds the cache would expire and every thread would try to query the DB again, hanging for a long time.

Looking at the cumulative response time over a couple of hours (with a similar number of requests in both periods), the new version is 98.9% faster.
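
The fix adds a `TwoLevelCache` (see the diff excerpts below): the idea is to keep serving a slightly stale value while a single background thread re-runs the expensive query, instead of having every thread block when the cache expires. A rough sketch of that idea is shown here, assuming soft/hard TTL semantics for the two constructor arguments and using `cachetools.TTLCache` purely for illustration; it is not the exact code in this diff:

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait
from threading import Lock
from typing import Any, Callable

from cachetools import TTLCache


class TwoLevelCache:
    """Illustrative sketch of a two-level cache: short 'soft' TTL, long 'hard' TTL."""

    def __init__(self, soft_ttl: int, hard_ttl: int, max_workers: int = 10):
        self.soft_cache = TTLCache(maxsize=1_000_000, ttl=soft_ttl)  # recently refreshed values
        self.hard_cache = TTLCache(maxsize=1_000_000, ttl=hard_ttl)  # stale-but-usable fallback
        self.lock = Lock()
        self.futures: dict[str, Future] = {}  # in-flight refreshes, keyed by cache key
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def get(self, key: str, populate_func: Callable[[], Any]):
        try:
            return self.soft_cache[key]  # fresh enough: answer immediately
        except KeyError:
            pass
        with self.lock:
            if key not in self.futures:
                # Only one thread triggers the expensive call; the others reuse the same future
                self.futures[key] = self.pool.submit(self._work, key, populate_func)
            try:
                # A stale value is still acceptable: return it while the refresh runs
                return self.hard_cache[key]
            except KeyError:
                future = self.futures[key]
        # Wait OUTSIDE the lock: _work needs the lock to write its result into the caches
        wait([future])
        return self.hard_cache[key]

    def _work(self, key: str, populate_func: Callable[[], Any]) -> None:
        result = populate_func()  # e.g. the slow GROUP BY query (error handling omitted)
        with self.lock:
            self.futures.pop(key, None)
            self.soft_cache[key] = result
            self.hard_cache[key] = result
```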

Before this change:

(screenshots: cumulative response time and request counts before the change)

After this change:

(screenshots: cumulative response time and request counts after the change)

BEGINRELEASENOTES

*WorkloadManagement
CHANGE: Better caching performance in the Matching Limiter

ENDRELEASENOTES

@DIRACGridBot DIRACGridBot added the alsoTargeting:integration (Cherry pick this PR to integration after merge) label Nov 22, 2024
@chrisburr chrisburr force-pushed the match-all-the-things branch 5 times, most recently from 7819a7b to 4ba5244 on November 22, 2024 10:59
class Limiter:
# static variables shared between all instances of this class
csDictCache = DictCache()
condCache = DictCache()
newCache = TwoLevelCache(10, 300)
chrisburr (Member, Author) commented:
It's probably worth replacing `condCache` with `newCache` but I don't have time for it.

@chrisburr chrisburr marked this pull request as ready for review November 22, 2024 13:17
Comment on lines +357 to +358
data = result["Value"]
data = {k[0][attName]: k[1] for k in data}
A contributor commented:
Suggested change
data = result["Value"]
data = {k[0][attName]: k[1] for k in data}
data = {k[0][attName]: k[1] for k in result["Value"]}

return result
# It is critical that ``future`` is waited for outside of the lock as
# _work aquires the lock before filling the caches. This also means
# we can gaurentee that the future has not yet been removed from the
A contributor commented:
Suggested change
# we can gaurentee that the future has not yet been removed from the
# we can guarantee that the future has not yet been removed from the
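
As an aside for readers of the diff, the comment being corrected here describes a genuine deadlock hazard, not just a spelling nit: the worker that populates the caches needs the same lock, so the caller must release the lock before waiting on the future. A minimal standalone illustration of the pattern (not the PR's code):

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

lock = Lock()
pool = ThreadPoolExecutor(max_workers=2)


def work():
    with lock:  # the worker needs the lock to publish its result
        return 42


with lock:
    future = pool.submit(work)
    # Calling future.result() here would deadlock: work() can never acquire `lock`

result = future.result()  # waiting outside the lock lets work() finish
print(result)
```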

@@ -12,10 +21,109 @@
from DIRAC.WorkloadManagementSystem.Client import JobStatus


class TwoLevelCache:
A contributor commented:
It does not seem to me there's anything "specific" to this class. What if you move it to a generic utility module?

self.futures: dict[str, Future] = {}
self.pool = ThreadPoolExecutor(max_workers=max_workers)

def get(self, key: str, populate_func: Callable[[], Any]):
A contributor commented:
Suggested change
def get(self, key: str, populate_func: Callable[[], Any]):
def get(self, key: str, populate_func: Callable[[], Any]) -> dict:
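
For illustration, a hypothetical call site showing why a `dict` return annotation fits; the key format and the `count_jobs_at_site` helper below are made up for this example, not taken from the PR:

```python
cache = TwoLevelCache(10, 300)


def count_jobs_at_site(site: str) -> dict:
    """Stand-in for the expensive query:
    SELECT Type, COUNT(*) FROM Jobs WHERE Site = %(site)s GROUP BY Type
    """
    return {"MCSimulation": 1200, "User": 45}


counts = cache.get("JobsBySite:LCG.CERN.cern", lambda: count_jobs_at_site("LCG.CERN.cern"))
```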

@fstagni fstagni changed the title from [master] Match all the things to [v8.0] Match all the things Nov 27, 2024