-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add setter methods map to WMWorkload and call it for all reqArg parameters #12099
base: master
Are you sure you want to change the base?
Add setter methods map to WMWorkload and call it for all reqArg parameters #12099
Conversation
Jenkins results:
|
Jenkins results:
|
@amaltaro Please take a look at this PR, I think this one fully covers all the requirements and addresses our fears for affecting scalability due to increased database calls. |
Jenkins results:
|
|
||
# Commit the changes of the current workload object to the database: | ||
workload.saveCouchUrl(workload.specUrl()) | ||
|
||
# Commit all Global WorkQueue changes per workflow in a single go: | ||
self.gq_service.updateElementsByWorkflow(workload.name(), reqArgs, status=['Available']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this call, you will actually change the behavior of priority to get only applied to Available elements.
In addition, do you ensure now that reqArgs
will always be only the supported parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this call, you will actually change the behavior of priority to get only applied to Available elements.
Well, does it even matter? As we discussed with you few days ago - do we actually know if we continue to update any Worqueue Element's parameters at the agent once it is acquired from the GlobalWorkQueue ? If we do, then it matters also for Sitelists as well. So IIRC, during our discussion we decided to stay safe and update only available WQE and not to risk getting into some race conditions related to input data location etc. If we do not care I can remove the requirement for status=['Availiable']
. But if we care to propagate the Priority
information down to the agent for every WQE status - Aqcuired et. all.
, but we want to update site information only at Global WQ, then we simply cannot have a mechanism to do it in a single push. The two calls should be separated. So we cannot have them both:
- a single DB action for updating WQE
- different WQE status requirement based on request parameters
(an escape of this is to hardcode this in the Workqueue module logic ofupdateelementsByWorkflow
, but you know my opinion on hardcoding requirements in the code.
In addition, do you ensure now that reqArgs will always be only the supported parameters?
As long as we follow the same logic as we did while validating the allowed parameters in the first set of actions in these calls (meaning to call validate_request_update_args
before calling the put
method and hence _handleNoStatusUpdate
) then we are keeping the same logic as before - hence following the map of allowed actions from
WMCore/src/python/WMCore/ReqMgr/DataStructs/RequestStatus.py
Lines 118 to 147 in 7ddd036
ALLOWED_ACTIONS_FOR_STATUS = { | |
"new": ["RequestPriority"], | |
"assignment-approved": ["RequestPriority", "Team", "SiteWhitelist", "SiteBlacklist", | |
"AcquisitionEra", "ProcessingString", "ProcessingVersion", | |
"Dashboard", "MergedLFNBase", "TrustSitelists", | |
"UnmergedLFNBase", "MinMergeSize", "MaxMergeSize", | |
"MaxMergeEvents", "BlockCloseMaxWaitTime", | |
"BlockCloseMaxFiles", "BlockCloseMaxEvents", "BlockCloseMaxSize", | |
"SoftTimeout", "GracePeriod", | |
"TrustPUSitelists", "CustodialSites", | |
"NonCustodialSites", "Override", | |
"SubscriptionPriority"], | |
"assigned": ["RequestPriority"], | |
"staging": ["RequestPriority", "SiteWhitelist", "SiteBlacklist"], | |
"staged": ["RequestPriority"], | |
"acquired": ["RequestPriority", "SiteWhitelist", "SiteBlacklist"], | |
"running-open": ["RequestPriority", "SiteWhitelist", "SiteBlacklist"], | |
"running-closed": ["RequestPriority"], | |
"failed": [], | |
"force-complete": [], | |
"completed": [], | |
"closed-out": [], | |
"announced": [], | |
"aborted": [], | |
"aborted-completed": [], | |
"rejected": [], | |
"normal-archived": [], | |
"aborted-archived": [], | |
"rejected-archived": [], | |
} |
reqArgs
to setterMethod
map at updateWorkloadArgs
:
self.settersMap['RequestPriority'] = setterTuple('RequestPriority', self.setPriority, inspect.signature(self.setPriority))
self.settersMap['SiteBlacklist'] = setterTuple('SiteBlacklist', self.setSiteBlacklist, inspect.signature(self.setSiteBlacklist))
self.settersMap['SiteWhitelist'] = setterTuple('SiteWhitelist', self.setSiteWhitelist, inspect.signature(self.setSiteWhitelist))
If we want to be extra detailed and fully exhaustive on any possible reqarg
per status
setters, we may also include into this map the eventual methods for all allowed actions per assignment-approved
status.:
"assignment-approved": ["RequestPriority", "Team", "SiteWhitelist", "SiteBlacklist",
"AcquisitionEra", "ProcessingString", "ProcessingVersion",
"Dashboard", "MergedLFNBase", "TrustSitelists",
"UnmergedLFNBase", "MinMergeSize", "MaxMergeSize",
"MaxMergeEvents", "BlockCloseMaxWaitTime",
"BlockCloseMaxFiles", "BlockCloseMaxEvents", "BlockCloseMaxSize",
"SoftTimeout", "GracePeriod",
"TrustPUSitelists", "CustodialSites",
"NonCustodialSites", "Override",
"SubscriptionPriority"],
But for some of them the logic may need to change because some of those setters are parametrized by more than a single argument. While at the same time we already know that for assignment-approved
we would never call _handleNoStatusUpdate
... So that's why, to me it seems safe to proceed only with those 3 setters mapped. Of course, if you ask me - I am completely up for moving the whole logic to be implemented here in a more generic way .... for all status updates, then get rid of a big chunk of code covering custom cases ... and only make the proper calls to this generic method here from upstream modules (e.g. Request in the current case) But I do not think we will have the time during this line of development here to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to distribute this update, but before answering how we can proceed, can you please check the JobUpdater code to check:
- in which databases are the WQEs updated?
- does it use any status filter? If so, which statuses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @amaltaro,
ok, Here it is:
-
JobUpdater
at the agent uses the very same methodupdatePriority
def updatePriority(self, wf, priority):
from theWMCore.Services.WorkQueue
to update the workflow priority (just as before), here is the actual call:
self.workqueue.updatePriority(workflow, priorityCache[workflow]) -
It checks for any diff between the priority in WMBS and couch database before taking actions:
if workqueuePrio != priorityCache[workflow]:
and:
if wmbsPrio != priorityCache[workflow]: -
The couchd database to be used is taken from the agent configuration for the component (actually the regular local workqueue component, not the
JobUpdater
config section):
WMCore/src/python/WMComponent/JobUpdater/JobUpdaterPoller.py
Lines 50 to 51 in 4e0759d
self.workqueue = WorkQueue(self.config.WorkQueueManager.couchurl, self.config.WorkQueueManager.dbname)
which in this case is (taken from one agent configuration):
config.WorkQueueManager.dbname = 'workqueue'
So the inbox db is indeed configured at the WorkqueueManager
component at the agent, but as long as I do not see any actual actions during this update from the updatepriority
method, I do not think we even touch this DB while updating the WQE priority at the agents.
- I found No filters being applied for any of the WQE, but the workflows considerred are the ones returned by:
workflowsToCheck = self.workqueue.getAvailableWorkflows()
and:
WMCore/src/python/WMCore/Services/WorkQueue/WorkQueue.py
Lines 209 to 218 in 4e0759d
def getAvailableWorkflows(self): """Get the workflows that have all their elements available in the workqueue""" data = self.db.loadView('WorkQueue', 'elementsDetailByWorkflowAndStatus', {'reduce': False, 'stale': 'update_after'}) availableSet = set((x['value']['RequestName'], x['value']['Priority']) for x in data.get('rows', []) if x['key'][1] == 'Available') notAvailableSet = set((x['value']['RequestName'], x['value']['Priority']) for x in data.get('rows', []) if x['key'][1] != 'Available') return availableSet - notAvailableSet
# call the proper setter methods bellow. | ||
|
||
# populate the current instance settersMap | ||
self.settersMap['RequestPriority'] = setterTuple('RequestPriority', self.setPriority, inspect.signature(self.setPriority)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you are already checking the parameter name and associating it to the actual function (plus inspecting its signature), isn't it more clear to simply call the actual method according to the parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is exactly what happens here. Well few line bellow though but still the same:
self.settersMap[reqArg].setterFunc(argValue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand it. What I am saying is that creating a map and passing a pointer to a function does not look as friendly as I classic if-elif dealing with the 3 parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creating a map gives the option of following the same pattern for adding as many setter calls (workflow parameter setter) as one may wish, by providing a generic method of validating and assembling the calls and not changing the logic for the rest of the function (i.e. not forking with hundreds of elifs, and getting into the trap of code growth by number of custom usecases). Here, the addition of a new setter is just a matter of adding yet another line in the map, following the same pattern as the rest in the map i.e.:
a named tuple of: ('reqArg', 'setterFunc', 'setterSignature')
So pretty universal way I'd say. And perfectly serves the exact purpose of what is needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the rationale for implementing something generic, but right now I am not aware of any other spec parameters that will have to be changed, so it's definitely not hundreds of future elifs.
In addition, I am very much adept of the KISS principle, whenever possible and fit. To me, this overly complex code (personally I don't know any of those inspect methods) can easily lead to a bug, or a buggy development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amaltaro , @anpicci , @todor-ivanov
Given that Python (or any programming language) offers multiple ways to achieve the same outcome, evaluating the best approach can often be subjective, depending on factors such as experience, proficiency, or personal style. Consequently, these discussions can become matters of personal preference.
For example, choosing between a full for-loop or list comprehension is a matter of choice. While we can debate which one might be "better" based on various criteria, both serve the same functional purpose. In this instance, I feel the this particular conversation is similar, and rather than debating stylistic choices, the focus should be on whether the code achieves its intended purpose. Otherwise, we risk undermining the role of both developer and reviewer by favoring one approach over another, which I find counterproductive.
If there are concerns about correctness or functionality, I would recommend incorporating unit tests for validation instead of continuing to focus on the specific implementation style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not coding style and personal preferences, but I see it as code readability and sustainability.
Anyhow, thank you for sharing your perspective on this matter. I won't bug people anymore on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan,
This:
try:
if 'RequestPriority' in reqArgs:
self.setPriority(reqArgs['RequestPriority'])
if 'SiteBlacklist' in reqArgs:
self.setSiteBlacklist(reqArgs['SiteBlacklist'])
if 'SiteWhitelist' in reqArgs:
self.setSiteWhitelist(reqArgs['SiteWhitelist'])
except Exception as ex:
msg = f"Failure to update workload parameters. Details: {str(ex)}"
raise WMWorkloadException(msg) from None
is simply not equal to this:
for reqArg, argValue in reqArgs.items():
if not self.settersMap.get(reqArg, None):
msg = f"Unsupported or missing setter method for updating reqArg: {reqArg}."
raise WMWorkloadException(msg)
try:
self.settersMap[reqArg].setterSignature.bind(argValue)
except TypeError as ex:
msg = f"Setter's method signature does not match the method calls we currently support: Error: req{str(ex)}"
raise WMWorkloadException(msg) from None
# Now go through the reqArg again and call every setter method according to the map
for reqArg, argValue in reqArgs.items():
try:
self.settersMap[reqArg].setterFunc(argValue)
except Exception as ex:
currFrame = inspect.currentframe()
argsInfo = inspect.getargvalues(currFrame)
argVals = {arg: argsInfo.locals.get(arg) for arg in argsInfo.args}
msg = f"Failure while calling setter method {self.settersMap[reqArg].setterFunc.__name__} "
msg += f"With arguments: {argVals}"
msg += f"Full exception string: {str(ex)}"
raise WMWorkloadException(msg) from None
At the end, in those both implementations, if the call to the setter method is wrong they will both fail the call, but the later is much more descriptive. It recognizes (at least 3) different possible conditions of failure and forwards the proper message for later debugging, while the former just fails the call and masks all the relevant information for the one who is to chase eventual problems. And that was the exact reason why I went for this implementation. I know you'd always prefer the shorter version, but sometimes it hides/masks valuable information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something more, since this is supposed to be a method triggered by a user's call rather than an operation triggered by the internals of our system, the later method inspects the stack and gives you the values of all parameters in the call. Meaning it shows you the exact (eventually) user mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talking about inspection and validation. Validation of the user parameters is not supposed to happen at this layer (at the setter call). Validation should be performed in the Service/Request class (which is likely calling the Validation utils).
With that said, I just noticed that we are not validating the site lists, as we usually do for workflow assignment.
So we should add that validation in here as well to protect the system and us from unneded debugging. For reference, this is how we define the site lists and the validation function:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/StdBase.py#L1147
3ea670b
to
099da8b
Compare
Jenkins results:
|
Jenkins results:
|
hi @amaltaro I have addressed your comments - Mostly, I removed the WQE status filter, based on the investigation I did on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todor, please find more comments along the code.
Once we are about to converge on this development, I think it is important to test it in a Dev environment (both for central services and WMAgent).
# call the proper setter methods bellow. | ||
|
||
# populate the current instance settersMap | ||
self.settersMap['RequestPriority'] = setterTuple('RequestPriority', self.setPriority, inspect.signature(self.setPriority)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the rationale for implementing something generic, but right now I am not aware of any other spec parameters that will have to be changed, so it's definitely not hundreds of future elifs.
In addition, I am very much adept of the KISS principle, whenever possible and fit. To me, this overly complex code (personally I don't know any of those inspect methods) can easily lead to a bug, or a buggy development.
|
||
# Update all WorkQueue elements with the parameters provided in a single push | ||
if elementsToUpdate: | ||
self.updateElements(*elementsToUpdate, **updateParams) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've seen you answering my question about these updateParams
.
Are we completely 100% sure that the only parameters that will reach this update are one and/or a combination of:
- RequestPriority
- SiteWhitelist
- SiteBlacklist
?
Of course, also considering that we only want to update these values when there is an actual update to the value.
In addition, I would like you to test this in WMAgent with a crafted workflow/scenario, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've seen you answering my question about these updateParams
Yes the answer is the second one in line, here from this comment: #12099 (comment)
In addition, I would like you to test this in WMAgent with a crafted workflow/scenario, please.
I had it tested. With a full validation campaign, not a single workflow, in my dev cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find an exact answer to the question above.
Additionally, the code you are deleting had an else statement populating a list of not handled arguments:
https://github.com/dmwm/WMCore/pull/12099/files#diff-120ee6838284a3d1c1799f511da7f147179d0a955f87d0da6fc8b58a8b66c794L440
which makes me believe that it can receive parameters other than only those 3.
About the validation, to properly validate these changes, we need to actually trigger specific scenarios and actions. The standard "campaign-style" validation is not going to expose any issues with this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov @amaltaro I don't know if you have already converged on a common ground, but here is my opinion about map vs if-elif-else approach.
Considering that I am not as experienced as both of you with the WMCore software, to me the map approach looks clearer and more understandable than the if-elif-else approach. Indeed, I can read which are the parameters of interest at the very beginning, and from these lines I can see that only one parameter among RequestPriority, SiteWhitelist, SiteBlacklist can be modified. The all process doesn't require more intellectual effort than the previous if-elif-else approach, and in addition to me sounds more robust and easier to debug (where with "debug" I am also referring to "actually fix a potential issue").
Regarding the possible concerns:
I believe I've seen you answering my question about these
updateParams
. Are we completely 100% sure that the only parameters that will reach this update are one and/or a combination of:
* RequestPriority
* SiteWhitelist
* SiteBlacklist
?
Of course, also considering that we only want to update these values when there is an actual update to the value.
Given the original code, we can assume that this is a complete list of parameters that are supposed to reach this update, provided that what we want to get with this PR doesn't require other parameters to be properly updated
Additionally, the code you are deleting had an else statement populating a list of not handled arguments:
https://github.com/dmwm/WMCore/pull/12099/files#diff-120ee6838284a3d1c1799f511da7f147179d0a955f87d0da6fc8b58a8b66c794L440
which makes me believe that it can receive parameters other than only those 3.
This occurrence should be handled by the try-except in these lines, right @todor-ivanov? @amaltaro do you have any concern that this approach could fail to prevent updating parameters other than RequestPriority, SiteWhitelist, and SiteBlacklist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anpicci I think you meant to send this reply in this thread instead https://github.com/dmwm/WMCore/pull/12099/files#r1766081573 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer again the original question here:
With the latest changes in this commit: dfbda2a it is now 100% sure that if anything but a supported argument update reaches that point, we will raise the proper exception and it will be handled accordingly in the caller method
'reduce': False}) | ||
|
||
# Fetch only a list of WorkQueue element Ids && Filter them by allowed status | ||
if status: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that we are not breaking the current behavior of RequestPriority update.
However, thinking twice about this, if we want to make these updates more efficient, we could update it only for elements in Available, Negotiating and Acquired, from the list of potential statuses for WQE:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WorkQueue/DataStructs/WorkQueueElement.py#L14
If you change it, could you please also update the Issue description (because it says only Available).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Jenkins results:
|
Can one of the admins verify this patch? |
@todor-ivanov apart from my comment to @amaltaro 's review, I only suggest to resolve the conflicts on |
@vkuznet thank you for the feedback.
I want to clarify that I would like to adopt these accountability principles to every issue, meaning that this is not a special treatment outlined only for this PR. I will follow up on this during the next group meeting. Thanks! |
1f78a64
to
b05def4
Compare
Jenkins results:
|
Jenkins results:
|
It looks like I forgot to update this PR with some previous reviews that have been done here and there, so here we go. There are some tasks pending from the other related PR, which have been described/discussed in this thread: |
Jenkins results:
|
Jenkins results:
|
Hi @amaltaro @vkuznet @anpicci So here is a quick summary:
|
a44e47c
to
9c0a81d
Compare
Jenkins results:
|
Jenkins results:
|
|
||
# Commit the changes of the current workload object to the database: | ||
workload.saveCouchUrl(workload.specUrl()) | ||
|
||
# Commit all Global WorkQueue changes per workflow in a single go: | ||
self.gq_service.updateElementsByWorkflow(workload.name(), reqArgs, status=['Available', 'Negotiating', 'Acquired']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @amaltaro the line that I was talking about during the WMCore meeting is this one. It triggers a second call to workload.updateWorkloadArgs
internally, similarly to what is done on line: 431 at WMcore.ReqMgr.Service.Request
from the current PR.
The internal call is happening at line 316 at WMCore.Services.WorkQueue
again from the very same PR.
You have asked me to implement all the changes to both reqMgr
(in this case through the workload
object) and all WQE
s in a single push. It is now possible. All that needs to happen, is to substitute the first call from above with the second call from the current line. I have the code change prepared. Just let me know if I should proceed with it or we should play it safe and we still do the action in two steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve reviewed the code provided in this PR and, from a coding perspective, I don’t have any specific suggestions for improvement. I noticed the ongoing discussion between Alan and Todor on various related topics; however, I believe my input may not add significant value to that part of the conversation.
From the standpoint of pure code review, the implementation looks good and is ready to be merged. I don’t have a strong preference between Todor's map/namedtuple/setter approach or Alan's if/else flow, as I view this as a matter of personal or team preference.
Jenkins results:
|
hi @amaltaro [1]
[2]
|
@todor-ivanov may you ping us when you will provide the fix to the latest issue? |
hi @anpicci
Yes, I will. Most probably after the training session today. |
hi @anpicci @amaltaro here is the solution promised: Skip validating update stat arguments |
Jenkins results:
|
@amaltaro I do not know if you have started looking into this PR, but I need your opinion at least on: #12099 (comment) So far the code works just as expected with the fix for stat update arguments. I've tested it. On top of it here is the state of one such WQE:
Shortly speaking both ReqMgr and the WorkQueue elements have been updated after the arguments have been properly validated not only for format but also for values type and content. the validation process concerns only no status update actions. I am still keeping all those commits unsquashed, only because I am waiting for your answer to the above question and to preserve the different steps of the solution to be visible during the review process. So if you think no further optimisations are needed I am ready to squash them and get it ready for merge. If you think I should push further for optimizing these calls to make them in one go I am ready do that as well. |
Jenkins results:
|
Fix bad updateElementsByWorkflow argument
c83b52f
to
93d0c3b
Compare
And to put it: #12099 (comment) in perspective, this is what I am talking about: Update workload args only once And here are the WQE updated:
So it works! @amaltaro - your turn |
Jenkins results:
|
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@todor-ivanov these changes are looking good in general. I left a few comments along the code that would be better to get them addressed though.
@@ -413,6 +414,8 @@ def _handleNoStatusUpdate(self, workload, request_args, dn): | |||
request_args will be ignored. | |||
""" | |||
reqArgs = deepcopy(request_args) | |||
reqStatus = workload.getStatus() | |||
cherrypy.log(f"CurrentRequest status: {reqStatus}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this log record should be removed before this PR gets merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I just noticed, by moving it few lines above we are now missing the requstName in the information deliverred with that message - before it was coming from previous messages, ut now wince it sits infornt of all of them I'd rather evolve it to the normal introductory message signaling how we enter the _handleNoStatusUpdate
method.
if reqArgsNothandled: | ||
try: | ||
# Commit all Global WorkQueue changes per workflow in a single go: | ||
self.gq_service.updateElementsByWorkflow(workload, reqArgs, status=['Available', 'Negotiating', 'Acquired']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a deep inspection of the code, this looks incorrect. Why are we trying to execute updateElementsByWorkflow()
even before some of the methods in the except block (e.g. _handleAssignmentStateTransition()
and/or claiming that there might be "unhandled arguments")?
Call to updateElementsByWorkflow()
should only be made if we are certain that WQEs need to be updated. As global workqueue can be a scale-limiting component of the system and we should use it wisely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WMWorkloadUnhandledException
is telling you if there have been any unhandled arguments. It is thrown when we call workload.updateWorkloadArgs
from within updateElementsByWorkflow
- here: https://github.com/dmwm/WMCore/pull/12099/files#diff-d758570ce7f6baddaeefb0bdb4e2015b06e5b7772bf929248b686adc64286233R316
And I did it so because of your request during our discussion of the issue to have the workflow
and the workqueue elements
being updated in a single push.
It was not like that just until last night. And I've asked you at least twice ... trying to make all the needed pointers in my comments here: #12099 (comment) and here: #12099 (comment) before I proceed with unifying those two steps with this commit: 93d0c3b (hence why I still keep all the commits not squashed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I do not see how would we recognize if there have been any workqueue elements to be updated until we query the the workqueue and see if there are any. It have been sequential step and a mandatory one in this method even before I make it all in one go. We were still always going through it, as long as we were called.
Usually the parameter reqArgsDiff
was the one that would have told us if there was anything to be done at this stage within this method, but I remember in the past you asked me to avoid calling the validation twice at the head of this method here. But regardless... , in all the cases of a call to update a request with no RequestStatus
update we still always return the reqArgsDiff
from here:
return workload, reqArgsDiff |
request_args
and then any actions should have been halted at the very top of this method here: WMCore/src/python/WMCore/ReqMgr/Service/Request.py
Lines 417 to 419 in 44d2a8a
if not reqArgs: | |
cherrypy.log(f"Nothing to be changed at this stage for {workload.name()}") | |
return 'OK' |
So if there was nothing to be updated in the workflow itself - ergo in its WQE
elements of the given status, we should never reach that point in the code. But if there have been any change with the workflow arguments instead, we will have to proceed with updating the WQE
elements as well.
On top of that we update only the ones in the selected statuses, rather than all of them. I think we are safe in what we are doing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these changes are now being applied on top of your previous incomplete changes, it is a bit harder to see how things were implemented.
I don't really understand why we call _handleAssignmentStateTransition()
from this _handleNoStatusUpdate()
method, as it says that there is not supposed to be any status transition. Can you please explain that?
In addition, I believe that this call updateElementsByWorkflow()
could become not super cheap (few secs?) if workqueue is loaded with documents. So, IMO we should only call it if really necessary. To put it in a different way, a request getting assigned (hence in assignment-approved
) does not have workqueue elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand why we call _handleAssignmentStateTransition() from this _handleNoStatusUpdate() method, as it says that there is not supposed to be any status transition. Can you please explain that?
Because, if you have called the _handleNoStatusUpdate
for a workflow that is in assignment_approved
then you will get with the request all those full set of parameters that you get during the assignment_approved
state transition. Which will blow up, because we decided to still support a limited set of parameters for no status update - only SiteLists
related and RequestPriority
(which was yet again a wrong decision). This is the root cause of the BUG you reported in the first implementation an did not accept the code back then (even though I have provided this same solution on the very next day). So this was the solution - the way out of the situation was - in order to avoid rewriting all which is already there in _handleAssignmentStateTRansition
and assimilating it in workload.updateWorkloadArgs
and since the _handleAssignmentStateTRansition
method has nothing to do with the state transition action itself.... it was safe enough to call it directly here. If you want, I will take this method and will rewrite it under the context of workload.updateWorkloadArgs
. I do not mind. Actually I think this is the proper way to do it. And I've expressed that in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what is the relationship of a workflow in assignment_approved
with updateElementsByWorkflow()
? None, the workflow has not been assigned yet. Calling workqueue is a waste of resources.
If we had an issue (maybe that was with ACDC), making a call to WorkQueue won't resolve that issue. This implementation is wrong. If you want to keep it, keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alan, make up your mind.....
you asked me to reduce the calls to worload and push it all in one go. I twisted my mind to implement it. Before I did it I asked for your feedback twice.... now you tell me this is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to have it all correct we must stop putting exrta background knowledge in the algorithms we follow ... and tie ourselves in a figure 8
knot of conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to avoid confusion and blame .... Please put the correct set of conditions here in simple English under which the work queue elements update must be called. And I will implement it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed over Zoom:
_handleAssignmentStateTransition
is not performing any status transition;- and the exception
WMWorkloadUnhandledException
is not meant to be raised byupdateElementsByWorkflow()
, but by its internal callworkload.updateWorkloadArgs(updateParams)
, if needed.
Thank you for explaining this Todor. I have no more concerns on this code.
@@ -284,6 +284,41 @@ def updatePriority(self, wf, priority): | |||
wmspec.saveCouch(self.hostWithAuth, self.db.name, dummy_values) | |||
return | |||
|
|||
def updateElementsByWorkflow(self, workload, updateParams, status=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having two different methods for the same thing, I would suggest to converge on only one implementation for this.
Valentin already provided this site specific method: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/WorkQueue/WorkQueue.py#L240
So I would suggest to either refactor this one and change to make sure it works with WMAgent WorkflowComponent, or adapt this PR to use the method already provided by Valentin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Valentin, when you make that pull request, please refer it to the actual GH issue then, which I think it is #12039
@@ -68,6 +81,57 @@ class WMWorkloadHelper(PersistencyHelper): | |||
|
|||
def __init__(self, wmWorkload=None): | |||
self.data = wmWorkload | |||
self.settersMap = {} | |||
|
|||
def updateWorkloadArgs(self, reqArgs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need unit tests for this method.
@@ -1176,6 +1240,25 @@ def getDbsUrl(self): | |||
|
|||
return getattr(self.data.request.schema, "DbsUrl") | |||
|
|||
def setStatus(self, status): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need unit tests for this method as well
self.data.request.status = status | ||
return | ||
|
||
def getStatus(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need unit tests for this method as well
@@ -1971,6 +2054,26 @@ def validateArgumentForAssignment(self, schema): | |||
validateArgumentsUpdate(schema, argumentDefinition) | |||
return | |||
|
|||
def validateArgumentsPartialUpdate(self, schema): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a good idea to write tests too as well.
Jenkins results:
|
Jenkins results:
|
Fixes #12038
Status
Ready
Description
With the current PR we add the new functionality to call all needed WMWorkload setter methods based on a full
reqArgs
dictionary passed from the upper level caller and try to set all of parameters at once, rather than calling every single setter method one by one. This is achieved by creating a proper map between methods and possible request arguments. And later validating if the set of arguments passed withreqArgs
dictionary would properly match the signature of the setter method which is to be called. In the current implementation only a small set of methods is mapped to request parameters :(Mostly single argument parametrized methods, but that's ok for the time being, because those cover perfectly the functionality we need to plug in here)
With this change a path for updating all possible workqueue elements in a single go was opened. In the
WorkQueue
service an additional method was developed for fetching all possible workqueue elements and update them all with the full set of arguments provided through thereqArgs
dictionary with the cost of a single database action per WorkQuee element, rather than 3 (or more) separate calls to the database for updating every single element parameter separately. Upon updating all workqueue elements with the new parameters the WMSpec copy of the given workflow at the workqueue is also updated in a single push using the same logic from above.Is it backward compatible (if not, which system it affects?)
YES
Related PRs
None
External dependencies / deployment changes
None