Implementation WMAgent Refactoring
- We need a new work unit table to keep track of which lumis/events have been successfully processed. (The aim here is minimal changes, not optimization.)
wmbs_workunit
CREATE TABLE wmbs_workunit (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    taskid INTEGER NOT NULL,
    fileid INTEGER NOT NULL, -- fake file for MC
    run INTEGER NOT NULL,
    lumi INTEGER NOT NULL,
    firstevent INTEGER NOT NULL,
    lastevent INTEGER NOT NULL,
    status INT(1) DEFAULT 0,
    FOREIGN KEY (taskid)
        REFERENCES wmbs_workflow(id) ON DELETE CASCADE
);
fileid, run, and lumi can be replaced by a single id if we add a unique id to the wmbs_file_runlumi_map table.
EWV: I think we would need 2-3 other fields here too: a retry count for the work unit, the number of work units that ended up in the last job to try this work unit (remember we want to try a work unit by itself before giving up completely), and perhaps a timestamp. The timestamp might not be necessary if we have a timeout on the jobs themselves.
If one lumi can be spread across multiple files, we need an association table between the work unit table and the wmbs_file_runlumi_map table.
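As a rough sketch of EWV's suggestion, the table could carry three more columns. The column names retry_count, last_unit_count, and last_submit_time and the helper functions below are assumptions made for illustration, not an agreed schema or existing WMCore code. The example uses Python's sqlite3 with SQLite syntax (AUTOINCREMENT instead of AUTO_INCREMENT) so that it runs standalone.

```python
import sqlite3

# Hypothetical extension of wmbs_workunit; the last three columns are
# illustrative names for the extra fields suggested above.
SCHEMA = """
CREATE TABLE wmbs_workflow (id INTEGER PRIMARY KEY);
CREATE TABLE wmbs_workunit (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    taskid           INTEGER NOT NULL,
    fileid           INTEGER NOT NULL,
    run              INTEGER NOT NULL,
    lumi             INTEGER NOT NULL,
    firstevent       INTEGER NOT NULL,
    lastevent        INTEGER NOT NULL,
    status           INTEGER DEFAULT 0,
    retry_count      INTEGER DEFAULT 0,  -- attempts so far
    last_unit_count  INTEGER DEFAULT 0,  -- units in the last job that tried this one
    last_submit_time INTEGER DEFAULT 0,  -- optional timestamp
    FOREIGN KEY (taskid) REFERENCES wmbs_workflow(id) ON DELETE CASCADE
);
"""


def record_attempt(conn, unit_id, units_in_job, now):
    """Update the retry bookkeeping for one work unit after a job tried it."""
    conn.execute(
        "UPDATE wmbs_workunit SET retry_count = retry_count + 1, "
        "last_unit_count = ?, last_submit_time = ? WHERE id = ?",
        (units_in_job, now, unit_id),
    )


def tried_alone(conn, unit_id):
    """True if the most recent job to try this unit contained only this unit."""
    row = conn.execute(
        "SELECT retry_count, last_unit_count FROM wmbs_workunit WHERE id = ?",
        (unit_id,),
    ).fetchone()
    return row is not None and row[0] > 0 and row[1] == 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO wmbs_workflow (id) VALUES (1)")
conn.execute(
    "INSERT INTO wmbs_workunit (taskid, fileid, run, lumi, firstevent, lastevent) "
    "VALUES (1, 10, 1, 5, 1, 100)"
)
record_attempt(conn, 1, units_in_job=4, now=1000)  # first try, in a 4-unit job
record_attempt(conn, 1, units_in_job=1, now=2000)  # second try, on its own
```

With last_unit_count stored, a retry rule can check tried_alone() before giving up on a work unit completely.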
- The above table needs to be populated when the fileset and subscription are created (before job splitting happens).
- The wmbs_job_mask table should be modified (or replaced) so that it records the relationship between work unit and job id.
CREATE TABLE wmbs_job_workunit_assoc (
    jobid INTEGER NOT NULL,
    workunitid INTEGER NOT NULL,
    FOREIGN KEY (jobid)
        REFERENCES wmbs_job(id) ON DELETE CASCADE,
    FOREIGN KEY (workunitid)
        REFERENCES wmbs_workunit(id) ON DELETE CASCADE
);
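The work unit table and the job association table can be exercised together in a small sqlite3 sketch: work units are inserted up front (before any splitting), then linked to a job. The helper names populate_work_units, assign_to_job, and units_for_job are hypothetical, the wmbs_job table is reduced to its id column, and SQLite syntax stands in for the MySQL dialect used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE wmbs_job (id INTEGER PRIMARY KEY);
CREATE TABLE wmbs_workunit (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    taskid INTEGER NOT NULL,
    fileid INTEGER NOT NULL,
    run INTEGER NOT NULL,
    lumi INTEGER NOT NULL,
    firstevent INTEGER NOT NULL,
    lastevent INTEGER NOT NULL,
    status INTEGER DEFAULT 0
);
CREATE TABLE wmbs_job_workunit_assoc (
    jobid INTEGER NOT NULL,
    workunitid INTEGER NOT NULL,
    FOREIGN KEY (jobid) REFERENCES wmbs_job(id) ON DELETE CASCADE,
    FOREIGN KEY (workunitid) REFERENCES wmbs_workunit(id) ON DELETE CASCADE
);
""")


def populate_work_units(conn, taskid, lumis):
    """Insert one work unit per (fileid, run, lumi, firstevent, lastevent)."""
    conn.executemany(
        "INSERT INTO wmbs_workunit "
        "(taskid, fileid, run, lumi, firstevent, lastevent) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(taskid, *lumi) for lumi in lumis],
    )


def assign_to_job(conn, jobid, unit_ids):
    """Record which work units a job will attempt."""
    conn.executemany(
        "INSERT INTO wmbs_job_workunit_assoc (jobid, workunitid) VALUES (?, ?)",
        [(jobid, uid) for uid in unit_ids],
    )


def units_for_job(conn, jobid):
    """Return the ids of the work units attached to a job, in id order."""
    rows = conn.execute(
        "SELECT workunitid FROM wmbs_job_workunit_assoc "
        "WHERE jobid = ? ORDER BY workunitid",
        (jobid,),
    )
    return [r[0] for r in rows]


# Three lumis of one file become three work units; job 1 takes the first two.
populate_work_units(conn, 1, [(10, 1, 5, 1, 100), (10, 1, 6, 101, 200), (10, 1, 7, 201, 300)])
conn.execute("INSERT INTO wmbs_job (id) VALUES (1)")
assign_to_job(conn, 1, [1, 2])
```

Because of ON DELETE CASCADE, deleting a job row also removes its association rows, while the work units themselves stay in place for a later job.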
- The wmbs_job table contains 4 states (success, failure, partial_success, not_attempt). Not sure this is needed, but it may be needed for the retry logic (e.g. in case of total failure, don't reshuffle the work units).
- We might need an association between output files and wmbs_workunit.
- Job splitting needs to happen multiple times, not just once over the initial input. To make this simpler, splitting happens over wmbs_workunit, not over files.
- JobAccounter needs to update the wmbs_workunit status, and also the association between wmbs_workunit and output files.
- How should retry rules be defined (by work unit or by job)?
- We might still be able to track jobs, which would contain work unit information.
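The points above can be sketched in plain Python: splitting runs over work units that still need processing, and an accounting step updates their status and derives one of the four job states. The status codes PENDING/DONE/FAILED and the functions split_work_units and account_job are assumptions for illustration only; the notes above only say that status defaults to 0.

```python
from enum import Enum

# Hypothetical status codes for a work unit.
PENDING, DONE, FAILED = 0, 1, 2


class JobOutcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PARTIAL_SUCCESS = "partial_success"
    NOT_ATTEMPT = "not_attempt"


def split_work_units(units, units_per_job):
    """Group every work unit that is not DONE into jobs of at most
    units_per_job units. `units` maps work unit id -> status."""
    todo = sorted(uid for uid, status in units.items() if status != DONE)
    return [todo[i:i + units_per_job] for i in range(0, len(todo), units_per_job)]


def account_job(units, job_units, succeeded):
    """Update work unit statuses for one finished job and return its outcome.
    `succeeded` is the set of unit ids the job processed, or None if the
    job never ran."""
    if succeeded is None:
        return JobOutcome.NOT_ATTEMPT
    for uid in job_units:
        units[uid] = DONE if uid in succeeded else FAILED
    done = sum(1 for uid in job_units if uid in succeeded)
    if done == len(job_units):
        return JobOutcome.SUCCESS
    if done == 0:
        return JobOutcome.FAILURE
    return JobOutcome.PARTIAL_SUCCESS


# First pass: five pending work units, two per job.
units = {1: PENDING, 2: PENDING, 3: PENDING, 4: PENDING, 5: PENDING}
first_pass = split_work_units(units, 2)

# The first job processed unit 1 but not unit 2.
outcome = account_job(units, first_pass[0], succeeded={1})

# Second pass: everything not DONE is split again, one unit per job,
# so a failed unit is retried by itself.
second_pass = split_work_units(units, 1)
```

Running the split again with one unit per job is one way to try a failed work unit by itself before giving up on it; whether retries are counted per work unit or per job is left open here, as in the notes.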