ReqMgr2 MicroService Monitor
This MicroService is supposed to monitor the input data placement made by the Transferor MicroService and, depending on the status and completion of those transfers, move the request to the `staged` status. That transition then allows Global WorkQueue to fetch those requests and proceed with creating the chunks of work and the workqueue elements.
This module currently runs as a thread of the Unified/Transferor MicroService. In the future it could be moved to its own process, which could increase the overall performance of the service.
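A minimal sketch of how such a periodic monitor thread could look, assuming only Python's standard `threading` module. `MonitorThread` and `cycle_func` are hypothetical names, not the WMCore classes; the 15-minute default interval is taken from the cycle period suggested for this algorithm.

```python
import threading


class MonitorThread(threading.Thread):
    """Hypothetical periodic runner for the Monitor logic (not the WMCore class)."""

    def __init__(self, cycle_func, interval_secs=15 * 60):
        super().__init__(name="MSMonitor", daemon=True)
        self.cycle_func = cycle_func        # one full monitoring pass
        self.interval_secs = interval_secs  # pause between passes
        self.stop_event = threading.Event()
        self.cycles_run = 0

    def run(self):
        # Run one cycle, then sleep until the next one or until stop() is called.
        while not self.stop_event.is_set():
            try:
                self.cycle_func()
            except Exception as exc:  # keep the thread alive if one cycle fails
                print(f"Monitor cycle failed: {exc}")
            self.cycles_run += 1
            self.stop_event.wait(self.interval_secs)

    def stop(self):
        self.stop_event.set()
```

Using an `Event` both to sleep and to signal shutdown means `stop()` interrupts the wait immediately instead of waiting out the full interval, which matters when the host service is shutting down.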
The tasks and steps this Monitor MS has to carry out can be ordered as follows (this is not yet an exhaustive or detailed description):
- fetch all the workflows in the `staging` status in ReqMgr2
- fetch all the workflow transfer documents (in bulk) from ReqMgr AuxDB
- filter the transfer documents according to the list of workflows in the `staging` status, then filter the remaining ones once again according to the last time they were looked at (i.e. their `timestamp`/`lastUpdate` value). If the `lastUpdate` is less than X hours old (say 6h for now), skip that transfer document; otherwise add it to the list of transfers to be updated.
- fetch all the campaign documents (in bulk) from ReqMgr AuxDB
- using the transfer ids available in the transfer document (under the `transfers` key), query PhEDEx/Rucio for the status of those transfers, then calculate their completion. There are a few possible cases here:
  - IF there are no transfers in the workflow transfer document (thus no input data at all), move the workflow to `staged`
  - IF ALL transfer types (primary, secondary, etc.) are above the minimum completion thresholds defined in the campaign, then a) update the transfer document and b) update the request status to `staged`
  - IF NOT ALL transfers are above the minimum completion thresholds, update the transfer document with the new completion value and the `lastUpdate` value, then proceed to the next workflow.
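The steps above can be sketched in Python as follows. This is a minimal illustration, not the MSMonitor implementation: the document layout (`workflowName`, `campaign`, `lastUpdate`, and a `transfers` list whose entries carry `dataType`, `transferID` and `completion`), the campaign `minCompletion` mapping, and the `get_completion` callable standing in for the PhEDEx/Rucio query are all hypothetical. Reaching a threshold exactly is treated as satisfying it.

```python
import time

# "X hours" from the text; 6h is the placeholder value mentioned there.
SKIP_INTERVAL_SECS = 6 * 3600


def select_transfer_docs(staging_workflows, transfer_docs, now=None,
                         skip_secs=SKIP_INTERVAL_SECS):
    """Keep docs of workflows in 'staging' that were not looked at recently."""
    now = time.time() if now is None else now
    staging = set(staging_workflows)
    selected = []
    for doc in transfer_docs:
        if doc["workflowName"] not in staging:
            continue  # workflow is not in the staging status
        if now - doc.get("lastUpdate", 0) < skip_secs:
            continue  # checked less than skip_secs ago, skip it this cycle
        selected.append(doc)
    return selected


def evaluate_transfers(doc, campaign, get_completion, now=None):
    """Refresh completion values; return 'staged' or None (stay in staging)."""
    now = time.time() if now is None else now
    transfers = doc.get("transfers", [])
    if not transfers:
        return "staged"  # no input data at all
    all_done = True
    for transfer in transfers:
        # get_completion stands in for the PhEDEx/Rucio query (percent complete)
        transfer["completion"] = get_completion(transfer["transferID"])
        # hypothetical per-data-type minimum, e.g. {"primary": 100.0}
        threshold = campaign["minCompletion"][transfer["dataType"]]
        if transfer["completion"] < threshold:
            all_done = False
    doc["lastUpdate"] = now  # transfer doc is updated whether or not it is complete
    return "staged" if all_done else None


def monitor_cycle(staging_workflows, transfer_docs, campaigns, get_completion,
                  now=None):
    """Map each checked workflow name to its new status (None = unchanged)."""
    now = time.time() if now is None else now
    results = {}
    for doc in select_transfer_docs(staging_workflows, transfer_docs, now):
        campaign = campaigns[doc["campaign"]]
        results[doc["workflowName"]] = evaluate_transfers(
            doc, campaign, get_completion, now)
    return results
```

Persisting the updated documents to ReqMgr AuxDB and performing the actual status change in ReqMgr2 are left to the caller, since those APIs are not described here.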
This algorithm is supposed to run every 15 minutes or so. We might also want to limit the number of workflows considered in each cycle.
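If such a cap is introduced, one possible policy (an assumption, not something specified here) is to check the least recently updated transfer documents first, so that no workflow is starved across cycles. `limit_cycle` and `max_workflows` are hypothetical names, and the documents are assumed to carry a numeric `lastUpdate` field.

```python
import heapq


def limit_cycle(transfer_docs, max_workflows=None):
    """Return at most max_workflows docs, least recently checked first.

    Assuming each checked document gets a fresh lastUpdate, picking the
    oldest ones first rotates through all workflows over several cycles.
    """
    if max_workflows is None:
        return list(transfer_docs)  # no cap configured
    # heapq.nsmallest(n, it, key) is equivalent to sorted(it, key=key)[:n]
    return heapq.nsmallest(max_workflows, transfer_docs,
                           key=lambda doc: doc.get("lastUpdate", 0))
```

`heapq.nsmallest` avoids sorting the whole list when the cap is much smaller than the number of documents in `staging`.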
Open questions
Do we want to monitor the subscriptions and act upon issues and/or stuck transfers, or do we just assume transfers will eventually succeed? Alerts also have to be created for bad input placement (bad transfers).