Concept
The Scheduler hosts multiple state machines that manage jobs. Every job run is tightly coupled to a timeperiod: a time window that the job run should cover, encoded in the format YYYYMMDDHH.
For instance, timeperiod 2014011501 stands for a 1-hour slice: from 01:00 (inclusive) to 02:00 (exclusive) on 15 January 2014.
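The hourly encoding can be illustrated with a minimal sketch that turns a YYYYMMDDHH string into the window it covers. The helper name `hourly_window` is hypothetical and is not part of the Scheduler's API; it only assumes the inclusive-start, exclusive-end convention described in the example.

```python
from datetime import datetime, timedelta
from typing import Tuple


def hourly_window(timeperiod: str) -> Tuple[datetime, datetime]:
    """Return (start, end) for an hourly timeperiod; end is exclusive."""
    if len(timeperiod) != 10 or not timeperiod.isdigit():
        raise ValueError(f"expected YYYYMMDDHH, got {timeperiod!r}")
    start = datetime(
        year=int(timeperiod[0:4]),
        month=int(timeperiod[4:6]),
        day=int(timeperiod[6:8]),
        hour=int(timeperiod[8:10]),
    )
    return start, start + timedelta(hours=1)


start, end = hourly_window("2014011501")
print(start.isoformat(), end.isoformat())
# prints: 2014-01-15T01:00:00 2014-01-15T02:00:00
```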
A timeperiod can be one of:
- hourly
- daily
- monthly
- yearly
For illustration purposes, let's assume that the Synergy Scheduler supervises a system that gathers and processes user behaviour on a web site. In this context:
- An hourly timeperiod represents data gathered within one hour: the period from 10:00:00 to 10:59:59 on 1 Jan 2011 is one hourly period.
Notation of this timeperiod is: 2011010110
- Data gathered from 00:00:00 to 23:59:59 on 1 Jan 2011 represents a daily period.
Notation of this timeperiod is: 2011010100
- Data gathered from 00:00:00 on 1 Jan 2011 to 23:59:59 on 31 Jan 2011 represents a monthly period.
Notation of this timeperiod is: 2011010000
- All-year statistics result in a yearly period.
Notation of this timeperiod is: 2011000000
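The four notations above follow one pattern: the fields finer than the timeperiod's granularity are zero-filled. A minimal sketch of that pattern follows. The qualifier constants and the `to_timeperiod` helper are hypothetical names chosen for illustration. Note that, under this notation, the hourly slice starting at 00:00 and the daily period of the same date appear to share the same string, so the sketch takes the qualifier as an explicit argument rather than inferring it.

```python
from datetime import datetime

# Hypothetical qualifier names, used only in this sketch.
QUALIFIER_HOURLY = "hourly"
QUALIFIER_DAILY = "daily"
QUALIFIER_MONTHLY = "monthly"
QUALIFIER_YEARLY = "yearly"


def to_timeperiod(qualifier: str, moment: datetime) -> str:
    """Encode the timeperiod of the given granularity that contains `moment`."""
    if qualifier == QUALIFIER_HOURLY:
        return moment.strftime("%Y%m%d%H")
    if qualifier == QUALIFIER_DAILY:
        return moment.strftime("%Y%m%d") + "00"
    if qualifier == QUALIFIER_MONTHLY:
        return moment.strftime("%Y%m") + "0000"
    if qualifier == QUALIFIER_YEARLY:
        return moment.strftime("%Y") + "000000"
    raise ValueError(f"unknown qualifier: {qualifier!r}")


moment = datetime(2011, 1, 1, 10, 30)
for qualifier in (QUALIFIER_HOURLY, QUALIFIER_DAILY,
                  QUALIFIER_MONTHLY, QUALIFIER_YEARLY):
    print(qualifier, to_timeperiod(qualifier, moment))
```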
The Scheduler organizes timeperiods in tree-like structures:
root <- yearly periods <- monthly periods <- daily periods <- hourly periods
A timeperiod in the tree is considered complete only if all of its nested timeperiods are in the STATE_PROCESSED or STATE_SKIPPED state.
For example: since a daily period nests 24 hourly periods, all of them must complete before the daily period can be declared complete.
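The completeness rule can be sketched as a check over the 24 hourly children of a daily timeperiod. This is an illustration only: the state names STATE_PROCESSED and STATE_SKIPPED come from the text, while their string values, the extra `STATE_IN_PROGRESS` state, the helper names, and the plain dictionary used as a state store are assumptions.

```python
from typing import Dict, List

STATE_PROCESSED = "state_processed"      # string values are assumed
STATE_SKIPPED = "state_skipped"
STATE_IN_PROGRESS = "state_in_progress"  # hypothetical non-final state


def hourly_children(daily_timeperiod: str) -> List[str]:
    """List the 24 hourly timeperiods nested in a daily timeperiod."""
    day = daily_timeperiod[:8]
    return [f"{day}{hour:02d}" for hour in range(24)]


def is_daily_complete(daily_timeperiod: str, states: Dict[str, str]) -> bool:
    """True only if every hourly child is processed or skipped."""
    finished = {STATE_PROCESSED, STATE_SKIPPED}
    return all(states.get(child) in finished
               for child in hourly_children(daily_timeperiod))


states = {child: STATE_PROCESSED for child in hourly_children("2011010100")}
states["2011010105"] = STATE_SKIPPED
print(is_daily_complete("2011010100", states))  # prints: True

states["2011010123"] = STATE_IN_PROGRESS
print(is_daily_complete("2011010100", states))  # prints: False
```

A child that is missing from the state store is treated as not finished, which is the conservative choice for this sketch.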
Trees can have the following number of levels:
- 4-level tree, hosts yearly, monthly, daily and hourly timeperiods
- 3-level tree, hosts yearly, monthly and daily timeperiods
- 2-level tree, hosts timeperiods of one kind (either hourly, daily or monthly) and a virtual "root" level to maintain the tree-like structure
The trees above highlight the downward dependency: yearly periods depend on monthly; monthly depend on daily; daily depend on hourly.
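The chain of levels can be made concrete by walking from a timeperiod up to its parent, zero-filling one more field at each step. The sketch below assumes a 4-level tree; the qualifier strings and the `parent_timeperiod` helper are hypothetical.

```python
from typing import Optional, Tuple


def parent_timeperiod(qualifier: str, timeperiod: str) -> Optional[Tuple[str, str]]:
    """Return (parent qualifier, parent timeperiod), or None at the yearly level."""
    if qualifier == "hourly":
        return "daily", timeperiod[:8] + "00"
    if qualifier == "daily":
        return "monthly", timeperiod[:6] + "0000"
    if qualifier == "monthly":
        return "yearly", timeperiod[:4] + "000000"
    if qualifier == "yearly":
        return None  # the next level up is the virtual root
    raise ValueError(f"unknown qualifier: {qualifier!r}")


node = ("hourly", "2011010110")
while node is not None:
    print(*node)
    node = parent_timeperiod(*node)
# prints:
# hourly 2011010110
# daily 2011010100
# monthly 2011010000
# yearly 2011000000
```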
Each level in the tree is managed by a designated process/aggregator. For example: <site> hourly period statistics are computed by "site_hourly_aggregator", <site> daily period statistics by "site_daily_aggregator", etc.
It is common for trees to depend on other trees.
For example: to calculate Revenue Per Click, we need two numbers: number_of_clicks from <site tree> and revenue from <financial tree>.
Both numbers are required to compute Revenue Per Click = revenue / number_of_clicks. Thus, tree <financial post-processing> will depend on both <site tree> and <financial tree>.
Dependencies are registered in the context.py block that defines the tree. They are time-qualifier-dependent: daily timeperiods from <site tree> can depend on daily timeperiods from <financial tree>. Consequently, hourly timeperiods of tree A cannot block daily timeperiods of dependent tree B, as they belong to different time-aggregation groups.
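The qualifier-scoped nature of tree dependencies can be sketched with plain dictionaries. This does not reproduce the actual context.py registration syntax; the tree names, the two dictionaries and the `blocking_timeperiods` helper are all hypothetical. The point shown is that a dependent timeperiod is matched only against upstream timeperiods of the same qualifier.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical registry: dependent tree -> trees it depends on.
DEPENDENT_ON: Dict[str, List[str]] = {
    "financial_post_processing": ["site", "financial"],
}

# Hypothetical description of which qualifiers each tree hosts.
TREE_QUALIFIERS: Dict[str, Set[str]] = {
    "site": {"hourly", "daily", "monthly", "yearly"},
    "financial": {"daily", "monthly", "yearly"},
    "financial_post_processing": {"daily", "monthly", "yearly"},
}


def blocking_timeperiods(tree: str, qualifier: str,
                         timeperiod: str) -> List[Tuple[str, str, str]]:
    """Return (upstream tree, qualifier, timeperiod) triples that can block."""
    return [
        (upstream, qualifier, timeperiod)
        for upstream in DEPENDENT_ON.get(tree, [])
        if qualifier in TREE_QUALIFIERS[upstream]  # same qualifier only
    ]


print(blocking_timeperiods("financial_post_processing", "daily", "2011010100"))
# prints: [('site', 'daily', '2011010100'), ('financial', 'daily', '2011010100')]
```

Hourly timeperiods of the site tree never appear in the result for a daily timeperiod, which mirrors the rule that different time-aggregation groups do not block each other.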
Dependencies can be of the following types:
- type_blocking_dependencies: any processing of dependent timeperiods is blocked until the blocking timeperiods are processed.
An interesting use case is when one of the blocking timeperiods is in STATE_SKIPPED. In this case, the dependent timeperiod is also moved to STATE_SKIPPED.
- type_blocking_children: any processing of a higher time granularity is blocked until all nested children timeperiods are processed.
- type_managed: the dependency allows processing of the dependent timeperiod; however, finalization of the dependent timeperiod is not allowed unless the blocking timeperiod is in STATE_PROCESSED.
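The three dependency types can be compared side by side with a small decision function. This is a sketch under stated assumptions, not the Scheduler's implementation: the string values of the constants, the `STATE_IN_PROGRESS` state, the returned action names and the `decide` helper are hypothetical. For type_blocking_children the sketch also assumes that a skipped child counts as finished, in line with the completeness rule for nested timeperiods.

```python
from typing import Iterable

STATE_PROCESSED = "state_processed"      # string values are assumed
STATE_SKIPPED = "state_skipped"
STATE_IN_PROGRESS = "state_in_progress"  # hypothetical non-final state

TYPE_BLOCKING_DEPENDENCIES = "type_blocking_dependencies"
TYPE_BLOCKING_CHILDREN = "type_blocking_children"
TYPE_MANAGED = "type_managed"


def decide(dependency_type: str, blocking_states: Iterable[str]) -> str:
    """Return the action allowed for the dependent timeperiod."""
    states = list(blocking_states)
    all_processed = all(s == STATE_PROCESSED for s in states)

    if dependency_type == TYPE_BLOCKING_DEPENDENCIES:
        if any(s == STATE_SKIPPED for s in states):
            return "skip"  # a skipped blocker propagates to the dependent
        return "process" if all_processed else "wait"

    if dependency_type == TYPE_BLOCKING_CHILDREN:
        # Assumption: skipped children count as finished.
        finished = (STATE_PROCESSED, STATE_SKIPPED)
        return "process" if all(s in finished for s in states) else "wait"

    if dependency_type == TYPE_MANAGED:
        # Processing is always allowed; finalization needs processed blockers.
        return "process_and_finalize" if all_processed else "process_only"

    raise ValueError(f"unknown dependency type: {dependency_type!r}")


print(decide(TYPE_BLOCKING_DEPENDENCIES, [STATE_PROCESSED, STATE_SKIPPED]))
# prints: skip
print(decide(TYPE_MANAGED, [STATE_IN_PROGRESS]))
# prints: process_only
```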