mushkevych edited this page Dec 3, 2011 · 22 revisions

The Scheduler is a state machine that operates on the concept of a timeperiod.

Timeperiods

A timeperiod can be one of:

  • hourly
  • daily
  • monthly
  • yearly

For example:

  • Statistics (number of page views, number of unique visitors, etc.) for a particular domain a.com, covering 10:00:00 to 10:59:59 on 1 Jan 2011, make up an hourly timeperiod.
  • Statistics aggregated from 00:00:00 to 23:59:59 on 1 Jan 2011 make up a daily timeperiod.
  • Statistics aggregated from 00:00:00 on 1 Jan 2011 to 23:59:59 on 31 Jan 2011 make up a monthly timeperiod.
  • Statistics for the whole year make up a yearly timeperiod.
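The boundaries in the examples above can be computed mechanically. The following is a minimal sketch, not the Scheduler's own code: the function name `period_bounds` and the string qualifiers are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def period_bounds(qualifier, moment):
    """Return (start, end) of the timeperiod that contains `moment`.

    `end` is inclusive and given to the second, matching the
    10:00:00 .. 10:59:59 style used in the examples.
    NOTE: hypothetical helper, not part of the Scheduler API.
    """
    if qualifier == "hourly":
        start = moment.replace(minute=0, second=0, microsecond=0)
        next_start = start + timedelta(hours=1)
    elif qualifier == "daily":
        start = moment.replace(hour=0, minute=0, second=0, microsecond=0)
        next_start = start + timedelta(days=1)
    elif qualifier == "monthly":
        start = moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # Roll over to January of the next year after December.
        next_start = start.replace(year=start.year + start.month // 12,
                                   month=start.month % 12 + 1)
    elif qualifier == "yearly":
        start = moment.replace(month=1, day=1, hour=0, minute=0,
                               second=0, microsecond=0)
        next_start = start.replace(year=start.year + 1)
    else:
        raise ValueError("unknown qualifier: %r" % (qualifier,))
    return start, next_start - timedelta(seconds=1)
```

For instance, `period_bounds("hourly", datetime(2011, 1, 1, 10, 30))` returns the pair 2011-01-01 10:00:00 and 2011-01-01 10:59:59, which is the hourly timeperiod from the first example.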

Verticals

The Scheduler organizes timeperiods in tree-like structures: root <- yearly periods <- monthly periods <- daily periods <- hourly periods

A level of the tree is considered complete only if all of its nested timeperiods are in either STATE_PROCESSED or STATE_SKIPPED.

For example: since a daily period nests 24 hourly periods, all 24 of them must complete before the daily period can be declared complete.
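The completeness rule can be sketched as a single predicate. The state names STATE_PROCESSED and STATE_SKIPPED come from this page; their string values, the extra STATE_IN_PROGRESS state and the function name `is_complete` are assumptions made for illustration only.

```python
# State names follow this page; the string values are assumed.
STATE_PROCESSED = "state_processed"
STATE_SKIPPED = "state_skipped"
STATE_IN_PROGRESS = "state_in_progress"  # hypothetical non-final state

FINAL_STATES = (STATE_PROCESSED, STATE_SKIPPED)


def is_complete(nested_states):
    """Return True if every nested timeperiod is processed or skipped.

    `nested_states` is an iterable with the state of each nested
    timeperiod, e.g. the 24 hourly periods of one daily period.
    NOTE: an empty iterable yields True; a real implementation may
    want to treat that case differently.
    """
    return all(state in FINAL_STATES for state in nested_states)


# A day whose 24 hourly periods are all processed is complete.
hourly_states = [STATE_PROCESSED] * 24
day_is_complete = is_complete(hourly_states)

# One hour still in progress keeps the day open.
hourly_states[13] = STATE_IN_PROGRESS
day_still_open = not is_complete(hourly_states)
```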

Trees can have the following number of levels:

  • 4-level tree: hosts yearly, monthly, daily and hourly timeperiods
  • 3-level tree: hosts yearly, monthly and daily timeperiods
  • 2-level tree: hosts daily timeperiods only, plus a virtual "root" level to keep the tree-like structure

The tree presented above represents a vertical. The term vertical emphasizes the downward dependency: yearly periods depend on monthly ones; monthly on daily; daily on hourly.
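As an illustration of the downward dependency, each tree shape can be written as an ordered tuple of levels, top to bottom. This is only a sketch: the constant names and the `depends_on` helper are hypothetical and do not come from the Scheduler.

```python
# Levels listed from the top of the tree to the bottom.
FOUR_LEVEL = ("yearly", "monthly", "daily", "hourly")
THREE_LEVEL = ("yearly", "monthly", "daily")
TWO_LEVEL = ("root", "daily")  # virtual root plus daily timeperiods


def depends_on(levels, qualifier):
    """Return the level directly below `qualifier`, or None for the lowest.

    A timeperiod on `qualifier` level waits for its nested timeperiods
    on the returned level.
    """
    index = levels.index(qualifier)  # raises ValueError if not in this tree
    if index + 1 < len(levels):
        return levels[index + 1]
    return None
```

With these definitions, `depends_on(FOUR_LEVEL, "daily")` gives `"hourly"`, while `depends_on(THREE_LEVEL, "daily")` gives `None` because daily is the lowest level of a 3-level tree.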

Processes & Timeperiods

Each level in the tree is supposed to be managed by a separate process/aggregator. For example: site hourly statistics are handled by "site_hourly_aggregator", site daily statistics by "site_daily_aggregator", etc.

Dependencies between verticals

It is common for verticals to have dependencies. For example: to calculate Revenue Per Click, we need two numbers: the number of clicks from the site vertical and the revenue from the financial vertical. Only with both in place can we compute Revenue Per Click.

Dependencies are registered in the Scheduler and are specific to a timeperiod qualifier. So, daily timeperiods from the site vertical can depend on daily timeperiods from the financial vertical, but hourly timeperiods of traffic statistics cannot block daily timeperiods from another vertical.
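One way to picture qualifier-specific registration is a lookup keyed by the pair (vertical, qualifier). The class below is a hypothetical sketch, not the Scheduler's actual registration API; the class and method names are assumptions.

```python
from collections import defaultdict


class DependencyRegistry(object):
    """Hypothetical registry of dependencies between verticals."""

    def __init__(self):
        # (dependent vertical, qualifier) -> set of blocking verticals
        self._blocking = defaultdict(set)

    def register(self, dependent, blocking, qualifier):
        """Make `dependent` timeperiods of `qualifier` depend on the
        `blocking` vertical's timeperiods of the same qualifier."""
        self._blocking[(dependent, qualifier)].add(blocking)

    def blocking_verticals(self, vertical, qualifier):
        """Return the verticals that `vertical` depends on at `qualifier`."""
        return set(self._blocking.get((vertical, qualifier), ()))


registry = DependencyRegistry()
# Daily site timeperiods depend on daily financial timeperiods.
registry.register("site", "financial", "daily")
```

Because the key includes the qualifier, `registry.blocking_verticals("site", "hourly")` is empty: the daily registration has no effect on other levels.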

Dependencies can be either:

  • blocking: any processing of the dependent timeperiod is blocked until the blocking timeperiods are processed. A special case is when one of the blocking timeperiods is in STATE_SKIPPED: the dependent timeperiod is then also moved to STATE_SKIPPED
  • regular: processing of the dependent timeperiod is allowed, but its finalization is not allowed until the blocking timeperiod is in STATE_PROCESSED
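The difference between the two kinds of dependency can be sketched as a decision function. Everything here except the names STATE_PROCESSED and STATE_SKIPPED is an assumption for illustration: the string values, the `Decision` tuple, the `"blocking"`/`"regular"` labels and the function name `evaluate`.

```python
from collections import namedtuple

# State names follow this page; the string values are assumed.
STATE_PROCESSED = "state_processed"
STATE_SKIPPED = "state_skipped"
STATE_IN_PROGRESS = "state_in_progress"  # hypothetical non-final state

# What the dependent timeperiod is allowed to do right now.
Decision = namedtuple("Decision", "can_process can_finalize must_skip")


def evaluate(kind, blocking_states):
    """Decide the fate of a dependent timeperiod.

    `kind` is "blocking" or "regular"; `blocking_states` lists the
    states of the timeperiods it depends on.
    """
    states = list(blocking_states)
    all_processed = all(s == STATE_PROCESSED for s in states)
    any_skipped = any(s == STATE_SKIPPED for s in states)

    if kind == "blocking":
        if any_skipped:
            # A skipped blocking timeperiod skips the dependent one too.
            return Decision(False, False, True)
        # No processing at all until every blocking timeperiod is processed.
        return Decision(all_processed, all_processed, False)
    if kind == "regular":
        # Processing may start, but finalization waits for STATE_PROCESSED.
        return Decision(True, all_processed, False)
    raise ValueError("unknown dependency kind: %r" % (kind,))
```

For example, with a blocking timeperiod still in progress, a blocking dependency yields `Decision(False, False, False)` while a regular one yields `Decision(True, False, False)`.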