status | flip | authors | sponsor | updated |
---|---|---|---|---|
implemented | 204 (set to the issue number) | Jordan Schalm ([email protected]) | Jordan Schalm ([email protected]) | 2023-10-03 |
- Increase robustness of Cruise Control System
- Create an explicit target time for epoch switchover, defined by the service account
Cruise Control: Automated Block Rate & Epoch Timing (Design) defines an existing system for controlling system block production to achieve a target block rate, in turn to achieve a target epoch switchover time. This system has been deployed on Mainnet since May.
At the time of writing, the target epoch switchover time is inferred based on a baked-in assumption of week-long epochs, and a configurable weekly switchover time. Therefore, each node’s Process Variable (switchover time) is determined by a heuristic, which has several downsides:
- In extreme edge cases (timing off by several days), different nodes may disagree about the target Process Variable value.
- Networks with different-length epochs (Canary, Testnet) cannot use Cruise Control at all.
- Chain-queriable target epoch switchover time
- Increased robustness of Cruise Control System (reliable and consistent epoch and block timing)
- The `FlowEpoch` smart contract determines and broadcasts a `TargetEndTime` for each epoch, within the `EpochSetup` event.
  - (For informational purposes, we will also include this time in the `EpochStart` event)
- The `cruisectl.BlockTimeController` component reads this `TargetEndTime` and uses it as the Process Variable value for its PID controller, rather than the current heuristic method.
Below are two options for how to configure and compute the `TargetEndTime`. Overall the author is in favour of Option 2.
The configuration consists only of the epoch duration. Each epoch’s `TargetEndTime` is computed based on a reference time/view pair obtained via the `getBlock` API.
```cadence
pub struct EpochTimingConfig {
    duration: UInt64 // in seconds
}
```
Approach 1.1:

```cadence
// Compute the target switchover time based on the current time/view.
// Invoked when transitioning into the EpochSetup phase.
pub fun getTargetEndTimeForEpoch(
    currentBlock: Block,
    epoch: EpochMetadata,
    config: EpochTimingConfig
): UInt64 {
    let now = currentBlock.timestamp
    let viewsToEpochEnd = epoch.finalView - currentBlock.view
    // Fraction of the epoch still remaining, scaled by the configured duration.
    let estSecondsToNextEpochEnd = UFix64(viewsToEpochEnd) / UFix64(epoch.lengthInViews) * UFix64(config.duration)
    // The target end time is the current time plus the estimated remaining time.
    return UInt64(now + estSecondsToNextEpochEnd)
}
```
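As a worked example of the view-proportional estimate, the Go sketch below reproduces the arithmetic with made-up numbers. It assumes the intent is to add the estimated remaining seconds to the current timestamp to obtain an absolute time, and that the current view does not exceed the final view. The function and parameter names are illustrative.

```go
package main

import "fmt"

// estimateTargetEndTime scales the fraction of the epoch's views still
// remaining by the configured duration and adds it to the current time.
// Assumes curView <= finalView; all names here are hypothetical.
func estimateTargetEndTime(now, curView, finalView, lengthInViews, duration uint64) uint64 {
	viewsToEpochEnd := finalView - curView
	secondsToEnd := float64(viewsToEpochEnd) / float64(lengthInViews) * float64(duration)
	return now + uint64(secondsToEnd)
}

func main() {
	// Made-up values: a 1000-view epoch ending at view 5000, 250 views left,
	// and a one-week (604800 s) configured duration.
	end := estimateTargetEndTime(1_700_000_000, 4_750, 5_000, 1_000, 604_800)
	fmt.Println(end) // 1700151200
}
```

A quarter of the epoch remains, so the estimate is a quarter of a week (151200 s) after the current timestamp.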
Approach 1.2:

```cadence
// Memorize the end time of each epoch.
// Invoked when transitioning into a new epoch.
pub fun memorizeEpochEndTime(currentBlock: Block, epoch: EpochMetadata) {
    // Stored as whole seconds so it can be combined with UInt64 values below.
    epoch.endedAt = UInt64(currentBlock.timestamp)
}

// Compute the switchover time based on the last memorized reference timestamp.
pub fun getTargetEndTimeForEpoch(
    refEpoch: EpochMetadata,
    targetEpochCounter: UInt64,
    config: EpochTimingConfig
): UInt64 {
    return refEpoch.endedAt + config.duration * (targetEpochCounter - refEpoch.counter)
}
```
- Simpler configuration
- Does not require manual config changes to account for a durable switchover timing change.
- Drift can accumulate over time
- Approach 1.2 does not work well with the `resetEpoch` process, as that involves an epoch transition at a non-target time
- Depends on block time API
- Approach 1.2 requires additional storage/logic in the smart contract
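The drift concern can be illustrated with a small Go simulation. It is only a sketch: it assumes every epoch ends a fixed number of seconds after its target, and that each new target is derived from the previous epoch's actual end time (as in Approach 1.2). The names and numbers are made up.

```go
package main

import "fmt"

const duration = 604_800 // one week in seconds (illustrative)

// chainedTargets derives each target from the previous epoch's actual end
// time plus one duration. lateness[i] is how many seconds late epoch i ends.
func chainedTargets(firstEnd uint64, lateness []uint64) []uint64 {
	targets := make([]uint64, 0, len(lateness))
	prevActualEnd := firstEnd
	for _, late := range lateness {
		target := prevActualEnd + duration
		targets = append(targets, target)
		prevActualEnd = target + late // actual end is `late` seconds after target
	}
	return targets
}

func main() {
	const firstEnd = 1_700_000_000
	targets := chainedTargets(firstEnd, []uint64{60, 60, 60})
	for i, t := range targets {
		// Ideal schedule anchored to the first end time, with no drift.
		ideal := uint64(firstEnd) + uint64(i+1)*duration
		fmt.Printf("epoch +%d: drift %ds\n", i+1, t-ideal)
	}
}
```

Each 60-second overrun shifts every later target, so the third target is already 120 seconds behind a schedule anchored to a fixed reference.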
The configuration consists of the epoch duration and a reference counter/end-time pair. Each epoch’s `TargetEndTime` is computed solely based on the target epoch’s counter, the reference counter/end-time pair, and the duration.
```cadence
pub struct EpochTimingConfig {
    duration: UInt64     // in seconds
    refCounter: UInt64   // the counter of a reference epoch
    refTimestamp: UInt64 // the UNIX timestamp (UTC) at which refCounter ended
}

// Compute target switchover time based on offset from reference counter/switchover.
pub fun getTargetEndTimeForEpoch(
    targetEpochCounter: UInt64,
    config: EpochTimingConfig
): UInt64 {
    return config.refTimestamp + config.duration * (targetEpochCounter - config.refCounter)
}
```
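For readers on the `flow-go` side, the same computation can be sketched in Go. The Go type and method names below are illustrative, not an existing API, and the guard against a target counter below the reference counter is an added assumption (unsigned subtraction would otherwise wrap around).

```go
package main

import (
	"errors"
	"fmt"
)

// EpochTimingConfig mirrors the Option 2 configuration.
// The Go field names are illustrative.
type EpochTimingConfig struct {
	Duration     uint64 // epoch duration in seconds
	RefCounter   uint64 // counter of the reference epoch
	RefTimestamp uint64 // UNIX time (UTC) at which RefCounter ended
}

// TargetEndTime returns the target end time of the epoch with the given counter.
func (c EpochTimingConfig) TargetEndTime(targetCounter uint64) (uint64, error) {
	if targetCounter < c.RefCounter {
		return 0, errors.New("target epoch precedes reference epoch")
	}
	return c.RefTimestamp + c.Duration*(targetCounter-c.RefCounter), nil
}

func main() {
	// Made-up reference point: epoch 100 ended at 1700000000, epochs last one week.
	cfg := EpochTimingConfig{Duration: 604_800, RefCounter: 100, RefTimestamp: 1_700_000_000}
	end, err := cfg.TargetEndTime(103)
	fmt.Println(end, err) // 1701814400 <nil>
}
```

Epoch 103 is three epochs past the reference, so its target is exactly three weeks after the reference timestamp, regardless of when the intermediate epochs actually ended.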
- Simple computation
- Drift cannot accumulate over time
- Does not use block time API
- Compatible with `resetEpoch` process
- More complex configuration specification
- Requires manual config changes for durable switchover time changes
See also ZenHub epic.
- Add `targetEndTime` field to `EpochSetup` event, `EpochStart` event
  - CAUTION: This must be added as the last field to maintain backward-compatibility
  - NOTE: Unfortunately, since we cannot modify existing structs, we cannot add this field to `EpochMetadata`
- Add config for determining `targetEndTime` to smart contract `ConfigMetadata`
- Add logic to compute `targetEndTime` to `startEpochSetup`
- Add function for service account to adjust new config
- Testing
  - Validate field is set as expected
  - Validate field is computed correctly
  - Validate setter/getter for new config values
- Add `TargetEndTime` field to `EpochSetup` event, `Epoch` API
- Update `EpochSetup` service event conversion function
  - Read `TargetEndTime` field
  - Ensure conversion is backward-compatible
- Update `cruisectl.BlockRateController`
  - Remove `EpochTransitionTime` inference heuristic
  - Replace `EpochTransitionTime` with `time.Duration`, retrieved from `EpochSetup` event
- Add mechanism to set `TargetEndTime` in bootstrapping/sporking process
  - Comment: currently Cruise Control is disabled by default.
  - Option 1: Add an optional flag to explicitly specify the desired epoch duration (seconds). We can compute a reference counter/timestamp.
  - Option 2: Compute an initial `duration` config value, based on the committee size, epoch length, and expected view rate.
- Update network instantiation
  - Default value for deploying `FlowEpoch`
- Prepare patch which makes `EpochSetup` parsing forward-compatible (accept events with the extra `TargetEndTime` field, but discard it).
- Rolling upgrade all nodes to this patch version.
- Upgrade `FlowEpoch` contract.
- Deploy `flow-go` version implementing FLIP 204.
- Since we have already upgraded the contract, no backward-compatibility in `flow-go` code is required (besides the patch above, which will remain on the `mainnet24` branch)
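The forward-compatible parsing step can be pictured with the following Go sketch. It is purely illustrative and does not reflect the actual `flow-go` event conversion code: it assumes an event arrives as a positional list of fields, and the struct, function, and field layout are hypothetical. It also shows why a new field is safest when appended last.

```go
package main

import (
	"errors"
	"fmt"
)

// setupEvent holds only the fields an older node version understands.
// This two-field layout is a hypothetical simplification.
type setupEvent struct {
	Counter   uint64
	FinalView uint64
}

// parseSetup reads the known leading fields and discards any trailing
// fields it does not recognise, such as a newly appended target end time.
func parseSetup(fields []uint64) (setupEvent, error) {
	const knownFields = 2
	if len(fields) < knownFields {
		return setupEvent{}, errors.New("too few fields in event")
	}
	return setupEvent{Counter: fields[0], FinalView: fields[1]}, nil
}

func main() {
	// Old-format event and new-format event with one extra trailing field.
	oldEv, _ := parseSetup([]uint64{7, 5_000})
	newEv, _ := parseSetup([]uint64{7, 5_000, 1_700_604_800})
	fmt.Println(oldEv == newEv) // true
}
```

Because the parser only relies on the positions of the fields it knows, an extra trailing field leaves the decoded result unchanged.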
This proposal assumes that the service account is more reliable than the heuristic currently in use for determining target epoch switchover times. Many system functions more critical than Cruise Control already depend on correct operation of the service account. However, Cruise Control will become susceptible to faults in the service account's defined switchover time, rather than faults in the current heuristic.
See 2 options above.
None anticipated.
There are internal dependencies that will need to be updated in lockstep as part of this FLIP. No external dependencies.
- Do you expect changes to binary size / build time / test times?
- Who will maintain this code? Is this code in its own buildable unit? Can this code be tested in its own? Is visibility suitably restricted to only a small API surface for others to use?
N/A.
N/A.
The change will not break compatibility.
N/A.
N/A.
N/A.
- Do you prefer Option 1 or Option 2?
- Do you foresee any significant hurdles beyond those outlined in the Implementation Plan?
In Option 1.1 and 2, the timing of a particular epoch transition does not affect the target timing for other epochs. Therefore, the `TargetEndTime` computation of an epoch during a spork will not behave differently from any other epoch.
The time information is specified in the `EpochSetup` event, which occurs partway through the current epoch. If we specified a `TargetStartTime`, then the PID Controller’s Process Variable would have an undefined value for part of the epoch, and the Cruise Control system would be unable to function. On Mainnet, this corresponds to about 90% of the duration of an epoch.