Run loop additional metrics #2888
Conversation
Some important metrics are missing imo:
- Time before competition is started
- Roundtrip time for reveal
- Time for all the persistence work we do before calling settle (maybe extract into its own method to keep single_run smaller)
- Roundtrip time for settle
Once all those metrics are in place, can you add them to the GPv2 Grafana dashboard or create a new dashboard specifically for the autopilot runloop performance?
Also, I think it would be useful to measure the time compared to the current block's timestamp (so that we can also measure the delay with which our run loop starts). So the stats we should have are:
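To make the suggestion concrete, here is a minimal std-only sketch of the two kinds of measurements being asked for: a round-trip timer for a stage (e.g. reveal or settle) and the delay of a run relative to the block timestamp. The function name `run_loop_delay` is illustrative, not from the PR.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

// Illustrative helper: how far behind the current block's timestamp
// (unix seconds) this run-loop iteration started.
fn run_loop_delay(block_timestamp_secs: u64) -> Duration {
    let now_secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before unix epoch")
        .as_secs();
    Duration::from_secs(now_secs.saturating_sub(block_timestamp_secs))
}

fn main() {
    // Round-trip timing for a stage, e.g. the reveal or settle call:
    let start = Instant::now();
    // ... stage work would happen here ...
    let elapsed = start.elapsed();
    assert!(elapsed < Duration::from_secs(1));

    // Delay of this iteration relative to a block mined just now is ~0:
    let block_ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_secs();
    assert!(run_loop_delay(block_ts) <= Duration::from_secs(1));
    println!("ok");
}
```

In the real code each `elapsed` value would be recorded into the corresponding prometheus histogram rather than asserted on.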
# Conflicts: # crates/autopilot/src/solvable_orders.rs
Updated everything including the PR description.
Can be a separate PR, but I'd still be interested in measuring the delay with which we even start a single run compared to the timestamp on the block.
crates/autopilot/src/run_loop.rs
Outdated
```
@@ -316,6 +322,7 @@ impl RunLoop {
        Metrics::fee_policies_store_error();
        tracing::warn!(?err, "failed to save fee policies");
    }
    Metrics::competition_stored(start.elapsed());
```
Maybe we can move everything between the first start and this into a post_processing method to keep single_run a bit more readable?
You mean the actual code, not the metrics, right? I will open a separate PR since it would be easier to read the actual changes.
Added this also.
Run loop instrumentation looks alright. But more metrics on the details of building the auction would be nice.
```
@@ -304,6 +308,9 @@ impl SolvableOrdersCache {
    };

    tracing::debug!(%block, "updated current auction cache");
    self.metrics
```
Given that we know this takes a significant amount of time, we could already add metrics for the individual stages of the auction building.
I assume most of the time will likely be spent on the DB query but there will probably also be outliers in the individual steps that need to be ironed out.
Added a HistogramVec for individual update stages except solvable order fetching since we already have a separate DB metric for this.
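The per-stage timing being described can be sketched without the prometheus crate. This std-only sketch shows the shape of it: one timer keyed by a stage label, analogous to a `HistogramVec` with a `stage` label. The type and stage names here are illustrative, not the metric names from the PR.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative stand-in for a prometheus HistogramVec keyed by stage.
#[derive(Default)]
struct StageTimer {
    observations: HashMap<&'static str, Vec<Duration>>,
}

impl StageTimer {
    // Run `f`, record its wall-clock duration under `stage`, return its result.
    fn observe<T>(&mut self, stage: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let result = f();
        self.observations
            .entry(stage)
            .or_default()
            .push(start.elapsed());
        result
    }
}

fn main() {
    let mut timer = StageTimer::default();
    // Each auction-update stage is wrapped in its own observation:
    let orders = timer.observe("db_fetch", || vec![1, 2, 3]);
    let filtered = timer.observe("filtering", || {
        orders.into_iter().filter(|o| *o > 1).collect::<Vec<_>>()
    });
    assert_eq!(filtered, vec![2, 3]);
    assert_eq!(timer.observations.len(), 2);
    println!("ok");
}
```

With a real `HistogramVec`, `observe` would call `with_label_values(&[stage]).observe(secs)` instead of pushing into a map.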
```
    &self,
    auction_id: domain::auction::Id,
    auction: &domain::Auction,
    init_block_timestamp: u64,
```
nit: If this variable is only passed in to trigger the metric, we could probably also simply do it where we call single_run
To fetch the latest block timestamp, or what exactly? That would work as long as the auction update function takes less than one round. For Arbitrum, this is not the case.
Description
Changes
Achieves the following:
- Added the `single_run_delay` metric.
- Added the `auction_preprocessing_time` metric.
- Added the `solve` metrics (by calculating the MAX value among solvers for the last N seconds).
- The `reveal` metric wasn't used, so I updated its type to `Histogram`.
- Added the `auction_postprocessing_time` metric. Now it records individual elapsed times, and the existing queries should continue to work after appending the `_sum` postfix to the metric name.

Also, I added `single_run_time` and `auction_update_time` metrics for better visibility, to avoid accumulating all the values from different sources. The total round trip can then be calculated as `single_run_time` + `auction_update_time`. I used separate `Histogram` metrics since it might not be suitable to have all of them on a single panel due to the different value ranges (from milliseconds to 5-10 seconds or so).
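As a rough illustration, the combined round trip described above could be charted in Grafana with a query along these lines. This is a hypothetical PromQL sketch assuming the two histograms are exported with the usual `_sum`/`_count` series; it plots the sum of the average durations over a 5-minute window:

```promql
  rate(single_run_time_sum[5m]) / rate(single_run_time_count[5m])
+ rate(auction_update_time_sum[5m]) / rate(auction_update_time_count[5m])
```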
How to test
The plan was to deploy a temp image on staging.
Related Issues
Fixes #2859