-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New database tables for auction and solver competition #2980
Conversation
Reminder: Please update the DB Readme. Caused by: |
I left a comment on some google doc, but it seems to be relevant here as well. The discussion was around redesigning the settlement_scores table to compute rewards:
|
-- Not NULL for winning orders. | ||
deadline bigint, | ||
|
||
-- Order details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these needed? Given we store trades for all orders (jit included), do we need to store this information? Also, I think the order uid commits to those values (so you wouldn't be able to change the limit price for a fixed order uid and it can just be read from the settlement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to store this information? I think the order uid commits to those values
Ah I see, yes, order_uid
is actually a checksum for OrderData
. If the surplus capturing surplus JIT order is promised, there are only three outcomes in circuit breaker:
- Solution not delivered at all - violation.
- Solution delivered but it doesn't contain promised order_uid of JIT order - violation
- Solution delivered with the same order uid - then just compare the executed amounts to check if the prices are >= promised.
@fhenneke will circuit breaker require to calculate the score of promised (non-winning) solutions (for reference scores for example)? If not, then we can remove this order data and get them from orders
table and jit_orders
table if needed. But note the comment 👇 that I would rather not save the scores themselves but instead I would save the input for calculating score or whichever scoring criteria we use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, after meeting with @fhenneke we concluded that we need this data for all proposed solutions (which include surplus capturing JIT orders and potentially regular JIT orders in the future) and also it makes sense to store this data to show it on the solver competition endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For regular orders this data is already available in a different table (so basic SQL normalisation theory says we shouldn't duplicate). For jit orders this may not be the case. Can you explain a bit more detailed why we need this information for jit orders even if there ends up not being a settlement observation?
Note that the current competition endoint exposes order ids and executed amounts, nothing more: https://api.cow.fi/mainnet/api/v1/solver_competition/by_tx_hash/0xe987ca2672c8330398750c73e38ed6375c3e18b29172b806cfc2d66f33eaaf0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain a bit more detailed why we need this information for jit orders even if there ends up not being a settlement observation?
If we have two solutions, and one of them is declared a winner and allowed to settle, and IF the rewards scheme stays the same (difference between winning score and reference score), then we need all data of a second best solution (which might contain surplus capturing JIT order) available at postprocessing time, so that we could calculate the score of it to use it as a reference score.
And all of this stands because we don't want to save score as a field into database. We want to save solutions (input for ranking and everything else) and not scores (output of ranking) into db, for greater flexibility in the future. By flexibility I mean being possible to change the scoring rules of the protocol without changing the db scheme.
I think it is good to have more data available on orders to compute surplus and trade directions. One thing which would need to be added is scores of all solutions. Some of the reward mechanisms might even require scores per order. We can either store scores directly or make enough information available to compute scores ourselves. |
-- The block number until which the order should be settled. | ||
-- Not NULL for winning orders. | ||
deadline bigint, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deadline
is a property of the auction
and not the order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deadline is also optional to indicate if a solution is chosen as a winner or not.
Would you rather have:
- Deadline as property of auction and non-optional. Then, another property
winner: bool
on each solution to indicate if winner. - Deadline as a property of solution and optional so that it indicates if winner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have less implicit state so deadline
as required on auction
.
This could then lead to a separate table:
solutions
- id (populated by sequence)
- auction_id
- solver
- solution_id
- is_winner
And then auction_solution_orders
(I prefer the name proposed_order_executions
) could reference the solutions.id
.
Not sure if this would actually result in good queries since a shared key of multiple properties might be easier to use than solutions.id
but that would at least have some logical consistency where:
- auction has solutions
- solution contains orders
OTOH do we need to store who the winner is? Assuming there is no bug we should be able to reconstruct who (should have) won with all the data we have, no? Just a conceptual question as it probably doesn't make sense to cheap out on a bool here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming there is no bug we should be able to reconstruct who (should have) won with all the data we have, no?
The criteria might change even on each restart of the autopilot (for example, if we enable/disable multiple winners feature several times). In this case we would need to know who was supposed to settle solutions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored the tables before you posted a comment. Can you check if it's more acceptable now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The criteria might change even on each restart of the autopilot (for example, if we enable/disable multiple winners feature several times). In this case we would need to know who was supposed to settle solutions.
Alternative to this is to also save the information "which type of competition" was executed for each auction. With this, we would have an input and would be able to determine the winners so "is_winner" would not be needed.
I'd go with more general approach of storing enough information to compute whatever scoring criteria we use.
We have that in
We WILL have that in |
-- Not NULL for winning orders. | ||
deadline bigint, | ||
|
||
-- Order details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For regular orders this data is already available in a different table (so basic SQL normalisation theory says we shouldn't duplicate). For jit orders this may not be the case. Can you explain a bit more detailed why we need this information for jit orders even if there ends up not being a settlement observation?
Note that the current competition endoint exposes order ids and executed amounts, nothing more: https://api.cow.fi/mainnet/api/v1/solver_competition/by_tx_hash/0xe987ca2672c8330398750c73e38ed6375c3e18b29172b806cfc2d66f33eaaf0d
# Description Currently fee policies are saved only for winning solution. This PR saves fee policies for all auction orders. This is needed for at least two reasons: 1. As discussed [in the PR](#2980 (comment)), fee policies will be needed for all proposed solutions during a competition so that the score could be reconstructed in circuit breaker. 2. [For historical get_auction](#2844).
This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed. |
pub uid: i64, | ||
// Id as reported by the solver (solvers are unaware of how other solvers are numbering their | ||
// solutions) | ||
pub id: i64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Id is a string on the API level. It just happens to be that all solvers currently report a number.
It's probably okay to make this an integer on the API level but that has to be adjusted and communicated first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id
is not supposed to replace solver name on the solver_competition API. It's here for completness of saving the whole solution object but not necessary for functionality to work. We can remove it as well if we are sure we won't need it.
I think we agreed somewhere that solver name is something we don't care too much about. We have a solver address which is supposed to uniquely identify the solver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the solver name has to do with this. This id
is the id
that solvers return for each individual solution, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Id is a string on the API level
I thought you referred to solver name here.
This id is the id that solvers return for each individual solution, right?
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then the point still stands that the ID is currently a string on the API level and only an integer by convention. If we want to store it in the DB I think we should make sure the data types align and make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Will add an issue to switch ID to being u64 as it used that way in both driver and autopilot domains.
@@ -0,0 +1,66 @@ | |||
-- All auctions ran by autopilot | |||
CREATE TABLE competition_auctions ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to avoid these composite table names if possible as I think they mostly cause confusion.
I'd say this should be called auctions
and the current auctions table would become current_auction
(singular as it's supposed to only store a single row at all times).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to avoid touching existing code with this PR. Table renaming is particularly risky, and even though I initially wanted to do renaming, I went with defining a new name after all.
And if you assume that, it's really hard to figure out a new name for this table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then at least make sure the tables are properly renamed when we finalize this refactor and remove the old tables.
@@ -0,0 +1,66 @@ | |||
-- All auctions ran by autopilot | |||
CREATE TABLE competition_auctions ( | |||
id bigint PRIMARY KEY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are supposed to use identity columns to have the DB automatically generate these unique values for us.
id bigint PRIMARY KEY, | |
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand it looks neat to use DB generated ID, but why would we use lock us in with using it?
Besides, right now auctions
and competition_auctions
need to be aligned with ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, right now auctions and competition_auctions need to be aligned with ids.
Sorry, the comment was supposed to be onproposed_solutions.id
.
I understand it looks neat to use DB generated ID, but why would we use lock us in with using it?
For Ids that have no other purpose than being unique and identifying rows I think it makes the most sense to let the DB make sure that things are unique instead of relying on domain code that can have bugs in that regard. Since we don't expect any additional information in the ID any value is as good as any other value as long as it's unique so why should we bother with maintaining that uniqueness ourselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Ids that have no other purpose than being unique and identifying rows
But this is not actually true in this case right? Auction id is read directly by client and used to fetch data from other tables etc. It's not like it's inserted only once and never used by client but only by database internally to join on other tables etc.
But anyway, in this case we have to go with client defined Ids because of:
right now auctions and competition_auctions need to be aligned with ids.
.enumerate() | ||
.map(|(uid, participant)| { | ||
let solution = Solution { | ||
uid: uid.try_into().context("uid overflow")?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The db docs made it seem like this uid
is supposed to be globally unique, which I think is a nicer property than just having it be the index within one auction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unique id of the proposed solution within a single auction
this is in the docs
Global uniqueness is not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the current state, I don't see any blockers.
ON CONFLICT (auction_id, solution_uid, order_uid) DO NOTHING | ||
"#; | ||
|
||
sqlx::query(QUERY_JIT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this in one roundtrip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
God is my witness I tried but couldn't come up with a query that would properly handle WHERE NOT EXISTS
part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all my comments got addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another round, LGTM! nice PR!
Description
Fixes #2979
Fixes #3021
Database changes internal discussion: https://www.notion.so/cownation/Database-26th-September-2024-10d8da5f04ca801ab087f00f6a6d608f
@fhenneke would this be appropriate design change for you?
Update 02 Oct 2024
This PR proposes new tables that should eventually replace
solver_competitions
table and also give enough information for core protocol and external tools to reconstruct the auction and competition for historical entries.Plan to execute:
services
repo. From now on, old tables are no longer used by backend.Changes