New database tables for auction and solver competition #2980

Merged
sunce86 merged 46 commits into main from auction-winners-table
Oct 17, 2024

Conversation

Contributor

@sunce86 sunce86 commented Sep 12, 2024

Description

Fixes #2979
Fixes #3021

Database changes internal discussion: https://www.notion.so/cownation/Database-26th-September-2024-10d8da5f04ca801ab087f00f6a6d608f

@fhenneke would this be an appropriate design change for you?

Update 02 Oct 2024

  • This PR proposes new tables that should eventually replace solver_competitions table and also give enough information for core protocol and external tools to reconstruct the auction and competition for historical entries.

  • Plan to execute:

  1. Create new tables (implemented in this PR)
  2. Start populating new tables (implemented in this PR)
  3. Do a one-time migration of data from existing tables (solver_competitions, settlement_scores, etc.) into the new tables.
  4. Start using new tables instead of old tables in services repo. From now on, old tables are no longer used by backend.
  5. Give time to solver team, frontend team etc to switch to new tables
  6. Cleanup - remove one time migration code, remove old tables, remove code that was updating old tables etc.

Changes

  • Defines tables for auction and proposed solutions
  • Populates new tables from autopilot

@sunce86 sunce86 added the E:6.2 Time to Happy Moo See https://github.com/cowprotocol/pm/issues/77 for details label Sep 12, 2024
@sunce86 sunce86 self-assigned this Sep 12, 2024

github-actions bot commented Sep 12, 2024

Reminder: Please update the DB Readme.


Caused by:

@fhenneke

I left a comment on some google doc, but it seems to be relevant here as well. The discussion was around redesigning the settlement_scores table to compute rewards:

A competition table with the following columns would be compatible with all variants of the combinatorial auction mechanism we are currently thinking about:
auction_id, solution_id, solver_address, solution, deadline

where solution should be equivalent to (but need not be the same as) the first part of call data (i.e. tokens, prices, trades) with additional data in other tables (e.g. an auctions table with historic data) on native prices and protocol fees to reconstruct scores.

The deadline would be set if the solution is selected as winning.
Additionally, there would be some indexing of solutions on chain (tx_hash, auction_id, solution_id). Actual observations on fees or surplus are not required. Checking that solutions are valid would be done by a circuit breaker.

Some example code we might base experiments on: https://github.com/fhenneke/comb_auctions
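
As an illustration of the proposed columns (not taken from the repository linked above), a row of such a competition table could be modelled roughly as follows. The concrete Rust types, the opaque `solution` bytes, and the helper `winners` are assumptions made for this sketch, not the final schema.

```rust
/// One row of the hypothetical competition table described above.
#[derive(Debug, Clone, PartialEq)]
pub struct CompetitionRow {
    pub auction_id: i64,
    pub solution_id: i64,
    pub solver_address: [u8; 20],
    /// Tokens, prices and trades in some serialized form (assumed opaque here).
    pub solution: Vec<u8>,
    /// Set only if the solution was selected as winning.
    pub deadline: Option<i64>,
}

impl CompetitionRow {
    /// In this design the winner flag is implicit in the deadline.
    pub fn is_winner(&self) -> bool {
        self.deadline.is_some()
    }
}

/// Returns the rows of one auction that were selected as winning.
pub fn winners(rows: &[CompetitionRow], auction_id: i64) -> Vec<&CompetitionRow> {
    rows.iter()
        .filter(|r| r.auction_id == auction_id && r.is_winner())
        .collect()
}
```

Note that this variant encodes "is a winner" through the nullable deadline, which is exactly the implicit state questioned later in the review.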

database/sql/V072__auction_solution_orders.sql Outdated Show resolved Hide resolved
-- Not NULL for winning orders.
deadline bigint,

-- Order details
Contributor

Are these needed? Given we store trades for all orders (jit included), do we need to store this information? Also, I think the order uid commits to those values (so you wouldn't be able to change the limit price for a fixed order uid and it can just be read from the settlement).

Contributor Author

@sunce86 sunce86 Sep 17, 2024

do we need to store this information? I think the order uid commits to those values

Ah I see, yes, order_uid is actually a checksum for OrderData. If a surplus-capturing JIT order is promised, there are only three outcomes in the circuit breaker:

  1. Solution not delivered at all - violation.
  2. Solution delivered but it doesn't contain promised order_uid of JIT order - violation
  3. Solution delivered with the same order uid - then just compare the executed amounts to check if the prices are >= promised.
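
The three outcomes above could be sketched as a small check like the one below. All type and function names are hypothetical, amounts are `u64` to keep the arithmetic simple, and "price at least as good as promised" is assumed to mean at least as much buy amount per unit of sell amount.

```rust
/// Executed amounts promised for one JIT order during the competition.
/// `u64` is an assumption for this sketch; on-chain amounts are 256-bit.
pub struct PromisedJitOrder {
    pub order_uid: Vec<u8>,
    pub sell_amount: u64,
    pub buy_amount: u64,
}

/// Executed amounts observed on chain for one order of a settlement.
pub struct ObservedOrder {
    pub order_uid: Vec<u8>,
    pub sell_amount: u64,
    pub buy_amount: u64,
}

#[derive(Debug, PartialEq)]
pub enum PromiseCheck {
    /// Outcome 1: no settlement was delivered at all.
    NotDelivered,
    /// Outcome 2: a settlement exists but lacks the promised order uid.
    OrderMissing,
    /// Outcome 3, comparison failed: the executed price is worse than promised.
    PriceWorse,
    /// Outcome 3, comparison passed.
    Honoured,
}

pub fn check_jit_promise(
    promised: &PromisedJitOrder,
    settlement: Option<&[ObservedOrder]>,
) -> PromiseCheck {
    let orders = match settlement {
        Some(orders) => orders,
        None => return PromiseCheck::NotDelivered,
    };
    let observed = match orders.iter().find(|o| o.order_uid == promised.order_uid) {
        Some(order) => order,
        None => return PromiseCheck::OrderMissing,
    };
    // Compare buy/sell ratios by cross-multiplying; u64 * u64 always fits in u128.
    let lhs = u128::from(observed.buy_amount) * u128::from(promised.sell_amount);
    let rhs = u128::from(promised.buy_amount) * u128::from(observed.sell_amount);
    if lhs >= rhs {
        PromiseCheck::Honoured
    } else {
        PromiseCheck::PriceWorse
    }
}
```

Because the order uid commits to the order data, the check only needs the uid and the executed amounts.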

@fhenneke will the circuit breaker need to calculate the score of promised (non-winning) solutions (for reference scores, for example)? If not, then we can remove this order data and get it from the orders table and jit_orders table if needed. But note the comment 👇: I would rather not save the scores themselves but instead save the input for calculating the score, or whichever scoring criteria we use.

Contributor Author

Ok, after meeting with @fhenneke we concluded that we need this data for all proposed solutions (which include surplus capturing JIT orders and potentially regular JIT orders in the future) and also it makes sense to store this data to show it on the solver competition endpoint.

Contributor

For regular orders this data is already available in a different table (so basic SQL normalisation theory says we shouldn't duplicate it). For jit orders this may not be the case. Can you explain in a bit more detail why we need this information for jit orders even if there ends up not being a settlement observation?

Note that the current competition endpoint exposes order ids and executed amounts, nothing more: https://api.cow.fi/mainnet/api/v1/solver_competition/by_tx_hash/0xe987ca2672c8330398750c73e38ed6375c3e18b29172b806cfc2d66f33eaaf0d

Contributor Author

@sunce86 sunce86 Sep 17, 2024

Can you explain in a bit more detail why we need this information for jit orders even if there ends up not being a settlement observation?

If we have two solutions, and one of them is declared a winner and allowed to settle, and IF the rewards scheme stays the same (difference between winning score and reference score), then we need all data of a second best solution (which might contain surplus capturing JIT order) available at postprocessing time, so that we could calculate the score of it to use it as a reference score.

And all of this stands because we don't want to save the score as a field in the database. We want to save solutions (the input for ranking and everything else) and not scores (the output of ranking) in the db, for greater flexibility in the future. By flexibility I mean being able to change the scoring rules of the protocol without changing the db schema.
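
A rough sketch of that idea: if solutions are stored rather than scores, the ranking and the reward (winning score minus reference score) can be recomputed with whatever scoring rule is current. The struct, its fields, and both example scoring rules are hypothetical.

```rust
/// A stored proposed solution; only the data needed for this sketch.
pub struct StoredSolution {
    pub solver: &'static str,
    /// Per-order surplus, assumed to be already expressed in the native token.
    pub order_surplus: Vec<u128>,
}

/// Ranks stored solutions with the supplied scoring rule and returns
/// `(index of the winner, winning score - reference score)`.
/// The reference score is the second-best score, or 0 if there is none.
pub fn winner_and_reward<F>(solutions: &[StoredSolution], score: F) -> Option<(usize, u128)>
where
    F: Fn(&StoredSolution) -> u128,
{
    let mut scored: Vec<(usize, u128)> = solutions
        .iter()
        .enumerate()
        .map(|(index, solution)| (index, score(solution)))
        .collect();
    // Highest score first; the sort is stable, so ties keep submission order.
    scored.sort_by(|a, b| b.1.cmp(&a.1));
    let (winner, best) = *scored.first()?;
    let reference = scored.get(1).map(|entry| entry.1).unwrap_or(0);
    Some((winner, best - reference))
}
```

Calling `winner_and_reward` with a closure that sums `order_surplus` and again with one that takes the maximum gives different winners from the same stored rows, which is the flexibility argued for above.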

@fhenneke

I think it is good to have more data available on orders to compute surplus and trade directions.

One thing which would need to be added is scores of all solutions. Some of the reward mechanisms might even require scores per order.

We can either store scores directly or make enough information available to compute scores ourselves.
Data required to compute scores would include native prices for all surplus tokens (instead of just the tokens of the winner). For protocol fees, we would need fee policies for all orders with a proposed solution (instead of just protocol fees for executed solutions).
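
To make the data requirement concrete, here is a simplified sketch in which a score cannot be computed unless a native price exists for every surplus token of the proposed solution. The score formula used (surplus plus protocol fee, multiplied by an integer native price) is an assumption for illustration, as are all names.

```rust
use std::collections::HashMap;

/// One trade of a proposed solution, reduced to what the sketch needs.
pub struct ProposedTrade {
    pub surplus_token: &'static str,
    /// Surplus in units of the surplus token.
    pub surplus: u128,
    /// Protocol fee in units of the surplus token, derived from the order's fee policy.
    pub protocol_fee: u128,
}

/// Returns the score in native token units, or `None` if a native price is
/// missing for any surplus token (or if the arithmetic overflows).
pub fn solution_score(
    trades: &[ProposedTrade],
    native_prices: &HashMap<&str, u128>,
) -> Option<u128> {
    let mut total: u128 = 0;
    for trade in trades {
        let price = native_prices.get(trade.surplus_token)?;
        let amount = trade.surplus.checked_add(trade.protocol_fee)?;
        total = total.checked_add(amount.checked_mul(*price)?)?;
    }
    Some(total)
}
```

If prices are stored only for the winner's tokens, a runner-up trading a different token yields `None`, so its score cannot serve as a reference score.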

database/sql/V072__auction_solution_orders.sql Outdated Show resolved Hide resolved
Comment on lines 14 to 16
-- The block number until which the order should be settled.
-- Not NULL for winning orders.
deadline bigint,
Contributor

The deadline is a property of the auction and not the order.

Contributor Author

Deadline is also optional to indicate if a solution is chosen as a winner or not.

Would you rather have:

  1. Deadline as property of auction and non-optional. Then, another property winner: bool on each solution to indicate if winner.
  2. Deadline as a property of solution and optional so that it indicates if winner.

Contributor

I would prefer to have less implicit state so deadline as required on auction.
This could then lead to a separate table:

solutions
- id (populated by sequence)
- auction_id
- solver
- solution_id
- is_winner

And then auction_solution_orders (I prefer the name proposed_order_executions) could reference the solutions.id.
Not sure if this would actually result in good queries since a shared key of multiple properties might be easier to use than solutions.id but that would at least have some logical consistency where:

  • auction has solutions
  • solution contains orders

OTOH do we need to store who the winner is? Assuming there is no bug we should be able to reconstruct who (should have) won with all the data we have, no? Just a conceptual question as it probably doesn't make sense to cheap out on a bool here.
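
The "less implicit state" variant could look roughly like this: the deadline lives on the auction, the winner flag is an explicit boolean on the solution, and order executions hang off the solution. Field types and the helper `settlement_deadline` are assumptions for this sketch.

```rust
/// An order execution proposed as part of a solution.
pub struct ProposedOrderExecution {
    pub order_uid: Vec<u8>,
    pub executed_sell: u128,
    pub executed_buy: u128,
}

/// A row of the suggested `solutions` table plus the orders it contains.
pub struct SolutionRow {
    pub id: i64,
    pub auction_id: i64,
    pub solver: [u8; 20],
    pub solution_id: i64,
    pub is_winner: bool,
    pub orders: Vec<ProposedOrderExecution>,
}

/// The auction owns the (non-optional) deadline.
pub struct AuctionRow {
    pub id: i64,
    pub deadline: i64,
}

/// The deadline only applies to winning solutions of the same auction.
pub fn settlement_deadline(auction: &AuctionRow, solution: &SolutionRow) -> Option<i64> {
    if solution.auction_id == auction.id && solution.is_winner {
        Some(auction.deadline)
    } else {
        None
    }
}
```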

Contributor Author

@sunce86 sunce86 Sep 17, 2024

Assuming there is no bug we should be able to reconstruct who (should have) won with all the data we have, no?

The criteria might change even on each restart of the autopilot (for example, if we enable/disable multiple winners feature several times). In this case we would need to know who was supposed to settle solutions.

Contributor Author

I've refactored the tables before you posted a comment. Can you check if it's more acceptable now?

Contributor Author

The criteria might change even on each restart of the autopilot (for example, if we enable/disable multiple winners feature several times). In this case we would need to know who was supposed to settle solutions.

Alternative to this is to also save the information "which type of competition" was executed for each auction. With this, we would have an input and would be able to determine the winners so "is_winner" would not be needed.
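
A sketch of that alternative: if the kind of competition is stored per auction, winners can be re-derived from the scores. The enum and the selection rule (top one versus top `max` by score) are hypothetical simplifications; a real multiple-winners mechanism would likely have more constraints.

```rust
/// Which selection rule was active for an auction (hypothetical).
#[derive(Debug, Clone, Copy)]
pub enum CompetitionKind {
    SingleWinner,
    MultipleWinners { max: usize },
}

/// Returns the indices of the winning solutions, best score first.
pub fn derive_winners(kind: CompetitionKind, scores: &[u128]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..scores.len()).collect();
    // Sort indices by descending score; stable, so ties keep submission order.
    order.sort_by(|&a, &b| scores[b].cmp(&scores[a]));
    let take = match kind {
        CompetitionKind::SingleWinner => 1,
        CompetitionKind::MultipleWinners { max } => max,
    };
    order.truncate(take);
    order
}
```

The same scores give different winner sets under the two kinds, so without either the stored kind or an explicit `is_winner` flag the historical winners could not be reconstructed.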

database/sql/V072__auction_solution_orders.sql Outdated Show resolved Hide resolved
@sunce86
Contributor Author

sunce86 commented Sep 17, 2024

I think it is good to have more data available on orders to compute surplus and trade directions.

One thing which would need to be added is scores of all solutions. Some of the reward mechanisms might even require scores per order.

We can either store scores directly or make enough information available to compute scores ourselves. Data required to compute scores would include native prices for all surplus tokens (instead of just the tokens of the winner). For protocol fees, we would need fee policies for all orders with a proposed solution (instead of just protocol fees for executed solutions).

I'd go with more general approach of storing enough information to compute whatever scoring criteria we use.

Data required to compute scores would include native prices for all surplus tokens

We have that in auction_prices db table.

we would need fee policies for all orders with a proposed solution (instead of just protocol fees for executed solutions)

We WILL have that in the fee_policies db table. Out of curiosity, why do you calculate the score for a proposed solution that was not delivered? Is it because you might need reference scores?

database/sql/V072__auction_solution_orders.sql Outdated Show resolved Hide resolved
database/sql/V072__auction_solution_orders.sql Outdated Show resolved Hide resolved
database/sql/V072__auction_solution_orders.sql Outdated Show resolved Hide resolved
sunce86 added a commit that referenced this pull request Sep 18, 2024
# Description
Currently fee policies are saved only for winning solution.

This PR saves fee policies for all auction orders. This is needed for at
least two reasons:

1. As discussed [in the
PR](#2980 (comment)),
fee policies will be needed for all proposed solutions during a
competition so that the score could be reconstructed in circuit breaker.
2. [For historical
get_auction](#2844).
sunce86 added a commit that referenced this pull request Sep 24, 2024
# Description
Fixes #2992

`settlement_scores::fetch` will be updated once the
#2980 is merged.

## How to test
Existing univ2 e2e test.

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.

@github-actions github-actions bot added the stale label Sep 26, 2024
@sunce86 sunce86 marked this pull request as ready for review October 10, 2024 15:00
@sunce86 sunce86 requested a review from a team as a code owner October 10, 2024 15:00
@sunce86 sunce86 changed the title [WIP] New database tables for auction and solver competition New database tables for auction and solver competition Oct 10, 2024
crates/autopilot/src/run_loop.rs Outdated Show resolved Hide resolved
pub uid: i64,
// Id as reported by the solver (solvers are unaware of how other solvers are numbering their
// solutions)
pub id: i64,
Contributor

Id is a string on the API level. It just happens to be that all solvers currently report a number.
It's probably okay to make this an integer on the API level but that has to be adjusted and communicated first.

Contributor Author

id is not supposed to replace the solver name on the solver_competition API. It's here for completeness of saving the whole solution object but is not necessary for the functionality to work. We can remove it as well if we are sure we won't need it.

I think we agreed somewhere that solver name is something we don't care too much about. We have a solver address which is supposed to uniquely identify the solver.

Contributor

Not sure what the solver name has to do with this. This id is the id that solvers return for each individual solution, right?

Contributor Author

Id is a string on the API level

I thought you referred to solver name here.

This id is the id that solvers return for each individual solution, right?

Yes.

Contributor

So then the point still stands that the ID is currently a string on the API level and only an integer by convention. If we want to store it in the DB I think we should make sure the data types align and make sense.
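
To illustrate the type mismatch: an id that is a string at the API level only fits an integer column if it parses as a number and stays within the column's range. The function and error names below are hypothetical; the `i64` target mirrors the `pub id: i64` field quoted above.

```rust
#[derive(Debug, PartialEq)]
pub enum IdError {
    /// The API-level string is not an unsigned 64-bit integer.
    NotNumeric,
    /// The number does not fit into a signed 64-bit column.
    TooLargeForBigint,
}

/// Converts an API-level solution id (a string) into a value storable as `i64`.
pub fn solution_id_for_db(api_id: &str) -> Result<i64, IdError> {
    let id: u64 = api_id.parse().map_err(|_| IdError::NotNumeric)?;
    i64::try_from(id).map_err(|_| IdError::TooLargeForBigint)
}
```

A solver that returned an id like `"solution-a"` would be valid at the API level today but could not be stored, which is why the API type would need to be adjusted and communicated first.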

Contributor Author

@sunce86 sunce86 Oct 17, 2024

Ok. Will add an issue to switch the ID to u64, as it is used that way in both the driver and autopilot domains.

#3064

crates/database/src/solver_competition.rs Outdated Show resolved Hide resolved
crates/database/src/solver_competition.rs Outdated Show resolved Hide resolved
crates/database/src/solver_competition.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,66 @@
-- All auctions ran by autopilot
CREATE TABLE competition_auctions (
Contributor

Let's try to avoid these composite table names if possible as I think they mostly cause confusion.
I'd say this should be called auctions and the current auctions table would become current_auction (singular as it's supposed to only store a single row at all times).

Contributor Author

I'd like to avoid touching existing code with this PR. Table renaming is particularly risky, and even though I initially wanted to do renaming, I went with defining a new name after all.
And if you assume that, it's really hard to figure out a new name for this table.

Contributor

Then at least make sure the tables are properly renamed when we finalize this refactor and remove the old tables.

@@ -0,0 +1,66 @@
-- All auctions ran by autopilot
CREATE TABLE competition_auctions (
id bigint PRIMARY KEY,
Contributor

I think we are supposed to use identity columns to have the DB automatically generate these unique values for us.

Suggested change
id bigint PRIMARY KEY,
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,

Contributor Author

I understand it looks neat to use a DB-generated ID, but why would we lock ourselves into using it?
Besides, right now auctions and competition_auctions need to be aligned with ids.

Contributor

Besides, right now auctions and competition_auctions need to be aligned with ids.

Sorry, the comment was supposed to be on proposed_solutions.id.

I understand it looks neat to use a DB-generated ID, but why would we lock ourselves into using it?

For Ids that have no other purpose than being unique and identifying rows I think it makes the most sense to let the DB make sure that things are unique instead of relying on domain code that can have bugs in that regard. Since we don't expect any additional information in the ID any value is as good as any other value as long as it's unique so why should we bother with maintaining that uniqueness ourselves?

Contributor Author

@sunce86 sunce86 Oct 15, 2024

For Ids that have no other purpose than being unique and identifying rows

But this is not actually true in this case right? Auction id is read directly by client and used to fetch data from other tables etc. It's not like it's inserted only once and never used by client but only by database internally to join on other tables etc.

But anyway, in this case we have to go with client defined Ids because of:

right now auctions and competition_auctions need to be aligned with ids.

database/sql/V072__auction_solution_orders.sql Outdated Show resolved Hide resolved
Comment on lines +146 to +149
.enumerate()
.map(|(uid, participant)| {
let solution = Solution {
uid: uid.try_into().context("uid overflow")?,
Contributor

The db docs made it seem like this uid is supposed to be globally unique, which I think is a nicer property than just having it be the index within one auction.

Contributor Author

@sunce86 sunce86 Oct 11, 2024

"unique id of the proposed solution within a single auction" (this is in the docs).

Global uniqueness is not required.
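
A small sketch of what per-auction uniqueness means in practice, following the `enumerate()` plus `try_into` pattern from the diff above. The function name and the use of plain strings for participants are assumptions.

```rust
use std::num::TryFromIntError;

/// Assigns each participant its index within the auction as the solution uid.
/// The uid is only unique together with the auction id.
pub fn assign_uids(participants: Vec<String>) -> Result<Vec<(i64, String)>, TryFromIntError> {
    participants
        .into_iter()
        .enumerate()
        // `usize` to `i64` can fail on overflow, hence the fallible conversion.
        .map(|(index, participant)| i64::try_from(index).map(|uid| (uid, participant)))
        .collect()
}
```

Two different auctions both start at uid 0, so any lookup has to use the pair (auction_id, uid) rather than the uid alone.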

database/README.md Outdated Show resolved Hide resolved
database/README.md Outdated Show resolved Hide resolved
database/README.md Outdated Show resolved Hide resolved
database/README.md Outdated Show resolved Hide resolved
Contributor

@squadgazzz squadgazzz left a comment

For the current state, I don't see any blockers.

ON CONFLICT (auction_id, solution_uid, order_uid) DO NOTHING
"#;

sqlx::query(QUERY_JIT)
Contributor

Could we do this in one roundtrip?

Contributor Author

God is my witness I tried but couldn't come up with a query that would properly handle WHERE NOT EXISTS part.

Contributor

@MartinquaXD MartinquaXD left a comment

I think all my comments got addressed.

Contributor

@m-lord-renkse m-lord-renkse left a comment

Did another round, LGTM! nice PR!

@sunce86 sunce86 enabled auto-merge (squash) October 17, 2024 13:23
@sunce86 sunce86 merged commit 1812dd9 into main Oct 17, 2024
11 checks passed
@sunce86 sunce86 deleted the auction-winners-table branch October 17, 2024 13:24
@github-actions github-actions bot locked and limited conversation to collaborators Oct 17, 2024
Labels
E:6.2 Time to Happy Moo See https://github.com/cowprotocol/pm/issues/77 for details
Projects
None yet
6 participants