Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tasks have only one aggregator, collector token #1973

Merged
merged 4 commits into from
Sep 22, 2023

Conversation

tgeoghegan
Copy link
Contributor

@tgeoghegan tgeoghegan commented Sep 21, 2023

We've decided that each task has exactly one aggregator and collector auth token, and rotating those means rotating the task. This commit updates the representations of auth tokens in the database and in memory (aggregator_core::task::Task) accordingly.

Several tests relied upon tasks constructed in tests having two aggregator auth tokens, one of each supported type. Those tests are fixed to explicitly construct tasks using DAP-Auth-Token tokens where necessary.

Given that there is now a single aggregator or collector auth token per task, we could remove the task_aggregator_auth_tokens and task_collector_auth_tokens tables and instead add columns aggregator_auth_token, aggregator_auth_token_type, collector_auth_token and collector_auth_token_type to tasks.

I chose not to do this because validating the correctness of those columns would be tricky. We couldn't make them all NOT NULL, because a helper task doesn't have a collector auth token and a task provisioned via taskprov won't have either token (instead it'll use tokens from the taskprov_*_auth_tokens tables). So we would have to write constraints on the tasks table to ensure pairs of columns are either NULL or NOT NULL in tandem, which I think are expressed more clearly in the independent task_*_auth_tokens tables. Plus, if we ever do decide to support auth token rotation, it'll be easier to do with these tables in place.

Part of #1524, #1521

@tgeoghegan tgeoghegan added the allow-changed-migrations Override the ci-migrations check to allow migrations that have changed. label Sep 21, 2023
@tgeoghegan tgeoghegan force-pushed the timg/aggregator-auth-token-hash branch from 732c7f9 to bde5aa8 Compare September 21, 2023 15:50
We've decided that each task has exactly one aggregator and collector
auth token, and rotating those means rotating the task. This commit
updates the representations of auth tokens in the database and in memory
(`aggregator_core::task::Task`) accordingly.

Several tests relied upon tasks constructed in tests having two
aggregator auth tokens, one of each supported type. Those tests are
fixed to explicitly construct tasks using `DAP-Auth-Token` tokens where
necessary.

Given that there is now a single aggregator or collector auth token per
task, we could remove the `task_aggregator_auth_tokens` and
`task_collector_auth_tokens` tables and instead add columns
`aggregator_auth_token`, `aggregator_auth_token_type`,
`collector_auth_token` and `collector_auth_token_type` to `tasks`.

I chose not to do this because validating the correctness of those
columns would be tricky. We couldn't make them all `NOT NULL`, because a
helper task doesn't have a collector auth token and a task provisioned
via taskprov won't have either token (instead it'll use tokens from the
`taskprov_*_auth_tokens` tables). So we would have to write constraints
on the `tasks` table to ensure pairs of columns are either `NULL` or
`NOT NULL` in tandem, which I think are expressed more clearly in the
independent `task_*_auth_tokens` tables. Plus, if we ever _do_ decide to
support auth token rotation, it'll be easier to do with these tables in
place.

Part of #1524, #1521
@tgeoghegan tgeoghegan force-pushed the timg/aggregator-auth-token-hash branch from bde5aa8 to a647d83 Compare September 21, 2023 16:38
@tgeoghegan tgeoghegan changed the title WIP: Exactly one aggregator and collector auth token per task Tasks have only one aggregator, collector token Sep 21, 2023
@tgeoghegan tgeoghegan marked this pull request as ready for review September 21, 2023 16:43
@tgeoghegan tgeoghegan requested a review from a team as a code owner September 21, 2023 16:43
Copy link
Contributor

@branlwyd branlwyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would eliminate the separate tables for auth tokens; with this PR, the only reason they exist is because we used to support multiple tokens. If we were to implement the token functionality for the first time, we would not add these tables. I understand the argument that it is challenging to express the necessary constraints, but we could either lean on application-level checks only or implement a CHECK constraint in the DB.

row_id[TaskId::LEN..].copy_from_slice(&ord.to_be_bytes());

// Aggregator auth token.
let aggregator_auth_token_future = if let Some(token) = task.aggregator_auth_token() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more natural way to express this idea would be to move the if & all related bits "into" the future created by async move. Something like:

let aggregator_auth_tokens_future = async move {
  if let Some(token) = task.aggregator_auth_token() {
    /* ... code currently inside this if */
  }
}

(likely with some additional code to move or copy or Arc-ify data needed inside the future)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly marginal upside, but this strategy would also allow the statement preparation IO to occur concurrently; as-written (before this PR & with the current PR changes) these preparations would be serialized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moot because now that the distinct auth token tables are gone, we don't need to do these inserts separately from the one in tasks at all.

let ord: i64 = row.get("ord");
let auth_token_type: AuthenticationTokenType = row.get("type");
let encrypted_aggregator_auth_token: Vec<u8> = row.get("token");
let aggregator_auth_token = if let Some(row) = aggregator_auth_token_row {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more clearly/concisely expressed by aggregator_auth_token_row.map(/* ... code currently inside this if ... */).transpose()?.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(assuming the error types work out, it's hard to reason about the operation of the ? operator without an IDE)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little surprised to see that I could write an Option::map closure with two different fallible calls in it by using Option::transpose, but it works, along with an Option::zip since we no longer have a distinct Option<Row> for the auth tokens.

@tgeoghegan
Copy link
Contributor Author

I would eliminate the separate tables for auth tokens; with this PR, the only reason they exist is because we used to support multiple tokens. If we were to implement the token functionality for the first time, we would not add these tables. I understand the argument that it is challenging to express the necessary constraints, but we could either lean on application-level checks only or implement a CHECK constraint in the DB.

Done, the auth token tables are gone.

@tgeoghegan tgeoghegan requested a review from branlwyd September 22, 2023 00:40
Copy link
Contributor

@inahga inahga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification.

/// Token used to authenticate messages sent to or received from the other aggregator. Only set
/// if the task was not created via taskprov.
aggregator_auth_token: Option<AuthenticationToken>,
/// Token used to authenticate messages sent to received from the collector. Only set if this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Token used to authenticate messages sent to received from the collector. Only set if this
/// Token used to authenticate messages sent to or received from the collector. Only set if this

aggregator_auth_token_type AUTH_TOKEN_TYPE, -- the type of the authentication token
aggregator_auth_token BYTEA, -- encrypted bearer token
-- The aggregator_auth_token columns must either both be NULL or both be non-NULL
CONSTRAINT aggregator_auth_token_null CHECK ((aggregator_auth_token_type IS NULL) = (aggregator_auth_token IS NULL)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about unit tests that exercise this constraint, i.e. try to insert values that violate it?

This is my main worry with inserting business logic into the database--it's difficult to test for. But I think our test harness can support such a test without too much pain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Our definition of struct Task doesn't allow constructing a value where the auth token is set but not its type (or vice-versa), but it's easy enough to write SQL UPDATE statements that exercise these constraints with illegal mutations to an existing task so I did that.

# authenticate leader-to-helper requests. In the case of a leader-role task,
# the leader will include the first token in a header when making requests to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, this was wrong the whole time. (the last token was the primary auth token)

Copy link
Contributor

@inahga inahga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@tgeoghegan tgeoghegan enabled auto-merge (squash) September 22, 2023 17:14
@tgeoghegan tgeoghegan merged commit aca9f14 into main Sep 22, 2023
8 checks passed
@tgeoghegan tgeoghegan deleted the timg/aggregator-auth-token-hash branch September 22, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
allow-changed-migrations Override the ci-migrations check to allow migrations that have changed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants