Explicitly forbid mutating agg jobs, collections #471
Conversation
draft-ietf-ppm-dap.md
Outdated
@@ -1355,6 +1355,12 @@ On success, the Helper responds to the Leader with HTTP status code 201 Created
and a body consisting of the `AggregationJobResp`, with media type
"application/dap-aggregation-job-resp".

Aggregation job initialization MUST be idempotent. That is, further requests to
A few things to consider including:
- Mention that "further requests with a different body" would be grouped together by aggregation job ID (I suppose).
- We don't want to allow re-initialization -- so if a continuation step has occurred, we should no longer allow initialization.
(& identical suggestions for the collection job case, with aggregation job ID -> collection job ID)
> Mention that "further requests with a different body" would be grouped together by aggregation job ID (I suppose).
I think I know what you mean. Is this paragraph more clear?
Aggregation job initialization MUST be idempotent. That is, further requests to
initialize some aggregation job whose body is identical to the first request
MUST succeed. However, changing an aggregation job's parameters is illegal, so
further requests to initialize an existing aggregation job with different parameters
MUST fail with an HTTP client error status code.
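To illustrate the intended behavior (a minimal sketch, not from the draft or any implementation; the storage model and names like `handle_init` are hypothetical), a Helper could key jobs on the aggregation job ID and compare replays against the stored request body:

```rust
use std::collections::HashMap;

/// Hypothetical stored state for an aggregation job, keyed by job ID.
struct AggregationJob {
    /// Serialized body of the original initialization request.
    init_request_body: Vec<u8>,
}

enum InitOutcome {
    /// New job, or a byte-identical replay: respond 201 Created.
    Created,
    /// Same job ID but a different body: respond with an HTTP client error.
    ParametersChanged,
}

fn handle_init(
    jobs: &mut HashMap<String, AggregationJob>,
    job_id: &str,
    body: &[u8],
) -> InitOutcome {
    match jobs.get(job_id) {
        // Idempotent replay: the body matches the first request exactly.
        Some(job) if job.init_request_body.as_slice() == body => InitOutcome::Created,
        // Mutation attempt: same job ID, different parameters.
        Some(_) => InitOutcome::ParametersChanged,
        // First initialization under this job ID: store and accept.
        None => {
            jobs.insert(
                job_id.to_string(),
                AggregationJob { init_request_body: body.to_vec() },
            );
            InitOutcome::Created
        }
    }
}
```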
> We don't want to allow re-initialization -- so if a continuation step has occurred, we should no longer allow initialization.
So we're talking about a sequence of requests like this:
1. `PUT /tasks/{task-id}/aggregation_jobs/{aggregation-job-id}`
2. `POST /tasks/{task-id}/aggregation_jobs/{aggregation-job-id}`
3. `PUT /tasks/{task-id}/aggregation_jobs/{aggregation-job-id}`
When you say we should no longer allow initialization, do you mean that the request at (3) should fail with something like HTTP 400 Bad Request, or just that it shouldn't rewind or reset the state of the aggregation job? I think the request at (3) should still get a 201 Created response, because in many cases the Leader can then follow up with a POST to continue the aggregation job and gracefully resume processing it.
> Mention that "further requests with a different body" would be grouped together by aggregation job ID (I suppose).
I'm suggesting that the text "further requests to initialize an existing aggregation job with different parameters" is ambiguous, because it does not define how to determine whether a new request matches an existing aggregation job. I think the right way to determine this is by aggregation job ID: if the request includes an aggregation job ID matching an existing aggregation job, it is a repeated request (possibly with different parameters); if the request includes a unique aggregation job ID, it is a request for a new aggregation job. My suggestion is that we state that explicitly.
> When you say we should no longer allow initialization, do you mean that the request at (3) should fail with something like HTTP 400 Bad Request, or just that it shouldn't rewind or reset the state of the aggregation job?
Yes -- IMO request (3) should fail [and it shouldn't rewind/reset the state of the aggregation job]. I would suggest we don't allow re-initialization of existing aggregation jobs because we don't need to: if a further "continuation" request has been made, the Leader has definitely received the result of the prior "initialization" step. And if we allow aggregation jobs to be re-initialized, we will have to deal with "weird" issues like "what if the Leader drives an agg job all the way to completion, then re-initializes it, then drives it to completion once again -- should the output shares recovered by the second completion contribute to the eventual aggregate 'again'?" (probably not, but we'd need text to that effect, and implementations would have to add some complexity to deal with such issues)
Or, framing things in terms of idempotency, request (2) changes the state of the aggregation job, so afterwards re-sending request (1) has no expectation of idempotency, as it is a request to rewind the aggregation job -- we would at that point expect request (2) to be idempotent rather than request (1).
> I'm suggesting that the text "further requests to initialize an existing aggregation job with different parameters" is ambiguous, because it does not define how to determine whether a new request matches an existing aggregation job.
OK, I'll rewrite to be more explicit.
> Yes -- IMO request (3) should fail [and it shouldn't rewind/reset the state of the aggregation job].
I'm coming around to your point of view: reviewing the changes that will land in #393, it appears we don't allow the `AggregationJobResp` a Leader gets in response to initialization to ever contain `PrepareResp`s in the finished state, so as it stands, we can't allow `PUT /tasks/{task-id}/aggregation_jobs/{aggregation-job-id}` to succeed regardless of the aggregation job's state. I was thinking that maybe an implementation could have multiple aggregation job workers racing to initialize and continue agg jobs, but the implementation would have to have a pretty serious state synchronization bug to have one worker step the job past round 1 while the other still thinks the job needs to be initialized, so I don't think it's worth having the specification bend over backwards to accommodate this.
In short: requests to initialize an aggregation job that has been continued at least once are invalid.
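Framed as code, the agreed rule might look like this minimal sketch (again hypothetical; the coarse `JobState` below is an illustration, not the draft's wire-level state machine):

```rust
/// Hypothetical coarse per-job states.
#[derive(PartialEq)]
enum JobState {
    Initialized,
    Continued,
    Finished,
}

/// A repeated initialization request may only succeed as a byte-identical
/// replay, and only before any continuation step has been processed.
fn may_reinitialize(state: &JobState, stored_body: &[u8], body: &[u8]) -> bool {
    *state == JobState::Initialized && stored_body == body
}
```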
Force-pushed from 1df6370 to a354bfe
draft-ietf-ppm-dap.md
Outdated
@@ -1871,6 +1884,14 @@ The Leader then begins working with the Helper to aggregate the reports
satisfying the query (or continues this process, depending on the VDAF) as
described in {{aggregate-flow}}.

Collection job creation MUST be idempotent. That is, multiple requests to
Does it matter if the task has expired and the Leader has deleted its collection jobs?
Bump
I don't understand how task expiration complicates this. We already have this text:
> Upon receipt of a `CollectionReq`, the Leader begins by checking that it recognizes the task ID in the request path. If not, it MUST abort with error `unrecognizedTask`.
So if a task expires, the Leader should refuse to allow new collection jobs to be created for it, regardless of whether the request has been repeated.
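As a hypothetical sketch of that ordering (mirroring the earlier examples), the task check runs before any idempotency logic can apply, so a request for an expired-and-deleted task fails the same way one for an unknown task does:

```rust
use std::collections::HashSet;

enum CollectionError {
    /// Corresponds to the draft's `unrecognizedTask` error.
    UnrecognizedTask,
}

/// An expired task would have been removed from `known_tasks`, so repeated
/// requests for it fail before the idempotency rule ever comes into play.
fn check_task(known_tasks: &HashSet<String>, task_id: &str) -> Result<(), CollectionError> {
    if known_tasks.contains(task_id) {
        Ok(())
    } else {
        Err(CollectionError::UnrecognizedTask)
    }
}
```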
draft-ietf-ppm-dap.md
Outdated
Collection job creation MUST be idempotent. That is, multiple requests to
`PUT /tasks/{tasks}/collection_jobs/{collection-job-id}` for the same
`collection-job-id` and the same request body MUST succeed. However, changing
a collection job's parameters is illegal, so further requests to
Does this conflict with `max_batch_query_count`? If a `collection-job-id` has been queried more than `max_batch_query_count` times, the Leader is supposed to fail the `CollectionReq` with `batchQueriedTooManyTimes`.
Bump
This does not conflict with max batch query count. Each of the queries against a given batch is a distinct collection with a distinct collection ID.
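A hypothetical sketch of why the two checks stay independent: idempotency is keyed on the collection job ID, while `max_batch_query_count` counts the distinct collection jobs that have queried a batch, so a replayed request never inflates the count:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical bookkeeping: which collection jobs have queried each batch.
struct BatchQueries {
    max_batch_query_count: usize,
    /// Batch identifier -> distinct collection job IDs that queried it.
    queried_by: HashMap<String, HashSet<String>>,
}

impl BatchQueries {
    /// Replaying a known collection job ID never increases the count, so an
    /// idempotent retry cannot trip `batchQueriedTooManyTimes`; only a new
    /// collection job ID counts against `max_batch_query_count`.
    fn may_query(&self, batch_id: &str, collection_job_id: &str) -> bool {
        match self.queried_by.get(batch_id) {
            Some(ids) if ids.contains(collection_job_id) => true, // replay
            Some(ids) => ids.len() < self.max_batch_query_count,  // new query
            None => self.max_batch_query_count > 0,
        }
    }
}
```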
After thinking about this further, and having re-read what RFC 9110 already says about idempotence, safety, and the different HTTP verbs, I think this change isn't necessary. We already explicitly reference RFC 9110, and it's not DAP's job to reiterate HTTP concepts. Further, standards that build on HTTP, like ACME, get by without being so explicit about immutability and idempotence, so I don't think we need to go further than that protocol does.
I've refocused the change to be about the DAP-level semantics and to avoid restating HTTP semantics, which is confusing to readers. RFC 9110, among others, does a better job of explaining HTTP and its methods than DAP can.
Note also that once an aggregation job has been continued, it may not be initialized again. Resolves #406
Force-pushed from 37182c8 to bf7c585