Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wip chunkfix #35

Open
wants to merge 3 commits into
base: ceph-master
Choose a base branch
from

Conversation

mdw-at-linuxbox
Copy link

3 fixes here.

1 content-length vs. transfer-encoding: chunked; according to rfc 2616,
transfer-encoding chunked should supersede content-length. Later rfcs require this be treated as an error - sadly experience has shown s3 clients lag in this respect.

  1. boto2 fixes. Older versions of ceph (up to mid-luminous) worked with boto2; but the newer civetweb adopted in msater then backported to luminous is pickier and hates boto2. Even though boto2 appears to be abandonware it still seems like a good idea (to me) to fix this.

  2. mg_set_must_close, Gives ceph/rgw a way to say this connection is hosed, do not attempt to read another request out of it.

These are all issues from,
https://tracker.ceph.com/issues/45789

transfer-encoding = chunked vs. content-length encoding is documented
in rfc 2616 3.6.1 and rfc 7230 4.1.  In both cases, it is
documented that if transfer-encoding == 'chunked', then content-length
should not be supplied by a client, and must not be processed by a server.

The previous code in civetweb allowed content-length to override
transfer-encoding=chunked, which is clearly wrong by the standard.
This chnage corrects the order of those checks.

Signed-off-by: Marcus Watts <[email protected]>
In rfc 2616 3.6.1 and rfc 7230 4.1, a mechanism is provided to supply
extra data in chunk headers, in the of keyword/value pairs.  Old versions
of civetweb completely ignored this data.

The boto2 s3 client, when it uses chunked encoding, sends a per-chunk
header consisting of a hex count followed by a single ';'.  This is not,
according to a strict reading of the rfcs, legal, but it is something that
ought to be acceptable to any server that does not expect to interpret
any keyword values.  This behavior is not likely to change.

Additionally, the final chunk header is supposed to be followed by not
one, but two crlf pairs.  It is desirable to consume both pairs rather
than just leaving the 2nd pair to cause trouble later.

Older versions of civetweb did no checking of any possible chunk header
data, they only checked to see if they could decode a hexadecimal count.
This is clearly more generous than necessary, but the change to be more
strict broke boto2 clinets, which is not desirable.

This change provides slightly more general logic that allows for a ';'
followed by optional data (which is unconditionally discarded for now),
and also consumes the extra crlf pair.

Signed-off-by: Marcus Watts <[email protected]>
In some cases, ceph may detect an error that indicates
that further use of this connection is doubtful.  For instance,
mg_read() fails to return expected data, due to a chunk decoding
error.  In this case, the connection should be closed, but not
until after the error is sent.  civetweb provides low-level logic
to do this (conn->must_close); this commit provides an API
for the application to trigger that behavior.

Signed-off-by: Marcus Watts <[email protected]>
@mdw-at-linuxbox
Copy link
Author

This PR is a prerequesite for this PR to ceph: ceph/ceph#35350

@mdw-at-linuxbox
Copy link
Author

The build failures appear to be due ot preexisting conditions unrelated to this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants