Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Handle "filename*" field in MP header #3239

Open
wants to merge 3 commits into
base: v2/master
Choose a base branch
from

Conversation

airween
Copy link
Member

@airween airween commented Aug 25, 2024

what

Add handling of Multipart Header's "filename*" (asterisk at the end!) field.

why

mod_security2 (v2) does not handle MULTIPART header's "filename*" field, eg:

Content-Disposition: form-data; name="file"; filename*=UTF-8''r%C3%A9sum%C3%A9.pdf

where the filename is UTF8 encoded string with value "résumé.pdf".

references

RFC 7578

additional notes

v3 handles as well this header, I almost copied that part of code.

Thanks @fzipi for bringing it to our attention.

@airween airween requested a review from marcstern August 25, 2024 20:27
@airween airween self-assigned this Aug 25, 2024
apache2/msc_multipart.c Outdated Show resolved Hide resolved
apache2/msc_multipart.c Show resolved Hide resolved
p++;
}
if (*p != '\'') {
return -17; // Single quote for end-of-language not found
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you use something like this here, as below?

msr->mpd->flag_invalid_quoting = 1;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good idea - thanks.

@fzipi, @marcstern - what do you think about this guys?

I already added this in 2b22261, but I can remove that.

If we keep this feature, we should add that to v3 too.

apache2/msc_multipart.c Show resolved Hide resolved
apache2/msc_multipart.c Show resolved Hide resolved
@marcstern
Copy link

marcstern commented Aug 26, 2024

https://www.rfc-editor.org/rfc/rfc5987#section-3.2
This RFC says that only UTF-8 & ISO-8859-1 are standardized:

Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1" ([ISO-8859-1]) character set. Extension character sets (mime-charset) are reserved for future use.

We're restricting for years the languages to these two and never found a false positive.
Should we allow all of them?
If the code allows it, we should add a new collection (FILES_LANG?) allowing checking this syntax.

return -16; // Must be at least one legit char before ' for start of language
}
p++;
while ((*p != '\0') && (*p != '\'')) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Languages can only contain alphanum & '-'
while (isalnum(*p) || *p == '-') p++;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, and here - as you suggested - we can control this.

The question is: is it allowed to send the request without charset? I mean:

Content-Disposition: form-data; name="post"; filename*=resume.pdf

I can't find any relevant information about that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.rfc-editor.org/rfc/rfc7578#section-4.2 also says:

Some commonly deployed systems use multipart/form-data with file
names directly encoded including octets outside the US-ASCII range.
The encoding used for the file names is typically UTF-8, although
HTML forms will use the charset associated with the form.

Which sounds to me that any charset is valid, regardless of what the other RFC says.

charset is required, AFAICT, only language is optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, do you think we do not need to check the charset and we should allow anything? Or restrict the charset content only to alnum chars (+ -)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the charset should be restricted to the ABNF from the RFC.

Copy link

sonarcloud bot commented Aug 26, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot
C Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

@airween
Copy link
Member Author

airween commented Aug 26, 2024

https://www.rfc-editor.org/rfc/rfc5987#section-3.2 This RFC says that only UTF-8 & ISO-8859-1 are standardized:

Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1" ([ISO-8859-1]) character set. Extension character sets (mime-charset) are reserved for future use.

We're restricting for years the languages to these two and never found a false positive. Should we allow all of them? If the code allows it, we should add a new collection (FILES_LANG?) allowing checking this syntax.

Well, you are right, but as it stands in RFC: "Extension character sets (mime-charset) are reserved for future use." - if someone introduce a new feature in the future we should align the code again.

Introducing a new settings would be confused (in this case).

And I think first we should investigate is it possible to check those languages with rules.

@marcstern marcstern added the 2.x Related to ModSecurity version 2.x label Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to ModSecurity version 2.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants