-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-127794: Validate header name according to rfc-5322 and allow only printable ascii characters #127820
base: main
Are you sure you want to change the base?
Conversation
srinivasreddy
commented
Dec 11, 2024
•
edited by bedevere-app
bot
Loading
edited by bedevere-app
bot
- Issue: email.message.EmailMessage accepts invalid header field names without error, which raise an error when parsed #127794
…ly printable ascii characters
with self.assertRaises(ValueError) as cm: | ||
Message().add_header(name, value) | ||
self.assertIn(f"Header field name {name!r} contains invalid characters", str(cm.exception)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this only tests your change to add_header, and does not actually test the underlying problem. If you implement test methods that test msg['test header'] = 'foo'
, you will find that the invalid field name will be accepted. add_header is not used in the __setitem__
path.
The correct location for the fix is in the two header_store_parse methods in policy.py and _policybase.py. I'd put a helper method in _policybase and call it from those two methods. I would also consider using a regex to implement the check...I think that will be faster, but you might want to test it.
Also, the tests should test both Message (old API) and EmailMessage (new API). Since add_header uses the __setitem__
syntax in its implementation you don't actually need to test it directly, but it wouldn't be bad to keep the tests to make sure that path doesn't get broken in the future. More tests are good unless they are slow ;) But not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests still need to be converted to test __setitem__
and both Message
and EmailMessage
, but I assume you were waiting for that until I approved the code changes.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be put in the comfy chair! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the prompt response. Hopefully I can be reasonably prompt in return, but unfortunately no guarantees :(
@@ -90,6 +90,14 @@ def __add__(self, other): | |||
""" | |||
return self.clone(**other.__dict__) | |||
|
|||
def validate_header(name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/validate_header/validate_header_name/ for clarity.
def validate_header(name): | ||
# Validate header name according to RFC 5322 | ||
import re | ||
if not re.match(r'^[^\s:]+$', name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to do the import at the top of the module and compile the regex. It's a micro optimization, but it is an obvous one. (Note that utils, which _policybase imports, already imports re, so there's no extra overhead for importing it at the top level here.)
raise ValueError(f"Invalid header field name {name!r}") | ||
# Only allow printable ASCII characters | ||
if any(ord(c) < 33 or ord(c) > 126 for c in name): | ||
raise ValueError(f"Invalid header field name {name!r}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are using the same error message for both cases here, so there is no advantage to having the tests be separate. I'd use a variation on the RFC recommended regex for a single test:
header_re = re.compile("[\041-\071\073-\176]+$")
Then this whole section becomes:
if not header_re.match(name):
raise ValueError(f"Header field name contains invalid characters: {name!r}")
Also, I'd be inclined to put the function def above _PolicyBase, but that's not all that important.
@@ -4,7 +4,7 @@ | |||
|
|||
import re | |||
import sys | |||
from email._policybase import Policy, Compat32, compat32, _extend_docstrings | |||
from email._policybase import Policy, Compat32, compat32, _extend_docstrings, validate_header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time to wrap this into a multiline expression to keep it under the PEP8 line length limit:
from email._policybase import (
_extend_docstrings,
Compat32,
compat32,
Policy,
validate_header_name,
)
```
You'll note that I sorted them alphabetically while I was at it (ignoring case).
@@ -728,6 +728,29 @@ def test_nonascii_add_header_with_tspecial(self): | |||
"attachment; filename*=utf-8''Fu%C3%9Fballer%20%5Bfilename%5D.ppt", | |||
msg['Content-Disposition']) | |||
|
|||
def test_invalid_headers(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_invalid_header_names
invalid_headers = [ | ||
('Header\x7F', 'Non-ASCII character'), | ||
('Header\x1F', 'control character'), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for having these separate from the list above? If you did it because you have the checks as two separate clauses in the function, keep in mind that the structure of the tests should be independent of the structure of the code. It looks like it makes more sense to have them all in one list.
with self.assertRaises(ValueError) as cm: | ||
Message().add_header(name, value) | ||
self.assertIn(f"Header field name {name!r} contains invalid characters", str(cm.exception)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests still need to be converted to test __setitem__
and both Message
and EmailMessage
, but I assume you were waiting for that until I approved the code changes.
(' LeadingSpace', 'starts with space'), | ||
('TrailingSpace ', 'ends with space'), | ||
] | ||
for name, value in invalid_headers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add: with self.subTest(name=name, problem=value):
and indent and wrap the remaining lines as appropriate.