-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-127794: Validate header name according to rfc-5322 and allow only printable ascii characters #127820
base: main
Are you sure you want to change the base?
gh-127794: Validate header name according to rfc-5322 and allow only printable ascii characters #127820
Changes from all commits
901a91c
5be0eaa
6ae6273
a7d1c0c
d04c9a2
bcab963
53bdb4f
31e4f3e
fae3664
026f35b
8f6f6c3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -90,6 +90,14 @@ def __add__(self, other): | |
""" | ||
return self.clone(**other.__dict__) | ||
|
||
def validate_header(name): | ||
# Validate header name according to RFC 5322 | ||
import re | ||
if not re.match(r'^[^\s:]+$', name): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is better to do the import at the top of the module and compile the regex. It's a micro optimization, but it is an obvous one. (Note that utils, which _policybase imports, already imports re, so there's no extra overhead for importing it at the top level here.) |
||
raise ValueError(f"Invalid header field name {name!r}") | ||
# Only allow printable ASCII characters | ||
if any(ord(c) < 33 or ord(c) > 126 for c in name): | ||
raise ValueError(f"Invalid header field name {name!r}") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are using the same error message for both cases here, so there is no advantage to having the tests be separate. I'd use a variation on the RFC recommended regex for a single test:
Then this whole section becomes:
Also, I'd be inclined to put the function def above _PolicyBase, but that's not all that important. |
||
|
||
def _append_doc(doc, added_doc): | ||
doc = doc.rsplit('\n', 1)[0] | ||
|
@@ -314,6 +322,7 @@ def header_store_parse(self, name, value): | |
"""+ | ||
The name and value are returned unmodified. | ||
""" | ||
validate_header(name) | ||
return (name, value) | ||
|
||
def header_fetch_parse(self, name, value): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,7 +4,7 @@ | |
|
||
import re | ||
import sys | ||
from email._policybase import Policy, Compat32, compat32, _extend_docstrings | ||
from email._policybase import Policy, Compat32, compat32, _extend_docstrings, validate_header | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Time to wrap this into a multiline expression to keep it under the PEP8 line length limit:
|
||
from email.utils import _has_surrogates | ||
from email.headerregistry import HeaderRegistry as HeaderRegistry | ||
from email.contentmanager import raw_data_manager | ||
|
@@ -140,6 +140,7 @@ def header_store_parse(self, name, value): | |
""" | ||
if hasattr(value, 'name') and value.name.lower() == name.lower(): | ||
return (name, value) | ||
validate_header(name) | ||
if isinstance(value, str) and len(value.splitlines())>1: | ||
# XXX this error message isn't quite right when we use splitlines | ||
# (see issue 22233), but I'm not sure what should happen here. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -728,6 +728,29 @@ def test_nonascii_add_header_with_tspecial(self): | |
"attachment; filename*=utf-8''Fu%C3%9Fballer%20%5Bfilename%5D.ppt", | ||
msg['Content-Disposition']) | ||
|
||
def test_invalid_headers(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. test_invalid_header_names |
||
invalid_headers = [ | ||
('Invalid Header', 'contains space'), | ||
('Tab\tHeader', 'contains tab'), | ||
('Colon:Header', 'contains colon'), | ||
('', 'Empty name'), | ||
(' LeadingSpace', 'starts with space'), | ||
('TrailingSpace ', 'ends with space'), | ||
] | ||
for name, value in invalid_headers: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add: |
||
with self.assertRaises(ValueError) as cm: | ||
Message().add_header(name, value) | ||
self.assertIn(f"Invalid header field name {name!r}", str(cm.exception)) | ||
|
||
invalid_headers = [ | ||
('Header\x7F', 'Non-ASCII character'), | ||
('Header\x1F', 'control character'), | ||
] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the reason for having these separate from the list above? If you did it because you have the checks as two separate clauses in the function, keep in mind that the structure of the tests should be independent of the structure of the code. It looks like it makes more sense to have them all in one list. |
||
for name, value in invalid_headers: | ||
with self.assertRaises(ValueError) as cm: | ||
Message().add_header(name, value) | ||
self.assertIn(f"Invalid header field name {name!r}", str(cm.exception)) | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately this only tests your change to add_header, and does not actually test the underlying problem. If you implement test methods that test The correct location for the fix is in the two header_store_parse methods in policy.py and _policybase.py. I'd put a helper method in _policybase and call it from those two methods. I would also consider using a regex to implement the check...I think that will be faster, but you might want to test it. Also, the tests should test both Message (old API) and EmailMessage (new API). Since add_header uses the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tests still need to be converted to test |
||
def test_binary_quopri_payload(self): | ||
for charset in ('latin-1', 'ascii'): | ||
msg = Message() | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
The :meth:`email.message.Message.add_header` method now validates header | ||
field names according to :rfc:`5322`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/validate_header/validate_header_name/ for clarity.