Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-127794: Validate header name according to rfc-5322 and allow only printable ascii characters #127820

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
9 changes: 9 additions & 0 deletions Lib/email/_policybase.py
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,14 @@ def __add__(self, other):
"""
return self.clone(**other.__dict__)

def validate_header(name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/validate_header/validate_header_name/ for clarity.

# Validate header name according to RFC 5322
import re
if not re.match(r'^[^\s:]+$', name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to do the import at the top of the module and compile the regex. It's a micro optimization, but it is an obvous one. (Note that utils, which _policybase imports, already imports re, so there's no extra overhead for importing it at the top level here.)

raise ValueError(f"Invalid header field name {name!r}")
# Only allow printable ASCII characters
if any(ord(c) < 33 or ord(c) > 126 for c in name):
raise ValueError(f"Invalid header field name {name!r}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are using the same error message for both cases here, so there is no advantage to having the tests be separate. I'd use a variation on the RFC recommended regex for a single test:

header_re = re.compile("[\041-\071\073-\176]+$")

Then this whole section becomes:

if not header_re.match(name):
    raise ValueError(f"Header field name contains invalid characters: {name!r}")

Also, I'd be inclined to put the function def above _PolicyBase, but that's not all that important.


def _append_doc(doc, added_doc):
doc = doc.rsplit('\n', 1)[0]
Expand Down Expand Up @@ -314,6 +322,7 @@ def header_store_parse(self, name, value):
"""+
The name and value are returned unmodified.
"""
validate_header(name)
return (name, value)

def header_fetch_parse(self, name, value):
Expand Down
3 changes: 2 additions & 1 deletion Lib/email/policy.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

import re
import sys
from email._policybase import Policy, Compat32, compat32, _extend_docstrings
from email._policybase import Policy, Compat32, compat32, _extend_docstrings, validate_header
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to wrap this into a multiline expression to keep it under the PEP8 line length limit:

from email._policybase import (
    _extend_docstrings,
    Compat32,
    compat32,
    Policy,
    validate_header_name,
    )
```
You'll note that I sorted them alphabetically while I was at it (ignoring case).

from email.utils import _has_surrogates
from email.headerregistry import HeaderRegistry as HeaderRegistry
from email.contentmanager import raw_data_manager
Expand Down Expand Up @@ -140,6 +140,7 @@ def header_store_parse(self, name, value):
"""
if hasattr(value, 'name') and value.name.lower() == name.lower():
return (name, value)
validate_header(name)
if isinstance(value, str) and len(value.splitlines())>1:
# XXX this error message isn't quite right when we use splitlines
# (see issue 22233), but I'm not sure what should happen here.
Expand Down
23 changes: 23 additions & 0 deletions Lib/test/test_email/test_email.py
Original file line number Diff line number Diff line change
Expand Up @@ -728,6 +728,29 @@ def test_nonascii_add_header_with_tspecial(self):
"attachment; filename*=utf-8''Fu%C3%9Fballer%20%5Bfilename%5D.ppt",
msg['Content-Disposition'])

def test_invalid_headers(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_invalid_header_names

invalid_headers = [
('Invalid Header', 'contains space'),
('Tab\tHeader', 'contains tab'),
('Colon:Header', 'contains colon'),
('', 'Empty name'),
(' LeadingSpace', 'starts with space'),
('TrailingSpace ', 'ends with space'),
]
for name, value in invalid_headers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add: with self.subTest(name=name, problem=value): and indent and wrap the remaining lines as appropriate.

with self.assertRaises(ValueError) as cm:
Message().add_header(name, value)
self.assertIn(f"Invalid header field name {name!r}", str(cm.exception))

invalid_headers = [
('Header\x7F', 'Non-ASCII character'),
('Header\x1F', 'control character'),
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for having these separate from the list above? If you did it because you have the checks as two separate clauses in the function, keep in mind that the structure of the tests should be independent of the structure of the code. It looks like it makes more sense to have them all in one list.

for name, value in invalid_headers:
with self.assertRaises(ValueError) as cm:
Message().add_header(name, value)
self.assertIn(f"Invalid header field name {name!r}", str(cm.exception))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this only tests your change to add_header, and does not actually test the underlying problem. If you implement test methods that test msg['test header'] = 'foo', you will find that the invalid field name will be accepted. add_header is not used in the __setitem__ path.

The correct location for the fix is in the two header_store_parse methods in policy.py and _policybase.py. I'd put a helper method in _policybase and call it from those two methods. I would also consider using a regex to implement the check...I think that will be faster, but you might want to test it.

Also, the tests should test both Message (old API) and EmailMessage (new API). Since add_header uses the __setitem__ syntax in its implementation you don't actually need to test it directly, but it wouldn't be bad to keep the tests to make sure that path doesn't get broken in the future. More tests are good unless they are slow ;) But not required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests still need to be converted to test __setitem__ and both Message and EmailMessage, but I assume you were waiting for that until I approved the code changes.

def test_binary_quopri_payload(self):
for charset in ('latin-1', 'ascii'):
msg = Message()
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
The :meth:`email.message.Message.add_header` method now validates header
field names according to :rfc:`5322`.
Loading