Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-127794: Validate header name according to rfc-5322 and allow only printable ascii characters #127820

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

srinivasreddy
Copy link
Contributor

@srinivasreddy srinivasreddy commented Dec 11, 2024

@srinivasreddy srinivasreddy requested a review from a team as a code owner December 11, 2024 11:59
@srinivasreddy srinivasreddy changed the title gh-127794: Validate header name according rfc-5322 and allow only printable ascii characters gh-127794: Validate header name according to rfc-5322 and allow only printable ascii characters Dec 11, 2024
@ZeroIntensity ZeroIntensity added topic-email needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Dec 12, 2024
Lib/email/message.py Outdated Show resolved Hide resolved
with self.assertRaises(ValueError) as cm:
Message().add_header(name, value)
self.assertIn(f"Header field name {name!r} contains invalid characters", str(cm.exception))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this only tests your change to add_header, and does not actually test the underlying problem. If you implement test methods that test msg['test header'] = 'foo', you will find that the invalid field name will be accepted. add_header is not used in the __setitem__ path.

The correct location for the fix is in the two header_store_parse methods in policy.py and _policybase.py. I'd put a helper method in _policybase and call it from those two methods. I would also consider using a regex to implement the check...I think that will be faster, but you might want to test it.

Also, the tests should test both Message (old API) and EmailMessage (new API). Since add_header uses the __setitem__ syntax in its implementation you don't actually need to test it directly, but it wouldn't be bad to keep the tests to make sure that path doesn't get broken in the future. More tests are good unless they are slow ;) But not required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests still need to be converted to test __setitem__ and both Message and EmailMessage, but I assume you were waiting for that until I approved the code changes.

@bedevere-app
Copy link

bedevere-app bot commented Dec 12, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

Copy link
Member

@bitdancer bitdancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt response. Hopefully I can be reasonably prompt in return, but unfortunately no guarantees :(

@@ -90,6 +90,14 @@ def __add__(self, other):
"""
return self.clone(**other.__dict__)

def validate_header(name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/validate_header/validate_header_name/ for clarity.

def validate_header(name):
# Validate header name according to RFC 5322
import re
if not re.match(r'^[^\s:]+$', name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to do the import at the top of the module and compile the regex. It's a micro optimization, but it is an obvous one. (Note that utils, which _policybase imports, already imports re, so there's no extra overhead for importing it at the top level here.)

raise ValueError(f"Invalid header field name {name!r}")
# Only allow printable ASCII characters
if any(ord(c) < 33 or ord(c) > 126 for c in name):
raise ValueError(f"Invalid header field name {name!r}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are using the same error message for both cases here, so there is no advantage to having the tests be separate. I'd use a variation on the RFC recommended regex for a single test:

header_re = re.compile("[\041-\071\073-\176]+$")

Then this whole section becomes:

if not header_re.match(name):
    raise ValueError(f"Header field name contains invalid characters: {name!r}")

Also, I'd be inclined to put the function def above _PolicyBase, but that's not all that important.

@@ -4,7 +4,7 @@

import re
import sys
from email._policybase import Policy, Compat32, compat32, _extend_docstrings
from email._policybase import Policy, Compat32, compat32, _extend_docstrings, validate_header
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to wrap this into a multiline expression to keep it under the PEP8 line length limit:

from email._policybase import (
    _extend_docstrings,
    Compat32,
    compat32,
    Policy,
    validate_header_name,
    )
```
You'll note that I sorted them alphabetically while I was at it (ignoring case).

@@ -728,6 +728,29 @@ def test_nonascii_add_header_with_tspecial(self):
"attachment; filename*=utf-8''Fu%C3%9Fballer%20%5Bfilename%5D.ppt",
msg['Content-Disposition'])

def test_invalid_headers(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_invalid_header_names

invalid_headers = [
('Header\x7F', 'Non-ASCII character'),
('Header\x1F', 'control character'),
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for having these separate from the list above? If you did it because you have the checks as two separate clauses in the function, keep in mind that the structure of the tests should be independent of the structure of the code. It looks like it makes more sense to have them all in one list.

with self.assertRaises(ValueError) as cm:
Message().add_header(name, value)
self.assertIn(f"Header field name {name!r} contains invalid characters", str(cm.exception))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests still need to be converted to test __setitem__ and both Message and EmailMessage, but I assume you were waiting for that until I approved the code changes.

(' LeadingSpace', 'starts with space'),
('TrailingSpace ', 'ends with space'),
]
for name, value in invalid_headers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add: with self.subTest(name=name, problem=value): and indent and wrap the remaining lines as appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants