gh-127794: Validate header name according to rfc-5322 and allow only printable ascii characters #127820

srinivasreddy · 2024-12-11T11:59:14Z

Issue: email.message.EmailMessage accepts invalid header field names without error, which raise an error when parsed #127794


Lib/email/message.py

bitdancer · 2024-12-12T20:26:24Z

Lib/test/test_email/test_email.py

+            with self.assertRaises(ValueError) as cm:
+               Message().add_header(name, value)
+            self.assertIn(f"Header field name {name!r} contains invalid characters", str(cm.exception))
+


Unfortunately this only tests your change to add_header, and does not actually test the underlying problem. If you implement test methods that test msg['test header'] = 'foo', you will find that the invalid field name will be accepted. add_header is not used in the __setitem__ path.

The correct location for the fix is in the two header_store_parse methods in policy.py and _policybase.py. I'd put a helper method in _policybase and call it from those two methods. I would also consider using a regex to implement the check...I think that will be faster, but you might want to test it.

Also, the tests should test both Message (old API) and EmailMessage (new API). Since add_header uses the __setitem__ syntax in its implementation you don't actually need to test it directly, but it wouldn't be bad to keep the tests to make sure that path doesn't get broken in the future. More tests are good unless they are slow ;) But not required.
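The routing described above can be observed with a small experiment. This is only a sketch: RecordingCompat32 and RecordingEmailPolicy are hypothetical policy subclasses written for illustration, not part of the PR. They log each call to header_store_parse so that the __setitem__ and add_header paths can be compared on both the old and new API.

```python
from email.message import EmailMessage, Message
from email.policy import Compat32, EmailPolicy

calls = []  # (policy label, header name) pairs, in call order


class RecordingCompat32(Compat32):
    """Hypothetical old-API policy that logs header_store_parse calls."""

    def header_store_parse(self, name, value):
        calls.append(("compat32", name))
        return super().header_store_parse(name, value)


class RecordingEmailPolicy(EmailPolicy):
    """Hypothetical new-API policy that logs header_store_parse calls."""

    def header_store_parse(self, name, value):
        calls.append(("email", name))
        return super().header_store_parse(name, value)


# Old API: Message with a compat32-style policy.
old = Message(policy=RecordingCompat32())
old["Subject"] = "set via __setitem__"
old.add_header("X-Test", "set via add_header")

# New API: EmailMessage with an EmailPolicy-style policy.
new = EmailMessage(policy=RecordingEmailPolicy())
new["Subject"] = "set via __setitem__"
new.add_header("X-Test", "set via add_header")

for label, name in calls:
    print(label, name)
```

Both assignment styles show up in the log for both message classes, which is why a check placed in header_store_parse covers add_header as well, while a check placed only in add_header misses msg[name] = value.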


bedevere-app · 2024-12-12T20:27:02Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

bitdancer

Thanks for the prompt response. Hopefully I can be reasonably prompt in return, but unfortunately no guarantees :(

bitdancer · 2024-12-13T22:58:52Z

Lib/email/_policybase.py

@@ -90,6 +90,14 @@ def __add__(self, other):
        """
        return self.clone(**other.__dict__)

+def validate_header(name):


s/validate_header/validate_header_name/ for clarity.

bitdancer · 2024-12-13T22:58:56Z

Lib/email/_policybase.py

+def validate_header(name):
+    # Validate header name according to RFC 5322
+    import re
+    if not re.match(r'^[^\s:]+$', name):


It is better to do the import at the top of the module and compile the regex. It's a micro-optimization, but it is an obvious one. (Note that utils, which _policybase imports, already imports re, so there's no extra overhead for importing it at the top level here.)
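A quick way to check a micro-optimization like this is a timeit comparison. The following is an illustrative sketch only: the function names are hypothetical, the pattern is the one from the diff above, and timings vary by machine. Since re keeps an internal cache of compiled patterns, the measured gap may be small.

```python
import re
import timeit

# Compiled once at import time, as the review suggests.
_NAME_RE = re.compile(r'^[^\s:]+$')


def check_inline(name):
    # Mirrors the diff: the import statement and the pattern lookup
    # are executed on every call.
    import re
    return re.match(r'^[^\s:]+$', name) is not None


def check_compiled(name):
    # Uses the module-level compiled pattern.
    return _NAME_RE.match(name) is not None


if __name__ == "__main__":
    for func in (check_inline, check_compiled):
        seconds = timeit.timeit(lambda: func("X-Custom-Header"), number=200_000)
        print(f"{func.__name__}: {seconds:.3f}s")
```

Both functions apply the same pattern, so they accept and reject the same names; only the per-call overhead differs.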

bitdancer · 2024-12-13T22:58:58Z

Lib/email/_policybase.py

+        raise ValueError(f"Invalid header field name {name!r}")
+    # Only allow printable ASCII characters
+    if any(ord(c) < 33 or ord(c) > 126 for c in name):
+        raise ValueError(f"Invalid header field name {name!r}")


You are using the same error message for both cases here, so there is no advantage to having the tests be separate. I'd use a variation on the RFC recommended regex for a single test:

header_re = re.compile("[\041-\071\073-\176]+$")

Then this whole section becomes:

    if not header_re.match(name):
        raise ValueError(f"Header field name contains invalid characters: {name!r}")

Also, I'd be inclined to put the function def above _PolicyBase, but that's not all that important.
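To see how a single pattern can replace the two separate checks, here is a minimal sketch. The names HEADER_NAME_RE, two_step_check, and is_valid are hypothetical and chosen for illustration. It writes the octal character class in its equivalent literal form and uses fullmatch instead of match with "$", because "$" in Python also matches just before a trailing newline.

```python
import re

# Printable ASCII (33-126) except the colon (58): "!" to "9", then ";" to "~".
# This is the same character class as the octal form "[\041-\071\073-\176]".
HEADER_NAME_RE = re.compile(r"[!-9;-~]+")


def validate_header_name(name):
    """Raise ValueError unless every character in name is allowed."""
    # fullmatch requires the whole string to match, so a trailing
    # newline cannot slip through the way it can with "$".
    if not HEADER_NAME_RE.fullmatch(name):
        raise ValueError(
            f"Header field name contains invalid characters: {name!r}")


def two_step_check(name):
    """The two separate checks from the diff, kept for comparison."""
    if not re.match(r'^[^\s:]+$', name):
        return False
    return not any(ord(c) < 33 or ord(c) > 126 for c in name)


def is_valid(name):
    """Boolean wrapper around validate_header_name."""
    try:
        validate_header_name(name)
    except ValueError:
        return False
    return True


# The single pattern and the two-step check agree on every one-character name.
assert all(two_step_check(chr(i)) == is_valid(chr(i)) for i in range(256))
```

With one pattern there is one failure path and one error message, which matches the suggestion to keep the tests in a single list.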

bitdancer · 2024-12-13T22:59:03Z

Lib/email/policy.py

@@ -4,7 +4,7 @@

 import re
 import sys
-from email._policybase import Policy, Compat32, compat32, _extend_docstrings
+from email._policybase import Policy, Compat32, compat32, _extend_docstrings, validate_header


Time to wrap this into a multiline expression to keep it under the PEP8 line length limit:

```
from email._policybase import (
    _extend_docstrings,
    Compat32,
    compat32,
    Policy,
    validate_header_name,
)
```

You'll note that I sorted them alphabetically while I was at it (ignoring case).

bitdancer · 2024-12-13T22:59:14Z

Lib/test/test_email/test_email.py

@@ -728,6 +728,29 @@ def test_nonascii_add_header_with_tspecial(self):
            "attachment; filename*=utf-8''Fu%C3%9Fballer%20%5Bfilename%5D.ppt",
            msg['Content-Disposition'])

+    def test_invalid_headers(self):


test_invalid_header_names

bitdancer · 2024-12-13T23:01:41Z

Lib/test/test_email/test_email.py

+        invalid_headers = [
+            ('Header\x7F', 'Non-ASCII character'),
+            ('Header\x1F', 'control character'),
+        ]


What is the reason for having these separate from the list above? If you did it because you have the checks as two separate clauses in the function, keep in mind that the structure of the tests should be independent of the structure of the code. It looks like it makes more sense to have them all in one list.

bitdancer · 2024-12-13T23:03:40Z

Lib/test/test_email/test_email.py

+            with self.assertRaises(ValueError) as cm:
+               Message().add_header(name, value)
+            self.assertIn(f"Header field name {name!r} contains invalid characters", str(cm.exception))
+


Tests still need to be converted to test __setitem__ and both Message and EmailMessage, but I assume you were waiting for that until I approved the code changes.

bitdancer · 2024-12-13T23:06:45Z

Lib/test/test_email/test_email.py

+            (' LeadingSpace', 'starts with space'),
+            ('TrailingSpace ', 'ends with space'),
+        ]
+        for name, value in invalid_headers:


Add: with self.subTest(name=name, problem=value): and indent and wrap the remaining lines as appropriate.
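Putting the test suggestions from this review together, a test along these lines would exercise __setitem__ on both Message and EmailMessage, with a single list of cases and one subTest per case. This is a sketch under stated assumptions, not the PR's final test: validate_header_name, CheckedCompat32, and CheckedEmailPolicy are hypothetical stand-ins defined here so the example runs whether or not the installed Python already contains the fix.

```python
import re
import unittest
from email.message import EmailMessage, Message
from email.policy import Compat32, EmailPolicy

_HEADER_NAME_RE = re.compile(r"[!-9;-~]+")


def validate_header_name(name):
    # Hypothetical stand-in for the helper discussed in the review.
    if not _HEADER_NAME_RE.fullmatch(name):
        raise ValueError(
            f"Header field name contains invalid characters: {name!r}")


class CheckedCompat32(Compat32):
    # Hypothetical policy: validates the name before storing (old API).
    def header_store_parse(self, name, value):
        validate_header_name(name)
        return super().header_store_parse(name, value)


class CheckedEmailPolicy(EmailPolicy):
    # Hypothetical policy: validates the name before storing (new API).
    def header_store_parse(self, name, value):
        validate_header_name(name)
        return super().header_store_parse(name, value)


class TestInvalidHeaderNames(unittest.TestCase):

    def test_invalid_header_names(self):
        invalid_headers = [
            ('Invalid Header', 'contains space'),
            ('Header:Name', 'contains colon'),
            ('', 'empty name'),
            (' LeadingSpace', 'starts with space'),
            ('TrailingSpace ', 'ends with space'),
            ('Header\x7F', 'DEL character'),
            ('Header\x1F', 'control character'),
            ('H\xe9ader', 'non-ASCII character'),
        ]
        factories = [
            ('Message', lambda: Message(policy=CheckedCompat32())),
            ('EmailMessage', lambda: EmailMessage(policy=CheckedEmailPolicy())),
        ]
        for api, make_message in factories:
            for name, problem in invalid_headers:
                with self.subTest(api=api, name=name, problem=problem):
                    msg = make_message()
                    with self.assertRaises(ValueError) as cm:
                        msg[name] = 'value'
                    self.assertIn(
                        f"Header field name contains invalid characters: {name!r}",
                        str(cm.exception))
```

Because each case runs inside its own subTest, a failure reports the API, the offending name, and the problem description, and the remaining cases still run.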

pythongh-127794: Validate header name according rfc-5322 and allow on…

901a91c

…ly printable ascii characters

srinivasreddy requested a review from a team as a code owner December 11, 2024 11:59

bedevere-app bot mentioned this pull request Dec 11, 2024

email.message.EmailMessage accepts invalid header field names without error, which raise an error when parsed #127794

Open

bedevere-app bot added the awaiting review label Dec 11, 2024

srinivasreddy changed the title ~~gh-127794: Validate header name according rfc-5322 and allow only printable ascii characters~~ gh-127794: Validate header name according to rfc-5322 and allow only printable ascii characters Dec 11, 2024

srinivasreddy added 5 commits December 11, 2024 17:33

Add test case

5be0eaa

Add some more tests

6ae6273

Fix indentation

a7d1c0c

Add news entry

d04c9a2

Merge branch 'main' into pythongh-127794

bcab963

ZeroIntensity added the topic-email, needs backport to 3.12 (bug and security fixes), and needs backport to 3.13 (bugs and security fixes) labels Dec 12, 2024

ZeroIntensity reviewed Dec 12, 2024


bitdancer requested changes Dec 12, 2024


bedevere-app bot added awaiting changes and removed awaiting review labels Dec 12, 2024

srinivasreddy added 5 commits December 13, 2024 17:56

pythongh-127794: Create a separate function for validating header

53bdb4f

pythongh-127794: Coverge all the validations into a single regex

31e4f3e

pythongh-127794: Revert the changes

fae3664

pythongh-127794: Remove one variable

026f35b

pythongh-127794: Update tests

8f6f6c3

srinivasreddy requested a review from bitdancer December 13, 2024 13:27

bitdancer requested changes Dec 13, 2024
