Email addresses in HTML content are removed when sanitizing text coming from a plaintext email #126
This is the same problem as #91. See there for a workaround.
Thanks @mganss!
I'd like HtmlSanitizer to "do one thing well", and that's sanitizing HTML, so adding this would be outside its scope. If you have something that's not HTML, you'll need to do some preprocessing.
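For input that is known to be plaintext, the preprocessing can be as simple as escaping the whole body before it reaches the sanitizer, so that every < and > arrives as an entity, just as it would in an HTML email. The sketch below uses Python's standard `html.escape` only for illustration (HtmlSanitizer itself is a .NET library). The function name `plaintext_to_safe_html` is hypothetical, and this is not the workaround from #91.

```python
import html


def plaintext_to_safe_html(body: str) -> str:
    """Escape a plaintext email body so that &, < and > become entities.

    The result contains no markup at all, so a sanitizer has nothing to
    strip and addresses like <jane@example.com> survive as visible text.
    """
    # quote=False leaves " and ' alone; they only matter inside attributes.
    return html.escape(body, quote=False)
```

For example, `plaintext_to_safe_html("From: Jane <jane@example.com>")` yields `From: Jane &lt;jane@example.com&gt;`, which is the form an HTML email would already contain.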
Indeed, that makes sense. Maybe your gist could be included in the distribution and accessed through an additional call or an option/flag. Thanks for the workaround and the quick replies :)
I have added it to the Examples wiki page.
Thanks a lot @mganss!
Hi, just FYI, I want to give an update on how I am handling this issue: I created a method that identifies whether the tag to be removed is in email format.
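A method like the one described might look roughly like the following Python sketch. It assumes that the parser turns text such as `<jane@example.com>` (a made-up address) into an element whose tag name is the address itself, which is what an HTML5 tokenizer does because it reads a tag name up to whitespace, `/` or `>`. The names `is_email_tag` and `classify_tag` and the regular expression are hypothetical, not part of HtmlSanitizer; the order of checks (whitelist first, email format second) follows the suggestion in the issue.

```python
import re

# Hypothetical "email format" test: something@something.tld with no
# whitespace and no extra "@". Intentionally loose, not a full validator.
_EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email_tag(tag_name: str) -> bool:
    """True if a parsed tag name looks like an email address."""
    return _EMAIL_LIKE.fullmatch(tag_name) is not None


def classify_tag(tag_name: str, allowed_tags: set[str]) -> str:
    """Decide what to do with a tag.

    allowed_tags is assumed to hold lowercase names. Returns "keep" for
    whitelisted tags, "text" for address-like tags that should be
    re-emitted as escaped text, and "remove" for everything else.
    """
    if tag_name.lower() in allowed_tags:
        return "keep"
    if is_email_tag(tag_name):
        return "text"
    return "remove"
```

With `allowed = {"b", "p"}`, `classify_tag("jane@example.com", allowed)` returns `"text"` while `classify_tag("script", allowed)` returns `"remove"`, so only address-like pseudo-tags are rescued.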
When the string to sanitize comes from a plaintext email, such items are present in the original content:
If the email was an HTML email, the < and > around "<[email protected]>" are already escaped as &lt; and &gt;, but if the email was plaintext, they are not. In this specific case, the part "<[email protected]>" is considered to be an invalid HTML tag and is removed, along with all the content that follows it, because the parser nests that following content inside the unclosed element. If the option "Keep child nodes of removed elements" is chosen, then only these email tags are lost.
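When the body cannot be escaped wholesale (for example, because real markup is mixed with pasted plaintext headers), a narrower preprocessing step could escape only the bracketed addresses so they never become elements. A minimal Python sketch, assuming a deliberately simple pattern: `escape_bracketed_emails` and the regular expression are hypothetical and will not match every valid RFC 5322 address.

```python
import re

# Hypothetical pattern: "<", a run without whitespace/<>/@, "@", another
# such run, then ">". Tags like <b> (no "@") or <a href="mailto:..."> (contains
# whitespace) do not match, so real markup is left untouched.
_BRACKETED_EMAIL = re.compile(r"<([^<>@\s]+@[^<>@\s]+)>")


def escape_bracketed_emails(text: str) -> str:
    """Turn <local@domain> into &lt;local@domain&gt; so an HTML parser
    reads it as text instead of an invalid element."""
    return _BRACKETED_EMAIL.sub(r"&lt;\1&gt;", text)
```

Running it before sanitizing means the address is kept as text and, since no unclosed element is created, the content after it is no longer swallowed.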
It would be great if, after a tag is tested against the whitelist, an additional test were made to match it against these two authorized, standard, and safe instances.