How does one find all elements regardless of ifs? #2994
Nothing is printed. I suspect the if statements are honored, or assumed false. Removing all comments is not an option because they are needed later.
@forthrin Thanks for asking this question. I've moved it from the nokogiri.org repository to the nokogiri repository. Let's look at the DOM that's created:

Nokogiri.HTML5('<!--[if foo]><p>bar</p><![endif]-->').to_html
# => "<!--[if foo]><p>bar</p><![endif]--><html><head></head><body></body></html>"
Nokogiri.HTML5('<!--[if foo]><p>bar</p><![endif]-->')
# =>
# #(Document:0x33d9c {
# name = "document",
# children = [
# #(Comment "[if foo]><p>bar</p><![endif]"),
# #(Element:0x33db0 {
# name = "html",
# children = [ #(Element:0x33dc4 { name = "head" }), #(Element:0x33dd8 { name = "body" })]
# })]
# })

Note that the initial comment is handled while the parser is in the "initial" or "before html" state, and so the comment is inserted into the document before the html element.

Please note that a bare comment is not a well-formed HTML document; you may have better results parsing fragments with Nokogiri::HTML5.fragment:

Nokogiri::HTML5.fragment('<!--[if foo]><p>bar</p><![endif]-->').to_html
# => "<!--[if foo]><p>bar</p><![endif]-->"
Nokogiri::HTML5.fragment('<!--[if foo]><p>bar</p><![endif]-->')
# => #(DocumentFragment:0x1c034 { name = "#document-fragment", children = [ #(Comment "[if foo]><p>bar</p><![endif]")] })

If you have more questions, I can try to answer -- please tell us more about your use case and what you expected to happen.
Thanks for your very kind and helpful reply. To answer the XY problem, the use case is: remove any possibility of an HTML mail contacting a server except on explicit user interaction (i.e. clicking a link). The current approach is:
So the problem is simply that URLs inside if statements are not removed, and could hence be loaded in a browser. Any simpler and smarter approach to accomplish the overall goal is appreciated.
Conditional comments aren't really supported in modern browsers -- https://en.wikipedia.org/wiki/Conditional_comment -- so you may simply want to remove all comments.
But it sounds like what you really want is an HTML sanitizer. I suggest you take a look at https://github.com/flavorjones/loofah or https://github.com/rgrove/sanitize