I'm using HAP to load HTML files and transform them into something XML-valid (because I have existing APIs that take XDocument and do things with it).
Some of those contain XML Namespace declarations (often due to a lazy XSLT that doesn't omit them from its output; sometimes because it carries extra information in foreign namespaces on elements).
I used the 1.4.6 DLL for a long time, and only recently realized that I could (and should) switch to NuGet, which by then was already at 1.11.42. After a quick series of tests I noticed that some files fail. I bisected this to 1.6.3 as the first problematic version; 1.6.2 still works as I expect.
Note: I care less about the exact output format and more about the result being valid XML. The APIs I hand the XDocument to don't necessarily care about the namespaces or their attributes, since they look at other aspects (such as certain HTML or XML tags, or simply the text content of elements that don't match a blacklist).
If I can't obtain valid XML at all, the result drops from 100% to 0%, whereas a mangled element or attribute name (because it was in a namespace) barely takes it from 100% down to 98%, which is still "good enough" for what I'm doing there.
HAP fit the bill, and it did so with very little code - so I dropped it right in.
The code sample may not be optimal, but it is what I ended up with (because it worked). If the fix is simply a matter of toggling a few options, I'd be fine with that too.
2. Exception
System.ArgumentException: Invalid name character in 'xmlns:test'. The ':' character, hexadecimal value 0x3A, cannot be included in a name.
at System.Xml.XmlWellFormedWriter.CheckNCName(String ncname)
at System.Xml.XmlWellFormedWriter.WriteStartAttribute(String prefix, String localName, String namespaceName)
at HtmlAgilityPack.HtmlNode.WriteAttributes(XmlWriter writer, HtmlNode node)
at HtmlAgilityPack.HtmlNode.WriteTo(XmlWriter writer)
at HtmlAgilityPack.HtmlNode.WriteTo(XmlWriter writer)
at HtmlAgilityPack.HtmlDocument.Save(XmlWriter writer)
at Program.Main()
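To make the failing check easier to see, here is a minimal sketch that uses only `System.Xml` (no HAP). It assumes, based on the stack trace above, that the whole attribute name `xmlns:test` reaches `WriteStartAttribute` as the local name. The second half shows the same declaration written with prefix and local name passed separately, which the writer accepts. The variable names are mine and purely illustrative.

```csharp
using System;
using System.IO;
using System.Xml;

// 1) A colon inside the local name is rejected by the writer's name check.
bool threw = false;
try
{
    using (var bad = XmlWriter.Create(new StringWriter()))
    {
        bad.WriteStartElement("html");
        // Assumption: this mirrors what happens inside HtmlNode.WriteAttributes.
        bad.WriteStartAttribute(null, "xmlns:test", null);
    }
}
catch (ArgumentException ex)
{
    threw = true;
    Console.WriteLine(ex.Message);
}

// 2) Passing prefix and local name separately is accepted.
var sw = new StringWriter();
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
using (var good = XmlWriter.Create(sw, settings))
{
    good.WriteStartElement("html");
    // Declares the prefix "test" for the namespace "urn:test:ns".
    good.WriteAttributeString("xmlns", "test", null, "urn:test:ns");
    good.WriteStartElement("p");
    // Writes an attribute "this" in that namespace, using the declared prefix.
    good.WriteAttributeString("test", "this", "urn:test:ns", "namespace");
    good.WriteString("line break");
    good.WriteEndElement();
    good.WriteEndElement();
}
string xml = sw.ToString();
Console.WriteLine(xml);
```

This is not a workaround for HAP itself; it only isolates the `XmlWriter` behavior that the exception comes from.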
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(@"<html xmlns:test=""urn:test:ns""> <body> <p test:this=""namespace"">don't mind this<br>line break</p> </body></html>");
htmlDoc.OptionOutputAsXml = true;

var ms = new MemoryStream();
using (var writer = XmlWriter.Create(ms, new XmlWriterSettings() { OmitXmlDeclaration = true, ConformanceLevel = ConformanceLevel.Fragment }))
{
    htmlDoc.Save(writer); // this throws
    ms.Position = 0;
    var doc = XDocument.Load(ms);
    // feed doc into something that expects an XDocument as input:
    Console.WriteLine(doc.ToString());
}
4. Any further technical details
Add any relevant detail can help us, such as:
HAP version: Any version since 1.6.3 fails; it worked up until 1.6.2
.NET version: net48, and also net6.0
I think the change in #95 might be related to this.
And I also found the related #168, where I strongly believe the code sample is wrong (xmlns:MyNamespace="value" should be xmlns:value="MyNamespace", otherwise the result is an unused namespace MyNamespace along with the undeclared prefix value) and might have introduced further issues down the road.
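To illustrate the point about #168, here is a small sketch using `XDocument.Parse` from `System.Xml.Linq` (not HAP). The element names `root` and `item` are made up for the example; only the two `xmlns:` declarations come from the discussion above. In `xmlns:value="MyNamespace"` the part after `xmlns:` is the prefix and the attribute value is the namespace name, so swapping them leaves the prefix `value` undeclared.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Prefix "value" is declared and bound to the namespace name "MyNamespace".
var ok = XDocument.Parse(@"<root xmlns:value=""MyNamespace""><value:item /></root>");
XElement item = (XElement)ok.Root.FirstNode;
string boundNamespace = item.Name.NamespaceName;
Console.WriteLine(boundNamespace);

// Here the declared prefix is "MyNamespace" (never used),
// while the prefix "value" on the child element is undeclared.
bool undeclared = false;
try
{
    XDocument.Parse(@"<root xmlns:MyNamespace=""value""><value:item /></root>");
}
catch (XmlException ex)
{
    undeclared = true;
    Console.WriteLine(ex.Message);
}
```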
3. Fiddle or Project
https://dotnetfiddle.net/Nd8vqF