Convert any XML/HTML to JsonML using yxml
make html2json
cat test/basic.html | ./html2json | jq .[1].lang
"en"
# send json to a frontend (example: GTK)
curl https://news.ycombinator.com/rss | ./html2json | ./json2gtk
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Basic Example</title>
<link rel="stylesheet" />
</head>
<body id="home">
<input type="text"/>
<p>content</p>
</body>
</html> |
// doctype is ommited
["html",{"lang":"en"},[
["head", {}, [
["meta", {"charset": "utf-8"} ],
["title", {}, ["Basic Example"] ],
["link", {"rel": "stylesheet"} ]
]],
["body", {"id": "home"}, [
["input", {"type": "text"}],
["p", {}, ["content"]]
]]
]] |
yxml was added XHTML and HTML5 using:
- migrate
yxml_ret_t
to bitfield enum so multiple state can be returned (example : parsing>
in<p hidden>
will returnATTREND|ELEMSTART
) - accept lowercase
<!doctype
- read
<script>
,<style>
content as raw data until matching closing tag id found - accept unquoted attribute value
<form method=GET>
- accept value-less attribute
<p hidden id=p>
- handle void elements as self-closed (
<img>
will internaly generate<img></img>
), so alwo ignore end-tag of void elements (ex:</img>
)