cl-html5-parser is an HTML5 parser for Common Lisp with the following features:
- It is a port of the Python library html5lib.
- It passes all relevant tests from html5lib.
- It is not tied to a specific DOM implementation.
Parsing functions are in the package HTML5-PARSER.
f parse-html5 source &key encoding strictp tree-builder ⇒ document, errors, tree-builder-instance
Parse an HTML document from source. Source can be a string, a pathname or a stream. When parsing from a stream, encoding detection is not supported; the encoding must be supplied via the encoding keyword parameter.
When strictp is true, parsing stops on the first error.
tree-builder is a designator for a tree-builder class used to build the document.
Returns three values. The type of document depends on the tree-builder used. Errors is a list of errors found during parsing. The format of this list is subject to change. The last value is the tree-builder-instance used to build the document.
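As a rough illustration of the three return values, the following sketch parses a string and reports what came back. It assumes Quicklisp is installed and that the ASDF system is named "cl-html5-parser"; the helper name DESCRIBE-PARSE is hypothetical and not part of the library.

```lisp
;; Assumption: Quicklisp is available and the system is named "cl-html5-parser".
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-html5-parser"))

;; DESCRIBE-PARSE is a hypothetical helper, not part of the library.
(defun describe-parse (html)
  "Parse HTML with the default tree builder and print a short summary.
Returns the document and the list of errors."
  (multiple-value-bind (document errors tree-builder-instance)
      (html5-parser:parse-html5 html)
    (format t "Document:     ~S~%" document)
    (format t "Parse errors: ~D~%" (length errors))
    (format t "Tree builder: ~S~%" tree-builder-instance)
    (values document errors)))

(describe-parse "<!DOCTYPE html><title>Demo</title><p>Hello")
```

Because the format of the error list is subject to change, the sketch only counts the errors instead of inspecting them.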
f parse-html5-fragment source &key container encoding strictp tree-builder ⇒ document, errors, tree-builder-instance
Parses a fragment of HTML. Container sets the context element and defaults to "div".
For the other parameters, see PARSE-HTML5.
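A minimal sketch of fragment parsing, under the same assumptions as for PARSE-HTML5 (Quicklisp installed, system named "cl-html5-parser"). The helper name PARSE-LIST-ITEMS is hypothetical. It passes "ul" as the container so that the list items are parsed as they would be inside a list.

```lisp
;; Assumption: Quicklisp is available and the system is named "cl-html5-parser".
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-html5-parser"))

;; PARSE-LIST-ITEMS is a hypothetical helper, not part of the library.
(defun parse-list-items (html)
  "Parse HTML as a fragment in the context of a <ul> element.
Returns the fragment and the list of errors."
  (multiple-value-bind (fragment errors)
      (html5-parser:parse-html5-fragment html :container "ul")
    (values fragment errors)))

(parse-list-items "<li>One<li>Two")
```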
Parsing HTML5 is not possible without a DOM. cl-html5-parser defines a minimal DOM interface as a set of generic functions and includes a simple default implementation. See treebuilder.lisp.
All generic functions take a tree instance as the first parameter. This is an instance of the tree-builder class supplied to PARSE-HTML5. When implementing your own tree builder you can dispatch on this parameter. Thus it’s possible to use whatever representation you want for the document, without risk of interfering with other tree builders.
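The dispatch idea can be sketched with plain CLOS. Every name below (PLIST-TREE, VECTOR-TREE, DEMO-MAKE-ELEMENT) is hypothetical and chosen only for illustration; the actual generic functions a tree builder must implement are the ones defined in treebuilder.lisp.

```lisp
;; All names here are hypothetical; they are NOT the generics from
;; treebuilder.lisp. The point is only that methods specialize on the
;; first (tree) parameter, so each builder keeps its own representation.
(defclass plist-tree () ())
(defclass vector-tree () ())

(defgeneric demo-make-element (tree name)
  (:documentation "Create an element node named NAME in TREE's representation."))

;; One builder represents elements as property lists...
(defmethod demo-make-element ((tree plist-tree) name)
  (list :name name :children nil))

;; ...another as vectors. Neither method can be reached with the other's tree.
(defmethod demo-make-element ((tree vector-tree) name)
  (vector name nil))
```

Since each method is selected by the class of the tree instance, two tree builders loaded in the same image do not see each other's node representations.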
f html5-tree:tree-to-xmls tree node &optional include-namespace-p ⇒ list
Converts a DOM tree into a simple XMLS-like list structure. This is a convenience function as this data model is too anemic to implement as a separate tree builder. Tree and node are the tree-builder and document objects returned from PARSE-HTML5.
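A short sketch of the conversion, following the signatures given above and assuming Quicklisp is installed and the system is named "cl-html5-parser". The helper name HTML-TO-XMLS is hypothetical. The tree-builder instance and the document are taken from the third and first return values of PARSE-HTML5.

```lisp
;; Assumption: Quicklisp is available and the system is named "cl-html5-parser".
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-html5-parser"))

;; HTML-TO-XMLS is a hypothetical helper, not part of the library.
(defun html-to-xmls (html)
  "Parse HTML with the default tree builder and return an XMLS-like list."
  (multiple-value-bind (document errors tree)
      (html5-parser:parse-html5 html)
    (declare (ignore errors))
    ;; TREE is the tree-builder instance, DOCUMENT the node to convert.
    (html5-tree:tree-to-xmls tree document)))

(html-to-xmls "<p>Hello <b>world</b>")
```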
This library is available under the GNU Lesser General Public License v3.0.