Skip to content

HTML5 parser for Common Lisp

License

GPL-3.0, LGPL-3.0 licenses found

Licenses found

GPL-3.0
COPYING
LGPL-3.0
COPYING.LIB
Notifications You must be signed in to change notification settings

Spawno-de/cl-html5-parser

 
 

Repository files navigation

cl-html5-parser: HTML5 parser for Common Lisp

Abstract

cl-html5-parser is a HTML5 parser for Common Lisp with the following features:

  • It is a port of the Python library html5lib.

  • It passes all relevant tests from html5lib.

  • It is not tied to a specific DOM implementation.

Requirements

  • It has only been tested on SBCL.

  • CL-PPCRE and FLEXI-STREAMS.

Usage

Parsing

Parsing functions are in the package HTML5-PARSER.

f parse-html5 source &key encoding strictp tree-builder ⇒ document, errors, tree-builder-instance

Parse an HTML document from source. Source can be a string, a pathname or a stream. When parsing from a stream encoding detection is not supported, encoding must be supplied via the encoding keyword parameter.

When strictp is true, parsing stops on first error.

tree-builder is a designator for a tree-builder class used to build the document.

Returns tree values. The type of document depends on the tree-builder used. Errors is a list of errors found during parsing. The format of this list is subject to change. The last value is the tree-builder-instance used to build the document.

f parse-html5-fragment source &key container encoding strictp tree-builder ⇒ document, errors, tree-builder-instance

Parses a fragment of HTML. Container sets the context, defaults to "div". For det other parameters see PARSE-HTML5.

Tree builders

Parsing HTML5 is not possible without a DOM. cl-html5-parser defines a minimal DOM interface via a set of generic functions, including a simple default implementation. See treebuilder.lisp.

All generic functions take a tree instance as the first parameter. This is an instance of the tree-builder class supplied to PARSE-HTML5. When implementing your own tree builder you can dispatch on this parameter. Thus it’s possible to use what ever representation you want for the document, without risk of interfering with other tree builders.

Other

f html5-tree:tree-to-xmls tree node &optional include-namespace-p ⇒ list

Converts a DOM tree into a simple XMLS-like list structure. This is a convenience function as this data model is too anemic to implement as a separate tree builder. Tree and node are the tree-builder and document objects returned from PARSE-HTML5.

License

This library is available under the GNU Lesser General Public License v3.0.

About

HTML5 parser for Common Lisp

Resources

License

GPL-3.0, LGPL-3.0 licenses found

Licenses found

GPL-3.0
COPYING
LGPL-3.0
COPYING.LIB

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published