Reconsider elements excluded from toc parsing #264

mattgarrish · 2023-10-05T15:24:52Z

We used the HTML spec's definition of sectioning root elements as part of the list of elements to skip when parsing the table of contents, but on reflection this was an odd choice.

For example, the list contains the body element, which can't even occur inside itself, so it is never an issue. It also excludes td elements but not any other parts of a table (e.g., th row headings would still be parsed in).

We should try to find out what publishers actually adorn tables of contents with and make an exclusion list that better matches reality.

We should also remove the references to "outlines" as the reason the elements are excluded, since the HTML spec no longer uses this terminology.


iherman · 2023-10-06T06:48:58Z

Can we try to approach this from the other side and make a list of element categories that are allowed in a ToC? After all, I would expect that, in reality, the elements in use are fairly limited anyway...

mattgarrish · 2023-10-06T11:23:47Z

I get the feeling we're being too permissive within the nav element. The complaint about the EPUB toc was that it couldn't accommodate anything more than labels, while many tables of contents have labels with additional descriptive text beside or below them (a short description of the chapter, a list of authors, etc.). The pub manifest toc will already filter this kind of secondary information out, so these ignored elements only matter if you're putting complex structures like other navs, asides, tables, etc. inside your table of contents. I haven't seen that kind of material added to a table of contents.

iherman · 2023-10-06T12:19:55Z

Right. Hence my proposal to go the other way. We would obviously allow the various list elements, p, blockquote, all the phrasing content...
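A minimal Python sketch of the allow-list idea, assuming the nav markup is well-formed, un-namespaced XML. The `ALLOWED` set and the `disallowed_elements` helper are hypothetical and only illustrate how an allow-list could flag unexpected structures such as tables; they are not a proposed normative list.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list for illustration: list elements, p, blockquote,
# headings, and a small sample of phrasing content. It is not a complete
# or normative list of what a table of contents may contain.
ALLOWED = {
    "nav", "h1", "h2", "h3", "h4", "h5", "h6",
    "ol", "ul", "li", "p", "blockquote",
    "a", "span", "em", "strong", "i", "b", "small", "br",
}


def disallowed_elements(nav_markup):
    """Return the sorted names of elements in the nav that are not on the allow-list."""
    root = ET.fromstring(nav_markup)  # assumes well-formed, un-namespaced markup
    # root.iter() walks the nav element itself and every descendant
    return sorted({el.tag for el in root.iter() if el.tag not in ALLOWED})


sample = (
    "<nav><ol><li><a href='ch1.html'>Chapter 1</a>"
    "<p>A short <em>description</em></p>"
    "<table><tr><td>stats</td></tr></table></li></ol></nav>"
)
print(disallowed_elements(sample))  # ['table', 'td', 'tr']
```

With an allow-list, every table part is flagged, not just td, which avoids the gap in the current exclusion list.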

mattgarrish · 2023-10-06T13:01:13Z

I'm not sure we need to get that deep into the details, but granted, I haven't read the parsing algorithm in a long time. If we process lists and extract the first a tag for the link/label, which I believe is how it works, we may not need to care about the rest.

The only guidance might be not to use ol/ul anywhere inside a toc unless it's part of defining the structure of the toc, as I believe the purpose of the restriction was to avoid accidentally descending into lists containing non-toc content.
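A minimal Python sketch of the list-driven approach described above, under stated assumptions: the nav markup is well-formed, un-namespaced XML, the first `a` in each `li` supplies the label and link, and `EXCLUDED` is a hypothetical set chosen for illustration. It is not the pub manifest's actual toc algorithm; it only shows how restricting sub-entries to lists that are direct children of an `li` keeps non-toc lists out of the structure.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = {"ol", "ul"}
# Hypothetical exclusion set, for illustration only.
EXCLUDED = {"aside", "nav", "table", "figure", "details", "dialog", "fieldset"}


def first_link(element):
    """Depth-first search for the first <a>, skipping nested lists and excluded elements."""
    for child in element:
        if child.tag in LIST_TAGS or child.tag in EXCLUDED:
            continue
        if child.tag == "a":
            return child
        found = first_link(child)
        if found is not None:
            return found
    return None


def parse_list(list_element):
    """Turn an ol/ul into entries; only a list that is a direct child of an li adds sub-entries."""
    entries = []
    for li in list_element:
        if li.tag != "li":
            continue
        link = first_link(li)
        # Collapse whitespace in the link text to form the label
        label = " ".join("".join(link.itertext()).split()) if link is not None else None
        href = link.get("href") if link is not None else None
        sublist = next((c for c in li if c.tag in LIST_TAGS), None)
        children = parse_list(sublist) if sublist is not None else []
        entries.append({"label": label, "href": href, "children": children})
    return entries


def parse_toc(nav_markup):
    """Parse the first top-level ol/ul of a nav; anything else (e.g. a heading) is ignored."""
    nav = ET.fromstring(nav_markup)
    top = next((c for c in nav if c.tag in LIST_TAGS), None)
    return parse_list(top) if top is not None else []


sample = (
    "<nav><h2>Contents</h2><ol>"
    "<li><a href='ch1.html'>Chapter 1</a><p>A short description</p>"
    "<ol><li><a href='ch1.html#s1'>Section 1.1</a></li></ol></li>"
    "<li><a href='ch2.html'>Chapter <em>2</em></a>"
    "<aside><ul><li><a href='note.html'>Note</a></li></ul></aside></li>"
    "</ol></nav>"
)
toc = parse_toc(sample)
print(toc[0]["children"][0]["label"])  # Section 1.1
print(toc[1]["label"], toc[1]["children"])  # Chapter 2 []
```

The descriptive p is ignored because only the first link matters, and the list inside the aside never becomes part of the toc, which is the accidental descent the guidance is meant to prevent.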

mattgarrish mentioned this issue Nov 9, 2023 in Inline definition of sectioning root elements #263 (Open)
