Reconsider elements excluded from toc parsing #264

Open
mattgarrish opened this issue Oct 5, 2023 · 4 comments
Comments

@mattgarrish
Member

mattgarrish commented Oct 5, 2023

We referred to the HTML spec's definition of sectioning root elements as part of the list to avoid when parsing the table of contents, but on reflection this was a bit of an odd choice.

For example, the list contains the body element, which can't occur inside itself and so can never be an issue. It also excludes td elements but no other parts of a table (e.g., row headings would still be parsed in).
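
A minimal sketch of the td mismatch, not the spec's algorithm: it assumes the skip list is the set of sectioning roots as HTML formerly defined them, assumes un-namespaced well-formed markup, and uses a hypothetical helper, collect_links, that simply gathers every link it is allowed to reach.

```python
import xml.etree.ElementTree as ET

# Assumption: the sectioning roots as HTML formerly listed them.
SKIPPED = {"blockquote", "body", "details", "dialog", "fieldset", "figure", "td"}


def collect_links(el, skipped=SKIPPED):
    """Hypothetical helper: gather hrefs, never descending into skipped elements."""
    links = []
    for child in el:
        if child.tag in skipped:
            continue  # the whole subtree is ignored
        if child.tag == "a":
            links.append(child.get("href"))
        links.extend(collect_links(child, skipped))
    return links


TABLE_IN_TOC = """
<nav>
  <table>
    <tr>
      <th><a href="heading.html">Row heading</a></th>
      <td><a href="cell.html">Cell</a></td>
    </tr>
  </table>
</nav>
"""

found = collect_links(ET.fromstring(TABLE_IN_TOC))
print(found)  # ['heading.html']
```

Only the td subtree is skipped, so the link in the row heading is still picked up while the one in the data cell is not.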

We should try to find out what publishers actually adorn tables of contents with and make an exclusion list that better matches reality.

We should also remove the references to "outlines" being why the elements are excluded since this is no longer terminology used in the HTML spec.

@iherman
Member

iherman commented Oct 6, 2023

Can we try to approach this from the other side, and make a list of element categories that are allowed in a ToC? After all, I would expect that, in reality, the elements in use are fairly limited anyway...

@mattgarrish
Member Author

I get the feeling we're being too permissive within the nav element. The complaint about the EPUB toc was that it couldn't accommodate anything more than labels, while many tables of contents have labels with additional descriptive text beside or below them (a short description of the chapter, a list of authors, etc.). The pub manifest toc will already filter this kind of secondary information out, so the ignored elements only matter if you're putting complex structures like other navs, asides, tables, etc. inside your table of contents. In my experience, that kind of material doesn't get added to a table of contents.

@iherman
Member

iherman commented Oct 6, 2023

Right. Hence my proposal to go the other way. We obviously would allow for the various list elements, p, blockquote, all the phrasing content...

@mattgarrish
Member Author

I'm not sure we need to get that deep into the details, but granted I haven't tried to read the parsing algorithm in a long time. If we process lists and extract the first a tag for the link/label, which I believe is how it works, we may not need to care about the rest.

The only guidance might be not to use ol/ul anywhere inside a toc unless it's part of defining the structure of the toc, as I believe the purpose of the restriction was to avoid accidentally descending into lists containing non-toc content.
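
A rough sketch of that reading, under stated assumptions: each li becomes an entry, the first a found outside any nested list supplies the link and label, and an ol/ul directly inside the li is treated as child entries. The function names (first_link, parse_list, parse_toc) are hypothetical, the markup is assumed to be well-formed and un-namespaced, and this is not a transcription of the actual parsing algorithm.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = ("ol", "ul")


def first_link(li):
    """First <a> under li in document order, ignoring nested lists."""
    stack = list(reversed(list(li)))
    while stack:
        el = stack.pop()
        if el.tag in LIST_TAGS:
            continue  # nested lists are structure, not label content
        if el.tag == "a":
            return el
        stack.extend(reversed(list(el)))
    return None


def parse_list(list_el):
    """Turn each <li> into an entry; lists directly inside it become children."""
    entries = []
    for li in list_el.findall("li"):
        link = first_link(li)
        entry = {
            "name": "".join(link.itertext()).strip() if link is not None else None,
            "url": link.get("href") if link is not None else None,
            "children": [],
        }
        for sub in li:
            if sub.tag in LIST_TAGS:
                entry["children"].extend(parse_list(sub))
        entries.append(entry)
    return entries


def parse_toc(nav_xml):
    nav = ET.fromstring(nav_xml)
    top = next((el for el in nav if el.tag in LIST_TAGS), None)
    return parse_list(top) if top is not None else []


SAMPLE = """
<nav role="doc-toc">
  <ol>
    <li>
      <a href="c1.html">Chapter 1</a>
      <p>A short description with <a href="note.html">a note</a>.</p>
    </li>
    <li>
      <a href="c2.html">Chapter 2</a>
      <ul><li>Author A</li></ul>
    </li>
  </ol>
</nav>
"""

toc = parse_toc(SAMPLE)
```

In this sample the description paragraph under Chapter 1 is dropped without any exclusion list, since only the first link is taken. The author list under Chapter 2, however, comes back as a child entry with no name or url, which is the accidental descent into non-toc lists mentioned above.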


2 participants