Reconsider elements excluded from toc parsing #264
Comments
Can we try to approach this from the other side, and make a list of element categories that are allowed for a ToC? After all I would expect that, in reality, the elements in use are fairly limited anyway...
I get the feeling we're being too permissive within the nav element. The complaint about the EPUB toc was that it couldn't accommodate anything more than labels, while many tables of contents have labels with additional descriptive text beside or below them (a short description of the chapter, a list of authors, etc.). The pub manifest toc will already filter this kind of secondary information out, so the whole purpose of these ignored elements is if you're putting complex structures like other navs, asides, tables, etc. inside your table of contents. In my experience, I haven't seen that kind of material added to a table of contents.
Right. Hence my proposal to go the other way. We obviously would allow for the various list elements, p, blockquote, all the phrasing content...
I'm not sure we need to get that deep into the details, but granted I haven't tried to read the parsing algorithm in a long time. If we process lists and extract the first a tag for the link/label, which I believe is how it works, we may not need to care about the rest. The only guidance might be not to use ol/ul anywhere inside a toc unless it's part of defining the structure of the toc, as I believe the purpose of the restriction was to avoid accidentally descending into lists containing non-toc content.
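To make the "process lists and extract the first a tag" idea concrete, here is a minimal sketch in Python. It is not the specification's parsing algorithm; the class name `TocSketch` and the dictionary layout are hypothetical, and it assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser


class TocSketch(HTMLParser):
    """Hypothetical sketch: nested ToC from list items, first <a> per <li> only."""

    def __init__(self):
        super().__init__()
        self.root = {"label": None, "href": None, "children": []}
        self.stack = [self.root]  # open entries; the last one is the current <li>
        self.in_label = False     # True while inside the <a> that labels an entry

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Every list item becomes an entry nested under the enclosing one.
            entry = {"label": None, "href": None, "children": []}
            self.stack[-1]["children"].append(entry)
            self.stack.append(entry)
        elif tag == "a" and len(self.stack) > 1 and self.stack[-1]["label"] is None:
            # Only the first link in an item supplies its href and label.
            self.stack[-1]["href"] = dict(attrs).get("href")
            self.stack[-1]["label"] = ""
            self.in_label = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_label = False
        elif tag == "li" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Text outside the labelling link (descriptions, authors) is dropped.
        if self.in_label:
            self.stack[-1]["label"] += data


sample = (
    '<nav><ol>'
    '<li><a href="c1.html">Chapter 1</a> <span>A short description</span>'
    '<ol><li><a href="c1.html#s1">Section 1.1</a> <a href="x.html">extra</a></li></ol>'
    '</li>'
    '<li><a href="c2.html">Chapter 2</a></li>'
    '</ol></nav>'
)
parser = TocSketch()
parser.feed(sample)
toc = parser.root["children"]
```

Note that this sketch treats every li as a ToC entry, so a list of unrelated content placed inside the nav would be descended into as well, which is the accidental case the guidance about ol/ul is meant to prevent.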
We referred to the HTML spec's definition of sectioning root elements as part of the list of elements to avoid when parsing the table of contents, but on reflection this was a bit of an odd choice.
For example, the list contains the body element, which can't even occur inside itself so is never an issue. It also excludes td elements but not any other parts of a table (e.g., row headings would be parsed in).

We should try to find out what publishers actually adorn tables of contents with and make an exclusion list that better matches reality.
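The td asymmetry can be illustrated with a small Python sketch. The names `SkipSketch` and `collect_labels` are hypothetical, the skip sets are chosen purely for illustration rather than taken from the specification, and the markup is assumed to use explicit closing tags.

```python
from html.parser import HTMLParser


class SkipSketch(HTMLParser):
    """Hypothetical sketch: collect link labels, ignoring content inside skipped elements."""

    def __init__(self, skipped):
        super().__init__()
        self.skipped = set(skipped)
        self.skip_depth = 0   # how many skipped elements are currently open
        self.in_link = False  # True while inside a link that is being collected
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skipped:
            self.skip_depth += 1
        elif tag == "a" and self.skip_depth == 0:
            self.in_link = True
            self.labels.append("")

    def handle_endtag(self, tag):
        if tag in self.skipped and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.labels[-1] += data


def collect_labels(html, skipped):
    """Return the link labels found outside any element named in `skipped`."""
    parser = SkipSketch(skipped)
    parser.feed(html)
    return parser.labels


table = (
    '<table><tr>'
    '<th><a href="a.html">Row heading</a></th>'
    '<td><a href="b.html">Cell link</a></td>'
    '</tr></table>'
)
only_td = collect_labels(table, {"td"})         # th content still leaks in
td_and_th = collect_labels(table, {"td", "th"})
whole_table = collect_labels(table, {"table"})
```

With only td skipped, the link in the row heading is still collected, whereas skipping th as well, or the whole table, leaves nothing. Which of these matches what publishers actually do is the open question here.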
We should also remove the references to "outlines" being why the elements are excluded since this is no longer terminology used in the HTML spec.