PDF tagging: tables are structured wrongly, and include non-table content #1001

ronaldtse · 2023-05-29T13:55:09Z

The tables are tagged with an elaborate but incorrect structure, and non-table content is included inside the table tags.

From metanorma/mn2pdf#201

Intelligent2013 · 2023-06-07T12:29:22Z

The empty helper TR for the "(continued)" text should be removed:

Intelligent2013 · 2023-06-07T13:33:50Z

The table tags are flattened:

Intelligent2013 · 2023-06-07T17:43:46Z

There are no THead, TBody and TFoot tags:
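For comparison, the nesting a tagged table is expected to have can be modelled as a small checker. This is a minimal Java sketch based on a simplified reading of the ISO 32000-1 table rules (a Table contains TR rows, optionally grouped in THead, TBody and TFoot, plus an optional Caption; a TR contains TH or TD cells). The `Tag` record and the `violations` method are hypothetical: they work on an in-memory model, not on a real PDF structure tree, and the Caption position rule is not checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TableTagCheck {

    /** Minimal model of a PDF structure element: its type and its child elements. */
    record Tag(String type, List<Tag> kids) {
        static Tag of(String type, Tag... kids) {
            return new Tag(type, List.of(kids));
        }
    }

    // Allowed children per table-related structure type (simplified; TH/TD content is not checked).
    static final Map<String, Set<String>> ALLOWED = Map.of(
            "Table", Set.of("TR", "THead", "TBody", "TFoot", "Caption"),
            "THead", Set.of("TR"),
            "TBody", Set.of("TR"),
            "TFoot", Set.of("TR"),
            "TR", Set.of("TH", "TD"));

    /** Returns every "Parent > Child" pair that breaks the table nesting rules. */
    static List<String> violations(Tag root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Tag tag, List<String> out) {
        Set<String> allowed = ALLOWED.get(tag.type());
        for (Tag kid : tag.kids()) {
            if (allowed != null && !allowed.contains(kid.type())) {
                out.add(tag.type() + " > " + kid.type());
            }
            collect(kid, out);
        }
    }

    public static void main(String[] args) {
        Tag good = Tag.of("Table",
                Tag.of("THead", Tag.of("TR", Tag.of("TH"))),
                Tag.of("TBody", Tag.of("TR", Tag.of("TD", Tag.of("P")))));
        Tag bad = Tag.of("Table",
                Tag.of("TBody", Tag.of("TR", Tag.of("TD"))),
                Tag.of("P")); // a note paragraph placed directly inside the table
        System.out.println(violations(good)); // []
        System.out.println(violations(bad));  // [Table > P]
    }
}
```

A flattened table, where cells sit directly under Table without TR rows, would be reported the same way (for example `Table > TD`).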

Intelligent2013 · 2023-06-15T10:30:33Z

The THead, TBody and TFoot PDF tags are missing because of this code in the Apache FOP class PDFStructureTreeBuilder. For PDF/A-1 or PDF/UA output, these tags are skipped:

    public StructureTreeElement startNode(String name, Attributes attributes, StructureTreeElement parent) {
        if (!isPDFA1Safe(name)) {
            return null;
        }
        // ...
    // ...
    private boolean isPDFA1Safe(String name) {
        return !((pdfFactory.getDocument().getProfile().getPDFAMode().isPart1()
                || pdfFactory.getDocument().getProfile().getPDFUAMode().isEnabled())
                && (name.equals("table-body")
                || name.equals("table-header")
                || name.equals("table-footer"));
    }

I tried removing the PDF/UA condition (pdfFactory.getDocument().getProfile().getPDFUAMode().isEnabled()), and the resulting PDF contains all the tags:

The Adobe Acrobat Preflight feature "Fix problems in PDF tagging structure" doesn't find any issues:

The "Verify compliance with PDF/UA-1 (syntax check only)" feature also doesn't find any issues related to the table header or footer:

Note: the 5 issues reported there are also present in the PDF generated without THead, TBody and TFoot.

The PAC tool (PDF Accessibility Checker) reports no problems with the structure elements:

Actually, I don't know why the condition pdfFactory.getDocument().getProfile().getPDFUAMode().isEnabled() was added to the method isPDFA1Safe. Maybe it's for old PDF versions... So, I'll remove this condition.
Note: the condition was added in Apache FOP v2.1, see the issue https://issues.apache.org/jira/browse/FOP-2488.
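To make the intended change concrete, here is a minimal Java sketch that models only the isPDFA1Safe predicate as plain boolean logic. It is not the FOP source or the actual patch: the `pdfA1` and `pdfUA` parameters are hypothetical stand-ins for `getPDFAMode().isPart1()` and `getPDFUAMode().isEnabled()`. The PDF/A-1 clause is kept, which is consistent with THead, TBody and TFoot having been introduced as standard structure types only in PDF 1.5, while PDF/A-1 is based on PDF 1.4.

```java
public class TableTagFilter {

    /** True for the FO table-group names that map to THead, TBody and TFoot. */
    static boolean isTableGroup(String name) {
        return name.equals("table-body")
                || name.equals("table-header")
                || name.equals("table-footer");
    }

    /** Mirrors the quoted FOP condition: drop table-group tags for PDF/A-1 or PDF/UA. */
    static boolean isSafeOriginal(boolean pdfA1, boolean pdfUA, String name) {
        return !((pdfA1 || pdfUA) && isTableGroup(name));
    }

    /** Same condition with the PDF/UA clause removed: only PDF/A-1 drops them. */
    static boolean isSafePatched(boolean pdfA1, String name) {
        return !(pdfA1 && isTableGroup(name));
    }

    public static void main(String[] args) {
        // PDF/UA document that is not PDF/A-1, processing a table-header node
        System.out.println(isSafeOriginal(false, true, "table-header")); // false: THead dropped
        System.out.println(isSafePatched(false, "table-header"));        // true: THead kept
        // PDF/A-1 document: the node is still dropped after the change
        System.out.println(isSafePatched(true, "table-header"));         // false
    }
}
```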

Intelligent2013 · 2023-06-15T13:06:22Z

The table is split into two tables:

because the table's notes and footnotes are rendered in a separate table in the XSL-FO (this was done for simplicity):

</fo:table>
<fo:table keep-with-previous="always" table-omit-footer-at-break="true" table-layout="fixed" border="1pt solid black" border-top="0pt solid black" width="100%">
	<fo:table-column column-width="proportional-column-width(1270)"/>
	<fo:table-column column-width="proportional-column-width(639)"/>
	<fo:table-column column-width="proportional-column-width(1111)"/>
	<fo:table-column column-width="proportional-column-width(948)"/>
	<fo:table-column column-width="proportional-column-width(921)"/>
	<fo:table-body>
		<fo:table-row>
			<fo:table-cell border="1pt solid black" padding-left="1mm" padding-right="1mm" padding-top="1mm" border-top="solid black 0pt" number-columns-spanned="5">
				<fo:block font-size="9pt" margin-bottom="6pt">
					<fo:inline padding-right="2mm">NOTE  1</fo:inline>This table is based on

To do: in the XSL-FO, render the table's notes and footnotes in the table-footer:

- if there is a tfoot in the Metanorma XML, add the notes into it;
- if there isn't a tfoot in the Metanorma XML, add an XSL-FO table-footer and put the notes into it.

Metanorma presentation XML instance with a tfoot:

	</tbody>
	<tfoot>
		<tr>
			<td colspan="5" valign="top" align="left">
				<p id="_2e80c795-c60e-bbda-c227-86750c6557ee">Live insects shall not be present. Dead insects shall be included in extraneous matter.</p>
			</td>
		</tr>
	</tfoot>
	<note id="_61c8670b-2c17-b33a-4e5f-7eb3ebaa8018">
		<name>NOTE  1</name>
		<p id="_c68b15fe-a9c1-4d7e-b35d-09915e32c5ef">This table is based on <xref type="inline" target="ISO7301">ISO 7301:2011, <span class="citetbl">Table 1</span>
			</xref>
			<fn reference="d">
				<p id="_9ffcb1e6-34d6-cb36-20b2-87ecd980b6eb">Cancelled and replaced by ISO 7301:2021.</p>
			</fn>.</p>
	</note>
	...
</table>
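The to-do above can be illustrated with a short Java DOM sketch. This is not the Metanorma implementation: the real change targets the XSL-FO generation in XSLT, whereas this sketch applies the same rule directly to a presentation-XML table. It assumes a namespace-unaware parse, that notes are direct `<note>` children of `<table>`, and that the column count can be taken from the first tbody row; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NotesToTfoot {

    /** Direct child elements of parent with the given tag name. */
    static List<Element> children(Element parent, String name) {
        List<Element> found = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals(name)) {
                found.add(e);
            }
        }
        return found;
    }

    /** Column count from the first tbody row, honouring colspan (assumes a tbody exists). */
    static int columnCount(Element table) {
        Element row = children(children(table, "tbody").get(0), "tr").get(0);
        int count = 0;
        for (Node n = row.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element cell) {
                String span = cell.getAttribute("colspan");
                count += span.isEmpty() ? 1 : Integer.parseInt(span);
            }
        }
        return count;
    }

    /** Moves the table's notes into a full-width tfoot row, creating tfoot if absent. */
    static void moveNotes(Element table) {
        List<Element> notes = children(table, "note");
        if (notes.isEmpty()) {
            return;
        }
        Document doc = table.getOwnerDocument();
        List<Element> tfoots = children(table, "tfoot");
        Element tfoot;
        if (tfoots.isEmpty()) {
            tfoot = doc.createElement("tfoot");
            List<Element> bodies = children(table, "tbody");
            // insertBefore(x, null) appends, so this also works when tbody is the last child
            table.insertBefore(tfoot, bodies.get(bodies.size() - 1).getNextSibling());
        } else {
            tfoot = tfoots.get(0);
        }
        Element td = doc.createElement("td");
        td.setAttribute("colspan", String.valueOf(columnCount(table)));
        Element tr = doc.createElement("tr");
        tr.appendChild(td);
        tfoot.appendChild(tr);
        for (Element note : notes) {
            td.appendChild(note); // appendChild detaches the note from <table> first
        }
    }

    /** Parses a single <table>, moves its notes and serialises the result. */
    static String process(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        moveNotes(doc.getDocumentElement());
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><tbody><tr><td>a</td><td colspan=\"2\">b</td></tr></tbody>"
                + "<note><p>NOTE 1</p></note></table>";
        System.out.println(process(xml));
    }
}
```

Putting the notes in a single full-width cell mirrors the existing tfoot row in the instance above (`colspan="5"`). As described in the next comment, FOP does not split a long table footer across pages, so this layout has its own trade-off.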

Intelligent2013 · 2023-06-15T16:00:25Z

Moving the note and footnotes into the table footer has a side effect: the large table footer starts on a new page, leaving empty space on the previous page:

That is, the table footer doesn't split across pages. I'll try to find out how to split the table footer, if that's possible at all.

Intelligent2013 · 2023-06-15T16:41:25Z

Regarding the table footer page-break issue, I found a question I asked on Stack Overflow six years ago: https://stackoverflow.com/questions/41625388/table-footer-doesnt-page-break-using-apache-fop#comment80796453_41625388
That's probably why I moved the long table footer into a separate table.

I'll file a ticket in the Apache FOP bug tracker.

Intelligent2013 · 2023-06-15T17:14:48Z

Issue added: https://issues.apache.org/jira/browse/FOP-3134

ronaldtse added the bug label May 29, 2023

ronaldtse assigned Intelligent2013 May 29, 2023

ronaldtse added this to Metanorma May 29, 2023

github-project-automation bot moved this to 🆕 New in Metanorma May 29, 2023

ronaldtse mentioned this issue May 29, 2023 in PDF tagging issues discovered by the PDF Association experts metanorma/mn2pdf#201 (Open, 8 tasks)

ronaldtse moved this from 🆕 New to 🌋 Urgent in Metanorma May 29, 2023

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 7, 2023: "XSLT updated for table PDF tags, metanorma/metanorma-iso#1001" (9507fdb)

Intelligent2013 added a commit to metanorma/mn2pdf that referenced this issue Jun 15, 2023: "updated for TBody, THead, TFoot, metanorma/metanorma-iso#1001" (dfc8255)

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 15, 2023: "common.xslt updated for table footer, metanorma/metanorma-iso#1001" (d96d94e)

Intelligent2013 added a commit to metanorma/mn2pdf that referenced this issue Jun 15, 2023: "updated v1.74, metanorma/metanorma-iso#1001" (5820eb0)

Intelligent2013 mentioned this issue Nov 16, 2024 in Fix: relax PDF checking for table tags metanorma/xmlgraphics-fop#32 (Closed)
