PDF tagging: tables are structured wrongly, and include non-table content #1001

Open
ronaldtse opened this issue May 29, 2023 · 8 comments

@ronaldtse

The tables are structured elaborately but wrongly, and they include non-table content.

From metanorma/mn2pdf#201

@Intelligent2013

Intelligent2013 commented Jun 7, 2023

  • empty helper TR for (continued) text should be removed:
    image

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 7, 2023
@Intelligent2013

The table tags are flattened:
image

@Intelligent2013

  • there aren't tags for THead, TBody and TFoot:
    image

@Intelligent2013

The PDF tags THead, TBody and TFoot are missing due to this code in the Apache FOP class PDFStructureTreeBuilder. In PDF/A-1 mode (the isPart1() check) or PDF/UA mode, these tags are skipped:

    public StructureTreeElement startNode(String name, Attributes attributes, StructureTreeElement parent) {
        // Elements that fail the check get no structure node at all.
        if (!isPDFA1Safe(name)) {
            return null;
        }
        ...
    ...
    private boolean isPDFA1Safe(String name) {
        // Unsafe = (PDF/A-1 or PDF/UA enabled) AND the element is a table row group.
        return !((pdfFactory.getDocument().getProfile().getPDFAMode().isPart1()
                || pdfFactory.getDocument().getProfile().getPDFUAMode().isEnabled())
                && (name.equals("table-body")
                || name.equals("table-header")
                || name.equals("table-footer")));
    }

I tried removing the condition for PDF/UA (pdfFactory.getDocument().getProfile().getPDFUAMode().isEnabled()), and the resulting PDF contains all the tags:
image

The Adobe Acrobat Preflight feature "Fix problems in PDF tagging structure" doesn't find any issues:
image

The feature "Verify compliance with PDF/UA-1 (syntax check only)" also doesn't find any issues related to the table's header or footer:
image
Note: the 5 reported issues are also present in the PDF generated without THead, TBody and TFoot.

The PAC tool (PDF Accessibility Checker) reports no problems with the structure elements:
image

Actually, I don't know why the condition pdfFactory.getDocument().getProfile().getPDFUAMode().isEnabled() was added to the method isPDFA1Safe. Maybe it was for old PDF versions. So, I'll remove this condition.
Note: the condition was added in Apache FOP v2.1, see the issue https://issues.apache.org/jira/browse/FOP-2488.
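
To make the effect of the condition concrete, here is a minimal standalone sketch of the check before and after the change. It is not the Apache FOP code: the class TagFilterSketch, the methods isSafeOriginal and isSafePatched, and the boolean parameters pdfA1 and pdfUA are hypothetical stand-ins for the profile lookups in the quoted method.

```java
import java.util.Set;

/** Standalone model of the quoted check; hypothetical, not the Apache FOP class. */
class TagFilterSketch {

    // FO element names that the quoted isPDFA1Safe method singles out.
    private static final Set<String> ROW_GROUPS =
            Set.of("table-header", "table-body", "table-footer");

    /** Mirrors the quoted condition: row groups are rejected for PDF/A-1 or PDF/UA. */
    static boolean isSafeOriginal(String name, boolean pdfA1, boolean pdfUA) {
        return !((pdfA1 || pdfUA) && ROW_GROUPS.contains(name));
    }

    /** Same check with the PDF/UA clause removed, as proposed in this comment. */
    static boolean isSafePatched(String name, boolean pdfA1) {
        return !(pdfA1 && ROW_GROUPS.contains(name));
    }

    public static void main(String[] args) {
        // PDF/UA enabled, PDF/A-1 disabled: the case discussed in this issue.
        for (String name : new String[] {"table", "table-header", "table-body", "table-footer"}) {
            System.out.println(name
                    + ": original=" + isSafeOriginal(name, false, true)
                    + ", patched=" + isSafePatched(name, false));
        }
    }
}
```

With PDF/UA enabled and PDF/A-1 disabled, the original check rejects the three row-group names, so startNode returns null for them, while the patched check accepts them. For PDF/A-1 both variants still reject them.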

@Intelligent2013

Intelligent2013 commented Jun 15, 2023

The table is split into two tables:
image

because the table's notes and footnotes are rendered in a separate table in XSL-FO (this was done for simplicity):

</fo:table>
<fo:table keep-with-previous="always" table-omit-footer-at-break="true" table-layout="fixed" border="1pt solid black" border-top="0pt solid black" width="100%">
	<fo:table-column column-width="proportional-column-width(1270)"/>
	<fo:table-column column-width="proportional-column-width(639)"/>
	<fo:table-column column-width="proportional-column-width(1111)"/>
	<fo:table-column column-width="proportional-column-width(948)"/>
	<fo:table-column column-width="proportional-column-width(921)"/>
	<fo:table-body>
		<fo:table-row>
			<fo:table-cell border="1pt solid black" padding-left="1mm" padding-right="1mm" padding-top="1mm" border-top="solid black 0pt" number-columns-spanned="5">
				<fo:block font-size="9pt" margin-bottom="6pt">
					<fo:inline padding-right="2mm">NOTE  1</fo:inline>This table is based on

To do: in XSL-FO, render the table's notes and footnotes in the table-footer:

  • if there is a tfoot in the Metanorma XML, then add the notes to it;
  • if there isn't a tfoot in the Metanorma XML, then add the XSL-FO table-footer and add the notes to it.
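
The two cases above can be sketched as follows. This is only an illustration of the rule on a DOM tree in Java, not the XSLT that generates the XSL-FO: the class TableFooterSketch, its methods and the columnCount parameter are hypothetical, and only direct note children of the table are handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical illustration of the "notes go into the footer" rule. */
class TableFooterSketch {

    /** Parses an XML string; wraps checked exceptions to keep the sketch short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    /** Moves the table's direct note children into tfoot, creating tfoot if absent. */
    static Element moveNotesIntoFooter(Element table, int columnCount) {
        Document doc = table.getOwnerDocument();
        Element tfoot = null;
        List<Element> notes = new ArrayList<>();
        // Collect first: the child NodeList is live and changes while nodes move.
        NodeList children = table.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                Element e = (Element) child;
                if ("tfoot".equals(e.getTagName())) {
                    tfoot = e;
                } else if ("note".equals(e.getTagName())) {
                    notes.add(e);
                }
            }
        }
        if (notes.isEmpty()) {
            return tfoot; // nothing to move; may be null
        }
        if (tfoot == null) {
            // Case 2: no tfoot in the source, so create one.
            tfoot = doc.createElement("tfoot");
            table.appendChild(tfoot);
        }
        // Both cases: one extra footer row spanning all columns holds the notes.
        Element tr = doc.createElement("tr");
        Element td = doc.createElement("td");
        td.setAttribute("colspan", String.valueOf(columnCount));
        for (Element note : notes) {
            td.appendChild(note); // appendChild moves the node out of the table
        }
        tr.appendChild(td);
        tfoot.appendChild(tr);
        return tfoot;
    }
}
```

A call such as moveNotesIntoFooter(doc.getDocumentElement(), 5) on a table with an existing tfoot appends one row to that footer; on a table without a tfoot it creates the footer first.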

Metanorma presentation XML instance with tfoot:

	</tbody>
	<tfoot>
		<tr>
			<td colspan="5" valign="top" align="left">
				<p id="_2e80c795-c60e-bbda-c227-86750c6557ee">Live insects shall not be present. Dead insects shall be included in extraneous matter.</p>
			</td>
		</tr>
	</tfoot>
	<note id="_61c8670b-2c17-b33a-4e5f-7eb3ebaa8018">
		<name>NOTE  1</name>
		<p id="_c68b15fe-a9c1-4d7e-b35d-09915e32c5ef">This table is based on <xref type="inline" target="ISO7301">ISO 7301:2011, <span class="citetbl">Table 1</span>
			</xref>
			<fn reference="d">
				<p id="_9ffcb1e6-34d6-cb36-20b2-87ecd980b6eb">Cancelled and replaced by ISO 7301:2021.</p>
			</fn>.</p>
	</note>
	...
</table>

Intelligent2013 added a commit to metanorma/mn2pdf that referenced this issue Jun 15, 2023
Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jun 15, 2023
@Intelligent2013

There is a side effect of moving the notes and footnotes into the table footer. A large table footer starts on a new page, leaving empty space on the previous page:
image
I.e. the table footer doesn't split across pages. I'll try to find out how to split a table footer, if that's even possible.

@Intelligent2013

Regarding the table footer break issue, I've found a question I asked on SO 6 years ago: https://stackoverflow.com/questions/41625388/table-footer-doesnt-page-break-using-apache-fop#comment80796453_41625388
It looks like that's why I moved the long table footer into a separate table.

I'll open a ticket in the Apache FOP bug tracker.

Status: 🌋 Urgent