Presentation XML caption refactoring #770

Closed
opoudjis opened this issue Nov 21, 2024 · 77 comments

@opoudjis
Contributor

This comes out of metanorma/isodoc#617

The refactoring of captions in Presentation XML is extremely slow for me: it has taken me three weeks of effort just to get through metanorma-iso, and I will likely be refactoring isodoc as I go with other flavours. The intention is to do a release in two weeks' time, and I already do not think I will make that for all of Metanorma.

For that reason, it is very important to keep all work in a separate branch until all flavours of Metanorma are done. Even more critical, this refactor is now breaking STS (metanorma/mnconvert#418), so we may need to delay release until STS is addressed as well. And IEEE XML.

I will be updating you with each flavour as I finish it. Right now, ISO is the only flavour switched across.

This is the largest task because it impacts the most elements, but I am hoping it is much less work for you to do: it is adding semantic markup which you should be able to ignore, and moving captions and titles to a Presentation XML-specific element, which you will be using instead of /title and /name.

The updates are:

  • For all blocks (requirement, permission, recommendation, formula, table, figure, note, admonition, example, sourcecode, ul, ol, dl, term, termexample, termnote), the caption to be rendered is fmt-name. name if present is the Semantic XML element, consisting of just the user-supplied caption, and it is to be ignored. fmt-name will contain the autonumber and the block type.
  • For all clauses (foreword, introduction, acknowledgement, abstract, annex, appendix, terms, references), the title to be rendered is fmt-title. title if present is the Semantic XML element, consisting of just the user-supplied title, and it is to be ignored. fmt-title will contain the autonumber and the block type.
    • The fmt-* tags will be a design pattern in the new Presentation XML, of Presentation XML elements being marked up alongside the source Semantic XML elements, rather than overwriting them. In the refactoring, I will be introducing a lot more of these. Either the fmt-* elements or their child fmt-* elements hyperlink to the Semantic XML tags they are derived from through their source attribute.
  • Every instance of fmt-title and fmt-name will be accompanied by a fmt-xref-label element, indicating what label to use for cross-references to this block or clause. You are to ignore this element.
  • There will be a lot of span elements introduced inline, semantically annotating various kinds of delimiter in captions and cross-references to captions. Their class attribute will be fmt-*. Just render their contents.
  • Autonumbering in captions and cross-references will be marked up with the semx element, which hyperlinks the autonumbering information to the element it is numbering. As with span, you need only render its contents.
    • The element attribute gives the name of the Semantic XML element that the semx element derives its value from. The source attribute gives the GUID of the Semantic XML element that the semx element derives its value from.
  • All autonumbered blocks and clauses will have an autonum attribute giving the autonumber value.
  • Hierarchically formed autonumbers for blocks will have multiple semx values, one for each block being referenced in the hierarchy. So figure 1-3 will be marked up as <semx source={GUID for Figure 1}>1</semx> - <semx source={GUID for Subfigure 3}>3</semx>.
    • I have not yet done this for clause numbers, and I don't think there's as compelling a reason to.
    • Container-based xrefs (e.g. "Clause 3.1, Note 4") will hyperlink their components separately.
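
The hierarchical autonumber markup described above can be read back with a few lines of code. The following is a minimal sketch in Python rather than XSLT, using the standard library only. The ids stand in for GUIDs, the class name fmt-autonum-delim is an assumption made for the illustration, and the fragment has no XML namespace, which a real Presentation XML document would have.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: ids stand in for GUIDs, and the class name
# "fmt-autonum-delim" is an assumption made for this illustration.
DOC = """
<figure id="fig-a">
  <figure id="fig-b">
    <fmt-name>
      <span class="fmt-caption-label">
        <span class="fmt-element-name">Figure</span>
        <semx element="autonum" source="fig-a">1</semx>
        <span class="fmt-autonum-delim">-</span>
        <semx element="autonum" source="fig-b">3</semx>
      </span>
    </fmt-name>
  </figure>
</figure>
"""

def autonum_components(root, caption):
    """Return (number, tag of the numbered element, source id) per autonum semx."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    components = []
    for semx in caption.iter("semx"):
        if semx.get("element") != "autonum":
            continue
        source = semx.get("source")
        target = by_id.get(source)  # the element this number belongs to
        tag = target.tag if target is not None else None
        components.append((semx.text, tag, source))
    return components

root = ET.fromstring(DOC)
caption = root.find("./figure/fmt-name")
print(autonum_components(root, caption))
```

A renderer does not need any of this, since it only prints the contents of each semx; the lookup is useful mainly for checking that every source resolves to an element.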

Example, Presentation XML before:

          <clause id="widgets"  displayorder='5'>
            <title depth="1">3
              <tab/>
              Widgets</title>
            <clause id="widgets1" inline-header="true">
              <title>3.1</title>
              <note id="note1">
                <name>NOTE 1</name>
                <p id="_">These results are based on a study carried out on three different
                          types of kernel.
                        </p>
              </note>
              <note id="note2">
                <name>NOTE 2</name>
                <p id="_">These results are based on a study carried out on three different
                          types of kernel.
                        </p>
              </note>
              <p>
                <xref target="note1">Note 1</xref>
                <xref target="note2">Note 2</xref>
              </p>
            </clause>
          </clause>

Example, Presentation XML after (GUIDs stripped for test use):

            <clause id="widgets" displayorder="5">
                <title id="_">Widgets</title>
                <fmt-title depth="1">
                   <span class="fmt-caption-label">
                      <semx element="autonum" source="widgets">3</semx>
                      <span class="fmt-caption-delim">
                         <tab/>
                      </span>
                      <semx element="title" source="_">Widgets</semx>
                   </span>
                </fmt-title>
                <fmt-xref-label>
                   <span class="fmt-element-name">Clause</span>
                   <semx element="autonum" source="widgets">3</semx>
                </fmt-xref-label>
                <clause id="widgets1" inline-header="true">
                   <fmt-title depth="2">
                      <span class="fmt-caption-label">
                         <semx element="autonum" source="widgets1">3.1</semx>
                      </span>
                   </fmt-title>
                   <fmt-xref-label>
                      <semx element="autonum" source="widgets1">3.1</semx>
                   </fmt-xref-label>
                   <note id="note1" autonum="1">
                      <fmt-name>
                         <span class="fmt-caption-label">
                            <span class="fmt-element-name">NOTE</span>
                            <semx element="autonum" source="note1">1</semx>
                         </span>
                      </fmt-name>
                      <fmt-xref-label>
                         <span class="fmt-element-name">Note</span>
                         <semx element="autonum" source="note1">1</semx>
                      </fmt-xref-label>
                      <p id="_">These results are based on a study carried out on three different types of kernel.</p>
                   </note>
                   <note id="note2" autonum="2">
                      <fmt-name>
                         <span class="fmt-caption-label">
                            <span class="fmt-element-name">NOTE</span>
                            <semx element="autonum" source="note2">2</semx>
                         </span>
                      </fmt-name>
                      <fmt-xref-label>
                         <span class="fmt-element-name">Note</span>
                         <semx element="autonum" source="note2">2</semx>
                      </fmt-xref-label>
                      <p id="_">These results are based on a study carried out on three different types of kernel.</p>
                   </note>
                   <p>
                      <xref target="note1">
                         <span class="fmt-element-name">Note</span>
                         <semx element="autonum" source="note1">1</semx>
                      </xref>
                      <xref target="note2">
                         <span class="fmt-element-name">Note</span>
                         <semx element="autonum" source="note2">2</semx>
                      </xref>
                   </p>
                </clause>
           </clause>

As noted: ignore title and name and use fmt-title and fmt-name instead. Ignore fmt-xref-label. Render the contents of any semx and span elements.
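
As a rough illustration of these rendering rules (not the actual XSLT), the sketch below picks fmt-title or fmt-name, skips title, name and fmt-xref-label, and flattens span and semx to their contents. It is written in Python with the standard library. The helper names are hypothetical, the fragment is a compacted copy of the example above without pretty-printing whitespace, and namespaces are left out.

```python
import xml.etree.ElementTree as ET

# Compact version of the "after" example; no indentation whitespace,
# so every text node is significant.
DOC = (
    '<clause id="widgets" displayorder="5">'
    '<title id="_">Widgets</title>'
    '<fmt-title depth="1"><span class="fmt-caption-label">'
    '<semx element="autonum" source="widgets">3</semx>'
    '<span class="fmt-caption-delim"><tab/></span>'
    '<semx element="title" source="_">Widgets</semx>'
    '</span></fmt-title>'
    '<fmt-xref-label><span class="fmt-element-name">Clause</span> '
    '<semx element="autonum" source="widgets">3</semx></fmt-xref-label>'
    '</clause>'
)

def flatten(el):
    """Render an element as plain text: <tab/> becomes a tab character,
    every other element (span, semx, ...) contributes only its contents."""
    if el.tag == "tab":
        return "\t"
    text = el.text or ""
    for child in el:
        text += flatten(child) + (child.tail or "")
    return text

def caption_text(block):
    """Return the caption to render, or None if the block has no fmt-* caption.
    title, name and fmt-xref-label are never consulted."""
    for tag in ("fmt-title", "fmt-name"):
        fmt = block.find(tag)
        if fmt is not None:
            return flatten(fmt)
    return None

clause = ET.fromstring(DOC)
print(repr(caption_text(clause)))  # '3\tWidgets'
```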

This mass introduction of semx and span into markup means that we will have a lot more mixed content in our XML than we used to: all titles, block captions, and xrefs will contain tags. Given what has been happening with docidentifier containing tags in PDF, that means you will need to ensure that rendering copes with all that.

@opoudjis opoudjis added the enhancement New feature or request label Nov 21, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Metanorma Nov 21, 2024
@opoudjis
Contributor Author

Note that work is being done in PR branch feature/presxml-autonum. This is the Gemfile.devel for metanorma-iso:

gem "isodoc", git: "https://github.com/metanorma/isodoc", branch: "feature/presxml-autonum"
gem "isodoc-i18n", git: "https://github.com/metanorma/isodoc-i18n", branch: "fix/markup-connectives"
gem "mn-requirements", git: "https://github.com/metanorma/mn-requirements", branch: "feature/presxml-autonum"

@opoudjis
Contributor Author

I will now be inserting any tabs between the note label and note content in presentation XML, so you won't need to:

Before, ISO:

<note>
  <name>NOTE</name>

After, ISO:

<note>
  <fmt-name>
         <span class="fmt-caption-label">
           <span class="fmt-element-name">NOTE</span>
         </span>
         <span class="fmt-label-delim">
            <tab/>
         </span>
  </fmt-name>

Before, IEEE:

<note>
  <name>NOTE — </name>

After, IEEE:

<note>
  <fmt-name>
         <span class="fmt-caption-label">
           <span class="fmt-element-name">NOTE</span>
         </span>
         <span class="fmt-label-delim">—</span>
  </fmt-name>

@Intelligent2013
Contributor

Intelligent2013 commented Nov 21, 2024

I think the best way is to do these actions in the templates with mode="update_xml_step1" (common.xsl):

  • in the elements requirement, permission, recommendation, formula, table, figure, note, admonition, example, sourcecode, ul, ol, dl, term, termexample, termnote,
    OR if the element name has a fmt-name sibling

    • remove name
    • rename fmt-name to name
  • in the elements foreword, introduction, acknowledgement, abstract, annex, appendix, terms, references,
    OR if the element title has a fmt-title sibling

    • remove title
    • rename fmt-title to title
  • remove fmt-xref-label

  • add the processing for the new elements span
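
The first three steps of this plan can be sketched outside XSLT as follows. This is an illustrative Python version using the standard library, not the code in common.xsl. The function name is hypothetical, and it uses the sibling test only: name or title is dropped only when a fmt-* counterpart sits beside it.

```python
import xml.etree.ElementTree as ET

FMT_TO_PLAIN = {"fmt-name": "name", "fmt-title": "title"}

def use_fmt_captions(root):
    """Drop name/title where a fmt-* sibling exists, rename the fmt-* element
    to the plain name, and remove fmt-xref-label."""
    # Snapshot the tree first, since children are removed while walking it.
    for parent in list(root.iter()):
        children = list(parent)
        # Plain tags that have a fmt-* counterpart under this parent.
        superseded = {FMT_TO_PLAIN[c.tag] for c in children if c.tag in FMT_TO_PLAIN}
        for child in children:
            if child.tag == "fmt-xref-label" or child.tag in superseded:
                parent.remove(child)
            elif child.tag in FMT_TO_PLAIN:
                child.tag = FMT_TO_PLAIN[child.tag]
    return root

NOTE = (
    '<note id="note1"><name>User caption</name>'
    '<fmt-name><span class="fmt-caption-label">NOTE 1</span></fmt-name>'
    '<fmt-xref-label>Note 1</fmt-xref-label><p>Text</p></note>'
)
note = use_fmt_captions(ET.fromstring(NOTE))
print(ET.tostring(note, encoding="unicode"))
```

After this pass, downstream templates that match name and title keep working unchanged, which is the point of doing the rename in a preprocessing step.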

@Intelligent2013
Contributor

I have to comment out <xsl:strip-space elements="iso:xref"/> in common.xsl (added in metanorma/metanorma-iso#852), because with whitespace stripped inside xref, the XML:

<xref type="inline" target="ISO20483"><span class="stdpublisher">ISO </span><span class="stddocNumber">20483</span>:<span class="stdyear">2013</span>, <span class="citeapp">Annex C</span> <span class="fmt-conn">and</span> <span class="citetbl">Table C.1</span></xref>

will be rendered as:
Annex CandTable C.1

I will also review the solution for issue metanorma/metanorma-iso#852.
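
The effect can be reproduced outside XSLT. The sketch below, in Python with the standard library, emulates strip-space on a single element by discarding its whitespace-only text nodes. The xref is an abbreviated, namespace-free version of the one above, and text_of is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the xref above: three spans separated by single spaces.
XREF = (
    '<xref type="inline" target="ISO20483">'
    '<span class="citeapp">Annex C</span> '
    '<span class="fmt-conn">and</span> '
    '<span class="citetbl">Table C.1</span></xref>'
)

def text_of(el, strip_space=False):
    """Concatenate the text of el. With strip_space=True, whitespace-only text
    nodes that are direct children of el are discarded, as xsl:strip-space
    would do for that element."""
    def keep(s):
        if s is None or (strip_space and not s.strip()):
            return ""
        return s
    out = keep(el.text)
    for child in el:
        out += "".join(child.itertext()) + keep(child.tail)
    return out

xref = ET.fromstring(XREF)
print(text_of(xref))                    # Annex C and Table C.1
print(text_of(xref, strip_space=True))  # Annex CandTable C.1
```

The spaces between the spans are whitespace-only text nodes of xref, so they are exactly what strip-space removes once xref has element children.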

@opoudjis
Contributor Author

opoudjis commented Nov 23, 2024

Yes, that is what I was afraid of. That's why I noted that there's going to be much more mixed content now than before.

Intelligent2013 added a commit that referenced this issue Nov 23, 2024
author Alexander Dyuzhev <[email protected]> 1732308339 +0300
committer Alexander Dyuzhev <[email protected]> 1732378925 +0300

common.xsl updated for new title and name format, #770
@Intelligent2013
Contributor

ISO XSLT updated in #773.

@opoudjis I've generated the Presentation XML for https://github.com/metanorma/mn-samples-ieee/tree/main/sources/p987.6 and found a bug: the caption label around <semx element="autonum"> contains its own dash:

<term id="term-input-reference-axis">
	<p>
		<strong>input reference axis</strong>: The direction of an axis. <em>Syn:</em>
		<strong>IRA</strong>. </p>
	<termnote id="_6ec31af7-e76a-f673-6493-14d481e81b77" autonum="NOTE">
		<fmt-name>
			<span class="fmt-caption-label">NOTE <semx element="autonum" source="_6ec31af7-e76a-f673-6493-14d481e81b77"/>—</span>
			<span class="fmt-label-delim">—</span>
		</fmt-name>

Therefore it renders as a long/double dash (the first from the <semx element="autonum"> caption label, the second from <span class="fmt-label-delim">).

My Gemfile:

source "https://rubygems.org"

gem "metanorma-cli"

gem "metanorma-ieee", git: "https://github.com/metanorma/metanorma-ieee", branch: "feature/presxml-autonum"
gem "isodoc", git: "https://github.com/metanorma/isodoc", branch: "feature/presxml-autonum"
gem "isodoc-i18n", git: "https://github.com/metanorma/isodoc-i18n", branch: "fix/markup-connectives"
gem "mn-requirements", git: "https://github.com/metanorma/mn-requirements", branch: "feature/presxml-autonum"

gem "sassc"

@opoudjis
Contributor Author

Thank you for finding that @Intelligent2013! But I am still not done with testing and therefore debugging metanorma-ieee; in fact, I am repeatedly going back and doing refactoring on isodoc as I find issues and redundancies downstream. With luck I may get through the rest of metanorma-ieee tonight.

@opoudjis
Contributor Author

metanorma-ieee done

@opoudjis
Contributor Author

metanorma-itu done

@opoudjis
Contributor Author

In metanorma/metanorma-standoc#312, I had added an xref attribute to annex/title at your request @Intelligent2013 , as a processing hint for PDF:

Is there a possibility to include 'Appendix' title and number/letter into a separate element or attribute? When I calculated it via XSLT it was a simple to use one algorithm to put appendix number anywhere, but in presentation XML I have to calculated it again via xslt (it's a potential source of issues due possible different algorithms in presentation XML generation and XSLT) or extract from element title, but it's language specific problem.

This information is now available as /annex/fmt-xref-label (and I will do some refactoring to also make it retrievable from /annex/fmt-name/ ). I request that I get rid of the xref attribute.

@opoudjis
Contributor Author

metanorma-nist done

@Intelligent2013
Contributor

IEEE XSLT updated in #773.

@Intelligent2013
Contributor

ITU XSLT updated in #773.

Intelligent2013 added a commit that referenced this issue Dec 4, 2024
@Intelligent2013
Contributor

metanorma-bipm done

@opoudjis issue found - in the Index there are both references to the title/bookmark and fmt-title/bookmark:

<clause id="_appendix_3_the_base_unitsbase_units_of_the_si" obligation="normative" unnumbered="true">
	<title id="_18cc76a6-96da-4886-9d62-248e129a14fb">Appendix 3. The base units<bookmark id="_a17d01cc-8089-4eb9-88be-33ff27fad132"/> of the SI</title>
	<fmt-title depth="5">
		<semx element="title" source="_18cc76a6-96da-4886-9d62-248e129a14fb">Appendix 3. The base units<bookmark id="_a3cbeb24-fde5-40cc-bd17-ea2081c8bcd4"/> of the SI</semx>
	</fmt-title>
<indexsect id="_17dde3a6-ae02-4a2c-b121-c963cdd0f029" displayorder="15">
...
<li>base unit(s),  ...

		<xref target="_a17d01cc-8089-4eb9-88be-33ff27fad132" pagenumber="true">"<semx element="title" source="_appendix_3_the_base_unitsbase_units_of_the_si">Appendix 3. The base unitsbase unit(s) of the SI</semx>"</xref>, 
		
		<xref target="_a3cbeb24-fde5-40cc-bd17-ea2081c8bcd4" pagenumber="true">"<semx element="title" source="_appendix_3_the_base_unitsbase_units_of_the_si">Appendix 3. The base unitsbase unit(s) of the SI</semx>"</xref>,
...
</li>

As the title element is ignored, the PDF renders blank spaces for the missing references.

And in PDF log there are messages:

Page 110: Unresolved ID reference "_a17d01cc-8089-4eb9-88be-33ff27fad132" found.
Page 110: Unresolved ID reference "_a70f554b-425c-4510-ac42-977e33286cbf" found.
Page 110: Unresolved ID reference "_8d8d5193-6bb7-4017-bb64-5e2756145d00" found.
....

@opoudjis
Contributor Author

opoudjis commented Dec 5, 2024

in the Index there are both references to the title/bookmark and fmt-title/bookmark:

That's an issue I knew was coming, but I hoped had not spread yet—that IDs are being replicated between title and fmt-title. I am dodging it in floating-title, by reassigning floating-title/*[@id] to @original-id; I need to generalise that.

@opoudjis
Contributor Author

opoudjis commented Dec 5, 2024

In addition, I can't leave index items in Semantic XML elements, they need to be moved to the Presentation XML elements, globally.

@opoudjis
Contributor Author

opoudjis commented Dec 5, 2024

Fixed, try it out now. The <index> is removed from title, it is only left in fmt-title.

@Intelligent2013
Contributor

Plateau XSLT updated in #773.

Now, I'll test the BIPM.

@Intelligent2013
Contributor

Fixed, try it out now. The <index> is removed from title, it is only left in fmt-title.

@opoudjis there are still references from index//xref to title/bookmark.
The previous PDF generation log contained 167 errors of the form Unresolved ID reference "..." found.
Now there are 88 errors.
For instance, _1d07cd96-a110-4d66-9d02-cb7fc1eede83:

<clause id="cgpm12th1964r7" unnumbered="true" obligation="normative">
<title type="quoted"><blacksquare/><strong>Curie</strong> (<link target="https://www.bipm.org/en/committees/cg/cgpm/12-1964/resolution-7">CGPM RES 7 (1964, E)</link>)<bookmark id="_1d07cd96-a110-4d66-9d02-cb7fc1eede83"/></title><fmt-title type="quoted" depth="3"><blacksquare/><strong>Curie</strong> (<link target="https://www.bipm.org/en/committees/cg/cgpm/12-1964/resolution-7">CGPM RES 7 (1964, E)</link>)<bookmark id="_f6184b8a-e919-4531-be6f-28712e0ee20f"/></fmt-title>
<indexsect 
<xref target="_1d07cd96-a110-4d66-9d02-cb7fc1eede83" pagenumber="true">"<semx element="title" source="cgpm12th1964r7">[cgpm12th1964r7]</semx>"</xref>

@opoudjis
Contributor Author

opoudjis commented Dec 6, 2024

Hm, ok. This is all BIPM Brochure, I just was reluctant to compile the whole thing, but clearly I'll need to...

@opoudjis
Contributor Author

opoudjis commented Dec 6, 2024

I was catching all title/fmt-title duplications except the one specific to BIPM, with quoted titles. I am now testing the Brochure outputs to ensure there are no further unresolved ids (those are xrefs pointing to title or name, which are now excluded from rendering, instead of their duplicates in fmt-title and fmt-name).
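
A check of this kind can be sketched in a few lines. The following Python example, standard library only, lists xref targets whose id exists only inside title or name and would therefore be unresolved at rendering time. The function names and the test fragment are hypothetical, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

NOT_RENDERED = {"title", "name"}  # Semantic XML captions, skipped by the renderer

def unrendered_ids(root):
    """Collect every id on or below a title/name element."""
    hidden = set()
    def walk(el, inside):
        inside = inside or el.tag in NOT_RENDERED
        if inside and el.get("id"):
            hidden.add(el.get("id"))
        for child in el:
            walk(child, inside)
    walk(root, False)
    return hidden

def dangling_xrefs(root):
    """Targets of xrefs that point into unrendered title/name content."""
    hidden = unrendered_ids(root)
    return [x.get("target") for x in root.iter("xref") if x.get("target") in hidden]

# Hypothetical fragment: b1 lives in title, b2 in its fmt-title duplicate.
DOC = (
    '<doc><clause id="c1">'
    '<title id="t1">Base units<bookmark id="b1"/></title>'
    '<fmt-title><semx element="title" source="t1">Base units<bookmark id="b2"/></semx></fmt-title>'
    '</clause>'
    '<indexsect><xref target="b1"/><xref target="b2"/></indexsect></doc>'
)
doc = ET.fromstring(DOC)
print(dangling_xrefs(doc))  # ['b1']
```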

@opoudjis
Contributor Author

opoudjis commented Dec 6, 2024

So compiling the English-language BIPM brochure, there are 1327 index entries. Of these, 88 are under title, and thus will go missing on rendering, since rendering uses fmt-title instead (which duplicates those titles); that is what you also found, and they should be the quoted titles I had missed last time—so we've clearly been looking at the same document.

Recompiling with the bug fix, the ancestors of index links are now

{"clause" => 503, 
"table" => 148, 
"xref" => 22, 
"note" => 26, 
"li" => 337, 
"fmt-title" => 167, 
"dl" => 26, 
"fmt-name" => 6, 
"dt" => 4}

There were a couple of surprises in there (there are index entries pointing to footnotes and to cross-reference text), but I think this means we are now good to go.
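
A tally like the one above can be approximated as follows. This Python sketch is one interpretation, not the script used for the figures quoted: for each xref in indexsect it finds the target element and counts its nearest enclosing container. The container set, the function name and the test fragment are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed set of elements worth reporting as containers of an index target.
CONTAINERS = {"clause", "table", "xref", "note", "li", "dl", "dt",
              "title", "name", "fmt-title", "fmt-name"}

def index_target_containers(root):
    """Count, per container tag, where the targets of index xrefs live."""
    parent = {child: el for el in root.iter() for child in el}
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    tally = Counter()
    for indexsect in root.iter("indexsect"):
        for xref in indexsect.iter("xref"):
            node = by_id.get(xref.get("target"))
            # Walk up until a container is reached or the tree runs out.
            while node is not None and node.tag not in CONTAINERS:
                node = parent.get(node)
            tally[node.tag if node is not None else "(none)"] += 1
    return tally

DOC = (
    '<doc><clause id="c1">'
    '<title><bookmark id="b1"/></title>'
    '<fmt-title><bookmark id="b2"/></fmt-title>'
    '<p><bookmark id="b3"/></p>'
    '</clause>'
    '<indexsect><xref target="b1"/><xref target="b2"/>'
    '<xref target="b3"/><xref target="b9"/></indexsect></doc>'
)
tally = index_target_containers(ET.fromstring(DOC))
print(dict(tally))
```

Any count under title or name, or under "(none)", corresponds to an index entry that will not resolve in the rendered output.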

@Intelligent2013
Contributor

BIPM and JCGM XSLT updated in #773.

@opoudjis I've just found a double :: in the termnote name while testing a JCGM document. The double :: is also present in the old Presentation XML.

Old Presentation XML:

<termnote id="_49888e25-31af-6205-b3e3-269d2cc55cfb"><name>Note 1 to entry:: </name>

New Presentation XML:

<fmt-name><span class="fmt-caption-label">Note <semx element="autonum" source="_49888e25-31af-6205-b3e3-269d2cc55cfb">1</semx> to entry:</span><span class="fmt-label-delim">: </span></fmt-name>
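
A doubled delimiter of this kind can be detected mechanically. The sketch below, in Python with the standard library, flags a fmt-name whose caption label already ends with the text of the following fmt-label-delim span. The function name is hypothetical and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

def has_doubled_delimiter(fmt_name):
    """True if the caption label already ends with the label delimiter."""
    label = fmt_name.find("span[@class='fmt-caption-label']")
    delim = fmt_name.find("span[@class='fmt-label-delim']")
    if label is None or delim is None:
        return False
    label_text = "".join(label.itertext()).rstrip()
    delim_text = "".join(delim.itertext()).strip()
    return bool(delim_text) and label_text.endswith(delim_text)

# The termnote caption reported above, with a shortened source id.
DOUBLED = ET.fromstring(
    '<fmt-name><span class="fmt-caption-label">Note '
    '<semx element="autonum" source="_49888e25">1</semx> to entry:</span>'
    '<span class="fmt-label-delim">: </span></fmt-name>'
)
# The same caption without the colon inside the label.
CLEAN = ET.fromstring(
    '<fmt-name><span class="fmt-caption-label">Note '
    '<semx element="autonum" source="_49888e25">1</semx> to entry</span>'
    '<span class="fmt-label-delim">: </span></fmt-name>'
)
print(has_doubled_delimiter(DOUBLED), has_doubled_delimiter(CLEAN))  # True False
```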

@Intelligent2013
Contributor

So, at this moment all XSLT updated for new Presentation XML in:

Found issues in:

opoudjis added a commit to metanorma/metanorma-bipm that referenced this issue Dec 7, 2024
@opoudjis
Contributor Author

opoudjis commented Dec 7, 2024

BSI fixed. References in BIPM from index to title/bookmark should all have been fixed yesterday, I found no remaining instances. Removed redundant colon in BIPM termnotes.

@opoudjis
Contributor Author

opoudjis commented Dec 7, 2024

Please confirm all is ready, intend to release Monday.

@Intelligent2013
Contributor

BSI fixed,

Confirmed. Thank you!

@Intelligent2013
Contributor

References in BIPM from index to title/bookmark should all have been fixed yesterday, I found no remaining instances. Removed redundant colon in BIPM termnotes.

Confirmed.

I'll merge the XSLT PRs today.

Intelligent2013 added a commit that referenced this issue Dec 7, 2024
@Intelligent2013 Intelligent2013 moved this from 🆕 New to 👀 In review in Metanorma Dec 7, 2024
Intelligent2013 added a commit that referenced this issue Dec 7, 2024
common.xsl updated for new title and name format, #770
@Intelligent2013
Contributor

XSLTs merged into the main branch in metanorma-... repositories.
