run tp conversion on phytochemistry articles #72

Open
tcatapano opened this issue Feb 11, 2024 · 4 comments

@tcatapano

starting with sample file Phytochemistry.103.67-75.pdf.xml

@tcatapano tcatapano added gg markup xslt article issues related to conversion of full articles conversion labels Feb 11, 2024
@tcatapano tcatapano self-assigned this Feb 11, 2024
@tcatapano

Problem with TP result:

Two consecutive treatments in the source are being merged into a single treatment, with the two nomenclature sections placed one after another at the start of the treatment: see

https://github.com/plazi/ggxml2taxpub/blob/8f605eab9dd119dea1f287c67768d1aa6dde6b45/level1/articles/non-tax/Phytochemistry.103.67-75.pdf_tp.xml#L1854C2-L1861C2

        <tp:mixed-nomenclature> 2.1. Floral scent composition of <tp:taxon-name>Nymphaea subg.
               Hydrocallis</tp:taxon-name>
         </tp:mixed-nomenclature>
         <tp:mixed-nomenclature> 2.2. Floral scent variations within
               <tp:taxon-name>Nymphaea</tp:taxon-name> and <tp:taxon-name>Victoria</tp:taxon-name>
         </tp:mixed-nomenclature>
         <tp:treatment-sec sec-type="description">
            <p> The six species and two subspecies of <tp:taxon-name> Nymphaea subg. Hydrocallis

@tcatapano

tcatapano commented Feb 11, 2024

I think the problem is in this template (excerpt):

<xsl:template match="//treatment">
   <tp:taxon-treatment>
      <xsl:call-template name="treatment-metadata"/>
      <xsl:apply-templates select="//subSubSection[@type = 'nomenclature']"/>
      <xsl:apply-templates select="//subSubSection[not(@type = 'nomenclature')]"/>
   </tp:taxon-treatment>

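The two apply-templates calls should be

<xsl:apply-templates select=".//subSubSection[@type = 'nomenclature']"/>
<xsl:apply-templates select=".//subSubSection[not(@type = 'nomenclature')]"/>

A path starting with "//" is evaluated from the document root, so it selects every subSubSection in the document. Starting with ".//" restricts the selection to subSubSection descendants of the current treatment.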
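
To illustrate the difference between the two selections outside of XSLT, here is a minimal Python sketch using the standard library's ElementTree. The input document is hypothetical and heavily simplified (only the element names treatment and subSubSection and the type attribute come from the template above), and the convert and ordered helpers are invented for this illustration; it is not the actual conversion code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified input: two treatments, each with one
# nomenclature and one description subSubSection.
SOURCE = """
<document>
  <treatment>
    <subSubSection type="nomenclature">2.1. Name A</subSubSection>
    <subSubSection type="description">Description A</subSubSection>
  </treatment>
  <treatment>
    <subSubSection type="nomenclature">2.2. Name B</subSubSection>
    <subSubSection type="description">Description B</subSubSection>
  </treatment>
</document>
"""


def ordered(sections):
    """Nomenclature sections first, then all others, as in the template."""
    nomen = [s.text for s in sections if s.get("type") == "nomenclature"]
    other = [s.text for s in sections if s.get("type") != "nomenclature"]
    return nomen + other


def convert(root, scoped):
    """Return one list of section texts per treatment element."""
    result = []
    for treatment in root.iter("treatment"):
        if scoped:
            # Analogous to ".//subSubSection": descendants of this treatment.
            sections = treatment.findall(".//subSubSection")
        else:
            # Analogous to "//subSubSection": every match in the document.
            sections = list(root.iter("subSubSection"))
        result.append(ordered(sections))
    return result


root = ET.fromstring(SOURCE)
buggy = convert(root, scoped=False)
fixed = convert(root, scoped=True)

print(buggy[0])  # both nomenclature sections, then both descriptions
print(fixed[0])  # only the first treatment's own sections
```

In this sketch the unscoped variant gives every treatment the sections of the whole document, with all nomenclature sections stacked at the start, which resembles the symptom reported above. The scoped variant keeps each treatment's sections separate.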

@tcatapano

The XPath fix worked: the resulting file is now valid. Next step is to run the conversion over a larger sample set.

tcatapano added a commit that referenced this issue Feb 11, 2024
@tcatapano

Running the conversion on the full batch produces mostly valid files. Errors are in:

  22 Phytochemistry.157.168-174.pdf_tp.xml
   3 Phytochemistry.189.112824.pdf_tp.xml
   2 Phytochemistry.187.112776.pdf_tp.xml
   2 Phytochemistry.186.112741.pdf_tp.xml
   1 Phytochemistry.193.112970.pdf_tp.xml
   1 Phytochemistry.191.112908.pdf_tp.xml
   1 Phytochemistry.163.196-197.pdf_tp.xml
   1 Phytochemistry.157.158-167.pdf_tp.xml
   1 Phytochemistry.153.58-63.pdf_tp.xml

For the error messages by frequency, see https://github.com/plazi/ggxml2taxpub/blob/master/errs/phytochemistry_errors_20240211_frq.txt

22 Unexpected character data "
6 Unexpected element "title". The content of the parent element type must match "(sec-meta?,((label,title?)|title),(address|alternatives|answer|answer-set|array|block-alternatives|boxed-text|chem-struct-wrap|code|explanation|fig|fig-group|graphic|media|preformat|question|question-wrap|question-wrap-group|supplementary-material|table-wrap|table-wrap-group|disp-formula|disp-formula-group|def-list|list|tex-math|mml:math|p|related-article|related-object|disp-quote|speech|statement|verse-group)*,(sec|tp
2 Unexpected element "sec". The content of the parent element type must match "(sec-meta?,label?,title?,(address|alternatives|answer|answer-set|array|block-alternatives|boxed-text|chem-struct-wrap|code|explanation|fig|fig-group|graphic|media|preformat|question|question-wrap|question-wrap-group|supplementary-material|table-wrap|table-wrap-group|disp-formula|disp-formula-group|def-list|list|tex-math|mml:math|p|related-article|related-object|disp-quote|speech|statement|verse-group)*,tp
1 Unexpected element "tp:treatment-sec". The content of the parent element type must match "((address|alternatives|answer|answer-set|array|block-alternatives|boxed-text|chem-struct-wrap|code|explanation|fig|fig-group|graphic|media|preformat|question|question-wrap|question-wrap-group|supplementary-material|table-wrap|table-wrap-group|disp-formula|disp-formula-group|def-list|list|tex-math|mml
1 Unexpected element "tp:mixed-nomenclature". The content of the parent element type must match "((address|alternatives|answer|answer-set|array|block-alternatives|boxed-text|chem-struct-wrap|code|explanation|fig|fig-group|graphic|media|preformat|question|question-wrap|question-wrap-group|supplementary-material|table-wrap|table-wrap-group|disp-formula|disp-formula-group|def-list|list|tex-math|mml
1 Unexpected element "sec". The content of the parent element type must match "(tp:taxon-name|tp
1 The content of element type "kwd-group" is incomplete, it must match "(label?,title?,(kwd|compound-kwd|nested-kwd)+)".
