pandoc should use: --from markdown-yaml_metadata_block #78

Open
dlmiles opened this issue Nov 3, 2024 · 4 comments · May be fixed by #84

dlmiles commented Nov 3, 2024

This is related to the https://discord.com/channels/1009193568256135208/1302455447269281823 thread about well-formed markdown docs failing to complete the docs GHA task.

The use of \_ appears to be related to the TeX format requirement, not an MD format requirement. Pandoc should have converted/escaped the MD document for TeX if that is what it needs to process it, so there seems to be an internal bug somewhere.

MD allows horizontal rules (https://www.markdownguide.org/basic-syntax/#horizontal-rules) written as ---, and this appears to trigger the YAML parser that is used to parse the top block.

Top of document looks like:

---
documentclass: scrartcl
geometry: "left=2cm,right=2cm,top=2cm,bottom=3cm"
fontsize: 14pt
mainfont: Latin Modern Sans
header-includes:
- \usepackage{hyperref}
- \hypersetup{colorlinks=false,
          allbordercolors={0 0 0},
          pdfborderstyle={/S/U/W 1}}
---

# MULDIV unit (8-bit signed/unsigned)

I tried replacing the 2nd --- with ... to indicate end-of-document to YAML, but it did not make a difference.

Using the option --from markdown-yaml_metadata_block does allow it to be processed without error.
I guess this defeats a format auto-detection mechanism that may be causing the problem: pandoc is not sure that the data following the YAML header is in MD format.

dlmiles commented Nov 3, 2024

A guess at the line that would receive the option:

project.py:893: pdf_cmd = "pandoc --pdf-engine=xelatex --resource-path=docs -i datasheet.md -o datasheet.pdf"

@urish urish added the bug Something isn't working label Nov 3, 2024

dlmiles commented Nov 3, 2024

After looking at the resulting PDF: --from markdown-yaml_metadata_block means turn OFF the YAML metadata block, so that block ends up rendered as inline text, which is not the intention.

I found best results with a sequence like:

## Note use of --strip-comments to help with removal of HTML style comments from MD document `<!---`
pandoc --from gfm --to markdown --strip-comments --resource-path=docs -i docs/info.md  -o datasheet.md

## edit datasheet.md to remove any empty HTML CODE blocks ```{=html}\s+``` which is in the default TT
## template of a markdown file, the use of `--strip-comments` above helped, maybe this can be done in
## python when the next stage is done?  like a multiline regex replace

## prepend the YAML  metadata header info for TeX, maybe it should end with `...` to indicate end of YAML
## document, although I did not find it makes much difference to pandoc.  I think the issue here is the pandoc
## feature of allowing the metadata to be anywhere in the document.

## Then process with the current command:
pandoc --pdf-engine=xelatex --resource-path=docs -i datasheet.md -o datasheet.pdf

The difference is that a gfm (GitHub Markdown) to markdown (Pandoc Markdown) conversion happens first, which seems to resolve every issue when I try to break it.
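
The manual edit steps in the sequence above (removing empty raw-HTML blocks and prepending the metadata header) could be done in Python, as the comment suggests. What follows is only a sketch under assumptions: the function names `strip_empty_html_blocks` and `prepend_metadata` are hypothetical and do not exist in project.py, and the regex only targets the whitespace-only `{=html}` blocks described above.

```python
import re

# Three backticks, built this way so this example contains no literal fence line.
FENCE = "`" * 3

# A raw-HTML fenced block whose body is empty or whitespace only.
EMPTY_HTML_BLOCK = re.compile(
    r"^" + FENCE + r"\{=html\}[ \t]*\n\s*" + FENCE + r"[ \t]*\n?",
    re.MULTILINE,
)


def strip_empty_html_blocks(markdown: str) -> str:
    """Remove empty raw-HTML blocks; blocks with real content are kept."""
    return EMPTY_HTML_BLOCK.sub("", markdown)


def prepend_metadata(markdown: str, yaml_header: str) -> str:
    """Place the YAML metadata block first, followed by one blank line."""
    return yaml_header.rstrip("\n") + "\n\n" + markdown.lstrip("\n")
```

These would run on the text of the intermediate datasheet.md between the two pandoc invocations.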

htfab pushed a commit to htfab/tt-support-tools that referenced this issue Dec 5, 2024
@htfab htfab linked a pull request Dec 5, 2024 that will close this issue

htfab commented Dec 5, 2024

See #84 as my suggested solution. I've marked it as draft so that we can discuss it here before merging. The changes are:

  • use gfm (i.e. github flavored markdown) as per your last comment, but without the extra steps
  • this turns off implicit raw tex support within markdown, so we replace our uses with explicit latex blocks, e.g. instead of \pagebreak we write
    ```{=latex}
    \pagebreak
    ```
    
  • add the +raw_attribute extension so that we can use this feature in the yaml header block
  • add the +smart extension that was enabled by default for markdown but not for gfm (it makes some changes like replacing apostrophes and quotation marks with nicer versions, i.e. ' becomes ʼ and " becomes ”).

The current PR only affects the project-level documentation, but if it works for your test cases I'll make another one for the full shuttle datasheet.
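
For readers who have not opened #84, the changes listed above amount to a different --from value on the pandoc command quoted earlier in this thread. The sketch below is an assumption about the shape of such a command, not the code in the PR: `build_pdf_cmd` is a hypothetical helper, and the other flags are copied unchanged from the existing command.

```python
def build_pdf_cmd(source="datasheet.md", output="datasheet.pdf",
                  extensions=("raw_attribute", "smart")):
    """Build a pandoc argument list that reads gfm plus the named extensions.

    Hypothetical helper for illustration; flags other than --from are taken
    as-is from the command quoted earlier in the thread.
    """
    from_format = "gfm" + "".join("+" + ext for ext in extensions)
    return [
        "pandoc",
        "--from", from_format,
        "--pdf-engine=xelatex",
        "--resource-path=docs",
        "-i", source,
        "-o", output,
    ]
```

Returning a list rather than a single string means it can be passed straight to `subprocess.run` without shell quoting concerns.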

dlmiles commented Dec 5, 2024

Before I forget: the initial issue for me was the use of --- in the markdown; this was the cause of the failed docs GHA.

This is allowed in the gfm markdown flavour for a horizontal rule, but because of the YAML metadata, and the feature that metadata can appear anywhere in the markdown content, --- has a special meaning to the YAML parser.

The solution was simply to use -----, which side-steps the issue and is a compatible form of HR for both markdown flavours. But I only knew this was a solution once I had converted from gfm to markdown (the native pandoc flavour) and inspected the output.

I think my extra steps were just an attempt to keep using native pandoc markdown by default for PDF generation, as I assumed there may be formatting differences and features in use (and I didn't want to, and couldn't, retest every scenario for every datasheet to know whether it broke something).

So if switching to gfm works and the output looks good, there is no need for extra steps to keep using the pandoc markdown flavour internally.

The #84 solution looks fine to me.

--

It would be really nice if there was a diagnostic/linting mode that reports what the problem might be when the docs GHA fails. This might be a markdown syntax linter, in the flavour of markdown in use, that is always run before the PDF is produced so that there is at least some feedback in the logs. I did find and use some linting features but I don't recall the specifics.
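
As a rough idea of what such a pre-check could look like for the specific --- problem in this issue, here is a minimal sketch. It is not an existing tool or part of tt-support-tools; `find_bare_rules` is a hypothetical name, and the handling of the metadata block and code fences is deliberately simplistic.

```python
from typing import List

# Three backticks, built this way so this example contains no literal fence line.
FENCE = "`" * 3


def find_bare_rules(markdown: str) -> List[int]:
    """Return 1-based line numbers of lines that are exactly '---'.

    A leading metadata block and the contents of fenced code blocks are skipped.
    """
    lines = markdown.splitlines()
    start = 0
    # Skip a metadata block that opens on line 1 and closes with '---' or '...'.
    if lines and lines[0].strip() == "---":
        for idx in range(1, len(lines)):
            if lines[idx].strip() in ("---", "..."):
                start = idx + 1
                break
    findings = []
    in_fence = False
    for idx in range(start, len(lines)):
        stripped = lines[idx].strip()
        if stripped.startswith(FENCE) or stripped.startswith("~~~"):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if not in_fence and stripped == "---":
            findings.append(idx + 1)
    return findings
```

A docs job could print each reported line number with a hint to use ----- instead, before pandoc is invoked.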
