links fail for identifiers wrapped in backticks with pymdownx.inlinehilite enabled #34

Closed
tlambert03 opened this issue Sep 26, 2023 · 8 comments · Fixed by #40

Comments

@tlambert03

This took me a while to figure out, and it's possible this is a "won't fix", but curious to hear your thoughts on this issue.

I use a number of autorefs in the following format:

[`some.identifier`][]

I'm not sure if that's officially supported, but it has always worked well for me, making properly hyperlinked text wrapped in <code>. I recently installed mkdocs-gallery which broke this behavior, and I eventually tracked it down to the addition of pymdownx.inlinehilite to the config. So, with the following config, the above autoref will fail:

site_name: My Docs
markdown_extensions:
  - pymdownx.inlinehilite
plugins:
  - mkdocstrings

I tracked that down to this line:

if re.search(r"[/ \x00-\x1f]", identifier):
    # Do nothing if the matched reference contains:
    # - a space, slash or control character (considered unintended);
    # - specifically \x01 is used by Python-Markdown HTML stash when there's inline formatting,
    #   but references with Markdown formatting are not possible anyway.
    return None, m.start(0), end

If pymdownx.inlinehilite is not included in the config, identifier will equal 'some.identifier' at that line, re.search will not match, and the link will be created. If inlinehilite is included, though, identifier will look something like '\x02wzxhzdk:1\x03', and the search will match on the control characters, preventing the link.
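To make the failure concrete, here is a minimal sketch of that check in isolation. The pattern and the two identifier values are taken from the report above; the names `UNINTENDED_RE` and `is_rejected` are made up for illustration and are not part of mkdocs-autorefs.

```python
import re

# Same character class as the quoted check: a slash, a space, or a control character.
UNINTENDED_RE = re.compile(r"[/ \x00-\x1f]")


def is_rejected(identifier: str) -> bool:
    """Return True when the check above would skip this identifier (hypothetical helper)."""
    return UNINTENDED_RE.search(identifier) is not None


plain = "some.identifier"       # what the identifier looks like without inlinehilite
stashed = "\x02wzxhzdk:1\x03"   # placeholder value reported with inlinehilite enabled

print(is_rejected(plain))    # False: no slash, space, or control character
print(is_rejected(stashed))  # True: \x02 and \x03 fall inside \x00-\x1f
```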

Is this something that you can imagine a fix for? Or is the [`some.identifier`][] syntax just not supported?

thanks!

@oprypin
Member

oprypin commented Sep 26, 2023

[`some.identifier`][] syntax is supported, we'll try to find a fix for this. Thanks for the detailed report.

@oprypin
Member

oprypin commented Sep 28, 2023

Sadly, the fix isn't a simple priority change.

pymdownx uses the same priority as the standard backtick processor anyway, so that's not the cause:
https://github.com/facelessuser/pymdown-extensions/blob/e6474b38703b45e3dc431d3c1a0b1f24d80ee7fa/pymdownx/inlinehilite.py#L7
https://github.com/Python-Markdown/markdown/blob/93054dd9f7e6e2f555537873a3ec76d99e82326a/markdown/inlinepatterns.py#L76

And we need our priority to be lower than that:

priority=168, # Right after markdown.inlinepatterns.ReferenceInlineProcessor

The difference is that pymdownx stashes the HTML and the standard one doesn't.

@oprypin
Member

oprypin commented Sep 28, 2023

Ah, actually the built-in one also stashes things, and we are able to detect that one but somehow not this one:

if INLINE_PLACEHOLDER_RE.fullmatch(identifier):
    identifier = self.unescape(identifier)

@pawamoy
Member

pawamoy commented Feb 22, 2024

It seems to be because the first unescape reveals a second stashed item, this time stored in the HTML stash. We can retrieve the item from the stash, but it was stashed as a string, so we can't use .itertext() on it to easily get the text.

if INLINE_PLACEHOLDER_RE.fullmatch(identifier):
    identifier = self.unescape(identifier)
if match := HTML_PLACEHOLDER_RE.fullmatch(identifier):
    identifier = self.md.htmlStash.rawHtmlBlocks[int(match.group(1))]  # no unstash function that does this?
    ...  # how to get text?

For [`pathlib.Path`][], identifier becomes:

<span class="n">pathlib</span><span class="o">.</span><span class="n">Path</span>

Should we load that into an Element tree again? Or use a regex to pick up what's between > and <?

identifier = "".join(re.findall(r">([^<>]+)<", identifier))

And finally we have to unescape HTML characters:

identifier = html.unescape(identifier)
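Put together, the regex and unescape steps above can be sketched as a standalone function. This only illustrates the two lines quoted in this comment; `text_from_stashed_html` is a hypothetical name, and this is not necessarily what ended up in the fix.

```python
import html
import re


def text_from_stashed_html(fragment: str) -> str:
    """Join the text found between '>' and '<', then unescape HTML entities."""
    pieces = re.findall(r">([^<>]+)<", fragment)
    return html.unescape("".join(pieces))


highlighted = '<span class="n">pathlib</span><span class="o">.</span><span class="n">Path</span>'
print(text_from_stashed_html(highlighted))  # pathlib.Path

escaped = '<span class="n">a</span><span class="o">&lt;</span><span class="n">b</span>'
print(text_from_stashed_html(escaped))  # a<b
```

One caveat with this pattern: it only picks up text that sits between two tags, so a fragment with no tags at all (or with leading or trailing bare text) loses that text.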

@pawamoy
Member

pawamoy commented Feb 22, 2024

OK that works well, except that our fix_refs post-processing regular expression stops at the first </span>, and messes up the HTML.
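The effect can be reproduced with a simplified, hypothetical pattern. The `data-ref` attribute and the regex below are stand-ins invented for illustration, not the actual fix_refs expression; the point is only that a lazy `.*?` followed by `</span>` ends at the first closing tag when the title itself contains spans.

```python
import re

# Hypothetical, simplified stand-in for a post-processing pattern (not the real one).
REF_RE = re.compile(r'<span data-ref="(?P<id>[^"]+)">(?P<title>.*?)</span>')

rendered = (
    '<span data-ref="pathlib.Path"><code>'
    '<span class="n">pathlib</span><span class="o">.</span><span class="n">Path</span>'
    "</code></span>"
)

match = REF_RE.search(rendered)
# The lazy group stops at the first </span>, so the title is cut short.
print(match.group("title"))  # <code><span class="n">pathlib

# Substituting a link leaves unbalanced tags behind.
fixed = REF_RE.sub(r'<a href="#\g<id>">\g<title></a>', rendered)
print(fixed)
```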

@oprypin
Member

oprypin commented Feb 22, 2024

> Should we load that into an Element tree again? Or use a regex to pick up what's between > and <?

markupsafe's striptags would fit this task perfectly. It even unescapes HTML entities as well.
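For readers unfamiliar with the idea, here is a dependency-free sketch of tag stripping with entity unescaping, using only the standard library's `html.parser`. It is a stand-in to show the behavior being asked for, not markupsafe's implementation; `strip_tags` and `_TextCollector` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes only; convert_charrefs=True unescapes entities."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(fragment: str) -> str:
    """Return the text content of an HTML fragment with tags removed."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


highlighted = '<span class="n">pathlib</span><span class="o">.</span><span class="n">Path</span>'
print(strip_tags(highlighted))  # pathlib.Path
```

Unlike the `>([^<>]+)<` regex, a parser-based approach also keeps text that is not enclosed by tags.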

@oprypin
Member

oprypin commented Feb 23, 2024

Created #40 accordingly (sorry for the snipe)

@pawamoy
Member

pawamoy commented Feb 23, 2024

No worries, I can add myself as a co-author when squashing (I've spent quite some time investigating and debugging 😅)

pawamoy added a commit that referenced this issue Feb 23, 2024