feat: Add option to scan and register Markdown anchors #39

Merged
merged 14 commits into from
Feb 27, 2024
Changes from 9 commits
95 changes: 94 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,4 +49,97 @@ This works the same as [a normal link to that heading](../doc1.md#hello-world).

Linking to a heading without needing to know the destination page can be useful if specifying that path is cumbersome, e.g. when the pages have deeply nested paths, are far apart, or are moved around frequently. And the issue is somewhat exacerbated by the fact that [MkDocs supports only *relative* links between pages](https://github.com/mkdocs/mkdocs/issues/1592).

Note that this plugin's behavior is undefined when trying to link to a heading title that appears several times throughout the site. Currently it arbitrarily chooses one of the pages.
Note that this plugin's behavior is undefined when trying to link to a heading title that appears several times throughout the site. Currently it arbitrarily chooses one of the pages. In such cases, use [Markdown anchors](#markdown-anchors) to add unique aliases to your headings.

### Markdown anchors

The autorefs plugin offers a feature called "Markdown anchors". Such anchors can be added anywhere in a document, and linked to from any other place.

The syntax is:

```md
[](){#id-of-the-anchor}
```

If you look closely, it starts with the usual syntax for a link, `[]()`, except both the text value and URL of the link are empty. Then we see `{#id-of-the-anchor}`, which is the syntax supported by the [`attr_list`](https://python-markdown.github.io/extensions/attr_list/) extension. It sets an HTML id on the anchor element. The autorefs plugin simply gives a meaning to such anchors with ids. Note that raw HTML anchors like `<a id="foo"></a>` are not supported.

The `attr_list` extension must be enabled for the Markdown anchors feature to work:

```yaml
# mkdocs.yml
plugins:
- search
- autorefs

markdown_extensions:
- attr_list
```

Now, you can add anchors to documents:

```md
Somewhere in a document.

[](){#foobar-paragraph}

Paragraph about foobar.
```

...making it possible to link to this anchor with our automatic links:

```md
In any document.

Check out the [paragraph about foobar][foobar-paragraph].
```

If you add a Markdown anchor right above a heading, this anchor will redirect to the heading itself:

```md
[](){#foobar}
## A verbose title about foobar
```

Linking to the `foobar` anchor will bring you directly to the heading, not the anchor itself, so the URL will show `#a-verbose-title-about-foobar` instead of `#foobar`. These anchors therefore act as "aliases" for headings. It is possible to define multiple aliases per heading:

```md
[](){#contributing}
[](){#development-setup}
## How to contribute to the project?
```
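
As a rough illustration of how aliases resolve, here is a minimal Python sketch. It assumes the registration logic shown in this PR's `register_anchor` (an optional `anchor` overrides the URL fragment). The plain dict, the page URL and the heading slug are made up for the example and are not the plugin's real state.

```python
# Sketch only: a plain dict stands in for the plugin's internal URL map.
url_map = {}


def register_anchor(page, identifier, anchor=None):
    """Map an identifier to a URL; `anchor` replaces the fragment when given."""
    url_map[identifier] = f"{page}#{anchor or identifier}"


page = "contributing/"  # hypothetical page URL
heading = "how-to-contribute-to-the-project"  # assumed slug of the heading

# Two aliases placed right above the same heading.
register_anchor(page, "contributing", heading)
register_anchor(page, "development-setup", heading)
# An anchor that is not followed by a heading keeps its own id.
register_anchor(page, "foobar-paragraph")

for identifier, url in url_map.items():
    print(identifier, "->", url)
```

Both aliases end up pointing at the heading's URL, while the standalone anchor points at itself.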

Such aliases are especially useful when the same headings appear in several different pages. Without aliases, linking to the heading is undefined behavior (it could lead to any one of the headings). With unique aliases above headings, you can make sure to link to the right heading.

For example, consider the following setup. You have one document per operating system describing how to install a project with the OS package manager or from sources:

```tree
docs/
install/
arch.md
debian.md
gentoo.md
```

Each page has:

```md
## Install with package manager
...

## Install from sources
...
```

You don't want to change headings and make them redundant, like `## Arch: Install with package manager` and `## Debian: Install with package manager` just to be able to reference the right one with autorefs. Instead you can do this:

```md
[](){#arch-install-pkg}
## Install with package manager
...

[](){#arch-install-src}
## Install from sources
...
```

...replacing `arch` with `debian`, `gentoo`, etc. in the other pages.
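
To see why unique aliases help here, consider this small Python sketch. It is not the plugin's code: the page URLs are assumed, and "last registered page wins" is only one possible outcome of what the README calls an arbitrary choice.

```python
# Sketch only: compares keying the URL map by heading slug vs. by unique alias.
slug = "install-with-package-manager"
pages = ["install/arch/", "install/debian/", "install/gentoo/"]  # assumed URLs

by_heading = {}  # identifier = heading slug, shared by every page
by_alias = {}    # identifier = per-page alias such as "arch-install-pkg"

for page in pages:
    os_name = page.rstrip("/").rsplit("/", 1)[-1]
    # Same key on every page: each registration overwrites the previous one.
    by_heading[slug] = f"{page}#{slug}"
    # Unique key per page: every heading stays reachable.
    by_alias[f"{os_name}-install-pkg"] = f"{page}#{slug}"

print(len(by_heading), "entry for the shared heading")
print(len(by_alias), "entries for the aliases")
print(by_alias["debian-install-pkg"])
```

With the shared slug only one page survives in the map, whereas each alias keeps pointing at its own page.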
3 changes: 3 additions & 0 deletions mkdocs.yml
Expand Up @@ -95,6 +95,8 @@ markdown_extensions:
permalink: "¤"

plugins:
- autorefs:
scan_anchors: true
- search
- markdown-exec
- gen-files:
Expand All @@ -109,6 +111,7 @@ plugins:
import:
- https://docs.python.org/3/objects.inv
- https://www.mkdocs.org/objects.inv
- https://python-markdown.github.io/objects.inv
paths: [src]
options:
docstring_options:
31 changes: 26 additions & 5 deletions src/mkdocs_autorefs/plugin.py
Expand Up @@ -18,7 +18,12 @@
from typing import TYPE_CHECKING, Any, Callable, Sequence
from urllib.parse import urlsplit

from markdown.extensions.attr_list import AttrListExtension
from mkdocs.config.base import Config
from mkdocs.config.config_options import Type
from mkdocs.config.defaults import MkDocsConfig
from mkdocs.plugins import BasePlugin
from mkdocs.structure.pages import Page

from mkdocs_autorefs.references import AutorefsExtension, fix_refs, relative_url

Expand All @@ -36,7 +41,14 @@
log = logging.getLogger(f"mkdocs.plugins.{__name__}") # type: ignore[assignment]


class AutorefsPlugin(BasePlugin):
class AutorefsConfig(Config):
"""Configuration options for the Autorefs plugin."""

scan_anchors = Type(bool, default=False)
"""Whether to scan HTML pages for anchors defining references."""


class AutorefsPlugin(BasePlugin[AutorefsConfig]):
"""An `mkdocs` plugin.

This plugin defines the following event hooks:
Expand All @@ -50,6 +62,7 @@ class AutorefsPlugin(BasePlugin):
"""

scan_toc: bool = True
scan_anchors: bool = False
current_page: str | None = None

def __init__(self) -> None:
Expand All @@ -59,14 +72,14 @@ def __init__(self) -> None:
self._abs_url_map: dict[str, str] = {}
self.get_fallback_anchor: Callable[[str], str | None] | None = None

def register_anchor(self, page: str, identifier: str) -> None:
def register_anchor(self, page: str, identifier: str, anchor: str | None = None) -> None:
"""Register that an anchor corresponding to an identifier was encountered when rendering the page.

Arguments:
page: The relative URL of the current page. Examples: `'foo/bar/'`, `'foo/index.html'`
identifier: The HTML anchor (without '#') as a string.
anchor: The anchor to link to instead of `identifier`. When `None` or empty, `identifier` is used.
"""
self._url_map[identifier] = f"{page}#{identifier}"
self._url_map[identifier] = f"{page}#{anchor or identifier}"

def register_url(self, identifier: str, url: str) -> None:
"""Register that the identifier should be turned into a link to this URL.
Expand Down Expand Up @@ -133,7 +146,14 @@ def on_config(self, config: MkDocsConfig) -> MkDocsConfig | None:
The modified config.
"""
log.debug("Adding AutorefsExtension to the list")
config["markdown_extensions"].append(AutorefsExtension())
for ext in config.markdown_extensions:
if ext == "attr_list" or isinstance(ext, AttrListExtension):
log.debug("Enabling Markdown anchors feature")
scan_anchors = True
break
else:
scan_anchors = False
config["markdown_extensions"].append(AutorefsExtension(plugin=self if scan_anchors else None))
return config

def on_page_markdown(self, markdown: str, page: Page, **kwargs: Any) -> str: # noqa: ARG002
Expand All @@ -145,7 +165,8 @@ def on_page_markdown(self, markdown: str, page: Page, **kwargs: Any) -> str: #
kwargs: Additional arguments passed by MkDocs.

Returns:
The same Markdown. We only use this hook to map anchors to URLs.
The same Markdown. We only use this hook to keep a reference to the current page URL,
used during Markdown conversion by the anchor scanner tree processor.
"""
self.current_page = page.url
return markdown
78 changes: 77 additions & 1 deletion src/mkdocs_autorefs/references.py
Expand Up @@ -4,17 +4,22 @@

import re
from html import escape, unescape
from typing import TYPE_CHECKING, Any, Callable, Match, Tuple
from itertools import zip_longest
from typing import TYPE_CHECKING, Any, Callable, ClassVar, Match, Tuple
from urllib.parse import urlsplit
from xml.etree.ElementTree import Element

from markdown.core import Markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import REFERENCE_RE, ReferenceInlineProcessor
from markdown.treeprocessors import Treeprocessor
from markdown.util import INLINE_PLACEHOLDER_RE

if TYPE_CHECKING:
from markdown import Markdown

from mkdocs_autorefs.plugin import AutorefsPlugin

AUTO_REF_RE = re.compile(
r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
r'("?)(?P<identifier>[^"<>]*)\2>(?P<title>.*?)</span>',
Expand Down Expand Up @@ -197,13 +202,78 @@ def fix_refs(html: str, url_mapper: Callable[[str], str]) -> tuple[str, list[str
return html, unmapped


class AnchorScannerTreeProcessor(Treeprocessor):
"""Tree processor to scan and register HTML anchors."""

_htags: ClassVar[set[str]] = {"h1", "h2", "h3", "h4", "h5", "h6"}

def __init__(self, plugin: AutorefsPlugin, md: Markdown | None = None) -> None:
"""Initialize the tree processor.

Parameters:
plugin: A reference to the autorefs plugin, to use its `register_anchor` method.
"""
super().__init__(md)
self.plugin = plugin
self._slug = md.treeprocessors["toc"].slugify

def run(self, root: Element) -> None: # noqa: D102
if self.plugin.current_page is not None:
self._scan_anchors(root)

def _scan_anchors(self, parent: Element) -> list[str]:
ids = []
# We iterate on pairs of elements, to check if the next element is a heading (alias feature).
for el, next_el in zip_longest(parent, parent[1:], fillvalue=Element("/")):
if el.tag == "a":
# We found an anchor. Record its id if it has one.
if anchor_id := el.get("id"):
if el.tail and el.tail.strip():
# If the anchor has a non-whitespace-only tail, it's not an alias:
# register it immediately.
self.plugin.register_anchor(self.plugin.current_page, anchor_id) # type: ignore[arg-type]
else:
# Else record its id and continue.
ids.append(anchor_id)
elif el.tag == "p":
if ids := self._scan_anchors(el):
# Markdown anchors are always rendered as `a` tags within a `p` tag.
# Headings therefore appear after the `p` tag. Here the current element
# is a `p` tag and it contains at least one anchor with an id.
# We can check if the next element is a heading, and use its id as href.
href = (next_el.get("id") or self._slug(next_el.text or "")) if next_el.tag in self._htags else ""
for anchor_id in ids:
self.plugin.register_anchor(self.plugin.current_page, anchor_id, href) # type: ignore[arg-type]
ids.clear()
else:
# Recurse into sub-elements.
ids = self._scan_anchors(el)
return ids


class AutorefsExtension(Extension):
"""Extension that inserts auto-references in Markdown."""

def __init__(
self,
plugin: AutorefsPlugin | None = None,
**kwargs: Any,
) -> None:
"""Initialize the Markdown extension.

Parameters:
plugin: An optional reference to the autorefs plugin (to pass it to the anchor scanner tree processor).
**kwargs: Keyword arguments passed to the [base constructor][markdown.extensions.Extension].
"""
super().__init__(**kwargs)
self.plugin = plugin

def extendMarkdown(self, md: Markdown) -> None: # noqa: N802 (casing: parent method's name)
"""Register the extension.

Add an instance of our [`AutoRefInlineProcessor`][mkdocs_autorefs.references.AutoRefInlineProcessor] to the Markdown parser.
Also optionally add an instance of our [`AnchorScannerTreeProcessor`][mkdocs_autorefs.references.AnchorScannerTreeProcessor]
to the Markdown parser if a reference to the autorefs plugin was passed to this extension.

Arguments:
md: A `markdown.Markdown` instance.
Expand All @@ -213,3 +283,9 @@ def extendMarkdown(self, md: Markdown) -> None: # noqa: N802 (casing: parent me
"mkdocs-autorefs",
priority=168, # Right after markdown.inlinepatterns.ReferenceInlineProcessor
)
if self.plugin:
md.treeprocessors.register(
AnchorScannerTreeProcessor(self.plugin, md),
"mkdocs-autorefs-anchors-scanner",
priority=0,
)
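
The alias detection in `_scan_anchors` above can be tried without Markdown or MkDocs. The following is a simplified, standalone sketch rather than the PR's implementation: it writes into a plain dict instead of calling `register_anchor`, it assumes every heading already carries an `id` (so the slugify fallback is dropped), and the HTML is hand-written to approximate what Python-Markdown would emit.

```python
from itertools import zip_longest
from xml.etree.ElementTree import Element, fromstring

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def scan_anchors(parent: Element, url_map: dict, page: str = "") -> list:
    """Record anchor ids under `parent`; aliases point at the heading that follows."""
    pending = []
    # Pair each child with its next sibling; a dummy element pads the last pair.
    for el, next_el in zip_longest(parent, parent[1:], fillvalue=Element("/")):
        if el.tag == "a":
            if anchor_id := el.get("id"):
                if el.tail and el.tail.strip():
                    # Text follows the anchor: a plain anchor, registered as-is.
                    url_map[anchor_id] = f"{page}#{anchor_id}"
                else:
                    # Possibly an alias: decide once the enclosing `p` is closed.
                    pending.append(anchor_id)
        elif el.tag == "p":
            if pending := scan_anchors(el, url_map, page):
                # Use the heading id if a heading follows, else the anchor's own id.
                target = next_el.get("id", "") if next_el.tag in HEADINGS else ""
                for anchor_id in pending:
                    url_map[anchor_id] = f"{page}#{target or anchor_id}"
                pending.clear()
        else:
            pending = scan_anchors(el, url_map, page)
    return pending


# Hand-written HTML, wrapped in a `div` so it parses as a single XML root.
html = (
    "<div>"
    '<p><a href="" id="foo"></a></p>'
    '<h2 id="heading-foo">Heading foo</h2>'
    '<p><a href="" id="bar"></a>\nParagraph 2.</p>'
    '<p><a href="" id="alias1"></a>\n<a href="" id="alias2"></a></p>'
    '<h2 id="heading-bar">Heading bar</h2>'
    "</div>"
)
url_map = {}
scan_anchors(fromstring(html), url_map)
print(url_map)
```

The anchors directly above a heading resolve to that heading's id, and the anchor followed by paragraph text keeps its own id, which matches the expectations in `test_register_markdown_anchors`.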
38 changes: 38 additions & 0 deletions tests/test_references.py
Expand Up @@ -2,9 +2,12 @@

from __future__ import annotations

from textwrap import dedent

import markdown
import pytest

from mkdocs_autorefs.plugin import AutorefsPlugin
from mkdocs_autorefs.references import AutorefsExtension, fix_refs, relative_url


Expand Down Expand Up @@ -224,3 +227,38 @@ def test_external_references() -> None:
output, unmapped = fix_refs(source, url_map.__getitem__)
assert output == '<a class="autorefs autorefs-external" href="https://example.com">example</a>'
assert unmapped == []


def test_register_markdown_anchors() -> None:
"""Check that Markdown anchors are registered when enabled."""
plugin = AutorefsPlugin()
md = markdown.Markdown(extensions=["attr_list", "toc", AutorefsExtension(plugin)])
plugin.current_page = ""
md.convert(
dedent(
"""
[](){#foo}
## Heading foo

Paragraph 1.

[](){#bar}
Paragraph 2.

[](){#alias1}
[](){#alias2}
## Heading bar

[](){#alias3}
Text.
[](){#alias4}
## Heading baz
""",
),
)
assert plugin._url_map["foo"] == "#heading-foo"
assert plugin._url_map["bar"] == "#bar"
assert plugin._url_map["alias1"] == "#heading-bar"
assert plugin._url_map["alias2"] == "#heading-bar"
assert plugin._url_map["alias3"] == "#alias3"
assert plugin._url_map["alias4"] == "#heading-baz"
Member

Would be nice to have an official specification of what is expected to happen in case of conflicts, with the most extreme case being this one:

[](){#foo}
## Bar

[](){#bar}
## Foo

Member Author

@pawamoy pawamoy Feb 27, 2024

With

[](){#foo}
## Bar
...
[](){#bar}
## Foo
...

[Link to foo][foo]
[Link to bar][bar]

[Image: foobar]

Erm sorry about the highlighted rectangle on the left, but you get the idea.

Clicking on the "foo" link will bring you to the "Bar" heading, and vice versa.
So, even though the ids of the headings themselves have been suffixed, what the user specified has been achieved 🤷 (the aliases work)
