Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to micromark #5330

Merged
merged 26 commits into from
Oct 25, 2024
Merged

Move to micromark #5330

merged 26 commits into from
Oct 25, 2024

Conversation

compulim
Copy link
Contributor

@compulim compulim commented Oct 24, 2024

Fixes #4844.

Changelog Entry

Breaking changes

  • Moved to micromark for rendering Markdown, instead of markdown-it
    • Please refer to PR #5330 for details

Changed

  • Moved to micromark for rendering Markdown, instead of markdown-it, in PR #5330, by @compulim

Description

Moving from markdown-it to micromark so we can support MathML and TeX (via micromark-extension-math and katex) in short future.

We are still using markdown-it for internal localization and will evaluate if we should move to ICU/MessageFormat or micromark.

Design

One-pager

  • No smart quotes or "typographer" from markdown-it
    • Previously, ' would become ´ in some cases, " would become and , and ... would become
    • Smart quotes may not localize well
  • Enabled GitHub Flavored Markdown (GFM)
    • Auto-link is more conservative
      • Will turn www.bing.com and https://bing.com into links
      • Will not turn bing.com
    • Tag filter will filter out dangerous tags
      • Today, we removed dangerous tag during post-Markdown sanitization
      • GFM tag filter will turn <iframe> into &lt;iframe>
      • Tomorrow, we will keep dangerous tag as plain text
  • No aria-label attribute via {aria-label=""}
    • Use <a aria-label=""> directly
  • Small differences when HTML-in-Markdown is turned off
    • Text that resembles XML tags may not be wrapped in <p>

Corner cases

Oneliner: micromark with "allow dangerous HTML" requires a separate sanitization process to make sure HTML code rendered are safe to use.

Markdown [Link](javascript:alert(1))
CommonMark <a href="javascript:alert(1)">Link</a>
markdown-it [Link](javascript:alert(1)) (as text)
micromark <a href="">Link</a> (without allowDangerousProtocol)
micromark with GFM <a href="">Link</a> (without allowDangerousProtocol)

Note: we will enable allowDangerousProtocol, see below.

Markdown <iframe>ABC</iframe>
CommonMark <iframe>ABC</iframe>
markdown-it <iframe>ABC</iframe>
micromark <iframe>ABC</iframe>
micromark with GFM (tagfilter) &lt;iframe>ABC&lt;/iframe>
Markdown <script>alert(0)</script>
CommonMark <script>alert(0)</script>
markdown-it <script>alert(0)</script>
micromark <script>alert(0)</script>
micromark with GFM (tagfilter) &lt;script>alert(0)&lt;/script>
Markdown <img onerror="javascript:alert(0)" />
CommonMark <img onerror="javascript:alert(0)" />
markdown-it <img onerror="javascript:alert(0)" />
micromark <img onerror="javascript:alert(0)" />
micromark with GFM <img onerror="javascript:alert(0)" />

The content of <iframe> tag is treated similar to CDATA. When using markdown-it, <iframe>abc<b>def</b>xyz</iframe> will become <iframe>abc&lt;b&gt;def&lt;/b&gt;xyz</iframe>, where <b> become &lt;b&gt;.

This caused slight deviation when using GFM tagfilter extension where <iframe> will turn into &lt;iframe>. The content of <iframe> would not processed as CDATA but as-is.

Thus, using micromark-extension-gfm, the <iframe>abc<b>def</b>xyz</iframe> would become &lt;iframe>abc<b>def</b>xyz&lt;/iframe> and "def" will be bold.

Allowing dangerous protocol

micromark default set of URL protocol is limited to a handful.

URLs that have no protocol (which means it’s relative to the current page, such as ./some/page.html) and URLs that have a safe protocol (for images: http, https; for links: http, https, irc, ircs, mailto, xmpp), are safe. All other URLs are dangerous and dropped.

As we are always do HTML sanitization, with micromark, we should set allowDangerousProtocol: true and let all protocol to fellthrough. So we only need to configure one rule for sanitization. Also, the protocol selection for links in micromark is limited and it don't allow URL handlers and links like cite:1.

Later on down the pipe, the final step is sanitizer and it will remove "javascript:" properly.

Furthermore in our better link mod, all disallowed schemes will be turned into <button value="javascript:">, which is mostly harmless.

To enable sanitizing "javascript:" in links, we are skipping better link mod for "javascript:" protocol, so it continue to be <a href="javascript:"> and will be stripped by the final sanitizer.

Injecting aria-label

As we introduced HTML-in-Markdown support recently, we encourage web devs to use HTML instead of markdown-it-attrs.

Engine Markdown
markdown-it [Link](https://bing.com/){aria-label="Bing homepage"}
micromark <a aria-label="Bing homepage" href="https://bing.com/">Link</a>

Auto-link

markdown-it turn www.bing.com, bing.com, and https://bing.com into a link.

micromark-extension-gfm-autolink-literal will ignore bing.com. Instead of bing.com, bot author should use https://bing.com instead.

Smart quote

"Typographer" plugin, or "smart quote" was not introduced in micromark as it may cause issues with localization. For example, ' would become ´ in some cases, " would become and , and ... would become .

Math/TeX support

TODO: We are evaluating micromark-extension-math, which use KaTeX. However, it is a huge dependency as it also include fonts, we are looking for ways to adopt it.

Disabling HTML

Note: CommonMark spec did not specify how HTML code should look when HTML is disabled.

Given <abc> as input Markdown text.

markdown-it micromark
HTML enabled <abc> <abc>
HTML disabled <p>&lt;abc&gt;</p> &lt;abc&gt;

Given <abc>123</abc> as input Markdown text.

markdown-it micromark
HTML enabled <abc>123</abc> <abc>123</abc>
HTML disabled <p>&lt;abc&gt;123&lt;/abc&gt;</p> <p>&lt;abc&gt;123&lt;/abc&gt;</p>

Given <a href="https://bing.com">.

  • markdown-it: &lt;a href=&quot;<a href="https://bing.com">https://bing.com</a>&quot;&gt;
  • micromark: &lt;a href="https://bing.com"&gt;Bing.com&lt;/a&gt;
    • Visualized: <a href="https://bing.com">
    • Did not auto-link https://bing.com

Specific Changes

  • I have added tests and executed them locally
  • I have updated CHANGELOG.md
  • I have updated documentation

Review Checklist

This section is for contributors to review your work.

  • Accessibility reviewed (tab order, content readability, alt text, color contrast)
  • Browser and platform compatibilities reviewed
  • CSS styles reviewed (minimal rules, no z-index)
  • Documents reviewed (docs, samples, live demo)
  • Internationalization reviewed (strings, unit formatting)
  • package.json and package-lock.json reviewed
  • Security reviewed (no data URIs, check for nonce leak)
  • Tests reviewed (coverage, legitimacy)

@compulim compulim changed the title [WIP] Move to micromark Move to micromark Oct 24, 2024
@compulim compulim marked this pull request as ready for review October 24, 2024 23:54
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any difference, I assume our snapshot matcher knows better

},
logLevel: 'info',
// Generate stats.json.
metafile: true,
// Minified file is smaller and load faster from GitHub Codespaces.
minify: true,
outdir: resolveFromProjectRoot('./dist'),
outdir: resolveFromProjectRoot('./dist/dev'),
plugins: [isomorphicReactResolvePlugin],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Collaborator

@OEvgeny OEvgeny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we now ship both:

  • internal markdown-it, non configurable from the outside, used in component
  • micromark, configurable from the outside, used in bundle

@@ -145,7 +140,14 @@ export default function render(
return decoration;
};

const htmlAfterMarkdown = ariaLabelPost(new MarkdownIt(MARKDOWN_IT_INIT).use(ariaLabel).render(markdown));
const htmlAfterMarkdown = micromark(markdown, {
allowDangerousHtml: markdownRenderHTML ?? true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it is true by default as we still have sanitize after it.

@OEvgeny OEvgeny merged commit 8356902 into microsoft:main Oct 25, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants