typographer extension should ignore code element content #308

Closed
jmooring opened this issue May 20, 2022 · 9 comments

Comments

@jmooring
Contributor

main.go
package main

import (
	"bytes"
	"fmt"

	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/extension"
	"github.com/yuin/goldmark/renderer/html"
)

func main() {
	md := goldmark.New(
		goldmark.WithExtensions(
			extension.Typographer,
		),
		goldmark.WithRendererOptions(
			html.WithUnsafe(),
		),
	)

	input := `<code>"foo"</code>`

	var buf bytes.Buffer
	if err := md.Convert([]byte(input), &buf); err != nil {
		panic(err)
	}

	fmt.Println(buf.String())

}

Desired output:

<p><code>"foo"</code></p>

Actual output:

<p><code>&ldquo;foo&rdquo;</code></p>
@yuin
Owner

yuin commented May 21, 2022

This works as expected.

The CommonMark spec does not define special handling for specific inline HTML tags. See the Raw HTML section of the CommonMark spec.

The CommonMark dingus renders the content between <code> tags as Markdown:

<code>**bold**</code>
<p><code><strong>bold</strong></code></p>

yuin closed this as completed May 21, 2022
@jmooring
Contributor Author

Please re-open.

This extension substitutes punctuations with typographic entities like smartypants.

The smartypants documentation states:

SmartyPants does not modify characters within <pre>, <code>, <kbd>, or <script> tag blocks. Typically, these tags are used to display text where smart quotes and other “smart punctuation” would not be appropriate, such as source code or example markup.

Goldmark currently handles <pre> and <script> correctly, and should handle <code> and <kbd> in the same way.
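For reference, the SmartyPants behavior quoted above can be sketched in a few lines of Go. This is a hypothetical, simplified converter (`smartQuotes` and `tagName` are illustrative names, not goldmark or SmartyPants APIs): it handles only double quotes, alternates opening and closing quotes naively, and treats anything from `<` to the next `>` as a tag. The key idea is to track whether the scanner is inside one of the protected elements and leave text there untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// protected lists the elements whose content SmartyPants leaves alone,
// per the documentation quoted above.
var protected = map[string]bool{"pre": true, "code": true, "kbd": true, "script": true}

// tagName returns the lower-cased element name of a tag such as "<code>",
// "</code>" or "<code class=x>", and whether it is a closing tag.
func tagName(tag string) (name string, closing bool) {
	t := strings.TrimPrefix(tag, "<")
	if strings.HasPrefix(t, "/") {
		closing = true
		t = t[1:]
	}
	if end := strings.IndexAny(t, " \t\n/>"); end >= 0 {
		t = t[:end]
	}
	return strings.ToLower(t), closing
}

// smartQuotes converts straight double quotes to &ldquo;/&rdquo; in text,
// except inside protected elements. Tags are copied verbatim.
func smartQuotes(html string) string {
	var out strings.Builder
	// depth is the nesting depth of protected elements.
	depth := 0
	// open reports whether the next converted quote is an opening one.
	open := true
	for i := 0; i < len(html); {
		if html[i] == '<' {
			end := strings.IndexByte(html[i:], '>')
			if end < 0 {
				// Unterminated tag: copy the rest verbatim.
				out.WriteString(html[i:])
				break
			}
			tag := html[i : i+end+1]
			if name, closing := tagName(tag); protected[name] {
				if closing && depth > 0 {
					depth--
				} else if !closing {
					depth++
				}
			}
			out.WriteString(tag)
			i += end + 1
			continue
		}
		if c := html[i]; c == '"' && depth == 0 {
			if open {
				out.WriteString("&ldquo;")
			} else {
				out.WriteString("&rdquo;")
			}
			open = !open
		} else {
			out.WriteByte(c)
		}
		i++
	}
	return out.String()
}

func main() {
	// prints: <p>&ldquo;hi&rdquo; <code>"foo"</code></p>
	fmt.Println(smartQuotes(`<p>"hi" <code>"foo"</code></p>`))
}
```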

cc: @kaushalmodi

@yuin
Owner

yuin commented May 22, 2022

'Like smartypants' does not mean that the Typographer extension is smartypants.

<pre> and <script> are HTML blocks, not Raw HTML.

Thus

Goldmark currently handles <pre> and <script> correctly, and should handle <code> and <kbd> in the same way.

is not correct.
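The distinction drawn here comes from the spec's HTML block start conditions. A line that begins (after at most three spaces of indentation) with `<pre`, `<script`, `<style` or `<textarea` (case-insensitive), followed by a space, a tab, `>` or the end of the line, opens a "type 1" HTML block whose content is passed through verbatim, so no inline parsing or typographic substitution happens inside it. `<code>"foo"</code>` matches no block start condition, so it becomes a paragraph containing inline Raw HTML, and the text between the tags is parsed like any other inline content. The helper below is a hypothetical sketch of the type 1 check only (`startsType1HTMLBlock` is an illustrative name); `<textarea>` appears only in recent spec versions.

```go
package main

import (
	"fmt"
	"strings"
)

// type1Tags are the element names that open a CommonMark "type 1" HTML
// block. Its lines are emitted verbatim until an end tag is found, so
// inline parsing never runs on them.
var type1Tags = []string{"pre", "script", "style", "textarea"}

// startsType1HTMLBlock reports whether line satisfies the type 1 start
// condition: after at most three spaces of indentation, "<" plus one of
// type1Tags (case-insensitive), followed by a space, a tab, ">", or the
// end of the line.
func startsType1HTMLBlock(line string) bool {
	trimmed := strings.TrimLeft(line, " ")
	if len(line)-len(trimmed) > 3 || !strings.HasPrefix(trimmed, "<") {
		return false
	}
	rest := trimmed[1:]
	for _, tag := range type1Tags {
		if len(rest) < len(tag) || !strings.EqualFold(rest[:len(tag)], tag) {
			continue
		}
		after := rest[len(tag):]
		if after == "" || after[0] == ' ' || after[0] == '\t' || after[0] == '>' {
			return true
		}
	}
	return false
}

func main() {
	for _, line := range []string{
		`<pre>"foo"</pre>`,
		`<script>`,
		`<code>"foo"</code>`,
		`<preview>`,
	} {
		fmt.Printf("%-22q type 1 HTML block: %v\n", line, startsType1HTMLBlock(line))
	}
}
```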

@kaushalmodi

@yuin Can you please separate the CommonMark spec from the smart quotes feature? That feature is not part of the CommonMark spec anyway.

If a user writes <code>"something"</code>, they want that element's content shown verbatim. (Other HTML elements in this family are pre, kbd, samp, and var.)

While I agree that the CommonMark spec allows Markdown to be rendered inside code elements, the conversion of straight quotes to curved quotes is not governed by that spec. So I hope the smart-quote conversion follows the practical purpose of code and related HTML elements.

@yuin
Owner

yuin commented May 22, 2022

@kaushalmodi

No.

The original SmartyPants is defined for HTML, but the Typographer extension is defined for CommonMark. It is unnatural for a CommonMark parser to give specific inline HTML elements special treatment.

Other implementations behave the same way with inline HTML.

Example: markdown-it

Could you tell me which CommonMark implementations behave the way you want?

@kaushalmodi

Could you tell me which CommonMark implementations behave the way you want?

I see this as an oversight in the CommonMark spec. I have put more details in commonmark/commonmark-spec#711.

I don't understand why the CommonMark spec would allow polluting content that the user explicitly marked as "don't touch"! The <code> element is meant precisely for that.


This issue came from a practical problem I noticed here:

[Screenshot: Nim code on the rendered blog post, with straight quotes converted to curly quotes]

If a reader of my blog copied that rendered snippet with the curved quotes, they would get a Nim compilation error!

I had expected smart quoting to leave the <code> block alone, but because it didn't, my code block is now invalid.


While I hope this gets properly fixed in CommonMark, it would be great if smart quoting could at least skip code-related HTML elements like <code>.
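Until the spec or goldmark changes, one possible stopgap is to post-process the rendered HTML. The Go sketch below is a hypothetical helper, not part of goldmark: it uses the standard library's `regexp` package to find `<code>` elements and maps the quote entities back to ASCII quotes. Assumptions: the typographer emits `&ldquo;`/`&rdquo;`/`&lsquo;`/`&rsquo;` (as in the actual output reported at the top of this issue), `<code>` elements are not nested, and nobody intentionally wrote those entities inside `<code>`, because they would be reverted too.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// codeElem matches a <code>...</code> element in rendered HTML, with or
// without attributes. The lazy .*? keeps adjacent elements separate.
var codeElem = regexp.MustCompile(`(?is)<code\b[^>]*>.*?</code>`)

// straightQuotes maps the quote entities a typographer emits back to
// ASCII quotes. Other substitutions (dashes, ellipses, guillemets) are
// deliberately left alone in this sketch.
var straightQuotes = strings.NewReplacer(
	"&ldquo;", `"`,
	"&rdquo;", `"`,
	"&lsquo;", "'",
	"&rsquo;", "'",
)

// restoreQuotesInCode undoes smart-quote substitution inside <code>
// elements only; text outside them is returned unchanged.
func restoreQuotesInCode(html string) string {
	return codeElem.ReplaceAllStringFunc(html, straightQuotes.Replace)
}

func main() {
	rendered := `<p>&ldquo;hi&rdquo; <code>&ldquo;foo&rdquo;</code></p>`
	// prints: <p>&ldquo;hi&rdquo; <code>"foo"</code></p>
	fmt.Println(restoreQuotesInCode(rendered))
}
```

Because fenced code blocks escape `&` as `&amp;`, an entity typed literally inside a fenced block renders as `&amp;ldquo;` and is not matched by the replacer.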

@yuin
Owner

yuin commented May 22, 2022

Good action 👍

goldmark is a CommonMark parser. If commonmark/commonmark-spec#711 is accepted, I will implement it as defined.

@kaushalmodi

@yuin I would appreciate your opinion on that issue, please.

Putting the current version of the spec aside, does my explanation above make sense to you?

@yuin
Owner

yuin commented May 22, 2022

Putting the current version of the spec aside, does my explanation #308 (comment) make sense to you?

I'm neutral about this. From the point of view of users of CommonMark-based products, it makes sense. But from the point of view of a CommonMark library author, it is not a straightforward solution to your requirements.

A better solution would be for the spec to define attributes for code spans (backticks), like `my code`{class=foo}.
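To illustrate the alternative proposed here, the sketch below renders single-backtick code spans followed by an optional `{key=value}` block into `<code>` elements. The syntax is hypothetical (it is not part of CommonMark, and `renderCodeSpans` is an illustrative name, not a goldmark API); the sketch ignores multi-backtick spans and CommonMark's space-stripping rules. Because the span content is only HTML-escaped, straight quotes inside it stay straight.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// spanWithAttrs matches a single-backtick code span optionally followed by
// a {key=value ...} block. This attribute syntax is hypothetical.
var spanWithAttrs = regexp.MustCompile("`([^`]*)`(?:\\{([^}]*)\\})?")

// renderCodeSpans replaces each matching span in text with a <code>
// element, copying the attributes onto it. Span content is escaped but
// otherwise emitted verbatim, so no typographic substitution applies.
func renderCodeSpans(text string) string {
	return spanWithAttrs.ReplaceAllStringFunc(text, func(m string) string {
		sub := spanWithAttrs.FindStringSubmatch(m)
		var attrs strings.Builder
		for _, field := range strings.Fields(sub[2]) {
			key, val, ok := strings.Cut(field, "=")
			if !ok {
				// Ignore malformed attributes in this sketch.
				continue
			}
			fmt.Fprintf(&attrs, " %s=\"%s\"", html.EscapeString(key), html.EscapeString(val))
		}
		return "<code" + attrs.String() + ">" + html.EscapeString(sub[1]) + "</code>"
	})
}

func main() {
	fmt.Println(renderCodeSpans("Run `nim c \"app.nim\"`{class=language-shell} to build."))
}
```

With the input from the comment above, the function returns <code class="foo">my code</code>.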
