Leading and trailing non-`whitespace` `Unicode whitespace` is stripped from paragraphs #132

ScottAbbey · 2017-12-05T07:00:27Z

The CommonMark dingus implementation currently strips Unicode whitespace characters that are not whitespace characters from the start and end of paragraphs. The CommonMark specification and the cmark implementation seem to indicate that these characters should not be stripped.

Example link

Each "space" character in this example is U+1680, OGHAM SPACE MARK, chosen from the list of Unicode whitespace characters that are not in the list of whitespace characters.
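For readers unfamiliar with the two definitions, here is a minimal Python sketch of the distinction as I understand it from CommonMark 0.28. The names `SPEC_WHITESPACE` and `is_unicode_whitespace` are made up for illustration and do not come from any implementation.

```python
import unicodedata

# "Whitespace character" as I read CommonMark 0.28: space, tab, newline,
# line tabulation, form feed, carriage return. (Illustrative name.)
SPEC_WHITESPACE = frozenset(" \t\n\x0b\x0c\r")


def is_unicode_whitespace(ch: str) -> bool:
    """'Unicode whitespace character' as I read CommonMark 0.28:
    any code point in general category Zs, or tab, CR, LF, or form feed."""
    return unicodedata.category(ch) == "Zs" or ch in "\t\r\n\x0c"


OGHAM_SPACE_MARK = "\u1680"

# U+1680 is Unicode whitespace, but it is not a spec "whitespace character".
in_spec_set = OGHAM_SPACE_MARK in SPEC_WHITESPACE
in_unicode_set = is_unicode_whitespace(OGHAM_SPACE_MARK)
print(in_spec_set, in_unicode_set)
```

The characters affected by this issue are exactly those for which the second check passes and the first does not.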

Note that the leading and trailing U+1680 characters have been trimmed from the final result.

From CommonMark 0.28, section 4.8:

> The paragraph’s raw content is formed by concatenating the lines and removing initial and final [whitespace].

By comparison, in cmark 0.28.3:

Input:

 o o 
 o o

Output:

<p> o o 
 o o </p>
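A rough sketch of how I read the quoted sentence from section 4.8, written in Python rather than in the code of either implementation. The helper `paragraph_raw_content` and the sample input are assumptions made for illustration; the sketch ignores per-line indentation handling and only shows the final trim.

```python
# Spec "whitespace characters" only (my reading of CommonMark 0.28).
SPEC_WHITESPACE = " \t\n\x0b\x0c\r"


def paragraph_raw_content(lines):
    """Concatenate the lines, then remove initial and final spec whitespace.

    Hypothetical helper: it strips only the characters in SPEC_WHITESPACE,
    so other Unicode whitespace such as U+1680 is left in place.
    """
    return "\n".join(lines).strip(SPEC_WHITESPACE)


# Sample paragraph where every "space" is U+1680 (OGHAM SPACE MARK).
lines = ["\u1680o\u1680o\u1680", "\u1680o\u1680o\u1680"]

kept = paragraph_raw_content(lines)

# For contrast: a Unicode-aware trim. Python's argument-less str.strip()
# treats U+1680 as whitespace and removes it from both ends.
trimmed = "\n".join(lines).strip()

print(repr(kept))
print(repr(trimmed))
```

Under these assumptions `kept` still begins and ends with U+1680, which matches the cmark output above, while `trimmed` loses the outer characters in the way described for the dingus. Whether the dingus actually uses a Unicode-aware trim is a guess on my part.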

Ref: commonmark/commonmark-spec#465


maximbaz mentioned this issue Jan 20, 2018

Remove support for seeing non-ASCII whitespace as such remarkjs/remark#321

Closed

lancedolan mentioned this issue Mar 12, 2019

Add U+00A0 (NO-BREAK SPACE) as whitespace commonmark/commonmark-spec#366

Closed

ChristianMurphy mentioned this issue Sep 26, 2021

micromark preserves control characters where commonmark does not micromark/micromark#91

Closed

