Add U+00A0 (NO-BREAK SPACE) as whitespace #366

wdhwg001 · 2015-09-09T11:20:29Z

Both ES5's String.prototype.trim() and jQuery's $.trim() treat \xA0 as whitespace.
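To illustrate the trim() behavior mentioned above, here is a minimal sketch, assuming an ES5-or-later runtime. The variable names are made up for the example.

```typescript
// U+00A0 (NO-BREAK SPACE) belongs to the WhiteSpace set that
// String.prototype.trim() strips in ES5 and later.
const nbspChar = "\u00A0";
const paddedText = nbspChar + "text" + nbspChar;

// trim() removes the no-break spaces on both ends.
const trimmedText = paddedText.trim();

// An ASCII-only check (space or tab) does not see U+00A0 as leading whitespace.
const startsWithAsciiSpace = /^[ \t]/.test(paddedText);
```

Here `trimmedText` is `"text"`, while `startsWithAsciiSpace` is `false`, which is the gap between the two definitions of whitespace that this issue is about.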


jgm · 2015-09-09T16:13:38Z

It's not clear that this is desirable.

Currently all the characters that relate to block structure in Markdown are ASCII; adding U+00A0 as whitespace would break this. That's a move away from simplicity, and if we allow U+00A0, what about all the other Unicode space characters (half-width space, zero-width space, etc.)?

The structural-markers-are-ASCII invariant also helps in writing parsers that process text as streams of bytes (e.g. in C).
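A rough sketch of why the invariant matters for byte-oriented parsers. The thread mentions C; this uses TypeScript over a `Uint8Array` instead, and the helper name `countLeadingAsciiWhitespace` is hypothetical, not taken from any real parser.

```typescript
// Hypothetical helper: counts leading whitespace bytes the way a
// byte-oriented scanner might, treating only ASCII space (0x20) and
// tab (0x09) as structural whitespace.
function countLeadingAsciiWhitespace(bytes: Uint8Array): number {
  let i = 0;
  while (i < bytes.length && (bytes[i] === 0x20 || bytes[i] === 0x09)) {
    i++;
  }
  return i;
}

// "  *" in UTF-8: two ASCII spaces, then an asterisk.
const asciiIndented = new Uint8Array([0x20, 0x20, 0x2a]);

// U+00A0 followed by "*" in UTF-8: the no-break space is the two-byte
// sequence 0xC2 0xA0, so no single-byte comparison can recognize it.
const nbspIndented = new Uint8Array([0xc2, 0xa0, 0x2a]);

const asciiCount = countLeadingAsciiWhitespace(asciiIndented); // 2
const nbspCount = countLeadingAsciiWhitespace(nbspIndented);   // 0
```

A scanner that also accepted U+00A0 would have to decode multi-byte sequences at every position where whitespace is structurally significant, instead of comparing one byte at a time.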

Finally, allowing nonbreaking space NOT to count as whitespace also opens up some expressive possibilities -- e.g. if you want some emphasized text that begins and ends with a space, you can do it using nonbreaking spaces, but you wouldn't be able to if we made the proposed change.


lancedolan · 2019-03-12T22:23:15Z

I offer the following as a deeply respectful argument for treating Unicode whitespace as whitespace. Sometimes criticisms can be taken a bit personally, and I want to be clear that I'm criticizing a decision and not any person.

The decision not to treat Unicode whitespace characters as whitespace has exactly the merits @jgm has given, but those merits don't seem to hold up against the "spirit" of Markdown in the context of the internet, where Unicode whitespace shows up fairly often and the intuition of developers and users alike is that Unicode whitespace is, in fact, whitespace.

With the current spec, we have a type of whitespace that, purely by an unexpected convention in the spec, is not considered whitespace, and it can be used to trigger alternative parsing. You would only know that if you read that exact section of the spec, which of course almost no Markdown users will, whereas so much else can be gleaned without checking the spec because it is intuitive. Users skim through a "kitchen sink" and are ready to write some Markdown in seconds. Imagine a section of the "kitchen sink" demonstrating that different types of whitespace can create alternative parsings. The user would be spooked.

Rather than a desirable opportunity to be more expressive, this appears to me to be a counterintuitive secret handshake, a hack akin to so many other anti-patterns we've all recognized and recommended against in a codebase.

Perhaps there's some other reason that I'm not familiar with, e.g. that Unicode can't be generally supported?

Tracking down the culprit cost me a solid day of development time today, resulting in this issue report, and I'm really kind of shocked to learn that it's working as intended. When users of my system type in some Markdown, I need to convert Unicode spaces to U+0020 in order to give them the experience they, and I, expected. I'm only one voice and my experience will probably be brushed aside, but it seems obvious to me that others must be running into issues because of this in various places (I see @maximbaz hit an issue and @ScottAbbey hit an issue as well).
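For anyone needing the same workaround, here is a minimal sketch of that kind of pre-processing step. The names `UNICODE_SPACES` and `normalizeSpaces` are hypothetical, and the character class is an assumption: it covers the Unicode space-separator (Zs) characters other than U+0020, which may be broader or narrower than what a given application wants.

```typescript
// Assumption: "Unicode spaces" means the Zs (space separator) characters
// other than U+0020: U+00A0, U+1680, U+2000..U+200A, U+202F, U+205F, U+3000.
// Zero-width characters such as U+200B are deliberately left alone.
const UNICODE_SPACES = /[\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/g;

// Hypothetical pre-processing step: replace each Unicode space with an
// ASCII space before handing the text to a Markdown parser.
function normalizeSpaces(input: string): string {
  return input.replace(UNICODE_SPACES, " ");
}

// A line that starts with a no-break space instead of an ASCII space.
const normalizedLine = normalizeSpaces("\u00A0---");
```

This is a blunt approach: it also rewrites no-break spaces that an author placed deliberately inside running text, so restricting the replacement to leading whitespace on each line may suit some applications better.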

Thanks for your consideration and feel free to set me straight if my understanding is off.

:)

Update: I noticed that I misread the bit about "structural-markers-are-ASCII", and believe I understand now. I think what's going on is that technical limitations are controlling the spec, rather than user needs determining the spec and, through it, the technical solution. I consider this to be a case of implementation guiding the spec rather than spec guiding the implementation, but we've all been there and I understand. Fair enough.

digitalmoksha · 2019-03-13T17:26:21Z

I think there is something to be said for keeping it simple with regard to structural whitespace. Considering there are at least 20 Unicode whitespace characters (at least according to this: http://jkorpela.fi/chars/spaces.html ) and an ASCII space is available everywhere, I personally consider it reasonable.

wdhwg001 closed this as completed Sep 15, 2015

maximbaz mentioned this issue Jan 20, 2018

Remove support for seeing non-ASCII whitespace as such remarkjs/remark#321

Closed

lancedolan mentioned this issue Mar 12, 2019

HR following whitespace is interpreted as empty H2 markdown-it/markdown-it#541

Closed
