Add U+00A0 (NO-BREAK SPACE) as whitespace #366

Closed
wdhwg001 opened this issue Sep 9, 2015 · 3 comments

Comments

@wdhwg001

wdhwg001 commented Sep 9, 2015

Both ES5 and jQuery treat \xA0 as whitespace in String.prototype.trim()

@jgm
Member

jgm commented Sep 9, 2015

It's not clear that this is desirable.

Currently all the characters that relate to block structure in Markdown are ASCII; adding U+00A0 as whitespace would break this. That's a move away from simplicity, and if we allow U+00A0, what about all the other Unicode space characters (half-width space, zero-width space, etc.)?

The structural-markers-are-ASCII invariant also helps in writing parsers that process text as streams of bytes (e.g. in C).
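To illustrate the byte-stream point: in UTF-8, an ASCII space is the single byte 0x20, while U+00A0 is the two-byte sequence 0xC2 0xA0. The following is a minimal Python sketch of a byte-at-a-time indentation scan. The helper name `leading_ascii_spaces` is hypothetical and is not taken from any CommonMark implementation.

```python
def leading_ascii_spaces(line: bytes) -> int:
    """Count leading ASCII spaces by inspecting one byte at a time."""
    count = 0
    for byte in line:
        # Assumption of this sketch: only the ASCII space byte counts.
        if byte != 0x20:
            break
        count += 1
    return count


ascii_indented = "    code".encode("utf-8")
nbsp_indented = "\u00a0\u00a0\u00a0\u00a0code".encode("utf-8")

print(leading_ascii_spaces(ascii_indented))  # 4
print(leading_ascii_spaces(nbsp_indented))   # 0
print("\u00a0".encode("utf-8"))              # b'\xc2\xa0'
```

A scanner like this never has to decode multi-byte sequences. Treating U+00A0 as structural whitespace would require it to recognize the 0xC2 0xA0 pair as well.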

Finally, allowing nonbreaking space NOT to count as whitespace also opens up some expressive possibilities -- e.g. if you want some emphasized text that begins and ends with a space, you can do it using nonbreaking spaces, but you wouldn't be able to if we made the proposed change.


@lancedolan

lancedolan commented Mar 12, 2019

I offer the following as a deeply respectful argument for treating Unicode whitespace as whitespace. Sometimes criticisms can be taken a bit personally, and I want to be clear that I'm criticizing a decision and not any person.

The decision not to treat Unicode whitespace characters as whitespace has exactly the merits @jgm has given, but those merits don't seem to hold up against the "spirit" of Markdown in the context of the internet, where Unicode whitespace shows up fairly often and the intuition of developers and users alike is that Unicode whitespace is, in fact, whitespace.

With the current spec, we have a type of whitespace that, purely by an unexpected convention in the specification, is not considered whitespace and so triggers alternative parsing. You would only know that if you looked at that exact segment of the spec, which nearly no Markdown users will, whereas so much else in the spec can be gleaned without checking it because it is intuitive. Users skim through a "kitchen sink" and are ready to write some Markdown in seconds. Imagine a section of the "kitchen sink" demonstrating that different types of whitespace produce different parsings. The user would be spooked.

Rather than a desirable opportunity to be more expressive, this appears to me to be a counterintuitive secret handshake, a hack akin to the many anti-patterns we've all recognized and recommended against in a codebase.

Perhaps there's some other reason that I'm not familiar with, e.g. that Unicode can't be generally supported?

It cost me a solid day of development time to track down the culprit today, resulting in this issue report, and I'm really kind of shocked to learn that it's working as intended. When users of my system type in some Markdown, I need to convert Unicode spaces to U+0020 in order to give them the experience they, and I, expected. I'm only one voice and my experience will probably be brushed aside, but it seems obvious to me that others must be experiencing issues because of this in various places (I see @maximbaz hit an issue and @ScottAbbey hit an issue as well).
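For anyone needing the same workaround, here is a rough sketch of that conversion in Python. The function name `normalize_spaces` is made up for illustration, and the choice to map exactly the Unicode "Zs" (space separator) category to U+0020 is an assumption; other definitions of "Unicode space" are possible.

```python
import unicodedata


def normalize_spaces(text: str) -> str:
    """Replace every Unicode space separator (category Zs) with U+0020."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )


# A list marker followed by a no-break space becomes a marker plus ASCII space.
print(normalize_spaces("-\u00a0item") == "- item")  # True
```

Tabs and newlines are left untouched because they are control characters rather than space separators. Zero-width space (U+200B) is also left alone, since it is a format character and not in Zs.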

Thanks for your consideration and feel free to set me straight if my understanding is off.

:)

Update: I noticed that I misread the bit about "structural-markers-are-ASCII", and believe I understand now. I think what's going on is that technical limitations are controlling the spec, rather than the user desires completely determining the spec and thus the technical solution. I consider this to be a case of implementation guiding the spec rather than spec guiding the implementation, but we've all been there and I understand. Fair enough.

@digitalmoksha

I think there is something to be said for keeping it simple with regard to structural whitespace. Considering that there are at least 20 Unicode whitespace characters (at least according to http://jkorpela.fi/chars/spaces.html) and that ASCII whitespace is available everywhere, I personally consider it reasonable.
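As a rough way to see how many candidates there are, this Python sketch lists every code point in the Unicode "Zs" (space separator) category known to the running interpreter. The variable name `space_separators` is just illustrative. The result depends on the Unicode version bundled with Python, and it may differ from the linked page if that page uses a broader definition of "space".

```python
import sys
import unicodedata

# Collect (code point label, character name) for every space separator.
space_separators = [
    (f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

for code, name in space_separators:
    print(code, name)
```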
