use multibyte.h macros for CHAR_T #16

Open
wants to merge 1 commit into
base: main

Conversation

yamt (Contributor) commented Mar 14, 2014

this fixes O commands with autoindent and J command at least.

lichray (Owner) commented Mar 14, 2014

CHAR_T literals should be wrapped with the L() macro, yes.

Locale-sensitive upper/lower case sometimes makes sense.

The uses of isspace, isdigit, and isblank are intended to be locale insensitive. Otherwise, full-width spaces could be used as vi command separators, or full-width digits (or even CJK numerals) could be used in hex numbers, etc. I spent a lot of time evaluating them case by case. I suggest you keep locale sensitivity in mind and review those changes again.

yamt (Contributor, Author) commented Mar 14, 2014

i'm not sure what you mean. isspace etc is locale sensitive.

lichray (Owner) commented Mar 14, 2014

That's true... Here are the details:

I assumed that I could use the narrow char type functions on wide chars
and get correct answers for the unsigned char range. Now I know that's
wrong, but it has worked fine so far.

I then tested the effects of the locale on the narrow char type functions
on FreeBSD. Where the effects do not satisfy my needs, I apply isascii
before using those functions. This might not be cross-platform, but if
it also works on other BSDs, then that's fine.

So if you see narrow char type functions applied to wide chars in nvi2
code, they are intentional. If you can provide counterexamples showing how
they break, for example, ex scripts becoming locale sensitive when
historically they were not, or wide char type recognition completely
failing on some BSDs, I'll look at them again.

I would suggest splitting this patch into two: one for L literals, one
for char type functions.
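
For illustration, the guarded pattern described above might look like the following sketch. The helper name `ascii_guarded_isblank` is hypothetical and is not taken from the nvi2 source. An explicit 0..0x7f range check stands in for isascii() here, so that the narrow function never receives a value outside the unsigned char range. It also assumes that wide values in that range correspond to ASCII characters, which is the very assumption under discussion.

```c
#include <ctype.h>
#include <wchar.h>

/*
 * Hypothetical helper, not from nvi2: treat a wide character as blank
 * only when it lies in the ASCII range. Assumes wide values 0..0x7f
 * correspond to ASCII characters on the target platform.
 */
static int
ascii_guarded_isblank(wchar_t ch)
{
	/* Range check first, so isblank() only sees 0..0x7f. */
	if (ch < 0 || ch > 0x7f)
		return (0);
	return (isblank((unsigned char)ch) != 0);
}
```

With this guard a full-width space such as U+3000 is rejected before any locale-dependent lookup takes place.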

yamt (Contributor, Author) commented Mar 14, 2014

it might happen to work for you, but not for me.
using isascii() on wchar_t has the same problem.
please use iswXXX(), or wctob().

at least O commands with autoindent and J command was broken for fileencoding=iso-2022-jp
on NetBSD. at least for these cases, iswblank() is appropriate.

using a full-width space as a command splitter might be a little icky but not broken as the current code.

i don't bother to separate patch because L() part is not important at all.
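
As a rough sketch of the wctob() route suggested above (the helper name `single_byte_isblank` is hypothetical, not part of the patch): wctob() returns EOF when a wide character has no single-byte representation in the current locale, so the narrow function is only consulted for characters that really are single bytes. Note that the result is still locale sensitive for single-byte locales.

```c
#include <ctype.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical helper: classify a wide character with a narrow ctype
 * function only if it maps to a single byte in the current locale.
 */
static int
single_byte_isblank(wchar_t ch)
{
	int b = wctob((wint_t)ch);

	/* EOF means there is no single-byte form, so it cannot match. */
	if (b == EOF)
		return (0);
	return (isblank(b) != 0);
}
```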

lichray (Owner) commented Mar 14, 2014

On Fri, Mar 14, 2014 at 12:47 PM, YAMAMOTO Takashi wrote:

> it might happen to work for you, but not for me.
> using isascii() on wchar_t has the same problem.
> please use iswXXX(), or wctob().
>
> at least O commands with autoindent and J command was broken for
> fileencoding=iso-2022-jp
> on NetBSD. at least for these cases, iswblank() is appropriate.

Can you show me the steps to reproduce it? And your locale settings. I hope
I can also reproduce it on FreeBSD. If not, I'll take this seriously anyway.

iswblank() must not be used alone for the J command. You definitely don't
want to join a full-width space into a narrow space :)

> using a full-width space as a command splitter might be a little icky but
> not broken as the current code.

That's not acceptable. We need to fix both.

Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.


4BSD -- http://4bsd.biz/

yamt (Contributor, Author) commented Mar 15, 2014

LANG=ja_JP.eucJP
unset LC_xxx

vi
:set fileencoding=iso-2022-jp
:set ai
i今日[ESC]O

and

vi
ihoge[ESC]o今日[ESC]kJ

lichray (Owner) commented Mar 31, 2015

@yamt I looked at this patch again and noticed that none of the calls you asked to change were prefixed by isascii, so even by my own theory the status quo is problematic. Now I need some help:

  1. Can you check NetBSD's libc source code and see whether isascii and iswascii are the same? (In short, I expect them to be the same on all ASCII-based systems.)
  2. Do tolower, isdigit, and isblank produce the same results as their wide variants for wint_t values within (0, 127)?
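
One way to answer the second question empirically is a small check like the sketch below. The function name `count_ctype_mismatches` is hypothetical. It assumes that the wide value c denotes the same character as the narrow value c, and it only reports on whichever locale is active when it is called.

```c
#include <ctype.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical check, not part of nvi2: count the values in 0..127 for
 * which a narrow classification function disagrees with its wide
 * variant in the current locale.
 */
static int
count_ctype_mismatches(void)
{
	int c, mismatches = 0;

	for (c = 0; c <= 127; c++) {
		wint_t wc = (wint_t)c;

		if ((wint_t)tolower(c) != towlower(wc))
			mismatches++;
		/* Normalize to 0/1: only zero vs. nonzero is specified. */
		if (!!isdigit(c) != !!iswdigit(wc))
			mismatches++;
		if (!!isblank(c) != !!iswblank(wc))
			mismatches++;
	}
	return (mismatches);
}
```

Calling setlocale(LC_ALL, "") from <locale.h> before the check makes it exercise the user's locale rather than the default "C" locale.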

@lichray lichray force-pushed the master branch 2 times, most recently from 2c1d2dc to 9f2cc1e Compare April 3, 2015 07:23
@lichray lichray force-pushed the master branch 2 times, most recently from 834f889 to 4ee3903 Compare December 29, 2015 19:09
lichray (Owner) commented Dec 29, 2015

@yamt Can you give this branch a test? Thanks. https://github.com/lichray/nvi2/tree/narrow-wctype
