From dfd3ca97e7e4cc289852e382da39daa67aa5d938 Mon Sep 17 00:00:00 2001
From: Martijn Dekker
Date: Mon, 30 Dec 2024 22:26:38 +0100
Subject: [PATCH] Fix KEYBD trap for multibyte characters (re: 4886463b)

The KEYBD trap should now be fully functional for UTF-8 and other
multibyte locales. Thanks to Johnothan King for finding this fix!

Analysis: The KEYBD trap code processes character code points stored
in e_lbuf by ed_read(). But shell variables store bytes, not
characters. So, in UTF-8 locales for example, the Unicode code points
need to be converted to multibyte UTF-8 encoding. This is needed to
calculate the length of each encoded character in bytes (which fixes
the corruption issue) and for keytrap() to store its UTF-8
representation in ${.sh.edchar}.

src/cmd/ksh93/edit/edit.c: ed_getchar():
- Remove the workaround from the referenced commit.
- Use mbconv to convert input codepoints to bytes before adding them
  to inbuff, a char array that is passed on to keytrap().

Related: https://bugzilla.redhat.com/show_bug.cgi?id=1503922
Related: https://github.com/att/ast/issues/197
Related: https://github.com/ksh93/ksh/issues/307
Resolves: https://github.com/ksh93/ksh/issues/460
Co-authored-by: Johnothan King
---
 NEWS                            | 6 ++++++
 src/cmd/ksh93/edit/edit.c       | 9 +++------
 src/cmd/ksh93/include/version.h | 2 +-
 src/cmd/ksh93/sh.1              | 4 ----
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index 7049189654d1..ddeb8537b680 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ This documents significant changes in the 1.0 branch of ksh 93u+m.
 For full details, see the git log at: https://github.com/ksh93/ksh/tree/1.0
 Uppercase BUG_* IDs are shell bug IDs as used by the Modernish shell library.
 
+2024-12-30:
+
+- The KEYBD trap should now be fully functional for multibyte characters
+  (for example, non-Latin characters in UTF-8 locales). This fixes a bug
+  inherited from AT&T and worked around on 2022-02-12.
+
 2024-12-25:
 
 - The dirname path-bound built-in now accepts multiple operands.
diff --git a/src/cmd/ksh93/edit/edit.c b/src/cmd/ksh93/edit/edit.c
index 0c11991bac85..9458f53fd170 100644
--- a/src/cmd/ksh93/edit/edit.c
+++ b/src/cmd/ksh93/edit/edit.c
@@ -836,14 +836,11 @@ int ed_getchar(Edit_t *ep,int mode)
 			{
 				if(mode<=0 && -c == ep->e_intr)
 					killpg(getpgrp(),SIGINT);
-				if(mode<=0 && sh.st.trap[SH_KEYTRAP]
-				/* workaround for :
-				 * do not trigger KEYBD for non-ASCII in multibyte locale */
-				&& (!mbwide() || c > -128))
+				if(mode<=0 && sh.st.trap[SH_KEYTRAP])
 				{
 					ep->e_keytrap = 1;
-					n=1;
-					if((readin[0]= -c) == ESC)
+					n = mbconv(readin, -c);
+					if(n==1 && readin[0]==ESC)
 					{
 						while(1)
 						{
diff --git a/src/cmd/ksh93/include/version.h b/src/cmd/ksh93/include/version.h
index 01d22e693560..d7e2cbbb7d7c 100644
--- a/src/cmd/ksh93/include/version.h
+++ b/src/cmd/ksh93/include/version.h
@@ -18,7 +18,7 @@
 #include
 #include "git.h"
 
-#define SH_RELEASE_DATE	"2024-12-25"	/* must be in this format for $((.sh.version)) */
+#define SH_RELEASE_DATE	"2024-12-30"	/* must be in this format for $((.sh.version)) */
 /*
  * This comment keeps SH_RELEASE_DATE a few lines away from SH_RELEASE_SVER to avoid
  * merge conflicts when cherry-picking dev branch commits onto a release branch.
diff --git a/src/cmd/ksh93/sh.1 b/src/cmd/ksh93/sh.1
index e6622ad0cfd2..41fea2bd176e 100644
--- a/src/cmd/ksh93/sh.1
+++ b/src/cmd/ksh93/sh.1
@@ -9513,10 +9513,6 @@ Thus, a trap on
 .B CHLD
 won't be executed until the foreground job terminates.
 .PP
-In locales that use a multibyte character set such as UTF-8, the
-.B KEYBD
-trap is only triggered for ASCII characters (1-127).
-.PP
 It is a good idea to leave a space after the comma operator in
 arithmetic expressions to prevent the comma from being interpreted
 as the decimal point character in certain locales.
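
Illustration (not part of the patch): the commit message says that code
points must be converted to their multibyte encoding so that the byte
length of each character is known before the bytes are handed to
keytrap(). The sketch below shows that conversion for the UTF-8 case
only. The function utf8_encode() is a hypothetical stand-in written for
this note, not the mbconv used by ksh, which presumably follows the
current locale rather than hard-coding UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration only (not ksh's mbconv):
 * encode one Unicode code point as UTF-8 into out[], which must have
 * room for at least 4 bytes. Returns the number of bytes written, or 0
 * if the value is not an encodable Unicode scalar value.
 */
size_t utf8_encode(unsigned char *out, unsigned long cp)
{
	if (cp < 0x80)
	{
		/* ASCII, including ESC (0x1B): always a single byte */
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800)
	{
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000)
	{
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* surrogates cannot be encoded */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF)
	{
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;	/* beyond the Unicode range */
}
```

With a conversion of this kind, an escape character is the one-byte
sequence 0x1B, which is consistent with the patched code testing
n==1 && readin[0]==ESC instead of comparing the raw code point.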