Skip to content

Commit

Permalink
Fix ${v//“”/str} and multibyte in ${v//~{E}/str} (re: f00b21a)
Browse files Browse the repository at this point in the history
Koichi Nakashima (@ko1nksm) reports:
> $ ksh -c 'v=あいう; echo "${v//""/-}"'
> -あ-�-�-い-�-�-う-�-�-
>
> $ ksh -c 'v=あいう; echo "${v//""/-}"' | od -tx1
> 0000000 2d e3 81 82 2d 81 2d 82 2d e3 81 84 2d 81 2d 84
> 0000020 2d e3 81 86 2d 81 2d 86 2d 0a
> 0000032

There are two problems here. First is that the glob pattern in the
expansion is empty (after removing the double quotes). Since an
empty glob pattern matches nothing, the output string should simply
be the value of v. However, since the referenced commit, the empty
pattern is changed to ~(E)^, an (anchored) empty ERE. Unlike an
empty glob pattern, an empty regular expression matches everything.
So a '-' should be inserted at the start, end, and between each
character. Which brings us to problem two: lack of multibyte
processing in that scenario, causing corrupted output.

As this commit, things work correctly:

  $ ksh -c 'v=あいう; echo "${v//""/-}"'
  あいう
  $ ksh -c 'v=あいう; echo "${v//~(E)/-}"'
  -あ-い-う-

src/cmd/ksh93/sh/macro.c: varsub():
- Only convert an empty pattern to ~(E)^ if the operator character
  is '#'. This should not be done for '/' or '//'. This fixes
  the first problem.
- The case of ${v//~(E)/str} brings us to some code that special-
  cases the behaviour for nmatch && match[1]==0 to avoid an
  infinite loop. Make that code multibyte-aware by calculating the
  size of each character in bytes using mbsize, and advancing by
  that number instead of 1. (In the case of an invalid multibyte
  sequence with mbsize returning -1, default to advancing by one
  byte as before.) This fixes the second problem.

Resolves: #813
  • Loading branch information
McDutchie committed Jan 7, 2025
1 parent 13d4423 commit 2582449
Show file tree
Hide file tree
Showing 4 changed files with 34 additions and 4 deletions.
10 changes: 10 additions & 0 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,16 @@ This documents significant changes in the 1.0 branch of ksh 93u+m.
For full details, see the git log at: https://github.com/ksh93/ksh/tree/1.0
Uppercase BUG_* IDs are shell bug IDs as used by the Modernish shell library.

2025-01-07:

- Fixed the expansion ${variable//""/string}; since the shell pattern is
empty, this should output the value of the variable unchanged, and now
once again does. Bug introduced on 2023-05-18.

- Fixed multibyte processing in the expansion ${variable//~(E)/string},
i.e., the pattern is an empty Extended Regular Expression (ERE). This
now inserts string between each character instead of each byte.

2025-01-05:

- Fixed a crash that could occur if a discipline function was first assigned
Expand Down
2 changes: 1 addition & 1 deletion src/cmd/ksh93/include/version.h
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@
#include <ast_release.h>
#include "git.h"

#define SH_RELEASE_DATE "2025-01-05" /* must be in this format for $((.sh.version)) */
#define SH_RELEASE_DATE "2025-01-07" /* must be in this format for $((.sh.version)) */
/*
* This comment keeps SH_RELEASE_DATE a few lines away from SH_RELEASE_SVER to avoid
* merge conflicts when cherry-picking dev branch commits onto a release branch.
Expand Down
10 changes: 7 additions & 3 deletions src/cmd/ksh93/sh/macro.c
Original file line number Diff line number Diff line change
Expand Up @@ -1916,7 +1916,7 @@ static int varsub(Mac_t *mp)
flag & STR_MAXIMAL);
else
nmatch = strngrpmatch(v, vsize,
*pattern ? pattern : "~(E)^",
*pattern ? pattern : (c=='#' ? "~(E)^" : pattern),
(ssize_t*)match,
elementsof(match) / 2,
flag | STR_INT);
Expand All @@ -1943,9 +1943,13 @@ static int varsub(Mac_t *mp)
/* avoid infinite loop */
if(nmatch && match[1]==0)
{
int sz;
nmatch = 0;
mac_copy(mp,v,1);
v++;
/* copy, and advance v by, one character */
if ((sz = mbsize(v)) < 1)
sz = 1;
mac_copy(mp, v, sz);
v += sz;
}
tsize -= v-oldv;
continue;
Expand Down
16 changes: 16 additions & 0 deletions src/cmd/ksh93/tests/locale.sh
Original file line number Diff line number Diff line change
Expand Up @@ -466,6 +466,22 @@ then unset s "${!LC_@}"
exp=$'typeset -L 12 s=コーンシェル\n6' # each double-width character counts for two terminal positions
[[ $got == "$exp" ]] || err_exit "default terminal width for typeset -L incorrect" \
"(expected $(printf %q "$exp"); got $(printf %q "$got"))"
unset s
fi
# ======
# https://github.com/ksh93/ksh/issues/813
if ((SHOPT_MULTIBYTE))
then LANG=C.UTF-8
s='あいう'
got=${s//""/-}
exp=$s
[[ $got == "$exp" ]] || err_exit '${parameter//“”/string}' \
"(expected $(printf %q "$exp"); got $(printf %q "$got"))"
got=${s//~(E)^/-}
exp='-あ-い-う-'
[[ $got == "$exp" ]] || err_exit '${parameter//~(E)^/string}' \
"(expected $(printf %q "$exp"); got $(printf %q "$got"))"
fi
# ======
Expand Down

0 comments on commit 2582449

Please sign in to comment.