Fix UTF-16LE string detection when contains `\xa1` character #1836

XVilka · 2021-10-14T14:31:35Z

[XX] db/cmd/metadata Csa, Cs. and Cs.l
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc 'e str.escbslash=true
s 0x140016018
Csa
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs-
Csw
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs-
Csa 4
Cs.l
Cs.l @ 0x14001601c  # should print nothing
Csa 4
Cs.l
Cs.l @ 0x14001601c  # should print nothing
' bins/pe/testapp-msvc64.exe
-- stdout
--- expected
+++ actual
@@ -4,10 +4,11 @@
 0x140016018 ascii[2] "\t"
             ;-- str.wide_esc:___0m:
             0x140016018     .string "\t" ; len=2
-Csw 19 @ 0x140016018 # \twide\\esc: \e[0m\xa1\r\n
-"\twide\\esc: \e[0m\xa1\r\n"
-utf16le[15] "\twide\\esc: \e[0m\xa1\r\n"
+Csw 31 @ 0x140016018 # \twide\\esc: \e[0m
+"\twide\\esc: \e[0m"
+utf16le[31] "\twide\\esc: \e[0m"
+0x140016018 utf16le[31] "\twide\\esc: \e[0m"
             ;-- str.wide_esc:___0m:
-            0x140016018     .string "\twide\\esc: \e[0m\xa1\r\n" ; len=19
+            0x140016018     .string "\twide\\esc: \e[0m" ; len=31
 0x140016018 ascii[4] "\t"
 0x140016018 ascii[4] "\t"

See the test called "Csa, Cs. and Cs.l" in `test/db/cmd/metadata`.

Regression was introduced in 39613b5 (#1752)

Fix is likely required in these files:

librz/util/str_search.c
librz/util/utf8.c
librz/util/str.c


XVilka · 2021-10-15T08:49:49Z

@borzacchiello since you worked successfully on the string search algorithm, could you please take a look?

borzacchiello · 2021-10-15T20:14:40Z

> @borzacchiello since you worked successfully on the string search algorithm, could you please take a look?

yep, I will look into it

borzacchiello · 2021-10-15T21:05:41Z

The string at 0x140016018 is misinterpreted due to the false-positive filter (https://github.com/rizinorg/rizin/blob/dev/librz/util/str_search.c#L69).

The filter checks whether:
n_ascii_chars < n_chars / n_utf_blocks
In the case of this string, there are 2 UTF blocks (due to \xa1), and all characters but one are ASCII, so the check fails and the string is processed again accepting only ASCII characters (thus cutting the string off at \xa1).
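
The check described above can be reproduced with a small sketch. This is not rizin's code: `unicode_block` and `passes_fp_filter` are hypothetical names, the block lookup only knows the two blocks this string touches, and integer division is assumed in order to mirror C arithmetic.

```python
def unicode_block(cp: int) -> str:
    """Coarse block lookup; only the blocks needed for this example."""
    if cp <= 0x7F:
        return "Basic Latin"
    if cp <= 0xFF:
        return "Latin-1 Supplement"
    return "other"


def passes_fp_filter(s: str) -> bool:
    """Hypothetical model of: n_ascii_chars < n_chars / n_utf_blocks."""
    if not s:
        return True  # nothing to judge; assumption made for this sketch
    n_chars = len(s)
    n_ascii_chars = sum(1 for ch in s if ord(ch) <= 0x7F)
    n_utf_blocks = len({unicode_block(ord(ch)) for ch in s})
    # Integer division is an assumption, chosen to mirror C.
    return n_ascii_chars < n_chars // n_utf_blocks


# The string from the test, as it appears in the expected output.
wide = "\twide\\esc: \x1b[0m\xa1\r\n"
print(len(wide), passes_fp_filter(wide))  # 18 False
```

Here 17 of the 18 characters are ASCII and there are 2 blocks, so the comparison is `17 < 9`, which is false. Under this model, any string that is mostly ASCII with a single character from another block is sent back for ASCII-only processing.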

I tried old commits (even older than the UTF refactoring), and it seems that the string was already misinterpreted.

To fix this, I think we should modify the false-positive filter.

XVilka · 2021-10-16T03:36:36Z

Yes, the test used the Cs commands, which didn't share code with izz (the code you refactored into librz/util/str_search.c), so it had no such checks or FP filters. But I checked the string and it's absolutely valid, so this is a bug in the FP filter.
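
As a quick sanity check of the validity claim, the sketch below round-trips the string through UTF-16LE and shows what remains when only the ASCII prefix is kept. It assumes the bytes at 0x140016018 match the expected test output; the literal is transcribed from the test, not read from the binary.

```python
# String transcribed from the expected test output (an assumption, see above).
wide = "\twide\\esc: \x1b[0m\xa1\r\n"

# Encode to UTF-16LE and decode again; a valid string round-trips unchanged.
raw = wide.encode("utf-16-le")
assert raw.decode("utf-16-le") == wide

# U+00A1 occupies one 16-bit code unit: low byte 0xA1, high byte 0x00.
assert raw[30:32] == b"\xa1\x00"

# Keeping only the leading ASCII run reproduces the truncated string
# seen in the "actual" test output.
prefix = wide[: wide.index("\xa1")]
print(repr(prefix), len(prefix))
```

The prefix has 15 characters and ends at `\e[0m`, which matches the shortened string in the failing output.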

ret2libc · 2022-02-10T10:47:26Z

I didn't see any work on this during this time, so I doubt it will be done for 0.4.0.
I'm removing the 0.4.0 milestone for now until someone starts actively looking at this.

cc @XVilka

XVilka mentioned this issue Oct 14, 2021

Port C (metainformation) commands to the rzshell #1752

Merged

4 tasks

ret2libc removed this from the 0.4.0 milestone Feb 10, 2022

borzacchiello mentioned this issue Jun 13, 2022

Better False-Positive Detection in rz_scan_strings #2691

Merged

5 tasks
