Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UTF-64LE string detection when contains \xa1 character #1836

Closed
XVilka opened this issue Oct 14, 2021 · 5 comments · Fixed by #2691
Closed

Fix UTF-64LE string detection when contains \xa1 character #1836

XVilka opened this issue Oct 14, 2021 · 5 comments · Fixed by #2691
Labels
bug Something isn't working help wanted Extra attention is needed high-priority regression RzUtil
Milestone

Comments

@XVilka
Copy link
Member

XVilka commented Oct 14, 2021

[XX] db/cmd/metadata Csa, Cs. and Cs.l
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc 'e str.escbslash=true
s 0x140016018
Csa
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs-
Csw
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs-
Csa 4
Cs.l
Cs.l @ 0x14001601c  # should print nothing
Csa 4
Cs.l
Cs.l @ 0x14001601c  # should print nothing
' bins/pe/testapp-msvc64.exe
-- stdout
--- expected
+++ actual
@@ -4,10 +4,11 @@
 0x140016018 ascii[2] "\t"
             ;-- str.wide_esc:___0m:
             0x140016018     .string "\t" ; len=2
-Csw 19 @ 0x140016018 # \twide\\esc: \e[0m\xa1\r\n
-"\twide\\esc: \e[0m\xa1\r\n"
-ut16le[15] "\twide\\esc: \e[0m\xa1\r\n"
+Csw 31 @ 0x140016018 # \twide\\esc: \e[0m
+"\twide\\esc: \e[0m"
+utf16le[31] "\twide\\esc: \e[0m"
+0x140016018 utf16le[31] "\twide\\esc: \e[0m"
             ;-- str.wide_esc:___0m:
-            0x140016018     .string "\twide\\esc: \e[0m\xa1\r\n" ; len=19
+            0x140016018     .string "\twide\\esc: \e[0m" ; len=31
 0x140016018 ascii[4] "\t"
 0x140016018 ascii[4] "\t"

See the test called Csa, Cs. and Cs.l in `test/db/cmd/metadata.

Regression was introduced in 39613b5 (#1752)

Fix is likely required in these files:

  • librz/util/str_search.c
  • libz/util/utf8.c
  • librz/util/str.c
@XVilka
Copy link
Member Author

XVilka commented Oct 15, 2021

@borzacchiello since you worked successfully on the string search algorithm, could you please take a look?

@borzacchiello
Copy link
Contributor

@borzacchiello since you worked successfully on the string search algorithm, could you please take a look?

yep, I will look into it

@borzacchiello
Copy link
Contributor

borzacchiello commented Oct 15, 2021

The string at 0x140016018 is misinterpreted due to the false-positive filter (https://github.com/rizinorg/rizin/blob/dev/librz/util/str_search.c#L69).

The filter checks whether:
n_ascii_chars < n_chars / n_utf_blocks
In the case of the string, there are 2 UTF blocks (due to \xa1), and all characters but one is ASCII, so it fails, and the string is processed again accepting only ASCII characters (thus cutting the string up to \xa1).

I tried old commits (even older than the UTF refactoring), and it seems that the string was already misinterpreted.

To fix this, I think we should modify the false-positive filter.

@XVilka
Copy link
Member Author

XVilka commented Oct 16, 2021

Yes, in the test it was using the Cs commands that didn't share the code with the izz (the code you refactored into librz/util/str_search.c), it had no such checks and FP filters. But I checked the string and it's absolutely valid, thus it's a bug in the FP filter.

@ret2libc
Copy link
Member

I didn't see any work on this during this time, so I doubt it will be done for 0.4.0.
I'm removing the 0.4.0 milestone for now until someone starts to actively looking at this.

cc @XVilka

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed high-priority regression RzUtil
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants