Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port C (metainformation) commands to the rzshell #1752

Merged
merged 21 commits into from
Oct 15, 2021
Merged

Conversation

XVilka
Copy link
Member

@XVilka XVilka commented Sep 26, 2021

SQUASH ME

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've added tests that prove my fix is effective or that my feature works (if possible)
  • I've updated the documentation and the rizin book with the relevant information (if needed)

Detailed description

  • Port C (meta information, comments) commands to the newshell
  • Moved all subcommands of Cvr, Cvs, Cvb to the higher level as subcommands of Cv:
    - Cv [name] [text] to append the comment of any kind of variable
    - Cv- [name] to remove the comment of any kind of variable
    - Cve [name] to open cfg.editor to edit the comment of any kind of variable
  • Cvr, Cvs, and Cvb just list all comments of the variables/arguments of the corresponding kind
  • Cd previously allowed the second parameter (repeat) to be passed in two different ways: Cd 4[8] and Cd 4 8, where 4 is the size and 8 is the repeat. I removed the Cd 4[8] and left only Cd 4 8 for simplicity and consistency.
  • Csj, Cs*, Csl became Cslj, Csl*, Csll since pure Cs should not list strings and only add it, Csl was made to list strings in various modes.
  • Cd, Cf, CC, CS now require their arguments, previously called without arguments they were listing corresponding meta information.
  • The listing feature moved to Cdl, Cfl, CCl, CSl, etc, similar to Csl.
  • Cz command as an alias to Csa command was completely removed
  • CC, (filelink) command was renamed to CCF.
  • CCa (append command) was removed in favor of plain CC
  • Csg command to guess encoding was removed, Cs now tries to guess the encoding by default
  • Cs.. was renamed to Cs.l - as in LONG output type of the Cs. command (there is also JSON output with Cs.j)
  • Added new commands Csw and CsW to add UTF-16 and UTF-32 strings correspondingly

Test plan

CI is green

Closes #792
Partially addresses #1342

args: []
- name: CCu
cname: comment_unique
summary: Add unique comment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a unique comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ret2libc other commands, e.g. CC and CCa append a new comment to all comments existed before. This one overwrites any existing comment and ensures there is only one. A bit weird, I know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I guessed that but I suggest to edit a bit the summary/description of this command because it is quite hard to understand.

@XVilka

This comment has been minimized.

@ret2libc

This comment has been minimized.

@wargio

This comment has been minimized.

@XVilka
Copy link
Member Author

XVilka commented Oct 11, 2021

Ok, the only remaining issue now is the escaping backslashes. When strings found during the binary scanning in librz/bin/bfile.c they are already escaped the special characters, e.g. <Tab> character becomes \t, etc. The problem is, that if the string contained the \ (backslash) character, it doesn't escape it. Thus, if we escape it during the printing, these \t get second escape e.g. \\t.

[XX] db/cmd/metadata Cs8
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc 'e str.escbslash=true
s 0x004021ff
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs8
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
' bins/elf/strenc
-- stdout
--- expected
+++ actual
@@ -1,9 +1,9 @@
-Cs8 61 @ 0x004021ff # utf8> \\u00a2\\u20ac\\U00010348 in yellow:\e[33m \xc2\xa2\xe2\x82\xac\xf0\x90\x8d\x88 \e[0m\n
-"utf8> \\u00a2\\u20ac\\U00010348 in yellow:\e[33m \xc2\xa2\xe2\x82\xac\xf0\x90\x8d\x88 \e[0m\n"
-utf8[61] "utf8> \\u00a2\\u20ac\\U00010348 in yellow:\e[33m \xc2\xa2\xe2\x82\xac\xf0\x90\x8d\x88 \e[0m\n"
-0x004021ff utf8[61] "utf8> \\u00a2\\u20ac\\U00010348 in yellow:\e[33m \xc2\xa2\xe2\x82\xac\xf0\x90\x8d\x88 \e[0m\n"
+Cs8 61 @ 0x004021ff # utf8> \\\\u00a2\\\\u20ac\\\\U00010348 in yellow:\\e[33m \u00a2\u20ac\U00010348 \\e[0m\\n
+"utf8> \\\\u00a2\\\\u20ac\\\\U00010348 in yellow:\\e[33m \u00a2\u20ac\U00010348 \\e[0m\\n"
+utf8[61] "utf8> \\\\u00a2\\\\u20ac\\\\U00010348 in yellow:\\e[33m \u00a2\u20ac\U00010348 \\e[0m\\n"
+0x004021ff utf8[61] "utf8> \\\\u00a2\\\\u20ac\\\\U00010348 in yellow:\\e[33m \u00a2\u20ac\U00010348 \\e[0m\\n"
             ;-- str.utf8____u00a2__u20ac__U00010348_in_yellow:_e_33m____________e_0m:
-            0x004021ff     .string "utf8> \\u00a2\\u20ac\\U00010348 in yellow:\e[33m \xc2\xa2\xe2\x82\xac\xf0\x90\x8d\x88 \e[0m\n" ; len=61
+            0x004021ff     .string "utf8> \\\\u00a2\\\\u20ac\\\\U00010348 in yellow:\\e[33m \u00a2\u20ac\U00010348 \\e[0m\\n" ; len=61
 Cs8 61 @ 0x004021ff # utf8> \\u00a2\\u20ac\\U00010348 in yellow:\x1b[33m \u00a2\u20ac\U00010348 \x1b[0m\n
 "utf8> \\u00a2\\u20ac\\U00010348 in yellow:\x1b[33m \u00a2\u20ac\U00010348 \x1b[0m\n"
 utf8[61] "utf8> \\u00a2\\u20ac\\U00010348 in yellow:\x1b[33m \u00a2\u20ac\U00010348 \x1b[0m\n"

[XX] db/cmd/metadata Csa, Cs. and Cs.l
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc 'e str.escbslash=true
s 0x140016018
Csa
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs-
Csw
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs-
Csa 4
Cs.l
Cs.l @ 0x14001601c  # should print nothing
Csa 4
Cs.l
Cs.l @ 0x14001601c  # should print nothing
' bins/pe/testapp-msvc64.exe
-- stdout
--- expected
+++ actual
@@ -4,10 +4,11 @@
 0x140016018 ascii[2] "\t"
             ;-- str.wide__esc:__e_0m:
             0x140016018     .string "\t" ; len=2
-Csw 19 @ 0x140016018 # \twide\\esc: \x1b[0m\xa1\r\n
-"\twide\\esc: \x1b[0m\xa1\r\n"
-ut16le[15] "\twide\\esc: \x1b[0m\xa1\r\n"
+Csw 31 @ 0x140016018 # \\twide\\\\esc: \\e[0m
+"\\twide\\\\esc: \\e[0m"
+utf16le[31] "\\twide\\\\esc: \\e[0m"
+0x140016018 utf16le[31] "\\twide\\\\esc: \\e[0m"
             ;-- str.wide__esc:__e_0m:
-            0x140016018     .string "\twide\\esc: \x1b[0m\xa1\r\n" ; len=19
+            0x140016018     .string "\\twide\\\\esc: \\e[0m" ; len=31
 0x140016018 ascii[4] "\t"
 0x140016018 ascii[4] "\t"

@ret2libc

This comment has been minimized.

@ret2libc

This comment has been minimized.

@ret2libc

This comment has been minimized.

@ret2libc

This comment has been minimized.

Copy link
Member

@ret2libc ret2libc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of missing errors check in the command handlers, but those things seem to be missing at the API level so that's alright for now.

Everything, from the code to the help structure look much better organized and simple to understand. There may be bugs in the rewrite, but... we'll catch them I guess :) Nice job!

Just fix those few remaining things, but overall LGTM.

@ret2libc
Copy link
Member

Ok, the only remaining issue now is the escaping backslashes. When strings found during the binary scanning in librz/bin/bfile.c they are already escaped the special characters, e.g. <Tab> character becomes \t, etc. The problem is, that if the string contained the \ (backslash) character, it doesn't escape it. Thus, if we escape it during the printing, these \t get second escape e.g. \\t.

Why do we need to escape it when finding them? I think only the printing should escape/sanitize strings, not the search. One could as well define a string including \x03\x02\x01\x04 and it's probably better to store those strings just that way, instead of `"\x03\x02\x01\x04"

@XVilka
Copy link
Member Author

XVilka commented Oct 11, 2021

Why do we need to escape it when finding them?

I don't know, this is how it was implemented (it's outside of the RzAnalysis).

@ret2libc

This comment has been minimized.

@ret2libc

This comment has been minimized.

@ret2libc

This comment has been minimized.

@ret2libc
Copy link
Member

[XX] db/cmd/metadata Csa, Cs. and Cs.l
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc 'e str.escbslash=true
s 0x140016018
Csa
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs-
Csw
Csl*~`spad`
Cs.q
Cs.
Cs.l
pd 1
Cs-
Csa 4
Cs.l
Cs.l @ 0x14001601c  # should print nothing
Csa 4
Cs.l
Cs.l @ 0x14001601c  # should print nothing
' bins/pe/testapp-msvc64.exe
-- stdout
--- expected
+++ actual
@@ -4,10 +4,11 @@
 0x140016018 ascii[2] "\t"
             ;-- str.wide_esc:___0m:
             0x140016018     .string "\t" ; len=2
-Csw 19 @ 0x140016018 # \twide\\esc: \e[0m\xa1\r\n
-"\twide\\esc: \e[0m\xa1\r\n"
-ut16le[15] "\twide\\esc: \e[0m\xa1\r\n"
+Csw 31 @ 0x140016018 # \twide\\esc: \e[0m
+"\twide\\esc: \e[0m"
+utf16le[31] "\twide\\esc: \e[0m"
+0x140016018 utf16le[31] "\twide\\esc: \e[0m"
             ;-- str.wide_esc:___0m:
-            0x140016018     .string "\twide\\esc: \e[0m\xa1\r\n" ; len=19
+            0x140016018     .string "\twide\\esc: \e[0m" ; len=31
 0x140016018 ascii[4] "\t"
 0x140016018 ascii[4] "\t"

This needs more investigation. there are missing chars in the string (\xa1\r\n) and also the length is different. I guess the old code was using the length of the utf8 string while the new one use the number of raw bytes.

@XVilka
Copy link
Member Author

XVilka commented Oct 14, 2021

@kazarmy ignore that please, it was invalid assumption

@kazarmy
Copy link
Member

kazarmy commented Oct 14, 2021 via email

@XVilka
Copy link
Member Author

XVilka commented Oct 14, 2021

Marked the remaining test as BROKEN and added an issue #1836

Time to merge once green (or fix while it's building, if you are brave enough).

@XVilka
Copy link
Member Author

XVilka commented Oct 14, 2021

There is one test broken only on Travis PPC64LE and ARM:

[XX] db/formats/smd smd strings
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc '. scripts/smd_strings.rz
e bin.str.filter=U
izz
' malloc://256k
-- stdout
--- expected
+++ actual
@@ -96,4 +96,6 @@
 283 0x0001acde 0x0001acde 14  15           ascii DEAQDEATDEATDE
 325 0x0001c542 0x0001c542 14  15           ascii DEAQDEATDEATDE
 369 0x0001d8aa 0x0001d8aa 12  13           ascii DDEQTDDEUUUU
+372 0x0001dd9e 0x0001dd9e 11  30           utf8  QAဘႈ膈膈膈膈膈膈ႁ blocks=Basic Latin,Myanmar,CJK Unified Ideographs
+379 0x0001edc6 0x0001edc6 11  30           utf8  QAဘႈ膈膈膈膈膈膈ႁ blocks=Basic Latin,Myanmar,CJK Unified Ideographs
 394 0x0002036c 0x0002036c 63  64           ascii 3331333333333333DDDADDDADDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDADDDADDD

@XVilka

This comment has been minimized.

@XVilka
Copy link
Member Author

XVilka commented Oct 15, 2021

MacOS worker failed with unrelated error (network problem):


[XX] db/io/http r2 http://example.org
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc 'r
b 64
ps
' http://example.org
-- stdout
--- expected
+++ actual
@@ -1,6 +1,0 @@
-1256
-<!doctype html>
-<html>
-<head>
-    <title>Example Domain</title>
-

-- stderr
rz_socket_connect: Error in getaddrinfo: nodename nor servname provided, or not known (example.org:80)
Cannot connect to example.org:80
No HTTP response
[r] Cannot open 'http://example.org'

-- exit status: 1

@XVilka XVilka merged commit 39613b5 into dev Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert C (cmd_meta.c) commands to rzshell
4 participants