Unsupported opcode: <INVALID> (bytecode=A6h) at position 36. #514

Open
Niocas opened this issue Sep 1, 2024 · 10 comments

@Niocas

Niocas commented Sep 1, 2024

Unsupported opcode: <INVALID> (bytecode=A6h) at position 36.

I am trying to decompile Python 3.12 .pyc files, but decompilation fails for nearly all of them at bytecode "A6h".
How can I fix that? I wrote a script in Python 3.12 that imports opcode and prints
all opcodes, but that list does not seem to be complete. What am I missing, and how can I fix
the decompiling process?

user@Windows-11-Pro:/mnt/c/Users/user/OneDrive/Desktop/oh-data/pycdc$ pycdc item_data_2.pyc
# Source Generated with Decompyle++
# File: item_data_2.pyc (Python 3.12)

Unsupported opcode: <INVALID> (bytecode=A6h) at position 36.
import bindict
# WARNING: Decompyle incomplete
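
On the opcode listing mentioned above: a minimal sketch of how to ask the running interpreter what it knows about a given opcode number. The public opcode.opmap only covers the documented instruction set; CPython keeps specialized instruction names separately, in underscore-prefixed attributes that are private and differ between versions (treating _specialized_instructions and _specialized_opmap as available is an assumption, hence the getattr fallbacks). The helper names describe_opcode and specialized_names are made up for this example, and whether this accounts for A6h in these particular files is not something the sketch can settle.

```python
import dis
import opcode


def describe_opcode(value):
    """Return (name, is_public) for a numeric opcode in the running interpreter."""
    # dis.opname holds a placeholder such as '<166>' for numbers that
    # have no public name in this interpreter version.
    name = dis.opname[value]
    return name, opcode.opmap.get(name) == value


def specialized_names():
    """Best-effort list of specialized instruction names (private attributes)."""
    # Assumption: these private attributes exist only in some versions.
    names = getattr(opcode, "_specialized_instructions", None)
    if names is None:
        names = getattr(opcode, "_specialized_opmap", {})
    return sorted(names)


if __name__ == "__main__":
    print(describe_opcode(0xA6))
    print(specialized_names())
```

Run it with the same Python version that produced the .pyc, since opcode numbers change between releases.
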
@Niocas
Author

Niocas commented Sep 1, 2024

item_data_pyc.zip

Here is one of the files I am trying to decompile.

@Niocas
Author

Niocas commented Sep 1, 2024

image

I have now added it to python_3_12.cpp, but the output looks like this when running "pycdc item_data_2.pyc".
Any ideas?

@Niocas
Author

Niocas commented Sep 1, 2024

pycdas item_data_2.pyc outputs the following:
image

@greenozon
Contributor

there is opcode 166 in your pyc, and it is not a legal one.
from cpython Include/opcode.h (Python 3.12):

image
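
For readers matching the two notations: pycdc reports the opcode in hex with an "h" suffix (A6h), while opcode.h lists decimal values (166). A small sketch that pulls both numbers out of the error line; the function name parse_pycdc_error and the regular expression are illustrative, written against the one message format shown in this issue.

```python
import re

# Matches e.g. "(bytecode=A6h) at position 36"
_ERROR_PATTERN = re.compile(r"bytecode=([0-9A-Fa-f]+)h\) at position (\d+)")


def parse_pycdc_error(line):
    """Return (opcode_as_int, position), or None if the line does not match."""
    match = _ERROR_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2))


if __name__ == "__main__":
    print(parse_pycdc_error(
        "Unsupported opcode: <INVALID> (bytecode=A6h) at position 36."
    ))
```
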

@wilson0x4d

wilson0x4d commented Sep 1, 2024

the direct answer here is: fixing <INVALID> from pycdc requires implementing a decompilation strategy in ASTree.cpp for the specific opcode/instruction, which is non-trivial. you can add the opcode to the case statement in ASTree.cpp just to get the tool to be quiet but it often results in incorrect/incomplete python output.

examples of opcodes blocking successful python code generation (from "OH" pycs) include:

  • END_FOR
  • JUMP_BACKWARD
  • JUMP_BACKWARD_NO_INTERRUPT
  • COPY
  • END_SEND
  • CALL_INTRINSIC_1
  • CLEANUP_THROW
  • DICT_MERGE
  • DICT_UPDATE
  • MAKE_CELL
  • RERAISE
  • SEND
  • UNPACK_SEQUENCE_LIST
  • UNPACK_SEQUENCE_TUPLE
  • UNPACK_SEQUENCE_TWO_TUPLE
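
To see which of the opcodes listed above a given piece of code actually uses, something like the following sketch can help. It relies on the standard dis module and walks nested code objects; the name blocking_opcodes is made up for this example. It only sees what the running interpreter's dis reports, so it is meant for code compiled by (or loadable in) that same Python version, not as a check against pycdc itself.

```python
import dis

# Names taken from the list above.
BLOCKING = {
    "END_FOR", "JUMP_BACKWARD", "JUMP_BACKWARD_NO_INTERRUPT", "COPY",
    "END_SEND", "CALL_INTRINSIC_1", "CLEANUP_THROW", "DICT_MERGE",
    "DICT_UPDATE", "MAKE_CELL", "RERAISE", "SEND",
    "UNPACK_SEQUENCE_LIST", "UNPACK_SEQUENCE_TUPLE",
    "UNPACK_SEQUENCE_TWO_TUPLE",
}


def blocking_opcodes(code):
    """Return the subset of BLOCKING that appears in code or its nested code objects."""
    found = set()
    pending = [code]
    while pending:
        current = pending.pop()
        for instruction in dis.get_instructions(current):
            if instruction.opname in BLOCKING:
                found.add(instruction.opname)
        # Functions, classes and comprehensions live in co_consts.
        pending.extend(c for c in current.co_consts if hasattr(c, "co_code"))
    return found


if __name__ == "__main__":
    sample = compile("for i in range(3):\n    pass\n", "<sample>", "exec")
    print(sorted(blocking_opcodes(sample)))
```
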

Since getting even an ultra-trivial PR merged proved impossible (#511 - nothing more than "testing the waters"), I forked and stopped trying to work with the pycdc devs; judging by the title, this message is coming from that fork. That means you are also going to battle the fact that the original repo doesn't have complete opcode maps (pycdas doesn't produce 100% correct results for 3.11 or 3.12), and you may be asking the devs to implement/investigate something they don't support yet in the main repo.

for example, according to pycdc main repo "166" is not a valid opcode, but we can see that it is "UNPACK_SEQUENCE_TUPLE" from cpython source code.

#define UNPACK_SEQUENCE_TUPLE                  166

you can see the response from @greenozon illustrating the problem you are going to face here.

i'm trying to be kind about this problem. the fact is we have binaries in the wild which contain opcodes which the pycdc project denies exist.

as for the code you're reversing: in most cases the modules containing bindict have no useful code. they contain a bindict and a call out to a native bindict module that i've not been able to locate (possibly it is packed inside the 50MB main exe; it doesn't exist anywhere in the pyc's). the bindict format is essentially a table similar to NXFNs along with a trailing binary blob (which is not consistent between bindicts, which means it must be contextual). to illustrate what i mean, consider this pycdas result from another bindict file:

        0       RESUME                          0
        2       LOAD_CONST                      0: 0
        4       LOAD_CONST                      1: None
        6       IMPORT_NAME                     0: bindict
        8       STORE_NAME                      0: bindict
        10      PUSH_NULL                       
        12      LOAD_NAME                       0: bindict
        14      LOAD_ATTR                       0: bindict
        34      LOAD_CONST                      2: b'\x01\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00abnormal_item_state\x0c\x00\x00\x00\x00\x01\x00\x00\x01\x96\x05\x02v\x01\x0b\x01\x0f\x17\xfd8\x18\x00\x00\x00\x89\xc0\x95\x12\t\x00'
        36      UNPACK_SEQUENCE_TUPLE           1
        40      CALL                            1
        50      STORE_NAME                      1: data
        52      LOAD_CONST                      1: None
        54      RETURN_VALUE                    

you can see this is basically just calling bindict.bindict(...) passing in the constant bytes/string shown in the disasm. this is basically the same in all files containing bindict data.
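
As a cross-check on a listing like the one above, Python's own marshal and dis modules can disassemble a .pyc directly. This is a minimal sketch under two assumptions: the file uses the 16-byte header of Python 3.7 and later, and the interpreter running the script is the same version that produced the file. Whether stock CPython can load the files discussed in this issue is not established here. The function names are made up for the example.

```python
import dis
import marshal

# Python 3.7+: 4-byte magic, 4-byte flags, then 8 bytes of mtime+size or a hash.
PYC_HEADER_SIZE = 16


def code_from_pyc_bytes(data):
    """Unmarshal the code object that follows the .pyc header."""
    return marshal.loads(data[PYC_HEADER_SIZE:])


def disassemble_pyc(path):
    """Print a disassembly of the module-level code in a .pyc file."""
    with open(path, "rb") as handle:
        code = code_from_pyc_bytes(handle.read())
    dis.dis(code)
    return code
```

Calling disassemble_pyc("item_data_2.pyc") would print the module-level instructions if the file unmarshals cleanly; a marshal error at this point is itself a hint that the file is not standard for that interpreter version.
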

the approximate py output from pycdc (if it were actually implemented rather than being denied) would look something like this:

# WIP opcode: UNPACK_SEQUENCE_TUPLE (bytecode=A6h) at position 36.
# Source Generated with Decompyle++
# File: abnormal_capture_rate_data.do.pyc (Python 3.12)

import bindict
data = bindict.bindict(b'\x07\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00 \x00\x00\x000\x00\x00\x00?\x00\x00\x00O\x00\x00\x00_\x00\x00\x00h\x00\x00\x00settlement_rate2settlement_rate4settlement_rate3max_capture_nummust_succeed_numsettlement_rate1init_rateG\x01\x00\x00\x00\x00\x02\x02\x01\x01\x06\x03\x04\x05\n\x06\x00\x12\x00"\x02\x12\x02"\x01\x12\x01"\x06"\x03\x01\x04\x01\x05"\x96\x0e*\xfc\xa9\xf1\xd2Mb`?\xfa~j\xbct\x93h?{\x14\xaeG\xe1zt?\xfc\xa9\xf1\xd2MbP?\x04c\xfc\xa9\xf1\xd2MbP?\x96\x0e\x15\x00\x00\x80?\x00\x00\x00\x00\x00\x00\x00\x00ffffff\xe6?\x02\x02\x9a\x99\x99\x99\x99\x99\xe9?\x96\x0e*{\x14\xaeG\xe1z\x84?\xb8\x1e\x85\xebQ\xb8\x8e?\x9a\x99\x99\x99\x99\x99\x99?\xfa~j\xbct\x93h?\x04c{\x14\xaeG\xe1zt?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xe9?\xcd\xcc\xcc\xcc\xcc\xcc\xec?\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xd9?\x03c333333\xe3?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xa9?333333\xb3?\x00\x00\x00>{\x14\xaeG\xe1zt?\x04c\x9a\x99\x99\x99\x99\x99\x99?\x96\x0e*333333\xe3?\x9a\x99\x99\x99\x99\x99\xe9?\xcd\xcc\xcc\xcc\xcc\xcc\xec?\x9a\x99\x99\x99\x99\x99\xc9?\x04c\x9a\x99\x99\x99\x99\x99\xd9?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xc9?333333\xd3?\x00\x00\x00?\x9a\x99\x99\x99\x99\x99\xa9?\x04c\x9a\x99\x99\x99\x99\x99\xb9?f\x0b\x07\x00\x00\x00\x00\x93\x01\x00\x00\x1bc\r4\x97\x01\x00\x006\xc6\x1ah\x8f\x01\x00\x00\xc99\xe5\x97\x85\x01\x00\x00R)(\x9c\x88\x01\x00\x00\xe4\x9c\xf2\xcb\x8b\x01\x00\x00m\x8c5\xd0\x82\x01\x00\x00\x11\x07$\x01\x02Q\x11\x05r\x01\x01\x9f\x01\x11\x03\xc8\x01\x01\x00\xf1\x01\x11\x01\x9e\x02\x00')
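
Looking only at the two byte strings quoted in this comment, the start of each blob appears to be a little-endian uint32 key count, followed by count + 1 uint32 offsets, followed by the key names packed back to back. That layout is an inference from these two samples, not a documented format, and the trailing data is left uninterpreted. A sketch of reading just that key table (read_key_table is a made-up name):

```python
import struct


def read_key_table(blob):
    """Split a blob into (keys, remainder) under the layout assumed above."""
    (count,) = struct.unpack_from("<I", blob, 0)
    # count + 1 offsets: the start of each key plus the end of the last one.
    offsets = struct.unpack_from(f"<{count + 1}I", blob, 4)
    base = 4 + 4 * (count + 1)
    keys = [
        blob[base + offsets[i]:base + offsets[i + 1]].decode("ascii")
        for i in range(count)
    ]
    return keys, blob[base + offsets[-1]:]


if __name__ == "__main__":
    # The shorter constant from the pycdas listing earlier in this comment.
    sample = (
        b'\x01\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00abnormal_item_state'
        b'\x0c\x00\x00\x00\x00\x01\x00\x00\x01\x96\x05\x02v\x01\x0b\x01\x0f'
        b'\x17\xfd8\x18\x00\x00\x00\x89\xc0\x95\x12\t\x00'
    )
    keys, rest = read_key_table(sample)
    print(keys, len(rest))
```
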

anyway, the short answer is resolving the issue requires updating ASTree.cpp (after fixing the incomplete opcode maps.)

@wilson0x4d

wilson0x4d commented Sep 1, 2024

@greenozon you might find this of interest:

https://github.com/wilson0x4d/pycdc/blob/wip/bytes/python_3_11.cpp

https://github.com/wilson0x4d/pycdc/blob/wip/bytes/python_3_12.cpp

i see no reason not to have entries for every opcode appearing in official cpython. leaving them out works against pycdc maintainers and end-users, who then have to figure out what to keep and what to remove, and there is no harm in having entries that cpython's compile(...) would not produce. the mere fact that an opcode is represented in cpython source code at any point during the lifetime of a given version/branch is sufficient reason to include it (IMHO)

@wilson0x4d

i also have ASTree implementation code for a half dozen ops that is not pushed to my wip branch. i would love to work with people who understand the ast stack and frame logic better than i do.

@jsrcode

jsrcode commented Dec 1, 2024

(Translated from Chinese. The comment quoted @wilson0x4d's Sep 1, 2024 reply above in full, in machine translation, and then asked:)

Unsupported opcode: END_FOR (113). So is there any way to get it recognized?

@greenozon
Contributor

@jsrcode it is not clear what you wrote.
Please use English.

@jsrcode

jsrcode commented Dec 2, 2024

@jsrcode it is not clear what you wrote. Please use English.

I'm getting an error that says "Unsupported opcode: END_FOR (113)". In other words, the END_FOR bytecode is not recognized. Is there any way I can make this tool recognize it? I want to fix this.
