Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bytecode documentation issues for Python 3.12 #107457

Closed
20 tasks
MatthieuDartiailh opened this issue Jul 30, 2023 · 21 comments
Closed
20 tasks

Bytecode documentation issues for Python 3.12 #107457

MatthieuDartiailh opened this issue Jul 30, 2023 · 21 comments
Labels
3.12 bugs and security fixes docs Documentation in the Doc dir

Comments

@MatthieuDartiailh
Copy link
Contributor

MatthieuDartiailh commented Jul 30, 2023

As part of adding support for Python 3.12 to https://github.com/MatthieuDartiailh/bytecode I have identified a number of documentation issues both in the dismodule documentation in the bytecode section of the What's new in 3.12. Opening a separate issue for each point would just clutter the issue tracker so I will try to summarize all my findings here.

Issue in dis documentation

  • COMPARE_OP oparg now uses the 4 lower bits as a cache. As a consequence:
    • the statement The operation name can be found in cmp_op[opname] is not true anymore.
    • the meaning of the cache is not documented but it can be != 0 even for a freshly compiled bytecode and as a consequence the difference is visible in dis.dis output
  • MIN_INSTRUMENTED_OPCODE is not documented but needed to determine what are the "normal" opcodes
  • LOAD_SUPER_ATTR description could use a code block to describe its stack effect IMO
  • POP_JUMP_IF_NOT_NONE and POP_JUMP_IF_NONE are still described as pseudo-instructions even though there are not anymore in 3.12
  • CALL_INSTRINSIC_2 stack manipulation description looks wrong.
    The description reads Passes STACK[-2], STACK[-1] as the arguments and sets STACK[-1] to the result, but the implementation pops 2 values from the stack
  • END_SEND is not documented
  • how to account for the presence of CACHE instructions following jumping instructions is not described (FOR_ITER and SEND for the time being)

Issue in What's new

  • BINARY_SLICE is not mentioned
  • STORE_SLICE is not mentioned
  • CLEANUP_THROW is not mentioned
  • RETURN_CONST is not mentioned
  • LOAD_FAST_CHECK is not mentioned
  • END_SEND is not mentioned
  • CALL_INSTRINSIC_1/2 are not mentioned
  • FOR_ITER new behavior is not mentioned
  • The fact that POP_JUMP_IF_* family of instructions are now real instructions is not mentioned
  • YIELD_VALUE need for an argument is not mentioned
  • The addition of dis.hasexc is not mentioned

Linked PRs

@MatthieuDartiailh MatthieuDartiailh added the docs Documentation in the Doc dir label Jul 30, 2023
@iritkatriel
Copy link
Member

@MatthieuDartiailh Thanks for this. Are you working on a PR?

@iritkatriel iritkatriel added 3.12 bugs and security fixes needs backport to 3.12 bug and security fixes labels Aug 1, 2023
@MatthieuDartiailh
Copy link
Contributor Author

Not at the moment. If I manage to finish my work on bytecode, I will make a PR, but feel free to beat me to it.

@MatthieuDartiailh
Copy link
Contributor Author

Also note that while I could address many of those, I did not figure out yet what the lower 4 bits of COMPARE_OP are for.

@gvanrossum
Copy link
Member

Also note that while I could address many of those, I did not figure out yet what the lower 4 bits of COMPARE_OP are for.

See this fragment from Include/internal/pycore_code.h:

/* Comparison bit masks. */

/* Note this evaluates its arguments twice each */
#define COMPARISON_BIT(x, y) (1 << (2 * ((x) >= (y)) + ((x) <= (y))))

/*
 * The following bits are chosen so that the value of
 * COMPARSION_BIT(left, right)
 * masked by the values below will be non-zero if the
 * comparison is true, and zero if it is false */

/* This is for values that are unordered, ie. NaN, not types that are unordered, e.g. sets */
#define COMPARISON_UNORDERED 1

#define COMPARISON_LESS_THAN 2
#define COMPARISON_GREATER_THAN 4
#define COMPARISON_EQUALS 8

#define COMPARISON_NOT_EQUALS (COMPARISON_UNORDERED | COMPARISON_LESS_THAN | COMPARISON_GREATER_THAN)

@MatthieuDartiailh
Copy link
Contributor Author

Thanks !

I also found this fragment in compile.c that completes the picture:

static const int compare_masks[] = {
    [Py_LT] = COMPARISON_LESS_THAN,
    [Py_LE] = COMPARISON_LESS_THAN | COMPARISON_EQUALS,
    [Py_EQ] = COMPARISON_EQUALS,
    [Py_NE] = COMPARISON_NOT_EQUALS,
    [Py_GT] = COMPARISON_GREATER_THAN,
    [Py_GE] = COMPARISON_GREATER_THAN | COMPARISON_EQUALS,
};

@carljm
Copy link
Member

carljm commented Aug 8, 2023

Thanks for the report!

LOAD_SUPER_ATTR description could use a code block to describe its stack effect IMO

Re-reading this doc entry, I think it could be clearer that it pops three elements from the stack and pushes either one or two, depending on low bit of oparg (just like LOAD_ATTR). Would this prose clarification address your issue?

I'm not convinced that a code block would be better than a small addition to the prose. A code block is more verbose and introduces more distracting details, which would either be confusing because they differ from the actual implementation, or confusing because they match the actual implementation and thus add complexity that isn't needed in the documentation. But if you have an example of what a code block clarification would look like that you think is better than prose, I'm open to considering it.

@gvanrossum
Copy link
Member

Not to mention this magical code in bytecodes.c:

            double dleft = PyFloat_AS_DOUBLE(left);
            double dright = PyFloat_AS_DOUBLE(right);
            // 1 if NaN, 2 if <, 4 if >, 8 if ==; this matches low four bits of the oparg
            int sign_ish = COMPARISON_BIT(dleft, dright);
            _Py_DECREF_SPECIALIZED(left, _PyFloat_ExactDealloc);
            _Py_DECREF_SPECIALIZED(right, _PyFloat_ExactDealloc);
            res = (sign_ish & oparg) ? Py_True : Py_False;

and

            Py_ssize_t ileft = _PyLong_CompactValue((PyLongObject *)left);
            Py_ssize_t iright = _PyLong_CompactValue((PyLongObject *)right);
            // 2 if <, 4 if >, 8 if ==; this matches the low 4 bits of the oparg
            int sign_ish = COMPARISON_BIT(ileft, iright);
            _Py_DECREF_SPECIALIZED(left, (destructor)PyObject_Free);
            _Py_DECREF_SPECIALIZED(right, (destructor)PyObject_Free);
            res = (sign_ish & oparg) ? Py_True : Py_False;

That completes the picture; these are the only two uses (but they are in very performance-critical code, hence all these shenanigans).

@MatthieuDartiailh
Copy link
Contributor Author

I just noticed that END_SEND does not appear in the dis doc, I will edit my first message.

@MatthieuDartiailh
Copy link
Contributor Author

If anybody could enlighten me regarding the use of END_SEND and CLEANUP_THROW I would greatly appreciate it. I am currently having some issues when trying to compute the stack size for async function using async with in https://github.com/MatthieuDartiailh/bytecode.

@gvanrossum
Copy link
Member

For END_SEND, all I know is that it was added to support PEP 669 (instrumentation) in gh-103083.

For CLEANUP_THROW, look at gh-98001; that seems to indicate it has to do with line number tables.

@MatthieuDartiailh
Copy link
Contributor Author

MatthieuDartiailh commented Aug 11, 2023

It seems to me that SEND documentation is either wrong or poorly phrased. It says:

If the call raises StopIteration, pop both items, push the exception’s value attribute, and increment the bytecode counter by delta.

Which to me sounds like two values are removed from the stack and one is pushed giving a stack of -1 but SEND stack effect is 0

Another undocumented feature if that when jumping the offset is augmented by 1 compared to the oparg !!! which is rather surprising.

EDIT: same issue for FOR_ITER

@gvanrossum
Copy link
Member

For SEND, comparing the description to the actual code in the main branch, it looks like the docs are just wrong -- it leaves STACK[-2] (the generator or generator-ish object) in place. According to blame, that sentence was originally written by @brandtbucher about a year ago -- maybe it was always wrong, or maybe the implementation of SEND has changed since then? (The new sentence doesn't appear in the 3.11 docs, only in 3.12 and 3.13.)

I'm not sure the FOR_ITER docs are incorrect -- it doesn't have the same sentence in its docs, nor are the semantics the same. It's clear in main that it pushes an extra value if iter.__next__() produces one, and jumps by oparg if it raises StopIteration. The implementation pops the iterator and jumps one instruction more, but that's considered an optimization -- the instruction it skips is END_FOR, which pops two values off the stack, following the fiction that FOR_ITER always pushes a value. It is a requirement that the instruction pointed at by oparg is always END_FOR.

@MatthieuDartiailh
Copy link
Contributor Author

Thanks for your confirmation for the SEND docs.

Regarding FOR_ITER I was only bothered by the computation of the jump target, sorry for the confusion. I raised the issue because it was posing issues when trying to replace offset by labels in bytecode. So even if it is seen only as an optimization it does impact how one interpret the bytecode.

See for example:

>>> def f():
...  for i in range(10):
...   pass
...
>>> dis.dis(f)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + range)
             12 LOAD_CONST               1 (10)
             14 CALL                     1
             22 GET_ITER
        >>   24 FOR_ITER                 2 (to 32)
             28 STORE_FAST               0 (i)

  3          30 JUMP_BACKWARD            4 (to 24)

  2     >>   32 END_FOR
             34 RETURN_CONST             0 (None)

to reach the same conclusion as dis, one needs to know the oparg is augmented by one.

@gvanrossum
Copy link
Member

If you ask dis to show the cache entries it becomes clearer. You also need to know that the oparg of a branch (like FOR_ITER) is relative to the start of the following instruction.

>>> dis.dis(f, show_caches=True)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (range + NULL)
              4 CACHE                    0 (counter: 0)
              6 CACHE                    0 (index: 0)
              8 CACHE                    0 (module_keys_version: 0)
             10 CACHE                    0 (builtin_keys_version: 0)
             12 LOAD_CONST               1 (10)
             14 CALL                     1
             16 CACHE                    0 (counter: 0)
             18 CACHE                    0 (func_version: 0)
             20 CACHE                    0
             22 GET_ITER
        >>   24 FOR_ITER                 3 (to 34)
             26 CACHE                    0 (counter: 0)
             28 STORE_FAST               0 (i)

  3          30 JUMP_BACKWARD            5 (to 24)
             32 CACHE                    0 (counter: 0)

  2     >>   34 END_FOR
             36 RETURN_CONST             0 (None)
>>> 

That's in main; in 3.12, JUMP_BACKWARD has no cache entry, so the FOR_ITER oparg is 2. In both versions it means to jump over the STORE_FAST and JUMP_BACKWARD instructions, to the END_FOR.

This should be the same as for other forward jumps.

@MatthieuDartiailh
Copy link
Contributor Author

My issue is that it is not the same for other jump forward instr.

In the above example from @gvanrossum , we start counting from the instruction at offset 26, and we count 3 instructions: the one at offset 28, 30 and 32 but END_FOR is at 34. As a point of comparison we can look at the following case.

>>> def f(b):  
...   a = 1    
...   if b:
...     a += 1  
...   return a
>>> dis.dis(f, show_caches=True) 
  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (1)
              4 STORE_FAST               1 (a)

  3           6 LOAD_FAST                0 (b)
              8 POP_JUMP_IF_FALSE        5 (to 20)

  4          10 LOAD_FAST                1 (a)
             12 LOAD_CONST               1 (1)
             14 BINARY_OP               13 (+=)
             16 CACHE                    0 (counter: 0)
             18 STORE_FAST               1 (a)

  5     >>   20 LOAD_FAST                1 (a)
             22 RETURN_VALUE

To figure out the target of POP_JUMP_IF_FALSE, we count 5 instructions from the instruction at offset 10, which gives us 12, 14, 16, 18, 20 and the instruction at offset 20 is indeed the jump target. So to get the right result for END_FOR/SEND_END we need to know that the oparg is increased by 1 when figuring out the jump target.

Does my explanation make sense ?

@MatthieuDartiailh
Copy link
Contributor Author

Different issue: LOAD_CLOSURE documentation refers to a non-existent co_fastlocalnames. Should it read co_cellvars instead ?

@gvanrossum
Copy link
Member

The question to ask generally is, where would a jump with oparg == 0 go. It should always go to the instruction immediately following the jump, after any caches belonging to the jump have been accounted for.

(Also, the count isn't really instructions -- an instruction can include cache entries so it has a variable length. The jump count is a number of "instruction units", which are always 2 bytes.)

@gvanrossum
Copy link
Member

Different issue: LOAD_CLOSURE documentation refers to a non-existent co_fastlocalnames. Should it read co_cellvars instead ?

IIRC co_fastlocalnames got renamed when we unified the numbering for locals and fast locals. I think we're looking for co_localsplusnames now. (In 3.13, LOAD_CLOSURE is gone.)

@MatthieuDartiailh
Copy link
Contributor Author

Thanks for the explanation regarding the jumps. It turns what looked like a magic number for just two opcodes into a generic rule. Would it be fine to add this explanation to the dis documentation ?

@gvanrossum
Copy link
Member

That would be great!

vstinner pushed a commit that referenced this issue Oct 17, 2023
…08900) (#110985)

gh-107457: update dis documentation with changes in 3.12 (GH-108900)
(cherry picked from commit 198aa67)

Co-authored-by: Matthieu Dartiailh <[email protected]>
@MatthieuDartiailh
Copy link
Contributor Author

I believe all points have been addressed by the referenced PRs and this can now be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes docs Documentation in the Doc dir
Projects
None yet
Development

No branches or pull requests

5 participants