pythongh-119786: move interpreter doc from devguide to InternalDocs
iritkatriel committed Oct 18, 2024
1 parent e99650b commit f59037b
Showing 6 changed files with 455 additions and 7 deletions.
Binary file added InternalDocs/.parser.md.swo
Binary file not shown.
29 changes: 22 additions & 7 deletions InternalDocs/README.md
@@ -14,16 +14,31 @@ it is not, please report that through the
 Index:
 -----

-[Guide to the parser](parser.md)
+** Compiling Python Source Code **
+---

-[Compiler Design](compiler.md)
+- [Guide to the parser](parser.md)

-[Frames](frames.md)
+- [Compiler Design](compiler.md)

-[Adaptive Instruction Families](adaptive.md)
+** Runtime Objects **
+---

-[The Source Code Locations Table](locations.md)
+- [Code Objects (coming soon)](code_objects.md)

-[Garbage collector design](garbage_collector.md)
+- [The Source Code Locations Table](locations.md)

-[Exception Handling](exception_handling.md)
+- [Generators (coming soon)](generators.md)
+
+- [Frames](frames.md)
+
+** Program Execution **
+---
+
+- [The Interpreter](interpreter.md)
+
+- [Adaptive Instruction Families](adaptive.md)
+
+- [Garbage collector design](garbage_collector.md)
+
+- [Exception Handling](exception_handling.md)
43 changes: 43 additions & 0 deletions InternalDocs/_code_objects.md
@@ -0,0 +1,43 @@

Code objects
============

The interpreter uses a code object (``frame->f_code``) as its starting point.
Code objects contain many fields used by the interpreter, as well as some for use by debuggers and other tools.
In 3.11, the final field of a code object is an array of indeterminate length containing the bytecode, ``code->co_code_adaptive``.
(In previous versions the bytecode was stored in a separate :class:`bytes` object, ``code->co_code``; this was changed to save an allocation and to allow the bytecode to be mutated.)
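
As a rough illustration from the Python side (a sketch, not the interpreter's own access path): the ``co_code`` attribute of a code object returns the bytecode as a :class:`bytes` object, regardless of how the C struct lays it out. The function ``add`` is a made-up name for this example.

```python
def add(a, b):
    # Hypothetical function; every Python function carries a code object.
    return a + b

code = add.__code__

# From Python, the bytecode is exposed as an immutable bytes object.
bytecode = code.co_code
bytecode_type = type(bytecode).__name__
bytecode_size = len(bytecode)
```

The C-level ``co_code_adaptive`` array itself is not reachable from Python code; this only shows the public view of the same instructions.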

Code objects are typically produced by the bytecode :ref:`compiler <compiler>`, although they are often written to disk by one process and read back in by another.
The disk version of a code object is serialized using the :mod:`marshal` protocol.
Some code objects are pre-loaded into the interpreter using ``Tools/scripts/deepfreeze.py``, which writes ``Python/deepfreeze/deepfreeze.c``.
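
The write-then-read-back cycle described above can be sketched with :mod:`marshal` directly. This is a minimal illustration that keeps the serialized bytes in memory; the source string and the ``<demo>`` filename are made up for the example.

```python
import marshal

source = "answer = 6 * 7\n"
code = compile(source, "<demo>", "exec")

# Serialize the code object. A .pyc file stores a payload like this
# after a small header; here the bytes simply stay in memory.
payload = marshal.dumps(code)

# "Another process" would read the bytes back and rebuild the code object.
restored = marshal.loads(payload)

namespace = {}
exec(restored, namespace)
result = namespace["answer"]
```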

Code objects are nominally immutable.
Some fields (including ``co_code_adaptive``) are mutable, but mutable fields are not included when code objects are hashed or compared.
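
A small sketch of what value-based comparison looks like from Python: compiling the same source twice yields two distinct code objects that compare equal and hash alike. It does not demonstrate the exclusion of mutable fields directly, since those are not visible from Python code. The source string is made up for the example.

```python
source = "total = 1 + 2\n"

# Two independent compilations of identical source.
first = compile(source, "<demo>", "exec")
second = compile(source, "<demo>", "exec")

distinct_objects = first is not second
equal = first == second
same_hash = hash(first) == hash(second)
```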

The locations table
-------------------

Whenever an exception is raised, we add a traceback entry to the exception.
The ``tb_lineno`` field of a traceback entry is (lazily) set to the line number of the instruction that raised it.
This field is computed from the locations table, ``co_linetable`` (this name is an understatement), using :c:func:`PyCode_Addr2Line`.
This table has an entry for every instruction rather than for every ``try`` block, so a compact format is very important.
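
The effect is easy to observe from Python. The sketch below compiles a two-line snippet (made up for the example), lets the second line raise, and reads ``tb_lineno`` from the innermost traceback entry.

```python
source = "data = {}\nvalue = data['missing']\n"
code = compile(source, "<demo>", "exec")

try:
    exec(code, {})
except KeyError as exc:
    tb = exc.__traceback__
    # The first entry belongs to the frame containing this try block;
    # walk to the innermost entry, which is the frame running <demo>.
    while tb.tb_next is not None:
        tb = tb.tb_next
    raising_line = tb.tb_lineno
    raising_code_name = tb.tb_frame.f_code.co_name
```

Here ``raising_line`` refers to the line of the failing subscript within the compiled snippet, not to any line of the surrounding script.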

The full design of the 3.11 locations table is written up in :cpy-file:`InternalDocs/locations.md`.
While there are rumors that this file is slightly out of date, it is still the best reference we have.
Don't be confused by :cpy-file:`Objects/lnotab_notes.txt`, which describes the 3.10 format.
For backwards compatibility this format is still supported by the ``co_lnotab`` property.

The 3.11 location table format is different because it stores not just the starting line number for each instruction, but also the end line number, *and* the start and end column numbers.
Note that traceback objects don't store all this information -- they store the start line number, for backward compatibility, and the "last instruction" value.
The rest can be computed from the last instruction (``tb_lasti``) with the help of the locations table.
For Python code, a convenient method exists, :meth:`~codeobject.co_positions`, which returns an iterator of :samp:`({line}, {endline}, {column}, {endcolumn})` tuples, one per instruction.
There is also ``co_lines()`` which returns an iterator of :samp:`({start}, {end}, {line})` tuples, where :samp:`{start}` and :samp:`{end}` are bytecode offsets.
The latter is described by :pep:`626`; it is more compact, but doesn't return end line numbers or column offsets.
From C code, you have to call :c:func:`PyCode_Addr2Location`.

Fortunately, the locations table is only consulted by exception handling (to set ``tb_lineno``) and by tracing (to pass the line number to the tracing function).
In order to reduce the overhead during tracing, the mapping from instruction offset to line number is cached in the ``_co_linearray`` field.


TODO:
- co_consts, co_names, co_varnames, and their ilk
5 changes: 5 additions & 0 deletions InternalDocs/code_objects.md
@@ -0,0 +1,5 @@

Code objects
============

Coming soon.
9 changes: 9 additions & 0 deletions InternalDocs/generators.md
@@ -0,0 +1,9 @@

Generators
==========

Coming soon.

<!--
- Generators, async functions, async generators, and ``yield from`` (next, send, throw, close; and await; and how this code breaks the interpreter abstraction)
-->