From 871e866d09f849261d08a131a2be4f6b2f13e75e Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Mon, 21 Oct 2024 23:33:19 +0100 Subject: [PATCH] relative links to source --- InternalDocs/adaptive.md | 3 +- InternalDocs/compiler.md | 232 +++++++++++++---------------- InternalDocs/exception_handling.md | 20 +-- InternalDocs/frames.md | 10 +- InternalDocs/parser.md | 80 +++++----- 5 files changed, 148 insertions(+), 197 deletions(-) diff --git a/InternalDocs/adaptive.md b/InternalDocs/adaptive.md index 09245730b271fa..4ae9e85b387f39 100644 --- a/InternalDocs/adaptive.md +++ b/InternalDocs/adaptive.md @@ -31,8 +31,7 @@ although these are not fundamental and may change: ## Example family -The `LOAD_GLOBAL` instruction (in -[Python/bytecodes.c](https://github.com/python/cpython/blob/main/Python/bytecodes.c)) +The `LOAD_GLOBAL` instruction (in [Python/bytecodes.c](../Python/bytecodes.c)) already has an adaptive family that serves as a relatively simple example. The `LOAD_GLOBAL` instruction performs adaptive specialization, diff --git a/InternalDocs/compiler.md b/InternalDocs/compiler.md index ed62f47bbe35d4..0da4670c792cb5 100644 --- a/InternalDocs/compiler.md +++ b/InternalDocs/compiler.md @@ -7,17 +7,16 @@ Abstract In CPython, the compilation from source code to bytecode involves several steps: -1. Tokenize the source code - [Parser/lexer/](https://github.com/python/cpython/blob/main/Parser/lexer/) - and [Parser/tokenizer/](https://github.com/python/cpython/blob/main/Parser/tokenizer/). +1. Tokenize the source code [Parser/lexer/](../Parser/lexer/) + and [Parser/tokenizer/](../Parser/tokenizer/). 2. Parse the stream of tokens into an Abstract Syntax Tree - [Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c). + [Parser/parser.c](../Parser/parser.c). 3. Transform AST into an instruction sequence - [Python/compile.c](https://github.com/python/cpython/blob/main/Python/compile.c). + [Python/compile.c](../Python/compile.c). 4. 
Construct a Control Flow Graph and apply optimizations to it - [Python/flowgraph.c](https://github.com/python/cpython/blob/main/Python/flowgraph.c). + [Python/flowgraph.c](../Python/flowgraph.c). 5. Emit bytecode based on the Control Flow Graph - [Python/assemble.c](https://github.com/python/cpython/blob/main/Python/assemble.c). + [Python/assemble.c](../Python/assemble.c). This document outlines how these steps of the process work. @@ -36,12 +35,10 @@ of tokens rather than a stream of characters which is more common with PEG parsers. The grammar file for Python can be found in -[Grammar/python.gram](https://github.com/python/cpython/blob/main/Grammar/python.gram). +[Grammar/python.gram](../Grammar/python.gram). The definitions for literal tokens (such as `:`, numbers, etc.) can be found in -[Grammar/Tokens](https://github.com/python/cpython/blob/main/Grammar/Tokens). -Various C files, including -[Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c) -are generated from these. +[Grammar/Tokens](../Grammar/Tokens). Various C files, including +[Parser/parser.c](../Parser/parser.c) are generated from these. See Also: @@ -63,7 +60,7 @@ specification of the AST nodes is specified using the Zephyr Abstract Syntax Definition Language (ASDL) [^1], [^2]. The definition of the AST nodes for Python is found in the file -[Parser/Python.asdl](https://github.com/python/cpython/blob/main/Parser/Python.asdl). +[Parser/Python.asdl](../Parser/Python.asdl). Each AST node (representing statements, expressions, and several specialized types, like list comprehensions and exception handlers) is @@ -156,8 +153,8 @@ In general, unless you are working on the critical core of the compiler, memory management can be completely ignored. But if you are working at either the very beginning of the compiler or the end, you need to care about how the arena works. 
All code relating to the arena is in either -[Include/internal/pycore_pyarena.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_pyarena.h) -or [Python/pyarena.c](https://github.com/python/cpython/blob/main/Python/pyarena.c). +[Include/internal/pycore_pyarena.h](../Include/internal/pycore_pyarena.h) +or [Python/pyarena.c](../Python/pyarena.c). `PyArena_New()` will create a new arena. The returned `PyArena` structure will store pointers to all memory given to it. This does the bookkeeping of @@ -181,17 +178,17 @@ Source code to AST The AST is generated from source code using the function `_PyParser_ASTFromString()` or `_PyParser_ASTFromFile()` -[Parser/peg_api.c](https://github.com/python/cpython/blob/main/Parser/peg_api.c). +[Parser/peg_api.c](../Parser/peg_api.c). After some checks, a helper function in -[Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c) +[Parser/parser.c](../Parser/parser.c) begins applying production rules on the source code it receives; converting source code to tokens and matching these tokens recursively to their corresponding rule. The production rule's corresponding rule function is called on every match. These rule functions follow the format `xx_rule`. Where *xx* is the grammar rule that the function handles and is automatically derived from -[Grammar/python.gram](https://github.com/python/cpython/blob/main/Grammar/python.gram) by -[Tools/peg_generator/pegen/c_generator.py](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen/c_generator.py). +[Grammar/python.gram](../Grammar/python.gram) by +[Tools/peg_generator/pegen/c_generator.py](../Tools/peg_generator/pegen/c_generator.py). Each rule function in turn creates an AST node as it goes along. It does this by allocating all the new nodes it needs, calling the proper AST node creation @@ -202,12 +199,9 @@ there are no more rules, an error is set and the parsing ends. 
The AST node creation helper functions have the name `_PyAST_{xx}` where *xx* is the AST node that the function creates. These are defined by the -ASDL grammar and contained in -[Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c) -(which is generated by -[Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py) -from -[Parser/Python.asdl](https://github.com/python/cpython/blob/main/Parser/Python.asdl)). +ASDL grammar and contained in [Python/Python-ast.c](../Python/Python-ast.c) +(which is generated by [Parser/asdl_c.py](../Parser/asdl_c.py) +from [Parser/Python.asdl](../Parser/Python.asdl)). This all leads to a sequence of AST nodes stored in `asdl_seq` structs. To demonstrate everything explained so far, here's the @@ -262,9 +256,8 @@ manner stated in the previous paragraphs. There are macros for creating and using `asdl_xx_seq *` types, where *xx* is a type of the ASDL sequence. Three main types are defined manually -- `generic`, `identifier` and `int`. These types are found in -[Python/asdl.c](https://github.com/python/cpython/blob/main/Python/asdl.c) -and its corresponding header file -[Include/internal/pycore_asdl.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_asdl.h). +[Python/asdl.c](../Python/asdl.c) and its corresponding header file +[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h). 
Functions and macros for creating `asdl_xx_seq *` types are as follows: `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)` @@ -275,10 +268,8 @@ Functions and macros for creating `asdl_xx_seq *` types are as follows: Allocate memory for an `asdl_int_seq` of the specified length In addition to the three types mentioned above, some ASDL sequence types are -automatically generated by -[Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py) -and found in -[Include/internal/pycore_ast.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_ast.h). +automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in +[Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h). Macros for using both manually defined and automatically generated ASDL sequence types are as follows: @@ -355,12 +346,10 @@ AST to CFG to bytecode ====================== The conversion of an `AST` to bytecode is initiated by a call to the function -`_PyAST_Compile()` in -[Python/compile.c](https://github.com/python/cpython/blob/main/Python/compile.c). +`_PyAST_Compile()` in [Python/compile.c](../Python/compile.c). The first step is to construct the symbol table. This is implemented by -`_PySymtable_Build()` in -[Python/symtable.c](https://github.com/python/cpython/blob/main/Python/symtable.c). +`_PySymtable_Build()` in [Python/symtable.c](../Python/symtable.c). This function begins by entering the starting code block for the AST (passed-in) and then calling the proper `symtable_visit_{xx}` function (with *xx* being the AST node type). Next, the AST tree is walked with the various code blocks that @@ -368,13 +357,12 @@ delineate the reach of a local variable as blocks are entered and exited using `symtable_enter_block()` and `symtable_exit_block()`, respectively. 
Once the symbol table is created, the `AST` is transformed by `compiler_codegen()` -in [Python/compile.c](https://github.com/python/cpython/blob/main/Python/compile.c) -into a sequence of pseudo instructions. These are similar to bytecode, but -in some cases they are more abstract, and are resolved later into actual -bytecode. The construction of this instruction sequence is handled by several -functions that break the task down by various AST node types. The functions are -all named `compiler_visit_{xx}` where *xx* is the name of the node type (such -as `stmt`, `expr`, etc.). Each function receives a `struct compiler *` +in [Python/compile.c](../Python/compile.c) into a sequence of pseudo instructions. +These are similar to bytecode, but in some cases they are more abstract, and are +resolved later into actual bytecode. The construction of this instruction sequence +is handled by several functions that break the task down by various AST node types. +The functions are all named `compiler_visit_{xx}` where *xx* is the name of the node +type (such as `stmt`, `expr`, etc.). Each function receives a `struct compiler *` and `{xx}_ty` where *xx* is the AST node type. Typically these functions consist of a large 'switch' statement, branching based on the kind of node type passed to it. Simple things are handled inline in the @@ -439,31 +427,27 @@ by `_PyCfg_FromInstructionSequence()`. Then `_PyCfg_OptimizeCodeUnit()` applies various peephole optimizations, and `_PyCfg_OptimizedCfgToInstructionSequence()` converts the optimized `CFG` back into an instruction sequence. These conversions and optimizations are -implemented in -[Python/flowgraph.c](https://github.com/python/cpython/blob/main/Python/flowgraph.c). +implemented in [Python/flowgraph.c](../Python/flowgraph.c). Finally, the sequence of pseudo-instructions is converted into actual bytecode. 
This includes transforming pseudo instructions into actual instructions, converting jump targets from logical labels to relative offsets, and -construction of the -[exception table](exception_handling.md) and -[locations table](https://github.com/python/cpython/blob/main/InternalDocs/locations.md). +construction of the [exception table](exception_handling.md) and +[locations table](locations.md). The bytecode and tables are then wrapped into a `PyCodeObject` along with additional metadata, including the `consts` and `names` arrays, information about function reference to the source code (filename, etc). All of this is implemented by -`_PyAssemble_MakeCodeObject()` in -[Python/assemble.c](https://github.com/python/cpython/blob/main/Python/assemble.c). +`_PyAssemble_MakeCodeObject()` in [Python/assemble.c](../Python/assemble.c). Code objects ============ The result of `PyAST_CompileObject()` is a `PyCodeObject` which is defined in -[Include/cpython/code.h](https://github.com/python/cpython/blob/main/Include/cpython/code.h). +[Include/cpython/code.h](../Include/cpython/code.h). And with that you now have executable Python bytecode! -The code objects (byte code) are executed in -[Python/ceval.c](https://github.com/python/cpython/blob/main/Python/ceval.c). +The code objects (byte code) are executed in [Python/ceval.c](../Python/ceval.c). This file will also need a new case statement for the new opcode in the big switch statement in `_PyEval_EvalFrameDefault()`. @@ -471,152 +455,138 @@ statement in `_PyEval_EvalFrameDefault()`. Important files =============== -* [Parser/](https://github.com/python/cpython/blob/main/Parser/) +* [Parser/](../Parser/) - * [Parser/Python.asdl](https://github.com/python/cpython/blob/main/Parser/Python.asdl): + * [Parser/Python.asdl](../Parser/Python.asdl): ASDL syntax file. - * [Parser/asdl.py](https://github.com/python/cpython/blob/main/Parser/asdl.py): + * [Parser/asdl.py](../Parser/asdl.py): Parser for ASDL definition files. 
Reads in an ASDL description and parses it into an AST that describes it. - * [Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py): + * [Parser/asdl_c.py](../Parser/asdl_c.py): Generate C code from an ASDL description. Generates - [Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c) - and - [Include/internal/pycore_ast.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_ast.h). - - * [Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c): - The new PEG parser introduced in Python 3.9. - Generated by - [Tools/peg_generator/pegen/c_generator.py](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen/c_generator.py) - from the grammar [Grammar/python.gram](https://github.com/python/cpython/blob/main/Grammar/python.gram). + [Python/Python-ast.c](../Python/Python-ast.c) and + [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h). + + * [Parser/parser.c](../Parser/parser.c): + The new PEG parser introduced in Python 3.9. Generated by + [Tools/peg_generator/pegen/c_generator.py](../Tools/peg_generator/pegen/c_generator.py) + from the grammar [Grammar/python.gram](../Grammar/python.gram). Creates the AST from source code. Rule functions for their corresponding production rules are found here. - * [Parser/peg_api.c](https://github.com/python/cpython/blob/main/Parser/peg_api.c): - Contains high-level functions which are - used by the interpreter to create an AST from source code. + * [Parser/peg_api.c](../Parser/peg_api.c): + Contains high-level functions which are used by the interpreter to create + an AST from source code. - * [Parser/pegen.c](https://github.com/python/cpython/blob/main/Parser/pegen.c): + * [Parser/pegen.c](../Parser/pegen.c): Contains helper functions which are used by functions in - [Parser/parser.c](https://github.com/python/cpython/blob/main/Parser/parser.c) - to construct the AST. 
Also contains helper functions which help raise better error messages - when parsing source code. + [Parser/parser.c](../Parser/parser.c) to construct the AST. Also contains + helper functions which help raise better error messages when parsing source code. - * [Parser/pegen.h](https://github.com/python/cpython/blob/main/Parser/pegen.h): - Header file for the corresponding - [Parser/pegen.c](https://github.com/python/cpython/blob/main/Parser/pegen.c). + * [Parser/pegen.h](../Parser/pegen.h): + Header file for the corresponding [Parser/pegen.c](../Parser/pegen.c). Also contains definitions of the `Parser` and `Token` structs. -* [Python/](https://github.com/python/cpython/blob/main/Python) +* [Python/](../Python) - * [Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c): + * [Python/Python-ast.c](../Python/Python-ast.c): Creates C structs corresponding to the ASDL types. Also contains code for marshalling AST nodes (core ASDL types have marshalling code in - [Python/asdl.c](https://github.com/python/cpython/blob/main/Python/asdl.c)). - File automatically generated by - [Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py). + [Python/asdl.c](../Python/asdl.c)). + File automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py). This file must be committed separately after every grammar change is committed since the `__version__` value is set to the latest grammar change revision number. - * [Python/asdl.c](https://github.com/python/cpython/blob/main/Python/asdl.c): + * [Python/asdl.c](../Python/asdl.c): Contains code to handle the ASDL sequence type. Also has code to handle marshalling the core ASDL types, such as number - and identifier. Used by - [Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c) + and identifier. Used by [Python/Python-ast.c](../Python/Python-ast.c) for marshalling AST nodes. 
- * [Python/ast.c](https://github.com/python/cpython/blob/main/Python/ast.c): + * [Python/ast.c](../Python/ast.c): Used for validating the AST. - * [Python/ast_opt.c](https://github.com/python/cpython/blob/main/Python/ast_opt.c): + * [Python/ast_opt.c](../Python/ast_opt.c): Optimizes the AST. - * [Python/ast_unparse.c](https://github.com/python/cpython/blob/main/Python/ast_unparse.c): + * [Python/ast_unparse.c](../Python/ast_unparse.c): Converts the AST expression node back into a string (for string annotations). - * [Python/ceval.c](https://github.com/python/cpython/blob/main/Python/ceval.c): + * [Python/ceval.c](../Python/ceval.c): Executes byte code (aka, eval loop). - * [Python/symtable.c](https://github.com/python/cpython/blob/main/Python/symtable.c): + * [Python/symtable.c](../Python/symtable.c): Generates a symbol table from AST. - * [Python/pyarena.c](https://github.com/python/cpython/blob/main/Python/pyarena.c): + * [Python/pyarena.c](../Python/pyarena.c): Implementation of the arena memory manager. - * [Python/compile.c](https://github.com/python/cpython/blob/main/Python/compile.c): + * [Python/compile.c](../Python/compile.c): Emits pseudo bytecode based on the AST. - * [Python/flowgraph.c](https://github.com/python/cpython/blob/main/Python/flowgraph.c): + * [Python/flowgraph.c](../Python/flowgraph.c): Implements peephole optimizations. - * [Python/assemble.c](https://github.com/python/cpython/blob/main/Python/assemble.c): + * [Python/assemble.c](../Python/assemble.c): Constructs a code object from a sequence of pseudo instructions. - * [Python/instruction_sequence.c](https://github.com/python/cpython/blob/main/Python/instruction_sequence.c): + * [Python/instruction_sequence.c](../Python/instruction_sequence.c): A data structure representing a sequence of bytecode-like pseudo-instructions. 
-* [Include/](https://github.com/python/cpython/blob/main/Include/) +* [Include/](../Include/) - * [Include/cpython/code.h](https://github.com/python/cpython/blob/main/Include/cpython/code.h) - : Header file for - [Objects/codeobject.c](https://github.com/python/cpython/blob/main/Objects/codeobject.c); + * [Include/cpython/code.h](../Include/cpython/code.h) + : Header file for [Objects/codeobject.c](../Objects/codeobject.c); contains definition of `PyCodeObject`. - * [Include/opcode.h](https://github.com/python/cpython/blob/main/Include/opcode.h) - : One of the files that must be modified if - [Lib/opcode.py](https://github.com/python/cpython/blob/main/Lib/opcode.py) is. + * [Include/opcode.h](../Include/opcode.h) + : One of the files that must be modified whenever + [Lib/opcode.py](../Lib/opcode.py) is. - * [Include/internal/pycore_ast.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_ast.h) + * [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h) : Contains the actual definitions of the C structs as generated by - [Python/Python-ast.c](https://github.com/python/cpython/blob/main/Python/Python-ast.c) - Automatically generated by - [Parser/asdl_c.py](https://github.com/python/cpython/blob/main/Parser/asdl_c.py). + [Python/Python-ast.c](../Python/Python-ast.c). + Automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py). - * [Include/internal/pycore_asdl.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_asdl.h) - : Header for the corresponding - [Python/ast.c](https://github.com/python/cpython/blob/main/Python/ast.c). + * [Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h) + : Header for the corresponding [Python/ast.c](../Python/ast.c). - * [Include/internal/pycore_ast.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_ast.h) - : Declares `_PyAST_Validate()` external (from - [Python/ast.c](https://github.com/python/cpython/blob/main/Python/ast.c)). 
+ * [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h) + : Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)). - * [Include/internal/pycore_symtable.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_symtable.h) - : Header for - [Python/symtable.c](https://github.com/python/cpython/blob/main/Python/symtable.c). + * [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h) + : Header for [Python/symtable.c](../Python/symtable.c). `struct symtable` and `PySTEntryObject` are defined here. - * [Include/internal/pycore_parser.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_parser.h) - : Header for the corresponding - [Parser/peg_api.c](https://github.com/python/cpython/blob/main/Parser/peg_api.c). + * [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h) + : Header for the corresponding [Parser/peg_api.c](../Parser/peg_api.c). - * [Include/internal/pycore_pyarena.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_pyarena.h) - : Header file for the corresponding - [Python/pyarena.c](https://github.com/python/cpython/blob/main/Python/pyarena.c). + * [Include/internal/pycore_pyarena.h](../Include/internal/pycore_pyarena.h) + : Header file for the corresponding [Python/pyarena.c](../Python/pyarena.c). - * [Include/opcode_ids.h](https://github.com/python/cpython/blob/main/Include/opcode_ids.h) - : List of opcodes. Generated from - [Python/bytecodes.c](https://github.com/python/cpython/blob/main/Python/bytecodes.c) + * [Include/opcode_ids.h](../Include/opcode_ids.h) + : List of opcodes. Generated from [Python/bytecodes.c](../Python/bytecodes.c) by - [Tools/cases_generator/opcode_id_generator.py](https://github.com/python/cpython/blob/main/Tools/cases_generator/opcode_id_generator.py). + [Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py). 
-* [Objects/](https://github.com/python/cpython/blob/main/Objects/) +* [Objects/](../Objects/) - * [Objects/codeobject.c](https://github.com/python/cpython/blob/main/Objects/codeobject.c) + * [Objects/codeobject.c](../Objects/codeobject.c) : Contains PyCodeObject-related code. - * [Objects/frameobject.c](https://github.com/python/cpython/blob/main/Objects/frameobject.c) + * [Objects/frameobject.c](../Objects/frameobject.c) : Contains the `frame_setlineno()` function which should determine whether it is allowed to make a jump between two points in a bytecode. -* [Lib/](https://github.com/python/cpython/blob/main/Lib/) +* [Lib/](../Lib/) - * [Lib/opcode.py](https://github.com/python/cpython/blob/main/Lib/opcode.py) + * [Lib/opcode.py](../Lib/opcode.py) : opcode utilities exposed to Python. - * [Include/core/pycore_magic_number.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_magic_number.h) + * [Include/core/pycore_magic_number.h](../Include/internal/pycore_magic_number.h) : Home of the magic number (named `MAGIC_NUMBER`) for bytecode versioning. 
@@ -625,7 +595,7 @@ Objects * [Locations](locations.md): Describes the location table * [Frames](frames.md): Describes frames and the frame stack -* [Objects/object_layout.md](https://github.com/python/cpython/blob/main/Objects/object_layout.md): Describes object layout for 3.11 and later +* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later * [Exception Handling](exception_handling.md): Describes the exception table diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index ddbc25a07d3aec..14066a5864b4da 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -68,8 +68,7 @@ Handling Exceptions ------------------- At runtime, when an exception occurs, the interpreter calls -`get_exception_handler()` in -[Python/ceval.c](https://github.com/python/cpython/blob/main/Python/ceval.c) +`get_exception_handler()` in [Python/ceval.c](../Python/ceval.c) to look up the offset of the current instruction in the exception table. If it finds a handler, control flow transfers to it. Otherwise, the exception bubbles up to the caller, and the caller's frame is @@ -78,8 +77,7 @@ repeats until a handler is found or the topmost frame is reached. If no handler is found, then the interpreter function (`_PyEval_EvalFrameDefault()`) returns NULL. During unwinding, the traceback is constructed as each frame is added to it by -`PyTraceBack_Here()`, which is in -[Python/traceback.c](https://github.com/python/cpython/blob/main/Python/traceback.c). +`PyTraceBack_Here()`, which is in [Python/traceback.c](../Python/traceback.c). Along with the location of an exception handler, each entry of the exception table also contains the stack depth of the `try` instruction @@ -175,13 +173,11 @@ which is then encoded as: for a total of five bytes. 
The code to construct the exception table is in `assemble_exception_table()` -in [Python/assemble.c](https://github.com/python/cpython/blob/main/Python/assemble.c). +in [Python/assemble.c](../Python/assemble.c). The interpreter's function to lookup the table by instruction offset is -`get_exception_handler()` in -[Python/ceval.c](https://github.com/python/cpython/blob/main/Python/ceval.c). -The Python function `_parse_exception_table()` in -[Lib/dis.py](https://github.com/python/cpython/blob/main/Lib/dis.py) +`get_exception_handler()` in [Python/ceval.c](../Python/ceval.c). +The Python function `_parse_exception_table()` in [Lib/dis.py](../Lib/dis.py) returns the exception table content as a list of namedtuple instances. Exception Chaining Implementation @@ -190,6 +186,6 @@ Exception Chaining Implementation [Exception chaining](https://docs.python.org/dev/tutorial/errors.html#exception-chaining) refers to setting the `__context__` and `__cause__` fields of an exception as it is being raised. The `__context__` field is set by `_PyErr_SetObject()` in -[Python/errors.c](https://github.com/python/cpython/blob/main/Python/errors.c) -(which is ultimately called by all `PyErr_Set*()` functions). -The `__cause__` field (explicit chaining) is set by the `RAISE_VARARGS` bytecode. +[Python/errors.c](../Python/errors.c) (which is ultimately called by all +`PyErr_Set*()` functions). The `__cause__` field (explicit chaining) is set by +the `RAISE_VARARGS` bytecode. diff --git a/InternalDocs/frames.md b/InternalDocs/frames.md index 8bc0f145ad29de..06dc8f0702c3d9 100644 --- a/InternalDocs/frames.md +++ b/InternalDocs/frames.md @@ -11,19 +11,18 @@ of three conceptual sections: previous frame, etc. The definition of the `_PyInterpreterFrame` struct is in -[Include/internal/pycore_frame.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_frame.h). +[Include/internal/pycore_frame.h](../Include/internal/pycore_frame.h). 
# Allocation Python semantics allows frames to outlive the activation, so they need to be allocated outside the C call stack. To reduce overhead and improve locality of reference, most frames are allocated contiguously in a per-thread stack -(see `_PyThreadState_PushFrame` in -[Python/pystate.c](https://github.com/python/cpython/blob/main/Python/pystate.c)). +(see `_PyThreadState_PushFrame` in [Python/pystate.c](../Python/pystate.c)). Frames of generators and coroutines are embedded in the generator and coroutine objects, so are not allocated in the per-thread stack. See `PyGenObject` in -[Include/internal/pycore_genobject.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_genobject.h). +[Include/internal/pycore_genobject.h](../Include/internal/pycore_genobject.h). ## Layout @@ -90,8 +89,7 @@ frames on the per-thread stack via the linkage fields. If a frame object associated with a generator outlives the generator, then the embedded `_PyInterpreterFrame` is copied into the frame object (see -`take_ownership()` in -[Python/frame.c](https://github.com/python/cpython/blob/main/Python/frame.c)). +`take_ownership()` in [Python/frame.c](../Python/frame.c)). ### Field names diff --git a/InternalDocs/parser.md b/InternalDocs/parser.md index 37e5ca95810f9a..6398ba6cd2838f 100644 --- a/InternalDocs/parser.md +++ b/InternalDocs/parser.md @@ -14,7 +14,7 @@ the original [`LL(1)`](https://en.wikipedia.org/wiki/LL_parser) parser. The code implementing the parser is generated from a grammar definition by a [parser generator](https://en.wikipedia.org/wiki/Compiler-compiler). Therefore, changes to the Python language are made by modifying the -[grammar file](https://github.com/python/cpython/blob/main/Grammar/python.gram). +[grammar file](../Grammar/python.gram). Developers rarely need to modify the generator itself. 
See the devguide's [Changing CPython's grammar](https://devguide.python.org/developer-workflow/grammar/#grammar) @@ -422,21 +422,19 @@ Pegen Pegen is the parser generator used in CPython to produce the final PEG parser used by the interpreter. It is the program that can be used to read the python -grammar located in -[`Grammar/python.gram`](https://github.com/python/cpython/blob/main/Grammar/python.gram) -and produce the final C parser. It contains the following pieces: +grammar located in [`Grammar/python.gram`](../Grammar/python.gram) and produce +the final C parser. It contains the following pieces: - A parser generator that can read a grammar file and produce a PEG parser written in Python or C that can parse said grammar. The generator is located at - [`Tools/peg_generator/pegen`](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen). + [`Tools/peg_generator/pegen`](../Tools/peg_generator/pegen). - A PEG meta-grammar that automatically generates a Python parser which is used for the parser generator itself (this means that there are no manually-written parsers). The meta-grammar is located at - [`Tools/peg_generator/pegen/metagrammar.gram`](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen/metagrammar.gram). + [`Tools/peg_generator/pegen/metagrammar.gram`](../Tools/peg_generator/pegen/metagrammar.gram). - A generated parser (using the parser generator) that can directly produce C and Python AST objects. -The source code for Pegen lives at -[`Tools/peg_generator/pegen`](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen) +The source code for Pegen lives at [`Tools/peg_generator/pegen`](../Tools/peg_generator/pegen) but normally all typical commands to interact with the parser generator are executed from the main makefile. 
@@ -457,15 +455,14 @@ use the Visual Studio project files to regenerate the parser or to execute: ./PCbuild/build.bat --regen ``` -The generated parser file is located at -[`Parser/parser.c`](https://github.com/python/cpython/blob/main/Parser/parser.c). +The generated parser file is located at [`Parser/parser.c`](../Parser/parser.c). How to regenerate the meta-parser --------------------------------- The meta-grammar (the grammar that describes the grammar for the grammar files themselves) is located at -[`Tools/peg_generator/pegen/metagrammar.gram`](https://github.com/python/cpython/blob/main/Tools/peg_generator/pegen/metagrammar.gram). +[`Tools/peg_generator/pegen/metagrammar.gram`](../Tools/peg_generator/pegen/metagrammar.gram). Although it is very unlikely that you will ever need to modify it, if you make any modifications to this file (in order to implement new Pegen features) you will need to regenerate the meta-parser (the parser that parses the grammar files). @@ -491,7 +488,7 @@ Pegen has some special grammatical elements and rules: - Strings with single quotes (') (for example, `'class'`) denote KEYWORDS. - Strings with double quotes (") (for example, `"match"`) denote SOFT KEYWORDS. - Uppercase names (for example, `NAME`) denote tokens in the - [`Grammar/Tokens`](https://github.com/python/cpython/blob/main/Grammar/Tokens) file. + [`Grammar/Tokens`](../Grammar/Tokens) file. - Rule names starting with `invalid_` are used for specialized syntax errors. - These rules are NOT used in the first pass of the parser. @@ -515,8 +512,7 @@ dealing with encoding, interactive mode and much more. Some of these reasons are also there for historical purposes, and some others are useful even today. The list of tokens (all uppercase names in the grammar) that you can use can -be found in thei -[`Grammar/Tokens`](https://github.com/python/cpython/blob/main/Grammar/Tokens) +be found in the [`Grammar/Tokens`](../Grammar/Tokens) file. 
If you change this file to add new tokens, make sure to regenerate the files by executing: @@ -532,9 +528,7 @@ the tokens or to execute: ``` How tokens are generated and the rules governing this are completely up to the tokenizer -([`Parser/lexer`](https://github.com/python/cpython/blob/main/Parser/lexer) -and -[`Parser/tokenizer`](https://github.com/python/cpython/blob/main/Parser/tokenizer)); +([`Parser/lexer`](../Parser/lexer) and [`Parser/tokenizer`](../Parser/tokenizer)); the parser just receives tokens from it. Memoization @@ -567,8 +561,7 @@ To determine whether a new rule needs memoization or not, benchmarking is requir (comparing execution times and memory usage of some considerably large files with and without memoization). There is a very simple instrumentation API available in the generated C parse code that allows to measure how much each rule uses -memoization (check the -[`Parser/pegen.c`](https://github.com/python/cpython/blob/main/Parser/pegen.c) +memoization (check the [`Parser/pegen.c`](../Parser/pegen.c) file for more information) but it needs to be manually activated. Automatic variables @@ -731,7 +724,7 @@ acts in two phases: > (see the [how PEG parsers work](#how-peg-parsers-work) section for more information). You can find a collection of macros to raise specialized syntax errors in the -[`Parser/pegen.h`](https://github.com/python/cpython/blob/main/Parser/pegen.h) +[`Parser/pegen.h`](../Parser/pegen.h) header file. These macros allow also to report ranges for the custom errors, which will be highlighted in the tracebacks that will be displayed when the error is reported. @@ -764,17 +757,15 @@ Generating AST objects ---------------------- The output of the C parser used by CPython, which is generated from the -[grammar file](https://github.com/python/cpython/blob/main/Grammar/python.gram), -is a Python AST object (using C structures). This means that the actions in the -grammar file generate AST objects when they succeed. 
Constructing these objects -can be quite cumbersome (see the [AST compiler section](compiler.md#abstract-syntax-trees-ast) +[grammar file](../Grammar/python.gram), is a Python AST object (using C +structures). This means that the actions in the grammar file generate AST +objects when they succeed. Constructing these objects can be quite cumbersome +(see the [AST compiler section](compiler.md#abstract-syntax-trees-ast) for more information on how these objects are constructed and how they are used by the compiler), so special helper functions are used. These functions are -declared in the -[`Parser/pegen.h`](https://github.com/python/cpython/blob/main/Parser/pegen.h) -header file and defined in the -[`Parser/action_helpers.c`](https://github.com/python/cpython/blob/main/Parser/action_helpers.c) -file. The helpers include functions that join AST sequences, get specific elements +declared in the [`Parser/pegen.h`](../Parser/pegen.h) header file and defined +in the [`Parser/action_helpers.c`](../Parser/action_helpers.c) file. The +helpers include functions that join AST sequences, get specific elements from them or to perform extra processing on the generated tree. @@ -788,11 +779,9 @@ from them or to perform extra processing on the generated tree. As a general rule, if an action spawns multiple lines or requires something more complicated than a single expression of C code, is normally better to create a -custom helper in -[`Parser/action_helpers.c`](https://github.com/python/cpython/blob/main/Parser/action_helpers.c) -and expose it in the -[`Parser/pegen.h`](https://github.com/python/cpython/blob/main/Parser/pegen.h) -header file so that it can be used from the grammar. +custom helper in [`Parser/action_helpers.c`](../Parser/action_helpers.c) +and expose it in the [`Parser/pegen.h`](../Parser/pegen.h) header file so that +it can be used from the grammar. When parsing succeeds, the parser **must** return a **valid** AST object. 
@@ -801,16 +790,15 @@ Testing There are three files that contain tests for the grammar and the parser: -- [test_grammar.py](https://github.com/python/cpython/blob/main/Lib/test/test_grammar.py) -- [test_syntax.py](https://github.com/python/cpython/blob/main/Lib/test/test_syntax.py) -- [test_exceptions.py](https://github.com/python/cpython/blob/main/Lib/test/test_exceptions.py) +- [test_grammar.py](../Lib/test/test_grammar.py) +- [test_syntax.py](../Lib/test/test_syntax.py) +- [test_exceptions.py](../Lib/test/test_exceptions.py) -Check the contents of these files to know which is the best place for new tests, depending -on the nature of the new feature you are adding. +Check the contents of these files to know which is the best place for new +tests, depending on the nature of the new feature you are adding. Tests for the parser generator itself can be found in the -[test_peg_generator](https://github.com/python/cpython/blob/main/Lib/test_peg_generator) -directory. +[test_peg_generator](../Lib/test_peg_generator) directory. Debugging generated parsers @@ -825,8 +813,7 @@ correctly compile and execute Python anymore. This makes it a bit challenging to debug when something goes wrong, especially when experimenting. For this reason it is a good idea to experiment first by generating a Python -parser. To do this, you can go to the -[Tools/peg_generator](https://github.com/python/cpython/blob/main/Tools/peg_generator) +parser. To do this, you can go to the [Tools/peg_generator](../Tools/peg_generator) directory on the CPython repository and manually call the parser generator by executing: ``` @@ -849,9 +836,9 @@ Verbose mode When Python is compiled in debug mode (by adding `--with-pydebug` when running the configure step in Linux or by adding `-d` when calling the -[PCbuild/build.bat](https://github.com/python/cpython/blob/main/PCbuild/build.bat)), -it is possible to activate a **very** verbose mode in the generated parser. 
This -is very useful to debug the generated parser and to understand how it works, but it +[PCbuild/build.bat](../PCbuild/build.bat)), it is possible to activate a +**very** verbose mode in the generated parser. This is very useful to +debug the generated parser and to understand how it works, but it can be a bit hard to understand at first. > [!NOTE] @@ -891,4 +878,5 @@ is being attempted. > **Document history** > > Pablo Galindo Salgado - Original author +> > Irit Katriel and Jacob Coffee - Convert to Markdown
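
A note on the verbose-mode passage touched by the final hunks of `InternalDocs/parser.md`: the following is a minimal sketch, assuming a locally built interpreter, of how that mode might be exercised. It relies on the standard `-d` command-line option, which the CPython command-line documentation describes as turning on parser debugging output in debug builds only. The helper names `is_debug_build` and `run_with_parser_debug` are hypothetical and are not part of this patch.

```python
import os
import subprocess
import sys
import tempfile


def is_debug_build() -> bool:
    """Return True if this interpreter looks like a --with-pydebug build.

    Assumption: debug builds expose sys.gettotalrefcount, release builds do not.
    """
    return hasattr(sys, "gettotalrefcount")


def run_with_parser_debug(source: str) -> subprocess.CompletedProcess:
    """Run `source` in a child interpreter started with the -d option.

    On a release build the option is accepted but produces no parser trace.
    """
    # Write the snippet to a temporary file so the child parses it as a script.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        return subprocess.run(
            [sys.executable, "-d", path],
            capture_output=True,
            text=True,
        )
    finally:
        os.unlink(path)


if __name__ == "__main__":
    result = run_with_parser_debug("x = 1\n")
    print("debug build:", is_debug_build())
    print("exit status:", result.returncode)
    # Show how much trace text was captured rather than dumping it all.
    print("captured characters:", len(result.stdout) + len(result.stderr))
```

Running the child process through `subprocess` keeps the potentially large trace out of the current session, so it can be inspected or discarded as needed.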