-
-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: AST Traverse #2987
WIP: AST Traverse #2987
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
CodSpeed Performance ReportMerging #2987 will not alter performanceComparing Summary
|
20b2746
to
df95968
Compare
@rzvxa So how is this working exactly? Does |
It would modify and validate our ast nodes and generate their appropriate traversable versions, mutation wrappers, etc. |
5cf4899
to
9736d4d
Compare
crates/oxc_ast/src/lib.rs
Outdated
@@ -58,15 +60,15 @@ fn size_asserts() { | |||
use crate::ast; | |||
use oxc_index::assert_eq_size; | |||
|
|||
assert_eq_size!(ast::Statement, [u8; 16]); | |||
assert_eq_size!(ast::Statement, [u8; 24]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's causing these increases exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing the representation from rust to C
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are possible workarounds to reverse the type size inflation, if we want to prioritize it. It does affect the benchmarks a little (-1% on parser benchmarks), but to be honest I expected it to be much worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was also expecting a hit in the performance; Not having any tangible performance drop is a sentiment to how well modern CPUs are in pre-fetching and pipelining the data.
5321829
to
1f96a8c
Compare
I've started going through this. So far just been tidying up a little. Have pushed a few commits to remove changes which aren't relevant to the thrust of this PR, and submitted a couple as separate PRs (#3026, #3027). Aim is to reduce the diff a little, and make it a bit easier to follow the substantive changes. Will start going through the meat of it soon as I can. |
1f96a8c
to
03d38b2
Compare
@rzvxa It's starting to come into focus for me now. I attempted to make it work without changing the AST further, but unfortunately can't find a way to avoid soundness issues without all enum variants being boxed, and other changes (my failed attempt to get My conclusion is: If there's a
Unfortunately, that will involve quite a lot of changes to the AST. NB: A variant on my failed attempt above should work in some cases, as long as the type is always in a fixed position in all other types (i.e. not within I've added a commit to use layout_inspect on the AST, which allows running queries on the AST types: node tasks/inspect_ast Do have a look if you're interested. Clocking off for tonight now, but will continue tomorrow. |
d2a9bab
to
c2480a5
Compare
PS: I also made some bug fixes and improvements to demo impl (the last 6 commits here: https://github.com/overlookmotel/ast-cell-test/commits/main/), which we should apply to this full implementation at some point. |
a2cdd65
to
481b301
Compare
28f1313
to
15ae0e7
Compare
ecb8159
to
4da2a3a
Compare
A lot of work went into this, and I would like to save it somewhere. It's possible we want to come back to this later. I'll save it in a branch and then will close again. |
OK, this is a big one... I have done this as part of work on Traversable AST, but I believe it has wider benefits, so thought better to spin it off into its own PR. ## What this PR does This PR squashes all nested AST enum types (oxc-project#2685). e.g.: Previously: ```rs pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ Declaration(Declaration<'a>), } pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>), /* ...other Declaration variants... */ } ``` After this PR: ```rs #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>) = 0, /* ...other Statement variants... */ VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } #[repr(C, u8)] pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } ``` All `Declaration`'s variants are combined into `Statement`, but `Declaration` type still exists. As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a `Declaration` can be transmuted to a `Statement` at zero cost. This is the same thing as oxc-project#2847, but here applied to *all* nested enums in the AST, and with improved helper methods. No enums increase in size, and a few get smaller. Indirection is reduced for some types (this removes multiple levels of boxing). ## Why? 1. It is a prerequisite for Traversable AST (oxc-project#2987). 2. It would help a lot with AST Transfer (oxc-project#2409) - it solves the only remaining blocker for this. 3. It is a step closer to making the whole AST `#[repr(C)]`. ## Why is it a good thing for the AST to be `#[repr(C)]`? Oxc's direction appears to be increasingly to build up control over the fundamental primitives we use, in order to unlock performance and features. We have our own allocator, our own custom implementations for `Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central building block of Oxc, and taking control of its memory layout feels like a step in this same direction. Oxc has a major advantage over other similar libraries in that it keeps all the AST data in an arena. This opens the door to treating the AST either as Rust types or as *pure data* (just bytes). That data can be moved around and manipulated beyond what Rust natively allows. However, to enable that, the types need to be well-specified, with completely stable layouts. `#[repr(C)]` is the only tool Rust provides to do this. Once the types are `#[repr(C)]`, various features become possible: 1. Cheap transfer of the AST across boundaries without ser/deser - the property used by AST Transfer. 2. Having multiple versions of the AST (standard, read-only, traversable), and these AST representations can be converted to one other at zero cost via transmute - the property used by Traversable AST scheme. 3. Caching AST data on disk (oxc-project#3079) or transferring across network. 4. Stuff we haven't thought of yet! Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind). ## The problem with `#[repr(C)]` It's not *required* to squash nested enums to make the AST `#[repr(C)]`. But the problem with `#[repr(C)]` is that it disables some compiler optimizations. Without `#[repr(C)]`, the compiler squashes enums itself in some cases (which is how `Statement` is currently 16 bytes). But making the types `#[repr(C)]` as they are currently disables this optimization. So this PR essentially makes explicit what the compiler is already doing - and in fact goes a bit further with the optimization than the compiler is able to, in squashing 3 or 4 layers of nested enums (the compiler only does up to 2 layers). ## Implementation One enum "inheriting" variants from another is implemented with `inherit_variants!` macro. ```rs inherit_variants! { #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ // `Declaration` variants added here by `inherit_variants!` macro @inherit Declaration // `ModuleDeclaration` variants added here by `inherit_variants!` macro @inherit ModuleDeclaration } } ``` The macro is *fairly* lightweight, and I think the above is quite easy to understand. No proc macros. The macro also implements utility methods for converting between enums e.g. `Statement::as_declaration`. These methods are all zero-cost (essentially transmutes). New patterns for dealing with nested enums are introduced: Creation: ```rs // Old let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl)); // New let stmt = Statement::VariableDeclaration(var_decl); ``` Conversion: ```rs // Old let stmt = Statement::Declaration(decl); // New let stmt = Statement::from(decl); ``` Testing: ```rs // Old if matches!(stmt, Statement::Declaration(_)) { } if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { } // New if stmt.is_declaration() { } if matches!(stmt, Statement::ImportDeclaration(_)) { } ``` Branching: ```rs // Old if let Statement::Declaration(decl) = &stmt { decl.do_stuff() }; // New if let Some(decl) = stmt.as_declaration() { decl.do_stuff() }; ``` Matching: ```rs // Old match stmt { Statement::Declaration(decl) => visitor.visit(decl), } // New (exhaustive match) match stmt { match_declaration!(Statement) => visitor.visit(stmt.to_declaration()), } // New (alternative) match stmt { _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()), } ``` New syntax has pluses and minuses vs the old. `match` syntax is worse, but when working with a deeply nested enum, the code is much nicer - it's shorter and easier to read. This PR removes 200 lines from the linter with changes like this: https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95 ```diff - let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) = - &assignment_expr.left - else { - return; - }; - let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) = - simple_assignment_target + let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left else { return; }; ```
Do you think we could close this for housekeeping reasons? It seems like this PR has drifted too far from our current trunk, So I guess what I'm trying to ask here is do you still believe that the code here can be useful at some point? |
Yes sorry, I requested it be kept, and then forgot about it. I've merged #3149 into this, and saved it on traversable-ast branch. I don't know for sure if we'll need it, but there was a lot of interesting stuff in this PR, so might want to go back to it/pull some stuff out if it later on. Closing this PR. |
No description provided.