Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST visitor with bottom-up reading WIP #3152

Closed
wants to merge 7 commits into from
Closed

Conversation

overlookmotel
Copy link
Contributor

@overlookmotel overlookmotel commented May 2, 2024

AST traversal with ability to:

  1. Mutate downwards - mutate current node or children.
  2. Read upwards - read parents/ancestors.

NB: Unlike the cell-based approach (#2987) this does NOT allow mutating parents/ancestors, only reading them.

Please see last commit for an implementation of transform-react-display-name which uses this ability, and passes the final failing test for that plugin.

#2987 will work eventually, but it's proving a quagmire to implement for whole AST, so I thought I'd try this other approach to get something working faster. In any case, perhaps reading up is all we need, and #2987's ability to mutate parents is not actually required to implement transformers?

Notes

  • VisitMut is replaced by Traverse.
  • visit_* functions are replaced by enter_* and exit_*.
  • Visitors do not call walk_* themselves. walk_* drives the process, and calls enter_* and exit_* itself.
  • A side-effect of this is that multiple transforms will be easier to compose together (e.g. combine React + TS transforms into one).
  • Traverse trait, Ancestor and walk_* functions are generated by a codegen currently written in JS (for speed of implementation). We should rewrite it in Rust as a build script if we want to go forwards with this.
  • #[visited_node] attr acts as a marker for the codegen, telling it which types to generate visitors for. It can likely be removed by making the codegen smarter.

Performance

Performance is not great. -4% on transformer benchmarks. However, please note that transformer benchmarks run the whole process including parsing, and parsing takes up ~80% of the bench run time. Therefore a -4% drop on the transformer benchmark means that the transformer itself has got about 20% slower.

But...

  1. Implementation can probably be optimized to reduce this perf cost.
  2. Maybe it is worth it for the ability to walk up?
  3. We may be able to use the read-upwards ability to implement transforms more efficiently, and win that performance back.
  4. Actually I'm not sure if the measure is fair.

(4) is the big one. To explain:

I noticed that VisitMut doesn't actually fire visitors everywhere it should (I had to make changes to one of the TS transforms because previously VisitMut wasn't calling transform_identifier_reference for IdentifierReferences in TSInterfaceHeritage or TSTypeReference).

In comparison, Traverse is codegen-ed, so it performs visitation in a completely exhaustive fashion. Nothing gets missed.

So the majority of the perf hit may just be that it's visiting all the AST, where VisitMut was missing some - but that's a feature not a bug!

Draft status

This PR is not entirely complete. Still some stuff to fix around lifetimes, and would like to test it more exhaustively with Miri. But I think it's in a fit state for comments at this point.

@overlookmotel overlookmotel marked this pull request as draft May 2, 2024 15:09
@github-actions github-actions bot added A-ast Area - AST A-transformer Area - Transformer / Transpiler labels May 2, 2024
Copy link

codspeed-hq bot commented May 2, 2024

CodSpeed Performance Report

Merging #3152 will degrade performances by 3.5%

Comparing traverse-immut (2f14e2a) with main (f88f330)

Summary

❌ 1 regressions
✅ 29 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main traverse-immut Change
transformer[pdf.mjs] 105 ms 108.9 ms -3.5%

@overlookmotel overlookmotel force-pushed the traverse-immut branch 3 times, most recently from b5ee93c to ce282cc Compare May 3, 2024 11:02
@overlookmotel overlookmotel changed the title Immutable bottom-up traverse WIP AST visitor with bottom-up reading WIP May 3, 2024
@Boshen Boshen requested review from Boshen, milesj, rzvxa and Dunqing May 3, 2024 15:21
Copy link
Contributor

@rzvxa rzvxa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following this PR while it was in progress so here are my initial thoughts on it.

I'm good with this one if we look at it as a stepping-stone toward the GCell version, Otherwise, I think the top-down is better than this Since while top-down approach is much more limited than having a 2-way traversal - either mutable or immutable - It is much cleaner.

About the performance drop, I'm sure we can make it better(not sure by how much). Still, one of the reasons we pursued this bottom-up approach was to lower the mispredicted branches and increase performance along with the API ergonomics. We have to stay focused on the initial goals(although we might want to compromise some of those goals just to push something out).

PS: I suggest that if we can extend our time window for this feature, We should do it. We may never be able to get it perfect first try, But more time in the oven can reduce the friction of refactoring in the future.

PS2: I believe even with a -20% performance we are still doing better than swc since we don't do multiple passes on the AST.

Copy link
Member

@Boshen Boshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel comfortable to work on this.

@Boshen Boshen requested review from mysteryven and magic-akari May 3, 2024 16:18
@Boshen
Copy link
Member

Boshen commented May 3, 2024

Requesting a review from everyone who is going to work on the transformer in the near future.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented May 3, 2024

Thanks both for reviewing.

Just to say, this is not ready to merge yet.

I think it's sound - Miri gives it a pass on the few conformance tests that I've tried - but I would like to test it much more exhaustively to make sure. This kind of subverts Rust's usual borrowing rules. I think it's bending the rules rather than breaking them, but I'm not 100% sure, and it needs more testing to be confident. There is a chance it may turn out not to be viable.

Also, some of the lifetimes I don't think are correct currently, which definitely needs addressing, or it could lead to illegal reference aliasing.

But, since there's some positive interest, I'll continue cleaning it up and hardening it next week.

@Boshen 3 questions:

  1. Would you be happy to merge even with the codegen written in JS? It should be rewritten in Rust as a build script, but not sure if that needs to happen now, or can be pushed until later.
  2. Do you have any thoughts on the missing visitors in VisitMut that I mentioned above? Is this intentional? Or just the peril of it being written by hand, and so inevitably containing some omissions?
  3. Are you not worried by the perf drop?

@Dunqing
Copy link
Member

Dunqing commented May 3, 2024

I noticed that VisitMut doesn't actually fire visitors everywhere it should (I had to make changes to one of the TS transforms because previously VisitMut wasn't calling transform_identifier_reference for IdentifierReferences in TSInterfaceHeritage or TSTypeReference).

Previously, VisitMut only added the transform_xxxxx that we needed since we will remove all ts-only code when codegen, we don't have to care about TSInterfaceHeritage and TSTypeReference.

Your change is correct, when I'm not sure here is necessary. Since the typescript plugin removes all ts-only code first, maybe no TSInterfaceHeritage or TSTypeReference will call transform_identifier_reference, but it doesn't matter, we can optimize it after the merge

@Boshen
Copy link
Member

Boshen commented May 4, 2024

Would you be happy to merge even with the codegen written in JS? It should be rewritten in Rust as a build script, but not sure if that needs to happen now, or can be pushed until later.

I'm fine with it, we can move it to a Rust task later. We can bikeshed this later.

Do you have any thoughts on the missing visitors in VisitMut that I mentioned above? Is this intentional? Or just the peril of it being written by hand, and so inevitably containing some omissions?

It was written by hand so had missing parts.

Are you not worried by the perf drop?

Nope, the transformer started out from not doing much work, I expect things to slow down a bit.

@overlookmotel overlookmotel force-pushed the traverse-immut branch 7 times, most recently from a1e0966 to 05ca458 Compare May 5, 2024 13:04
@overlookmotel
Copy link
Contributor Author

Previously, VisitMut only added the transform_xxxxx that we needed since we will remove all ts-only code when codegen, we don't have to care about TSInterfaceHeritage and TSTypeReference.

Your change is correct, when I'm not sure here is necessary. Since the typescript plugin removes all ts-only code first, maybe no TSInterfaceHeritage or TSTypeReference will call transform_identifier_reference, but it doesn't matter, we can optimize it after the merge

Thanks for coming back @Dunqing. Ah ha that makes sense. I suppose it's possible for some future use cases you might want to keep TS syntax present in AST - it could be useful for optimizations in a minifier, for example. But for now, yes, there's no point. As you say, we could hash that out a bit later.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented May 5, 2024

OK, I think this is good to go now.

However, I don't think this PR should be merged as is. I've submitted the first part as #3169 - just the oxc_traverse crate by itself, without applying it to the transformer. In my opinion, that could be merged now, and then we can iterate to add scopes to it before merging the rest.

I wanted to split it up and submit all the parts as a Graphite stack, but don't seem to be able to use Graphite now - it sends me to a "billing" page. @Boshen did they start charging open source?

Changes since previous version

Soundness

There were indeed some soundness issues, but have ironed them out. In truth, there may be more - there's some code I'm not completely sure of - but we'll find out once we are using it and have a wide range of usage to run Miri on. If it does turn out there is still some unsoundness, I don't think it's a show-stopper, as I'm pretty confident that I know how to fix it.

Docs

I've added lengthy doc comments explaining how the scheme works. The implementation is sprawling, but the core concept is quite simple, so it should be comprehensible. If it's not, then that's my fault for not writing clearly enough. Please let me know whether what I've written makes sense or not, and I'll try again if it doesn't.

Crates

It's a lot of code, so I've moved it into it's own crate oxc_traverse, so as not to slow down compilation for everything else which depends on oxc_ast but doesn't need traversal.

I've also made a separate crate oxc_ast_macros for the #[visited_node] macro. That macro is now a no-op - it just passes through the input stream unchanged, so doesn't depend on syn. This removes the need for oxc_ast to depend on oxc_macros.

@overlookmotel
Copy link
Contributor Author

By the way, I realized today that this scheme can be extended without too much effort to also allow mutating upwards (with restriction that you can't mutate parent, only siblings).

But I'd like to mull that a bit more, and do it as a follow-up.

Boshen pushed a commit that referenced this pull request May 6, 2024
First part of #3152.

This adds a crate `oxc_traverse`, but doesn't connect it up to the
transformer or anything else yet.

I think we could merge this now - as it doesn't affect any code that's
in use - and then iterate on it to add scopes before using it in
transformer. Please see
#3152 (comment) for
the broader picture.
@overlookmotel
Copy link
Contributor Author

Superceded by #3169, #3182 and #3183 which contain same commits as this PR, but split up into a Graphite stack.

@overlookmotel overlookmotel deleted the traverse-immut branch May 6, 2024 18:47
Dunqing pushed a commit that referenced this pull request May 8, 2024
Sliced off from #3152.

This switches the transformer over to use `Traverse` instead of
`VisitMut`.

This is incomplete - scopes are not implemented yet. At present, no
transforms use scopes anyway, so all tests pass, but regardless I don't
think should be merged until the implementation is complete.
Dunqing pushed a commit that referenced this pull request May 8, 2024
…up lookup (#3183)

Sliced off from #3152.

Re-implement `transform-react-display-name` using bottom-up lookup
provided by `Traverse` trait.

This fixes the 1 remaining failing test case for this plugin (see
#2937).

`Traverse` is not complete yet (see #3182), so this is also not ready to
merge yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST A-transformer Area - Transformer / Transpiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants