Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce AST enum type sizes #3047

Closed
overlookmotel opened this issue Apr 21, 2024 · 6 comments
Closed

Reduce AST enum type sizes #3047

overlookmotel opened this issue Apr 21, 2024 · 6 comments
Assignees
Labels
A-ast Area - AST C-performance Category - Solution not expected to change functional behavior, only performance

Comments

@overlookmotel
Copy link
Contributor

overlookmotel commented Apr 21, 2024

In general, Oxc's AST types which are enums have all their variants boxed. e.g.:

pub enum Expression<'a> {
    BooleanLiteral(Box<'a, BooleanLiteral>),
    NullLiteral(Box<'a, NullLiteral>),
    /* ... */
}

The enum then only weighs 16 bytes, and is used unboxed in other types' fields:

pub struct UnaryExpression<'a> {
    pub argument: Expression<'a>,
    /* ... */
}

There are however some exceptions to this rule, where enums are larger than they need to be because not all variants are boxed, and some variants are larger than others. This results in less compact storage in the arena, due to excess padding.

Boxing the variants in these cases should save on memory. I imagine it may cost a little in performance in the parser due to extra calls to alloc, but in return traversing the AST should be cheaper, as the data is more compact, so less cache misses.

However, I am not 100% sure of that in all cases. What I think should be done for each type is listed below, but we should apply each change individually, and check benchmarks to ensure performance does not degrade.

The affected types are:

ImportDeclarationSpecifier

Variant sizes: 40, 40, 80
Unboxed variants: ImportSpecifier (struct), ImportDefaultSpecifier (struct), ImportNamespaceSpecifier (struct)
Dependents: Vec<ImportDeclarationSpecifier> in ImportDeclaration (struct)
=> Box all variants.

MemberExpression

Variant sizes: 48, 56, 56
Unboxed variants: ComputedMemberExpression (struct), StaticMemberExpression (struct), PrivateFieldExpression (struct)
Dependents: Box<MemberExpression> in Expression (enum), SimpleAssignmentTarget (enum), ChainElement (enum)
=> Not sure. Always used within box as another enum variant. Would be better to box variants and flatten this enum into parent enums, but that's tricky.

ModuleExportName

Variant sizes: 24, 24
Unboxed variants: IdentifierName (struct), StringLiteral (struct)
Dependents:

  • ImportSpecifier (struct)
  • Option<ModuleExportName> in ExportAllDeclaration (struct)
  • ExportDefaultDeclaration (struct)
  • ExportSpecifier (struct)

=> Not sure. Would be good if ExportAllDeclaration::exported could be reduced in size (it's currently Option<ModuleExportName>) but otherwise, fine to leave as is.

JSXAttributeName

Variant sizes: 8, 24
Unboxed variants: JSXIdentifier (struct)
Dependents: JSXAttribute (struct)
=> Probably box all variants.

JSXAttributeValue

Variant sizes: 8, 8, 24, 32
Unboxed variants: StringLiteral (struct), JSXExpressionContainer (struct)
Dependents: Option<JSXAttributeValue> in JSXAttribute
= Probably box all variants.

JSXChild

Variant sizes: 8, 8, 24, 24, 32
Unboxed variants: JSXText (struct), JSXExpressionContainer (struct), JSXSpreadChild (struct)
Dependents: Vec<JSXChild> in JSXElement (struct) and JSXFragment (struct)
=> Box all variants.

JSXElementName

Variant sizes: 8, 8, 24
Unboxed variants: JSXIdentifier (struct)
Dependents: JSXOpeningElement (struct), JSXClosingElement (struct)
=> Probably box all variants.

JSXMemberExpressionObject

Variant sizes: 8, 24
Unboxed variants: JSXIdentifier (struct)
Dependents: JSXMemberExpression (struct)
=> Probably box all variants.

TSEnumMemberName

Variant sizes: 16, 24, 24, 40
Unboxed variants: IdentifierName (struct), StringLiteral (struct), NumericLiteral (struct)
Dependents: TSEnumMember (struct)
=> Box all variants.

TSModuleReference

Variant sizes: 16, 32
Unboxed variants: TSExternalModuleReference (struct)
Dependents: Box<TSModuleReference> in TSImportEqualsDeclaration
=> Box all variants. Replace Box<TSModuleReference> with TSModuleReference in TSImportEqualsDeclaration.

TSTypePredicateName

Variant sizes: 8, 24
Unboxed variants: IdentifierName (struct), TSThisType (struct)
Dependents: TSTypePredicate (struct)
=> Box Identifier variant. Do not box This variant as only 8 bytes (just contains Span).

TSTypeQueryExprName

Variant sizes: 16, 88
Unboxed variants: TSImportType (struct)
Dependents: TSTypeQuery (struct)
=> Box TSImportType variant. Do not box TSTypeName variant, as is another enum.

@overlookmotel overlookmotel added the C-bug Category - Bug label Apr 21, 2024
@overlookmotel
Copy link
Contributor Author

Reason I raise this now is it will simplify #2987, but I believe these changes will be beneficial overall, so worthwhile doing regardless of whether #2987 is a goer or not.

@Boshen
Copy link
Member

Boshen commented Apr 21, 2024

Can you try MemberExpression first because it has the highest usage count.

@overlookmotel
Copy link
Contributor Author

Thanks for coming back. MemberExpression is on my list of "not sure if we should do this one". There's not a huge difference between the variant sizes for that one (48, 56, 56) so upside is fairly small, and only places it's used are Box<MemberExpression> in other enums. A better solution for this one would be to squash the enums (similar to #2847).

Could we start with one of the JSX types first where the savings are likely greater e.g. JSXAttributeValue or JSXChild? Which is the most commonly used of these JSX types?

@overlookmotel overlookmotel added C-performance Category - Solution not expected to change functional behavior, only performance A-ast Area - AST and removed C-bug Category - Bug labels Apr 21, 2024
@Boshen
Copy link
Member

Boshen commented Apr 21, 2024

Could we start with one of the JSX types first where the savings are likely greater e.g. JSXAttributeValue or JSXChild? Which is the most commonly used of these JSX types?

Sounds like a plan. More frequent usages to less.

Boshen pushed a commit that referenced this issue Apr 22, 2024
Part of #3047.

As with #3058, it's hard to interpret the benchmark results here. But in
this case I think it's easier to see from "first principles" that this
should be an improvement - `ImportSpecifier` is pretty massive (80
bytes) vs `ImportDefaultSpecifier` (40 bytes), and the latter (e.g.
`import React from 'react'`) is common in JS code.
Boshen pushed a commit that referenced this issue Apr 22, 2024
Box all enum variants for JSX types (`JSXAttributeName`,
`JSXAttributeValue`, `JSXChild`, `JSXElementName`,
`JSXMemberExpressionObject`). Part of #3047.

I'm not sure how to interpret the benchmark results. As I said on #3047:

> I imagine it may cost a little in performance in the parser due to
extra calls to `alloc`, but in return traversing the AST should be
cheaper, as the data is more compact, so less cache misses.

Sure enough, there is a small impact (1%) on the 2 parser benchmarks for
JSX files. However, the other benchmarks have too much noise in them to
see whether this is repaid in a speed up on transformer etc, especially
as the transformer benchmarks also include parsing.

What do you think @Boshen?
Boshen pushed a commit that referenced this issue Apr 22, 2024
Similar to #3058 and #3061 it is a continuation of #3047.

Handles these enum types:

> TSEnumMemberName
> Variant sizes: 16, 24, 24, 40
> Unboxed variants: IdentifierName (struct), StringLiteral (struct),
NumericLiteral (struct)
> Dependents: TSEnumMember (struct)
> => Box all variants.
>
> TSModuleReference 
> Variant sizes: 16, 32
> Unboxed variants: TSExternalModuleReference (struct)
> Dependents: Box<TSModuleReference> in TSImportEqualsDeclaration
> => Box all variants. Replace Box<TSModuleReference> with
TSModuleReference in TSImportEqualsDeclaration.
>
> TSTypePredicateName 
> Variant sizes: 8, 24
> Unboxed variants: IdentifierName (struct), TSThisType (struct)
> Dependents: TSTypePredicate (struct)
> => Box Identifier variant. Do not box This variant as only 8 bytes
(just contains Span).
>
> TSTypeQueryExprName 
> Variant sizes: 16, 88
> Unboxed variants: TSImportType (struct)
> Dependents: TSTypeQuery (struct)
> => Box TSImportType variant. Do not box TSTypeName variant, as is
another enum.
@Boshen
Copy link
Member

Boshen commented Apr 22, 2024

All done?

@overlookmotel
Copy link
Contributor Author

Yes. Thanks for merging all the PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST C-performance Category - Solution not expected to change functional behavior, only performance
Projects
None yet
Development

No branches or pull requests

2 participants