-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Atom<'a>
and CompactString
nice and fast
#46
Comments
I'm actually not sure how much of an optimization inlining will be. The main advantage of Maybe it will still be a solid gain due to cache locality and faster string comparison, but I think we'd need to measure to know either way. That could inform what ideal size is - make it bigger to inline more often vs smaller to reduce size of AST. |
After some consideration, I believe we will be satisfied with the current implementation for a significant period. Although it's tempting to pursue further optimization, we should direct our energy toward endeavors with broader impact.
|
OK cool. I do have a WIP already, so may finish it up when I get a chance, but yeah, the main problems (memory leak and unsoundness) are now solved. |
Reopening since you have a WIP. |
I think I got what you mean now.
Smart. |
Yes,
So, yes, it's an O(1) operation (and a very cheap one) converting an inline However, if the string is longer than 16 bytes, conversion is O(n) because the string content will need to be copied to general heap, so that But my guess is that the majority of strings will be 16 bytes or less (because they are predominantly identifiers, which tend to be quite short in JS), so it'll be O(1) mostly. If OXC transformer/semantic/etc can just use |
First step towards #2516. This replaces `compact_str::CompactString` with an immutable interface `CompactStr`. Currently just implemented as a wrapper around `CompactString` which hides all its mutation methods. A more optimized implementation to follow, which shrinks size of `CompactStr` to 16 bytes by removing the `capacity` field. The rationale for the change of name is: `CompactString` is like `String` in that it's mutable. `CompactStr` is more like `str` - immutable - so its name mirrors `str`.
I'd like to mention that:
|
@hyf0 Just to expand on what you've said:
"Inline" means any string 16 bytes or less. The "n" in "O(n)" is the length of the string. Only difference between It is unavoidable that there's a cost to copying strings from general heap to arena, or in the other direction - and that cost increases the longer the string is. The inline optimization is intended to ameloriate this cost for the common case - strings under 16 bytes - which I imagine covers most JS identifiers (but please tell me if your experience shows otherwise). OXC's AST cannot contain types which own heap allocations, due its use of an arena allocator (discussed on rolldown/rolldown#427). For consumers who really want to maximize performance, the best route is to work with NB: The above refers to |
@hyf0 Just to add: The use of an arena allocator in OXC, and all the lifetimes it necessitates, is an unusual pattern in Rust. Personally I see it as a feature not a bug - it's extremely performant, but beyond that, OXC essentially managing its own memory also enables some really significant things which aren't otherwise possible. However, at present it can be really painful for the consumer. I think we could better document patterns for interfacing with the allocator and AST which take advantage of it, and make the lifetimes (almost) painless. Such patterns do exist, but they're not immediately obvious. So in short, from what I understand, the problem that you're facing is very real. But there may be a different solution from the one you're looking for with |
I was wondering if we could make the string type generic like Use Use |
This is not my decision to make, but my view is this isn't a great solution for 2 reasons: Firstly, it's not just This would be very unergonomic, and would make the API pretty horrible for other external consumers of OXC. So much so, that the solution I've proposed for handling different Secondly, if your motivator is performance, my guess is that it wouldn't help with that. Yes, the string type wouldn't require conversions, so that's a gain, but on the other side, the whole AST would have to become It seems I have ended up being the guy who just says "no" all the time! That's not my nature, and it gives me no pleasure. I totally get why you're looking at every angle to find a solution, but it's just that I don't think this circle can be squared in the way you want. It's a fundamental limitation/feature/trade-off in the nature of the arena allocator which OXC uses. More positively, can I suggest:
I'm afraid I wouldn't be able to do either of those things in the near future, but if we can put this on the back burner for a while, I'd be very happy to do that. In meantime, if you have time, could you possibly outline in a bit more detail exactly what problem you're hitting? |
Cool. I wasn't saying we have to do this way but just tried giving any thought for inspiration.
Yeah it may be. But since we are already dealing with lifetime. I guess if we do it this way, it's just another dx problem that need to be handled. For example, most of users would handle the default ast type like Again, These are thoughts just for inspiration, not we have to do it.
Yeah. You could see If the costing is on-demand(only when you used owed-string type) and the empty
Totally, I'm not hurry for this but try to have collision of ideas, it's hard for me to find people to discuss such things.
This would be wonderful. Since this is not an urgent problem, we could wait until the proper timing. Like your actions of pursuing perfection on oxc. I learn a lot of practical knowledge on rust from that. |
OK, let's leave the broader issue of owned/borrowed strings for time being and come back to it when we both have time to really get into the discussion. Just to come back on a couple of points briefly:
Yes, you're right. But what I'm saying is that in your use case, using the owned string type variant, you would have a
I have the same problem! Always very happy to discuss.
OK great. Agreed. By the way, it's very kind of you to say that. In return, congratulations on getting Rolldown in shape to be be open sourced in such a short time. Brilliant work. |
Not working on this at present, but just wanted to make a note of a thought I had, before it gets lost in the ether... I am wondering if we should have two different strings types:
It strikes me that the 2 are used in quite different ways. Identifiers are constantly being compared to each other to calculate scopes etc, and tend to be small. Other strings (e.g. string literals) can be much larger, and are generally just moved around rather than compared. So maybe we could have 2 implementations optimized for the 2 different use cases. In particular fast hashing is likely a priority for identifiers, inlining is probably worthwhile, and maybe even interning. But all these things would be much less useful for string literals etc, so maybe they should just be a wrapper around If we did go with interning, maybe identifier strings could even be stored in a separate long-lived arena. So individual ASTs can come and go, but the identifiers stay. Maybe this could help with @hyf0's use case, even in the presence of Rolldown's incremental compilation. Obviously, multi-threading also comes into play, so this isn't simple. But maybe... Just a thought... |
Good thought! Sounds like an advanced version of https://github.com/rust-analyzer/smol_str. |
A thought to come back to later:
|
This is interesting: xacrimon/dashmap#287 We are commonly hashing |
Further thought: I think we need a separate type
Anatomy of an
|
Also look into this for string comparisons: https://crates.io/crates/fastcmp Unclear how much faster than |
This is a continuation of
Atom
toAtom<'a>
to make it safe oxc#2497Atom
safe and faster oxc#2295Where the requirements are:
Make our own version of
Atom<'a>
andCompactString
.These 2 types should be immutable and inline-able. The size of these two types (either 16 or 32 bytes) should be dependent on real world statistics of the length of identifiers.
The text was updated successfully, but these errors were encountered: