Pre-compute unicode category list for xclasses #647

Draft
wants to merge 1 commit into
base: master

zherczeg
Collaborator

This patch moves a JIT optimization into the compiler. The Unicode general categories (there are 30 of them) are combined into a 32-bit bitset and stored in the xclass instead of a list of properties. This 32-bit value, if present, follows the bitset for the first 256 characters. If the category bitset is 0, it is not stored in the xclass.
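To illustrate the idea, here is a minimal sketch of folding a list of category indices into one 32-bit value. It is not the code from the patch: the names `CATEGORY_COUNT`, `categories_to_bitset`, `bitset_has_category`, and `bitset_is_stored` are hypothetical, and the sketch assumes each category is identified by an index in the range 0..29.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: 30 general categories, as stated in the description. */
#define CATEGORY_COUNT 30u

/* Fold a list of category indices (assumed to be 0..29) into one 32-bit bitset. */
static uint32_t categories_to_bitset(const uint8_t *cats, size_t n)
{
    uint32_t bits = 0;
    for (size_t i = 0; i < n; i++) {
        assert(cats[i] < CATEGORY_COUNT);
        bits |= UINT32_C(1) << cats[i];
    }
    return bits;
}

/* A membership test becomes a single shift and mask instead of a list walk. */
static int bitset_has_category(uint32_t bits, uint8_t cat)
{
    return cat < CATEGORY_COUNT && ((bits >> cat) & 1u) != 0;
}

/* A zero bitset carries no information, so it would not be emitted at all. */
static int bitset_is_stored(uint32_t bits)
{
    return bits != 0;
}
```

With a representation like this, a matcher can test a character's category with one shift and mask, which is presumably why the JIT already built such a value on its own.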

The patch works, but the debug (/B) output changes for the 8-bit UCP case, because the bitset is stored in cranges and the 8-bit UCP case has no cranges.

When all 30 category bits are set, the xclass is converted to allany, or to a class that matches nothing in the negated case.
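The collapse rule can be sketched as follows. Again this is only an illustration under assumptions: `ALL_CATEGORIES_MASK`, `class_kind`, and `simplify_class` are hypothetical names, and the sketch assumes the 30 categories occupy the low 30 bits of the 32-bit value.

```c
#include <stdint.h>

/* Hypothetical mask: the low 30 bits, one per general category. */
#define ALL_CATEGORIES_MASK ((UINT32_C(1) << 30) - 1)

/* Hypothetical result kinds for the simplification step. */
typedef enum { CLASS_KEEP_XCLASS, CLASS_ALLANY, CLASS_NOTHING } class_kind;

/* If every category bit is set, the class matches every character,
   or no character at all when the class is negated. */
static class_kind simplify_class(uint32_t category_bits, int negated)
{
    if ((category_bits & ALL_CATEGORIES_MASK) != ALL_CATEGORIES_MASK)
        return CLASS_KEEP_XCLASS;
    return negated ? CLASS_NOTHING : CLASS_ALLANY;
}
```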

What do you think about this optimization?

@zherczeg
Collaborator Author

Note: this patch can wait until after the next release.

@NWilson
Member

NWilson commented Dec 26, 2024

I like the idea. I'll have to think about it a bit, but it seems OK in principle.

Nice idea.
