Pre-compute unicode category list for xclasses #647

zherczeg · 2024-12-26T12:36:59Z

This patch moves a JIT optimization into the compiler. The Unicode general categories (there are 30 of them) are combined into a 32-bit bitset, which is stored in the xclass instead of a list of properties. These 32 bits, if present, follow the bitset for the first 256 characters. If the category bitset is 0, it is not stored in the xclass.
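As a rough illustration of the idea (not the code in this patch), a list of category properties can be folded into one 32-bit mask, and a match then becomes a single shift and AND. The function names and `CATEGORY_COUNT` are hypothetical, and the sketch assumes categories are numbered 0 to 29:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the 30 Unicode general categories, numbered 0..29. */
#define CATEGORY_COUNT 30u

/* Fold a list of category indices into one 32-bit mask (one bit per category). */
uint32_t build_category_mask(const uint8_t *categories, size_t count)
{
  uint32_t mask = 0;
  for (size_t i = 0; i < count; i++) {
    assert(categories[i] < CATEGORY_COUNT);
    mask |= UINT32_C(1) << categories[i];
  }
  return mask;
}

/* Return 1 if the given category is in the mask, 0 otherwise. */
int category_matches(uint32_t mask, uint8_t category)
{
  assert(category < CATEGORY_COUNT);
  return (int)((mask >> category) & 1u);
}
```

With a mask like this, an empty property list yields 0, which corresponds to the case where nothing needs to be stored.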

The patch works, but the debug (/B) output changes in the 8-bit UCP case, because the bitset is stored in cranges, and the 8-bit UCP case has no cranges.

When all 30 category bits are set, the xclass is converted to allany, or to nothing in the negated case.
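The full-mask check could look something like the sketch below. The enum, the function name, and `ALL_CATEGORIES_MASK` are hypothetical names for illustration and are not taken from the patch:

```c
#include <stdint.h>

/* Hypothetical: the low 30 bits set, one per Unicode general category. */
#define ALL_CATEGORIES_MASK ((UINT32_C(1) << 30) - 1)

/* Hypothetical result kinds for the simplification step. */
enum class_kind { CLASS_KEEP_XCLASS, CLASS_ALLANY, CLASS_NOTHING };

/* If every category bit is set, the class matches every character,
   so it collapses to allany, or to nothing when the class is negated. */
enum class_kind simplify_class(uint32_t category_mask, int negated)
{
  if ((category_mask & ALL_CATEGORIES_MASK) != ALL_CATEGORIES_MASK)
    return CLASS_KEEP_XCLASS;
  return negated ? CLASS_NOTHING : CLASS_ALLANY;
}
```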

What do you think about this optimization?

zherczeg · 2024-12-26T12:37:24Z

Note: this patch can wait until after the next release.

NWilson · 2024-12-26T23:34:30Z

I like the idea. I'll have to think about it a bit, but it seems OK in principle.

Nice idea.

Pre-compute unicode category list for xclasses

c537dcf
