You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When having look at #369, it looks like having access to this primitive, with a fast C implementation that uses the known size of each char in utf-8 might be able to make isSubsequenceOf quite fast, particularly when the haystack is quite large and the needle chars are spread far apart.
intfindCharIndex(constchar*haystack, intlength, inttarget) {
char*offset;
if (target<128) {
offset=memchr(haystack, lengthtarget);
} else {
// Implementation left as an exercise to the reader, but it would look for// utf-8 encoded bytes for the Char.offset=efficientlyFindUTF8Codepoint(haystack, length, target);
}
if (offset==NULL) return-1;
return (int)(offset-haystack);
}
Having this also means we could add the rule
{-# RULES "findIndex Char" findIndex (c ==) = findCharIndex c #-}
{-# RULES "findIndex Char2" findIndex (== c) = findCharIndex c #-}
The text was updated successfully, but these errors were encountered:
Thinking about this a bit more, I think there's probably an efficient vectorised algorithm for each of the 2, 3 and 4 byte sequences... now I need to go and teach my self how to write those...
#c on libera.chat mentioned that there is a nonstandard function memmem which appears to be available on linux, macOS and FreeBSD, but not Windows.
Also, implementing this using memchr would be relatively easy, since knowing that the haystack is valid utf-8, and knowing that the first byte determines the length of the match, we could do something like
When having look at #369, it looks like having access to this primitive, with a fast C implementation that uses the known size of each char in utf-8 might be able to make
isSubsequenceOf
quite fast, particularly when the haystack is quite large and the needle chars are spread far apart.Having this also means we could add the rule
The text was updated successfully, but these errors were encountered: