We might be able to detect UUIDs by giving them their own lexer rules.
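As a rough illustration of what a dedicated UUID rule could check, here is a minimal sketch. It assumes only the canonical hyphenated form (groups of 8, 4, 4, 4 and 12 hex digits); `looks_like_uuid` is a hypothetical name, not an existing function in the lexer, and other ID formats such as KSUID would need their own rule.

```rust
/// Hypothetical helper: true if `word` has the canonical textual UUID shape,
/// i.e. five hyphen-separated groups of 8, 4, 4, 4 and 12 hex digits.
fn looks_like_uuid(word: &str) -> bool {
    const GROUP_LENS: [usize; 5] = [8, 4, 4, 4, 12];
    let groups: Vec<&str> = word.split('-').collect();
    groups.len() == GROUP_LENS.len()
        && groups
            .iter()
            .zip(GROUP_LENS.iter())
            // Each group must have the expected length and contain only hex digits.
            .all(|(group, &len)| {
                group.len() == len && group.bytes().all(|b| b.is_ascii_hexdigit())
            })
}
```

A token that passes this check could be skipped by the spell checker instead of being split into word-like fragments.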
I've started to think that things like 0xabg should be flagged as "potentially bad hex"... maybe? Or just flagged like anything else that doesn't match any lex pattern.
I'm starting to think the hex lexer should be something like 0x[0-9a-fA-F]+[0-9a-zA-Z] and then try to convert the match to an int. If the conversion succeeds, it's considered hex and is not flagged. If the conversion fails, the whole thing is flagged like any other unknown word. If it's a username or such, the user can add it to the dictionary.
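To make the "lex loosely, then try to convert" idea concrete, here is a minimal sketch using only the standard library. The `Token` enum and `classify` function are hypothetical names for illustration. It assumes the lexer has already isolated a candidate token, so it skips the regex step and goes straight to the conversion.

```rust
/// Outcome of classifying a candidate token (hypothetical type).
#[derive(Debug, PartialEq)]
enum Token {
    /// Converted successfully to an integer: treated as hex, never flagged.
    Hex(u64),
    /// Anything else: handled like any other unknown word.
    Word(String),
}

/// Treat the token as hex only if everything after the `0x` prefix converts.
fn classify(token: &str) -> Token {
    let parsed = token
        .strip_prefix("0x")
        .or_else(|| token.strip_prefix("0X"))
        // from_str_radix accepts a leading '+', which a hex literal should not have.
        .filter(|digits| !digits.starts_with('+'))
        .and_then(|digits| u64::from_str_radix(digits, 16).ok());
    match parsed {
        Some(value) => Token::Hex(value),
        None => Token::Word(token.to_string()),
    }
}
```

With this, `classify("0xab")` yields `Token::Hex(171)`, while `classify("0xabg")` falls through to `Token::Word` and goes down the normal unknown-word path.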
The one thing I'm not sure about is hex bigger than a u64, but I think that might be a problem with the number lexer atm too? Haven't checked...
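On the u64 concern: with the standard library, `u64::from_str_radix` returns an error for values that do not fit, so a conversion-based check would treat oversized but valid hex as an unknown word. One possible workaround, sketched below with hypothetical helper names, is to validate the digits directly instead of relying on the conversion. Whether the existing number lexer has the same limitation is not verified here.

```rust
use std::num::IntErrorKind;

/// Hypothetical helper: true if `digits` is a non-empty run of hex digits,
/// no matter how large the value would be.
fn is_hex_digits(digits: &str) -> bool {
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_hexdigit())
}

/// True if the conversion fails only because the value does not fit in a u64.
fn overflows_u64(digits: &str) -> bool {
    match u64::from_str_radix(digits, 16) {
        Err(err) => matches!(err.kind(), IntErrorKind::PosOverflow),
        Ok(_) => false,
    }
}
```

An 18-digit run of `f` is valid hex according to `is_hex_digits`, but `overflows_u64` reports that it cannot be converted, which is exactly the case a conversion-only check would misclassify.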
I'm opening this issue to avoid future problems.
Please be careful about how things like these are detected.
Strings might contain generated values such as a password or a UUID (like UUID v7, UUID v6, KSUID).
But then
Most libraries and tools, like typo and codespell, have such detection.
There are tools and libraries for such detection.
https://github.com/ccojocar/randdetect (a lib I know in Go)
You will have to dig a bit deeper.
Another way to detect them is to look at the string length and check for space separators.
Some libraries have minimum/maximum string-size parameters to avoid issues.
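A minimal sketch of the length-and-separator heuristic could look like the following. The `Limits` struct, its field names, and the bounds used in the usage note are assumptions for illustration, not taken from any of the tools mentioned above.

```rust
/// Hypothetical length bounds for strings that may be generated tokens.
struct Limits {
    min_len: usize,
    max_len: usize,
}

/// True if `s` contains no whitespace and its length is within the bounds,
/// which makes it a candidate for a generated string (password, ID, ...).
fn may_be_generated(s: &str, limits: &Limits) -> bool {
    let len = s.chars().count();
    len >= limits.min_len
        && len <= limits.max_len
        && !s.chars().any(char::is_whitespace)
}
```

With `Limits { min_len: 16, max_len: 64 }`, a 20-character string without spaces is a candidate, while a normal sentence or a short word is not. On its own this is only a pre-filter: long ordinary words also pass it, so it would need to be combined with a stronger randomness check.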
This will require work, so time and iteration.
Important
My main concern is that the lib MUST add tests now to avoid regressions, in case PRs that address any feature silently break the handling of random strings.
The examples that I have shared are of course not exhaustive.