true-negative enforcement for random strings (either generated or username) #555

ccoVeille · 2025-02-01T08:15:54Z

I'm opening this issue to avoid future problems.

Please be careful about how things like these are detected:

feat: Should check text in string literals as well as comments #544
bug: Hexadecimal numbers in code comments should not be flagged #543 (with feat(core): start support for hex numbers #553)

Strings might contain generated values such as a password or a UUID (for example UUID v7, UUID v6, or KSUID).

0xabc is a valid hex
#abc too
#abg is not

But then:

eU&1-#abg_KgCYdzR&nzN is a random password containing #abg
48e2b37f-22bb-0xbg-9d4c-0xabge5a22146 (here 0xbg is invalid hex, and so is 0xabge)
a Base64 or Base62 random string could interfere too
0xabg could be a valid GitHub username, and could appear in a URL
leet speak: 0xf0rd rule is great (see Consider supporting leet speak #598)

Most libraries and tools, such as typo and codespell, have such detection.

There are tools and libraries for such detection.
https://github.com/ccojocar/randdetect (a lib I know in Go)
You will have to dig a bit deeper.

Another way to detect them is to look at the string length and check for space separators.

Some libraries have a minimum/maximum string size parameter to avoid such issues.
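To make the length idea above concrete, here is a minimal sketch. The function names and the thresholds (`min_len`, `max_len`) are assumptions chosen for illustration, not values taken from any of the tools mentioned.

```python
def should_skip_token(token: str, min_len: int = 3, max_len: int = 16) -> bool:
    """Return True when a token is too short or too long to be worth checking.

    The default thresholds are illustrative assumptions only.
    """
    return len(token) < min_len or len(token) > max_len


def tokens_to_check(text: str, min_len: int = 3, max_len: int = 16) -> list[str]:
    # Split on whitespace: a long run without any space separator stays one
    # token, so generated passwords and identifiers exceed max_len and are skipped.
    return [t for t in text.split() if not should_skip_token(t, min_len, max_len)]
```

A length filter like this only covers long strings. A short token such as 0xabg still passes through, so it would need another rule or a dictionary entry.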

This will require work, so time and iteration.

Important

My main concern is that the lib MUST add tests now, to avoid regressions when PRs that address some feature silently break the handling of random strings.

The examples I have shared are, of course, not exhaustive.
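As a rough sketch of what such a regression fixture could look like, the snippet below collects the strings from this issue and reports any that a checker flags. The `check` callable is hypothetical: it stands in for whatever entry point the real checker exposes, and the expectation that none of these samples is flagged is my reading of the request.

```python
# Strings taken from the examples in this issue; the list is not exhaustive.
RANDOM_STRING_SAMPLES = [
    "eU&1-#abg_KgCYdzR&nzN",                  # random password containing "#abg"
    "48e2b37f-22bb-0xbg-9d4c-0xabge5a22146",  # identifier with hex look-alikes
    "0xabg",                                  # possible username
]


def find_false_positives(check, samples=RANDOM_STRING_SAMPLES):
    """Return the samples for which `check` reports at least one problem.

    `check` is a hypothetical callable (text -> list of reported problems).
    """
    return [s for s in samples if check(s)]
```

A unit test would then assert that `find_false_positives(real_check)` is empty, so a later PR that starts flagging one of these strings fails loudly.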

hippietrail · 2025-02-01T11:50:36Z

We might be able to detect UUIDs by giving them own lexer rules.
Things like 0xabg I've started to think should be flagged as "potentially bad hex"... maybe? Or just flagged like anything else that doesn't match any lex pattern.
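For the UUID idea, a dedicated rule could be as small as one pattern. This is only a sketch: the name `is_uuid_like` is made up for illustration, and it checks the canonical 8-4-4-4-12 hexadecimal layout without validating version or variant bits.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal layout; version and variant are not checked.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_uuid_like(token: str) -> bool:
    # fullmatch ensures the whole token is a UUID, not just a substring of it.
    return UUID_RE.fullmatch(token) is not None
```

Note that the string 48e2b37f-22bb-0xbg-9d4c-0xabge5a22146 from the issue does not match this layout, so a rule like this would not protect it on its own. Other formats mentioned in the issue, such as KSUID, would need their own rule.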

I'm starting to think the hex lexer should be something like 0x[0-9a-fA-F]+[0-9a-zA-Z] and then try to convert it to an int. If the conversion succeeds it's considered hex so not flagged. If the conversion fails the whole thing is flagged like any other unknown word. If it's a username or such the user can add it to the dictionary.

The one thing I'm not sure about is hex bigger than a u64, but I think that might be a problem with the number lexer atm too? Haven't checked...
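The "match loosely, then try to convert" idea can be sketched as follows. The candidate pattern here is a permissive variant I am assuming for illustration (0x followed by any letters or digits), not the exact pattern quoted above, and the function name and result labels are invented. Python integers do not overflow, so the u64 question is modelled with an explicit range check.

```python
import re

# Assumed permissive candidate pattern: "0x" followed by letters or digits.
HEX_CANDIDATE_RE = re.compile(r"0[xX][0-9a-zA-Z]+")
U64_MAX = 2**64 - 1


def classify_hex_candidate(token: str) -> str:
    """Classify a token as hex, oversized hex, an unknown word, or not a candidate."""
    if HEX_CANDIDATE_RE.fullmatch(token) is None:
        return "not-a-candidate"
    try:
        # int() with base 16 accepts the "0x" prefix and rejects non-hex digits.
        value = int(token, 16)
    except ValueError:
        return "unknown-word"  # would be flagged like any other unknown word
    return "hex" if value <= U64_MAX else "hex-too-large-for-u64"
```

With this sketch, 0xabc is classified as hex, while 0xabg and 0xf0rd fall through to the unknown-word path, where a dictionary entry could still accept them.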

ccoVeille · 2025-02-01T21:10:30Z

Please understand I'm not looking for solutions right now. I'm sure solutions can be found.

My issue is about adding a set of pseudo-random strings to the unit tests right now.

We can talk about your ideas, but I don't want the issue I created to divert from the need I raised as a warning.

Put another way, if you want to talk about solutions for catching pseudo-random strings,
you should open another issue.

ccoVeille mentioned this issue Feb 5, 2025

feat(core): start support for hex numbers #553

Merged
