
True-negative enforcement for random strings (either generated or usernames) #555

Open
ccoVeille opened this issue Feb 1, 2025 · 2 comments

Comments

@ccoVeille
Contributor

ccoVeille commented Feb 1, 2025

I'm opening this issue to avoid future problems.

Please be careful about how things like these are detected.

A string might contain a generated value such as a password or a unique identifier (for example UUID v6, UUID v7, or KSUID).

  • 0xabc is a valid hex
  • #abc too
  • #abg is not

But then

  • eU&1-#abg_KgCYdzR&nzN is a random password containing #abg
  • 48e2b37f-22bb-0xbg-9d4c-0xabge5a22146 is a UUID-like string in which 0xbg and 0xabge are invalid hex
  • a base64 or base62 random string could interfere too
  • 0xabg could be a valid GitHub username, and could appear in a URL
  • leet speak: "0xf0rd rule is great" (see "Consider supporting leet speak", #598)

Most libraries and tools, such as typo and codespell, have such detection.

There are tools and libraries for such detection.
https://github.com/ccojocar/randdetect (a lib I know in Go)
You will have to dig a bit deeper.

Another way to detect them is to look at the string length and check for space separators.

Some libraries have minimum/maximum string length parameters to avoid such issues.
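As a rough illustration of the length and separator idea, here is a minimal Python sketch. The function name `looks_generated`, the length bounds, and the "at least three character classes" rule are all assumptions made for this example; they are not taken from any of the tools mentioned above.

```python
def looks_generated(token: str, min_len: int = 16, max_len: int = 128) -> bool:
    """Hypothetical heuristic: a long, space-free token that mixes character classes."""
    # A space separator suggests ordinary prose rather than a generated value.
    if " " in token:
        return False
    # Length bounds (assumed values) keep short words and huge blobs out.
    if not (min_len <= len(token) <= max_len):
        return False
    classes = [
        any(c.islower() for c in token),
        any(c.isupper() for c in token),
        any(c.isdigit() for c in token),
        any(not c.isalnum() for c in token),
    ]
    # Assumed threshold: three or more character classes looks "random".
    return sum(classes) >= 3
```

With these assumed bounds, the password and the UUID-like string from the list above are treated as generated, while a short token such as 0xabg is not, so a length check alone would not cover the username case.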

This will require work, and therefore time and iteration.

Important

My main concern is that the library MUST add tests now, to avoid regressions when PRs that address some other feature silently break the handling of random strings.

The examples I have shared are, of course, not exhaustive.
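One possible shape for such a regression test, sketched in Python. The helper `flagged_samples` and the stand-in `no_op_check` are hypothetical; a real test would call the library's own checker instead. The expected outcome (nothing flagged) is an assumption based on the issue title, and may need adjusting per sample.

```python
# Samples copied from this issue. The assumption here is that none of them
# should be reported by the checker (true negatives).
RANDOM_LIKE_SAMPLES = [
    "eU&1-#abg_KgCYdzR&nzN",
    "48e2b37f-22bb-0xbg-9d4c-0xabge5a22146",
    "0xabg",
    "0xf0rd rule is great",
]


def flagged_samples(check, samples=RANDOM_LIKE_SAMPLES):
    """Return the samples for which `check` reports at least one problem."""
    return [s for s in samples if check(s)]


def no_op_check(text):
    """Hypothetical stand-in for the real checker: reports nothing."""
    return []


# The regression test itself: no sample may be flagged.
assert flagged_samples(no_op_check) == []
```

Keeping the samples in one list makes it cheap to add new cases as they are reported.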

@hippietrail
Contributor

We might be able to detect UUIDs by giving them their own lexer rules.
Things like 0xabg I've started to think should be flagged as "potentially bad hex"... maybe? Or just flagged like anything else that doesn't match any lex pattern.

I'm starting to think the hex lexer should be something like 0x[0-9a-fA-F]+[0-9a-zA-Z] and then try to convert the match to an int. If the conversion succeeds, it's considered hex and so not flagged. If the conversion fails, the whole thing is flagged like any other unknown word. If it's a username or similar, the user can add it to the dictionary.
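A minimal Python sketch of that idea, using the pattern exactly as written in the comment. The function name `classify_hex_candidate` and its three return labels are made up for this illustration.

```python
import re

# Pattern copied verbatim from the comment above.
HEX_CANDIDATE = re.compile(r"0x[0-9a-fA-F]+[0-9a-zA-Z]")


def classify_hex_candidate(token: str) -> str:
    """Return 'not-candidate', 'hex', or 'unknown-word' for a single token."""
    if HEX_CANDIDATE.fullmatch(token) is None:
        return "not-candidate"
    try:
        # Try the conversion on the part after the 0x prefix.
        int(token[2:], 16)
    except ValueError:
        # Lexed as a hex candidate but not convertible: flag like an unknown word.
        return "unknown-word"
    return "hex"
```

Taken literally, the pattern needs at least two characters after 0x and allows only one trailing non-hex character, so 0xa and 0xabge do not match it at all, while 0xabg matches and then fails the conversion. Whether that is intended is not stated in the comment.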

The one thing I'm not sure about is hex bigger than a u64, but I think that might be a problem with the number lexer atm too? Haven't checked...
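To make the u64 concern concrete: a literal can be syntactically valid hex and still not fit in 64 bits, so a fixed-width conversion (for example Rust's `u64::from_str_radix`) would fail on it even though it is not a misspelling. Python integers are unbounded, so this sketch checks the bound explicitly; `fits_in_u64` is a hypothetical helper.

```python
U64_MAX = 2**64 - 1  # largest value an unsigned 64-bit integer can hold


def fits_in_u64(hex_literal: str) -> bool:
    """True if a 0x-prefixed hex literal is within the unsigned 64-bit range."""
    # int() with base 16 accepts the 0x prefix and raises ValueError on bad digits.
    return int(hex_literal, 16) <= U64_MAX
```

If "conversion failed" is used as the signal for flagging, overflow and invalid digits probably need to be told apart.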

@ccoVeille
Contributor Author

Please understand I'm not looking for solutions right now. I'm sure solutions can be found.

My request is to add a set of pseudo-random strings to the unit tests right now.

We can talk about your ideas, but I don't want the issue I created to divert from the need I raised as a warning.

In other words, if you want to discuss solutions for catching pseudo-random strings,
please open another issue.
