-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New patterns & ML update #600
Conversation
@Samsung/credsweeper_maintainers , please give your opinion about the rules names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: rollback after Samsung/CredData#167
bencmark fails before Samsung/CredData#167 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my comments below.
And next time, please separate PR by feature and request review.
As you know huge PR couldn't have good review.
credsweeper/credentials/line_data.py
Outdated
while self.value.endswith('}') and '{' in self.line[:self.value_start]: | ||
self.value = self.value[:-1] | ||
"""Parenthesis, curly and squared brackets may be caught in TOML format and bash. Simple clearing""" | ||
dirty = self.value and self.value[-1] in ['}', ']', ')'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dirty
is also good name but IMO it would be good to re-name with like as is_OOO
.
How about is_cleared
or is_clear
?
if "function" in line_data.value or self.PATTERN.search(line_data.value): | ||
if "function" in line_data.value or self.PATTERN.match(line_data.value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any FP cases that include function
keyword in the value?
I think checking function
keyword by search()
seems reasonable..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, there was FN case in random generated credentials like "12#@5(!)fs", so I refactored the pattern and applied match
to use the search from begin of a value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. still i can get it clearly.
Do you mean there is a random generated credential value which includes "function" keyword in it?
Like as "12#@5(!)function3w@91"?
If then i think search()
method working good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, i mean the second part of OR condition in the check only.
The function
check does not use regex, so i did not find any FP or FN for function
yet.
Otherside, "12#@5(!)fs" - or something like this was set as FN with the filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. I just misunderstood about this.
But in some case like below, search()
method still more useful doesn't it?
myPassword = "this_is_not_password"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filter does not return True for the line. Only if it will be
myPassword = this_is_not_password()
or
myPasswsord=password_function
*depends on file
Otherside, your code line means that a string is assigned for variable with name "password"
So, it may be a password. Even "1234" may be a password, but CredSweeper skips the sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay i agree. Let's keep the change then.
@@ -1,36 +0,0 @@ | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this filter has been removed?
There is no description for it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AWS_KEY_ID is filtered often :(
The feature was migrated to ML
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VariableNotAllowedPatternCheck mention was added to PR description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I see..
Is there a lot of patterns that do not allowed?
I think if we delegate this filter's role to ML, we will be harder to reasoning and debugging why.
So I'm worry about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filter eliminated a chance for ML to report for a credential like:
PASSWORD_FOR_ID="Th@tHid$mYpDW"
and something sensetive data like AWS ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems filtering the credentials from ML will be better..
Okay let's move the role to ML.
f3241c5
to
330cc04
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved
Description
Please include a summary of the change and which is fixed.
How has this been tested?
Please describe the tests that you ran to verify your changes.