Can you please check whether the SemiComplexPolicy is correctly implemented? #11

Open
m33x opened this issue Jun 3, 2017 · 1 comment

m33x commented Jun 3, 2017

I fear `SemiComplexPolicy` and `SemiComplexPolicyLowercase` need to be patched.
Depending on how it was measured and implemented, the results for 3class12 in the paper might be unreliable. Please check whether I'm wrong here.

I think you need a `not` in front of `if self.all_from_group(pwd, self.non_symbols):`
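
To make the suspected inversion concrete, here is a minimal sketch. It assumes that `all_from_group` returns True only when every character of the password belongs to the given group; the helper body and the name `contains_symbol` are illustrative guesses, not copied from pwd_guess.py.

```python
import string

# Letters and digits, i.e. everything that is not counted as a symbol.
NON_SYMBOLS = set(string.digits + string.ascii_letters)


def all_from_group(pwd, group):
    # Assumed behaviour: True only if every character of pwd is in group.
    return all(c in group for c in pwd)


def contains_symbol(pwd):
    # Hypothetical helper: a password contains a symbol exactly when
    # NOT all of its characters come from the non-symbol group.
    return not all_from_group(pwd, NON_SYMBOLS)


print(contains_symbol("Password1234"))  # False: letters and digits only
print(contains_symbol("Password123!"))  # True: "!" is a symbol
```

Under that assumption, dropping the `not` would count "Password1234" as containing a symbol and "Password123!" as not containing one.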

Please have a look here
https://github.com/cupslab/neural_network_cracking/blob/master/pwd_guess.py#L2069
and here
https://github.com/cupslab/neural_network_cracking/blob/master/pwd_guess.py#L2085


A better fix would be to stop testing for "not non_symbols" and instead test for "symbols" directly.

You can generate the symbols like this:

import string

non_symbols = set(string.digits + string.ascii_uppercase + string.ascii_lowercase)
# ASCII-95 == 0x20 to 0x7E (32 to 126 decimal)
# Note: using string.punctuation + string.whitespace or string.printable is not the same.
# Instead we use the characters as defined here https://en.wikibooks.org/wiki/C%2B%2B_Programming/ASCII
all_95 = {chr(i) for i in range(32, 127)}
# Use a set difference: in Python 3, filter() returns a one-shot iterator
# that would be exhausted after the first membership test.
symbols = all_95 - non_symbols

This way you can simply stop using the function `all_from_group` and replace it with a check in the standard form, `if has_group(pwd, symbols):`. This would improve readability a lot.
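
A rough sketch of what the symbol-based check could look like. The body of `has_group`, the helpers `count_classes` and `is_3class12`, and the reading of 3class12 as "at least 12 characters from at least 3 of the 4 character classes" are all assumptions made for illustration, not the code in pwd_guess.py.

```python
import string

# The 95 printable ASCII characters, 0x20 to 0x7E.
ALL_95 = {chr(i) for i in range(32, 127)}
NON_SYMBOLS = set(string.digits + string.ascii_letters)
SYMBOLS = ALL_95 - NON_SYMBOLS

# The four character classes used for counting.
CLASSES = [
    set(string.ascii_lowercase),
    set(string.ascii_uppercase),
    set(string.digits),
    SYMBOLS,
]


def has_group(pwd, group):
    # Assumed helper: True if at least one character of pwd is in group.
    return any(c in group for c in pwd)


def count_classes(pwd):
    # Hypothetical helper: number of character classes present in pwd.
    return sum(has_group(pwd, group) for group in CLASSES)


def is_3class12(pwd):
    # Assumption: 3class12 means length >= 12 and at least 3 classes.
    return len(pwd) >= 12 and count_classes(pwd) >= 3


print(is_3class12("Password1234"))  # True: lower, upper, digit
print(is_3class12("password1234"))  # False: only lower and digit
print(is_3class12("password123!"))  # True: lower, digit, symbol
```

Every class is tested the same way with `has_group`, so no check needs a negation.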

@wrmelicher
Member

I took a quick look at it, and I'm not sure yet that this is an issue. I need to think about it and investigate more. As for the results for 3class12 in the paper, the test set was definitely 3class12, since it was collected from a separate data set that didn't touch this portion of the code. It is possible, though, that this method of selecting training data isn't ideal.
