Add pe_source module from pe-reports #1
base: develop
… versions For example, pandas 1.5.1
Ready for review
@dav3r This PR is ready. Let me know if you have any questions.
I'll try to review the rest of this later this week. I noted a few items that caught my eye.
Thanks. Addressed your initial review. Will look out for the rest.
I made it a little bit further through my review, but there's a lot of code in this PR. I'll have to continue at a later date.
I have some change requests and suggestions for improvement.
.gitignore
Outdated
pe_reports_logging.log
src/pe_source/data/dnstwist_output.txt

Remove unnecessary blank line:
Addressed here f58ca92
.pre-commit-config.yaml
Outdated
- types-setuptools
- types-python-dateutil==2.8.19
Why not put this dependency in alphabetical order like the rest?
Fixed here: 5d1b723
src/pe_source/pe_scripts.py
Outdated
"""A tool for gathering pe source data.

Usage:
  pe-source DATA_SOURCE [--log-level=LEVEL] [--orgs=ORG_LIST] [--cybersix-methods=METHODS] [--soc_med_included]
Underscores are not used for any of the other options:
-  pe-source DATA_SOURCE [--log-level=LEVEL] [--orgs=ORG_LIST] [--cybersix-methods=METHODS] [--soc_med_included]
+  pe-source DATA_SOURCE [--log-level=LEVEL] [--orgs=ORG_LIST] [--cybersix-methods=METHODS] [--soc-med-included]
src/pe_source/pe_scripts.py
Outdated
                            If not specified, all will run. Valid values are "alerts",
                            "credentials", "mentions", "topCVEs". E.g. alerts,mentions.
                            [default: all]
  -sc --soc_med_included    Include social media posts from cybersixgill in data collection.
Underscores are not used for any of the other options:
-  -sc --soc_med_included    Include social media posts from cybersixgill in data collection.
+  -sc --soc-med-included    Include social media posts from cybersixgill in data collection.
Fixed in d8e46a6
for org_index, org_row in org_names_df.iterrows():
    for domain_index, domain_row in domain_df.iterrows():
        if org_row["domain_name"] == domain_row["domainName"]:
            domain_df.at[domain_index, "org"] = org_row["org"]
You could possibly rewrite this to avoid the double for loop and speed it up. Just something to think about.
Good call. Made adjustments here: c362c8e
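For readers following the thread, here is a minimal sketch of one way the nested iterrows() loops could be flattened. It is not necessarily what the referenced commit does. The column names come from the snippet under review; the sample rows and the org_by_domain name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the reviewed snippet.
org_names_df = pd.DataFrame(
    {"domain_name": ["example.gov", "sample.org"], "org": ["ORG_A", "ORG_B"]}
)
domain_df = pd.DataFrame(
    {"domainName": ["sample.org", "example.gov", "unknown.net"]}
)

# Build a domain -> org lookup once. If a domain appears more than once,
# the last row wins, which matches the overwrite order of the nested loops.
org_by_domain = dict(zip(org_names_df["domain_name"], org_names_df["org"]))

# Map every domain through the lookup in a single vectorized pass.
# Domains with no match get NaN in the new "org" column.
domain_df["org"] = domain_df["domainName"].map(org_by_domain)

print(domain_df)
```

One behavioral difference to keep in mind: the loop version leaves any existing "org" value untouched for unmatched rows, while this sketch overwrites the whole column. A left merge on the two domain columns would be another option under the same assumptions.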
import scrubadub
import scrubadub.detectors.date_of_birth

# List of unique regexes to identify each state's Drivers License format in a larger string
Are all 50 states not listed because the states not listed use a person's SSN as the identifier?
for i, ip_chunk in enumerate(ip_chunks):
    count = i + 1
    try_count = 1
    while try_count < 7:
Magic numbers like 7 should be avoided; use variables so you can easily change the value later if necessary.
Fixed here: 10f0e6d
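As an illustration of the suggestion, the retry limit can live in a named constant. This is a sketch only, not the code from the referenced commit: MAX_TRIES and call_with_retries are hypothetical names, and the value 6 assumes the original loop starts try_count at 1 and increments it by one per pass, so that `try_count < 7` allows six attempts.

```python
# Hypothetical constant replacing the bare 7 in `while try_count < 7`.
MAX_TRIES = 6


def call_with_retries(action, max_tries=MAX_TRIES):
    """Call action() until it succeeds or max_tries attempts have failed."""
    last_error = None
    for try_count in range(1, max_tries + 1):
        try:
            return action()
        except Exception as err:  # broad on purpose for this sketch
            last_error = err
    # Every attempt failed; surface the last error as the cause.
    raise RuntimeError(f"gave up after {max_tries} tries") from last_error


# Usage: a stand-in action that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky)
```

Changing the limit later then means editing one line, or passing a different max_tries at the call site.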
Got through a few more files - will continue to chip away at this review when I am able to.
Was pulled from another repo. Not needed here
Pull Request Test Coverage Report for Build 5592380198
💛 - Coveralls
Remove .DS_Store
Co-authored-by: dav3r <[email protected]> Co-authored-by: Shane Frasier <[email protected]>
Co-authored-by: dav3r <[email protected]> Co-authored-by: Shane Frasier <[email protected]>
Co-authored-by: Shane Frasier <[email protected]> Co-authored-by: dav3r <[email protected]>
Co-authored-by: Shane Frasier <[email protected]>
…urce into AL-copy-pe-reports
Change --soc_med_included to --soc-med-included. Change -csg to -c and -sc to -s to consistently keep the flags one character.
Co-authored-by: dav3r <[email protected]>
Co-authored-by: dav3r <[email protected]>
…urce into AL-copy-pe-reports
Co-authored-by: dav3r <[email protected]>
…urce into AL-copy-pe-reports
Add explanatory comment for cybersixgill.py and add variable for retry count
🗣 Description
Move the existing pe-source module in pe-reports to this repo.
💭 Motivation and context
Splitting the repo will allow other applications to install just the pe-source module without any of the report/email functionality in pe-reports.
🧪 Testing
Updates pytests and passes pre-commits.