Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pe_source module from pe-reports #1

Open
wants to merge 40 commits into
base: develop
Choose a base branch
from

Conversation

aloftus23
Copy link
Collaborator

@aloftus23 aloftus23 commented Jul 6, 2023

🗣 Description

Move the existing pe-source module in pe-reports to this repo.

💭 Motivation and context

Splitting the repo will allow other applications to install just the pe-source module without any of the report/email functionality in pe-reports.

🧪 Testing

Updates pytests and passes pre-commits.

✅ Pre-approval checklist

  • This PR has an informative and human-readable title.
  • Changes are limited to a single goal - eschew scope creep!
  • All future TODOs are captured in issues, which are referenced
    in code comments.
  • All relevant type-of-change labels have been added.
  • I have read the CONTRIBUTING document.
  • These code changes follow cisagov code standards.
  • All relevant repo and/or project documentation has been updated
    to reflect the changes in this PR.
  • Tests have been added and/or modified to cover the changes in this PR.
  • All new and existing tests pass.

@aloftus23 aloftus23 self-assigned this Jul 6, 2023
@aloftus23 aloftus23 marked this pull request as ready for review July 18, 2023 21:06
@aloftus23
Copy link
Collaborator Author

Ready for review

@aloftus23
Copy link
Collaborator Author

@dav3r This PR is ready. Let me know if you have any questions

Copy link
Member

@dav3r dav3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to review the rest of this later this week. I noted a few items that caught my eye.

setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@aloftus23
Copy link
Collaborator Author

I'll try to review the rest of this later this week. I noted a few items that caught my eye.

Thanks. Addressed your initial review. Will look out for the rest.

Copy link
Member

@dav3r dav3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it a little bit further through my review, but there's a lot of code in this PR. I'll have to continue at a later date.

.DS_Store Outdated Show resolved Hide resolved
.flake8 Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
src/pe_source/__init__.py Outdated Show resolved Hide resolved
src/pe_source/data/database.ini Outdated Show resolved Hide resolved
tests/.DS_Store Outdated Show resolved Hide resolved
src/pe_source/pe_scripts.py Outdated Show resolved Hide resolved
Copy link
Member

@jsf9k jsf9k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some change requests and suggestions for improvement.

.github/workflows/build.yml Outdated Show resolved Hide resolved
.github/workflows/build.yml Outdated Show resolved Hide resolved
.github/workflows/build.yml Outdated Show resolved Hide resolved
.gitignore Outdated
pe_reports_logging.log
src/pe_source/data/dnstwist_output.txt


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary blank line:

Suggested change

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed here f58ca92

- types-setuptools
- types-python-dateutil==2.8.19
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put this dependency in alphabetical order like the rest?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: 5d1b723

"""A tool for gathering pe source data.

Usage:
pe-source DATA_SOURCE [--log-level=LEVEL] [--orgs=ORG_LIST] [--cybersix-methods=METHODS] [--soc_med_included]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underscores are not used for any of the other options:

Suggested change
pe-source DATA_SOURCE [--log-level=LEVEL] [--orgs=ORG_LIST] [--cybersix-methods=METHODS] [--soc_med_included]
pe-source DATA_SOURCE [--log-level=LEVEL] [--orgs=ORG_LIST] [--cybersix-methods=METHODS] [--soc-med-included]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: d8e46a6 and 3a2e262

If not specified, all will run. Valid values are "alerts",
"credentials", "mentions", "topCVEs". E.g. alerts,mentions.
[default: all]
-sc --soc_med_included Include social media posts from cybersixgill in data collection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underscores are not used for any of the other options:

Suggested change
-sc --soc_med_included Include social media posts from cybersixgill in data collection.
-sc --soc-med-included Include social media posts from cybersixgill in data collection.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d8e46a6

Comment on lines 25 to 28
for org_index, org_row in org_names_df.iterrows():
for domain_index, domain_row in domain_df.iterrows():
if org_row["domain_name"] == domain_row["domainName"]:
domain_df.at[domain_index, "org"] = org_row["org"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could possibly rewrite this to avoid the double for loop and speed it up. Just something to think about.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Made adjustments here: c362c8e

import scrubadub
import scrubadub.detectors.date_of_birth

# List of unique regexes to identify each state's Drivers License format in a larger string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all 50 states not listed because the states not listed use a person's SSN as the identifier?

for i, ip_chunk in enumerate(ip_chunks):
count = i + 1
try_count = 1
while try_count < 7:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic numbers like 7 should be avoided; use variables so you can easily change the value later if necessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: 10f0e6d

src/pe_source/cybersixgill.py Outdated Show resolved Hide resolved
src/pe_source/cybersixgill.py Outdated Show resolved Hide resolved
Copy link
Member

@dav3r dav3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got through a few more files - will continue to chip away at this review when I am able to.

src/pe_source/cybersixgill.py Show resolved Hide resolved
src/pe_source/cybersixgill.py Outdated Show resolved Hide resolved
src/pe_source/data/dnsmonitor/root_domains_dnsmonitor.csv Outdated Show resolved Hide resolved
src/pe_source/data/dnsmonitor/source.py Outdated Show resolved Hide resolved
Was pulled from another repo. Not needed here
@aloftus23 aloftus23 requested a review from jasonodoom as a code owner August 18, 2023 15:25
@coveralls
Copy link

coveralls commented Sep 24, 2023

Pull Request Test Coverage Report for Build 5592380198

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-100.0%) to 0.0%

Totals Coverage Status
Change from base Build 5977993120: -100.0%
Covered Lines: 0
Relevant Lines: 0

💛 - Coveralls

aloftus23 and others added 23 commits September 24, 2023 14:12
Remove .DS_Store
Co-authored-by: dav3r <[email protected]>
Co-authored-by: Shane Frasier <[email protected]>
Co-authored-by: dav3r <[email protected]>
Co-authored-by: Shane Frasier <[email protected]>
Co-authored-by: Shane Frasier <[email protected]>
Co-authored-by: dav3r <[email protected]>
Change --soc_med_included to --soc-med-included

change -csg to -c and -sc to -s to consistently keep the flags one character
Add comment explaining for cybersixgill.py and add variable for retry count
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants