[Code Addition Request]: Pipeline for Detecting whether given PDF is malicious or not #569

Closed
3 tasks done
DarshAgrawal14 opened this issue Oct 13, 2024 · 2 comments · Fixed by #573
Labels
  • Contributor (denotes issues or PRs submitted by contributors to acknowledge their participation)
  • gssoc-ext
  • hacktoberfest
  • level1
  • Status: Assigned💻 (indicates an issue has been assigned to a contributor)

Comments

@DarshAgrawal14
Contributor

Have you completed your first issue?

  • I have completed my first issue

Guidelines

  • I have read the guidelines
  • I have the link to my latest merged PR

Latest Merged PR Link

#416

Project Description

I would like to contribute by developing a pipeline that, when provided with a PDF, extracts metadata, content, and other relevant features. These extracted elements are then processed and passed to a model, which predicts whether the PDF is malicious.
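As a rough illustration of the feature-extraction step, the sketch below counts a few PDF name tokens that often appear in malicious documents (JavaScript, automatic actions, launch actions, embedded files), in the spirit of keyword-based triage tools such as Didier Stevens' pdfid. The keyword list and the function names `extract_keyword_features` and `extract_features_from_file` are assumptions, not taken from this project, and the scan only sees uncompressed objects.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical keyword list, modelled on the names that keyword-based PDF
# triage tools count; it is not taken from this project.
SUSPICIOUS_KEYWORDS = [
    "/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch",
    "/EmbeddedFile", "/URI", "/AcroForm", "/XFA", "/ObjStm",
]

# A PDF name token is "/" followed by regular characters; whitespace and
# delimiters end it, so "/JS" is not counted inside "/JSON".
NAME_TOKEN = re.compile(rb"/[^\s/\[\]<>(){}%]+")


def extract_keyword_features(data: bytes) -> dict[str, int]:
    """Count suspicious name tokens, objects and streams in raw PDF bytes.

    Only uncompressed objects are visible; names hidden inside compressed
    streams or obfuscated with #xx hex escapes are missed.
    """
    names = Counter(m.group().decode("latin-1") for m in NAME_TOKEN.finditer(data))
    features = {kw: names[kw] for kw in SUSPICIOUS_KEYWORDS}
    # The word boundaries keep "endobj"/"endstream" from being counted again.
    features["obj"] = len(re.findall(rb"\bobj\b", data))
    features["stream"] = len(re.findall(rb"\bstream\b", data))
    return features


def extract_features_from_file(path: str) -> dict[str, int]:
    """Read a PDF from disk and return its keyword features."""
    with open(path, "rb") as fh:
        return extract_keyword_features(fh.read())
```

Counting raw tokens is cheap and needs no PDF parser, which is why triage tools start with it; a fuller pipeline would decompress streams first so that hidden keywords are not missed.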

Additions:

  • Model Training notebook
  • PDF feature extraction notebook
  • Data processing and model prediction notebook
  • Dataset used
  • Trained Model
  • Readme.md

Full Name

Darsh Agrawal

Participant Role

GSSOC


🙌 Thank you for bringing this issue to our attention! We appreciate your input and will investigate it as soon as possible.

Feel free to join our community on Discord to discuss more!

@UTSAVS26 added the Contributor, Status: Assigned💻, level1, gssoc-ext, and hacktoberfest labels on Oct 14, 2024
UTSAVS26 added a commit that referenced this issue Oct 16, 2024
## Pull Request for PyVerse 💡

### Requesting to submit a pull request to the PyVerse repository.

---

#### Issue Title
**Please enter the title of the issue related to your pull request.**  
Pipeline for Detecting whether given PDF is malicious or not

- [x] I have provided the issue title.

---

#### Info about the Related Issue
**What's the goal of the project?**  
*Describe the aim of the project.*

- [ ] I have described the aim of the project.

---

#### Name
**Please mention your name.**  
Darsh Agrawal

- [x] I have provided my name.

---

#### GitHub ID
**Please mention your GitHub ID.**  
DarshAgrawal14 

- [x] I have provided my GitHub ID.

---

#### Email ID
**Please mention your email ID for further communication.**  
[email protected]

- [x] I have provided my email ID.

---

#### Identify Yourself
**Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC,
SWOC).**
GSSOC

- [x] I have mentioned my participant role.

---

#### Closes
**Enter the issue number that will be closed through this PR.**  
Closes: #569 

- [x] I have provided the issue number.

---

#### Describe the Add-ons or Changes You've Made
**Give a clear description of what you have added or modified.**  
I have added a pipeline that detects whether a given PDF contains malware.
It extracts features from the PDF, such as metadata, images, links, and
content, and analyses them to detect malware or malicious content.

https://github.com/user-attachments/assets/4908ba06-627a-43ba-ac21-7ea14ccf632f

- [x] I have described my changes.
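A minimal sketch of how such features might flow from extraction to prediction, assuming the model exposes a scikit-learn-style `predict()` over a fixed feature order. `extract_features`, `FEATURE_ORDER`, `to_vector`, `ThresholdModel`, and `predict_pdf` are hypothetical names; `ThresholdModel` is a rule-based stand-in for the trained model, and the regexes only see uncompressed objects.

```python
from __future__ import annotations

import re
from typing import Protocol, Sequence

# Hypothetical fixed column order; a trained model reads features by position,
# so extraction and training must agree on it.
FEATURE_ORDER = ["num_links", "num_images", "has_javascript", "has_author", "has_title"]


class Classifier(Protocol):
    """Any object with a scikit-learn-style predict(); an assumption here."""

    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[int]: ...


def extract_features(data: bytes) -> dict[str, float]:
    """Derive link, image, script and metadata features from raw PDF bytes.

    These regexes only see uncompressed objects; a PDF parsing library would
    be needed to look inside compressed streams.
    """
    links = re.findall(rb"/URI\s*\(([^)]*)\)", data)
    images = re.findall(rb"/Subtype\s*/Image\b", data)
    return {
        "num_links": float(len(links)),
        "num_images": float(len(images)),
        # Crude substring tests for script actions and Info-dictionary keys.
        "has_javascript": float(b"/JavaScript" in data or b"/JS" in data),
        "has_author": float(b"/Author" in data),
        "has_title": float(b"/Title" in data),
    }


def to_vector(features: dict[str, float]) -> list[float]:
    """Arrange the feature dict into the column order the model expects."""
    return [features[name] for name in FEATURE_ORDER]


class ThresholdModel:
    """Rule-based stand-in for the trained model (hypothetical)."""

    def __init__(self, max_links: int = 10) -> None:
        self.max_links = max_links
        self._js = FEATURE_ORDER.index("has_javascript")
        self._links = FEATURE_ORDER.index("num_links")

    def predict(self, X: Sequence[Sequence[float]]) -> list[int]:
        # Flag a row if it contains JavaScript or an unusually large number of links.
        return [int(row[self._js] > 0 or row[self._links] > self.max_links) for row in X]


def predict_pdf(data: bytes, model: Classifier) -> bool:
    """Run the full pipeline on one PDF: extract, vectorise, predict."""
    return bool(model.predict([to_vector(extract_features(data))])[0])
```

Because `predict_pdf` only relies on `predict()`, the stand-in could be swapped for a loaded scikit-learn estimator trained on the same `FEATURE_ORDER` without changing the extraction code.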

---

#### Type of Change
**Select the type of change:**  
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Code style update (formatting, local variables)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update

---

#### How Has This Been Tested?
**Describe how your changes have been tested.**  
Run `predict.py` with the path to a PDF file; the model then predicts
whether the PDF is malicious.

- [x] I have described my testing process.
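A possible shape for the `predict.py` entry point, assuming it takes the PDF path as a single positional argument. `heuristic_predict` is a placeholder for loading and calling the trained model, and `looks_like_pdf` is a hypothetical sanity check; neither is taken from the PR.

```python
import argparse
import sys
from pathlib import Path
from typing import Callable, Optional, Sequence

# A predictor maps raw PDF bytes to True (malicious) or False (benign).
Predictor = Callable[[bytes], bool]


def looks_like_pdf(data: bytes) -> bool:
    """Check for the %PDF- header; readers tolerate some leading junk, so search the first 1 KiB."""
    return b"%PDF-" in data[:1024]


def heuristic_predict(data: bytes) -> bool:
    """Placeholder for the trained model: flags JavaScript or Launch actions."""
    return b"/JavaScript" in data or b"/Launch" in data


def main(argv: Optional[Sequence[str]] = None, predict: Predictor = heuristic_predict) -> int:
    parser = argparse.ArgumentParser(description="Classify a PDF as malicious or benign (sketch).")
    parser.add_argument("pdf_path", type=Path, help="path to the PDF file to check")
    args = parser.parse_args(argv)

    if not args.pdf_path.is_file():
        print(f"error: {args.pdf_path} is not a file", file=sys.stderr)
        return 2
    data = args.pdf_path.read_bytes()
    if not looks_like_pdf(data):
        print(f"error: {args.pdf_path} does not look like a PDF", file=sys.stderr)
        return 2

    verdict = "malicious" if predict(data) else "benign"
    print(f"{args.pdf_path}: {verdict}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Invoked as `python predict.py sample.pdf`, this sketch prints `sample.pdf: benign` or `sample.pdf: malicious`, and exits with status 2 if the path is missing or the file lacks a PDF header. Passing `predict` as a parameter keeps the CLI testable without a trained model on disk.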

---

#### Checklist
**Please confirm the following:**  
- [x] My code follows the guidelines of this project.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly wherever it was hard to
understand.
- [x] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added things that prove my fix is effective or that my
feature works.
- [x] Any dependent changes have been merged and published in downstream
modules.

✅ This issue has been closed. Thank you for your contribution! If you have any further questions or issues, feel free to join our community on Discord to discuss more!
