feat: update URL instead of creating new records in mongoDB #427

Open
romisfrag opened this issue Sep 13, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request mongodb

Comments

@romisfrag
Contributor

For example, I ran the katana command:
image

katana does not report the status code (it tries to follow the redirect, which is circular, so it can't).
But that is not the problem. Suppose I then run httpx on every URL that katana found:
image
I then have 4 records in my database: 2 created by katana and 2 by httpx.

I know there is a _source field that stores which tool found the URL, but doesn't this cause a problem?

Maybe there is nothing to do here and this is simply the intended behavior.

@romisfrag
Contributor Author

Maybe the most problematic case is running the same tool twice: the new records also get added to the database.
image
I get the first 4 records, and then 4 new ones again.
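
To make the requested behavior concrete, here is a minimal in-memory sketch of "update instead of create", keyed on the URL. It is an illustration only, not how the project stores findings: the `url` and `status_code` field names, the `_sources` list, and the `upsert_by_url` helper are all assumptions made for this example; only `_source` comes from the discussion above.

```python
def upsert_by_url(store, record):
    """Merge `record` into `store` (a dict keyed by URL) instead of appending.

    Hypothetical helper: field names other than `_source` are assumptions.
    """
    url = record["url"]
    existing = store.get(url)
    if existing is None:
        # First time this URL is seen: keep a copy and start a list of sources.
        merged = dict(record)
        merged["_sources"] = [record["_source"]]
        store[url] = merged
        return merged
    # URL already known: overwrite fields only with non-empty values.
    for key, value in record.items():
        if key == "_source":
            continue
        if value is not None:
            existing[key] = value
    # Remember every tool that reported this URL, without repeats.
    if record["_source"] not in existing["_sources"]:
        existing["_sources"].append(record["_source"])
    return existing


store = {}
findings = [
    {"url": "http://example.com/a", "status_code": None, "_source": "katana"},
    {"url": "http://example.com/b", "status_code": None, "_source": "katana"},
    {"url": "http://example.com/a", "status_code": 200, "_source": "httpx"},
    {"url": "http://example.com/b", "status_code": 301, "_source": "httpx"},
]
for finding in findings:
    upsert_by_url(store, finding)

print(len(store))  # 2 records instead of 4
```

Running the same tool a second time with this approach would leave the record count unchanged, since every URL is already a key in the store.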

@ocervell
Contributor

ocervell commented Sep 13, 2024

This is mostly by design: updating a previous record would require going through the whole list of URLs in MongoDB, which would be too heavy as the number of records grows.

The vision is rather to use the de-duplication of findings already implemented in MongoDB for the target workspace, so you can query URLs with {"_type": "url", "_context.workspace_id": <WORKSPACE_ID>, "_context.workspace_duplicate": false} and get de-duplicated results from the workspace.
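
As a rough sketch, the filter above could be built and run from Python as follows. The helper names `dedup_url_query` and `find_unique_urls` are hypothetical, and `collection` is assumed to be any object with a pymongo-style `find()` method; the database and collection names, and whether the workspace ID is stored as a string or an ObjectId, depend on the deployment and are not specified here.

```python
def dedup_url_query(workspace_id):
    """Build the filter for de-duplicated URL findings in one workspace.

    The three keys are taken from the query quoted above.
    """
    return {
        "_type": "url",
        "_context.workspace_id": workspace_id,
        "_context.workspace_duplicate": False,
    }


def find_unique_urls(collection, workspace_id):
    """Return de-duplicated URL findings as a list.

    `collection` is assumed to expose a pymongo-style find(filter) method.
    """
    return list(collection.find(dedup_url_query(workspace_id)))
```

With pymongo, `collection` would typically be obtained as `MongoClient(uri)[db_name][collection_name]`, with the three names filled in from your own configuration.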

Ideally, katana should run httpx internally, using the Go library, to enrich its results so that we don't need to run the latter at all.

@ocervell ocervell changed the title feat: Update URL instead of creating new records in mongoDB feat: update URL instead of creating new records in mongoDB Sep 17, 2024
@ocervell ocervell self-assigned this Oct 10, 2024
@ocervell ocervell added enhancement New feature or request mongodb labels Oct 10, 2024
2 participants