Feature Request: Ignore Parameter Values in Crawler #1165
Comments
Thanks so much for your feature request @Rand0x, we'll take a look into this!
@Rand0x, thank you for your feature request. Have you explored the …
@dogancanbakir, thank you for your answer. Yes, I already explored it. As you can see in the image, I have a website that handles the CSRF token via the newtoken parameter.
@Rand0x So, are you looking for an option to skip or ignore, for example, URLs that include the newtoken parameter?
Yes, correct.
I do not want to exclude links that contain the newtoken parameter; I want to exclude duplicates. For example:

example.com?file=abc.pdf&newtoken=12312312
example.com?file=deaf.pdf&newtoken=44332145&user=1

These are two different pages, but each one links to the other with a fresh value of the newtoken parameter, so the crawler keeps treating them as new URLs.
Please describe your feature request:
I would like to request a feature for the Katana crawler that allows users to ignore the values of URL parameters during the crawling process. Currently, Katana crawls all variations of a URL, including those with different parameter values, which can lead to excessive crawling of fundamentally similar pages. For instance, the URLs "http://example.com?param1=1&param2=2" and "http://example.com?param1=2&param2=1" may lead to nearly identical content, yet they are treated as completely distinct pages by the crawler.
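To illustrate the idea, here is a rough sketch in Go (Katana's implementation language); the `dedupKey` helper is purely illustrative and not part of Katana. It reduces a URL to its host, path, and sorted parameter names, so URLs that differ only in parameter values collapse to the same deduplication key:

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// dedupKey reduces a URL to host + path + sorted parameter names,
// discarding the parameter values entirely.
func dedupKey(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	names := make([]string, 0, len(u.Query()))
	for name := range u.Query() {
		names = append(names, name)
	}
	sort.Strings(names)
	return u.Host + u.Path + "?" + strings.Join(names, "&"), nil
}

func main() {
	a, _ := dedupKey("http://example.com?param1=1&param2=2")
	b, _ := dedupKey("http://example.com?param1=2&param2=1")
	fmt.Println(a == b) // true: same page shape, only the values differ
}
```

Sorting the parameter names makes the key stable regardless of the order in which parameters appear in the query string.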
Describe the use case of this feature:
The primary motivation for this feature is to optimize the crawling efficiency of Katana. By ignoring the specific values of parameters, users can reduce the number of redundant requests made during a crawl. This would not only improve the crawling speed but also minimize the load on the target server, helping to avoid potential rate limiting or being flagged for excessive requests.
In practice, this feature could be particularly beneficial for users who work with large websites that have numerous parameters appended to their URLs, enabling a more streamlined and effective crawling process. It would help ensure that Katana focuses on the structural aspects of the site rather than getting caught in unnecessary loops due to value variations in query strings.
Thank you for considering this feature request to enhance the capabilities of the Katana crawler.
Additionally: It may be beneficial to allow users to choose which parameters to ignore, potentially by passing them as a list.
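Extending the same illustrative sketch, a user-supplied list could restrict the normalization to just the named parameters (for example, newtoken), leaving all other values intact. The `keyIgnoring` helper below is again hypothetical and not an existing Katana option:

```go
package main

import (
	"fmt"
	"net/url"
)

// keyIgnoring drops the values of the listed parameters before the URL
// is used as a deduplication key; all other parameters keep their values.
func keyIgnoring(raw string, ignore []string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	skip := make(map[string]bool, len(ignore))
	for _, name := range ignore {
		skip[name] = true
	}
	q := u.Query()
	for name := range q {
		if skip[name] {
			q.Set(name, "") // keep the parameter name, discard its value
		}
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	a, _ := keyIgnoring("http://example.com?file=abc.pdf&newtoken=12312312", []string{"newtoken"})
	b, _ := keyIgnoring("http://example.com?file=abc.pdf&newtoken=44332145", []string{"newtoken"})
	fmt.Println(a == b) // true: both collapse to ...?file=abc.pdf&newtoken=
}
```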