
Disallow crawlers on the /data subdirectory #334

Conversation

PGijsbers (Contributor)

The `/data` directory contains only our data files. We noticed crawlers consuming 95% of the total bandwidth by scraping these files, and we do not see a use case for that.

Additionally, the explicit mention of the different user-agents is unnecessary. Most likely it did not even do what was intended in the first place: googlebot (and, I assume, other crawlers) looks for the most specific user-agent match and follows only those rules. That means the only effect of the explicit user-agent entries was to allow those crawlers to also crawl `/cgi-bin/`, in addition to allowing everything else. That did not seem intentional, so I took the liberty to simplify.

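For reference, a minimal sketch of what the simplified `robots.txt` could look like after this change. The exact paths and prior contents are assumptions on my part, since the diff is not shown here; the point is that a single wildcard block replaces the per-bot sections:

```
# Sketch only: actual file contents may differ.
# Crawlers use the most specific User-agent block that matches them,
# so once the per-bot sections are removed, one wildcard block is enough.
User-agent: *
Disallow: /cgi-bin/
Disallow: /data
```

With the per-bot sections gone, there is no longer a more specific match that would override the wildcard rules and re-allow `/cgi-bin/` for those crawlers.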
@PGijsbers PGijsbers merged commit e735f01 into master Nov 10, 2024
3 checks passed