
Add support for keeping the Geo-IP database updated for Domain Crawls #123

Open
anjackson opened this issue Nov 9, 2023 · 0 comments

For Domain Crawls, we rely on [GeoLite2 Free Geolocation Data](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data) to identify URLs that are hosted in the UK but are not on UK domain names.

MaxMind no longer allows unauthenticated downloads of that database file, so we need a different way to keep it up to date. This likely means using the `GEOLITE2_CITY_MMDB_LOCATION` configuration option to map the DB file in from the host, rather than using the version embedded in the `ukwa/ukwa-heritrix` container, and then documenting how to update it as part of the DC setup process.
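As a possible starting point for that documentation, here is a minimal Python sketch of a host-side update step. It assumes a free MaxMind account with a license key and uses MaxMind's license-key download permalink for the GeoLite2-City edition (worth checking against MaxMind's current download docs). It also assumes the target path is the host file that is bind-mounted into the container at the location `GEOLITE2_CITY_MMDB_LOCATION` points to. The function names and paths are hypothetical. MaxMind's own `geoipupdate` tool is an alternative that does the same job.

```python
import os
import shutil
import tarfile
import tempfile
import urllib.request

# MaxMind license-key download permalink for the GeoLite2-City edition.
# Requires a (free) MaxMind account; check MaxMind's docs for the current form.
DOWNLOAD_URL = (
    "https://download.maxmind.com/app/geoip_download"
    "?edition_id=GeoLite2-City&license_key={key}&suffix=tar.gz"
)
MMDB_NAME = "GeoLite2-City.mmdb"


def fetch_archive(license_key: str, work_dir: str) -> str:
    """Download the GeoLite2-City tarball into work_dir and return its path."""
    archive_path = os.path.join(work_dir, "GeoLite2-City.tar.gz")
    url = DOWNLOAD_URL.format(key=license_key)
    with urllib.request.urlopen(url) as resp, open(archive_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return archive_path


def install_mmdb(archive_path: str, target_path: str) -> str:
    """Extract the .mmdb from the tarball and atomically replace target_path."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    with tarfile.open(archive_path, "r:gz") as tar:
        # The archive wraps the file in a dated directory,
        # e.g. GeoLite2-City_20231107/GeoLite2-City.mmdb.
        member = next(
            (m for m in tar.getmembers()
             if m.isfile() and os.path.basename(m.name) == MMDB_NAME),
            None,
        )
        if member is None:
            raise FileNotFoundError(f"{MMDB_NAME} not found in {archive_path}")
        # Write to a temp file in the target directory first...
        fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
        with tar.extractfile(member) as src, os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(src, tmp)
    # ...then rename over the old file. On POSIX, os.replace is atomic when both
    # paths are on the same filesystem, so readers never see a half-written file.
    os.replace(tmp_path, target_path)
    return target_path


def update_geoip_db(license_key: str, target_path: str) -> str:
    """Hypothetical host-side update step: download, extract, swap in place."""
    with tempfile.TemporaryDirectory() as work_dir:
        return install_mmdb(fetch_archive(license_key, work_dir), target_path)
```

For example, `update_geoip_db(os.environ["MAXMIND_LICENSE_KEY"], "/data/geoip/GeoLite2-City.mmdb")` could run from cron on the crawl host (the environment variable name and path are illustrative). The atomic replace avoids exposing a partially written file. Whether Heritrix picks up the new database without a restart is not established here, so the crawler container may need restarting after an update.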
