This repository has been archived by the owner on Sep 20, 2024. It is now read-only.
One of these issues regularly occurs shortly after starting a new crawl:
The crawler keeps crawling, but only indexes the initial seed URLs: Outlinks are fetched, the crawlstatus index is still updated, but no new pages appear in the results index.
The crawler stops crawling entirely after a short time: No pages are fetched anymore at all, though the crawlstatus index contains newly discovered pages.
I can't reproduce this every time. Restarting the crawler continues the crawl as expected without further issues.
It seems to be a caching issue, as this only occurs for a crawl whose exact configuration has been used before (?).
After a thorough search I could not identify the cause; reverting to earlier versions does not fix the issue!
Disabling es.status.reset.fetchdate.after somehow caused the crawler
to stop returning any new results.
Another issue remains:
a small percentage of pages is crawled again and again, but never
indexed or updated to FETCHED.
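To narrow down which pages are affected, it may help to ask the status index for URLs that are due for fetching but have not reached FETCHED. The following is only a sketch: it builds an Elasticsearch search body and sends nothing. The field names `status` and `nextFetchDate` are assumptions about the status index mapping, and the helper name `stuck_pages_query` is hypothetical; adjust both to the actual setup.

```python
import json


def stuck_pages_query(status_field="status", date_field="nextFetchDate", size=20):
    """Build a search body for URLs that are due but not marked FETCHED.

    status_field and date_field are assumed field names in the status
    index; they are not confirmed by this thread.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Exclude everything that already reached FETCHED.
                "must_not": [{"term": {status_field: "FETCHED"}}],
                # Keep only URLs whose next fetch date has already passed.
                "filter": [{"range": {date_field: {"lte": "now"}}}],
            }
        },
        # Count the remaining documents per status value.
        "aggs": {"by_status": {"terms": {"field": status_field}}},
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a search request.
    print(json.dumps(stuck_pages_query(), indent=2))
```

Running the resulting body against the crawlstatus index a few times during a crawl should show whether the same URLs keep reappearing with an unchanged status.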