Ideally, we would include revisit records at playback time, as they indicate when we visited a page even if the content did not change. As of PyWB 2.6.2, large chains of redirects still seem to cause problems, and it is not clear that `closest_limit` is working as expected. See webrecorder/pywb#606
Not sure how to handle this. For now, skipping redirects from CDX queries.
The `redirect_to_exact` setting doesn't seem to be working now either.
Ah, so the example I was looking at had a chain of over 300 warc/revisit entries before hitting the most recent copy of the GOV.UK robots.txt file. This is over the hard-coded 100, so this is why it didn't resolve. But even upping that, it's really slow.
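To illustrate why a long run of revisits fails to resolve, here is a toy model of a bounded lookup. It is not pywb's actual code: the function name `resolve_revisit`, the list-of-dicts input, and the `limit` parameter are all assumptions made for this sketch. Only the limit of 100 and the chain length of over 300 come from the case described above.

```python
def resolve_revisit(captures, limit=100):
    """Toy model (not pywb's implementation) of a bounded revisit lookup.

    `captures` is a list of dicts ordered closest-first, each with a
    'mime' key. Returns the first capture that is not a warc/revisit
    record, or None if `limit` records are examined without finding one.
    """
    for examined, capture in enumerate(captures):
        if examined >= limit:
            # Gave up: the original payload lies beyond the limit.
            return None
        if capture["mime"] != "warc/revisit":
            return capture
    return None


# 300 revisit records followed by the capture holding the real payload.
chain = [{"mime": "warc/revisit"}] * 300 + [{"mime": "text/plain"}]
print(resolve_revisit(chain))             # limit of 100 is exhausted first
print(resolve_revisit(chain, limit=500))  # a larger limit reaches the payload
```

Raising the limit makes the lookup succeed, but every revisit in the chain is still examined one by one, which fits with it being slow.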
Hm. Is this caused specifically by revisit lookups, by bouncing between 3xx redirects, or by a combination of both?
Probably the main optimization is just to include the redirect URL in the CDXJ, especially in case of redirects.
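A rough sketch of what that could look like at indexing time, assuming the CDXJ line layout of `urlkey timestamp {json}`. The `redirect` field name and the `make_cdxj_line` helper are hypothetical and are not part of any existing pywb index format.

```python
import json


def make_cdxj_line(urlkey, timestamp, fields, redirect_to=None):
    """Build a CDXJ line, optionally recording the redirect target.

    The 'redirect' key is a hypothetical field name used only for this
    sketch; it is added only when the capture has a 3xx status.
    """
    record = dict(fields)
    is_redirect = str(record.get("status", "")).startswith("3")
    if redirect_to is not None and is_redirect:
        record["redirect"] = redirect_to
    return f"{urlkey} {timestamp} {json.dumps(record)}"


line = make_cdxj_line(
    "uk,gov)/robots.txt",
    "20210101000000",
    {"url": "http://gov.uk/robots.txt", "status": "301"},
    redirect_to="https://www.gov.uk/robots.txt",
)
print(line)
```

With the target stored in the index, following a redirect chain would not require opening each WARC record to read its `Location` header.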
If it is a chain of revisits that ends up just being a 200, then probably not much that can be done?
Perhaps something to discuss also in the context of reindexing?
Current status is that I'm filtering out all revisit records at playback time. This is sub-optimal, as you can't see when pages were seen unchanged, but that can't be fixed until this issue is resolved.
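For reference, the filtering amounts to something like the following minimal sketch, assuming CDXJ lines of the form `urlkey timestamp {json}` in which revisit records carry `"mime": "warc/revisit"`. The helper name `drop_revisits` is made up for illustration, and the real filter may be applied elsewhere in the stack.

```python
import json


def drop_revisits(cdxj_lines):
    """Return only the CDXJ lines that are not warc/revisit records."""
    kept = []
    for line in cdxj_lines:
        if not line.strip():
            continue  # skip blank lines
        # Split into urlkey, timestamp, and the JSON block.
        _urlkey, _timestamp, payload = line.strip().split(" ", 2)
        fields = json.loads(payload)
        if fields.get("mime") == "warc/revisit":
            continue
        kept.append(line)
    return kept


sample = [
    'uk,gov)/robots.txt 20210101000000 {"mime": "text/plain", "status": "200"}',
    'uk,gov)/robots.txt 20210102000000 {"mime": "warc/revisit", "status": "200"}',
]
print(drop_revisits(sample))
```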