[Enhancement] Overhaul indexing to be more efficient #540
What's new?
N/A
What's changed?
This is a significant change to the way indexing works, but you can skip to the bottom for a TL;DR.
Background
Before, any indexing run would read the entire contents of a channel and capture everything, every time. This was initially seen as an advantage for two reasons:
In practice, the former rarely happens after the first few days of a video's upload and the latter happens so infrequently as to not be worth hinging such a big design decision on. I wouldn't have made this PR if it were just about idealized design choices - there are some real downsides to this approach:
To combat this, I introduced a concept called fast indexing. This helped alleviate the pressure by doing more frequent checks using more efficient mechanisms (RSS, API), but it still required a full scan once per month to ensure everything was kept in sync.
So, what's new?
Indexing has been updated so that it stops once it reaches content it has seen before. More precisely, it stops after it has scanned a certain number of previously indexed videos. For this initial iteration I've set that number to 20, which means indexing stops once it has scanned 20 videos past the most recent video you've indexed.
Why have this weird offset logic? It's to still allow changes from the uploader to get picked up by Pinchflat. If I stopped once indexing had scanned the latest video you've indexed, any changes to recent videos wouldn't get picked up. This isn't a perfect approach, but I've been running some long tests over the last few months and 20 videos is the sweet spot for catching essentially all changes while not taking too much time. I'm sure there's a use case I haven't thought of so the number itself may be subject to change, but the opportunity to cut indexing times for large channels down from days to ~2 minutes was too good to pass up.
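To make the stopping rule concrete, here's a rough sketch of the idea in Python. This is not Pinchflat's actual code: the function name, its parameters, and the assumption that videos arrive newest-first as plain IDs are all made up for illustration. It also assumes the count of previously indexed videos is cumulative rather than consecutive, which the description above doesn't spell out.

```python
from typing import Iterable, List, Set

# Threshold taken from the description above; everything else here is hypothetical.
STOP_AFTER_SEEN = 20


def scan_channel(
    video_ids_newest_first: Iterable[str],
    already_indexed: Set[str],
    stop_after_seen: int = STOP_AFTER_SEEN,
) -> List[str]:
    """Return the IDs that a single indexing run would scan.

    Walks the channel from newest to oldest and stops once it has scanned
    `stop_after_seen` videos that were already indexed. Previously indexed
    videos inside that window are still scanned, so recent uploader changes
    to them can be picked up.
    """
    scanned: List[str] = []
    seen_count = 0
    for video_id in video_ids_newest_first:
        scanned.append(video_id)
        if video_id in already_indexed:
            seen_count += 1
            if seen_count >= stop_after_seen:
                break
    return scanned


# Example: a 1000-video channel where only the 5 newest videos are new.
channel = [f"v{i}" for i in range(1000)]  # v0 is the newest upload
indexed = set(channel[5:])
scanned = scan_channel(channel, indexed)
print(len(scanned))  # 5 new videos + 20 previously indexed ones
```

With nothing indexed yet, the loop never hits the threshold and the whole channel gets scanned, which lines up with a first run needing to capture everything.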
If you still need to scan the full channel for whatever reason, you can select "force index" from the actions menu while viewing a source. Playlists are not impacted by this change - only channels. Fast indexing still works great and there's no reason to move away from it if you're already using it.
TL;DR:
What's fixed?
Any other comments?
N/A