cdxj_indexer maintenance is laggy and complex, and this is becoming a concern for warc2zim.
By laggy I mean, for instance, webrecorder/cdxj-indexer#26, which highlights that, as of today, using cdxj-indexer forces us to use an old idna version (< 3, which means < 2021).
By complex I mean, for instance, webrecorder/cdxj-indexer#24, which highlights that, as of today, using cdxj-indexer forces us to use the PyAMF library, which is no longer maintained (this lib is not maintained by the webrecorder team).
I doubt these two issues can be quickly solved due to their potentially complex consequences for a lib widely shared and probably running on various Python versions.
I want to stress that this is not really a problem of lack of maintenance effort by the cdxj_indexer maintainers.
Since cdxj_indexer changes very slowly and our usage of its codebase is quite modest, I propose to "fork" cdxj_indexer into warc2zim and solve these two issues (and a few others) on our side. If this proves to work well, we will be in a better position to suggest upstream fixes, but again, confirming the impact on a significant ecosystem is not necessarily easy.
Note that this is not a total "fork" but rather a "duplication" of the useful cdxj_indexer code in our own codebase.