Version-1.0

@thomasegense thomasegense released this 14 Nov 09:29
· 10 commits to main since this release

Release version-1.0.
jwarc-cdx-indexer-workflow is a workflow for running large-scale CDX indexing of WARC files.

Download and extract the release.

To use:

  1. Edit the configuration in conf/jwarc-cdx-indexer-workflow-behaviour.yaml
  2. Create a text file listing the WARC files to process
  3. Start the workflow with bin/start-script.sh
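
Step 2 can be scripted when the WARC files sit under one directory tree. The following is a minimal sketch, not part of the release: it assumes the list file holds one absolute path per line and that the files end in `.warc` or `.warc.gz`. Neither assumption is stated in these notes, so check the README for the expected format. The function name `write_warc_list` is hypothetical.

```python
from pathlib import Path


def write_warc_list(warc_dir, list_file):
    """Write the path of every WARC file under warc_dir to list_file.

    Assumption: the workflow expects one absolute path per line.
    Returns the number of paths written.
    """
    root = Path(warc_dir)
    # Assumption: WARC files are named *.warc or *.warc.gz.
    paths = sorted(
        p.resolve()
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".warc", ".warc.gz"))
    )
    # One path per line, with a trailing newline after each entry.
    Path(list_file).write_text(
        "".join(f"{p}\n" for p in paths), encoding="utf-8"
    )
    return len(paths)
```

For example, `write_warc_list("/data/warcs", "warc-files.txt")` would write the list to `warc-files.txt`; both paths are placeholders, and the configuration file from step 1 presumably needs to point at whatever list file you create.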

For more details see:
https://github.com/netarchivesuite/jwarc-cdx-indexer-workflow/blob/main/README.md