Releases: netarchivesuite/jwarc-cdx-indexer-workflow
Version-1.0
Release version-1.0.
jwarc-cdx-indexer-workflow is a workflow for running large-scale CDX indexing of WARC files.
Download and extract the release.
To use:
- Edit the configuration in conf/jwarc-cdx-indexer-workflow-behaviour.yaml
- Create a text file listing the WARC files to process
- Start the workflow with bin/start-script.sh
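The file list from the second step can be generated rather than written by hand. The following is a minimal sketch, not part of the release: it assumes the workflow expects one WARC file path per line and that WARC files end in `.warc` or `.warc.gz`. The function name `write_warc_list` and the paths in the usage note are hypothetical; check the README for the exact list format.

```python
from pathlib import Path


def write_warc_list(warc_dir, list_file):
    """Write the absolute path of every WARC file under warc_dir to list_file.

    Assumption: the workflow reads one path per line. Verify this against
    the README before using the generated file.
    """
    root = Path(warc_dir)
    # Assumption: WARC files are named *.warc or *.warc.gz.
    paths = sorted(
        str(p.resolve())
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".warc", ".warc.gz"))
    )
    # One path per line, with a trailing newline after each entry.
    Path(list_file).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return paths
```

For example, `write_warc_list("/data/warcs", "warc-files.txt")` would collect every WARC file below the hypothetical directory `/data/warcs` into `warc-files.txt`.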
For more details see:
https://github.com/netarchivesuite/jwarc-cdx-indexer-workflow/blob/main/README.md