Releases: netarchivesuite/jwarc-cdx-indexer-workflow

Version-1.0

14 Nov 09:29

Release version-1.0.
Jwarc-cdx-indexer-workflow is a workflow for starting large-scale CDX indexing of WARC files.

Download and extract the release.

To use:

  1. Edit the configuration in conf/jwarc-cdx-indexer-workflow-behaviour.yaml
  2. Create a text file listing the WARC files to process
  3. Start the workflow with bin/start-script.sh
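
As a rough illustration of step 2, the sketch below builds the list file with find. The helper name make_warc_list, the one-path-per-line layout, and the *.warc / *.warc.gz patterns are assumptions made for this example and are not taken from the release; the README is the authority on the expected list format and on how the list is passed to the workflow.

```shell
#!/bin/sh
# Hypothetical helper (not part of the release): collect every *.warc and
# *.warc.gz file under a directory into a list file, one path per line.
# The one-path-per-line format is an assumption; check the README.
make_warc_list() {
    warc_dir=$1    # directory to search recursively
    list_file=$2   # output text file
    find "$warc_dir" -type f \( -name '*.warc' -o -name '*.warc.gz' \) \
        | sort > "$list_file"
}
```

It could be called as make_warc_list /data/warcs warc-files.txt, where both paths are placeholders. Passing an absolute directory yields absolute paths in the list, which avoids depending on the working directory the workflow is started from.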

For more details see:
https://github.com/netarchivesuite/jwarc-cdx-indexer-workflow/blob/main/README.md