This repository has been archived by the owner on Jan 25, 2021. It is now read-only.

salevajo/benchmark

Create a test dataset for benchmarking

The DigitalCorpora.org team provides a large collection of digital corpora for use in computer forensics research.

At http://downloads.digitalcorpora.org/corpora/files/govdocs1/threads you can download zip files, each containing a distinct set of around 1000 files. This makes a useful dataset for benchmarking processing speed.

Create a collection folder within your Hoover checkout dir ~/docker-setup:

cd ~/docker-setup
mkdir -p collections/benchmark
cd collections/benchmark

Use fetch.sh to get all files:

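#!/bin/bash
# Download the ten govdocs1 thread archives into ./data
set -e
# Create the data folder if it does not exist yet, so that cd cannot fail
mkdir -p data
cd data
for i in {0..9}
do
   echo "Downloading thread$i.zip"
   # -f: fail on HTTP errors instead of saving an error page; -L: follow redirects
   curl -fL "http://downloads.digitalcorpora.org/corpora/files/govdocs1/threads/thread$i.zip" -o "thread$i.zip"
done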
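
Before processing, it can help to confirm that all ten archives actually arrived. The following is a minimal sketch, not part of this repository: `check_archives` is a hypothetical helper name, and it assumes the archives are named thread0.zip through thread9.zip in the data folder, as in fetch.sh. It only checks that each file exists and is non-empty; it does not validate the zip contents.

```shell
# Hypothetical helper (not part of the repository): report which of the
# expected thread archives are missing or empty in the given folder.
check_archives() {
    local dir="${1:-data}"   # folder to inspect, defaults to ./data
    local missing=0
    local f
    for i in 0 1 2 3 4 5 6 7 8 9; do
        f="$dir/thread$i.zip"
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing of 10 archives missing"
    # Return success only when every archive is present
    [ "$missing" -eq 0 ]
}
```

After sourcing or pasting the function, run `check_archives data` from the collection folder. A non-zero exit status means at least one download should be repeated.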

Process all files using Hoover:

cd ~/docker-setup
./createcollection -c benchmark
./instructions/init-benchmark.sh
