ssdeep Cluster (ssdc) clusters files using ssdeep as the comparison algorithm. Results are written to a tar file, in which each file is placed in a directory together with the files it is comparable to. A file can belong to multiple groups. I have found this tool helpful when I need to analyze a large number of samples with an ever-decreasing amount of time to do it in.
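To illustrate how a file can end up in more than one group, here is a minimal sketch of overlapping grouping by pairwise similarity. It is not ssdc's actual implementation: the function names, the `threshold` parameter, and the `toy_compare` scorer are all hypothetical. In practice the scorer would be an ssdeep comparison returning a 0-100 match score (for example `ssdeep.compare` from the Python ssdeep bindings); a stand-in is used here so the sketch runs without extra dependencies.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set


def group_by_similarity(
    hashes: Dict[str, str],
    compare: Callable[[str, str], int],
    threshold: int = 1,
) -> List[Set[str]]:
    """Return overlapping groups: one per file that has at least one match."""
    # For every file, record the files it scores against at or above the threshold.
    matches: Dict[str, Set[str]] = {name: set() for name in hashes}
    for a, b in combinations(sorted(hashes), 2):
        if compare(hashes[a], hashes[b]) >= threshold:
            matches[a].add(b)
            matches[b].add(a)

    # Each group is a file plus its direct matches; identical groups are collapsed.
    groups: List[Set[str]] = []
    for name in sorted(matches):
        if not matches[name]:
            continue  # files with no match form no group
        group = {name} | matches[name]
        if group not in groups:
            groups.append(group)
    return groups


def toy_compare(a: str, b: str) -> int:
    """Hypothetical stand-in scorer (0-100): share of positions with equal characters."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100 * same // max(len(a), len(b), 1)


# Made-up names and strings, used only to exercise the grouping logic.
demo = {
    "a.bin": "AAAAAAAA",
    "b.bin": "AAAAAABB",
    "c.bin": "BBBBAABB",
    "d.bin": "ZZZZZZZZ",
}
groups = group_by_similarity(demo, toy_compare, threshold=50)
for group in groups:
    print(sorted(group))
```

In this demo `b.bin` is similar to both `a.bin` and `c.bin`, while those two are not similar to each other, so `b.bin` appears in several groups and `d.bin` appears in none.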
Included in the resulting tar file is a .gexf file. This can be used to visualize the results in Gephi.
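For readers unfamiliar with the format, the sketch below builds a minimal GEXF 1.2 graph of the kind Gephi can open: one node per file and one undirected edge per similar pair. The exact structure ssdc writes is not documented here, so treat this as an assumption; `build_gexf`, the sample file names, and the choice to store the match score in the edge `weight` attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def build_gexf(scores: Dict[Tuple[str, str], int]) -> str:
    """Serialize pairwise match scores as a minimal GEXF 1.2 document."""
    gexf = ET.Element(
        "gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"}
    )
    graph = ET.SubElement(
        gexf, "graph", {"mode": "static", "defaultedgetype": "undirected"}
    )
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    # One node per distinct file name appearing in any pair.
    for name in sorted({n for pair in scores for n in pair}):
        ET.SubElement(nodes, "node", {"id": name, "label": name})

    # One edge per similar pair; the score is stored as the edge weight (assumption).
    for i, ((src, dst), score) in enumerate(sorted(scores.items())):
        ET.SubElement(
            edges,
            "edge",
            {"id": str(i), "source": src, "target": dst, "weight": str(score)},
        )
    return ET.tostring(gexf, encoding="unicode")


# Made-up scores, used only to show the output shape.
xml_text = build_gexf({("a.bin", "b.bin"): 75, ("b.bin", "c.bin"): 50})
print(xml_text)
```

Saving the returned string to a file with a `.gexf` extension should be enough to load it in Gephi, where edge weights can drive the layout.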
git clone https://github.com/bwall/ssdc.git
cd ssdc
sudo python setup.py install
bwall@highwind:~$ ssdc -h
usage: ssdc [-h] [-v] [-r] [-o [output]] [-s] [-d] path [path ...]

Clusters files based on their ssdeep hash

positional arguments:
  path                  Paths to files or directories to scan

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -r, --recursive       Scan paths recursively
  -o [output], --output [output]
                        Path to write the resulting tarball to
                        (default=output.tar)
  -s, --storefiles      Store files in output tar
  -d, --dontcompute     Treat input as ssDeep hashes

ssdc v1.2.0 by Brian Wallace (@botnet_hunter)