A set of scripts that simplify using the SPMF library for sequential rule and pattern mining in cybersecurity alerts.
The scripts provide two functions.
- Generate sequential databases in the formats required by SPMF. The src/databases/generate-db.py script serves this purpose.
  - Databases can be generated from alerts in IDEA format or from alerts in csv, with the parameter order described in the src/databases/alerts/csv.py module.
  - Because SPMF requires a numeric representation of items for the majority of its algorithms, the script generates an output file with sequences in which items are represented as numbers, along with a csv mapping between the numeric and string representations of items. The mapping is saved in the same directory with the .map suffix.
  - For more information run python src/databases/generate-db.py -h.
- Process outputs of the SPMF library (files with rules and patterns). The src/process-outputs.py script serves this purpose.
  - The main purpose of processing outputs is to replace the numeric representation of items with their string representation.
  - For more information run python src/process-outputs.py -h.
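The numeric encoding described above can be illustrated with a minimal sketch. It is not the code of generate-db.py: the function names are hypothetical, and the column order of the mapping (number first, string second) is an assumption. The sequence layout follows SPMF's documented input format, where -1 closes an itemset and -2 closes a sequence.

```python
import csv
import io


def encode_sequences(sequences):
    """Map string items to integers and render SPMF sequence lines.

    `sequences` is a list of sequences, each sequence is a list of itemsets,
    and each itemset is a list of string items.
    """
    item_ids = {}  # string item -> numeric id, assigned on first appearance
    lines = []
    for sequence in sequences:
        parts = []
        for itemset in sequence:
            ids = []
            for item in itemset:
                if item not in item_ids:
                    item_ids[item] = len(item_ids) + 1  # ids start at 1
                ids.append(item_ids[item])
            # Items inside one itemset are written in ascending order.
            parts.extend(str(number) for number in sorted(ids))
            parts.append("-1")  # -1 closes an itemset
        parts.append("-2")  # -2 closes the sequence
        lines.append(" ".join(parts))
    return lines, item_ids


def mapping_to_csv(item_ids):
    """Serialize the mapping as csv rows (number, string); the order is assumed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for item, number in item_ids.items():
        writer.writerow([number, item])
    return buffer.getvalue()


# Two made-up sequences of "source:port" items, one item per itemset.
sequences = [
    [["10.0.0.1:22"], ["10.0.0.1:80"]],
    [["10.0.0.1:80"], ["10.0.0.2:22"]],
]
db_lines, item_ids = encode_sequences(sequences)
map_csv = mapping_to_csv(item_ids)
```

Here db_lines holds the text that would go into the database file and map_csv the text that would go into the accompanying .map file.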
(All scripts are written in Python 3.6.)
First, install the required packages.
$ pip install -r requirements.txt
Generate a sequential database.
$ python src/databases/generate-db.py -i alerts.json -o data/db/ --format basic --db-types src src-port
The -i alerts.json option sets alerts.json as the input file with alerts.
The -o data/db/ option saves the databases into the data/db/ directory.
The --format basic option means that the generated databases will look as described in the src/databases/formats/basic/abstract.py module.
The --db-types src src-port option means that the databases defined in src/databases/formats/basic/src.py and src/databases/formats/basic/src-port.py will be generated.
Run SPMF algorithms.
Download spmf.jar from http://www.philippe-fournier-viger.com/spmf/index.php?link=download.php.
$ java -jar spmf.jar run RuleGrowth data/db/basic/src-port data/outputs/src-port.RuleGrowth 0.001 0.1
This runs the RuleGrowth algorithm for the discovery of sequential rules. The last two arguments are the minimum support (0.001) and minimum confidence (0.1) thresholds. The rules are stored in the data/outputs/src-port.RuleGrowth file.
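To inspect the rule file programmatically, each line can be split into its parts. The sketch below assumes the line layout that SPMF documents for sequential rules (comma-separated antecedent items, ==>, comma-separated consequent items, then #SUP: and #CONF:). Verify it against your own output file. The parse_rule helper is hypothetical and not part of this repository.

```python
import re

# Assumed layout of one rule line: "1,2 ==> 3 #SUP: 4 #CONF: 0.8"
RULE_PATTERN = re.compile(r"^([\d,]+) ==> ([\d,]+) #SUP: (\d+) #CONF: (\S+)$")


def parse_rule(line):
    """Split one rule line into antecedent, consequent, support and confidence."""
    match = RULE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("unexpected rule line: {!r}".format(line))
    antecedent, consequent, support, confidence = match.groups()
    return {
        "antecedent": [int(item) for item in antecedent.split(",")],
        "consequent": [int(item) for item in consequent.split(",")],
        "support": int(support),
        "confidence": float(confidence),
    }


# A made-up example line, not taken from a real run.
rule = parse_rule("1,2 ==> 3 #SUP: 4 #CONF: 0.8")
```

Parsed rules can then be sorted or filtered, for example by confidence, before the numeric items are translated back to strings.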
Process the rules produced by SPMF.
$ python src/process-outputs.py -d data/db/basic/src-port -o data/outputs/src-port.RuleGrowth
Items in the data/outputs/src-port.RuleGrowth file are now replaced with their string representations from the data/db/basic/src-port.map file.
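The replacement step can be pictured roughly as follows. This is a sketch under assumptions, not the implementation of process-outputs.py: it assumes the .map file is a csv with the number in the first column and the string in the second, and that rule lines carry their statistics after a #SUP: marker. The helper names are hypothetical.

```python
import csv
import io
import re


def load_mapping(map_file):
    """Read csv rows of (number, string) into a dict keyed by the number as text."""
    mapping = {}
    for row in csv.reader(map_file):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


def decode_rule(line, mapping):
    """Replace numeric items with strings, leaving the statistics untouched."""
    # Split off the statistics so the support count is not mistaken for an item.
    rule, separator, stats = line.partition(" #SUP:")
    decoded = re.sub(r"\d+", lambda m: mapping.get(m.group(0), m.group(0)), rule)
    return decoded + separator + stats


# In-memory stand-ins for the .map file and one line of the rule file.
mapping = load_mapping(io.StringIO("1,10.0.0.1:22\n2,10.0.0.1:80\n"))
decoded = decode_rule("1 ==> 2 #SUP: 3 #CONF: 0.75", mapping)
```

Numbers that are missing from the mapping are left as they are, so an incomplete .map file does not abort the run.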
If you want to create your own database, create a module containing a class named Database with a constructor and two methods.
- d = Database(output_dir, file_suffix)
  The constructor should accept two positional parameters: the output directory (where the database should be saved) and the database suffix (a string to append to the output file name).
- d.read(alert)
  The read method is called for each alert in the input file. The alert is an instance of one of the classes from the src/databases/alerts/ package (which one depends on the type of the input file).
- d.save()
  The save method is called after all alerts have been processed with the read method.
You can extend AbstractDatabase from src/databases/formats/AbstractDatabase.py to simplify parts of the implementation.
Put the module inside one of the src/databases/formats/ packages or create your own package. Then call the generate-db.py script with --format set to the package name and --db-types set to your module name.
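A custom database module might look like the sketch below. Only the class name Database and the constructor, read and save signatures come from the description above. The rest is assumed for illustration: the Alert stand-in and its source_ip and category attributes are hypothetical (the real alert classes live in src/databases/alerts/), and so are the output file name and the column order of the .map file. The sketch does not extend AbstractDatabase, so it stays self-contained.

```python
import csv
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for an alert object; the real attributes may differ.
Alert = namedtuple("Alert", ["source_ip", "category"])


class Database:
    """Builds one sequence of alert categories per source IP address."""

    def __init__(self, output_dir, file_suffix):
        self.output_dir = output_dir
        # "category-by-source" is a made-up base name for the output file.
        self.path = os.path.join(output_dir, "category-by-source" + file_suffix)
        self.item_ids = {}   # category string -> numeric item
        self.sequences = {}  # source IP -> list of numeric items

    def read(self, alert):
        # Called once per alert: assign a number to the category and append it.
        number = self.item_ids.setdefault(alert.category, len(self.item_ids) + 1)
        self.sequences.setdefault(alert.source_ip, []).append(number)

    def save(self):
        # Called after all alerts were read: write the database and the mapping.
        os.makedirs(self.output_dir, exist_ok=True)
        with open(self.path, "w") as db_file:
            for items in self.sequences.values():
                itemsets = " ".join("{} -1".format(item) for item in items)
                db_file.write(itemsets + " -2\n")
        with open(self.path + ".map", "w", newline="") as map_file:
            writer = csv.writer(map_file)
            for category, number in self.item_ids.items():
                writer.writerow([number, category])


# Usage with made-up alerts and a temporary output directory.
with tempfile.TemporaryDirectory() as output_dir:
    db = Database(output_dir, ".example")
    for alert in [
        Alert("10.0.0.1", "Recon.Scanning"),
        Alert("10.0.0.2", "Attempt.Login"),
        Alert("10.0.0.1", "Attempt.Login"),
    ]:
        db.read(alert)
    db.save()
    with open(db.path) as db_file:
        saved_lines = db_file.read().splitlines()
```

Keeping all state in memory until save is called matches the read-then-save contract, but for very large alert files an implementation may need to flush sequences incrementally.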
This software package is an attachment to the paper "On the Sequential Pattern and Rule Mining in the Analysis of Cyber Security Alerts", presented at the ARES 2017 conference.
HUSÁK, Martin, Jaroslav KAŠPAR, Elias BOU-HARB and Pavel ČELEDA. On the Sequential Pattern and Rule Mining in the Analysis of Cyber Security Alerts. In Proceedings of the 12th International Conference on Availability, Reliability and Security. Reggio Calabria: ACM, 2017. pp. 22:1-22:10, 10 pp. ISBN 978-1-4503-5257-4. http://doi.acm.org/10.1145/3098954.3098981