
MEGR-APT: A Memory-Efficient APT Hunting System Based on Attack Representation Learning


CoDS-GCS/MEGR-APT-code


MEGR-APT

MEGR-APT is a scalable APT hunting system that discovers suspicious subgraphs matching an attack scenario (query graph) published in Cyber Threat Intelligence (CTI) reports. MEGR-APT hunts APTs in a two-fold process: (i) memory-efficient suspicious subgraph extraction, and (ii) fast subgraph matching based on graph neural networks (GNNs) and attack representation learning.

Repository Roadmap

The inputs to the system are kernel audit logs stored in a structured database (Postgres) and attack query graphs in JSON format. The system consists of multiple Python scripts, plus bash scripts that drive them interactively.

  • /src directory holds all Python scripts.
  • /bash_src directory holds all bash scripts.
  • /technical_reports directory contains separate documentation explaining the scripts.
  • /logs directory is the default location for all generated system logs.
  • /model directory is the default location for all trained GNN models.
  • /dataset directory is the default location for query graphs, IOC files, experiment checkpoints, results, and detected subgraphs.
  • Investigation_Reports.ipynb: a notebook with scripts to generate investigation reports for detected subgraphs. The notebook includes a demo scenario for two query graphs from the DARPA TC3 CADETS host.
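The exact JSON schema of the attack query graphs is not described in this README. As a minimal sketch, assuming a hypothetical schema with `nodes` and `edges` arrays, a query graph could be parsed with the standard library and its node names collected as IOCs. The field names, the helper functions, and the indicator values below are all invented for illustration.

```python
import json

# Hypothetical query-graph schema (an assumption, not the repository's actual format).
QUERY_JSON = """
{
  "nodes": [
    {"id": "n1", "type": "process", "name": "nginx"},
    {"id": "n2", "type": "file", "name": "/tmp/payload.bin"},
    {"id": "n3", "type": "socket", "name": "192.0.2.10"}
  ],
  "edges": [
    {"source": "n1", "target": "n3", "type": "connect"},
    {"source": "n1", "target": "n2", "type": "write"}
  ]
}
"""


def load_query_graph(text):
    """Parse the hypothetical schema into a node table and a typed adjacency list."""
    data = json.loads(text)
    nodes = {n["id"]: n for n in data["nodes"]}
    adjacency = {node_id: [] for node_id in nodes}
    for e in data["edges"]:
        adjacency[e["source"]].append((e["target"], e["type"]))
    return nodes, adjacency


def extract_iocs(nodes):
    """Collect the distinct node names, which would seed suspicious-subgraph extraction."""
    return sorted({n["name"] for n in nodes.values()})


nodes, adjacency = load_query_graph(QUERY_JSON)
print(extract_iocs(nodes))
```

Whether every node name counts as an IOC (process names included) is a design choice of this sketch; the repository also keeps separate IOC files under /dataset.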

Installation

To set up the environment, install requirements.txt and then torch_requirements.txt. An example bash script for setting up the environment, setup_environment.sh, is provided; please review it before using it.

A Stardog graph database instance should be set up, and the RDF provenance graphs should be loaded into it using bash_src/load_to_stardog.sh. The RDF provenance graphs are available at this link.

MEGR-APT System Architecture

[Figure: MEGR-APT system architecture]

MEGR-APT RDF Provenance Graph Construction

The first step in MEGR-APT is to construct provenance graphs in the RDF graph engine.

  • Use construct_pg_cadets.py to query kernel audit logs from the structured database (Postgres) and construct a provenance graph in NetworkX format.
  • Use construct_rdf_graph_cadets.py to construct RDF-based provenance graphs and store them in the RDF graph engine (Stardog).
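As a rough illustration of this step, the sketch below uses an in-memory SQLite table as a stand-in for Postgres, builds a directed edge list from audit events, and serializes it as N-Triples. The `events` table layout, the `http://example.org/pg#` namespace, the helper names, and the subject-to-object edge direction are all assumptions, not the schema or RDF vocabulary used by construct_pg_cadets.py or construct_rdf_graph_cadets.py.

```python
import sqlite3
from urllib.parse import quote

NS = "http://example.org/pg#"  # hypothetical namespace, not the repository's RDF vocabulary
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def load_events(conn):
    """Read audit events in timestamp order from a hypothetical `events` table."""
    return conn.execute(
        "SELECT subject, subject_type, action, object, object_type FROM events ORDER BY ts"
    ).fetchall()


def build_provenance_graph(rows):
    """Return {node_id: type} and a list of (subject, action, object) edges.

    Edges point from subject to object; real provenance graphs often flip
    read edges to follow information flow, which this sketch omits.
    """
    node_types, edges = {}, []
    for subj, subj_type, action, obj, obj_type in rows:
        node_types[subj] = subj_type
        node_types[obj] = obj_type
        edges.append((subj, action, obj))
    return node_types, edges


def iri(kind, local):
    """Build an IRI under NS, percent-encoding the local part (e.g. file paths)."""
    return f"<{NS}{kind}/{quote(local, safe='')}>"


def to_ntriples(node_types, edges):
    """Serialize nodes (one rdf:type triple each) and events (one triple each)."""
    lines = [f"{iri('node', n)} {RDF_TYPE} {iri('type', t)} ." for n, t in sorted(node_types.items())]
    lines += [f"{iri('node', s)} {iri('action', a)} {iri('node', o)} ." for s, a, o in edges]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (ts INTEGER, subject TEXT, subject_type TEXT,"
    " action TEXT, object TEXT, object_type TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "proc-1", "process", "read", "file-7", "file"),
        (2, "proc-1", "process", "connect", "sock-3", "socket"),
    ],
)
node_types, edges = build_provenance_graph(load_events(conn))
print(to_ntriples(node_types, edges))
```

Loading the resulting triples into Stardog is outside this sketch; the repository handles that step with bash_src/load_to_stardog.sh.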

MEGR-APT Hunting Pipeline

The MEGR-APT hunting pipeline consists of two steps:

  1. Use extract_rdf_subgraphs_cadets.py to extract suspicious subgraphs based on the IOCs of the given attack query graphs.
  2. Run main.py to find matches between suspicious subgraphs and attack query graphs using pre-trained GNN models. The script must be run with the same parameters as the trained model; check the GNN matching documentation for details.
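The extraction step can be pictured as seeding on IOC matches and expanding around them. The sketch below swaps in a plain k-hop breadth-first expansion over an in-memory edge list; it is not MEGR-APT's extraction rule, which runs against the RDF graph in Stardog, and the function name `k_hop_subgraph`, the node IDs, and the IOC values are hypothetical.

```python
from collections import defaultdict, deque


def k_hop_subgraph(edges, node_names, iocs, k=1):
    """Keep nodes within k hops (ignoring direction) of any node whose name is an IOC.

    Returns the kept node set and the edges induced on it.
    """
    neighbors = defaultdict(set)
    for s, _, o in edges:
        neighbors[s].add(o)
        neighbors[o].add(s)

    seeds = [n for n, name in node_names.items() if name in iocs]
    dist = {n: 0 for n in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == k:  # do not expand beyond the hop budget
            continue
        for nb in neighbors[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)

    keep = set(dist)
    return keep, [(s, a, o) for s, a, o in edges if s in keep and o in keep]


edges = [
    ("p1", "write", "f1"),
    ("p1", "connect", "s1"),
    ("p2", "read", "f1"),
    ("p3", "exec", "p2"),
    ("p4", "read", "f9"),
]
names = {"p1": "nginx", "f1": "/tmp/payload.bin", "s1": "192.0.2.10",
         "p2": "sh", "p3": "sshd", "p4": "cron", "f9": "/etc/crontab"}
iocs = {"/tmp/payload.bin", "192.0.2.10"}

nodes, sub_edges = k_hop_subgraph(edges, names, iocs, k=1)
print(sorted(nodes), sub_edges)
```

Raising k pulls in more context (with k=2, p3 joins through p2) at the cost of larger candidate subgraphs, which is the kind of size and memory trade-off the extraction step has to manage.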

The full hunting pipeline can be run with the run-megrapt-on-a-query-graph.sh bash script to search for a specific query graph in a provenance graph. For evaluation, use run-megrapt-per-host-for-evaluation.sh. Use the Investigation_Reports.ipynb Jupyter notebook to investigate detected subgraphs and produce a report for a human analyst.

MEGR-APT Training Pipeline

To train a GNN graph matching model for MEGR-APT, first configure the training/testing details in the get_training_testing_sets() function in dataset_config.py. Then take the following training steps:

  1. Use extract_rdf_subgraphs_[dataset].py with the --training argument to extract a training/testing set of random benign subgraphs.
  2. Use compute_ged_for_training.py to compute the graph edit distance (GED) for the training set. This step is computationally expensive and takes a long time, although it runs in parallel on multiple cores.
  3. Run main.py with the selected model training parameters as arguments (see the GNN matching documentation for details). The whole training pipeline can be run with the train_megrapt_model.sh bash script.
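GED values are usually turned into bounded regression targets before a GNN matching model is trained on them. Assuming MEGR-APT follows the SimGNN-style convention (an assumption; check compute_ged_for_training.py and main.py for the actual scheme), the GED is divided by the average node count of the two graphs and mapped through exp(-x) to a similarity in (0, 1]. The pair IDs and GED values below are hypothetical.

```python
import math


def normalized_ged(ged, n1, n2):
    """Divide GED by the average number of nodes of the two graphs."""
    return ged / ((n1 + n2) / 2)


def similarity_target(ged, n1, n2):
    """Map GED to a similarity in (0, 1]; identical graphs (GED 0) score 1.0."""
    return math.exp(-normalized_ged(ged, n1, n2))


# Hypothetical precomputed (pair_id, GED, |V1|, |V2|) records.
pairs = [("q1-g7", 3, 4, 8), ("q1-g9", 0, 5, 5)]
targets = {pid: similarity_target(ged, n1, n2) for pid, ged, n1, n2 in pairs}
print(targets)
```

Normalizing by graph size keeps targets comparable across pairs of very different sizes, so a GED of 3 between two small graphs scores lower than the same GED between two large ones.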
