LDPI is a "plugin" within the Comm. Sleeve's mesh_com repository. It monitors network interfaces to identify deviations from expected traffic patterns using unsupervised and semi-supervised learning approaches. The system primarily relies on normal traffic data for training, though incorporating additional malicious samples is optional and can enhance the method's effectiveness. The pre-trained models available in this repository were trained with n = 4 and l = 60, meaning that they use the first 4 packets, each trimmed to 60 bytes, for analysis.
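To make the meaning of n and l concrete, here is a minimal sketch of how the first n packets of a flow could be turned into a fixed-size model input. The function name flow_to_input, the zero-padding of short packets and short flows, and the scaling of bytes to [0, 1] are assumptions made for illustration; the repository's own preprocessing may differ.

```python
from typing import List

N_PACKETS = 4    # n: number of leading packets per flow (value stated above)
PACKET_LEN = 60  # l: number of bytes kept per packet (value stated above)


def flow_to_input(packets: List[bytes], n: int = N_PACKETS, l: int = PACKET_LEN) -> List[float]:
    """Flatten the first n packets, each cut or zero-padded to l bytes, into n*l floats."""
    rows = []
    for pkt in packets[:n]:
        trimmed = pkt[:l]
        # Assumption: packets shorter than l bytes are right-padded with zeros.
        rows.append(trimmed + bytes(l - len(trimmed)))
    # Assumption: flows with fewer than n packets are padded with all-zero packets.
    while len(rows) < n:
        rows.append(bytes(l))
    # Assumption: byte values are scaled to the range [0, 1].
    return [b / 255.0 for row in rows for b in row]
```

With the default values, every flow yields a vector of 4 * 60 = 240 values regardless of how many packets it contains or how long they are.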
This repo includes all components needed for real-time traffic analysis, from data quantization to training scripts for adapting the model to a custom dataset. LDPI analyzes the initial packets of each network flow, identified by its 5-tuple, and feeds the raw packet data into a deep learning model. The default setup uses 1D convolutions in a compact ResNet encoder suited to resource-constrained environments. The architecture is flexible: other encoders, such as RNNs or transformers, can be used instead, and the encoder's capacity can be expanded where more computational resources are available.
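The idea of grouping packets by their 5-tuple and analyzing only the first few can be sketched as follows. FlowKey and FlowTable are hypothetical names, not classes from this repository, and the sketch assumes a directional key (the two directions of a connection count as separate flows), which may not match the actual implementation.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifier of a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP


class FlowTable:
    """Buffers the first n packets of each flow and releases them exactly once."""

    def __init__(self, n: int = 4) -> None:
        self.n = n
        self._pending: Dict[FlowKey, List[bytes]] = {}
        self._done: Set[FlowKey] = set()

    def add(self, key: FlowKey, packet: bytes) -> Optional[List[bytes]]:
        """Return the buffered packets once a flow reaches n packets, else None."""
        if key in self._done:
            return None  # flow was already handed over; later packets are ignored
        buf = self._pending.setdefault(key, [])
        buf.append(packet)
        if len(buf) < self.n:
            return None
        self._done.add(key)
        return self._pending.pop(key)
```

A long-running monitor would also need to expire old entries from both containers; that is left out here to keep the sketch short.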
Pretraining uses contrastive learning with outlier exposure through distribution augmentation and requires only normal data. The contrastive loss of [1] is applied during pretraining.
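For reference, a contrastive objective of this kind is commonly written in the normalized-temperature form below. The symbols are assumed here for illustration and may differ from the notation in [1] or in the training code: z_i is the projection-head output for view i, i^+ indexes the other augmented view of the same sample, B is the set of all views in a batch, and τ is a temperature.

```latex
\mathcal{L}_{\text{con}}
  = -\frac{1}{|\mathcal{B}|} \sum_{i \in \mathcal{B}}
    \log \frac{\exp\!\big(\operatorname{sim}(z_i, z_{i^{+}}) / \tau\big)}
              {\sum_{k \in \mathcal{B},\, k \neq i} \exp\!\big(\operatorname{sim}(z_i, z_k) / \tau\big)},
\qquad
\operatorname{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}
```

Under distribution augmentation, transformed copies of a sample are treated as distinct instances, so they appear among the negatives in the denominator rather than as positives.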
Subsequently, the projection head is dropped or replaced, and the model is fine-tuned using the Deep SAD loss [2].
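For reference, the Deep SAD objective from [2] has the form below. The counts are written as N (unlabeled samples) and M (labeled samples) to avoid a clash with the packet count n used in this README; φ(·; W) is the encoder with weights W, c is the fixed hypersphere center, η weights the labeled term, λ is the weight-decay coefficient, and the label ỹ_j is +1 for labeled normal samples and −1 for labeled anomalies.

```latex
\min_{\mathcal{W}} \;
  \frac{1}{N + M} \sum_{i=1}^{N} \lVert \phi(x_i; \mathcal{W}) - c \rVert^{2}
  + \frac{\eta}{N + M} \sum_{j=1}^{M}
      \Big( \lVert \phi(\tilde{x}_j; \mathcal{W}) - c \rVert^{2} \Big)^{\tilde{y}_j}
  + \frac{\lambda}{2} \sum_{\ell=1}^{L} \lVert W^{\ell} \rVert_{F}^{2}
```

The first term pulls unlabeled (assumed mostly normal) samples toward c. In the second term, the exponent −1 inverts the distance for labeled anomalies, so minimizing the loss pushes them away from c. With no labeled data (M = 0) only the first and last terms remain, which matches the normal-data-only setting described above.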
Once training is complete, several candidate thresholds are computed, such as the 99th percentile and the maximum. The threshold applied during inference is configurable and is selected from these candidates.
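The threshold step can be sketched as below. This assumes the thresholds are taken over anomaly scores of normal traffic and that a score above the chosen threshold flags a flow; the function names and the keys "p99" and "max" are hypothetical and not taken from the repository.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def candidate_thresholds(scores: Sequence[float]) -> Dict[str, float]:
    """Compute the candidate thresholds named in the text from training scores."""
    return {"p99": percentile(scores, 99.0), "max": max(scores)}


def is_anomalous(score: float, thresholds: Dict[str, float], name: str = "p99") -> bool:
    """Flag a flow whose score exceeds the configured threshold."""
    return score > thresholds[name]
```

Choosing the maximum gives no false alarms on the scores used for calibration but is sensitive to a single outlier; the 99th percentile trades a small false-alarm rate for a tighter boundary.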
You can adapt the model to your network environment by retraining it with your own data. Delete all models within the ldpi/training/output/ folder. Note: Deleting pretrained_model.pth is optional. Place your .pcap files, representing normal and attack data (if available), in the datasets folder. To ensure compatibility, your dataset should adhere to the structure outlined below (refer to Dataset Structure). Note that the model was pre-trained with n=4; if you decide to change this parameter, all models, including the pre-trained model, should be trained from scratch.
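The cleanup step can be scripted. The sketch below is a hypothetical helper, not part of the repository; it assumes that every file directly inside the output folder is a model artifact, and it only lists files unless dry_run is set to False.

```python
from pathlib import Path
from typing import List


def clear_models(output_dir: str, keep_pretrained: bool = True, dry_run: bool = True) -> List[Path]:
    """List (and, unless dry_run, delete) the files directly inside output_dir."""
    targets = []
    for path in sorted(Path(output_dir).iterdir()):
        if not path.is_file():
            continue  # subdirectories are left untouched
        if keep_pretrained and path.name == "pretrained_model.pth":
            continue  # deleting the pre-trained model is optional
        targets.append(path)
        if not dry_run:
            path.unlink()
    return targets
```

For example, clear_models("ldpi/training/output") shows what would be removed, and clear_models("ldpi/training/output", dry_run=False) removes it. Pass keep_pretrained=False when n has been changed and the pre-trained model must also be retrained.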
Training involves three main steps:
- Data Preparation: Place your benign and malicious datasets in the datasets/ directory, structured as described under Dataset Structure.
- Preprocessing: Run ldpi/training/preprocessing.py to preprocess your network data for training.
- Model Training: Execute ldpi/training/training.py to train the model using the preprocessed data.
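The two script steps can be chained in a small driver. This is a sketch under assumptions: it assumes both scripts are run from the repository root and need no command-line arguments, which this README does not state. build_commands and run_pipeline are hypothetical helper names.

```python
import subprocess
import sys
from typing import List

# Script paths as given in the steps above.
STEPS = ["ldpi/training/preprocessing.py", "ldpi/training/training.py"]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Return one command per step, in execution order."""
    return [[python, script] for script in STEPS]


def run_pipeline(repo_root: str = ".") -> None:
    """Run preprocessing, then training; stop at the first failing step."""
    for cmd in build_commands():
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        subprocess.run(cmd, cwd=repo_root, check=True)
```

Calling run_pipeline() from the repository root runs both steps in order. Because training depends on the preprocessed data, a failure in preprocessing aborts the run before training starts.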
Your custom dataset should comply with the following structure within ids-ldpi/datasets/
[1] Learning and Evaluating Representations for Deep One-class Classification
[2] Deep Semi-Supervised Anomaly Detection