Lightweight Deep Anomaly Detection for Network Traffic

LDPI is a "plugin" within the Comm. Sleeve's mesh_com repository. It monitors network interfaces to identify deviations from expected traffic patterns using unsupervised and semi-supervised learning. Training relies primarily on normal traffic; malicious samples are optional, but including them can improve detection. The pre-trained models available in this repository were trained with n = 4 and l = 60, meaning that they analyze the first 4 packets of each flow, each trimmed to 60 bytes.
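To make the roles of n and l concrete, the following minimal sketch turns the first packets of a flow into a fixed-length input. The function name `flow_to_input`, the zero-padding of short packets and short flows, and the scaling of bytes to [0, 1] are assumptions made for illustration; the repository's own preprocessing may differ.

```python
from typing import List, Sequence


def flow_to_input(packets: Sequence[bytes], n: int = 4, l: int = 60) -> List[float]:
    """Build a flat n*l input from the first n packets of a flow (illustrative)."""
    values: List[float] = []
    used = packets[:n]
    for pkt in used:
        trimmed = pkt[:l]                                # keep only the first l bytes
        padded = trimmed + bytes(l - len(trimmed))       # zero-pad short packets (assumption)
        values.extend(b / 255.0 for b in padded)         # scale bytes to [0, 1] (assumption)
    # Zero-pad flows that have fewer than n packets (assumption).
    values.extend([0.0] * (l * (n - len(used))))
    return values
```

With the default n = 4 and l = 60, every flow maps to 240 values regardless of how many packets it contains or how long they are.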

This repo includes all components needed for real-time traffic analysis, from data quantization to training scripts for adapting the model to custom datasets. LDPI groups packets into network flows by their 5-tuple identifiers, takes the initial packets of each flow, and feeds the raw packet data into a deep learning model. The default setup uses 1D convolutions in a compact ResNet model to accommodate resource-constrained environments. The architecture is designed to be flexible: other encoders such as RNNs or transformers can be used, and the encoder's capacity can be expanded for environments with more computational resources.
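As a rough illustration of flow-based analysis, the sketch below groups packets by 5-tuple and keeps only the first n packets of each flow. `FlowKey`, `first_packets_per_flow`, and the `(key, payload)` input format are hypothetical names chosen for this example, and flows are treated as unidirectional, which is an assumption rather than a statement about the repository's sniffer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FlowKey(NamedTuple):
    """5-tuple identifying a (unidirectional) flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def first_packets_per_flow(
    packets: Iterable[Tuple[FlowKey, bytes]], n: int = 4
) -> Dict[FlowKey, List[bytes]]:
    """Collect the first n packets seen for each 5-tuple, in arrival order."""
    flows: Dict[FlowKey, List[bytes]] = defaultdict(list)
    for key, payload in packets:
        if len(flows[key]) < n:      # later packets of the flow are ignored
            flows[key].append(payload)
    return dict(flows)
```

Once a flow has accumulated n packets it can be passed to the model, so a decision does not need to wait for the flow to finish.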

Pretraining and Training

Pretraining uses contrastive learning with outlier exposure through distribution augmentation, relying only on normal data. A contrastive loss [1] is applied during pretraining.

Subsequently, the projection head is dropped or replaced, and the model is fine-tuned using the Deep SAD loss [2].
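For readers unfamiliar with Deep SAD, the objective can be sketched in plain Python as follows. This follows the per-sample form commonly used in implementations of [2] and is not taken from this repository's code: unlabeled samples (label 0) are pulled toward a center c, labeled normal samples (label +1) are pulled toward it with weight eta, and labeled anomalies (label -1) are pushed away through the inverse squared distance. The names `deep_sad_loss`, `eta`, and `eps` are illustrative, and weight decay is omitted.

```python
from typing import Sequence


def deep_sad_loss(
    embeddings: Sequence[Sequence[float]],
    center: Sequence[float],
    labels: Sequence[int],
    eta: float = 1.0,
    eps: float = 1e-6,
) -> float:
    """Mean Deep SAD loss over a batch (illustrative, no weight decay)."""
    total = 0.0
    for z, y in zip(embeddings, labels):
        # Squared Euclidean distance of the embedding to the center c.
        dist = sum((zi - ci) ** 2 for zi, ci in zip(z, center))
        if y == 0:
            total += dist                        # unlabeled: minimize distance
        else:
            total += eta * (dist + eps) ** y     # +1: distance, -1: inverse distance
    return total / len(embeddings)
```

The same squared distance to the center can serve as the anomaly score at inference, which is the usual choice for this family of methods.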

Once training is complete, several candidate thresholds are computed from the anomaly scores, such as the 99th percentile and the maximum. The threshold applied during inference is configurable and can be chosen from among these.
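The threshold computation can be sketched as below, assuming the thresholds are derived from anomaly scores of normal training traffic and that a higher score means more anomalous. The function names and the nearest-rank percentile definition are choices made for this example, not the repository's implementation.

```python
from typing import Dict, Sequence


def percentile_nearest_rank(sorted_scores: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100 on sorted data."""
    rank = max(1, (p * len(sorted_scores) + 99) // 100)   # integer ceil(p * N / 100)
    return sorted_scores[rank - 1]


def compute_thresholds(scores: Sequence[float]) -> Dict[str, float]:
    """Candidate thresholds from scores of normal traffic (illustrative)."""
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)
    return {"p99": percentile_nearest_rank(ordered, 99), "max": ordered[-1]}


def is_anomalous(score: float, threshold: float) -> bool:
    """Flag a flow whose score exceeds the selected threshold."""
    return score > threshold
```

Choosing the maximum yields fewer false alarms on traffic resembling the training data, while the 99th percentile is more sensitive at the cost of flagging roughly 1% of such traffic.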

Training/Fine-tuning On Your Dataset

You can adapt the model to your network environment by retraining it with your own data. First, delete all models within the ldpi/training/output/ folder (deleting pretrained_model.pth is optional). Then place your .pcap files, representing normal and attack data (if available), in the datasets folder. To ensure compatibility, your dataset should follow the structure described under Dataset Structure. Note that the provided models were trained with n = 4; if you change this parameter, all models, including the pre-trained model, must be trained from scratch.
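The cleanup step can be scripted. The helper below is a minimal sketch that assumes the models are stored as .pth files directly inside the output folder; `clear_trained_models` is a hypothetical name, and only `pretrained_model.pth` is taken from the text above.

```python
from pathlib import Path
from typing import List, Union


def clear_trained_models(
    output_dir: Union[str, Path], keep_pretrained: bool = True
) -> List[str]:
    """Delete .pth files in output_dir, optionally keeping pretrained_model.pth."""
    removed: List[str] = []
    for path in sorted(Path(output_dir).glob("*.pth")):
        if keep_pretrained and path.name == "pretrained_model.pth":
            continue                      # keep the pretrained weights
        path.unlink()
        removed.append(path.name)
    return removed
```

Calling `clear_trained_models("ldpi/training/output")` from the repository root keeps the pretrained weights; pass `keep_pretrained=False` when changing n, since the pretrained model then has to be rebuilt as well.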

Training involves three main steps:

1. Data Preparation: Place your benign and malicious datasets in the datasets/ directory, structured as described under Dataset Structure.
2. Preprocessing: Run ldpi/training/preprocessing.py to preprocess your network data for training.
3. Model Training: Run ldpi/training/training.py to train the model on the preprocessed data.
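The two script-driven steps can be chained from a small driver. This sketch assumes both scripts are run from the repository root and need no command-line arguments, which the text above does not confirm; `build_commands` and `run_pipeline` are hypothetical helpers.

```python
import subprocess
import sys
from typing import List

# Script paths as given in the steps above, relative to the repository root.
STEPS = [
    "ldpi/training/preprocessing.py",
    "ldpi/training/training.py",
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Return one command per step, in execution order."""
    return [[python, script] for script in STEPS]


def run_pipeline() -> None:
    """Run preprocessing, then training; stop at the first failing step."""
    for cmd in build_commands():
        subprocess.run(cmd, check=True)   # raises CalledProcessError on failure
```

Using `check=True` prevents training from starting on incomplete data if preprocessing fails.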

Dataset Structure

Your custom dataset should comply with the following structure within ids-ldpi/datasets/

References

[1] Learning and Evaluating Representations for Deep One-class Classification

[2] Deep Semi-Supervised Anomaly Detection
