Users can either set up VulCurator from scratch or use our Docker image. For ease of use, we recommend the Docker image, as all required libraries are already installed so that VulCurator runs smoothly. The image is available at: https://hub.docker.com/r/nguyentruongggiang/vfdetector
Users can pull the Docker image with the following command:
docker pull nguyentruongggiang/vfdetector:v1
Run the Docker image:
docker run --name vfdetector -it --shm-size 16G --gpus all nguyentruongggiang/vfdetector:v1
Next, move to VulCurator's working directory:
cd ../VFDetector
Note that the --gpus parameter can be changed based on the hardware available on your machine.
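For instance, to expose only the first GPU to the container (a hedged example; the device index depends on your setup), a command along these lines should work:
docker run --name vfdetector -it --shm-size 16G --gpus device=0 nguyentruongggiang/vfdetector:v1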
- In order to run VulCurator, users must provide the commits' information following our predefined JSON format (a concrete example is sketched after the schema):
[
  {
    "id": <commit_id>,
    "message": <commit_message>,
    "issue": {
      "title": <issue_title>,
      "body": <issue_body>,
      "comments": [<list_of_comments>]
    },
    "patch": [<list_of_code_change>]
  },
  ...
]
The issue information is optional.
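As an illustration, the following Python snippet writes a minimal input file containing one commit. All field values (commit id, message, issue text, patch hunk) are made-up placeholders, and the file name commits.json is only an example:
import json

# A single made-up commit entry following the input schema above.
commit = {
    "id": "a1b2c3d",                                 # commit hash (placeholder)
    "message": "Fix buffer overflow in parser",      # commit message (placeholder)
    "issue": {                                       # optional: omit if there is no linked issue
        "title": "Crash when parsing long input",
        "body": "The parser crashes on inputs longer than 1024 bytes.",
        "comments": ["Confirmed on v1.2", "Patch attached"]
    },
    "patch": [
        "@@ -10,7 +10,7 @@\n-    char buf[1024];\n+    char buf[4096];"
    ]
}

# VulCurator expects a JSON list of such commit objects.
with open("commits.json", "w") as f:
    json.dump([commit], f, indent=2)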
The output of VulCurator is a JSON file whose format depends on the selected mode.
In prediction mode, given a dataset of commits as input, VulCurator returns a list of likely vulnerability-fixing commits along with their confidence scores. Note that although VulCurator sets the classification threshold to 0.5 by default, users can adjust it:
python application.py -mode prediction -input <path_to_input> -threshold <threshold> -output <path_to_output>
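For example, reusing the commits.json file from the sketch above and keeping the default threshold (the file names here are illustrative, not required):
python application.py -mode prediction -input commits.json -threshold 0.5 -output predictions.json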
In ranking mode, users provide input data in the same format, and VulCurator outputs the list of commits sorted by the probability that they are vulnerability-fixing. In particular, users can use the following command:
python application.py -mode ranking -input <path_to_input> -output <path_to_output>
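A minimal sketch for inspecting the ranked output, assuming it is a JSON array ordered from most to least likely vulnerability-fixing commit (the output path and the exact entry fields may differ; adjust to the actual file):
import json

# Load the ranking output produced by application.py -mode ranking (path is illustrative).
with open("ranked_commits.json") as f:
    ranked = json.load(f)

# Print the ten commits ranked as most likely to be vulnerability-fixing.
for entry in ranked[:10]:
    print(entry)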
For the TensorFlow dataset, please refer to: https://zenodo.org/record/6792205#.YsG03-xByw4
For the SAP dataset, please refer to the paper: "HERMES: Using Commit-Issue Linking to Detect Vulnerability-Fixing Commits"
Message Classifier:
python message_classifier.py --dataset_path <path_to_dataset> --model_path <saved_model_path>
Issue Classifier:
python issue_classifier.py --dataset_path <path_to_dataset> --model_path <saved_model_path>
Patch Classifier:
- Finetuning:
python vulfixminer_finetune.py --dataset_path <path_to_dataset> --finetune_model_path <saved_finetuned_model_path>
- Training:
python vulfixminer.py --dataset_path <path_to_dataset> --model_path <saved_model_file_path> --finetune_model_path <saved_finetuned_model_path> --train_prob_path <store_train_probability_to_path> --test_prob_path <store_test_probability_to_path>
Ensemble Classifier:
python variant_ensemble.py --config_file <path_to_config>
Please follow our example configuration files "tf_dataset.conf" and "sap_dataset.conf" for more details.
Before that, please download the SAP dataset from https://drive.google.com/file/d/1NyCnXGD4VyVDZ2TMhqv4bDqYl14HDRUD/view?usp=sharing and put it in the working directory.
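If you prefer fetching the dataset from the command line, one option (not part of VulCurator) is the gdown package, which downloads Google Drive files given the file id embedded in the link above:
pip install gdown
gdown https://drive.google.com/uc?id=1NyCnXGD4VyVDZ2TMhqv4bDqYl14HDRUD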
Next, please follow our instructions to replicate our experimental results:
For the TensorFlow dataset:
To train message classifier:
python message_classifier.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_message_classifier.sav
To train issue classifier:
python issue_classifier.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_issue_classifier.sav
To finetune CodeBERT for patch classifier:
python vulfixminer_finetune.py --dataset_path tf_vuln_dataset.csv --finetune_model_path model/tf_patch_vulfixminer_finetuned_model.sav
To train patch classifier:
python vulfixminer.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_patch_vulfixminer.sav --finetune_model_path model/tf_patch_vulfixminer_finetuned_model.sav --train_prob_path probs/tf_patch_vulfixminer_train_prob.txt --test_prob_path probs/tf_patch_vulfixminer_test_prob.txt
To run ensemble classifier:
python variant_ensemble.py --config_file tf_dataset.conf
Similarly, for the SAP dataset:
To train message classifier:
python message_classifier.py --dataset_path sub_enhanced_dataset_th_100.txt --model_path model/sap_message_classifier.sav
To train issue classifier:
python issue_classifier.py --dataset_path sub_enhanced_dataset_th_100.txt --model_path model/sap_issue_classifier.sav
To finetune CodeBERT for patch classifier:
python vulfixminer_finetune.py --dataset_path sap_patch_dataset.csv --finetune_model_path model/sap_patch_vulfixminer_finetuned_model.sav
To train patch classifier:
python vulfixminer.py --dataset_path sap_patch_dataset.csv --model_path model/sap_patch_vulfixminer.sav --finetune_model_path model/sap_patch_vulfixminer_finetuned_model.sav --train_prob_path probs/sap_patch_vulfixminer_train_prob.txt --test_prob_path probs/sap_patch_vulfixminer_test_prob.txt
To run ensemble classifier:
python variant_ensemble.py --config_file sap_dataset.conf