Users can either set up VulCurator from scratch or use our Docker image. For ease of use, we recommend the Docker image, as all required libraries are already installed so that VulCurator runs smoothly. The image is available at: https://hub.docker.com/r/nguyentruongggiang/vfdetector
Users can pull the Docker image with the following command:
docker pull nguyentruongggiang/vfdetector:v1
Run the Docker image:
docker run --name vfdetector -it --shm-size 16G --gpus all nguyentruongggiang/vfdetector:v1
Next, move to VulCurator's working directory:
cd ../VFDetector
Note that the --gpus parameter can be changed based on the hardware available on your machine.
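For instance, to expose only the first GPU to the container (a hedged example; the device index depends on your setup), a command along these lines should work:
docker run --name vfdetector -it --shm-size 16G --gpus device=0 nguyentruongggiang/vfdetector:v1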
- In order to run VulCurator, users must provide the commits' information following our predefined JSON format (a concrete example is sketched after the schema):
[
  {
    "id": <commit_id>,
    "message": <commit_message>,
    "issue": {
      "title": <issue_title>,
      "body": <issue_body>,
      "comments": [<list_of_comments>]
    },
    "patch": [<list_of_code_change>]
  },
  ...
]
The issue information is optional.
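As an illustration, the following Python snippet writes a minimal input file containing one commit. All field values (commit id, message, issue text, patch hunk) are made-up placeholders, and the file name commits.json is only an example:
import json

# A single made-up commit entry following the input schema above.
commit = {
    "id": "a1b2c3d",                                 # commit hash (placeholder)
    "message": "Fix buffer overflow in parser",      # commit message (placeholder)
    "issue": {                                       # optional: omit if there is no linked issue
        "title": "Crash when parsing long input",
        "body": "The parser crashes on inputs longer than 1024 bytes.",
        "comments": ["Confirmed on v1.2", "Patch attached"]
    },
    "patch": [
        "@@ -10,7 +10,7 @@\n-    char buf[1024];\n+    char buf[4096];"
    ]
}

# VulCurator expects a JSON list of such commit objects.
with open("commits.json", "w") as f:
    json.dump([commit], f, indent=2)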
The output of VulCurator is a JSON file whose format depends on the selected mode.
In prediction mode, given a dataset of commits as input, VulCurator returns a list of likely vulnerability-fixing commits along with their confidence scores. Note that although VulCurator sets the classification threshold to 0.5 by default, users can adjust it:
python application.py -mode prediction -input <path_to_input> -threshold <threshold> -output <path_to_output>
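For example, reusing the commits.json file from the sketch above and keeping the default threshold (the file names here are illustrative, not required):
python application.py -mode prediction -input commits.json -threshold 0.5 -output predictions.json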
In ranking mode, users provide input data in the same format, and VulCurator outputs the list of commits sorted by the probability that they are vulnerability-fixing. In particular, users can use the following command:
python application.py -mode ranking -input <path_to_input> -output <path_to_output>
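A minimal sketch for inspecting the ranked output, assuming it is a JSON array ordered from most to least likely vulnerability-fixing commit (the output path and the exact entry fields may differ; adjust to the actual file):
import json

# Load the ranking output produced by application.py -mode ranking (path is illustrative).
with open("ranked_commits.json") as f:
    ranked = json.load(f)

# Print the ten commits ranked as most likely to be vulnerability-fixing.
for entry in ranked[:10]:
    print(entry)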
For the TensorFlow dataset, please refer to: https://zenodo.org/record/6792205#.YsG03-xByw4
For the SAP dataset, please refer to the paper: "HERMES: Using Commit-Issue Linking to Detect Vulnerability-Fixing Commits"
Message Classifier:
python message_classifier.py --dataset_path <path_to_dataset> --model_path <saved_model_path>
Issue Classifier:
python issue_classifier.py --dataset_path <path_to_dataset> --model_path <saved_model_path>
Patch Classifier:
- Finetuning:
python vulfixminer_finetune.py --dataset_path <path_to_dataset> --finetune_model_path <saved_finetuned_model_path>
- Training:
python vulfixminer.py --dataset_path <path_to_dataset> --model_path <saved_model_file_path> --finetune_model_path <saved_finetuned_model_path> --train_prob_path <store_train_probability_to_path> --test_prob_path <store_test_probability_to_path>
Ensemble Classifier:
python variant_ensemble.py --config_file <path_to_config>
Please follow our example configuration files "tf_dataset.conf" and "sap_dataset.conf" for more details.
Before that, please download the SAP dataset from https://drive.google.com/file/d/1NyCnXGD4VyVDZ2TMhqv4bDqYl14HDRUD/view?usp=sharing and put it in the working directory.
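If you prefer fetching the dataset from the command line, one option (not part of VulCurator) is the gdown package, which downloads Google Drive files given the file id embedded in the link above:
pip install gdown
gdown https://drive.google.com/uc?id=1NyCnXGD4VyVDZ2TMhqv4bDqYl14HDRUD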
Next, please follow our instructions to replicate our experimental results:
For the TensorFlow dataset:
To train message classifier:
python message_classifier.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_message_classifier.sav
To train issue classifier:
python issue_classifier.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_issue_classifier.sav
To finetune CodeBERT for patch classifier:
python vulfixminer_finetune.py --dataset_path tf_vuln_dataset.csv --finetune_model_path model/tf_patch_vulfixminer_finetuned_model.sav
To train patch classifier:
python vulfixminer.py --dataset_path tf_vuln_dataset.csv --model_path model/tf_patch_vulfixminer.sav --finetune_model_path model/tf_patch_vulfixminer_finetuned_model.sav --train_prob_path probs/tf_patch_vulfixminer_train_prob.txt --test_prob_path probs/tf_patch_vulfixminer_test_prob.txt
To run ensemble classifier:
python variant_ensemble.py --config_file tf_dataset.conf
Similarly, for the SAP dataset:
To train message classifier:
python message_classifier.py --dataset_path sub_enhanced_dataset_th_100.txt --model_path model/sap_message_classifier.sav
To train issue classifier:
python issue_classifier.py --dataset_path sub_enhanced_dataset_th_100.txt --model_path model/sap_issue_classifier.sav
To finetune CodeBERT for patch classifier:
python vulfixminer_finetune.py --dataset_path sap_patch_dataset.csv --finetune_model_path model/sap_patch_vulfixminer_finetuned_model.sav
To train patch classifier:
python vulfixminer.py --dataset_path sap_patch_dataset.csv --model_path model/sap_patch_vulfixminer.sav --finetune_model_path model/sap_patch_vulfixminer_finetuned_model.sav --train_prob_path probs/sap_patch_vulfixminer_train_prob.txt --test_prob_path probs/sap_patch_vulfixminer_test_prob.txt
To run ensemble classifier:
python variant_ensemble.py --config_file sap_dataset.conf