Detecting Insecure Code with LLMs

Prompt Experiments for Python Vulnerability Detection

To run the notebook or the data processing script, create a virtual environment and install the required packages:

virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt
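
If you are unsure whether the virtual environment is active before running the notebook or the script, a quick check like the one below can help. This is a minimal sketch and not part of the repository: the helpers in_virtualenv and missing_packages are hypothetical, and checking only for openai (the library the notebook needs) is an assumption, not the full contents of requirements.txt.

```python
import importlib.util
import sys
from typing import Iterable, List


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment while
    # sys.base_prefix keeps the base installation; legacy virtualenv set sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv(), f"({sys.prefix})")
    # Assumption: "openai" is the import name of the OpenAI library used by the notebook.
    missing = missing_packages(["openai"])
    if missing:
        print("not installed:", ", ".join(missing), "; run: pip install -r requirements.txt")
```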

Data Processing Script

The dataset from the Pearce et al. paper is processed by the process_copilot_cwe_data.py script, which takes the raw scenario data as input and produces an output file (processed_copilot_cwe_data.json) that the notebook uses.

Usage: Download and extract the raw scenario data (the Copilot CWE Scenarios Dataset listed under Citations), then pass the path to the extracted copilot-cwe-scenarios-dataset directory as the first argument and the desired output file name as the second argument:

python process_copilot_cwe_data.py /home/user/copilot-cwe-scenarios-dataset processed_copilot_cwe_data.json
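
The internals of process_copilot_cwe_data.py are not described here, so the following is only a hypothetical sketch of the same command-line shape: a dataset directory and an output file name in, one JSON file out. The record layout ({"path", "code"}) and the choice to collect every .py file under the dataset directory are assumptions for illustration; the real script may extract additional fields (for example, CWE identifiers) and its output schema may differ.

```python
import json
import sys
from pathlib import Path
from typing import Dict, List


def collect_scenarios(dataset_dir: Path) -> List[Dict[str, str]]:
    """Gather every .py file under dataset_dir into a list of records.

    Hypothetical record layout: {"path": path relative to dataset_dir, "code": file contents}.
    """
    records = []
    for py_file in sorted(dataset_dir.rglob("*.py")):
        if not py_file.is_file():  # skip directories whose names happen to end in .py
            continue
        records.append({
            "path": py_file.relative_to(dataset_dir).as_posix(),
            "code": py_file.read_text(encoding="utf-8", errors="replace"),
        })
    return records


def main(argv: List[str]) -> int:
    """Mirror the two positional arguments used by the real script."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} <dataset_dir> <output_json>", file=sys.stderr)
        return 2
    dataset_dir, output_path = Path(argv[1]), Path(argv[2])
    if not dataset_dir.is_dir():
        print(f"not a directory: {dataset_dir}", file=sys.stderr)
        return 1
    records = collect_scenarios(dataset_dir)
    output_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    print(f"wrote {len(records)} records to {output_path}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved as a hypothetical sketch.py, it would be invoked the same way as the real script, e.g. python sketch.py /home/user/copilot-cwe-scenarios-dataset out.json.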

Prompt Experiments & Results

Detecting_Insecure_Code.ipynb: Use the included processed_copilot_cwe_data.json file or generate it yourself with the process_copilot_cwe_data.py script (see Data Processing Script). To run the notebook, you need the openai Python library installed and a valid OpenAI API key assigned to the OPENAI_API_KEY environment variable (alternatively, paste the key into the first cell). Set a second environment variable, DATASET_DIR, to the full path of the directory containing processed_copilot_cwe_data.json. The notebook is deliberately saved with its output visible, so you can see the results without running it.
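
A minimal sketch of how a notebook cell could pick up the two environment variables described above and load the processed data. The helpers resolve_config and load_processed_data are hypothetical, not functions from the repository; only OPENAI_API_KEY, DATASET_DIR, and the processed_copilot_cwe_data.json file name come from this README.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping, Optional, Tuple

DATA_FILENAME = "processed_copilot_cwe_data.json"


def resolve_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Path]:
    """Return (api_key, data_path), raising early if either variable is unset or empty."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    dataset_dir = env.get("DATASET_DIR", "")
    missing = [name for name, value in (("OPENAI_API_KEY", api_key),
                                        ("DATASET_DIR", dataset_dir)) if not value]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return api_key, Path(dataset_dir) / DATA_FILENAME


def load_processed_data(data_path: Path) -> Any:
    """Parse the processed dataset JSON file."""
    with data_path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    api_key, data_path = resolve_config()
    data = load_processed_data(data_path)
    print(f"loaded {len(data)} top-level entries from {data_path}")
```

Failing early with the variable's name makes a missing DATASET_DIR easier to diagnose than a FileNotFoundError in a later cell. If you paste the key into the first cell instead, setting os.environ["OPENAI_API_KEY"] there before calling resolve_config keeps the rest of the sketch unchanged.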

Citations

@misc{pearce2021asleep,
  title={Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions},
  author={Hammond Pearce and Baleegh Ahmad and Benjamin Tan and Brendan Dolan-Gavitt and Ramesh Karri},
  year={2021},
  eprint={2108.09293},
  archivePrefix={arXiv},
  primaryClass={cs.CR}
}

Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. (2021). Copilot CWE Scenarios Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5225651

Repository: mhbuehler/cwe-detection
