To run the notebook or data preprocessing script, you will need to create a virtual environment and install the required packages.
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt
The dataset from the Pearce et al. paper is processed with the process_copilot_cwe_data.py script, which takes the raw scenario data as input and produces an output file (processed_copilot_cwe_data.json) that is used by the notebook.
Usage:
Download and extract the raw scenario data from the link above, then pass the path to copilot-cwe-scenarios-dataset as the first argument to the script and the desired output file name as the second argument:
python process_copilot_cwe_data.py /home/user/copilot-cwe-scenarios-dataset processed_copilot_cwe_data.json
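After the script finishes, it can be useful to confirm that the output file is well-formed JSON before opening the notebook. The following is a minimal sketch, not part of the repository: the helper name summarize_json_file is hypothetical, and it deliberately assumes nothing about the layout of the processed data beyond it being valid JSON.

```python
import json
from pathlib import Path


def summarize_json_file(path):
    """Load a JSON file and return (top-level type name, top-level item count).

    Hypothetical helper for a quick sanity check; it does not assume
    any particular schema for the processed data.
    """
    with Path(path).open(encoding="utf-8") as f:
        # json.load raises json.JSONDecodeError if the file is malformed.
        data = json.load(f)
    # Only lists and dicts have a meaningful item count.
    size = len(data) if isinstance(data, (list, dict)) else None
    return type(data).__name__, size
```

For example, calling summarize_json_file("processed_copilot_cwe_data.json") from the directory holding the output file reports whether the top level is a list or a dict and how many entries it has.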
Detecting_Insecure_Code -
You can use the included processed_copilot_cwe_data.json file or generate it yourself using the process_copilot_cwe_data.py script (see Data Processing Script). To run the notebook, you need the openai Python library installed and a valid OpenAI API key assigned to the OPENAI_API_KEY environment variable (alternatively, you can paste the key into the first cell). A second environment variable, DATASET_DIR, should be set to the full path of the directory containing processed_copilot_cwe_data.json. The notebook is purposely saved with its output visible, so you can see the results without running it.
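If you do plan to run the notebook, a quick pre-flight check of the two environment variables can save a failed first cell. The sketch below is illustrative only and is not taken from the notebook: the function name check_notebook_env is hypothetical, and the only details it relies on are the variable names and file name given above.

```python
import os
from pathlib import Path

# File name as given in this README.
DATASET_FILENAME = "processed_copilot_cwe_data.json"


def check_notebook_env(env=None):
    """Return (api_key, dataset_path), or raise RuntimeError naming what is missing.

    Hypothetical helper; pass a dict as `env` to test without
    touching the real process environment.
    """
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")

    dataset_dir = env.get("DATASET_DIR")
    if not dataset_dir:
        raise RuntimeError("DATASET_DIR is not set")

    # DATASET_DIR points at the directory, not at the JSON file itself.
    dataset_path = Path(dataset_dir) / DATASET_FILENAME
    if not dataset_path.is_file():
        raise RuntimeError(f"{dataset_path} does not exist")

    return api_key, dataset_path
```

Calling check_notebook_env() with no arguments reads the current process environment and raises a RuntimeError that names the first missing piece.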
@misc{pearce2021asleep,
  title={Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions},
  author={Hammond Pearce and Baleegh Ahmad and Benjamin Tan and Brendan Dolan-Gavitt and Ramesh Karri},
  year={2021},
  eprint={2108.09293},
  archivePrefix={arXiv},
  primaryClass={cs.CR}
}
Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. (2021). Copilot CWE Scenarios Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5225651