docker run -it --name rep -v /WORKING_DIRECTORY:/my_home ubuntu
apt-get update
apt-get install libboost-all-dev
We use the same version released by the SABD authors: fast-dbrd. The code is under the REP folder. Note: please run the SABD data preprocessing first, as the data used by this approach is generated by SABD.
An example run:
./build/bin/fast-dbrd -n rep_mozilla_no_version -r ./ranknet-configs/full-textual-no-version.cfg --ts /YOUR_HOME_DIRECTORY/SABD/dataset/eclipse/timestamp_file.txt --time-constraint 365 --training-duplicates 922 --recommend /YOUR_HOME_DIRECTORY/SABD/dataset/eclipse/dbrd_test.txt
Notes:
Please check the issues under the fast-dbrd-modified repo to understand what each argument means. Simply put:
-n xxxx: the name of the output file
-r: the configuration file to use
--ts: the timestamp file, which is generated by SABD, so point to its location there
--time-constraint 365: we use a one-year time window
--training-duplicates: the number of duplicate bug reports in the training data (including training and validation)
--recommend xxx.dbrd_test.txt: this file is also generated by SABD
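To make the relationship between the arguments easier to see, here is a minimal sketch of a helper that prints (but does not run) the command for one project. It only uses the flags and file names shown in the example above. The function name build_rep_cmd, its three parameters, and the output-name pattern rep_PROJECT_no_version are assumptions made for illustration and are not part of fast-dbrd.

```shell
# Hypothetical helper: print a fast-dbrd command line for one project.
# The body runs in a subshell "( ... )" so its variables stay local.
build_rep_cmd() (
    sabd_dir="$1"   # path to the SABD checkout, e.g. /YOUR_HOME_DIRECTORY/SABD
    project="$2"    # dataset folder name under SABD/dataset, e.g. eclipse
    n_dups="$3"     # duplicate reports in the training + validation data
    dataset="${sabd_dir}/dataset/${project}"

    # Output name pattern is an assumption; any name accepted by -n works.
    printf '%s -n %s -r %s --ts %s --time-constraint %s --training-duplicates %s --recommend %s\n' \
        ./build/bin/fast-dbrd \
        "rep_${project}_no_version" \
        ./ranknet-configs/full-textual-no-version.cfg \
        "${dataset}/timestamp_file.txt" \
        365 \
        "${n_dups}" \
        "${dataset}/dbrd_test.txt"
)

# Print the command for the eclipse example so it can be reviewed before running.
build_rep_cmd /YOUR_HOME_DIRECTORY/SABD eclipse 922
```

Printing the command first makes it easy to confirm that both SABD-generated files are taken from the same project folder before starting a long run.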
Create the environment from the SABD/environment.yml file:
conda env create -f environment.yml
git clone https://github.com/stanfordnlp/GloVe glove
Install Make (can use the same docker as REP)
apt-get update && apt-get install cmake
Make demo.sh executable, then run it with the project name as its argument. Please check our sample SABD/demo.sh.
chmod 777 demo.sh
./demo.sh eclipse
Download glove.42B.300d and unzip
wget http://nlp.stanford.edu/data/glove.42B.300d.zip
unzip glove.42B.300d.zip
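After unzipping, a quick sanity check can confirm that the embedding file is intact. The sketch below counts the numeric fields on the first line of a GloVe text file. The helper name glove_dim is hypothetical, and the unzipped file name glove.42B.300d.txt is an assumption based on the archive name. Going by that name, the reported dimension should be 300.

```shell
# Hypothetical sanity check: print the vector dimension of a GloVe text file.
# Each line is "<token> <v1> <v2> ...", so the dimension is the number of
# whitespace-separated fields on the first line minus one (the token).
glove_dim() {
    head -n 1 "$1" | awk '{ print NF - 1 }'
}

# Run the check only if the unzipped file is present in the current directory.
if [ -f glove.42B.300d.txt ]; then
    glove_dim glove.42B.300d.txt
fi
```

A different number usually points to a truncated download or a different GloVe variant.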
Create the environment from the HINDBR/hindbr.yml file:
conda env create -f hindbr.yml
For the HINDBR Python 2 environment:
conda env create -f py27-env.yml
docker pull mysql
docker run --name dbrd-mysql -e MYSQL_ROOT_PASSWORD=12345678 -d mysql
Please download the data from here.
You can also download the processed word embeddings from here.
Please check each folder for the commands to run the approaches.
Please check the result folder. result-log
Please refer to the notebook.
Thanks to everyone who kindly shared their implementations and patiently answered our questions.
Please consider citing our work:
@article{zhang2022duplicate,
author = {Zhang, Ting and Han, DongGyun and Vinayakarao, Venkatesh and Irsan, Ivana Clairine and Xu, Bowen and Thung, Ferdian and Lo, David and Jiang, Lingxiao},
title = {Duplicate Bug Report Detection: How Far Are We?},
year = {2022},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1049-331X},
url = {https://doi.org/10.1145/3576042},
doi = {10.1145/3576042},
abstract = {Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature. The industry uses some other techniques. Unfortunately, there is insufficient comparison among them, and it is unclear how far we have been. This work fills this gap by comparing the aforementioned techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if applied in a realistic setting today. Thus, we first investigated potential biases that affect the fair comparison of the accuracy of DBRD techniques. Our experiments suggest that data age and issue tracking system choice cause a significant difference. Based on these findings, we prepared a new benchmark. We then used it to evaluate DBRD techniques to estimate better how far we have been. Surprisingly, a simpler technique outperforms recently proposed sophisticated techniques on most projects in our benchmark. In addition, we compared the DBRD techniques proposed in research with those used in Mozilla and VSCode. Surprisingly, we observe that a simple technique already adopted in practice can achieve comparable results as a recently proposed research tool. Our study gives reflections on the current state of DBRD, and we share our insights to benefit future DBRD research.},
note = {Just Accepted},
journal = {ACM Trans. Softw. Eng. Methodol.},
month = {dec},
keywords = {Bug Reports, Empirical Study, Duplicate Bug Report Detection, Deep Learning}
}
If you have any questions, feel free to contact Ting Zhang (email: [email protected] or [email protected]).