Reptory

This repository provides a dataset along with a framework to assist comparative experimental studies on learning-based automated program repair.

Dataset:

  • The dataset is based on DeepBugs. The following files in the create-dataset/ folder, which handle data extraction, are borrowed from DeepBugs and adapted for this project:

    • extractFromJS.js
    • extractorOfBinOps.js
    • extractorOfCalls.js
    • fileIDs.json
    • jsExtractionUtil.js
    • Util.py
  • The dataset comprises three name-based bug patterns:

    • Swapped function arguments
    • Wrong binary operator
    • Wrong operand in binary operation
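The three patterns can be illustrated by showing how a correct snippet is turned into a buggy one, in the spirit of the artificial bug seeding used by DeepBugs. The Python sketch below is illustrative only. The dictionary shapes for calls and binary operations, the operator list, and the function names are all hypothetical. They do not mirror the extractor files listed above, which operate on JavaScript ASTs.

```python
import random

# Hypothetical operator list; the real extractors derive candidates from JavaScript code.
BINARY_OPERATORS = ["+", "-", "*", "/", "<", ">", "<=", ">=", "==", "===", "!=", "!=="]


def swap_arguments(call):
    """Swapped function arguments: exchange the first two arguments of a call."""
    args = list(call["arguments"])
    if len(args) < 2:
        return None  # nothing to swap
    args[0], args[1] = args[1], args[0]
    return {**call, "arguments": args}


def wrong_operator(binop, rng):
    """Wrong binary operator: replace the operator with a different one."""
    candidates = [op for op in BINARY_OPERATORS if op != binop["operator"]]
    return {**binop, "operator": rng.choice(candidates)}


def wrong_operand(binop, other_names, rng):
    """Wrong operand: replace one operand with another candidate name supplied by the caller."""
    side = rng.choice(["left", "right"])
    candidates = [n for n in other_names if n != binop[side]]
    if not candidates:
        return None  # no alternative name available for this side
    return {**binop, side: rng.choice(candidates)}


if __name__ == "__main__":
    rng = random.Random(0)
    call = {"callee": "setTimeout", "arguments": ["callback", "delay"]}
    binop = {"left": "i", "operator": "<", "right": "length"}
    print(swap_arguments(call))  # arguments become ["delay", "callback"]
    print(wrong_operator(binop, rng))
    print(wrong_operand(binop, ["i", "j", "length", "size"], rng))
```

Each mutation returns a new dictionary and leaves the correct input untouched, so the (correct, buggy) pair can be kept together as a training example.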

Framework:

  • A framework in which researchers can incorporate additional context and apply it to the existing dataset.

Prerequisites

  • Install conda (the installer below is the macOS x86_64 build of Miniconda; download the one matching your platform):

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O ~/miniconda.sh
    bash ~/miniconda.sh -b -p $HOME/miniconda
    source ~/miniconda/bin/activate
    conda init
    source ~/.bash_profile
    
  • Set up the environment

    • Create and activate a conda environment with Python 3.6:

      conda create -n context_ml python=3.6
      conda activate context_ml
      conda install -y python=3.6
      
    • Install TensorFlow 1.5 with pip: pip install tensorflow==1.5

    • Alternatively, install it with conda: conda install -y -c conda-forge tensorflow=1.5.1

    • Install the Python packages required for embedding generation:

         conda install -c anaconda nltk
         conda install -c anaconda gensim

      Then download the NLTK punkt tokenizer models from a Python shell:

         import nltk
         nltk.download('punkt')

      The download prints output similar to:

         [nltk_data] Downloading package punkt to /Users/UserName/nltk_data...
         [nltk_data]   Unzipping tokenizers/punkt.zip.

    • Install nvm, then use it to install Node.js (which includes npm):

          curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | bash
          source ~/.bashrc
          nvm install --lts
          nvm use --lts
      
    • Install node dependencies:

        cd src
        npm install
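Before running any scripts, a quick sanity check can confirm that the active interpreter sees the packages installed above. This is a hypothetical helper and is not part of the repository. It uses only the standard library (`importlib.util.find_spec`) and checks the import names `tensorflow`, `nltk`, and `gensim`.

```python
import importlib.util
import sys

# Import names of the packages from the prerequisites (not the conda/pip package names).
REQUIRED_MODULES = ["tensorflow", "nltk", "gensim"]


def check_environment(modules=REQUIRED_MODULES, expected_python=(3, 6)):
    """Report whether the Python version matches and which modules cannot be found."""
    report = {
        "python_ok": sys.version_info[:2] == tuple(expected_python),
        "missing": [],
    }
    for name in modules:
        # find_spec locates a top-level module without importing it; None means not found.
        if importlib.util.find_spec(name) is None:
            report["missing"].append(name)
    return report


if __name__ == "__main__":
    result = check_environment()
    print("Python", sys.version.split()[0], "OK" if result["python_ok"] else "(expected 3.6)")
    print("Missing modules:", result["missing"] or "none")
```

Run it inside the activated `context_ml` environment. An empty "missing" list means the imports should at least resolve.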
      

Dataset

Code Representations

| ID | Buggy | Fixed | Dataset Generation Script |
|----|-------|-------|---------------------------|
| 1a | Word tokenization | Word tokenization | data_word_level_esprima.sh |
| 1b | Word tokenization Enhanced | Word tokenization Enhanced | data_word_level_word_tokenization_enhanced.sh |
| 2 | DeepBugs Representation | DeepBugs Representation | data_word_level_deepbugs.sh |
| 3 | DeepBugs Representation (with Types Incomplete with variable value) | DeepBugs Representation (with Types Incomplete with variable value) | data_word_level_deepbugs_with_type_and_variable.sh |
| 4 | DeepBugs Representation (with Types Incomplete without variable value) | DeepBugs Representation (with Types Incomplete without variable value) | data_word_level_deepbugs_with_type.sh |
| 5 | Code Simplification (Signatures) | Code Simplification (Signatures) | data_word_level_synthesized.sh |
| 6 | Code Simplification (Signatures with position anchor) | Code Simplification (Signatures with position anchor) | data_word_level_synthesized_with_anchor.sh |
| 7 | Code Simplification (Signatures with LIT/ID) | Code Simplification (Signatures with LIT/ID) | data_word_level_synthesized_with_ID_LIT.sh |
| 8 | Code Simplification (Signatures with position anchor and LIT/ID) | Code Simplification (Signatures with position anchor and LIT/ID) | data_word_level_synthesized_with_anchor_with_ID_LIT.sh |
| 9 | AST (of original code) | AST (of original code) | data_word_level_ast.sh |
| 10 | AST (of code simplification -> Types with variable value) | AST (of code simplification -> Types with variable value) | data_word_level_synthesized_with_variable_ast.sh |
| 11 | AST (of code simplification -> Types without variable value) | AST (of code simplification -> Types without variable value) | data_word_level_synthesized_without_variable_ast.sh |
| 12 | Preorder AST (of original code) | Preorder AST (of original code) | prepare_calls_ast_preorder.sh |
| 13 | Abstraction - Tufano | Abstraction - Tufano | prepare_calls_abstraction.sh |
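Representation 13 follows the abstraction idea of Tufano et al. Identifiers and literals are replaced by indexed placeholders, which keeps the vocabulary small, and a saved mapping lets the original code be restored. The sketch below is a simplified, regex-based illustration of that idea. It is not what prepare_calls_abstraction.sh does, and the tokenizer, the keyword list, and the placeholder prefixes (ID_, STRING_, NUM_) are assumptions. Tufano et al. distinguish finer categories, such as variables, methods, and types.

```python
import re

# Simplified JavaScript-like tokenizer (an assumption): quoted strings, numbers,
# identifiers, or any single non-space character (so "===" becomes three "=" tokens).
TOKEN_RE = re.compile(r"""(?:"[^"]*")|(?:'[^']*')|\d+(?:\.\d+)?|[A-Za-z_$][\w$]*|\S""")
KEYWORDS = {"var", "let", "const", "function", "return", "if", "else", "for",
            "while", "new", "true", "false", "null", "undefined", "this"}


def abstract(code):
    """Replace identifiers and literals with indexed placeholders.

    Returns the abstracted token list and the mapping needed to undo it.
    """
    mapping = {}  # original token -> placeholder
    counters = {"ID": 0, "STRING": 0, "NUM": 0}
    abstracted = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            kind = None
        elif tok[0] in "\"'":
            kind = "STRING"
        elif tok[0].isdigit():
            kind = "NUM"
        elif re.match(r"[A-Za-z_$]", tok):
            kind = "ID"
        else:
            kind = None  # punctuation and operators are kept verbatim
        if kind is None:
            abstracted.append(tok)
            continue
        if tok not in mapping:
            # The same token always receives the same placeholder.
            counters[kind] += 1
            mapping[tok] = "%s_%d" % (kind, counters[kind])
        abstracted.append(mapping[tok])
    return abstracted, mapping


def concretize(tokens, mapping):
    """Map placeholders back to the original tokens."""
    reverse = {placeholder: tok for tok, placeholder in mapping.items()}
    return [reverse.get(t, t) for t in tokens]


if __name__ == "__main__":
    toks, m = abstract('setTimeout(callback, 100); log("done", callback)')
    print(" ".join(toks))
    # ID_1 ( ID_2 , NUM_1 ) ; ID_3 ( STRING_1 , ID_2 )
```

Keeping the mapping per example is what allows a predicted abstract fix to be turned back into concrete code.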

Mixed Representations

| Buggy | Fixed | Dataset Generation Script |
|-------|-------|---------------------------|
| Code Simplification (function signatures with LIT/ID) | AST (of code simplification -> Types without variable value) | data_word_level_synthesized_with_ID_LIT_to_data_word_level_synthesized_without_variable_ast.sh |
| AST (of code simplification -> Types without variable value) | Code Simplification (function signatures with LIT/ID) | data_word_level_synthesized_without_variable_ast_to_data_word_level_synthesized_with_ID_LIT.sh |
| Word tokenization | AST | data_word_level_esprima_to_data_word_level_ast.sh |
| AST | Word tokenization | data_word_level_ast_to_data_word_level_esprima.sh |
| Abstraction - Tufano | Code Simplification (Signatures with position anchor) | prepare_calls_tufano_abstraction_to_code_simplification_signatures_with_position_anchor.sh |
| Code Simplification (Signatures with position anchor) | Abstraction - Tufano | prepare_code_simplification_signatures_with_position_anchor_to_calls_tufano_abstraction.sh |
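Each mixed script pairs the buggy side of one representation with the fixed side of another. In the table above, the representation named before `_to_` in a script name is the buggy input and the one after it is the fix. The sketch below assumes each representation is stored as line-aligned files with one example per line. That layout is common for sequence-to-sequence training data but is an assumption here. Under it, the pairing step amounts to the function below. `make_mixed_pairs` and the toy AST string are hypothetical.

```python
from typing import List, Tuple


def make_mixed_pairs(buggy_repr_a: List[str], fixed_repr_b: List[str]) -> List[Tuple[str, str]]:
    """Pair example i of representation A (buggy side) with example i of representation B (fixed side).

    Both lists must describe the same examples in the same order; otherwise the
    pairs would silently mix unrelated snippets.
    """
    if len(buggy_repr_a) != len(fixed_repr_b):
        raise ValueError("representations are not aligned: %d vs %d examples"
                         % (len(buggy_repr_a), len(fixed_repr_b)))
    return [(src.strip(), tgt.strip()) for src, tgt in zip(buggy_repr_a, fixed_repr_b)]


if __name__ == "__main__":
    # Toy example: word tokens as the buggy input, a bracketed AST-like string as the fix.
    buggy_tokens = ["setTimeout ( delay , callback )"]
    fixed_ast = ["(CallExpression (Identifier setTimeout) (Identifier callback) (Identifier delay))"]
    for src, tgt in make_mixed_pairs(buggy_tokens, fixed_ast):
        print(src, "=>", tgt)
```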

Generate Embeddings

| Embedding | Script |
|-----------|--------|
| word2vec (CBOW) | getEmbeddings.sh |
| word2vec (skip-gram) | get-embedding-final-skipgram.sh |
| fastText | get-embedding-fasttext-final.sh |
| GloVe | cd glove && make && getEmbeddings_glove.sh |
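The two word2vec variants differ in their prediction task. CBOW predicts a token from its surrounding window, whereas skip-gram predicts each surrounding token from the center token. The sketch below only builds the training examples for a given window size, to show that difference. It does not train embeddings (the scripts above do that), and the window size used in the example is an arbitrary choice.

```python
from typing import List, Tuple


def context_window(tokens: List[str], i: int, window: int) -> List[str]:
    """Tokens within `window` positions of index i, excluding position i itself."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: one (context tokens, center token) example per position; the context predicts the center."""
    return [(context_window(tokens, i, window), tok) for i, tok in enumerate(tokens)]


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: one (center token, neighbor) example per neighbor; the center predicts each neighbor."""
    return [(tok, ctx) for i, tok in enumerate(tokens)
            for ctx in context_window(tokens, i, window)]


if __name__ == "__main__":
    code_tokens = "setTimeout ( callback , delay )".split()
    print(cbow_examples(code_tokens, window=1)[2])  # (['(', ','], 'callback')
    print(skipgram_examples(code_tokens, window=1)[:3])
```

fastText builds on the same two objectives but also represents each token through character n-grams. GloVe instead fits vectors to global co-occurrence counts.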

Experiments with Embeddings

| Embedding | Script |
|-----------|--------|
| word2vec (skip-gram) | prepare_calls_abstraction_word2vec_skipgram.sh |
| fastText | prepare_calls_abstraction_fasttext.sh |
| GloVe | Run prepare_calls_abstraction_glove.sh, then cd GloVe && make && getEmbeddings_glove.sh, and finally ./train-final-save-log.sh |
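The trained vectors usually need to be loaded back in before they can be used in an experiment. The loader below assumes the embeddings are saved in plain-text format. GloVe's text output has one `word v1 v2 ...` entry per line, and word2vec/fastText text files (`.vec`) use the same layout plus a `vocab_size dim` header line. The file names the scripts actually write are not specified here, and `load_text_embeddings` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def load_text_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse embeddings stored one word per line: `word v1 v2 ... vd`.

    A leading `vocab_size dim` header (word2vec/fastText text format) is skipped;
    GloVe's text output has no header, so its first line is kept.
    """
    vectors: Dict[str, List[float]] = {}
    for lineno, line in enumerate(lines):
        parts = line.rstrip().split(" ")  # rstrip also drops trailing spaces some tools emit
        if lineno == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # header line
        if len(parts) < 2:
            continue  # blank or malformed line
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
    dims = {len(v) for v in vectors.values()}
    if len(dims) > 1:
        raise ValueError("inconsistent vector sizes: %s" % sorted(dims))
    return vectors


if __name__ == "__main__":
    # With a real file: with open(path, encoding="utf-8") as f: emb = load_text_embeddings(f)
    sample = ["2 3\n", "callback 0.1 0.2 0.3\n", "delay 0.0 -0.5 1.0\n"]
    emb = load_text_embeddings(sample)
    print(sorted(emb), len(emb["delay"]))  # ['callback', 'delay'] 3
```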

How to Calculate Accuracy and Rank

Run the script on the test files and the model output:

    python calculate_accuracy_and_rank.py test.correct test.buggy model.output
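The exact metric definitions live in calculate_accuracy_and_rank.py. The sketch below is a hedged illustration only, built on two assumptions: model.output holds `beam_size` ranked candidate fixes per test example on consecutive lines, and test.correct holds one reference fix per line. Under those assumptions, top-1 accuracy is the share of examples whose first candidate matches the reference. The rank is the 1-based position of the reference among the candidates. The sketch does not use test.buggy, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def rank_of_fix(correct: str, candidates: Sequence[str]) -> Optional[int]:
    """1-based position of the reference fix among ranked candidates, or None if absent."""
    target = correct.strip()
    for position, candidate in enumerate(candidates, start=1):
        if candidate.strip() == target:
            return position
    return None


def accuracy_and_ranks(correct_lines: List[str], output_lines: List[str],
                       beam_size: int) -> Tuple[float, float, List[Optional[int]]]:
    """Assumes `output_lines` holds `beam_size` consecutive candidates per test example."""
    if not correct_lines:
        raise ValueError("no test examples")
    if len(output_lines) != len(correct_lines) * beam_size:
        raise ValueError("expected %d candidate lines, got %d"
                         % (len(correct_lines) * beam_size, len(output_lines)))
    ranks = [
        rank_of_fix(fix, output_lines[i * beam_size:(i + 1) * beam_size])
        for i, fix in enumerate(correct_lines)
    ]
    top1 = sum(1 for r in ranks if r == 1) / len(ranks)       # exact match at rank 1
    topk = sum(1 for r in ranks if r is not None) / len(ranks)  # found anywhere in the beam
    return top1, topk, ranks


if __name__ == "__main__":
    correct = ["setTimeout ( callback , delay )", "i < length"]
    output = ["setTimeout ( callback , delay )", "setTimeout ( delay , callback )",
              "i <= length", "i < length"]
    top1, topk, ranks = accuracy_and_ranks(correct, output, beam_size=2)
    print(top1, topk, ranks)  # 0.5 1.0 [1, 2]
```

Reading each file with `open(path).read().splitlines()` yields the lists these functions expect.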