This repository contains the code and resources for the master's thesis "Analysis and Optimization of Unsupervised Code-to-Code Translation",
written at the University of Heidelberg in 2022. It is a fork of Facebook's original CodeGen repository, which provides most of the code and the pretrained models for TransCoder, DOBF, and TransCoder-ST.
Almost all code and scripts added during the master's thesis can be found under codegen_sources/scripts.
Run the following commands to clone the repository and change into its directory:
git clone https://github.com/yakuhzi/c2c-translation.git
cd c2c-translation
Run the following script to install all required dependencies:
install_env.sh
The script also downloads the pretrained TransCoder and TransCoder-ST models, as well as the validation and test sets used for evaluation.
- TransCoder-ST Baseline
- Rule-Based Error Corrections
- Constrained Beam Search
- Nearest Neighbor Machine Translation
- Combined Results
The results of all experiments are also shown in detail in this Excel file.
The validation and test parallel datasets from GeeksForGeeks and the evaluation scripts under data/transcoder_evaluation_gfg are released under the Creative Commons Attribution-ShareAlike 2.0 license. See https://creativecommons.org/licenses/by-sa/2.0/ for more information.
The rest of the repository is under the MIT license. See LICENSE for more details.