This repository is a re-implementation of DeepCoder. DeepCoder synthesizes programs in a domain-specific language (DSL) from input/output examples.
I rewrote the implementation from scratch; the previous implementation is available in the v0.0.0 tag.
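For illustration (a hypothetical example, not this repository's API), the synthesizer is given input/output pairs and searches for a short DSL program that is consistent with all of them:

```python
# Hypothetical illustration of the synthesis task (not this repository's API).
# Each example pairs an input tuple with the expected output.
io_examples = [
    ((2, [3, 5, 4, 7, 5]), [3, 5]),     # TAKE 2 [3,5,4,7,5] -> [3,5]
    ((3, [1, 2, 3, 4]),    [1, 2, 3]),  # TAKE 3 [1,2,3,4]   -> [1,2,3]
]
# A DSL program consistent with the examples (the same syntax appears later in this README):
program = "a <- int | b <- [int] | c <- TAKE a b"
```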
- Google account
- Linux (I think the code also works on macOS, but I have not tested it)
- Python 3
- make
- g++
Warning
The notebooks in the examples directory use Google Drive as data storage. Please be careful not to overwrite your data!
inference.ipynb synthesizes DSL programs using the pre-trained model (examples/medium/trained-model).
# Download this repository and DeepCoder-Utils
$ git clone https://github.com/HiroakiMikami/deep-coder
$ cd deep-coder
$ git submodule init
$ git submodule update
# Build the search tool
$ make -C DeepCoder_Utils/enumerative-search -j $(nproc)
# Install python modules
$ pip install -r requirements.txt
# Set up Jupyter so that the Colab notebooks can use a local runtime
$ ./bin/init.bash
The notebooks in the examples/medium directory show how to train DeepCoder.
Training consists of the following steps:

- generate the dataset (examples/medium/generate_dataset.ipynb)
  - It may take more than 1 hour in Colab.
  - In the above example, I used a local runtime and then uploaded the dataset file to Google Drive (DeepCoder/dataset/length_3).
- generate the baseline results (examples/medium/generate_baseline_results.ipynb)
- train the DNN model with the training dataset and validate it (examples/medium/train.ipynb)
- compare the results of the DNN model with the baseline (examples/medium/comparison_with_baseline.ipynb)
$ python -m unittest discover test
examples/small/integer_embeddings.ipynb shows the learned embedding of integers. The embedding was trained using the dataset of length-1 programs and a model with embedding size E=2.
The result does not show the clear trend seen in Figure 8 of the paper. There are many possible causes (e.g., the dataset-generation procedure, training hyperparameters), and I do not know the root cause of this difference.
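The sketch below shows one way such a 2-D embedding could be visualized; it assumes the trained E=2 embedding matrix has been extracted into a NumPy array (the placeholder data, integer range, and variable names are assumptions, not the notebook's actual code).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical: `embedding` holds the trained E=2 embedding matrix, one row per
# integer value handled by the DSL.  Replace the placeholder with the weights
# extracted from the trained model; the integer range below is also an assumption.
integers = np.arange(-256, 256)
embedding = np.random.randn(len(integers), 2)   # placeholder for the real weights

plt.scatter(embedding[:, 0], embedding[:, 1], c=integers, cmap="viridis", s=5)
plt.colorbar(label="integer value")
plt.xlabel("embedding dimension 0")
plt.ylabel("embedding dimension 1")
plt.title("Learned integer embedding (E=2)")
plt.show()
```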
| Timeout needed to solve | 20% | 40% | 60% |
|---|---|---|---|
| Baseline | 53ms | 122ms | 375ms |
| DeepCoder | 5ms | 24ms | 87ms |
| Speedup (this implementation) | 10.8x | 5.0x | 3.6x |
| Speedup (Table 1 in the paper) | 62.2x | 54.6x | 31.5x |
The trained model speeds up program synthesis. However, the speedup of this implementation is smaller than that reported in the paper. I think the reason for this difference is the same as for the integer-embedding difference, but I have no evidence for this.
The details of the results are in examples/medium/comparison_with_baseline.ipynb.
The binary attributes predicted by the DNN are heavily imbalanced because each program in the dataset contains only 1-3 functions. For example, the attribute vector of a <- int | b <- [int] | c <- TAKE a b contains 33 False values and only 1 True value.
I suspected that this imbalance degrades the performance of the DNN model, so I introduced a cost-sensitive loss function (weighted_sigmoid_cross_entropy in src/model.py).
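As a rough NumPy sketch (not the code in src/model.py), such a loss can down-weight the dominant negative class with a weight w0; whether w0 is applied exactly this way in the actual implementation is an assumption.

```python
import numpy as np

def weighted_sigmoid_cross_entropy(logits, labels, w0=0.5):
    """Cost-sensitive binary cross-entropy (sketch only, not src/model.py).

    `labels` is the 0/1 attribute vector, `logits` are the DNN outputs before
    the sigmoid.  A `w0` < 1 down-weights the dominant negative class, so the
    rare positive attributes contribute relatively more to the loss.
    """
    p = 1.0 / (1.0 + np.exp(-logits))             # sigmoid
    eps = 1e-7                                    # avoid log(0)
    pos = labels * np.log(p + eps)                # terms for True attributes
    neg = (1.0 - labels) * np.log(1.0 - p + eps)  # terms for False attributes
    return -np.mean(pos + w0 * neg)               # weight negatives by w0
```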
However, I could not see a performance improvement in the medium-scale experiment. examples/medium/loss_function_comparison.ipynb shows the results of training with the cost-sensitive loss function, and examples/medium/train_w0_{0.25|0.5|0.75}.ipynb shows the training logs.
- Investigate the difference from the original paper
- Run the large-scale experiment (train with the dataset of length-4 programs)