This repository is a re-implementation of DeepCoder. DeepCoder synthesizes programs in a domain-specific language (DSL) from input/output examples.
I rewrote the implementation from scratch; the previous implementation is available in the v0.0.0 tag.
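For illustration (a hypothetical example, not this repository's API), the synthesizer is given input/output pairs and searches for a short DSL program that is consistent with all of them:

```python
# Hypothetical illustration of the synthesis task (not this repository's API).
# Each example pairs an input tuple with the expected output.
io_examples = [
    ((2, [3, 5, 4, 7, 5]), [3, 5]),     # TAKE 2 [3,5,4,7,5] -> [3,5]
    ((3, [1, 2, 3, 4]),    [1, 2, 3]),  # TAKE 3 [1,2,3,4]   -> [1,2,3]
]
# A DSL program consistent with the examples (the same syntax appears later in this README):
program = "a <- int | b <- [int] | c <- TAKE a b"
```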
- Google account
- Linux (I think the code also works on macOS, but I have not tested it)
- Python 3
- make
- g++
Warning
The notebooks in the examples directory use Google Drive as data storage. Please be careful not to overwrite your data!
inference.ipynb synthesizes DSL programs using the pre-trained model (examples/medium/trained-model).
# Download this repository and DeepCoder-Utils
$ git clone https://github.com/HiroakiMikami/deep-coder
$ cd deep-coder
$ git submodule init
$ git submodule update
# Build the search tool
$ make -C DeepCoder_Utils/enumerative-search -j $(nproc)
# Install python modules
$ pip install -r requirements.txt
# Set up Jupyter so that the Colab notebooks can use a local runtime
$ ./bin/init.bash
The notebooks in the examples/medium directory show how to train DeepCoder.
Training consists of the following steps:

- generate the dataset (examples/medium/generate_dataset.ipynb)
  - It may take more than 1 hour in Colab.
  - In the above example, I used a local runtime and then uploaded the dataset file to Google Drive (DeepCoder/dataset/length_3).
- generate the baseline results (examples/medium/generate_baseline_results.ipynb)
- train the DNN model with the training dataset and validate it (examples/medium/train.ipynb)
- compare the results of the DNN model with the baseline (examples/medium/comparison_with_baseline.ipynb)
$ python -m unittest discover test
examples/small/integer_embeddings.ipynb shows the learned embedding of integers. The embedding was trained using the dataset of length-1 programs and a model with embedding size E=2.
The result does not show the clear trend seen in Figure 8 of the paper. There are many possible causes (e.g., the dataset-generation procedure, training hyperparameters), and I do not know the root cause of this difference.
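The sketch below shows one way such a 2-D embedding could be visualized; it assumes the trained E=2 embedding matrix has been extracted into a NumPy array (the placeholder data, integer range, and variable names are assumptions, not the notebook's actual code).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical: `embedding` holds the trained E=2 embedding matrix, one row per
# integer value handled by the DSL.  Replace the placeholder with the weights
# extracted from the trained model; the integer range below is also an assumption.
integers = np.arange(-256, 256)
embedding = np.random.randn(len(integers), 2)   # placeholder for the real weights

plt.scatter(embedding[:, 0], embedding[:, 1], c=integers, cmap="viridis", s=5)
plt.colorbar(label="integer value")
plt.xlabel("embedding dimension 0")
plt.ylabel("embedding dimension 1")
plt.title("Learned integer embedding (E=2)")
plt.show()
```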
| Timeout needed to solve | 20% | 40% | 60% |
|---|---|---|---|
| Baseline | 53ms | 122ms | 375ms |
| DeepCoder | 5ms | 24ms | 87ms |
| Speedup (this implementation) | 10.8x | 5.0x | 3.6x |
| Speedup (Table 1 in the paper) | 62.2x | 54.6x | 31.5x |
The trained model speeds up program synthesis. However, the speedup of this implementation is smaller than that reported in the paper. I think the reason for this difference is the same as for the integer-embedding difference, but I have no evidence for this.
The details of the results are in examples/medium/comparison_with_baseline.ipynb.
The binary attributes predicted by the DNN are heavily imbalanced because each program in the dataset contains only 1-3 functions. For example, the attribute vector of a <- int | b <- [int] | c <- TAKE a b contains 33 False values and only 1 True value.
I suspected that this imbalance degrades the performance of the DNN model, so I introduced a cost-sensitive loss function (weighted_sigmoid_cross_entropy in src/model.py).
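As a rough NumPy sketch (not the code in src/model.py), such a loss can down-weight the dominant negative class with a weight w0; whether w0 is applied exactly this way in the actual implementation is an assumption.

```python
import numpy as np

def weighted_sigmoid_cross_entropy(logits, labels, w0=0.5):
    """Cost-sensitive binary cross-entropy (sketch only, not src/model.py).

    `labels` is the 0/1 attribute vector, `logits` are the DNN outputs before
    the sigmoid.  A `w0` < 1 down-weights the dominant negative class, so the
    rare positive attributes contribute relatively more to the loss.
    """
    p = 1.0 / (1.0 + np.exp(-logits))             # sigmoid
    eps = 1e-7                                    # avoid log(0)
    pos = labels * np.log(p + eps)                # terms for True attributes
    neg = (1.0 - labels) * np.log(1.0 - p + eps)  # terms for False attributes
    return -np.mean(pos + w0 * neg)               # weight negatives by w0
```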
However, I could not see a performance improvement in the medium-scale experiment. examples/medium/loss_function_comparison.ipynb shows the results of training with the cost-sensitive loss function, and examples/medium/train_w0_{0.25|0.5|0.75}.ipynb shows the training logs.
- Investigate the difference from the original paper
- Run the large-scale experiment (train with the dataset of length-4 programs)