# MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks

• 🤗 Data • 🤗 Model • 🐱 Code • 📃 Paper


Large Language Models (LLMs) have showcased impressive capabilities in handling straightforward programming tasks. However, their performance tends to falter when confronted with more challenging programming problems. We observe that conventional models often generate solutions as monolithic code blocks, restricting their effectiveness on intricate problems. To overcome this limitation, we present Modular-of-Thought Coder (MoTCoder), a framework for MoT instruction tuning that promotes the decomposition of tasks into logical sub-tasks and sub-modules. Our investigations reveal that, through the cultivation and utilization of sub-modules, MoTCoder significantly improves both the modularity and correctness of the generated solutions, leading to substantial relative pass@1 improvements of 12.9% on APPS and 9.43% on CodeContests.
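For illustration, the sketch below (not taken from the paper or its training data) contrasts the two styles: instead of emitting one monolithic block, a modular-of-thought style solution first names its sub-modules and then composes them. The problem statement and function names here are hypothetical.

```python
# Hypothetical problem: read n intervals from stdin and print them merged and non-overlapping.
# A modular-of-thought style solution decomposes the task into named sub-modules.
import sys
from typing import List, Tuple


def parse_input(raw: str) -> List[Tuple[int, int]]:
    """Sub-module 1: turn raw stdin text into a list of (start, end) intervals."""
    lines = raw.strip().splitlines()
    n = int(lines[0])
    return [tuple(map(int, line.split())) for line in lines[1:n + 1]]


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sub-module 2: merge overlapping intervals after sorting by start point."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def format_output(intervals: List[Tuple[int, int]]) -> str:
    """Sub-module 3: render the merged intervals in the judge's expected format."""
    return "\n".join(f"{start} {end}" for start, end in intervals)


def main() -> None:
    """Final solution composed from the sub-modules above."""
    print(format_output(merge_intervals(parse_input(sys.stdin.read()))))


if __name__ == "__main__":
    main()
```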

## Performance

### Performance on APPS

Pass@1 (%) on the APPS benchmark, broken down by problem difficulty.

| Model | Size | Introductory | Interview | Competition | All |
|---|---|---|---|---|---|
| CodeT5 | 770M | 6.60 | 1.03 | 0.30 | 2.00 |
| CodeRL+CodeT5 | 770M | 7.08 | 1.86 | 0.75 | 2.69 |
| text-davinci-002 | - | - | - | - | 7.48 |
| Self-edit+text-davinci-002 | - | - | - | - | 7.94 |
| GPT-2 | 0.1B | 5.64 | 6.93 | 4.37 | 6.16 |
| GPT-2 | 1.5B | 7.40 | 9.11 | 5.05 | 7.96 |
| GPT-Neo | 2.7B | 14.68 | 9.85 | 6.54 | 10.15 |
| GPT-3 | 175B | 0.57 | 0.65 | 0.21 | 0.55 |
| StarCoder | 15B | 7.25 | 6.89 | 4.08 | 6.40 |
| WizardCoder | 15B | 26.04 | 4.21 | 0.81 | 7.90 |
| CodeChain+WizardCoder | 15B | 26.29 | 7.49 | 3.75 | 10.50 |
| Octocoder | 16B | 16.50 | 7.92 | 4.61 | 8.97 |
| Codellama | 7B | 14.15 | 6.63 | 4.00 | 7.61 |
| Codellama | 13B | 23.94 | 13.50 | 9.80 | 14.85 |
| Codellama | 34B | 32.01 | 18.61 | 10.19 | 19.61 |
| Codellama-Python | 7B | 18.83 | 8.62 | 4.47 | 9.83 |
| Codellama-Python | 13B | 26.40 | 13.44 | 6.86 | 14.72 |
| Codellama-Python | 34B | 26.45 | 16.61 | 8.77 | 17.01 |
| Codellama-Instruct | 7B | 14.20 | 6.63 | 4.43 | 7.70 |
| Codellama-Instruct | 13B | 22.41 | 14.34 | 6.62 | 15.21 |
| Codellama-Instruct | 34B | 28.64 | 16.80 | 10.51 | 17.91 |
| Deepseek-Coder-Base | 6.7B | 40.23 | 22.12 | 13.04 | 23.92 |
| Deepseek-Coder-Instruct | 6.7B | 44.65 | 23.86 | 12.89 | 25.83 |
| MoTCoder | 6.7B | 50.01 | 29.81 | 14.36 | 30.76 |
| GPT-4 | - | 34.97 | 13.75 | 14.52 | 18.15 |

### Performance on CodeContests

| Model | Size | pass@1 | pass@5 |
|---|---|---|---|
| code-davinci-002 | - | 1.00 | - |
| code-davinci-002 + CodeT | - | 3.20 | - |
| WizardCoder | 15B | 1.98 | 3.27 |
| WizardCoder + CodeChain | 15B | 2.48 | 3.30 |
| Octocoder | 16B | 4.95 | 13.03 |
| Codellama | 7B | 0.30 | 1.11 |
| Codellama | 13B | 2.12 | 6.26 |
| Codellama | 34B | 5.35 | 12.02 |
| Codellama-Python | 7B | 4.75 | 10.30 |
| Codellama-Python | 13B | 4.75 | 12.32 |
| Codellama-Python | 34B | 5.86 | 14.85 |
| Codellama-Instruct | 7B | 2.12 | 6.26 |
| Codellama-Instruct | 13B | 5.96 | 12.02 |
| Codellama-Instruct | 34B | 6.46 | 14.24 |
| Deepseek-Coder-Base | 6.7B | 6.46 | 15.25 |
| Deepseek-Coder-Instruct | 6.7B | 6.87 | 8.18 |
| MoTCoder | 6.7B | 9.29 | 16.97 |
| GPT-4 | - | 16.36 | - |

## Environment

Install the dependencies:

```bash
python -m pip install -e .
```

## Evaluation Datasets

### APPS Dataset

The APPS dataset [github] can be downloaded from huggingface.
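If you use the datasets library, the benchmark can also be loaded directly from the hub. The hub id below (`codeparrot/apps`) is a public mirror and an assumption on our part; point it at whichever copy you downloaded.

```python
from datasets import load_dataset

# "codeparrot/apps" is a public mirror of APPS on the Hugging Face Hub (an assumption here);
# depending on your `datasets` version, script-based datasets may require trust_remote_code=True.
apps = load_dataset("codeparrot/apps", split="test", trust_remote_code=True)
print(apps)
```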

### CodeContests Dataset

The CodeContests dataset [github] can be downloaded from huggingface. For CodeContests, convert the dataset to the same format as APPS so that the APPS evaluation metrics can be reused:

```bash
python src/convert_codecontest_dataset.py $SRC_DIR $DST_DIR
```

## Inference

You can download our MoTCoder model for evaluation from huggingface. We provide the inference commands to reproduce the results in our paper.

  • To use the modular-of-thought inference prompt, set `prompt_type=FORMAT_PROMPT`.
  • To use the normal inference prompt, set `prompt_type=NORMAL_FORMAT_PROMPT`.

First, generate solutions for your target evaluation dataset.

Choice 1: vLLM (recommended). Install the requirements:

```bash
pip install vllm
```

Inference:

```bash
python src/inference_vllm.py \
    --model_path $model_path \
    --data_path $data_path \
    --solution_path $solution_path \
    --prompt_type $prompt_type
```
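The script wraps vLLM's offline generation API; for reference, a minimal sketch of the equivalent direct usage is shown below. The model path and sampling settings are illustrative assumptions, not the exact values used in the paper.

```python
from vllm import LLM, SamplingParams

# Illustrative values only; the real prompt construction and settings live in src/inference_vllm.py.
llm = LLM(model="path/to/MoTCoder")  # local path or hub id of the downloaded checkpoint
params = SamplingParams(temperature=0.0, max_tokens=1024)

prompts = ["<problem statement wrapped in FORMAT_PROMPT or NORMAL_FORMAT_PROMPT>"]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```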

Choice 2: transformers. Inference:

```bash
python src/inference.py \
    $model_path \
    $data_path \
    $solution_path \
    $prompt_type
```
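For reference, a minimal sketch of plain transformers generation with a causal-LM checkpoint; the exact prompt construction is handled inside src/inference.py, so treat the details below as assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/MoTCoder"  # local path or hub id of the downloaded checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "<problem statement wrapped in the chosen prompt type>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```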

### APPS Evaluation

For APPS evaluation, the choices of `level` are `introductory`, `interview`, and `competition`.

```bash
python src/test_leetcode.py \
    --solutions_path $solution_path \
    --data_path $data_path \
    --save_path $result_path \
    --level $level
```

### CodeContests Evaluation

```bash
python src/test_apps.py \
    --solutions_path $solution_path \
    --data_path $data_path \
    --save_path $result_path
```
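Both evaluation scripts report pass@k over the generated solutions. If you ever need to recompute pass@k yourself from per-problem sample counts, the standard unbiased estimator of Chen et al. (2021) can be sketched as follows (a generic helper, not code from this repository):

```python
import numpy as np


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n generated samples, c of which pass all tests."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))


# Example: 20 samples for a problem, 3 of them correct -> estimated pass@5
print(pass_at_k(n=20, c=3, k=5))
```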

## Training

### Modular-of-Thought Training Dataset

We provide an example Python script to evolve an existing dataset into a modular-of-thought (MoT) dataset. Run the following command:

```bash
python src/generate_MoT_dataset.py \
    --data_path $data_path \
    --save_path $MoT_data_path \
    --api_base $api_base \
    --api_key $api_key
```
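The script talks to an OpenAI-compatible endpoint, which is why it takes --api_base and --api_key. The sketch below only shows the general shape of such a call with a hypothetical instruction; the actual prompt and post-processing are defined in src/generate_MoT_dataset.py.

```python
from openai import OpenAI

# base_url/api_key mirror the script's --api_base/--api_key arguments; model and prompt are hypothetical.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

flat_solution = "<an original, monolithic reference solution>"
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": (
            "Rewrite the following solution into modular sub-functions, each with a short "
            "docstring, then compose them in a main() function:\n\n" + flat_solution
        ),
    }],
)
print(response.choices[0].message.content)
```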

### MoTCode Dataset

Alternatively, you can download our generated modular-of-thought code dataset:

```python
from datasets import load_dataset

dataset = load_dataset("JingyaoLi/MoTCode-Data")
```

### Modular-of-Thought Training

Run the following command to train the model:

```bash
deepspeed src/train.py \
    --model_name_or_path $model_path \
    --data_path $MoT_data_path \
    --output_dir $output_dir \
    --num_train_epochs 3 \
    --model_max_length 2048 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --warmup_steps 30 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True \
    --prompt_type FORMAT_PROMPT
```