Merge pull request #153 from codelion/feat-new-release-cepo
The CePO approach
codelion authored Jan 24, 2025
2 parents 63b25fb + d7b4591 commit 6663c8f
Showing 6 changed files with 114 additions and 658 deletions.
53 changes: 31 additions & 22 deletions README.md
@@ -1,6 +1,8 @@
# optillm

optillm is an OpenAI API compatible optimizing inference proxy which implements several state-of-the-art techniques that can improve the accuracy and performance of LLMs. The current focus is on implementing techniques that improve reasoning over coding, logical and mathematical queries. It is possible to beat the frontier models using these techniques across diverse tasks by doing additional compute at inference time.
optillm is an OpenAI API compatible optimizing inference proxy which implements several state-of-the-art techniques that can improve the accuracy and performance of LLMs. The current focus is on implementing techniques that improve reasoning over coding, logical and mathematical queries.

It is possible to beat the frontier models using these techniques across diverse tasks by doing additional compute at inference time. A good example of how to combine such techniques together is the [CePO approach](optillm/cepo) from Cerebras.

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/codelion/optillm)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing)
@@ -46,9 +48,16 @@ source .venv/bin/activate
pip install -r requirements.txt
```

Set up the `OPENAI_API_KEY` environment variable (for OpenAI)
or the `AZURE_OPENAI_API_KEY`, `AZURE_API_VERSION` and `AZURE_API_BASE` environment variables (for Azure OpenAI)
or the `AZURE_API_VERSION` and `AZURE_API_BASE` environment variables and login using `az login` for Azure OpenAI with managed identity (see [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity)).
We support all major LLM providers and models for inference. You need to set the correct environment variable and the proxy will pick the corresponding client.

| Provider | Required Environment Variables | Additional Notes |
|----------|-------------------------------|------------------|
| OptiLLM | `OPTILLM_API_KEY` | Uses the inbuilt local server for inference, supports logprobs and decoding techniques like `cot_decoding` & `entropy_decoding` |
| OpenAI | `OPENAI_API_KEY` | You can use this with any OpenAI compatible endpoint (e.g. OpenRouter) by setting the `base_url` |
| Cerebras | `CEREBRAS_API_KEY` | You can use this for fast inference with supported models, see [docs for details](https://inference-docs.cerebras.ai/introduction) |
| Azure OpenAI | `AZURE_OPENAI_API_KEY`<br>`AZURE_API_VERSION`<br>`AZURE_API_BASE` | - |
| Azure OpenAI (Managed Identity) | `AZURE_API_VERSION`<br>`AZURE_API_BASE` | Login required using `az login`, see [docs for details](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity) |
| LiteLLM | depends on the model | See [docs for details](https://docs.litellm.ai/docs/providers) |
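
As a rough illustration of the table above, the sketch below maps a set of environment variables to a provider label. The helper name `pick_provider`, the returned labels, and the precedence order are assumptions made for this example; the proxy itself may check the variables in a different order.

```python
import os
from typing import Mapping, Optional


def pick_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a provider label for the variables that are set (hypothetical helper)."""
    env = os.environ if env is None else env
    if env.get("OPTILLM_API_KEY"):
        return "optillm"
    if env.get("CEREBRAS_API_KEY"):
        return "cerebras"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # Both Azure variants need the API version and base URL.
    if env.get("AZURE_API_VERSION") and env.get("AZURE_API_BASE"):
        if env.get("AZURE_OPENAI_API_KEY"):
            return "azure"
        # No key set: assume managed identity after `az login`.
        return "azure-managed-identity"
    # Assumed fallback: let LiteLLM resolve the provider from the model name.
    return "litellm"
```

Passing a plain dictionary instead of `os.environ` makes the rule easy to check without touching the real environment.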

You can then run the optillm proxy as follows.

@@ -325,7 +334,7 @@ Authorization: Bearer your_secret_api_key

## SOTA results on benchmarks with optillm

### CePO on math and code benchmarks
### CePO on math and code benchmarks (Jan 2025)

| Method | Math-L5 | MMLU-Pro (Math) | GPQA | CRUX | LiveCodeBench (pass@1) | Simple QA |
| -------------------------: | :-----: | :-------------: | :--: | :--: | :--------------------: | :-------: |
@@ -380,23 +389,23 @@ called patchflows. We saw huge performance gains across all the supported patchf
![Results showing optillm mixture of agents approach used with patchflows](https://raw.githubusercontent.com/codelion/optillm/main/moa-patchwork-results.png)

## References

- [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of coc plugin](https://github.com/codelion/optillm/blob/main/optillm/plugins/coc_plugin.py)
- [Entropy Based Sampling and Parallel CoT Decoding](https://github.com/xjdr-alt/entropix) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/entropy_decoding.py)
- [Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation](https://arxiv.org/abs/2409.12941) - [Evaluation script](https://github.com/codelion/optillm/blob/main/scripts/eval_frames_benchmark.py)
- [Writing in the Margins: Better Inference Pattern for Long Context Retrieval](https://www.arxiv.org/abs/2408.14906) - [Inspired the implementation of the memory plugin](https://github.com/codelion/optillm/blob/main/optillm/plugins/memory_plugin.py)
- [Chain-of-Thought Reasoning Without Prompting](https://arxiv.org/abs/2402.10200) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/cot_decoding.py)
- [Re-Reading Improves Reasoning in Large Language Models](https://arxiv.org/abs/2309.06275) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/reread.py)
- [In-Context Principle Learning from Mistakes](https://arxiv.org/abs/2402.05403) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/leap.py)
- [Planning In Natural Language Improves LLM Search For Code Generation](https://arxiv.org/abs/2409.03733) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/plansearch.py)
- [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/self_consistency.py)
- [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https://arxiv.org/abs/2408.06195) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/rstar.py)
- [Mixture-of-Agents Enhances Large Language Model Capabilities](https://arxiv.org/abs/2406.04692) - [Inspired the implementation of moa](https://github.com/codelion/optillm/blob/main/optillm/moa.py)
- [Prover-Verifier Games improve legibility of LLM outputs](https://arxiv.org/abs/2407.13692) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/pvg.py)
- [Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning](https://arxiv.org/abs/2405.00451) - [Inspired the implementation of mcts](https://github.com/codelion/optillm/blob/main/optillm/mcts.py)
- [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness](https://arxiv.org/abs/2402.08699) - [Inspired the implementation of rto](https://github.com/codelion/optillm/blob/main/optillm/rto.py)
- [Patched MOA: optimizing inference for diverse software development tasks](https://arxiv.org/abs/2407.18521) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/moa.py)
- [Patched RTC: evaluating LLMs for diverse software development tasks](https://arxiv.org/abs/2407.16557) - [Implementation](https://github.com/codelion/optillm/blob/main/optillm/rto.py)
- [CePO: Empowering Llama with Reasoning using Test-Time Compute](https://cerebras.ai/blog/cepo) - [Implementation](optillm/cepo)
- [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of coc plugin](optillm/plugins/coc_plugin.py)
- [Entropy Based Sampling and Parallel CoT Decoding](https://github.com/xjdr-alt/entropix) - [Implementation](optillm/entropy_decoding.py)
- [Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation](https://arxiv.org/abs/2409.12941) - [Evaluation script](scripts/eval_frames_benchmark.py)
- [Writing in the Margins: Better Inference Pattern for Long Context Retrieval](https://www.arxiv.org/abs/2408.14906) - [Inspired the implementation of the memory plugin](optillm/plugins/memory_plugin.py)
- [Chain-of-Thought Reasoning Without Prompting](https://arxiv.org/abs/2402.10200) - [Implementation](optillm/cot_decoding.py)
- [Re-Reading Improves Reasoning in Large Language Models](https://arxiv.org/abs/2309.06275) - [Implementation](optillm/reread.py)
- [In-Context Principle Learning from Mistakes](https://arxiv.org/abs/2402.05403) - [Implementation](optillm/leap.py)
- [Planning In Natural Language Improves LLM Search For Code Generation](https://arxiv.org/abs/2409.03733) - [Implementation](optillm/plansearch.py)
- [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https://arxiv.org/abs/2203.11171) - [Implementation](optillm/self_consistency.py)
- [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https://arxiv.org/abs/2408.06195) - [Implementation](optillm/rstar.py)
- [Mixture-of-Agents Enhances Large Language Model Capabilities](https://arxiv.org/abs/2406.04692) - [Inspired the implementation of moa](optillm/moa.py)
- [Prover-Verifier Games improve legibility of LLM outputs](https://arxiv.org/abs/2407.13692) - [Implementation](optillm/pvg.py)
- [Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning](https://arxiv.org/abs/2405.00451) - [Inspired the implementation of mcts](optillm/mcts.py)
- [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness](https://arxiv.org/abs/2402.08699) - [Inspired the implementation of rto](optillm/rto.py)
- [Patched MOA: optimizing inference for diverse software development tasks](https://arxiv.org/abs/2407.18521) - [Implementation](optillm/moa.py)
- [Patched RTC: evaluating LLMs for diverse software development tasks](https://arxiv.org/abs/2407.16557) - [Implementation](optillm/rto.py)

## Citation

39 changes: 35 additions & 4 deletions optillm.py
@@ -214,6 +214,26 @@ def load_plugins():
    if not plugin_approaches:
        logger.warning("No plugins loaded from any location")

def get_config_path():
    # Get installed package config directory
    import optillm
    package_config_dir = os.path.join(os.path.dirname(optillm.__file__), 'cepo', 'configs')
    package_config_path = os.path.join(package_config_dir, 'cepo_config.yaml')

    # Get local project config directory
    current_dir = os.getcwd() if server_config.get("config_dir", "") == "" else server_config["config_dir"]
    local_config_dir = os.path.join(current_dir, 'optillm', 'cepo', 'configs')
    local_config_path = os.path.join(local_config_dir, 'cepo_config.yaml')

    # If local config exists and is different from package config, use local
    if os.path.exists(local_config_path) and local_config_path != package_config_path:
        logger.debug(f"Using local config from: {local_config_path}")
        return local_config_path

    # Otherwise use package config
    logger.debug(f"Using package config from: {package_config_path}")
    return package_config_path
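
The lookup order in `get_config_path` can be tried in isolation with the sketch below. It is a simplified stand-in, not the code from this commit: the name `resolve_config_path` and its two directory parameters are assumptions, introduced so the rule can be exercised without importing `optillm` or reading `server_config`.

```python
import os


def resolve_config_path(package_dir: str, project_dir: str) -> str:
    """Prefer a project-local CePO config over the one shipped in the package.

    package_dir: directory of the installed package (hypothetical parameter).
    project_dir: directory searched for a local override (hypothetical parameter).
    """
    relative = os.path.join("cepo", "configs", "cepo_config.yaml")
    package_path = os.path.join(package_dir, relative)
    local_path = os.path.join(project_dir, "optillm", relative)

    # A local file wins only if it exists and is not the packaged file itself.
    if os.path.exists(local_path) and local_path != package_path:
        return local_path
    return package_path
```

When the proxy runs from a source checkout, both candidates point at the same file, so the comparison only changes which debug message is logged; the override matters when the package is installed elsewhere and the working directory carries its own `optillm/cepo/configs/cepo_config.yaml`.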

def parse_combined_approach(model: str, known_approaches: list, plugin_approaches: dict):
    if model == 'auto':
        return 'SINGLE', ['none'], model
@@ -701,13 +721,24 @@ def parse_args():
    base_url_default = os.environ.get("OPTILLM_BASE_URL", "")
    parser.add_argument("--base-url", "--base_url", dest="base_url", type=str, default=base_url_default,
                        help="Base url for OpenAI compatible endpoint")

    # Use the function to get the default path
    default_config_path = get_config_path()

    # Special handling of all the CePO Configurations
    for field in fields(CepoConfig):
        parser.add_argument(f"--cepo_{field.name}", dest=f"cepo_{field.name}", type=field.type, default=None, help=f"CePO configuration for {field.name}")

    parser.add_argument(f"--cepo_config_file", dest=f"cepo_config_file", type=str, default="./optillm/cepo/configs/cepo_config.yaml", help="Path to CePO configuration file")

        parser.add_argument(f"--cepo_{field.name}",
                            dest=f"cepo_{field.name}",
                            type=field.type,
                            default=None,
                            help=f"CePO configuration for {field.name}")

    parser.add_argument("--cepo_config_file",
                        dest="cepo_config_file",
                        type=str,
                        default=default_config_path,
                        help="Path to CePO configuration file")

    args = parser.parse_args()
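
The loop over `fields(CepoConfig)` generates one command-line flag per dataclass field. The sketch below reproduces that pattern on a small stand-in dataclass. `DemoConfig`, its field names, and the helpers `apply_overrides` and `load_config` are invented for illustration; how the real code merges flags with the YAML file is not shown in this diff, so the merge step here is an assumption.

```python
import argparse
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass
class DemoConfig:
    """Hypothetical stand-in for CepoConfig; the field names are invented."""
    samples: int = 3
    temperature: float = 0.5


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for field in fields(DemoConfig):
        # field.type is the annotated class (int, float), so argparse can
        # use it directly to convert the flag's string value.
        parser.add_argument(f"--cepo_{field.name}",
                            dest=f"cepo_{field.name}",
                            type=field.type,
                            default=None,
                            help=f"CePO configuration for {field.name}")
    return parser


def apply_overrides(config: DemoConfig, args: argparse.Namespace) -> DemoConfig:
    # default=None marks "flag not given", so only explicit flags override.
    overrides = {}
    for field in fields(DemoConfig):
        value = getattr(args, f"cepo_{field.name}")
        if value is not None:
            overrides[field.name] = value
    return replace(config, **overrides)


def load_config(argv: List[str]) -> DemoConfig:
    return apply_overrides(DemoConfig(), build_parser().parse_args(argv))
```

Using `field.type` as the argparse converter works here because the annotations are plain classes; with string annotations (for example under `from __future__ import annotations`) it would need to be resolved first.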

    # Convert argument names to match server_config keys
4 changes: 2 additions & 2 deletions optillm/cepo/README.md
@@ -2,7 +2,7 @@

CePO is an inference-time computation method designed to enhance the accuracy of large language models (LLMs) on tasks requiring reasoning and planning, such as solving math or coding problems. It integrates several advanced techniques, including Best of N, Chain of Thought (CoT), Self-Reflection, Self-Improvement, and Prompt Engineering.

If you have any questions or want to contribute, please reach out to us on [cerebras.ai/discord](cerebras.ai/discord)
If you have any questions or want to contribute, please reach out to us on [cerebras.ai/discord](https://cerebras.ai/discord)

## CePO Methodology

@@ -41,4 +41,4 @@ Interestingly, the self-critique and quality improvement capabilities of existin
| 3 | 5 | 8 | absolute | 69.4 | 84.3 | 55.6 | 81.1 | |
| 5 | 3 | 6 | absolute | 68.7 | 85.4 | 54.8 | 79.9 | |
| 7 | 3 | 6 | absolute | 69.6 | 82.8 | 54.7 | 78.4 | |
| 9 | 3 | 6 | absolute | 68.9 | 83.4 | 55.7 | 80.6 | |
| 9 | 3 | 6 | absolute | 68.9 | 83.4 | 55.7 | 80.6 | |
4 changes: 3 additions & 1 deletion setup.py
@@ -2,11 +2,12 @@

setup(
    name="optillm",
    version="0.0.36",
    version="0.1.0",
    packages=find_packages(),
    py_modules=['optillm'],
    package_data={
        # A dict literal keeps only the last value for a repeated key,
        # so both patterns must live in a single 'optillm' entry.
        'optillm': [
            'plugins/*.py',         # Include plugin files
            'cepo/configs/*.yaml',  # Include yaml files in the package
        ],
    },
    include_package_data=True,  # This is important
    install_requires=[
@@ -36,6 +37,7 @@
        "gradio",
        # Constrain spacy version to avoid blis build issues on ARM64
        "spacy<3.8.0",
        "cerebras_cloud_sdk",
    ],
    entry_points={
        'console_scripts': [
8 changes: 5 additions & 3 deletions test.py
@@ -7,19 +7,20 @@
import logging
from openai import OpenAI

from litellm_wrapper import LiteLLMWrapper
from optillm.litellm_wrapper import LiteLLMWrapper
from optillm.mcts import chat_with_mcts
from optillm.bon import best_of_n_sampling
from optillm.moa import mixture_of_agents
from optillm.rto import round_trip_optimization
from optillm.self_consistency import advanced_self_consistency_approach
from optillm.pvg import inference_time_pv_game
from optillm.z3_solver import Z3SolverSystem
from optillm.z3_solver import Z3SymPySolverSystem
from optillm.rstar import RStar
from optillm.cot_reflection import cot_reflection
from optillm.plansearch import plansearch
from optillm.leap import leap
from optillm.reread import re2_approach
from optillm.cepo.cepo import cepo, CepoConfig, init_cepo_config

# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
@@ -44,12 +45,13 @@ def __init__(self):
            'rto': round_trip_optimization,
            'self_consistency': advanced_self_consistency_approach,
            'pvg': inference_time_pv_game,
            'z3': lambda s, q, c, m: Z3SolverSystem(s, c, m).process_query(q),
            'z3': lambda s, q, c, m: Z3SymPySolverSystem(s, c, m).process_query(q),
            'rstar': lambda s, q, c, m: RStar(s, c, m).solve(q),
            'cot_reflection': cot_reflection,
            'plansearch': plansearch,
            'leap': leap,
            're2': re2_approach,
            'cepo': lambda s, q, c, m: cepo(s,q,c,m,init_cepo_config({'cepo_config_file': './optillm/cepo/configs/cepo_config.yaml'})),
        }

def load_test_cases(file_path: str) -> List[Dict]: