Releases: pvs-hd-tea/23ws-LLMcoder
LLMcoder v0.4: Tree of Completions with Backtracking
New Features
- LLMcoder now keeps all conversations in a sorted list and chooses the best one for the next feedback loop
- New `backtracking` parameter for the `LLMCoder` class:
  - `False`: old greedy tree search algorithm
  - `True`: new backtracking algorithm
- Sample the conversation for the next feedback loop via softmax sampling and a new `meta_temperature` parameter (0 by default, i.e. always pick the best conversation)
Chores
- Rewrite tests for new tree search algorithm
- Improve code clarity
Example
```
...
[LLMcoder] Have 3 conversations:
[LLMcoder] Passing  Score  Prob  Path
[LLMcoder] False    4.42   1.0   ['R', 2]
[LLMcoder] False    3.42   0.0   ['R', 0]
[LLMcoder] False    0.42   0.0   ['R', 1]
[LLMcoder] Choosing conversation R-2 with score 4.42
...
[LLMcoder] Passing  Score  Prob  Path
[LLMcoder] True     7.42   1.0   ['R', 2, 1]
[LLMcoder] False    5.42   0.0   ['R', 2, 0]
[LLMcoder] False    5.42   0.0   ['R', 2, 2]
[LLMcoder] False    3.42   0.0   ['R', 0]
[LLMcoder] False    0.42   0.0   ['R', 1]
[LLMcoder] Code is correct. Stopping early...
```
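The selection step behind the `Prob` column above can be sketched as softmax sampling over conversation scores. This is a minimal illustration, not the repository's actual code; with `meta_temperature == 0` it reduces to the greedy choice (probability 1.0 for the best conversation, 0.0 for the rest), as in the log.

```python
import math
import random

def choose_conversation(scores: list[float], meta_temperature: float = 0.0) -> int:
    """Pick the index of the conversation to expand next.

    meta_temperature == 0 is the greedy limit: always take the best score.
    Higher temperatures spread probability mass over weaker conversations.
    """
    if meta_temperature == 0.0:
        return max(range(len(scores)), key=lambda i: scores[i])
    # Softmax over temperature-scaled scores (shift by the max for numerical stability)
    m = max(scores)
    weights = [math.exp((s - m) / meta_temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(range(len(scores)), weights=probs, k=1)[0]
```

For the scores in the first log block, `choose_conversation([4.42, 3.42, 0.42])` picks index 0, matching "Choosing conversation R-2 with score 4.42" (the sorted list puts `['R', 2]` first).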
LLMcoder v0.3.1: Evaluation Upgrade, Bug Fixes
Bug Fixes
- Fix a bug where a configuration would not be loaded correctly if it was passed as a file path string to the `Evaluate` and `Metrics` classes
- Add critical analyzers to the configs so that the pass criterion can be computed
- Fix tests failing due to the removed addition of the system prompt at init time
New Features
- Add new boxplot and violin plot to evaluation
LLMcoder v0.3: Evaluation Upgrade
LLMcoder now features improved `Evaluation` and `Metrics` classes that evaluate multiple configurations a specified number of times, storing the results in an improved file structure:
- data
  - my_dataset
    - eval
      - my_config
        - time_of_evaluation of run 1
          - readable_logs
            - readable log of example 1
            - readable log of example 2
            - ...
          - metrics.csv
          - results.json
        - time_of_evaluation of run 2
        - ...
      - my_second_config
      - ...
  - my_second_dataset
  - ...
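Given this layout, collecting all per-run metrics files is a simple glob over the nested directories. The helper below is hypothetical (not part of LLMcoder) and only illustrates the `data/<dataset>/eval/<config>/<run>/metrics.csv` structure:

```python
from pathlib import Path

def collect_metrics_files(data_root: str) -> list[Path]:
    """Return every metrics.csv under the evaluation layout:

        <data_root>/<dataset>/eval/<config>/<run timestamp>/metrics.csv

    Sorted for deterministic ordering across datasets, configs, and runs.
    """
    return sorted(Path(data_root).glob("*/eval/*/*/metrics.csv"))
```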
LLMcoder v0.2: Tree Feedback Loop
LLMcoder now has three analyzers:
- MypyAnalyzer: Detects type errors in the completion and provides the output as feedback
- SignatureAnalyzer: Adds signature hints for the relevant mypy errors for more guidance
- GPTScoreAnalyzer: Scores the code and completion by quality, plausibility, consistency, and readability
The original ("linear") feedback loop is now a special case of the more general "Tree of Feedback" approach: `LLMCoder.complete` now accepts an additional parameter `n` that specifies the number of completion candidates to generate and analyze in parallel. The candidate with the best score and the fewest mypy errors is chosen for the next feedback loop. The `n_procs` parameter of the `LLMCoder` class specifies the number of processes to use for parallel analysis.
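The selection rule described above (fewest mypy errors, best score) can be sketched as a simple key function. The dict keys here are illustrative assumptions; the repository's actual data structures differ:

```python
def choose_candidate(candidates: list[dict]) -> dict:
    """Pick the completion candidate with the fewest mypy errors,
    breaking ties by the highest analyzer score.

    Python compares the key tuples lexicographically, so error count
    dominates and the negated score acts as the tie-breaker.
    """
    return min(candidates, key=lambda c: (c["mypy_errors"], -c["score"]))
```

For example, a candidate with no mypy errors and a score of 4.0 beats both an error-free candidate scoring 3.0 and a higher-scoring candidate with two errors.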
LLMcoder v0.1
First release of LLMcoder, featuring a basic feedback loop with mypy errors and signature hints.