Refactor LLMs (#37)
* WIP: committed partially refactored llms to merge from main

* Removed Chunk. Now LiteLLMModel's methods return an LLMResult

* Refactored LiteLLMModel and MultipleCompletionLLMModel

MultipleCompletionLLMModel is deprecated

* updated tests

* Updated uv.lock for new uv version

* Updated test checking if an AsyncIterator was a list

* Reverted uv.lock changes

* calling async callbacks with asyncio.gather
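The gather change above can be sketched with a minimal, self-contained example (names like `fire_callbacks` are hypothetical, not the library's API): all async callbacks fire concurrently instead of being awaited one at a time.

```python
import asyncio

async def fire_callbacks(callbacks, result):
    # Schedule every async callback at once and wait for all of them,
    # rather than awaiting each callback sequentially.
    await asyncio.gather(*(cb(result) for cb in callbacks))

received = []

async def log_result(result):
    received.append(("log", result))

async def store_result(result):
    received.append(("store", result))

asyncio.run(fire_callbacks([log_result, store_result], "chunk"))
```

With sequential awaits, a slow callback would delay every later one; `gather` lets them overlap while still propagating any exception to the caller.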

* Dropped support for run_prompt

* Updated cassettes for test_call

* Fixed typing of llm_result_callback

* Fix refurb error

* added missing new cassette

* Removed support for completion models

This also renamed achat to acompletion to align better with the litellm interface

* Updated uv.lock

It seems some entries had the old 'platform_system' marker due to my old uv version. It is now updated to 'sys_platform'

* added typeguard to pyproject.toml

* Fixed rate_limited typing

* Fixed typing check in rate_limited

* Avoided vcr for test_call_w_figure

* Casting results in rate_limited to avoid type ignoring
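The casting pattern above can be illustrated with a small sketch (the function name is hypothetical, not the library's decorator): `typing.cast` narrows the value for the type checker without any runtime effect.

```python
from typing import Any, cast

def unwrap_rate_limited(raw: Any) -> str:
    # `cast` is a no-op at runtime; it only informs the type checker of the
    # concrete type, which avoids scattering `# type: ignore` comments at
    # every call site of the rate-limited wrapper.
    return cast(str, raw)
```

The trade-off is that `cast` is a promise, not a check: if the wrapped call ever returns a different type, the checker will not catch it.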

* Prepared to get deepseek reasoning from litellm

Waiting for their release to validate this commit

* Added atext_completion back to PassThroughRouter

* Renamed LLM models in tests more carefully

gpt-4o-mini was renamed to OPENAI_TEST, gpt-4o to GPT_4O, and gpt-3.5-turbo to GPT_35. As support for gpt-3.5-turbo-instruct was dropped, these tests were adapted to ANTHROPIC_TEST
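The renaming above follows an aliasing pattern that can be sketched as a string-valued enum (a hypothetical mirror of `CommonLLMNames`, not its actual definition): tests reference a stable alias instead of hard-coding a model string everywhere.

```python
from enum import Enum

class CommonLLMNames(str, Enum):
    # Aliases from the rename described above; ANTHROPIC_TEST is omitted
    # because its concrete model string is not stated in the commit message.
    OPENAI_TEST = "gpt-4o-mini"
    GPT_4O = "gpt-4o"
    GPT_35 = "gpt-3.5-turbo"
```

Swapping the cheap test model later then touches one enum value rather than every test file.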

* Ruff fix

* Implemented logprobs calculation in streaming response

* Added .mailmap so pre-commit passes

* Many fixes to typing in llms

* Formatted cassettes

* added deepseek test

* Bumped litellm version for deepseek

* Getting reasoning_content from deepseek models

* Resolving PR comments (#39)

* Added .mailmap so pre-commit passes

* Many fixes to typing in llms

* Formatted cassettes

* Removed litellm from dev and removed python version checking

* Changed attribute description for reasoning_content

* removed deprecated MultipleCompletionLLMModel from llm.__all__

* adding formatted uv.lock

* Fixing LiteLLM `Router.acompletion` typing issue (#43)

* Cleaned up cassettes

---------

Co-authored-by: James Braza <[email protected]>
maykcaldas and jamesbraza authored Jan 24, 2025
1 parent bac1b5f commit dac0469
Showing 39 changed files with 8,355 additions and 1,276 deletions.
4 changes: 0 additions & 4 deletions llmclient/__init__.py
@@ -20,12 +20,10 @@
     CommonLLMNames,
     LiteLLMModel,
     LLMModel,
-    MultipleCompletionLLMModel,
     sum_logprobs,
     validate_json_completion,
 )
 from .types import (
-    Chunk,
     Embeddable,
     LLMResult,
 )
@@ -38,7 +36,6 @@
     "EXTRA_TOKENS_FROM_USER_ROLE",
     "GLOBAL_COST_TRACKER",
     "MODEL_COST_MAP",
-    "Chunk",
     "CommonLLMNames",
     "Embeddable",
     "EmbeddingModel",
@@ -49,7 +46,6 @@
     "LLMResult",
     "LiteLLMEmbeddingModel",
     "LiteLLMModel",
-    "MultipleCompletionLLMModel",
     "SentenceTransformerEmbeddingModel",
     "SparseEmbeddingModel",
     "configure_llm_logs",
7 changes: 6 additions & 1 deletion llmclient/cost_tracker.py
@@ -20,7 +20,12 @@ def __init__(self):
         self.report_every_usd = 1.0

     def record(
-        self, response: litellm.ModelResponse | litellm.types.utils.EmbeddingResponse
+        self,
+        response: (
+            litellm.ModelResponse
+            | litellm.types.utils.EmbeddingResponse
+            | litellm.types.utils.ModelResponseStream
+        ),
     ) -> None:
         self.lifetime_cost_usd += litellm.cost_calculator.completion_cost(
             completion_response=response
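A simplified, hypothetical stand-in for the tracker changed above shows why widening the union is safe: `record` only forwards the response to a cost function, so accepting streamed chunks adds a union member without changing the logic (the injected `cost_fn` is an assumption for illustration, standing in for `litellm.cost_calculator.completion_cost`).

```python
class SimpleCostTracker:
    def __init__(self, cost_fn):
        self.lifetime_cost_usd = 0.0
        self.cost_fn = cost_fn  # stand-in for litellm.cost_calculator.completion_cost

    def record(self, response) -> None:
        # Works for any response shape the cost function understands,
        # including streamed chunks, so the widened union type is safe here.
        self.lifetime_cost_usd += self.cost_fn(response)

tracker = SimpleCostTracker(cost_fn=lambda response: 0.25)
for _ in range(4):
    tracker.record(object())
```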