Tech-stack: Vue3 -> Python (langchain/openai) -> Ollama
This project implements a local AI chat, with:
- Response pretty printing: markdown to html & code highlight
- Formatted response for code blocks (through an ability prompt).
- Backend chat history with persistence (file store); see the file-store sketch after this feature list.
- Delete question/answer in file history.
- Delete all history.
- User history selector.
- LLM response live streaming: chunked streaming
- Stop current streaming response.
- Langchain callbacks logging (with truncated text in the logs); see the streaming/logging sketch after this feature list.
- File uploader:
  - Files are uploaded to the `uploads` folder as plain files.
  - Mention a file in a question to the LLM by pressing `@` and choosing it from the uploaded-files assistant.
- Full conversation history export:
- Advanced prompts
- Multiple requests parametrization: the frontend can trigger several questions (sequentially) to the LLM. You only need to provide a {variable} in the question and set the variable values on a single line, f.ex.:
  `Generate a full example code with {variable} in python. variable=Django, Flask, NumPy, Pandas, Matplotlib, Scikit-learn, Requests`
  (An expansion sketch follows this feature list.)
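The {variable} expansion is done by the Vue frontend; as an illustration only (function name and parsing rules are assumptions, not the project's actual code), a minimal Python sketch of the same idea:

```python
# Illustrative only: the project expands {variable} in the Vue frontend; this
# Python sketch just shows the same expansion idea (names are assumptions).
def expand_question(template: str, values_line: str) -> list[str]:
    """Turn a question template plus 'variable=A, B, C' into concrete questions."""
    name, _, values = values_line.partition("=")
    placeholder = "{" + name.strip() + "}"
    return [template.replace(placeholder, v.strip())
            for v in values.split(",") if v.strip()]

questions = expand_question(
    "Generate a full example code with {variable} in python.",
    "variable=Django, Flask, NumPy, Pandas, Matplotlib, Scikit-learn, Requests",
)
for q in questions:
    print(q)  # each question would then be sent to the LLM sequentially
```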
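For the live streaming and callbacks-logging features, a minimal sketch assuming the `langchain-ollama` integration; the model name, truncation length, and handler are illustrative, not the project's actual implementation:

```python
# Minimal sketch (not the project's actual code): stream a response from a
# local Ollama model and log callback events with truncated text.
from langchain_core.callbacks import BaseCallbackHandler
from langchain_ollama import ChatOllama  # assumes the langchain-ollama package

MAX_LOG_LEN = 80  # illustrative truncation length for the logs

class TruncatingLogHandler(BaseCallbackHandler):
    """Logs chat start and final text with long content cut down."""
    def on_chat_model_start(self, serialized, messages, **kwargs):
        for batch in messages:
            for m in batch:
                print(f"[chat start] {m.content[:MAX_LOG_LEN]}")

    def on_llm_end(self, response, **kwargs):
        text = response.generations[0][0].text
        print(f"[llm end] {text[:MAX_LOG_LEN]}")

llm = ChatOllama(model="deepseek-coder-v2",  # model name is an assumption
                 callbacks=[TruncatingLogHandler()])

# Chunked streaming: each chunk can be forwarded to the frontend as it arrives.
for chunk in llm.stream("Explain chunked streaming in one paragraph."):
    print(chunk.content, end="", flush=True)
```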
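For the file-store history features (persistence, delete one question/answer, delete all), a hypothetical sketch; the JSON layout, file names, and `history/` folder are assumptions, not the project's actual format:

```python
# Hypothetical sketch of a file-backed chat history: one JSON file per user
# with question/answer entries, supporting "delete one entry" and "delete all".
import json
from pathlib import Path

HISTORY_DIR = Path("history")  # illustrative location

def _history_file(user: str) -> Path:
    return HISTORY_DIR / f"{user}.json"

def load_history(user: str) -> list[dict]:
    f = _history_file(user)
    return json.loads(f.read_text()) if f.exists() else []

def append_entry(user: str, question: str, answer: str) -> None:
    entries = load_history(user)
    entries.append({"question": question, "answer": answer})
    HISTORY_DIR.mkdir(exist_ok=True)
    _history_file(user).write_text(json.dumps(entries, indent=2))

def delete_entry(user: str, index: int) -> None:
    entries = load_history(user)
    del entries[index]
    _history_file(user).write_text(json.dumps(entries, indent=2))

def delete_all(user: str) -> None:
    _history_file(user).unlink(missing_ok=True)
```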
See README_OLLAMA.md:
`sudo service ollama start`
See the respective README.md docs: backend & frontend.
- https://python.langchain.com/docs/concepts/
- https://python.langchain.com/docs/how_to/
- https://python.langchain.com/docs/integrations/platforms/openai/
- https://python.langchain.com/docs/integrations/chat/ollama/
- https://python.langchain.com/api_reference/ollama/index.html
- https://api.python.langchain.com/en/latest/llms/langchain_community.llms.ollama.Ollama.html
- https://api.python.langchain.com/en/latest/ollama/index.html
- https://api.python.langchain.com/en/latest/openai_api_reference.html
- https://github.com/openai/openai-python
- https://platform.openai.com/docs/api-reference/introduction
- https://platform.openai.com/docs/libraries/python-library
- User persistence
- Options: model, history, question.
- Save generation time in history metadata.
- Allow multiple questions separated by -------------------, f.ex.
- Collapse all responses.
- Prompt: `@openai-local-ollama-chat Explain the project to me` fails with:
  - ServiceException: op not found, upload it first!
  - RuntimeError: Error loading uploads/openai-local-ollama-chat
  - IsADirectoryError: [Errno 21] Is a directory: 'uploads/openai-local-ollama-chat'
- K-shift error (see Known-issues):
  - Continue doesn't generate the K-shift error; check out how.
  - Option (front/back) to disable passing all the history to the LLM (a history-trimming sketch follows the error log below).

When the context grows too large to fit into VRAM (after several Q/A turns, f.ex.), Ollama throws the following error because Deepseek2 does not support K-shift:
ollama[17132]: /go/src/github.com/ollama/ollama/llm/llama.cpp/src/llama.cpp:15110: Deepseek2 does not support K-shift
ollama[17132]: Could not attach to process. If your uid matches the uid of the target
ollama[17132]: process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
ollama[17132]: again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ollama[17132]: ptrace: Inappropriate ioctl for device.
ollama[17132]: No stack.
ollama[17132]: The program is not being run.
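A minimal sketch of the history-limiting option mentioned in the TODO above, assuming messages are built with langchain message classes; the cap and helper names are illustrative, not the project's actual API:

```python
# Hypothetical sketch: cap the history sent to the LLM so the context stays
# within the model's window (one way to avoid hitting the K-shift error).
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage

MAX_HISTORY_PAIRS = 5  # illustrative cap: keep only the last N question/answer pairs

def build_messages(system_prompt: str,
                   history: list[tuple[str, str]],
                   question: str,
                   use_full_history: bool = False) -> list[BaseMessage]:
    """Build the message list, optionally dropping older history pairs."""
    pairs = history if use_full_history else history[-MAX_HISTORY_PAIRS:]
    messages: list[BaseMessage] = [SystemMessage(content=system_prompt)]
    for q, a in pairs:
        messages.append(HumanMessage(content=q))
        messages.append(AIMessage(content=a))
    messages.append(HumanMessage(content=question))
    return messages
```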