
Enhanced Ollama Response Handling with Retries and Streaming #305

Merged · 1 commit into Byaidu:main on Dec 20, 2024

Conversation

@7shi (Contributor) commented Dec 20, 2024

This PR improves the reliability and user experience of the OllamaTranslator by addressing several issues I've encountered.

Current Issues:

  1. The current implementation sometimes gets stuck generating infinite responses, which can halt the translation process.

  2. I've also observed that certain models perform better than others for specific types of text, but the system currently cannot leverage multiple models.

Implemented Solutions:

To address these issues, I've implemented several key improvements (a rough sketch of the combined logic follows the list):

  1. I've added a length limitation that caps responses at either 2000 characters or three times the input length, whichever is greater. This threshold is based on typical translation length ratios and effectively prevents infinite responses while allowing for natural translation expansion.

  2. I've also added support for multiple models, allowing users to specify several models separated by semicolons. For example:

    OLLAMA_MODEL=gemma2:2b-instruct-q4_K_M;aya-expanse
    

    Each model gets two retry attempts, as my experience shows that additional retries rarely improve results. However, I've found that different models often succeed where others fail, making this multi-model approach particularly effective.

  3. I've implemented streaming responses to match the existing prompt display functionality. This provides immediate feedback during the translation process and helps identify where and when translation issues occur.

  4. I've added a fallback mechanism that returns the original text if all translation attempts fail, ensuring the program continues to operate without hanging while preserving the content.
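
Roughly, these pieces fit together as in the sketch below. This is a simplified illustration assuming the ollama Python client's chat(..., stream=True) interface; the function name, prompt handling, and default model are hypothetical, not the actual pdf2zh code:

    import os
    import ollama

    def translate_with_retries(text: str, prompt: str) -> str:
        """Sketch: stream from each configured model, cap the response
        length, and make two attempts per model before moving on."""
        # Multiple models may be listed in OLLAMA_MODEL, separated by semicolons.
        models = os.environ.get("OLLAMA_MODEL", "gemma2:2b-instruct-q4_K_M").split(";")
        # Cap at 2000 characters or three times the input length, whichever is greater.
        max_len = max(2000, 3 * len(text))
        for model in models:
            for _attempt in range(2):  # two attempts per model
                response = ""
                stream = ollama.chat(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                    stream=True,  # stream chunks for immediate feedback
                )
                for chunk in stream:
                    piece = chunk["message"]["content"]
                    print(piece, end="", flush=True)  # mirrors the prompt display
                    response += piece
                    if len(response) > max_len:
                        break  # likely an infinite response; abort this attempt
                else:
                    return response  # stream finished within the cap
        # Fallback (item 4): return the original text if every attempt failed.
        return text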

These changes significantly improve translation reliability while maintaining compatibility with existing debug output patterns.

@Byaidu (Owner) commented Dec 20, 2024

Thanks for your contribution! However, maybe we can adjust the temperature parameter (currently 0) to avoid generating infinite responses, which would be more maintainable and elegant.
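
For reference, that knob would look roughly like this with the ollama Python client; the prompt and the 0.7 value are arbitrary placeholders, not recommendations:

    import os
    import ollama

    prompt = "Translate the following text: ..."  # hypothetical placeholder

    # Illustration only: raising the temperature above the current 0.
    response = ollama.chat(
        model=os.environ["OLLAMA_MODEL"],
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0.7},
    )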

@7shi (Contributor, Author) commented Dec 20, 2024

Thank you for your suggestion about adjusting the temperature parameter. I understand and respect your approach.

However, I'd like to clarify the current critical issue: when Ollama enters an infinite loop state, it completely blocks the translation process and prevents pdf2zh from even attempting a retry. If I forcibly terminate Ollama in this situation, pdf2zh enters an infinite reconnection loop, and I then have to kill the Python process as well.

While adjusting the temperature may help reduce the frequency of infinite responses, I think we still need a safety mechanism to handle cases where the model gets stuck.

Would you consider accepting the timeout/length limitation part of the PR separately from the multi-model features? This would provide an essential safety net while we explore temperature-based solutions.

@Byaidu (Owner) commented Dec 20, 2024

Sure, the timeout/length limitation and multi-model are acceptable.

However, we might not accept a fallback mechanism that returns the original text: program exceptions can be triggered not only by length limitations but also by issues within the ollama module itself (e.g., the user failing to correctly configure the ollama server, resulting in an HTTP error). This could complicate error diagnosis, especially given the history of frequent ollama configuration errors reported in the issue tracker.

@7shi (Contributor, Author) commented Dec 20, 2024

I agree with removing the fallback mechanism. Clear error messages would be more helpful for troubleshooting.

Currently, when OllamaTranslator raises an error, the system keeps retrying indefinitely. I understand this retry behavior might be related to the overall error handling policy of the project, so I'll leave the specific implementation to your judgment.

@Byaidu merged commit 86c1869 into Byaidu:main on Dec 20, 2024 (2 checks passed)
@7shi (Contributor, Author) commented Dec 20, 2024

Thank you.
