Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cache): use sqlite to store translation cache #324

Merged
merged 34 commits into from
Dec 25, 2024

Conversation

awwaawwa
Copy link
Contributor

@awwaawwa awwaawwa commented Dec 23, 2024

  • Implement database-related operations
  • Modify translator.py, have BaseTranslator handle cache matters
  • Add relevant tests

In addition, I modified the Test and Build Python Package Action to be triggered by any branch.

- add tortoise-orm>=0.22.2 to dependencies for database support
- update runner image to python 3.12 bookworm-slim for better compatibility
- use uv instead of pip
- update runner to 'ubuntu-latest'
- update uv pip install command to include --system flag
- include creation of virtual environment using `uv venv`
- set ignore rules for specific flake8 checks
- configure max line length to 88 characters
- include pep8-naming in the dev dependencies array
- aim to improve code style adherence to PEP 8 naming conventions
- rename findSID to find_sid to align with snake_case naming convention
- remove tortoise-orm>=0.22.2 from dependencies
- add peewee>=3.17.8 to dependencies
- introduce peewee ORM for database management
- add `_TranslationCache` model with unique constraints
- implement `TranslationCache` class for clean API usage
- add `init_db` function to initialize database
- improve translation cache handling with database storage
- bind and initialize in-memory test database for isolated testing
- add tests for translation cache operations and engine distinction
- include additional scenarios for parameter-based cache separation
- ensure cleanup and teardown after tests to maintain test consistency
…test has no practical significance

Although this test set the number of threads to 2, it did not test parallel translation paths.

This test was actually directly testing BaseTranslator.
@awwaawwa
Copy link
Contributor Author

For the issue described in #214, I have included both the translation engine and its parameters in the cache, and only when the engine, parameters, and original text are all identical, the cache is returned. However, parameters affecting the translation results need to be manually maintained in the translator.py file.

- implement a method to recursively sort dictionaries and lists in translation parameters
- ensure translation engine parameters are serialized consistently for caching
- include tests for non-string parameter serialization to JSON
- test consistent serialization of dicts regardless of key order
- validate cache behavior with sorted parameter inputs
- add tests for nested and complex parameter structures
- fix issue with incorrect handling of translate_engine_params types
- ensure nested dictionaries are sorted recursively before conversion to JSON
- separate `update_params` method for better readability
- update parameter type annotations for clarity
- improve default parameter handling in update_params method
- ensure thread safety considerations are noted in comments
- add tests for thread safety in cache operations
- ensure that all operations are consistent and correct under load
- rename update_params to replace_params for clarity
- refactor related method calls to maintain consistency
- change method name for clarity and consistency
- update related test cases to reflect the new method name
- rename the `translate` function to `do_translate`, making it easier for subclasses to override.
- Implement cache-related processing in BaseTranslator.
- Add parameters that affect translation results to the cache key.
@awwaawwa awwaawwa closed this Dec 23, 2024
@awwaawwa awwaawwa reopened this Dec 23, 2024
- implement unit tests for translation caching functionalities
- ensure cache interacts correctly with different input cases
- verify cache behavior with overridden translations and parameters
@awwaawwa awwaawwa marked this pull request as ready for review December 23, 2024 15:24
@awwaawwa
Copy link
Contributor Author

I tested OpenAITranslator locally, and it can cache correctly. The other translation engines have not been tested individually, but they should be fine.

- add test case to check translation without cache
- assert results are not equal when ignoring cache
- implement test to ensure BaseTranslator raises NotImplementedError
- verify that the translate method is not implemented for base class
@awwaawwa
Copy link
Contributor Author

awwaawwa commented Dec 23, 2024

I haven't written the ignore_cache CLI argument and web UI parameter. If anyone is interested in writing it, I hope it can be added after #329 is merged. Thank you very much. This is quite a big task, it will need some more time to modify.

@Byaidu Byaidu merged commit a1eb6c8 into Byaidu:main Dec 25, 2024
4 checks passed
@Byaidu
Copy link
Owner

Byaidu commented Dec 25, 2024

Thanks for contribution!

@awwaawwa awwaawwa deleted the better_cache branch December 25, 2024 03:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants