Translating SRT subtitle files #23
Comments
Hi @dgoryeo. Not out of the box, no. But I managed to get it working by editing the custom instructions and putting your text in a TXT file, using GPT with the following settings:
{
"base translation settings": {
"prompt_assembly_mode": 1,
"number_of_lines_per_batch": 48,
"sentence_fragmenter_mode": 2,
"je_check_mode": 2,
"number_of_malformed_batch_retries": 1,
"batch_retry_timeout": 700,
"number_of_concurrent_batches": 2,
"gender_context_insertion": false,
"is_cote": false
},
"openai settings": {
"openai_model": "gpt-4-turbo",
"openai_system_message": "As a Japanese to English subtitle translator, translate Japanese into English, everything else should remain in its original tense. You will receive text in roughly the format of '1 [newline] 00:00:01,000 --> 00:00:04,000 [newline] こんにちは。', in which you will only translate the Japanese and keep the rest as it was. In that case you would return '1 [newline] 00:00:01,000 --> 00:00:04,000 [newline] Hello.' The real text would have newlines which you would preserve. Keep pre-translated terms and anticipate names not replaced. Match the output's line count to the input's.",
"openai_temperature": 0.3,
"openai_top_p": 1.0,
"openai_n": 1,
"openai_stream": false,
"openai_stop": null,
"openai_logit_bias": null,
"openai_max_tokens": null,
"openai_presence_penalty": 0.0,
"openai_frequency_penalty": 0.0
},
"gemini settings": {
"gemini_model": "gemini-pro",
"gemini_prompt": "As a Japanese to English translator, translate narration into English simple past, everything else should remain in its original tense. Maintain original formatting, punctuation, and paragraph structure. Keep pre-translated terms and anticipate names not replaced. Preserve terms and markers marked with >>><<< and match the output's line count to the input's. Note: 〇 indicates chapter changes.",
"gemini_temperature": 0.3,
"gemini_top_p": null,
"gemini_top_k": null,
"gemini_candidate_count": 1,
"gemini_stream": false,
"gemini_stop_sequences": null,
"gemini_max_output_tokens": null
},
"deepl settings":{
"deepl_context": "",
"deepl_split_sentences": "ALL",
"deepl_preserve_formatting": true,
"deepl_formality": "default"
}
}
I gave it your text as a TXT file and got this as output:
1
00:00:01,000 --> 00:00:04,000
This is an example of the first subtitle.
2
00:00:05,000 --> 00:00:08,000
The next subtitle is this one.
To do this more effectively, I imagine you'd have to change Kudasai quite a bit: make it work with files other than TXT, which should be easy, and maybe add an SRT mode that just takes the Japanese out and puts the translated text back in. Unfortunately, I don't have the time to look into it fully, but I'd welcome anyone else doing so.
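For anyone who wants to try the "take the text out, put it back in" idea outside Kudasai, here is a minimal sketch of what such an SRT pre/post-processing step could look like. It is not part of Kudasai; the names `Cue`, `parse_srt`, `pack_text`, `unpack_text`, `render_srt` and the `<br>` placeholder are all hypothetical, and it assumes the translator returns exactly one output line per input line and leaves the placeholder intact.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical placeholder used to keep a multi-line cue on one line
# while it is being translated. Assumes the translator leaves it untouched.
LINE_BREAK = "<br>"


@dataclass
class Cue:
    index: str             # e.g. "1"
    timing: str            # e.g. "00:00:01,000 --> 00:00:04,000"
    text_lines: List[str]  # the subtitle text, one entry per displayed line


def parse_srt(srt_text: str) -> List[Cue]:
    """Split SRT content into cues, keeping index and timing untouched."""
    normalized = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that is not a well-formed cue
        cues.append(Cue(lines[0].strip(), lines[1].strip(), lines[2:]))
    return cues


def pack_text(cues: List[Cue]) -> str:
    """Build the text-only payload: exactly one line per cue."""
    return "\n".join(LINE_BREAK.join(cue.text_lines) for cue in cues)


def unpack_text(cues: List[Cue], translated: str) -> List[Cue]:
    """Map translated lines back onto the original indices and timings."""
    lines = translated.split("\n")
    if len(lines) != len(cues):
        raise ValueError(
            f"expected {len(cues)} translated lines, got {len(lines)}"
        )
    return [
        Cue(cue.index, cue.timing, line.split(LINE_BREAK))
        for cue, line in zip(cues, lines)
    ]


def render_srt(cues: List[Cue]) -> str:
    """Serialize cues back into SRT text."""
    blocks = ["\n".join([c.index, c.timing, *c.text_lines]) for c in cues]
    return "\n\n".join(blocks) + "\n"
```

The output of `pack_text` could be saved as the TXT file that Kudasai translates, and the translated TXT passed to `unpack_text` afterwards. With this approach the timestamps never reach the model, so the system message would no longer need to describe the SRT layout; the line-count check is there because a mismatch would otherwise silently shift every subtitle.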
I won't personally be revisiting Kudasai for some time, as I'm super busy and have other commitments, but I'll try to add direct support whenever I do, as it's an interesting use case. Until then, that should be a good workaround.
Thanks @Bikatr7. Much appreciated. I'll give it a try and let you know.
Hi, I just came across Kudasai. It looks promising.
Is there a way to use Kudasai to translate SRT subtitle files efficiently?
For example, in the case below, only the text lines are packed and sent for translation, then unpacked and mapped back to their timings: