Translating SRT subtitle files #23

Open
dgoryeo opened this issue Aug 28, 2024 · 3 comments

@dgoryeo

dgoryeo commented Aug 28, 2024

Hi, I just came across Kudasai. It looks promising.
Is there a way to use Kudasai to translate SRT subtitle files efficiently?
For example, in the case below, only the text lines would be extracted and sent for translation, then mapped back to their timestamps:

1
00:00:01,000 --> 00:00:04,000
これは最初の字幕の例です。

2
00:00:05,000 --> 00:00:08,000
次の字幕はこれです。
@Bikatr7
Owner

Bikatr7 commented Aug 28, 2024

Hi @dgoryeo

So, not right off the bat, no.

But I managed to get it working by editing the custom instructions and putting your text in a TXT file.

I used GPT with the following settings:

{
    "base translation settings": {
        "prompt_assembly_mode": 1,
        "number_of_lines_per_batch": 48,
        "sentence_fragmenter_mode": 2,
        "je_check_mode": 2,
        "number_of_malformed_batch_retries": 1,
        "batch_retry_timeout": 700,
        "number_of_concurrent_batches": 2,
        "gender_context_insertion": false,
        "is_cote": false
    },

    "openai settings": {
        "openai_model": "gpt-4-turbo",
        "openai_system_message": "As a Japanese to English subtitle translator, translate Japanese into English, everything else should remain in its original tense. You will receive text in roughly the format of '1 [newline] 00:00:01,000 --> 00:00:04,000 [newline] こんにちは。', in which you will only translate the Japanese and keep the rest as it was. In that case you would return '1 [newline] 00:00:01,000 --> 00:00:04,000 [newline] Hello.' The real text would have newlines which you would preserve. Keep pre-translated terms and anticipate names not replaced. Match the output's line count to the input's.",
        "openai_temperature": 0.3,
        "openai_top_p": 1.0,
        "openai_n": 1,
        "openai_stream": false,
        "openai_stop": null,
        "openai_logit_bias": null,
        "openai_max_tokens": null,
        "openai_presence_penalty": 0.0,
        "openai_frequency_penalty": 0.0
    },

    "gemini settings": {
        "gemini_model": "gemini-pro",
        "gemini_prompt": "As a Japanese to English translator, translate narration into English simple past, everything else should remain in its original tense. Maintain original formatting, punctuation, and paragraph structure. Keep pre-translated terms and anticipate names not replaced. Preserve terms and markers marked with >>><<< and match the output's line count to the input's. Note: 〇 indicates chapter changes.",
        "gemini_temperature": 0.3,
        "gemini_top_p": null,
        "gemini_top_k": null,
        "gemini_candidate_count": 1,
        "gemini_stream": false,
        "gemini_stop_sequences": null,
        "gemini_max_output_tokens": null
    },

    "deepl settings":{
        "deepl_context": "",
        "deepl_split_sentences": "ALL",
        "deepl_preserve_formatting": true,
        "deepl_formality": "default"
    }
    
}

I was able to pass in your text as a TXT file and got this as output:

1
00:00:01,000 --> 00:00:04,000
This is an example of the first subtitle.
2
00:00:05,000 --> 00:00:08,000
The next subtitle is this one.
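
Since the model passes the index and timing lines through itself in this workaround, it may be worth checking that they came back unchanged. A minimal Python sketch, assuming standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines; the helper names `timing_lines` and `timings_preserved` are hypothetical and not part of Kudasai:

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def timing_lines(text):
    """Return every timing line in the text, in order of appearance."""
    return [line.strip() for line in text.splitlines() if TIMING.match(line.strip())]


def timings_preserved(source, translated):
    """True if the translated text has the same timing lines as the source."""
    return timing_lines(source) == timing_lines(translated)
```

This only compares timing lines, so it ignores whether the blank lines between entries survived.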

To do this more effectively, I imagine you'd have to change Kudasai quite a bit. You'd have to make it work with file types other than TXT, which should be easy, and maybe add an SRT mode that takes just the Japanese out and puts the translation back in its original position.
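
As a rough illustration of what such an SRT mode could look like, here is a standalone Python sketch that pulls the text out of each entry and puts translated text back under the original index and timing lines. It is not Kudasai code: `parse_srt`, `extract_text`, and `rebuild_srt` are hypothetical names, and joining multi-line entries with a space is an assumption made so that each entry becomes exactly one line for translation.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def parse_srt(content):
    """Split SRT text into (index, timing, text_lines) tuples."""
    cues = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        # Keep only blocks whose second line is a timing line.
        if len(lines) >= 2 and TIMING.match(lines[1].strip()):
            cues.append((lines[0].strip(), lines[1].strip(), lines[2:]))
    return cues


def extract_text(cues):
    """One string per entry; multi-line entries are joined with a space."""
    return [" ".join(line.strip() for line in text) for _, _, text in cues]


def rebuild_srt(cues, translated):
    """Pair each translated string with its original index and timing."""
    if len(cues) != len(translated):
        raise ValueError("translated line count does not match entry count")
    blocks = [
        f"{index}\n{timing}\n{text}"
        for (index, timing, _), text in zip(cues, translated)
    ]
    return "\n\n".join(blocks) + "\n"
```

The list from `extract_text` could be written to a TXT file, one entry per line, and translated as usual; `rebuild_srt` then relies on the translated output having the same number of lines.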

I unfortunately don't have the time to fully look into it, but I'd welcome anyone else to do it.

@Bikatr7
Owner

Bikatr7 commented Aug 28, 2024

I won't be personally revisiting Kudasai for some time, as I'm very busy with other commitments, but it's an interesting use case and I'll try to add direct support whenever I do.

Until then, that should be a good workaround.

@dgoryeo
Author

dgoryeo commented Aug 29, 2024

Thanks @Bikatr7. Much appreciated. I'll give it a try. Will let you know.
