
persistent sync problem #21

Closed
GGMaia opened this issue Feb 8, 2024 · 9 comments

GGMaia commented Feb 8, 2024

Even after updating to GPT-4, the synchronization error in the dialogue lines remains much the same as before (at least when I try to translate into my language, which is Portuguese).
It seems that ChatGPT tends to "eat" some lines while keeping the timestamps, which throws the rest of the lines in the entire file badly out of sync. I don't know whether this is a fixable problem, since ChatGPT is the one making the mistake, not your code.

The original text:
[screenshot: original subtitles]

The translated one:
[screenshot: translated subtitles]

@yazinsai (Owner)

hey @NeroQuill, thanks for the detailed report. I'm going to be adding a validation step to ensure output segments from GPT-4 always match the number of input segments.
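
For anyone curious, a minimal sketch of what such a validation step could look like; the `Segment` shape and function names are illustrative assumptions, not the repo's actual code:

```typescript
// Illustrative sketch only -- not the project's actual implementation.
// Keep the original cue timings and only accept a model response when
// the translated segment count matches the input segment count.

interface Segment {
  index: number; // SRT cue number
  start: string; // e.g. "00:01:02,500"
  end: string;
  text: string;
}

function reassemble(input: Segment[], translated: string[]): Segment[] {
  if (translated.length !== input.length) {
    throw new Error(
      `Segment count mismatch: sent ${input.length}, got ${translated.length}`
    );
  }
  // Re-attach the original timings so a skipped line can never shift
  // every following cue out of sync.
  return input.map((seg, i) => ({ ...seg, text: translated[i] }));
}
```

Throwing on a mismatch makes the failure loud, so the batch can be retried instead of silently producing a shifted file.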

yazinsai self-assigned this Feb 10, 2024

GGMaia commented Feb 11, 2024

Thanks for your response. I tested the translation on two different files, and both show the synchronization error at a similar timestamp.
Here's a side-by-side comparison at the exact timestamp where ChatGPT "eats" a line:

[screenshot: side-by-side comparison at the affected timestamp]

I'd like to know whether this error happens only to me, or whether other people are seeing it too, in other languages or even in Portuguese.
I ask because there are some factors in my workflow that might be interfering with ChatGPT's ability to translate correctly without eating lines: I take an original .ASS file, convert it to an .SRT file, and then delete around 500 initial lines of the subtitle, since they belong to the episode's opening (I'm translating One Pace into Portuguese; it's a condensed edit of the One Piece anime that only has English subtitles). Other than that, I don't change anything else in the file, which otherwise keeps the structure of a normal .SRT file.
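
For reference, a minimal sketch of doing that trimming step programmatically instead of by hand; the file names and the number of cues to skip are placeholders, and it assumes standard .SRT cues separated by blank lines. It also renumbers the remaining cues so the indices stay consecutive, in case gaps in the numbering are part of what confuses the model:

```typescript
// Illustrative sketch only: drop the first N cues of an .srt file and
// renumber the rest, so the trimmed file still looks like a normal SRT.
import { readFileSync, writeFileSync } from "fs";

function dropLeadingCues(srt: string, skip: number): string {
  // SRT cues are blocks separated by blank lines.
  const cues = srt.trim().split(/\r?\n\s*\r?\n/);
  return cues
    .slice(skip)
    .map((cue, i) => {
      const lines = cue.split(/\r?\n/);
      lines[0] = String(i + 1); // renumber so cue indices stay consecutive
      return lines.join("\n");
    })
    .join("\n\n") + "\n";
}

// Placeholders: input file name and the number of opening cues to skip.
const input = readFileSync("episode.srt", "utf8");
writeFileSync("episode.trimmed.srt", dropLeadingCues(input, 120));
```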

Thank you in advance for your work, which is brilliant and very important for the subtitling community. If you can make this work completely, it will be perfect.

@yamanbaris

I have the same issue. It always skips some lines, so the cue numbers and sentences no longer match up. How can we fix it?

@yazinsai (Owner)

I'm working on a fix; I'll keep this post updated.

@yazinsai (Owner)

Streamed my attempt here: https://www.youtube.com/live/ScnHkYKvtRE

Made some progress by converting the response to JSON, but it still occasionally skips/merges some lines! 🫤
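
In case it helps anyone testing, a minimal sketch of the kind of check that a JSON response enables; the `{ segments: [...] }` shape and field names here are assumptions, not necessarily the format the repo uses:

```typescript
// Illustrative sketch only. Assumes the model is asked to reply with JSON
// like { "segments": [ { "id": 1, "text": "..." }, ... ] }. Returns the
// ids the model skipped (or merged away) so just those lines can be
// retried, instead of trusting an output that has shifted.

interface TranslatedSegment {
  id: number;
  text: string;
}

function findMissingIds(expectedCount: number, reply: string): number[] {
  const parsed = JSON.parse(reply) as { segments: TranslatedSegment[] };
  const returned = new Set(parsed.segments.map((s) => s.id));
  const missing: number[] = [];
  for (let id = 1; id <= expectedCount; id++) {
    if (!returned.has(id)) missing.push(id);
  }
  return missing; // non-empty => some lines were skipped or merged
}
```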

@yamanbaris

Is there any update on this issue?

Sptzzz commented Apr 14, 2024

Having the same issue. It doesn't seem to be fixed, so it isn't reliable for day-to-day use at the moment :(

@einsteinx2

I'm getting the exact same issue using the latest from the main branch with the gpt-4o model, on the very first SRT file I tried to translate (I haven't tried other models or SRT files yet). Somewhere in the middle it gets out of sync, which makes the whole translated SRT useless unless I manually go through all of it and fix it, which isn't reasonable. Apart from also eating all the line breaks in the original subtitles, which likewise requires manual fixing, it seems to do pretty well at the actual translation.

It's too bad it isn't really usable without hours of cleanup work, which defeats the purpose (I also really don't want to have to read every line of something I haven't watched yet, since the whole point is to watch it with my wife, in English with Spanish subtitles).

This is sort of the thing with LLMs in general, though, isn't it... They're incredible when they work, but they only work maybe 80% of the time on pretty much any task. So close, yet so far...

@yazinsai (Owner)

Good news! I think we finally got this fixed with the latest (experimental) Gemini Flash 2.0 model. Merged and deploying now.
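
For anyone wondering roughly what the switch looks like, a minimal sketch of calling the model with the @google/generative-ai Node SDK; the model id and the prompt wiring are assumptions, not necessarily what was merged:

```typescript
// Illustrative sketch only -- model id and prompt wiring are assumptions.
import { GoogleGenerativeAI } from "@google/generative-ai";

async function translateBatch(prompt: string): Promise<string> {
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
  const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash-exp" });
  const result = await model.generateContent(prompt);
  // Even with the new model, it's still worth validating segment counts.
  return result.response.text();
}
```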
