
persistent sync problem #21

Closed
GGMaia opened this issue Feb 8, 2024 · 9 comments

GGMaia commented Feb 8, 2024

Even after updating to GPT-4, the synchronization error in the dialogue lines remains much the same as before (at least when I try to translate into my language, which is Portuguese).
It seems that ChatGPT tends to "eat" some lines while keeping the timestamps, which throws the rest of the lines in the entire file badly out of sync. I don't know whether this is a fixable problem, since ChatGPT is the one making the mistake, not your code.

The original text:
[screenshot: original subtitles]

The translated one:
[screenshot: translated subtitles]

@yazinsai (Owner)

hey @NeroQuill, thanks for the detailed report. I'm going to be adding a validation step to ensure output segments from GPT-4 always match the number of input segments.
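
For anyone curious, a minimal sketch of what such a validation step could look like; the `Segment` shape and function names are illustrative assumptions, not the repo's actual code:

```typescript
// Illustrative sketch only -- not the project's actual implementation.
// Keep the original cue timings and only accept a model response when
// the translated segment count matches the input segment count.

interface Segment {
  index: number; // SRT cue number
  start: string; // e.g. "00:01:02,500"
  end: string;
  text: string;
}

function reassemble(input: Segment[], translated: string[]): Segment[] {
  if (translated.length !== input.length) {
    throw new Error(
      `Segment count mismatch: sent ${input.length}, got ${translated.length}`
    );
  }
  // Re-attach the original timings so a skipped line can never shift
  // every following cue out of sync.
  return input.map((seg, i) => ({ ...seg, text: translated[i] }));
}
```

Throwing on a mismatch makes the failure loud, so the batch can be retried instead of silently producing a shifted file.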

yazinsai self-assigned this Feb 10, 2024

GGMaia commented Feb 11, 2024

Thanks for your response. I tested the translation on two different files, and both show the synchronization error at a similar timestamp.
Here's a side-by-side comparison at the exact timestamp where ChatGPT "eats" a line:

[screenshot: side-by-side comparison at the affected timestamp]

I'd like to know whether this error happens only to me, or whether other people are seeing it too, in other languages or even in Portuguese.
I ask because there are some factors in my workflow that might be interfering with ChatGPT's ability to translate correctly without eating lines: I take an original .ASS file, convert it to an .SRT file, and then delete around 500 initial lines of the subtitle, since they belong to the episode's opening (I'm translating One Pace into Portuguese; it's a condensed edit of the One Piece anime that only has English subtitles). Other than that, I don't change anything else in the file, which otherwise keeps the structure of a normal .SRT file.
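
For reference, a minimal sketch of doing that trimming step programmatically instead of by hand; the file names and the number of cues to skip are placeholders, and it assumes standard .SRT cues separated by blank lines. It also renumbers the remaining cues so the indices stay consecutive, in case gaps in the numbering are part of what confuses the model:

```typescript
// Illustrative sketch only: drop the first N cues of an .srt file and
// renumber the rest, so the trimmed file still looks like a normal SRT.
import { readFileSync, writeFileSync } from "fs";

function dropLeadingCues(srt: string, skip: number): string {
  // SRT cues are blocks separated by blank lines.
  const cues = srt.trim().split(/\r?\n\s*\r?\n/);
  return cues
    .slice(skip)
    .map((cue, i) => {
      const lines = cue.split(/\r?\n/);
      lines[0] = String(i + 1); // renumber so cue indices stay consecutive
      return lines.join("\n");
    })
    .join("\n\n") + "\n";
}

// Placeholders: input file name and the number of opening cues to skip.
const input = readFileSync("episode.srt", "utf8");
writeFileSync("episode.trimmed.srt", dropLeadingCues(input, 120));
```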

Thank you in advance for your work, which is brilliant and very important for the subtitling community. If you can make this work completely, it will be perfect.

@yamanbaris

I have the same issue. It always skips some lines, so the cue numbers and sentences no longer match up. How can we fix it?

@yazinsai (Owner)

I'm working on a fix; I'll keep this post updated.

@yazinsai (Owner)

Streamed my attempt here: https://www.youtube.com/live/ScnHkYKvtRE

Made some progress by converting the response to JSON, but it still occasionally skips/merges some lines! 🫤
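
In case it helps anyone testing, a minimal sketch of the kind of check that a JSON response enables; the `{ segments: [...] }` shape and field names here are assumptions, not necessarily the format the repo uses:

```typescript
// Illustrative sketch only. Assumes the model is asked to reply with JSON
// like { "segments": [ { "id": 1, "text": "..." }, ... ] }. Returns the
// ids the model skipped (or merged away) so just those lines can be
// retried, instead of trusting an output that has shifted.

interface TranslatedSegment {
  id: number;
  text: string;
}

function findMissingIds(expectedCount: number, reply: string): number[] {
  const parsed = JSON.parse(reply) as { segments: TranslatedSegment[] };
  const returned = new Set(parsed.segments.map((s) => s.id));
  const missing: number[] = [];
  for (let id = 1; id <= expectedCount; id++) {
    if (!returned.has(id)) missing.push(id);
  }
  return missing; // non-empty => some lines were skipped or merged
}
```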

@yamanbaris

Is there any update on this issue?

Sptzzz commented Apr 14, 2024

Having the same issue. It doesn't seem to be fixed, so it isn't reliable for day-to-day use at the moment :(

@einsteinx2

I'm getting the exact same issue using the latest from the main branch with the gpt-4o model, on the very first SRT file I tried to translate (I haven't tried other models or SRT files yet). Somewhere in the middle it gets out of sync, which makes the whole translated SRT useless unless I manually go through all of it and fix it, which isn't reasonable. Apart from also eating all the line breaks in the original subtitles, which likewise requires manual fixing, it seems to do pretty well at the actual translation.

It's too bad it isn't really usable without hours of cleanup work, which defeats the purpose (I also really don't want to have to read every line of something I haven't watched yet, since the whole point is to watch it with my wife, in English with Spanish subtitles).

This is sort of the thing with LLMs in general, though, isn't it... They're incredible when they work, but they only work maybe 80% of the time on pretty much any task. So close, yet so far...

@yazinsai (Owner)

Good news! I think we finally got this fixed with the latest (experimental) Gemini Flash 2.0 model. Merged and deploying now.
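
For anyone wondering roughly what the switch looks like, a minimal sketch of calling the model with the @google/generative-ai Node SDK; the model id and the prompt wiring are assumptions, not necessarily what was merged:

```typescript
// Illustrative sketch only -- model id and prompt wiring are assumptions.
import { GoogleGenerativeAI } from "@google/generative-ai";

async function translateBatch(prompt: string): Promise<string> {
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
  const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash-exp" });
  const result = await model.generateContent(prompt);
  // Even with the new model, it's still worth validating segment counts.
  return result.response.text();
}
```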
