Yes, this is definitely something we have discussed and would help improve things like YouTube subtitles.
AWS Transcribe already supports generating WebVTT and SRT subtitles. Of course, we would prefer to use the Whisper segments rather than the Transcribe output for the subtitle text. The challenge is that the timing granularity of Whisper output is not as fine as Transcribe's, so each timestamped segment can be quite long. That may not be desirable for many subtitle uses.
It may be possible to improve the merging algorithm to match the Transcribe timings to the Whisper output, but that does not look trivial; see the sketch below.
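To make that concrete: the genuinely hard part is aligning Transcribe's word sequence to Whisper's (differently tokenized, sometimes differently worded) text. Once word-level timings have been aligned, regrouping them into subtitle-sized cues is the easy half. A rough sketch of that second half, using hypothetical field names (`text`/`start`/`end` in seconds) rather than the solution's actual schema:

```python
def split_into_cues(words, max_chars=42):
    """Group word-timed items into subtitle-sized cues.

    words: [{"text": str, "start": float, "end": float}, ...]
    (hypothetical shape; assumes word timings already aligned)
    Returns cues with merged text and cue-level start/end timings.
    """
    cues, current = [], []
    for word in words:
        candidate = " ".join(w["text"] for w in current + [word])
        if current and len(candidate) > max_chars:
            # Flush the current cue once adding another word would
            # exceed the target line length.
            cues.append({
                "text": " ".join(w["text"] for w in current),
                "start": current[0]["start"],
                "end": current[-1]["end"],
            })
            current = []
        current.append(word)
    if current:
        cues.append({
            "text": " ".join(w["text"] for w in current),
            "start": current[0]["start"],
            "end": current[-1]["end"],
        })
    return cues
```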
Perhaps the simplest thing initially is to just generate VTT or SRT from the Whisper output. This could be done in a separate Lambda function using the merged transcript output (after the Process Transcripts state).
It could use the JSON object located at `processedTranscriptKey` as its input.
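A minimal sketch of such a Lambda, assuming the merged transcript is a JSON array of segments with `start`, `end` (seconds), and `text` fields, and assuming a hypothetical event shape carrying the bucket and key; the real schema and wiring would need to match the solution:

```python
import json
import boto3

s3 = boto3.client("s3")

def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 00:01:02,345."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"

def handler(event, context):
    # Hypothetical event shape -- the real input would come from the
    # state machine after the Process Transcripts state.
    bucket = event["bucket"]
    key = event["processedTranscriptKey"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    segments = json.loads(body)

    # Build numbered SRT blocks, one per merged transcript segment.
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    srt = "\n".join(blocks)

    out_key = key.rsplit(".", 1)[0] + ".srt"
    s3.put_object(Bucket=bucket, Key=out_key, Body=srt.encode("utf-8"))
    return {"srtKey": out_key}
```

WebVTT output would differ only in the `WEBVTT` header line and the use of `.` instead of `,` in timestamps, so the same function could drive both formats.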
If you are interested in contributing this feature, that would be very welcome, @nodomain. We are happy to review and support, of course.
Are there any plans to support other output formats such as WebVTT, plain text, or SRT?
I am still digging through the solution and thinking about adding a converter, but I am not sure about the correct approach.