The Whisper output includes a JSON file which is used to generate the VTT, SRT and TXT transcripts. When word_timestamps are requested, the JSON file includes the start/end times for each word, along with a probability that indicates how confident Whisper is about the transcription of that word. This information is lost when converting to VTT.
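To make the loss concrete, here is a minimal sketch of turning one segment into a VTT cue. The `segment` dict is made-up sample data shaped like Whisper's JSON output with word timestamps, and `format_timestamp` and `segment_to_vtt_cue` are hypothetical helpers, not the converter actually used here.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_to_vtt_cue(segment: dict) -> str:
    """Build one VTT cue from a segment; only start, end, and text are used."""
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"{start} --> {end}\n{segment['text'].strip()}\n"


# Hypothetical sample segment (values are invented for illustration).
segment = {
    "start": 0.0,
    "end": 1.5,
    "text": " Hello world.",
    "words": [
        {"word": " Hello", "start": 0.0, "end": 0.6, "probability": 0.98},
        {"word": " world.", "start": 0.6, "end": 1.5, "probability": 0.41},
    ],
}

cue = segment_to_vtt_cue(segment)
print(cue)
```

The cue carries the segment's start, end, and text. The per-word timings and probabilities have no place in it, so they are dropped.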
The JSON file also includes other segment-level stats, such as the temperature and the likelihood of the audio containing speech.
I think it would be useful to always request word timestamps from the speech-to-text service, and to at least preserve the JSON transcript in addition to the VTT and Text outputs. The JSON file would be useful down the road for determining which transcripts are in need of review and correction.
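As a rough sketch of how a preserved JSON file could help find transcripts that need review, the snippet below flags segments that contain low-probability words or that look like non-speech. It assumes the JSON has a `segments` list whose items carry `words` (each with `probability`) and `no_speech_prob`. The function names, the sample data, and both thresholds are hypothetical and would need tuning against real transcripts.

```python
import json


def load_transcript(path: str) -> dict:
    """Read a Whisper JSON transcript from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def segments_needing_review(transcript: dict,
                            min_word_prob: float = 0.5,
                            max_no_speech_prob: float = 0.6) -> list:
    """Return segments with low-confidence words or probable non-speech.

    Both thresholds are arbitrary assumptions, not Whisper defaults.
    """
    flagged = []
    for segment in transcript.get("segments", []):
        low_words = [
            w["word"].strip()
            for w in segment.get("words", [])
            if w.get("probability", 1.0) < min_word_prob
        ]
        possible_non_speech = segment.get("no_speech_prob", 0.0) > max_no_speech_prob
        if low_words or possible_non_speech:
            flagged.append({
                "start": segment["start"],
                "end": segment["end"],
                "text": segment["text"].strip(),
                "low_confidence_words": low_words,
                "possible_non_speech": possible_non_speech,
            })
    return flagged


# Hypothetical sample data (values are invented for illustration).
sample = {
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello world.", "no_speech_prob": 0.02,
         "words": [{"word": " Hello", "start": 0.0, "end": 0.6, "probability": 0.98},
                   {"word": " world.", "start": 0.6, "end": 1.5, "probability": 0.41}]},
        {"start": 1.5, "end": 3.0, "text": " Good morning.", "no_speech_prob": 0.01,
         "words": [{"word": " Good", "start": 1.5, "end": 2.0, "probability": 0.95},
                   {"word": " morning.", "start": 2.0, "end": 3.0, "probability": 0.97}]},
    ]
}

flagged = segments_needing_review(sample)
```

With a real file this would be called as `segments_needing_review(load_transcript("audio.json"))`.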
One downside is that if a VTT file is corrected, the JSON file, unlike the text file, cannot be regenerated to match it. So it could lead to confusion down the road?
Attached is a full JSON transcript if it's helpful to see: audio.json