Request word timestamps and preserve the Whisper JSON transcript #1430

edsu · 2024-12-03T20:47:16Z

The Whisper output includes a JSON file, which is used to generate the VTT, SRT and TXT transcripts. When word_timestamps are requested, the JSON file includes the start/end times for each word, along with a probability that indicates how confident Whisper is about the transcription. This information is lost when converting to VTT.

{
  "word": " see",
  "start": 16.12,
  "end": 16.3,
  "probability": 0.9966936111450195
}
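As a rough illustration of why the per-word probability is worth keeping, here is a minimal sketch that lists words below a confidence cutoff. It assumes the preserved JSON follows Whisper's usual layout (a top-level "segments" list, each segment carrying a "words" list when word timestamps were requested). The function name, the 0.5 threshold, and the second sample word are made up for the example.

```python
import json


def low_confidence_words(transcript, threshold=0.5):
    """Return (word, start, end, probability) tuples for words below threshold.

    Assumes each segment has a "words" list, which is only present when
    word timestamps were requested.
    """
    flagged = []
    for segment in transcript.get("segments", []):
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append(
                    (word["word"].strip(), word["start"], word["end"], word["probability"])
                )
    return flagged


# The first word is the one shown in this issue; the second is invented
# purely to give the filter something to catch.
sample = json.loads("""
{
  "segments": [
    {
      "id": 0,
      "words": [
        {"word": " see", "start": 16.12, "end": 16.3, "probability": 0.9966936111450195},
        {"word": " sea", "start": 16.3, "end": 16.52, "probability": 0.31}
      ]
    }
  ]
}
""")

for word, start, end, probability in low_confidence_words(sample):
    print(f"{start:.2f}-{end:.2f} {word!r} p={probability:.2f}")
```

A real cutoff would need tuning against transcripts that have already been reviewed.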

The JSON file also includes other segment-level stats, such as the decoding temperature and the probability that the segment contains no speech:

{
  "temperature": 0.0,
  "avg_logprob": -0.2748254439410041,
  "compression_ratio": 1.5263157894736843,
  "no_speech_prob": 0.15187102556228638
}

I think it would be useful to always request word timestamps from the speech-to-text service, and to at least preserve the JSON transcript in addition to the VTT and Text outputs. The JSON file would be useful down the road for identifying transcripts that are in need of review and correction.
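To make the "in need of review" idea a bit more concrete, here is a sketch of how the segment-level stats could be used to flag segments. It assumes the usual Whisper JSON layout (a top-level "segments" list with "id", "avg_logprob", "compression_ratio" and "no_speech_prob" on each segment). The function name is hypothetical, and the cutoffs are illustrative: as far as I know they match Whisper's default decoding thresholds, but they are not something this project has settled on. The second sample segment is invented.

```python
def segments_needing_review(
    transcript,
    max_no_speech_prob=0.6,
    min_avg_logprob=-1.0,
    max_compression_ratio=2.4,
):
    """Return (segment_id, reasons) for segments whose stats look suspicious.

    The cutoffs are illustrative assumptions and would need tuning.
    """
    flagged = []
    for index, segment in enumerate(transcript.get("segments", [])):
        reasons = []
        if segment["no_speech_prob"] > max_no_speech_prob:
            reasons.append("no_speech_prob")  # likely silence or noise
        if segment["avg_logprob"] < min_avg_logprob:
            reasons.append("avg_logprob")  # low average token confidence
        if segment["compression_ratio"] > max_compression_ratio:
            reasons.append("compression_ratio")  # repetitive text
        if reasons:
            flagged.append((segment.get("id", index), reasons))
    return flagged


sample_transcript = {
    "segments": [
        {
            # Stats copied from the segment shown in this issue.
            "id": 0,
            "temperature": 0.0,
            "avg_logprob": -0.2748254439410041,
            "compression_ratio": 1.5263157894736843,
            "no_speech_prob": 0.15187102556228638,
        },
        {
            # Invented segment that should be flagged as probable non-speech.
            "id": 1,
            "temperature": 0.0,
            "avg_logprob": -0.4,
            "compression_ratio": 1.1,
            "no_speech_prob": 0.9,
        },
    ]
}

for segment_id, reasons in segments_needing_review(sample_transcript):
    print(segment_id, ", ".join(reasons))
```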

One downside is that, unlike the text file, if a VTT file is corrected the JSON file cannot be regenerated to match it. So it could lead to confusion down the road?

Attached is a full JSON transcript if it's helpful to see: audio.json


peetucket · 2024-12-09T17:56:49Z

You can verify that these params are in the preserved JSON files using this object, which has the preserved JSON: https://argo-qa.stanford.edu/view/druid:ys412hp1997

So once #1431 is merged, this will also be done.

edsu added the enhancement label Dec 3, 2024

jmartin-sul mentioned this issue Dec 9, 2024

preserve json output file from whisper #1431

Merged

edsu closed this as completed in #1431 Dec 9, 2024
