Request word timestamps and preserve the Whisper JSON transcript #1430

Closed
edsu opened this issue Dec 3, 2024 · 1 comment · Fixed by #1431

Comments

@edsu
Contributor

edsu commented Dec 3, 2024

Whisper's output includes a JSON file, which is used to generate the VTT, SRT and TXT transcripts. When word_timestamps is requested, the JSON file includes the start and end times for each word, along with a probability that indicates how confident Whisper is about the transcription. This information is lost when converting to VTT.

{
  "word": " see",
  "start": 16.12,
  "end": 16.3,
  "probability": 0.9966936111450195
}
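As a rough illustration of how the per-word probability could be used, here is a minimal sketch that lists words below a confidence cutoff. It assumes the usual Whisper JSON layout (a top-level "segments" list whose entries carry a "words" list when word timestamps were requested). The function name `low_confidence_words`, the 0.5 threshold, and the second sample word are made up for the example.

```python
def low_confidence_words(transcript, threshold=0.5):
    """Return (word, start, end, probability) for words below the threshold.

    The threshold is an arbitrary illustrative value, not a Whisper default.
    """
    flagged = []
    for segment in transcript.get("segments", []):
        # "words" is only present when word timestamps were requested
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append(
                    (word["word"].strip(), word["start"], word["end"], word["probability"])
                )
    return flagged


# The first word is the one quoted above; the second is invented for illustration.
sample = {
    "segments": [
        {
            "words": [
                {"word": " see", "start": 16.12, "end": 16.3, "probability": 0.9966936111450195},
                {"word": " sea", "start": 16.3, "end": 16.5, "probability": 0.31},
            ]
        }
    ]
}

print(low_confidence_words(sample))
```

With a real file, `transcript` would come from `json.load` on the preserved JSON transcript.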

The JSON file also includes other segment level stats about the temperature and likelihood of the audio containing speech:

{
  "temperature": 0.0,
  "avg_logprob": -0.2748254439410041,
  "compression_ratio": 1.5263157894736843,
  "no_speech_prob": 0.15187102556228638
}

I think it would be useful to always request word timestamps from the speech-to-text service, and to at least preserve the JSON transcript in addition to the VTT and text outputs. The JSON file would be useful down the road for identifying transcripts that need review and correction.
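A sketch of what such a review check might look like, using the segment-level stats shown above. The cutoffs are an assumption: they reuse the default values of Whisper's `no_speech_threshold`, `logprob_threshold` and `compression_ratio_threshold` transcription options as a starting point, and the `needs_review` helper is hypothetical.

```python
def needs_review(segment, no_speech_max=0.6, avg_logprob_min=-1.0, compression_ratio_max=2.4):
    """Return a list of reasons a segment may deserve a human look (empty if none)."""
    reasons = []
    if segment["no_speech_prob"] > no_speech_max:
        reasons.append("possible non-speech")
    if segment["avg_logprob"] < avg_logprob_min:
        reasons.append("low average log probability")
    if segment["compression_ratio"] > compression_ratio_max:
        reasons.append("repetitive text")
    return reasons


# Segment stats copied from the example above.
segment = {
    "temperature": 0.0,
    "avg_logprob": -0.2748254439410041,
    "compression_ratio": 1.5263157894736843,
    "no_speech_prob": 0.15187102556228638,
}

print(needs_review(segment))
```

A transcript could then be queued for correction when some share of its segments return a non-empty list; what share is appropriate would need tuning against real material.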

One downside is that, unlike the text file, the JSON file cannot be regenerated to match a VTT file that has been corrected. So it could lead to confusion down the road?
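One way to at least detect that drift is to compare the text in the preserved JSON with the text in the current VTT. The following is only a sketch: the helper names are hypothetical, and the VTT parsing is deliberately simplified (it assumes no cue identifiers, NOTE blocks or styling, which is roughly what Whisper's own VTT output looks like).

```python
import re


def vtt_text(vtt):
    """Join the cue text of a simple WebVTT document, skipping the header and timing lines."""
    lines = []
    for line in vtt.splitlines():
        line = line.strip()
        if not line or line.startswith("WEBVTT") or "-->" in line:
            continue
        lines.append(line)
    return " ".join(lines)


def normalize(text):
    """Collapse whitespace and lowercase so formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def json_matches_vtt(transcript, vtt):
    """True if the JSON segment text and the VTT cue text still agree."""
    json_text = " ".join(segment["text"] for segment in transcript["segments"])
    return normalize(json_text) == normalize(vtt_text(vtt))


# Invented sample data for illustration.
transcript = {"segments": [{"text": " I see the sea."}, {"text": " It is calm."}]}
original_vtt = (
    "WEBVTT\n\n"
    "00:00.000 --> 00:02.000\nI see the sea.\n\n"
    "00:02.000 --> 00:04.000\nIt is calm.\n"
)
corrected_vtt = original_vtt.replace("sea", "seal")

print(json_matches_vtt(transcript, original_vtt))
print(json_matches_vtt(transcript, corrected_vtt))
```

A mismatch would signal that the JSON's probabilities and timings describe an earlier version of the transcript and should be treated with caution.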

Attached is a full JSON transcript if it's helpful to see: audio.json

@peetucket
Member

You can verify that these params are in the preserved JSON files with this object, which has the preserved JSON: https://argo-qa.stanford.edu/view/druid:ys412hp1997

So once #1431 is merged, this will also be done.

@edsu edsu closed this as completed in #1431 Dec 9, 2024