Problem when calling the model with whisperx #589

fangquinlan · 2024-07-23T03:22:47Z

I converted the model with

ct2-transformers-converter --model BELLE-2/Belle-whisper-large-v3-zh --output_dir belle-whisper-large-v3-zh-ct2 \
    --copy_files tokenizer.json --quantization float16

After the conversion, running whisperx 1.mp3 --model belle-whisper-large-v3-zh-ct2 --language zh --batch_size 12 fails with the following error:

root@autodl-container-6******dc2:~# whisperx 1.mp3 --model belle-whisper-large-v3-zh-ct2 --language zh --batch_size 12
/root/miniconda3/lib/python3.10/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.3.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint .cache/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.3.1+cu121. Bad things might happen unless you revert torch to 1.x.
>>Performing transcription...
Traceback (most recent call last):
  File "/root/miniconda3/bin/whisperx", line 8, in <module>
    sys.exit(cli())
  File "/root/miniconda3/lib/python3.10/site-packages/whisperx/transcribe.py", line 176, in cli
    result = model.transcribe(audio, batch_size=batch_size, chunk_size=chunk_size, print_progress=print_progress)
  File "/root/miniconda3/lib/python3.10/site-packages/whisperx/asr.py", line 218, in transcribe
    for idx, out in enumerate(self.__call__(data(audio, vad_segments), batch_size=batch_size, num_workers=num_workers)):
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/root/miniconda3/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1112, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/root/miniconda3/lib/python3.10/site-packages/whisperx/asr.py", line 152, in _forward
    outputs = self.model.generate_segment_batched(model_inputs['inputs'], self.tokenizer, self.options)
  File "/root/miniconda3/lib/python3.10/site-packages/whisperx/asr.py", line 47, in generate_segment_batched
    encoder_output = self.encode(features)
  File "/root/miniconda3/lib/python3.10/site-packages/whisperx/asr.py", line 86, in encode
    return self.model.encode(features, to_cpu=to_cpu)
ValueError: Invalid input features shape: expected an input with shape (11, 128, 3000), but got an input with shape (11, 80, 3000) instead


shuaijiang · 2024-11-25T03:08:27Z

Could you check whether the tokenizer.json comes from whisper-large-v2?
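
A related check, sketched under assumptions rather than offered as a confirmed fix: the error above says the model expects 128 mel bins but received 80-bin features. Whisper large-v3 uses 128 mel bins, while earlier checkpoints use 80. The helper below reads `preprocessor_config.json` from the converted directory and reports the mel-bin count it declares. It assumes the file follows the Hugging Face convention of a `feature_size` key; the function names are hypothetical, and whether whisperx actually reads this file depends on the installed version.

```python
import json
from pathlib import Path


def declared_mel_bins(model_dir):
    """Return `feature_size` from preprocessor_config.json, or None if the file or key is missing."""
    config_path = Path(model_dir) / "preprocessor_config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("feature_size")


def describe(model_dir, expected=128):
    """Summarize whether the declared mel-bin count matches `expected` (128 for large-v3)."""
    found = declared_mel_bins(model_dir)
    if found is None:
        return "no feature_size declared in preprocessor_config.json"
    if found != expected:
        return f"feature_size is {found}, expected {expected}"
    return f"feature_size is {expected}, as expected"


if __name__ == "__main__":
    # Directory name taken from the conversion command in the original post.
    print(describe("belle-whisper-large-v3-zh-ct2"))
```

Note that the conversion command in the original post copies only tokenizer.json, so the first message (no `feature_size` declared) would not be surprising for that directory.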
