
Regarding ASR testing #40

Open
Simplesss opened this issue Aug 28, 2024 · 2 comments
Comments

@Simplesss

Hello, thank you very much for your work. I would like to reproduce the ASR performance of the AnyGPT base model on LibriSpeech test-clean. Your paper reports a WER of 8.5, but my test result was 14.5 (using the command format speech | text | {speech file path}). Could this gap be caused by the prompt being randomly selected for each ASR inference? If possible, could you share the code used to compute WER (I used a Compose of seven transforms from jiwer), as well as the text transcriptions produced by the model? Looking forward to your reply.

@JunZhan2000
Collaborator

JunZhan2000 commented Sep 30, 2024

Hello, I don't think it's an issue with the prompt; each prompt was seen many times during training.
I would like to confirm two things. First, are you using beam search as your decoding strategy? It generally produces the best results. Second, you need to post-process the transcriptions to standardize them, because the LLM's output format differs considerably from the ground truth, including punctuation and contractions such as "you're", which appears as "you are" in the ground truth.
I also use jiwer for calculating WER.
Regarding the test code, unfortunately it was lost during an environment migration, but I believe that if you use GPT to write some standardization code, you should be able to reach the results reported in the paper (I didn't handle all of the standardization cases).
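
For reference, here is a minimal sketch of the kind of normalization-plus-WER computation described above, built from standard jiwer transforms. The exact set of transforms behind the paper's numbers is not known, so treat this only as a starting point:

```python
import jiwer

# One possible normalization pipeline (all names are standard jiwer transforms);
# the precise transforms used for the paper's WER are unknown.
normalize = jiwer.Compose([
    jiwer.ToLowerCase(),
    jiwer.ExpandCommonEnglishContractions(),  # e.g. "you're" -> "you are"
    jiwer.RemovePunctuation(),
    jiwer.RemoveMultipleSpaces(),
    jiwer.Strip(),
])

refs = ["YOU ARE WELCOME HERE"]   # LibriSpeech ground truth (uppercase, no punctuation)
hyps = ["You're welcome here."]   # raw LLM output

# Apply the same normalization to both sides before scoring.
wer = jiwer.wer(
    [normalize(r) for r in refs],
    [normalize(h) for h in hyps],
)
print(f"WER: {wer:.3f}")
```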

@Changhao-Xiang


Hello, have you managed to reproduce the results? My reproduced performance on LibriSpeech test-clean is also a WER of around 15 with the following generation config:

```json
{
    "do_sample": false,
    "max_new_tokens": 100,
    "min_new_tokens": 1,
    "repetition_penalty": 1.0,
    "num_beams": 5
}
```
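
For what it's worth, this is a minimal sketch of how such a config maps onto Hugging Face `generate()`. The model path and prompt below are placeholders, not the actual AnyGPT inference code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/AnyGPT-base"   # placeholder; use the actual checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder prompt: the real one is the speech-token ASR instruction
# built according to the repo's prompt format.
prompt = "..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    do_sample=False,        # no sampling
    num_beams=5,            # beam search, as recommended above
    max_new_tokens=100,
    min_new_tokens=1,
    repetition_penalty=1.0,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```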
