AssertionError: CutSet has cuts with duplicated IDs. #1850
Comments
I have also tried it for another dataset. It was giving me the following error after running the CLI as mentioned in point 4.
Hi @mukherjeesougata,
Hi, since you are importing a Kaldi-format data dir directly, I suggest first running utils/fix_data_dir.sh (or something similar; I cannot recall the exact name of the script right now) to remove entries with duplicated keys. Best
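For anyone who wants to check for duplicated keys independently of the Kaldi script, here is a minimal sketch. It assumes the usual Kaldi layout in which the first whitespace-separated field of each line in text, wav.scp and utt2spk is the key. The function name `duplicated_keys` and the `train/` location are illustrative, not part of Kaldi, lhotse or icefall.

```python
from collections import Counter
from pathlib import Path


def duplicated_keys(path):
    """Return the sorted keys that occur on more than one line of a Kaldi-style file.

    The key is assumed to be the first whitespace-separated field of each line.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split(maxsplit=1)
            if fields:  # skip blank lines
                counts[fields[0]] += 1
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Assumed location of the data dir; adjust to your own layout.
    for name in ("text", "wav.scp", "utt2spk"):
        path = Path("train") / name
        if path.exists():
            print(name, duplicated_keys(path))
```

An empty list for every file means no key is repeated within that file. It says nothing about IDs that only become duplicated later in the pipeline.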
I have already used the utils/fix_data_dir.sh script to sort the train, dev, and test folders, which contain the text, wav.scp and utt2spk files, and to remove duplicates. In addition, I used the following code to find duplicate IDs in Kui_cuts_train.jsonl, Kui_cuts_dev.jsonl and Kui_cuts_test.jsonl:
The above code is giving the output:
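The script from that comment is not preserved in this thread. As a rough sketch of this kind of check (not the poster's actual code), the following counts the `id` field of every line in a JSONL cut manifest. It assumes one JSON object per line with a top-level `id` key, and that a `.gz` suffix means the file is gzip-compressed. The function name `duplicated_cut_ids` is hypothetical.

```python
import gzip
import json
import os
from collections import Counter


def duplicated_cut_ids(manifest_path):
    """Return {cut_id: count} for every ID that occurs more than once."""
    # Assumption: files ending in .gz are gzip-compressed JSONL.
    opener = gzip.open if str(manifest_path).endswith(".gz") else open
    counts = Counter()
    with opener(manifest_path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                counts[json.loads(line)["id"]] += 1
    return {cut_id: n for cut_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    # File names taken from the comment above; skipped if not present.
    for name in ("Kui_cuts_train.jsonl", "Kui_cuts_dev.jsonl", "Kui_cuts_test.jsonl"):
        if os.path.exists(name):
            print(name, duplicated_cut_ids(name))
```

An empty dict for each manifest means the IDs are unique on disk, so any duplication reported at training time would have to be introduced after the manifest is loaded.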
I am trying to train a Zipformer model on my custom dataset. The steps I have followed are given below:
1. I prepared the data by running the command
lhotse kaldi import {train, dev, test}/ 16000 manifests/{train, dev, test}_manifest
2. I completed the fbank extraction stage (stage 3) of the prepare.sh script, which generated the files and folders shown in the figure below.
3. After this I prepared the BPE-based lang, which generated the folder lang_bpe_500 containing the files bpe.model, tokens.txt, transcript_word.txt, unigram_500.model and unigram_500.vocab.
4. Finally, I ran the CLI given below:
./pruned_transducer_stateless7_streaming/train.py --world-size 2 --num-epochs 30 --start-epoch 1 --use-fp16 1 --exp-dir pruned_transducer_stateless7_streaming/exp --max-duration 200 --enable-musan False
I am getting the following error: