[Need Help] segment syllables (mandarin pinyin) for syllable-level voice recognition or syllable-level VAD #9

Open
diyism opened this issue Nov 3, 2024 · 2 comments


diyism commented Nov 3, 2024

I'm trying to use your segmentation-3.0.onnx for syllable segmentation (Mandarin pinyin).
For sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01_test_wavs_4.wav,
it segments the first 7 syllables correctly, but the last 5 are less accurate.
Could you help me improve it?

$ git clone https://github.com/diyism/pyannote_segment_syllables
$ cd pyannote_segment_syllables/
$ python main.py sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01_test_wavs_4.wav
Found 12 syllables:
0.560s - 0.742s
0.742s - 1.066s
1.066s - 1.298s
1.645s - 1.920s
2.035s - 2.203s
2.203s - 2.470s
2.555s - 2.725s
2.725s - 2.960s
3.150s - 3.250s
3.250s - 3.475s
3.550s - 3.760s
3.760s - 3.975s
Saved syllable 001: 0.560s - 0.742s (duration: 0.182s)
Saved syllable 002: 0.742s - 1.066s (duration: 0.324s)
Saved syllable 003: 1.066s - 1.298s (duration: 0.232s)
Saved syllable 004: 1.645s - 1.920s (duration: 0.275s)
Saved syllable 005: 2.035s - 2.203s (duration: 0.167s)
Saved syllable 006: 2.203s - 2.470s (duration: 0.267s)
Saved syllable 007: 2.555s - 2.725s (duration: 0.170s)
Saved syllable 008: 2.725s - 2.960s (duration: 0.235s)
Saved syllable 009: 3.150s - 3.250s (duration: 0.100s)
Saved syllable 010: 3.250s - 3.475s (duration: 0.225s)
Saved syllable 011: 3.550s - 3.760s (duration: 0.210s)
Saved syllable 012: 3.760s - 3.975s (duration: 0.215s)

$ aplay syllables/001.wav
$ aplay syllables/002.wav
$ aplay syllables/003.wav
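
To check the less accurate tail by ear, it may help to cut the clips directly from the reported boundaries and to see where the model left a gap versus where two segments share a boundary. This is only a rough sketch, not the code in main.py: the helper names `slice_segments` and `gaps_between` are hypothetical, the 16 kHz sample rate is an assumption, and the silent buffer stands in for the real samples.

```python
def slice_segments(samples, sample_rate, segments):
    """Cut one clip per (start_s, end_s) pair out of a mono sample sequence."""
    clips = []
    for start_s, end_s in segments:
        # Convert seconds to sample indices and clamp to the buffer.
        start = max(0, round(start_s * sample_rate))
        end = min(len(samples), round(end_s * sample_rate))
        clips.append(samples[start:end])
    return clips


def gaps_between(segments, min_gap_s=0.0):
    """Return (previous_end, next_start) pairs separated by more than min_gap_s."""
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
        if next_start - prev_end > min_gap_s
    ]


# The last five segments from the output above.
tail = [(2.725, 2.960), (3.150, 3.250), (3.250, 3.475),
        (3.550, 3.760), (3.760, 3.975)]

SAMPLE_RATE = 16000                  # assumption, not taken from the wav file
samples = [0.0] * (4 * SAMPLE_RATE)  # placeholder: 4 s of silence

clips = slice_segments(samples, SAMPLE_RATE, tail)
clip_lengths = [len(c) for c in clips]
tail_gaps = gaps_between(tail)
print(clip_lengths)
print(tail_gaps)
```

Segments that touch (for example 3.250s and 3.760s) are boundaries the model placed inside continuous speech, so those are probably the first places to listen for a misplaced cut.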


https://github.com/diyism/pyannote_segment_syllables

ref: k2-fsa/sherpa-onnx#920

Since segmentation-3.0.onnx can segment Mandarin pinyin syllables, my guess is that a very small model (even a simple SVM, support vector machine) could recognize all 1300 monosyllabic pinyin after segmentation-3.0.onnx preprocessing. And segmentation-3.0.onnx is only 5.8MB, amazingly small!
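
One practical detail of that idea: an SVM expects every input to have the same number of features, while the syllable clips vary in length (0.100s to 0.324s in the output above). A minimal sketch of one way to bridge that, assuming each clip is a list of float samples, is to pool it into a fixed number of time bins. The function name `fixed_length_features` and the choice of log-RMS energy are purely illustrative; energy alone is unlikely to separate 1300 syllables, and a real attempt would presumably pool spectral features such as MFCCs in the same way.

```python
import math


def fixed_length_features(clip, n_bins=8):
    """Pool a variable-length clip into n_bins log-RMS energy values."""
    if not clip:
        return [0.0] * n_bins
    features = []
    for i in range(n_bins):
        # Split the clip into n_bins roughly equal, non-empty chunks.
        lo = i * len(clip) // n_bins
        hi = max(lo + 1, (i + 1) * len(clip) // n_bins)
        chunk = clip[lo:hi]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        features.append(math.log(rms + 1e-8))  # small offset avoids log(0)
    return features


# Two clips of different length map to vectors of the same size.
short_clip = [0.5] * 1600   # 0.100 s at an assumed 16 kHz
long_clip = [0.5] * 5184    # 0.324 s at an assumed 16 kHz
short_vec = fixed_length_features(short_clip)
long_vec = fixed_length_features(long_clip)
print(len(short_vec), len(long_vec))
```

Vectors like these could then be passed to any small classifier; whether that reaches useful accuracy on pinyin is untested here.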

diyism changed the title from "[Need Help] segment syllables (mandarin pinyin) for syllable-level voice recognition" to "[Need Help] segment syllables (mandarin pinyin) for syllable-level voice recognition or syllable-level VAD" on Nov 3, 2024
diyism (Author) commented Nov 15, 2024

Any hints to improve it?

pengzhendong (Owner) commented

Sorry, I haven't tried pyannote-segmentation with pinyin.
