Showing 110 changed files with 8,413 additions and 14,658 deletions.
@@ -0,0 +1,20 @@
# Frequently Asked Questions

{%- for question in questions %}
- [{{ question.title }}](#{{ question.slug }})
{%- endfor %}


{%- for question in questions %}

<a name="{{ question.slug }}"></a>
## {{ question.title }}

{{ question.body }}

{%- endfor %}

<hr>

Generated by [FAQtory](https://github.com/willmcgugan/faqtory)
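The `{{ question.slug }}` anchors in the template above are derived from question titles. As a rough illustration only (this is not FAQtory's actual implementation), a GitHub-style slug function could look like this:

```python
import re

def slugify(title: str) -> str:
    """Approximate the anchor slugs used in the generated FAQ:
    lowercase, spaces become hyphens, and punctuation other than
    hyphens and parentheses is dropped."""
    slug = title.lower().replace(" ", "-")
    return re.sub(r"[^a-z0-9()\-]", "", slug)
```

For example, "How does one spell and pronounce pyannote.audio?" maps to `how-does-one-spell-and-pronounce-pyannoteaudio`, matching the anchors used in the generated FAQ.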
@@ -0,0 +1,34 @@
Thank you for your issue.

{%- if questions -%}
{% if questions|length == 1 %}
We found the following entry in the [FAQ]({{ faq_url }}) which you may find helpful:
{%- else %}
We found the following entries in the [FAQ]({{ faq_url }}) which you may find helpful:
{%- endif %}

{% for question in questions %}
- [{{ question.title }}]({{ faq_url }}#{{ question.slug }})
{%- endfor %}

{%- else -%}
You might want to check the [FAQ]({{ faq_url }}) if you haven't done so already.
{%- endif %}

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read [this](https://xyproblem.info/) first and update your request accordingly, if needed.

If your issue is a bug report, please provide a [minimum reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) as a link to a self-contained [Google Colab](https://colab.research.google.com/) notebook containing everything needed to reproduce the bug:
- installation
- data preparation
- model download
- etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on `pyannote.audio` in production may contact [me](https://herve.niderb.fr) via email regarding:
* paid scientific consulting around speaker diarization and speech processing in general;
* custom models and tailored features (via the local tech transfer office).

> This is an automated reply, generated by [FAQtory](https://github.com/willmcgugan/faqtory)
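The singular/plural branch in the template above can be mirrored in plain Python. The following helper is hypothetical, written only to illustrate the template's three cases (no match, one match, several matches); it is not part of FAQtory:

```python
def suggestion_header(questions: list, faq_url: str) -> str:
    # Mirror the template's branches: no matches, one match, several matches.
    if not questions:
        return (f"You might want to check the FAQ ({faq_url}) "
                "if you haven't done so already.")
    noun = "entry" if len(questions) == 1 else "entries"
    return (f"We found the following {noun} in the FAQ ({faq_url}) "
            "which you may find helpful:")
```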
@@ -0,0 +1,29 @@
name: issues
on:
  issues:
    types: [opened]
jobs:
  add-comment:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/checkout@v3
        with:
          ref: develop
      - name: Install FAQtory
        run: pip install FAQtory
      - name: Run Suggest
        env:
          TITLE: ${{ github.event.issue.title }}
        run: faqtory suggest "$TITLE" > suggest.md
      - name: Read suggest.md
        id: suggest
        uses: juliangruber/read-file-action@v1
        with:
          path: ./suggest.md
      - name: Suggest FAQ
        uses: peter-evans/create-or-update-comment@a35cf36e5301d70b76f316e867e7788a55a31dae
        with:
          issue-number: ${{ github.event.issue.number }}
          body: ${{ steps.suggest.outputs.content }}
@@ -0,0 +1,128 @@
# Changelog

## Version 3.0.0 (2023-09-26)

### Features and improvements

- feat(pipeline): send pipeline to device with `pipeline.to(device)`
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
- feat(pipeline): add progress hook to pipelines
- feat(task): add [powerset](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html) support to `SpeakerDiarization` task
- feat(task): add support for multi-task models
- feat(task): add support for label scope in speaker diarization task
- feat(task): add support for missing classes in multi-label segmentation task
- feat(model): add segmentation model based on torchaudio self-supervised representation
- feat(pipeline): check version compatibility at load time
- improve(task): load metadata as tensors rather than pyannote.core instances
- improve(task): improve error message on missing specifications

### Breaking changes

- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING(setup): drop support for Python 3.7
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
  You should update how `pyannote.audio.core.io.Audio` is instantiated:
  * replace `Audio()` with `Audio(mono="downmix")`;
  * replace `Audio(mono=True)` with `Audio(mono="downmix")`;
  * replace `Audio(mono=False)` with `Audio()`.
- BREAKING(model): get rid of (flaky) `Model.introspection`.
  If you wrote custom code based on it,
  you should rely on `Model.example_output` instead.
- BREAKING(interactive): remove support for Prodigy recipes
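The `mono="downmix"` option above averages all channels into a single one. Conceptually (a simplified pure-Python sketch for illustration only; pyannote's actual implementation operates on tensors):

```python
def downmix(channels):
    """Average corresponding samples across channels: what a
    'downmix' to mono does, conceptually."""
    n = len(channels)
    # zip(*channels) pairs up the i-th sample of every channel.
    return [sum(samples) / n for samples in zip(*channels)]
```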

### Fixes and improvements

- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
- fix(pipeline): fix support for IOBase audio
- fix(pipeline): fix corner case with no speaker
- fix(train): prevent metadata preparation from happening twice
- fix(task): fix support for "balance" option
- improve(task): shorten and improve structure of Tensorboard tags

### Dependencies update

- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
- setup: switch to speechbrain 0.5.14+

## Version 2.1.1 (2022-10-27)

- BREAKING(pipeline): rewrite speaker diarization pipeline
- feat(pipeline): add option to optimize for DER variant
- feat(clustering): add support for NeMo speaker embedding
- feat(clustering): add FINCH clustering
- feat(clustering): add `min_cluster_size` hyper-parameter to AgglomerativeClustering
- feat(hub): add support for private/gated models
- setup(hub): switch to latest huggingface_hub API
- fix(pipeline): fix support for missing reference in Resegmentation pipeline
- fix(clustering): fix corner case where HMM.fit finds too few states

## Version 2.0.1 (2022-07-20)

- BREAKING: complete rewrite
- feat: much better performance
- feat: Python-first API
- feat: pretrained pipelines (and models) on Huggingface model hub
- feat: multi-GPU training with pytorch-lightning
- feat: data augmentation with torch-audiomentations
- feat: Prodigy recipe for model-assisted audio annotation

## Version 1.1.2 (2021-01-28)

- fix: make sure master branch is used to load pretrained models (#599)

## Version 1.1 (2020-11-08)

- last release before complete rewrite

## Version 1.0.1 (2018-07-19)

- fix: fix regression in `Precomputed.__call__` (#110, #105)

## Version 1.0 (2018-07-03)

- chore: switch from keras to pytorch (with tensorboard support)
- improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
- feat: add tunable speaker diarization pipeline (with its own tutorial)
- chore: drop support for Python 2 (use Python 3.6 or later)

## Version 0.3.1 (2017-07-06)

- feat: add Python 3 support
- chore: rewrite neural speaker embedding using autograd
- feat: add new embedding architectures
- feat: add new embedding losses
- chore: switch to Keras 2
- doc: add tutorial for (MFCC) feature extraction
- doc: add tutorial for (LSTM-based) speech activity detection
- doc: add tutorial for (LSTM-based) speaker change detection
- doc: add tutorial for (TristouNet) neural speaker embedding

## Version 0.2.1 (2017-03-28)

- feat: add LSTM-based speech activity detection
- feat: add LSTM-based speaker change detection
- improve: refactor LSTM-based speaker embedding
- feat: add librosa basic support
- feat: add SMORMS3 optimizer

## Version 0.1.4 (2016-09-26)

- feat: add `covariance_type` option to BIC segmentation

## Version 0.1.3 (2016-09-23)

- chore: rename sequence generator in preparation for the release of the TristouNet reproducible research package

## Version 0.1.2 (2016-09-22)

- first public version
@@ -0,0 +1,54 @@
# Frequently Asked Questions

- [Can I apply pretrained pipelines on audio already loaded in memory?](#can-i-apply-pretrained-pipelines-on-audio-already-loaded-in-memory)
- [Can I use gated models (and pipelines) offline?](#can-i-use-gated-models-(and-pipelines)-offline)
- [Does pyannote support streaming speaker diarization?](#does-pyannote-support-streaming-speaker-diarization)
- [How can I improve performance?](#how-can-i-improve-performance)
- [How does one spell and pronounce pyannote.audio?](#how-does-one-spell-and-pronounce-pyannoteaudio)

<a name="can-i-apply-pretrained-pipelines-on-audio-already-loaded-in-memory"></a>
## Can I apply pretrained pipelines on audio already loaded in memory?

Yes: read [this tutorial](tutorials/applying_a_pipeline.ipynb) until the end.

<a name="can-i-use-gated-models-(and-pipelines)-offline"></a>
## Can I use gated models (and pipelines) offline?

**Short answer:** yes, see [this tutorial](tutorials/applying_a_model.ipynb) for models and [that one](tutorials/applying_a_pipeline.ipynb) for pipelines.

**Long answer:** gating models and pipelines allows [me](https://herve.niderb.fr) to learn a bit more about the `pyannote.audio` user base, which eventually helps me write grant proposals to make `pyannote.audio` even better. So, please fill in the gating forms as precisely as possible.

For instance, before gating `pyannote/speaker-diarization`, I had no idea that so many people were relying on it in production. Hint: sponsors are more than welcome! Maintaining open source libraries is time consuming.

That being said, this whole authentication process does not prevent you from using official `pyannote.audio` models offline (i.e. without going through the authentication process in every `docker run ...` or whatever you are using in production): see [this tutorial](tutorials/applying_a_model.ipynb) for models and [that one](tutorials/applying_a_pipeline.ipynb) for pipelines.

<a name="does-pyannote-support-streaming-speaker-diarization"></a>
## Does pyannote support streaming speaker diarization?

**Short answer:** not out of the box, no.

**Long answer:** [I](https://herve.niderb.fr) am looking for sponsors to add this feature. In the meantime, [`diart`](https://github.com/juanmc2005/StreamingSpeakerDiarization) is the closest you can get to a streaming `pyannote.audio`. You might also be interested in [this blog post](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html) about streaming voice activity detection based on `pyannote.audio`.

<a name="how-can-i-improve-performance"></a>
## How can I improve performance?

**Long answer:**

1. Manually annotate dozens of conversations as precisely as possible.
2. Separate them into train (80%), development (10%) and test (10%) subsets.
3. Set up the data for use with [`pyannote.database`](https://github.com/pyannote/pyannote-database#speaker-diarization).
4. Follow [this recipe](https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/adapting_pretrained_pipeline.ipynb).
5. Enjoy.
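Step 3 above relies on a `pyannote.database` configuration file. As a hedged sketch only (every database, protocol, and file name below is a placeholder, and the exact schema should be checked against the `pyannote.database` documentation linked above), a `database.yml` could look roughly like:

```yaml
# Hypothetical database.yml sketch; all names and paths are placeholders.
Databases:
  MyDatabase: /path/to/audio/{uri}.wav

Protocols:
  MyDatabase:
    SpeakerDiarization:
      MyProtocol:
        train:
          uri: lists/train.lst          # one recording URI per line (80%)
          annotation: rttms/train.rttm
        development:
          uri: lists/dev.lst            # 10%
          annotation: rttms/dev.rttm
        test:
          uri: lists/test.lst           # remaining 10%
          annotation: rttms/test.rttm
```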

**Also:** [I am available](https://herve.niderb.fr) for contracting to help you with that.

<a name="how-does-one-spell-and-pronounce-pyannoteaudio"></a>
## How does one spell and pronounce pyannote.audio?

📝 Written in lower case: `pyannote.audio` (or `pyannote` if you are lazy). Not `PyAnnote` nor `PyAnnotate` (sic).
📢 Pronounced like the French verb `pianoter`. `pi` like in `pi`ano, not `py` like in `py`thon.
🎹 `pianoter` means to play the piano (hence the logo 🤯).

<hr>

Generated by [FAQtory](https://github.com/willmcgugan/faqtory)