Showing 110 changed files with 8,413 additions and 14,658 deletions.
@@ -0,0 +1,20 @@
# Frequently Asked Questions

{%- for question in questions %}
- [{{ question.title }}](#{{ question.slug }})
{%- endfor %}


{%- for question in questions %}

<a name="{{ question.slug }}"></a>
## {{ question.title }}

{{ question.body }}

{%- endfor %}

<hr>

Generated by [FAQtory](https://github.com/willmcgugan/faqtory)
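The `{{ question.slug }}` anchors in the template above are derived from question titles. As a rough illustration only (this is not FAQtory's actual implementation), a GitHub-style slug function could look like this:

```python
import re

def slugify(title: str) -> str:
    """Approximate the anchor slugs used in the generated FAQ:
    lowercase, spaces become hyphens, and punctuation other than
    hyphens and parentheses is dropped."""
    slug = title.lower().replace(" ", "-")
    return re.sub(r"[^a-z0-9()\-]", "", slug)
```

For example, "How does one spell and pronounce pyannote.audio?" maps to `how-does-one-spell-and-pronounce-pyannoteaudio`, matching the anchors used in the generated FAQ.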
@@ -0,0 +1,34 @@
Thank you for your issue.

{%- if questions -%}
{% if questions|length == 1 %}
We found the following entry in the [FAQ]({{ faq_url }}) which you may find helpful:
{%- else %}
We found the following entries in the [FAQ]({{ faq_url }}) which you may find helpful:
{%- endif %}

{% for question in questions %}
- [{{ question.title }}]({{ faq_url }}#{{ question.slug }})
{%- endfor %}

{%- else -%}
You might want to check the [FAQ]({{ faq_url }}) if you haven't done so already.
{%- endif %}

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read [this](https://xyproblem.info/) first and update your request accordingly, if needed.

If your issue is a bug report, please provide a [minimum reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) as a link to a self-contained [Google Colab](https://colab.research.google.com/) notebook containing everything needed to reproduce the bug:
- installation
- data preparation
- model download
- etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on `pyannote.audio` in production may contact [me](https://herve.niderb.fr) via email regarding:
* paid scientific consulting around speaker diarization and speech processing in general;
* custom models and tailored features (via the local tech transfer office).

> This is an automated reply, generated by [FAQtory](https://github.com/willmcgugan/faqtory)
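The singular/plural branch in the template above can be mirrored in plain Python. The following helper is hypothetical, written only to illustrate the template's three cases (no match, one match, several matches); it is not part of FAQtory:

```python
def suggestion_header(questions: list, faq_url: str) -> str:
    # Mirror the template's branches: no matches, one match, several matches.
    if not questions:
        return (f"You might want to check the FAQ ({faq_url}) "
                "if you haven't done so already.")
    noun = "entry" if len(questions) == 1 else "entries"
    return (f"We found the following {noun} in the FAQ ({faq_url}) "
            "which you may find helpful:")
```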
@@ -0,0 +1,29 @@
name: issues
on:
  issues:
    types: [opened]
jobs:
  add-comment:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/checkout@v3
        with:
          ref: develop
      - name: Install FAQtory
        run: pip install FAQtory
      - name: Run Suggest
        env:
          TITLE: ${{ github.event.issue.title }}
        run: faqtory suggest "$TITLE" > suggest.md
      - name: Read suggest.md
        id: suggest
        uses: juliangruber/read-file-action@v1
        with:
          path: ./suggest.md
      - name: Suggest FAQ
        uses: peter-evans/create-or-update-comment@a35cf36e5301d70b76f316e867e7788a55a31dae
        with:
          issue-number: ${{ github.event.issue.number }}
          body: ${{ steps.suggest.outputs.content }}
@@ -0,0 +1,128 @@
# Changelog

## Version 3.0.0 (2023-09-26)

### Features and improvements

- feat(pipeline): send pipeline to device with `pipeline.to(device)`
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
- feat(pipeline): add progress hook to pipelines
- feat(task): add [powerset](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html) support to `SpeakerDiarization` task
- feat(task): add support for multi-task models
- feat(task): add support for label scope in speaker diarization task
- feat(task): add support for missing classes in multi-label segmentation task
- feat(model): add segmentation model based on torchaudio self-supervised representation
- feat(pipeline): check version compatibility at load time
- improve(task): load metadata as tensors rather than pyannote.core instances
- improve(task): improve error message on missing specifications

### Breaking changes

- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING(setup): drop support for Python 3.7
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
  You should update how `pyannote.audio.core.io.Audio` is instantiated:
  * replace `Audio()` with `Audio(mono="downmix")`;
  * replace `Audio(mono=True)` with `Audio(mono="downmix")`;
  * replace `Audio(mono=False)` with `Audio()`.
- BREAKING(model): get rid of (flaky) `Model.introspection`.
  If you wrote custom code based on it,
  you should rely on `Model.example_output` instead.
- BREAKING(interactive): remove support for Prodigy recipes
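The `mono="downmix"` option above averages all channels into a single one. Conceptually (a simplified pure-Python sketch for illustration only; pyannote's actual implementation operates on tensors):

```python
def downmix(channels):
    """Average corresponding samples across channels: what a
    'downmix' to mono does, conceptually."""
    n = len(channels)
    # zip(*channels) pairs up the i-th sample of every channel.
    return [sum(samples) / n for samples in zip(*channels)]
```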

### Fixes and improvements

- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
- fix(pipeline): fix support for IOBase audio
- fix(pipeline): fix corner case with no speaker
- fix(train): prevent metadata preparation from happening twice
- fix(task): fix support for "balance" option
- improve(task): shorten and improve structure of Tensorboard tags

### Dependencies update

- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
- setup: switch to speechbrain 0.5.14+

## Version 2.1.1 (2022-10-27)

- BREAKING(pipeline): rewrite speaker diarization pipeline
- feat(pipeline): add option to optimize for DER variant
- feat(clustering): add support for NeMo speaker embedding
- feat(clustering): add FINCH clustering
- feat(clustering): add `min_cluster_size` hyper-parameter to AgglomerativeClustering
- feat(hub): add support for private/gated models
- setup(hub): switch to latest huggingface_hub API
- fix(pipeline): fix support for missing reference in Resegmentation pipeline
- fix(clustering): fix corner case where HMM.fit finds too few states

## Version 2.0.1 (2022-07-20)

- BREAKING: complete rewrite
- feat: much better performance
- feat: Python-first API
- feat: pretrained pipelines (and models) on Huggingface model hub
- feat: multi-GPU training with pytorch-lightning
- feat: data augmentation with torch-audiomentations
- feat: Prodigy recipe for model-assisted audio annotation

## Version 1.1.2 (2021-01-28)

- fix: make sure master branch is used to load pretrained models (#599)

## Version 1.1 (2020-11-08)

- last release before complete rewrite

## Version 1.0.1 (2018-07-19)

- fix: fix regression in `Precomputed.__call__` (#110, #105)

## Version 1.0 (2018-07-03)

- chore: switch from keras to pytorch (with tensorboard support)
- improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
- feat: add tunable speaker diarization pipeline (with its own tutorial)
- chore: drop support for Python 2 (use Python 3.6 or later)

## Version 0.3.1 (2017-07-06)

- feat: add Python 3 support
- chore: rewrite neural speaker embedding using autograd
- feat: add new embedding architectures
- feat: add new embedding losses
- chore: switch to Keras 2
- doc: add tutorial for (MFCC) feature extraction
- doc: add tutorial for (LSTM-based) speech activity detection
- doc: add tutorial for (LSTM-based) speaker change detection
- doc: add tutorial for (TristouNet) neural speaker embedding

## Version 0.2.1 (2017-03-28)

- feat: add LSTM-based speech activity detection
- feat: add LSTM-based speaker change detection
- improve: refactor LSTM-based speaker embedding
- feat: add librosa basic support
- feat: add SMORMS3 optimizer

## Version 0.1.4 (2016-09-26)

- feat: add `covariance_type` option to BIC segmentation

## Version 0.1.3 (2016-09-23)

- chore: rename sequence generator in preparation for the release of the TristouNet reproducible research package

## Version 0.1.2 (2016-09-22)

- first public version
@@ -0,0 +1,54 @@
# Frequently Asked Questions

- [Can I apply pretrained pipelines on audio already loaded in memory?](#can-i-apply-pretrained-pipelines-on-audio-already-loaded-in-memory)
- [Can I use gated models (and pipelines) offline?](#can-i-use-gated-models-(and-pipelines)-offline)
- [Does pyannote support streaming speaker diarization?](#does-pyannote-support-streaming-speaker-diarization)
- [How can I improve performance?](#how-can-i-improve-performance)
- [How does one spell and pronounce pyannote.audio?](#how-does-one-spell-and-pronounce-pyannoteaudio)

<a name="can-i-apply-pretrained-pipelines-on-audio-already-loaded-in-memory"></a>
## Can I apply pretrained pipelines on audio already loaded in memory?

Yes: read [this tutorial](tutorials/applying_a_pipeline.ipynb) until the end.

<a name="can-i-use-gated-models-(and-pipelines)-offline"></a>
## Can I use gated models (and pipelines) offline?

**Short answer:** yes, see [this tutorial](tutorials/applying_a_model.ipynb) for models and [that one](tutorials/applying_a_pipeline.ipynb) for pipelines.

**Long answer:** gating models and pipelines allows [me](https://herve.niderb.fr) to learn a bit more about the `pyannote.audio` user base, which eventually helps me write grant proposals to make `pyannote.audio` even better. So, please fill in the gating forms as precisely as possible.

For instance, before gating `pyannote/speaker-diarization`, I had no idea that so many people were relying on it in production. Hint: sponsors are more than welcome! Maintaining open source libraries is time consuming.

That being said, this whole authentication process does not prevent you from using official `pyannote.audio` models offline (i.e. without going through the authentication process in every `docker run ...` or whatever you are using in production): see [this tutorial](tutorials/applying_a_model.ipynb) for models and [that one](tutorials/applying_a_pipeline.ipynb) for pipelines.

<a name="does-pyannote-support-streaming-speaker-diarization"></a>
## Does pyannote support streaming speaker diarization?

**Short answer:** not out of the box, no.

**Long answer:** [I](https://herve.niderb.fr) am looking for sponsors to add this feature. In the meantime, [`diart`](https://github.com/juanmc2005/StreamingSpeakerDiarization) is the closest you can get to a streaming `pyannote.audio`. You might also be interested in [this blog post](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html) about streaming voice activity detection based on `pyannote.audio`.

<a name="how-can-i-improve-performance"></a>
## How can I improve performance?

**Long answer:**

1. Manually annotate dozens of conversations as precisely as possible.
2. Separate them into train (80%), development (10%) and test (10%) subsets.
3. Set up the data for use with [`pyannote.database`](https://github.com/pyannote/pyannote-database#speaker-diarization).
4. Follow [this recipe](https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/adapting_pretrained_pipeline.ipynb).
5. Enjoy.
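Step 3 above relies on a `pyannote.database` configuration file. As a hedged sketch only (every database, protocol, and file name below is a placeholder, and the exact schema should be checked against the `pyannote.database` documentation linked above), a `database.yml` could look roughly like:

```yaml
# Hypothetical database.yml sketch; all names and paths are placeholders.
Databases:
  MyDatabase: /path/to/audio/{uri}.wav

Protocols:
  MyDatabase:
    SpeakerDiarization:
      MyProtocol:
        train:
          uri: lists/train.lst          # one recording URI per line (80%)
          annotation: rttms/train.rttm
        development:
          uri: lists/dev.lst            # 10%
          annotation: rttms/dev.rttm
        test:
          uri: lists/test.lst           # remaining 10%
          annotation: rttms/test.rttm
```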

**Also:** [I am available](https://herve.niderb.fr) for contracting to help you with that.

<a name="how-does-one-spell-and-pronounce-pyannoteaudio"></a>
## How does one spell and pronounce pyannote.audio?

📝 Written in lower case: `pyannote.audio` (or `pyannote` if you are lazy). Not `PyAnnote` nor `PyAnnotate` (sic).
📢 Pronounced like the French verb `pianoter`. `pi` like in `pi`ano, not `py` like in `py`thon.
🎹 `pianoter` means to play the piano (hence the logo 🤯).

<hr>

Generated by [FAQtory](https://github.com/willmcgugan/faqtory)