InterviewSpeakerTranscription

Uses OpenAI Whisper and pyannote.audio to transcribe audio from interviews and focus groups with speaker identification.

Setup

  • Create a dataFolder and an outFolder next to the notebooks and put your audio files into dataFolder (the current default is .m4a, but any format ffmpeg understands should work; adjust the ffmpeg line in the notebook accordingly - see the sketch after this list).

  • Run the notebook 00-TranscribeInterviewsSpeakerdetection.ipynb and pip install any missing packages.

  • The finished transcripts are then written to a transcript folder inside outFolder.
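
For orientation, here is a minimal sketch of what the pipeline does, not the notebook's exact code. It assumes the openai-whisper package and pyannote.audio 2.x; the Whisper model size, the pyannote model name, and the Hugging Face token placeholder are illustrative, while the dataFolder/outFolder names match the setup above.

import subprocess
from pathlib import Path

import whisper
from pyannote.audio import Pipeline

data_dir = Path("dataFolder")                  # input audio, as created in step 1
out_dir = Path("outFolder") / "transcripts"
out_dir.mkdir(parents=True, exist_ok=True)

# Load the models once. Model names and the token are placeholders.
asr = whisper.load_model("medium")
diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="YOUR_HF_TOKEN")

for m4a in data_dir.glob("*.m4a"):
    # Convert to 16 kHz mono WAV, which both models handle reliably.
    wav = m4a.with_suffix(".wav")
    subprocess.run(["ffmpeg", "-y", "-i", str(m4a),
                    "-ar", "16000", "-ac", "1", str(wav)], check=True)

    # Diarization: who speaks when.
    diarization = diarizer(str(wav))
    turns = [(turn.start, turn.end, speaker)
             for turn, _, speaker in diarization.itertracks(yield_label=True)]

    # Speech recognition: what is said.
    result = asr.transcribe(str(wav))

    # Assign each Whisper segment to the speaker active at its midpoint.
    lines = []
    for seg in result["segments"]:
        mid = (seg["start"] + seg["end"]) / 2
        speaker = next((spk for start, end, spk in turns if start <= mid <= end),
                       "UNKNOWN")
        lines.append(f"[{speaker}] {seg['text'].strip()}")

    (out_dir / f"{m4a.stem}.txt").write_text("\n".join(lines), encoding="utf-8")

The midpoint rule is the simplest way to merge the two outputs; segments that span a speaker change are attributed to whoever holds the middle of the segment.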

Citations

Cite pyannote.audio if you use it for a paper:

@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Year = {2020},
}
@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Year = {2021},
}

TODO

  • Make sure the pyannote.audio pipeline uses the GPU; preprocessing currently takes far too long (see the sketch after this list)
  • Add pip installs for all packages to the notebook
  • Improve README
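
A possible starting point for the GPU item, assuming pyannote.audio >= 2.1 (where Pipeline.to() is available); model names and the token placeholder are again illustrative:

import torch
import whisper
from pyannote.audio import Pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Whisper takes a device argument directly.
asr = whisper.load_model("medium", device=device)

# pyannote.audio pipelines can be moved to the GPU with .to().
diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="YOUR_HF_TOKEN")
diarizer.to(device)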
