Tools for aligning speech audio with a transcript using the Sphinx4 speech recognition engine.

Running make will:

Create a JSON transcript without speech timings
Convert the MP3 audio to WAV (PCM signed 16-bit little-endian, mono, 16kHz)
Align the WAV audio and the transcript, updating the transcript file
Convert the WAV audio back to MP3 suitable for web streaming
Prepare the transcript for use in react-transcript-player
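
The WAV format named in the conversion step can be checked before alignment is attempted. The following is a minimal sketch using Python's standard `wave` module; the function name `is_alignment_ready` is hypothetical and not part of this repository.

```python
import wave


def is_alignment_ready(path):
    """Return True if the WAV file is uncompressed 16-bit PCM, mono, 16 kHz.

    Hypothetical helper, not part of this repository. WAV files are
    little-endian by definition, and 16-bit WAV samples are signed,
    so only channels, sample width, and sample rate are inspected.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1          # mono
            and wav.getsampwidth() == 2      # 2 bytes = 16 bits per sample
            and wav.getframerate() == 16000  # 16 kHz
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )
```

A file that fails this check would need to be converted again before it is passed to the aligner.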

It relies on libav, LAME, and the following components:

create-transcript-json.py

Creates an untimed transcript JSON file from a file with information on speakers and a file with one transcript sentence per line.
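
As a rough illustration of the "one transcript sentence per line" input, the sketch below turns such lines into untimed entries. The field names `text`, `start`, and `end` are assumptions made for this example, not the schema the script actually writes, and speaker handling is left out because the speaker file's format is not described here.

```python
import json


def untimed_entries(lines):
    """Build one untimed entry per non-empty line.

    The keys "text", "start", and "end" are hypothetical; the real
    schema used by create-transcript-json.py may differ.
    """
    entries = []
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines
            entries.append({"text": text, "start": None, "end": None})
    return entries


if __name__ == "__main__":
    sample = ["Hello and welcome.\n", "\n", "Thanks for having me.\n"]
    print(json.dumps(untimed_entries(sample), indent=2))
```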

build/libs/alignment.jar

This jar can be built using Gradle (see build.gradle). It takes three arguments: the path to an audio file, the path to an alignment file, and the path to a transcript file. The audio file and the transcript file must exist. If the alignment file does not exist, it will be created; otherwise, the existing alignments will be used. The aligned transcript is printed to standard output. See the Makefile for an example of usage.
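
The argument order described above can be sketched as a small Python wrapper. This assumes the jar can be launched with `java -jar`, which is not confirmed here; the function names are hypothetical, and the Makefile remains the authoritative example.

```python
import subprocess
from pathlib import Path

DEFAULT_JAR = "build/libs/alignment.jar"


def build_align_command(audio, alignment, transcript, jar=DEFAULT_JAR):
    """Return the command line: audio, alignment, then transcript path."""
    # Assumption: the jar is runnable via `java -jar`.
    return ["java", "-jar", str(jar), str(audio), str(alignment), str(transcript)]


def align(audio, alignment, transcript, jar=DEFAULT_JAR):
    """Run the aligner and return the aligned transcript from standard output."""
    # The audio and transcript files must exist; the alignment file
    # is created by the jar when it is missing.
    for required in (audio, transcript):
        if not Path(required).is_file():
            raise FileNotFoundError(required)
    result = subprocess.run(
        build_align_command(audio, alignment, transcript, jar),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Capturing standard output in the wrapper mirrors how the jar reports its result, so the caller decides where the aligned transcript is written.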

prepare-transcript.py

Prepares a transcript JSON file for use with react-transcript-player, adding a link to the audio file, titlecasing speakers' names, and fixing issues in speech timings.
