Pronounced "at sixteen k".
at16k is a Python library for automatic speech recognition (speech-to-text conversion). The goal of this project is to provide the community with a production-quality speech-to-text library.
It is recommended that you install at16k in a virtual environment.
- Python >= 3.6
- TensorFlow == 1.14
- SciPy (for reading WAV files)
$ pip install at16k
Alternatively, install from source (requires poetry):
$ git clone https://github.com/at16k/at16k.git
$ poetry env use python3.6
$ poetry install
Currently, two models are available for speech to text conversion.
- en_8k (trained on English audio recorded at 8 kHz)
- en_16k (trained on English audio recorded at 16 kHz)
To download all the models:
$ python -m at16k.download all
Alternatively, you can download only the model you need. For example:
$ python -m at16k.download en_8k
$ python -m at16k.download en_16k
By default, the models will be downloaded and stored at <HOME_DIR>/.at16k. To override the default, set the environment variable AT16K_RESOURCES_DIR. For example:
$ export AT16K_RESOURCES_DIR=/path/to/my/directory
You will need to set this environment variable again whenever you use at16k, whether through the command line, the Python library, or the REST API.
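If you prefer to set it from Python, setting the variable before importing at16k should work, on the assumption that the library reads it when it loads a model (setting it in the shell, as above, is the documented route):

import os

# Assumption: at16k reads AT16K_RESOURCES_DIR when loading models,
# so it must be set before the first model is initialized.
os.environ['AT16K_RESOURCES_DIR'] = '/path/to/my/directory'

from at16k.api import SpeechToText
stt = SpeechToText('en_16k')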
at16k accepts WAV files with the following specs (a quick programmatic check is sketched after the list):
- Channels: 1
- Bits per sample: 16
- Sample rate: 8000 (en_8k) or 16000 (en_16k)
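As a sanity check, you can verify a file against these specs with SciPy, which at16k already depends on. A minimal sketch; the expectations simply mirror the list above:

import numpy as np
from scipy.io import wavfile

def check_wav(path, expected_rate):
    # wavfile.read returns (sample_rate, samples); mono audio is a 1-D array
    rate, data = wavfile.read(path)
    assert data.ndim == 1, 'expected mono audio (1 channel)'
    assert data.dtype == np.int16, 'expected 16 bits per sample'
    assert rate == expected_rate, f'expected {expected_rate} Hz, got {rate} Hz'

check_wav('./samples/test_16k.wav', 16000)  # use 8000 for en_8k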
Use ffmpeg to convert your audio/video files to an acceptable format. For example:
# For 8 kHz
$ ffmpeg -i <input_file> -ar 8000 -ac 1 -acodec pcm_s16le <output_file>
# For 16 kHz
$ ffmpeg -i <input_file> -ar 16000 -ac 1 -acodec pcm_s16le <output_file>
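To convert many files from Python, you can shell out to ffmpeg with the same flags (a sketch; it assumes ffmpeg is on your PATH, and the input file names are made up):

import subprocess

def to_at16k_wav(src, dst, rate=16000):
    # Same conversion as the commands above: resample, downmix to mono,
    # encode as 16-bit PCM.
    subprocess.run(
        ['ffmpeg', '-i', src, '-ar', str(rate), '-ac', '1',
         '-acodec', 'pcm_s16le', dst],
        check=True,
    )

to_at16k_wav('input.mp4', 'output_16k.wav')             # for en_16k
to_at16k_wav('input.mp4', 'output_8k.wav', rate=8000)   # for en_8k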
There are three ways to invoke the at16k speech-to-text converter: the command line, the Python library, and the REST API.
$ at16k-convert -i <input_wav_file> -m <model_name>
Alternatively,
$ python -m at16k.bin.speech_to_text -i <input_wav_file> -m <model_name>
from at16k.api import SpeechToText
# One-time initialization
STT = SpeechToText('en_16k') # or en_8k
# Run STT on an audio file, returns a dict
print(STT('./samples/test_16k.wav'))
Check example.py for details on how to use the API.
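For batch transcription, reuse a single SpeechToText instance, since loading the model is the expensive step. A sketch (the second sample path is hypothetical, and the structure of the returned dict is shown in example.py):

from at16k.api import SpeechToText

stt = SpeechToText('en_16k')  # one-time initialization

for path in ['./samples/test_16k.wav', './samples/meeting_16k.wav']:
    result = stt(path)  # dict containing the transcript
    print(path, result)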
$ at16k-serve -p <port> -m <model_name>
Alternatively,
$ python -m at16k.bin.serve -p <port> -m <model_name>
Lastly, via Docker. Note that docker run needs its own -p <host_port>:<container_port> flag to expose the server; the arguments after the image name are passed to the container's entrypoint:
$ docker pull at16k/at16k:0.1.3
$ docker run -it -p <port>:<port> at16k/at16k:0.1.3 -p <port> -m <model_name>
Check API Docs for details on how to use the REST API.
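For illustration only, a client request might look like the sketch below; the route (/transcribe) and the multipart field name (file) are assumptions, not the library's documented contract, so check the API Docs for the actual endpoint:

import requests

# Hypothetical route and field name; replace 8080 with the port
# you passed to at16k-serve.
with open('./samples/test_16k.wav', 'rb') as f:
    response = requests.post('http://localhost:8080/transcribe',
                             files={'file': f})
print(response.json())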
Keep your audio files short: less than 30 seconds when using en_8k, and less than 15 seconds when using en_16k. No error is thrown if the duration exceeds these limits, but the transcript may contain mistakes and missing text.
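Since no error is raised, it is worth checking the duration yourself before transcribing. A minimal sketch using SciPy:

from scipy.io import wavfile

# Limits from the paragraph above, in seconds.
MAX_SECONDS = {'en_8k': 30, 'en_16k': 15}

def check_duration(path, model_name):
    rate, samples = wavfile.read(path)
    duration = samples.shape[0] / rate
    if duration > MAX_SECONDS[model_name]:
        print(f'{path}: {duration:.1f}s exceeds the {model_name} limit')
    return duration

check_duration('./samples/test_16k.wav', 'en_16k')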
This software is distributed under the MIT license.
We would like to thank the Google TensorFlow Research Cloud (TFRC) program for providing access to Cloud TPUs.