GitHub - joelnathanbradley/dnn-tidigits-isolated-digits-classification: This project trains a deep neural network to predict which digits are being spoken from the TIDIGITS isolated digits dataset.

This project was done while taking CS682 - Speech Processing during Fall 2020 at San Diego State University.

This project uses TensorFlow, Keras, NumPy, SciPy, librosa, soundfile, and scikit-learn (sklearn). It takes isolated-digit wav files from the TIDIGITS dataset and trains a deep neural network to predict which digit is being spoken in each file. First, the project applies several data augmentation methods to the training set to generate more training data. It then breaks each wav file into audio frames, computes a Fourier transform of each frame, and uses a two-component Gaussian mixture model to predict which frames contain speech and which do not. A batch generator selects the frames that contain speech and feeds them to the deep neural network for training. The network uses a categorical cross-entropy loss, the Adam optimizer, and a softmax activation function on the output layer. 5-fold cross-validation is used during training.
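The speech-detection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names, the frame length and hop size, the use of per-frame log energy as the feature, and the hand-written EM loop are all assumptions made for the example. The project itself may well rely on a library mixture model instead.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def frame_log_energy(frames):
    """Log energy of each frame, taken from its windowed magnitude spectrum."""
    spectrum = np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)
    return np.log(np.sum(np.abs(spectrum) ** 2, axis=1) + 1e-10)

def fit_two_gaussians(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns the per-frame responsibilities and the component means.
    """
    mean = np.array([x.min(), x.max()], dtype=float)
    var = np.full(2, x.var() + 1e-6)
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        log_p = (np.log(weight) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mean) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        count = resp.sum(axis=0) + 1e-10
        weight = count / len(x)
        mean = (resp * x[:, None]).sum(axis=0) / count
        var = (resp * (x[:, None] - mean) ** 2).sum(axis=0) / count + 1e-6
    return resp, mean

def speech_mask(signal, frame_len=400, hop=160):
    """Boolean mask over frames: True where the frame is likely speech.

    Assumption: the higher-energy mixture component corresponds to speech.
    """
    energy = frame_log_energy(frame_signal(signal, frame_len, hop))
    resp, mean = fit_two_gaussians(energy)
    return resp[:, np.argmax(mean)] > 0.5
```

A batch generator could then index the frames with this mask so that only speech frames reach the network. The defaults of 400 samples per frame and a 160-sample hop are illustrative values only.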

The entry point for this project is driver.py. The dataset base directory is hard-coded as "../tidigits-isolated-digits-wav/wav/"; if your copy of the TIDIGITS dataset is stored in a different directory, set the variable dataset_basedir in driver.py accordingly. The project creates a directory under "../lib/output/" to store its results. It also creates a directory inside the TIDIGITS dataset, "../tidigits-isolated-digits-wav/wav/train/augmented_data/", and places subdirectories within it based on the augmentation type. Make sure the TIDIGITS subset you use does not contain digit sequences, since this project is implemented only for isolated digits.
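As a rough illustration of the directory layout described above, the sketch below builds the augmented-data path from the base directory. Only dataset_basedir and the path strings come from this README. The helper name augmented_dir and the augmentation name "example_augmentation" are hypothetical, and the actual code in driver.py may construct these paths differently.

```python
from pathlib import Path

# Default location named in this README; change it if the dataset lives elsewhere.
dataset_basedir = "../tidigits-isolated-digits-wav/wav/"

def augmented_dir(basedir, augmentation_type):
    """Path of the subdirectory holding one augmentation type (hypothetical helper)."""
    return Path(basedir) / "train" / "augmented_data" / augmentation_type

# "example_augmentation" is a placeholder, not an augmentation name from the project.
example_path = augmented_dir(dataset_basedir, "example_augmentation")
```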