HW3P2: Automatic Speech Recognition

This project performs speech recognition by predicting the sequence of phonemes in a recording. It implements RNNs trained with Connectionist Temporal Classification (CTC), a dynamic programming algorithm, to generate these phoneme labels.
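To illustrate how CTC turns per-frame predictions into a phoneme sequence, here is a minimal sketch of greedy CTC decoding: repeated labels are collapsed and blank symbols are dropped. The function name `ctc_greedy_decode` and the blank index of 0 are assumptions for illustration; the notebook may use a different decoder, such as beam search.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence into a CTC output sequence.

    frame_labels: iterable of integer class indices, one per time step
                  (for example, the argmax of the model output at each frame).
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    decoded = []
    prev = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# A blank between two identical labels keeps both of them.
frames = [0, 1, 1, 0, 1, 2, 2, 0]
print(ctc_greedy_decode(frames))
```

The blank symbol is what lets CTC emit the same phoneme twice in a row: without the blank at position 3, the two runs of label 1 would collapse into one.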

Dependencies

Make sure you have the following dependencies installed:

Python 3.8+ (the minimum version supported by PyTorch 2.0)
PyTorch 2.0
NumPy
Matplotlib
Wandb
DataLoader and TensorDataset (from torch.utils.data, included with PyTorch)

Running the Code

To execute the code, run all the cells in the notebook, Automatic_Speech_Recognition.ipynb.

Ablation Strategies

Different architectures were explored to reach the high-cutoff score:

Encoder Block

A simple Conv1D layer with BatchNorm, followed by 2 pyramidal bidirectional LSTM (pBLSTM) layers
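The encoder described above could be sketched roughly as follows. This is an illustrative reconstruction, not the notebook's code: the class names, the input feature size of 28, the hidden size of 128, and the kernel size are all assumptions. Each pBLSTM layer concatenates pairs of adjacent frames, which halves the time dimension before the bidirectional LSTM is applied.

```python
import torch
import torch.nn as nn


class PBLSTM(nn.Module):
    """Pyramidal BiLSTM: merges adjacent frame pairs, halving the time axis."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Two frames are concatenated, so the LSTM sees 2 * input_size features.
        self.blstm = nn.LSTM(2 * input_size, hidden_size,
                             batch_first=True, bidirectional=True)

    def forward(self, x, lengths):
        # x: (batch, time, features)
        b, t, f = x.shape
        if t % 2 == 1:
            # Drop the last frame so the time axis is even.
            x = x[:, :-1, :]
            t -= 1
        x = x.reshape(b, t // 2, 2 * f)
        lengths = torch.clamp(lengths // 2, min=1)
        out, _ = self.blstm(x)
        return out, lengths


class Encoder(nn.Module):
    """Conv1D + BatchNorm followed by two pBLSTM layers (sizes are assumed)."""

    def __init__(self, input_size=28, hidden_size=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(input_size, hidden_size, kernel_size=3, padding=1),
            nn.BatchNorm1d(hidden_size),
        )
        self.pblstm1 = PBLSTM(hidden_size, hidden_size)
        # The first pBLSTM is bidirectional, so it outputs 2 * hidden_size.
        self.pblstm2 = PBLSTM(2 * hidden_size, hidden_size)

    def forward(self, x, lengths):
        # Conv1d expects (batch, channels, time), so transpose around it.
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x, lengths = self.pblstm1(x, lengths)
        x, lengths = self.pblstm2(x, lengths)
        return x, lengths
```

With two pBLSTM layers the output sequence is about four times shorter than the input, which matters for CTC because the output length must stay at least as long as the target phoneme sequence.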

Decoder Block

Four linear layers with BatchNorm, Dropout, and GELU activation
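A decoder of this shape might look like the sketch below. The layer widths, the dropout rate, and the number of output classes (41 here, standing in for the phoneme set plus the CTC blank) are assumptions, as is the exact ordering of BatchNorm, GELU, and Dropout within each layer.

```python
import torch
import torch.nn as nn


class Decoder(nn.Module):
    """Four linear layers with BatchNorm, GELU, and Dropout (sizes assumed)."""

    def __init__(self, input_size=256, hidden_size=512,
                 num_classes=41, dropout=0.2):
        super().__init__()
        layers = []
        in_dim = input_size
        # Three hidden layers, each followed by BatchNorm, GELU, and Dropout.
        for _ in range(3):
            layers += [
                nn.Linear(in_dim, hidden_size),
                nn.BatchNorm1d(hidden_size),
                nn.GELU(),
                nn.Dropout(dropout),
            ]
            in_dim = hidden_size
        # Fourth linear layer projects to the output classes.
        layers.append(nn.Linear(in_dim, num_classes))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, time, features). BatchNorm1d expects (N, C), so the
        # batch and time axes are flattened before the MLP and restored after.
        b, t, f = x.shape
        logits = self.mlp(x.reshape(b * t, f))
        # CTC loss expects log-probabilities over the classes.
        return logits.reshape(b, t, -1).log_softmax(dim=-1)
```

Flattening batch and time before BatchNorm1d is one simple way to normalize per feature; transposing to (batch, features, time) would be an equally valid choice.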

Training Details

Epochs

The model was trained for 50 epochs in total. Performance was still improving significantly at the end of training.

Hyperparameters

Learning Rate: 1e-4
Batch Size: 64
Criterion: CTC Loss
Optimizer: AdamW
Scheduler: ReduceLROnPlateau
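As a rough sketch, these hyperparameters map onto PyTorch as shown below. The stand-in linear model, the dummy tensors, the blank index, and the scheduler's `factor` and `patience` values are assumptions for illustration; only the learning rate, batch size, criterion, optimizer, and scheduler type come from the list above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model: maps 8 input features to 5 classes (class 0 is the blank).
model = nn.Linear(8, 5)

criterion = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# factor and patience are assumed values, not taken from the notebook.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2)

# Dummy batch: T time steps, batch size B = 64, C classes.
T, B, C = 20, 64, 5
x = torch.randn(T, B, 8)
log_probs = model(x).log_softmax(dim=-1)        # (T, B, C), as CTCLoss expects
targets = torch.randint(1, C, (B, 6))           # labels exclude the blank (0)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 6, dtype=torch.long)

# One optimization step.
loss = criterion(log_probs, targets, input_lengths, target_lengths)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# ReduceLROnPlateau is stepped with the monitored metric; in practice this
# would typically be a validation loss or distance computed once per epoch.
scheduler.step(loss.item())
```

Note that `nn.CTCLoss` expects log-probabilities with shape (time, batch, classes) by default, so a batch-first model output needs to be transposed before the loss is computed.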

Data Loading Scheme

PyTorch's DataLoader was used to load the data. The following transforms were applied:

TimeMasking with time_mask_param as 200
FrequencyMasking with freq_mask_param as 5
