This project focuses on fine-tuning the Whisper model for automatic speech recognition (ASR). The goal is to enhance the performance of Whisper on a custom dataset by using transfer learning and optimizing model parameters. The project uses a Jupyter Notebook for fine-tuning, training, and evaluating the model, leveraging the Hugging Face `transformers` library.

## Features
- Fine-tuning Whisper for speech-to-text tasks on custom audio data.
- Supports mixed precision training (`fp16`) for faster training on GPUs (see the training-arguments sketch after this list).
- Integration with Weights & Biases for experiment tracking and monitoring.
- Tokenization of both audio and text inputs for model training.
- Evaluation at the end of each epoch to track performance improvements.
- Model checkpoint saving to allow resumption of training.
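Several of these features map directly onto Hugging Face training arguments. A minimal sketch with `Seq2SeqTrainingArguments` (all values here are illustrative assumptions, not the notebook's exact settings):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="model",              # where checkpoints are saved
    fp16=True,                       # mixed precision training on GPUs
    evaluation_strategy="epoch",     # evaluate at the end of each epoch
    save_strategy="epoch",           # checkpoint each epoch so training can resume
    report_to="wandb",               # stream metrics to Weights & Biases
    predict_with_generate=True,      # decode during evaluation (needed for WER)
    learning_rate=1e-5,
    per_device_train_batch_size=8,
)
```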
## Requirements

- Python 3.7 or higher
- Hugging Face Transformers
- PyTorch
- Weights & Biases (for experiment tracking)
- Additional libraries: `datasets`, `tqdm`, `torchaudio`, `transformers`
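These can be installed with pip (package list inferred from the requirements above; check the repository for a pinned requirements file):

```bash
pip install torch torchaudio transformers datasets tqdm wandb
```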
## Installation

Clone the repository:

```bash
git clone https://github.com/AliiAhmadi/speech_to_text.git
cd speech_to_text/
```
For experiment tracking, create a W&B account and set up the API key:

```bash
wandb login
```
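With `report_to="wandb"` in the training arguments (see the sketch above), the Hugging Face Trainer starts a W&B run automatically; you can also initialize one explicitly in the notebook (the project name below is a placeholder):

```python
import wandb

# Placeholder project name; use whatever grouping you prefer in W&B
wandb.init(project="whisper-finetuning")
```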
## Usage

This project uses a Jupyter Notebook for fine-tuning the Whisper model. Open the notebook file `whisper_finetuning.ipynb` in your Jupyter environment.
### Dataset

Ensure you have a dataset for fine-tuning. You can use any ASR dataset in the correct format (e.g., audio files with transcriptions). The dataset should have fields such as `audio` and `transcript`.
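For instance, a small dataset with these fields could be built with the `datasets` library (file paths and transcripts here are placeholders):

```python
from datasets import Dataset, Audio

# Placeholder file paths and transcripts; substitute your own data
train_dataset = Dataset.from_dict({
    "audio": ["data/clip_001.wav", "data/clip_002.wav"],
    "transcript": ["first example transcription", "second example transcription"],
})

# Decode audio lazily and resample to 16 kHz, the rate Whisper expects
train_dataset = train_dataset.cast_column("audio", Audio(sampling_rate=16000))
```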
### Tokenization

The notebook includes a custom tokenization function to preprocess the dataset. As written in the original snippet, the audio was passed to the text tokenizer; for Whisper, audio goes through the feature extractor and transcripts through the tokenizer, roughly as follows (`openai/whisper-small` is an illustrative checkpoint, not necessarily the one the notebook loads):
```python
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

def encode_audio(example):
    audio = example["audio"]  # decoded by datasets: {"array": ..., "sampling_rate": ...}
    example["input_features"] = processor.feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    example["labels"] = processor.tokenizer(example["transcript"]).input_ids
    return example

train_dataset = train_dataset.map(encode_audio)
```
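Because `input_features` and `labels` have different padding needs, Whisper fine-tuning typically uses a small custom collator. A minimal sketch, assuming the field names produced by `encode_audio` above:

```python
from dataclasses import dataclass
from transformers import WhisperProcessor

@dataclass
class SpeechDataCollator:
    processor: WhisperProcessor

    def __call__(self, features):
        # Pad the log-Mel inputs into a single batch tensor
        inputs = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(inputs, return_tensors="pt")
        # Pad the labels, then mask padding out of the loss with -100
        labels = [{"input_ids": f["labels"]} for f in features]
        padded = self.processor.tokenizer.pad(labels, return_tensors="pt")
        batch["labels"] = padded["input_ids"].masked_fill(padded["attention_mask"].eq(0), -100)
        return batch
```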
### Training

Run the training cells in the Jupyter Notebook to start fine-tuning the Whisper model. Model checkpoints are saved during training.
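In `transformers` terms, the training step presumably resembles this sketch (it reuses `training_args`, `train_dataset`, `SpeechDataCollator`, and `processor` from the snippets above; `eval_dataset` is an assumed held-out split prepared the same way):

```python
from transformers import WhisperForConditionalGeneration, Seq2SeqTrainer

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,                      # Seq2SeqTrainingArguments from above
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,               # assumed held-out split
    data_collator=SpeechDataCollator(processor),
    compute_metrics=compute_metrics,         # defined in the evaluation sketch below
)
trainer.train()
```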
### Evaluation

Evaluation occurs at the end of each epoch, and model checkpoints are saved automatically within the notebook.
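The standard ASR metric here is word error rate (WER). A sketch using the `evaluate` library, with field names following the preprocessing above:

```python
import evaluate

wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    label_ids = pred.label_ids
    # Restore padding tokens that were masked with -100 before decoding
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}
```

This relies on `predict_with_generate=True` in the training arguments, so that predictions arrive as generated token IDs rather than raw logits.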
### Inference

After training, use the fine-tuned model to make predictions on new audio data, which is also covered in the notebook.
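A minimal transcription sketch (the checkpoint path and audio file are placeholders; point them at a saved checkpoint from the `model/` directory and your own audio):

```python
import torchaudio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Placeholder checkpoint path; use an actual checkpoint saved during training
model = WhisperForConditionalGeneration.from_pretrained("model/checkpoint-1000")
processor = WhisperProcessor.from_pretrained("model/checkpoint-1000")

waveform, sr = torchaudio.load("data/clip_001.wav")  # placeholder audio file
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```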
## Project Structure

```
speech_to_text/
├── data/                      # Dataset (audio and transcriptions)
├── logs/                      # Logs for experiment tracking (via Weights & Biases)
├── model/                     # Fine-tuned model checkpoints
├── whisper_finetuning.ipynb   # Jupyter notebook for training and evaluation
├── README.md                  # Project documentation
└── ...                        # Other helper files and scripts
```
## License

This project is licensed under the MIT License - see the LICENSE file for details.