- 📚 Table of Contents
- 📍Overview
- ⚙️ Project Structure
- 💻 Modules
- 🚀 Getting Started
- 🤝 Contributing
- License
- Acknowledgments
Cricket Classification is an audio classification system that uses deep learning to identify and categorize crickets from recordings of their sounds; the experiments reported below classify recordings by genus. The classifier is built and trained with the PyTorch Lightning framework and the ASTForAudioClassification model from Hugging Face's Transformers library. The code covers data preprocessing, feature extraction, model training, and evaluation, giving an end-to-end pipeline for cricket sound classification.
Experiment | Test Accuracy |
---|---|
5 genus classification | 97.00% |
8 genus classification | 94.40% |
10 genus classification | 89.51% |
These results are obtained on test data using an 80:20 train:test split. The train and test waveforms are split into 10-second segments with a 5-second overlap.
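
The split and segmentation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the fixed seed, and the choice to drop a trailing partial window are assumptions. Splitting by recording before segmenting is read from the wording above; it also keeps overlapping windows of one recording from landing on both sides of the split.

```python
import random
from typing import List, Sequence, Tuple


def split_recordings(paths: Sequence[str], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle recordings and split them into (train, test) lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def segment_waveform(samples: Sequence[float], sample_rate: int,
                     segment_seconds: float = 10.0,
                     overlap_seconds: float = 5.0) -> List[Sequence[float]]:
    """Cut a waveform into fixed-length windows that overlap their neighbours."""
    seg_len = int(round(segment_seconds * sample_rate))
    hop = int(round((segment_seconds - overlap_seconds) * sample_rate))
    if seg_len <= 0 or hop <= 0:
        raise ValueError("segment length must be positive and larger than the overlap")
    # Assumption: a trailing window shorter than seg_len is dropped, not padded.
    return [samples[start:start + seg_len]
            for start in range(0, len(samples) - seg_len + 1, hop)]
```

With the defaults, a 25-second recording yields four segments starting at 0, 5, 10, and 15 seconds.
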
.
├── config.json
├── data
│ ├── final_features
│ ├── raw_all_data
│ └── vad_processed
├── dataset.py
├── feature_extractor.py
├── helpers
│ ├── data.txt
│ ├── make_data.py
│ └── make_data_dir.sh
├── main.py
├── preprocess.py
├── readme.md
├── requirements.txt
├── run_pipeline.sh
├── utils.py
└── val.py
File | Summary |
---|---|
run_pipeline.sh | Runs the complete pipeline: preprocessing, feature extraction, and model training. |
preprocess.py | This script processes a set of audio files for machine learning purposes, using the Silero Voice Activity Detector (VAD) model to extract relevant speech segments. |
dataset.py | This script defines a CustomDataset class that inherits from PyTorch's Dataset class, tailored for processing audio data related to cricket sounds. |
utils.py | This script demonstrates how to remove human voice from an audio file using the Silero Voice Activity Detector (VAD) model. |
feature_extractor.py | This script extracts features from audio samples using a pre-trained feature extractor from the transformers library. The process_samples_in_batches function processes audio samples in batches, applying the feature extractor to each sample and storing the extracted features along with the sample's label. |
main.py | This script trains a cricket audio classifier using a pre-trained ASTForAudioClassification model from the transformers library. |
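
The voice-removal and batched feature-extraction steps summarized in the table can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the repository's implementation: `remove_speech` is a hypothetical helper that expects timestamps as a list of `{"start": ..., "end": ...}` sample-index dicts (the shape Silero VAD's `get_speech_timestamps` returns), and the signature given here for `process_samples_in_batches` is assumed, with the feature extractor passed in as a generic callable `extract_fn`.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple


def remove_speech(samples: Sequence[float],
                  speech_timestamps: List[Dict[str, int]]) -> List[float]:
    """Return the samples that fall outside every detected speech region."""
    kept: List[float] = []
    cursor = 0  # first sample index not yet copied or skipped
    for ts in sorted(speech_timestamps, key=lambda t: t["start"]):
        start = max(ts["start"], cursor)  # tolerate overlapping regions
        kept.extend(samples[cursor:start])
        cursor = max(cursor, ts["end"])
    kept.extend(samples[cursor:])
    return kept


def process_samples_in_batches(
    samples: Sequence[Any],
    labels: Sequence[Any],
    extract_fn: Callable[[Sequence[Any]], Sequence[Any]],
    batch_size: int = 8,
) -> Iterator[Tuple[Any, Any]]:
    """Yield (features, label) pairs, calling extract_fn on one batch at a time."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    for i in range(0, len(samples), batch_size):
        # extract_fn is assumed to return one feature object per input sample.
        features = extract_fn(samples[i:i + batch_size])
        yield from zip(features, labels[i:i + batch_size])
```

In the real pipeline `extract_fn` would presumably wrap the pre-trained extractor from the transformers library; passing it as a callable keeps the batching logic independent of that choice.
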
- Clone the cricket-classification repository:
git clone https://github.com/pvbhanuteja/cricket-classification
- Change to the project directory:
cd cricket-classification
- Install the dependencies:
pip install -r requirements.txt
# Update config.json with correct paths then run shell script
sh run_pipeline.sh
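
Before running the pipeline, it can help to confirm that the paths in config.json exist. The sketch below is a hypothetical helper, not part of the repository. It assumes nothing about the keys in config.json: it walks every string value, treats any value containing a path separator as a path, and reports the ones missing on disk.

```python
import json
import os
from typing import Any, Iterator, List, Tuple


def iter_string_values(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for every string found in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_string_values(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from iter_string_values(value, f"{prefix}[{i}]")
    elif isinstance(node, str):
        yield prefix, node


def missing_paths(config_file: str) -> List[Tuple[str, str]]:
    """Return (key, value) pairs whose value looks like a path but does not exist."""
    with open(config_file, encoding="utf-8") as fh:
        config = json.load(fh)
    # Heuristic (assumption): a string containing a separator is meant as a path.
    return [(key, value) for key, value in iter_string_values(config)
            if ("/" in value or os.sep in value) and not os.path.exists(value)]


if __name__ == "__main__":
    for key, value in missing_paths("config.json"):
        print(f"missing: {key} -> {value}")
```
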
Check out CONTRIBUTING.md for best practices and instructions on how to contribute to this project.
This project is licensed under the MIT License.
- Professor Dr. Yoonsuck Choe.
- This work was supported in part by the Texas Virtual Data Library (ViDaL) funded by the Texas A&M University Research Development Fund.