This repository implements a distributed task-aware source coding framework focused on encoding and decoding correlated audio signals using perceptual loss. The main approach involves encoding audio files as spectrograms, using an autoencoder model for compact representation, and applying audio denoising for improved reconstruction.
This project uses task-aware source coding to efficiently encode highly correlated audio signals. The pipeline converts audio into spectrograms, trains an autoencoder with a perceptual loss to obtain low-dimensional representations that are cheaper to store and transmit, and denoises the decoded audio to improve clarity.
.
├── data/ # Folder for input audio files
├── data_loaders/ # Dataloading pipeline
├── dtac/ # Core distributed task-aware coding functions
├── summary/ # Summary files and experiment logs
├── audio_DAE.py # Script for audio denoising autoencoder
├── spectrogram.ipynb # Notebook for spectrogram conversion
├── testing_script.py # Script for encoder-decoder testing
├── testing_scripts.ipynb # Notebook for model testing and evaluation
├── train_de_noising_audio.py # Training script for denoising audio
├── train_de_noising_images.py # Training script for denoising images
├── requirements.txt # Project dependencies
├── README.md # This readme file
└── LICENSE # License information
- Spectrogram Generation: The `spectrogram.ipynb` notebook converts audio files to spectrograms for feature extraction.
- Denoising Autoencoder: The `audio_DAE.py` script and the `train_de_noising_audio.py` file work together to denoise audio, minimizing noise while retaining essential information.
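To make the spectrogram step concrete, here is a minimal, dependency-free sketch of a magnitude spectrogram computed with a short-time Fourier transform. It is not the code in `spectrogram.ipynb`; the function names, the Hann window, and the `n_fft` and `hop` values are assumptions chosen for illustration, and a synthetic tone stands in for a real audio file.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(samples, n_fft=64, hop=32):
    """Return a list of frames; each frame holds n_fft // 2 + 1 magnitudes."""
    window = hann_window(n_fft)
    n_bins = n_fft // 2 + 1
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(n_fft)]
        # Naive DFT of one windowed frame (O(n_fft^2); fine for a small demo).
        spectrum = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic 1 kHz tone sampled at 8 kHz, standing in for a real recording.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000 * t / sample_rate) for t in range(1024)]
spec = magnitude_spectrogram(tone)

# The strongest bin of the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames,", len(spec[0]), "bins, peak at",
      peak_bin * sample_rate / 64, "Hz")
```

Each row of `spec` is one time frame and each column one frequency bin, which is the 2-D layout an image-style autoencoder can consume.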
The core coding pipeline (`dtac`) encodes correlated audio signals and leverages perceptual loss to ensure the reconstructed audio maintains high intelligibility. The process:
- Encodes audio to spectrograms, with perceptual loss guiding model focus on important features.
- Enhances decoded audio using speech enhancement techniques.
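The exact perceptual loss used in `dtac` is not described here. As a hedged illustration of the idea, the sketch below uses the log-spectral distance, a simple stand-in that compares spectrograms on a decibel scale so that errors are weighted closer to how loudness is perceived than a plain squared error on magnitudes would be. The function name and the `eps` guard are assumptions.

```python
import math


def log_spectral_distance(reference, estimate, eps=1e-8):
    """Mean over frames of the RMS difference between log-magnitude spectra, in dB.

    Both arguments are lists of frames, each frame a list of non-negative
    magnitudes (for example the output of an STFT).
    """
    total = 0.0
    for ref_frame, est_frame in zip(reference, estimate):
        squared = [
            (20.0 * math.log10(r + eps) - 20.0 * math.log10(e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        total += math.sqrt(sum(squared) / len(squared))
    return total / len(reference)


# Toy spectrograms: the second is the first scaled by 10 (a 20 dB gain).
reference = [[1.0, 2.0], [0.5, 4.0]]
scaled = [[10.0 * v for v in frame] for frame in reference]

same = log_spectral_distance(reference, reference)
gain = log_spectral_distance(reference, scaled)
print(same, gain)
```

In a training loop, a differentiable version of such a distance would be minimized between the decoder output and the original spectrogram.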
Clone this repository and install dependencies:
git clone https://github.com/ahmd-mohsin/distributed-task-aware-source-coding.git
cd distributed-task-aware-source-coding
pip install -r requirements.txt
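Convert audio files into spectrograms for encoding. `spectrogram.ipynb` is a Jupyter notebook, so it cannot be run with `python` or passed command-line flags directly. Open it in Jupyter, or execute it headlessly with nbconvert after setting the input (`data/`) and output (`data/spectrograms`) paths inside the notebook:
jupyter nbconvert --to notebook --execute spectrogram.ipynb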
To train the denoising autoencoder, use:
python train_de_noising_audio.py --data data/spectrograms --output models/
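A denoising autoencoder learns from pairs of a corrupted input and its clean target. How `train_de_noising_audio.py` builds those pairs is not shown here; the sketch below assumes additive Gaussian noise at a chosen signal-to-noise ratio, and the helper names `add_noise` and `snr_db` are hypothetical.

```python
import math
import random


def add_noise(clean, target_snr_db, rng):
    """Return clean plus white Gaussian noise scaled to the requested SNR."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = signal_power / (10.0 ** (target_snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in clean]


def snr_db(clean, noisy):
    """Measured signal-to-noise ratio of noisy relative to clean, in dB."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the example is repeatable

# A synthetic 440 Hz tone at 16 kHz stands in for a clean training clip.
clean = [math.sin(2.0 * math.pi * 440 * t / 16000) for t in range(4000)]
noisy = add_noise(clean, target_snr_db=10.0, rng=rng)

# (noisy, clean) is one training pair: the model sees noisy, is scored on clean.
measured = snr_db(clean, noisy)
print(round(measured, 2))
```

The same `snr_db` helper can be applied to the model's output to check whether denoising raised the SNR relative to the noisy input.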
Run the testing script to evaluate the autoencoder’s performance:
python testing_script.py --input data/spectrograms --output data/enhanced_audio
Generated audio and visual results are saved in the `data/` and `summary/` folders. Key files include:
- Reconstructed Audio: `reconstructed_audio.wav`
- Visual Representations: Images such as `original_clean_image.png`, `transformed_noisy_image_1.png`
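A quick way to sanity-check a generated file such as `reconstructed_audio.wav` is to read its header with Python's standard-library `wave` module. The sketch below first writes a short synthetic file so that it runs without the repository's outputs; the 16-bit mono format, the 16 kHz sample rate, and the `describe_wav` helper are assumptions.

```python
import math
import os
import struct
import tempfile
import wave


def describe_wav(path):
    """Return channel count, sample rate and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


# Write a one-second stand-in file (16-bit mono, 16 kHz) to a temp folder.
path = os.path.join(tempfile.mkdtemp(), "reconstructed_audio.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    wav.setframerate(16000)
    samples = [
        int(12000 * math.sin(2.0 * math.pi * 440 * t / 16000))
        for t in range(16000)
    ]
    wav.writeframes(struct.pack("<%dh" % len(samples), *samples))

info = describe_wav(path)
print(info)
```

Pointing `describe_wav` at the real output path confirms that the file is readable and has the expected length before listening to it.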
Contributions to improve the functionality and efficiency of this project are welcome. Please submit issues or pull requests with improvements.
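If the pesq package fails to build during installation, upgrading setuptools and installing it with PEP 517 may help:
pip install --upgrade setuptools
pip install --use-pep517 pesq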
This project is licensed under the MIT License.