This project separates audio from noise using k-means clustering, based on the idea from the paper: L. Marchegiani and I. Posner, "Leveraging the urban soundscape: Auditory perception for smart vehicles," 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 6547-6554, doi: 10.1109/ICRA.2017.7989774 [1].
Dependencies:
- numpy
- cv2
- IPython
- matplotlib
- skimage
- sklearn
- pandas
- argparse
- librosa
- scipy
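The imports above map to the following PyPI package names (`argparse` ships with the Python standard library, so it needs no install); a typical setup, assuming `pip` is available:

```shell
# PyPI names for the imported modules: cv2 -> opencv-python,
# skimage -> scikit-image, sklearn -> scikit-learn
pip install numpy opencv-python ipython matplotlib scikit-image scikit-learn pandas librosa scipy
```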
The system consists of two main scripts:
`spec.py`
- This script creates grayscale mel-spectrogram images, with a band-pass filter applied, from audio files listed in a CSV file that you have already created.
`k-means.py`
- This script clusters the grayscale mel-spectrogram images into several clusters, then creates a binary mask based on a threshold that you choose.
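The clustering-and-masking step can be sketched as follows, assuming k-means on pixel intensities with scikit-learn; the helper name, `n_clusters`, and `threshold` values are hypothetical defaults, since the scripts let you choose these yourself:

```python
import numpy as np
from sklearn.cluster import KMeans

def melspec_to_mask(gray_img, n_clusters=3, threshold=128):
    """Cluster pixel intensities with k-means, then binarise by a threshold.

    n_clusters and threshold are assumed values for illustration only.
    """
    h, w = gray_img.shape
    # Treat each pixel's intensity as a 1-D sample for k-means
    pixels = gray_img.reshape(-1, 1).astype(np.float64)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pixels)

    # Replace each pixel with the centre of the cluster it was assigned to
    quantised = km.cluster_centers_[km.labels_].reshape(h, w)

    # Keep clusters whose centre exceeds the chosen threshold
    return (quantised >= threshold).astype(np.uint8)

# Demo on random grayscale data; the mask contains only 0s and 1s
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
mask = melspec_to_mask(demo)
print(mask.shape)
```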
- Run `spec.py` first to create the grayscale mel-spectrogram images, then run `k-means.py` to create the binary mask.
- Individual implementations using Jupyter Notebook are also provided in `note_masking.ipynb` and `note_filter.ipynb`.
You can check the spectrogram and k-means results in the image files in the directories Ground_truth/mel_spec and Ground_truth/mask, respectively.