AudioFeature

This is a Swift port of the featurization portion of FAIR's wav2letter++, including implementations & tests for PowerSpectrum, Mfsc & Mfcc. These functions are part of a larger system described in their 2018 paper.

Background

I could not find a good spectrogram implementation in Swift, so I decided to port the /feature section of W2l. This will likely never be as fast as the C++ version, but I'm hoping to get as close as I can to performance parity.

Usage/Notes

This relies on BaseMath and SwiftyMKL for vector math. Adding the following flags to your SwiftPM command will yield the best performance (see the BaseMath documentation for details).

-Xswiftc -Ounchecked -Xcc -ffast-math -Xcc -O2 -Xcc -march=native

You will also need to have fftw, libsndfile and MKL installed and visible to the compiler & linker. For convenience, the SwiftyMKL Makefile has a target that will download and unzip the appropriate Intel libraries.

Mfsc and Mfcc support Double and Float. For example:

let input = try! loadSound("/any/file/name.wav", as: Float.self)
let mfsc = Mfsc<Float>()
mfsc.apply(on: input)

// or

let input = try! loadSound("/any/file/name.wav", as: Double.self)
let mfcc = Mfcc<Double>()
mfcc.apply(on: input)
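
For intuition about what the first stage of these featurizers produces, here is a minimal, dependency-free sketch of a power spectrum computed with a naive DFT. It is not this package's implementation (which relies on fftw and MKL), and the function name `naivePowerSpectrum` is hypothetical, introduced only for illustration.

```swift
import Foundation

// Hypothetical helper, not part of AudioFeature: a naive O(n^2) DFT
// that returns |X[k]|^2 for the non-negative frequency bins k = 0...n/2.
func naivePowerSpectrum(_ frame: [Double]) -> [Double] {
    let n = frame.count
    guard n > 0 else { return [] }
    var power = [Double](repeating: 0, count: n / 2 + 1)
    for k in 0...(n / 2) {
        var re = 0.0
        var im = 0.0
        for (t, x) in frame.enumerated() {
            let angle = -2.0 * Double.pi * Double(k) * Double(t) / Double(n)
            re += x * cos(angle)
            im += x * sin(angle)
        }
        // Squared magnitude of the complex DFT coefficient.
        power[k] = re * re + im * im
    }
    return power
}

// A pure tone that completes 4 cycles in a 32-sample frame
// concentrates its energy in bin 4.
let n = 32
let tone = (0..<n).map { sin(2.0 * Double.pi * 4.0 * Double($0) / Double(n)) }
let spectrum = naivePowerSpectrum(tone)
let peakBin = spectrum.indices.max(by: { spectrum[$0] < spectrum[$1] })!
print(peakBin) // prints 4
```

Mfsc and Mfcc build on a spectrum like this one by applying mel-scale filterbanks and, for Mfcc, a further cosine transform. A real implementation would use an FFT rather than this quadratic loop.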

Benchmarks

To run the benchmark for MFCC:

$ swift run -Xswiftc -Ounchecked -Xcc -ffast-math -Xcc -O3 -Xcc -march=native -c release