This is a Swift port of the featurization portion of FAIR's wav2letter++, including implementations & tests for PowerSpectrum, Mfsc & Mfcc. These functions are part of a larger system described in their 2018 paper.
I could not find a good spectrogram implementation in Swift, so I decided to port the /feature section of W2l. This will likely never be as fast as the C++ version, but I'm hoping to get as close as I can to performance parity.
This relies on BaseMath and SwiftyMKL for vector math. Adding the following flags to your SwiftPM command will yield the best performance (see the BaseMath documentation for details):
-Xswiftc -Ounchecked -Xcc -ffast-math -Xcc -O2 -Xcc -march=native
You will also need to have fftw, libsndfile and MKL installed and visible to the compiler & linker. For convenience, the SwiftyMKL Makefile has a target that will download and unzip the appropriate Intel libraries.
Mfsc and Mfcc support Double and Float. For example:
let input = try! loadSound("/any/file/name.wav", as: Float.self)
let mfsc = Mfsc<Float>()
mfsc.apply(on: input)
// or
let input = try! loadSound("/any/file/name.wav", as: Double.self)
let mfcc = Mfcc<Double>()
mfcc.apply(on: input)
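For intuition about what a power spectrum is, here is a minimal sketch of the textbook definition: the squared magnitude of each DFT bin of a single frame. It is not this package's implementation. The function name naivePowerSpectrum is hypothetical, it uses a slow O(n²) DFT instead of fftw/MKL, and it applies no windowing, pre-emphasis or scaling, so its output should not be expected to match PowerSpectrum here.

```swift
import Foundation

// Naive O(n^2) DFT power spectrum, for illustration only.
// `naivePowerSpectrum` is a hypothetical helper, not part of this package's API.
func naivePowerSpectrum(_ frame: [Double]) -> [Double] {
    let n = frame.count
    guard n > 0 else { return [] }
    // A real-valued input has a symmetric spectrum, so keep bins 0...n/2 only.
    return (0...(n / 2)).map { k -> Double in
        var re = 0.0
        var im = 0.0
        for (t, x) in frame.enumerated() {
            let angle = -2.0 * Double.pi * Double(k) * Double(t) / Double(n)
            re += x * cos(angle)
            im += x * sin(angle)
        }
        // Squared magnitude of bin k.
        return re * re + im * im
    }
}

// A cosine with exactly 2 cycles across 8 samples concentrates its energy in bin 2.
let frame = (0..<8).map { cos(2.0 * Double.pi * 2.0 * Double($0) / 8.0) }
let spectrum = naivePowerSpectrum(frame)
```

With this 8-sample frame the result has 5 bins (0 through 4), and only bin 2 is significantly non-zero. Mfsc and Mfcc build on a spectrum like this by applying a mel filterbank and, for Mfcc, a further cosine transform.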
To run the benchmark for MFCC:
$ swift run -Xswiftc -Ounchecked -Xcc -ffast-math -Xcc -O3 -Xcc -march=native -c release