Add support for NAM files #143

Closed
christoph-hart opened this issue Jul 10, 2024 · 39 comments

@christoph-hart

Hi Jatin,

how hard would it be to add support for parsing the NAM file format?

https://github.com/sdatkinson/NeuralAmpModelerCore

Just from a quick peek at both sources the required layers are almost there (except for the wavenet layer which seems like a high level abstraction of existing low level layers).

I would love to avoid adding too many neural network engines to my project, so if you think it's doable I'll give it a shot.

@brummer10

@christoph-hart
I've created a project where I implemented both engines, so users can load .nam and .json/.aidax files without having to care which engine handles them. Implementing both engines is straightforward.
https://github.com/brummer10/Ratatouille.lv2

@christoph-hart
Author

Thanks for the input. Adding both engines is definitely an option, but I would love to avoid adding the big fat Eigen library, and RTNeural is already in there with what looks to me like 95% of the required feature set.

@jatinchowdhury18
Owner

Hi All!

I think it should be possible to construct a NAM-style model using RTNeural's layers. If I remember correctly, NAM uses a "Temporal Convolutional Network", and I have implemented a couple of those in the past using RTNeural's layers, although there are sometimes variations between those types of networks. Here's an example of a "micro-TCN" implementation that we use as part of RTNeural's test suite. Probably the best route forward would be to use that implementation as a starting point, add whatever might be missing from the NAM model architecture, and adapt the mechanism for loading model weights to match whatever format NAM models use to store their weights. I'd be happy to help with this process as my time allows.

That said, I'm not sure it would make sense to add support for NAM models directly to RTNeural, since I think it falls a little bit outside the scope of what RTNeural does. I do have some future plans for a sort of "model library" which would have example implementations of several neural network architectures that are commonly used in real-time audio (and maybe other real-time domains as well), and I think having NAM models as part of the model library would be great. However, there's some other changes I want to make to RTNeural before starting on that, so it may be a while before I get there.

@olilarkin

Also interested in this. sdatkinson/NeuralAmpModelerCore#49

@RustoMCSpit

RustoMCSpit commented Jul 17, 2024

maybe relevant? Chowdhury-DSP/BYOD#363

@brummer10

> Thanks for the input. Adding both engines is definitely an option, but I would love to avoid adding the big fat Eigen library, and RTNeural is already in there with what looks to me like 95% of the required feature set.

Just out of curiosity, I checked whether we could build NeuralAmpModelerCore against the Eigen library that comes with RTNeural, and yes, it works flawlessly. We could even share the JSON header.

@MaxPayne86
Contributor

My 2 cents: we want to avoid boilerplate code on the engine side or even inside the plugin, i.e. we don't really want RTNeural to have methods to parse .nam (or .aidax or whatever) model files (torch weights). Instead, we want to take the model file that comes out of a training repo and convert it to the format used by RTNeural.
For example: the Automated-GuitarAmpModelling repo uses torch and creates a model file that is not directly supported by RTNeural, mostly because RTNeural uses the Keras implementation as a reference. I've created this script that simply adapts the model file from Automated-GuitarAmpModelling into what's expected on the RTNeural side. The same could be done for .nam.

@baptistejamin

Since the two systems use mostly the same model types, I think NAM would have a very big incentive to adopt RTNeural. It would remove 90% of the complexity from: https://github.com/sdatkinson/NeuralAmpModelerCore/tree/main/NAM

The only blocker seems to be that WaveNet is not implemented (yet) in RTNeural.

@christoph-hart
Author

Coming back to this as the request from my users keeps popping up.

> The only blocker seems to be that WaveNet is not implemented (yet) in RTNeural.

I've tried to naively port over the wavenet.h and wavenet.cpp files from here to use the Layer<T> interface class and it looks like a very simple copy & paste job for someone who is proficient with either one of the libraries, however it's outside my comfort zone to offer a serious contribution here. I think we can even stay at a high level that doesn't even require branching into the different backends (Eigen, XSIMD, etc) since it just combines the existing layers.

> I've created this script that simply adapts the model file from Automated-GuitarAmpModelling into what's expected on the RTNeural side. The same could be done for .nam.

Yes, that could also work; I have no ambitions to bloat the RTNeural project. Whether it's a Python script or a conditionally compilable C++ class shouldn't make a big difference in the end (I would prefer the latter, but that's something I can add to my project then).

@jatinchowdhury18
Owner

Hi All,

Sorry for the long delay, I've been a bit busy the past few months. I had a look at implementing one of the NAM architectures in RTNeural a little while ago, and was able to make some progress, but haven't gotten it fully working just yet.

The main issue with re-exporting the weights of a NAM model into RTNeural's JSON format is that RTNeural's JSON format currently only supports "sequential" models, and I don't believe the WaveNet architecture is sequential.

Hopefully I can finish up the WaveNet architecture sometime in the next week or two. That said, in order to fully utilize RTNeural's performance capabilities, it would be preferable to be able to know the network architecture at compile-time, which could pose a problem if the intent is to create a "generic" NAM model loader. I have some ideas about this, but I'll worry about that after the basic implementation work is done.

Thanks,
Jatin

@mikeoliphant
Contributor

> That said, in order to fully utilize RTNeural's performance capabilities, it would be preferable to be able to know the network architecture at compile-time, which could pose a problem if the intent is to create a "generic" NAM model loader.

The vast majority of NAM models use the "standard" architecture. There are also three other, less commonly used, official WaveNet presets. Very, very few models will use any other architecture.

If both compile-time and dynamic architectures are supported, then compile-time architectures can be provided for the official presets, along with a dynamic fallback for less common architectures.
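
A minimal sketch of the "compile-time presets plus dynamic fallback" idea, assuming C++17. Every name here (`StaticModel`, `DynamicModel`, `make_model`, `process`) is hypothetical and the "DSP" is a placeholder multiplication; none of it is RTNeural or NAM API. It only shows how a `std::variant` can hold either a statically typed preset or a run-time configured model behind a single call site.

```cpp
#include <variant>

// Hypothetical stand-in for a model whose architecture is fixed at compile time.
template <int Channels>
struct StaticModel
{
    float forward(float x) const { return x * static_cast<float>(Channels); } // placeholder DSP
};

// Hypothetical stand-in for a model whose architecture is only known at run time.
struct DynamicModel
{
    int channels = 1;
    float forward(float x) const { return x * static_cast<float>(channels); } // placeholder DSP
};

using AnyModel = std::variant<StaticModel<16>, StaticModel<8>, DynamicModel>;

// Pick a compile-time preset when the configuration matches one, else fall back.
inline AnyModel make_model(int channels)
{
    switch (channels)
    {
        case 16: return StaticModel<16>{};
        case 8: return StaticModel<8>{};
        default: return DynamicModel{channels};
    }
}

// One call site for all alternatives; std::visit dispatches to the active type.
inline float process(const AnyModel& model, float x)
{
    return std::visit([x](const auto& m) { return m.forward(x); }, model);
}
```

In a real plugin the dispatch would presumably happen once per audio block rather than once per sample, so its cost stays small next to the inference itself.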

@baptistejamin

baptistejamin commented Oct 3, 2024

Aida-X could be the right approach; there is no need for thousands of model types so that architectures can be hard coded. LSTM is right for 99.5% of use cases.

The rationale is most users download their models from Tonehunt.

A script could remap Wavenet NAM files and retrain to LSTMs; if the system is optimized on a GPU, converting everything on ToneHunt in a reasonable amount of time could be possible.

This way, RTNeural does not need to change anything, nor NAM. We need a fat GPU for a month :)

I just made a proof of concept: https://github.com/baptistejamin/nam-to-rtneural. I haven't tested the result yet, but it should be compatible.

@rossbalch

Would that script preserve the expected sample rate of the model? NAM supports whatever sample rate your input and output pair uses, whereas the Aida-X script forces 48,000 Hz. Also, does this shrink the model? In my testing NAM is more accurate than the Aida-X models.

@baptistejamin

I tested with two different models (clean and high gain), and it sounded exactly the same.
RTNeural is so optimized that the LSTM model size could be increased a lot, but I don't think it's required to do this.
I own a Kemper, and to me an LSTM of size 16 does a way better job.

Yes, a 48 kHz sampling rate is required; however, NAM handles this by resampling. It's not actually the model that does this, but the plugin host.

The script does not actually shrink the model. It just generates sound from the NAM core using a NAM model, and then retrains to an LSTM compatible with RTNeural (Aida-x implementation).

So at the end, you have an LSTM model that is compatible with NAM + with RTNeural.

RTNeural Models are running 10x faster, enabling running models on lower-end CPUs / or hardware such as Raspberry Pi, with super low latency, and with a lot of extra CPU cycles available to run effects, cab simulation, etc.

Most NAM models are using WaveNets lately, while RTNeural models will be LSTMs; this is the key difference. Originally NAM was only LSTM-based, and people were pretty happy with it ;)

@rossbalch

Ok that's good to know. For a lot of my projects I'm cool with LSTM, however I often want to stack models at various points of the DSP chain and "oversampling" by feeding high sample rate models reduces aliasing, which isn't an issue with a single model, but does add up when you start sequencing them.

Just a clarification on how NAM currently works. If you feed the trainer with, say, 96 kHz files, it trains the model at this sample rate and notes the sample rate in the metadata; NAM will then resample the incoming audio to that sample rate. The model itself requires the audio to be at the sample rate of the input/output files, whether that's 48, 96, 192 kHz etc., because the weights are all based on that sample rate. Higher sample rate models show reduced aliasing, which, as I mentioned above, gives them a reason to exist.

So your script actually generates a model of a model? That would introduce even more loss wouldn't it? As now you are a further step away from the original capture files. Isn't this similar to converting an AAC into an Mp3?

Is there any reason we can't train LSTM models at 96 or 192 kHz and have RTNeural interpret those models? Having tried both NAM and AIDA-X, I've got to say, training NAM models is a lot easier, with many more options. This is what makes me think it is worth implementing NAM models rather than just converting them.

@rossbalch

Anyway, I don't mean to sound ungrateful, it's actually nice to have a pretty hands off way to re-train NAMs to Aida-X.

@baptistejamin

We could retrain at 96 kHz if it's something you are willing to explore. We can do this.

Implementing NAM models in RTNeural, and RTNeural in NAM, seems out of scope, unless RTNeural implements WaveNet. RTNeural is just a core system for machine learning, while NAM is dedicated to guitar amps and re-implements models in pure C++, but in a less optimized way.

The more optimized the inference engine is, the more powerful models we can get, allowing more layers, etc.

For instance, with RTNeural it is also possible (and Keith Bloemer already did this) to capture the effect of knobs, for example to simulate a fuzz pedal with Stab, Gain, etc. That is something that is not possible with NAM, and likely never will be.

@rossbalch

Is there a better place to continue this discussion that isn't clogging up the issue token?

@jatinchowdhury18
Owner

> Is there a better place to continue this discussion that isn't clogging up the issue token?

I'd be happy to open a channel/thread on the RTNeural Discord if people want to chat more on there?

Also, I know I've been saying this for a while, but I think I should finally have time this weekend to finish my NAM-style WaveNet implementation in RTNeural... we'll see how it goes!

@rossbalch

Sounds good, I'm already on the Discord.

@mikeoliphant
Contributor

> Also, I know I've been saying this for a while, but I think I should finally have time this weekend to finish my NAM-style WaveNet implementation in RTNeural... we'll see how it goes!

👍 I'm happy to test integrating it as soon as you've got something functional.

@MaxPayne86
Contributor

@baptistejamin @rossbalch for AIDA-X related topics in this thread you may be interested in moving here

https://github.com/AidaDSP/Automated-GuitarAmpModelling/issues/9

To all: once RTNeural engine has the support for Wavenet (and elaborations of it), then I can expand my current script to generate a json model for RTNeural. I still think it's the best way to support it, but as @jatinchowdhury18 pointed out, support for this arch needs to be implemented in the engine (I thought this was immediately possible since conv1d layers are already present in RTNeural, I was wrong). So the scenario could be having a python script with only torch as major dep, that does:

  • Automated-GuitarAmpModelling to RTNeural ✅
  • Automated-GuitarAmpModelling (Aida DSP fork) to RTNeural ✅ all models variants supported in AIDA-X ✅
  • Automated-GuitarAmpModelling (GuitarML fork) to RTNeural ✅
  • NAM to RTNeural 🔨
  • ToneX to RTNeural 🤞 🤞 🤞

For example, .aidax models on ToneHunt are just RTNeural compatible json files with the extension changed from .json to .aidax and a metadata section added as requested by ToneHunt. I would love to see something like this happen, if you have other ideas let's discuss on RTNeural Discord, and thanks again @jatinchowdhury18 for bringing this outstanding engine to life!

@rossbalch

ToneX would be an interesting one. I think their weights are encrypted based on my poking around in the SQL library.

@jatinchowdhury18
Owner

jatinchowdhury18 commented Oct 27, 2024

Okay, I finally have something useful to share on this. Thanks all for your patience. I've put together a repo with a demo of a NAM-style WaveNet model implemented in RTNeural: https://github.com/jatinchowdhury18/RTNeural-NAM

At the moment there seem to be some discrepancies between NAM's convolution layer and RTNeural's, so I'll need to debug that. There are also some missing bits (e.g. gated activations), but I don't think those should be too hard to add, now that the base implementation is in place.

The main "issue" I'm imagining for people wanting to use this is that the RTNeural model needs to be defined at compile-time, with parameters taken from the model configuration. For example:

wavenet::Wavenet_Model<float,
                       RTNeural::DefaultMathsProvider,
                       wavenet::Layer_Array<float, 1, 1, 8, 16, 3, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512>,
                       wavenet::Layer_Array<float, 16, 1, 1, 8, 3, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512>>
    rtneural_wavenet;

That type definition could be auto-generated without too much trouble, but that doesn't help much if you're planning to load the model at run-time. The RTNeural-Variant repo shows one way to deal with this issue, but it may not work well in this instance given how many parameter configurations NAM's WaveNet supports.
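
As a rough illustration of the auto-generation step, the sketch below builds a type string of the same shape as the one quoted above from plain lists of integers. The helper names (`layer_array_type`, `wavenet_model_type`) are hypothetical and not part of RTNeural-NAM, and the sketch deliberately makes no claim about what each integer means; in practice the values would come from the model's configuration.

```cpp
#include <string>
#include <vector>

// Builds "wavenet::Layer_Array<float, p0, p1, ...>" from a flat list of integers.
inline std::string layer_array_type(const std::vector<int>& params)
{
    std::string s = "wavenet::Layer_Array<float";
    for (int p : params)
        s += ", " + std::to_string(p);
    return s + ">";
}

// Builds the full model type from one integer list per layer array.
inline std::string wavenet_model_type(const std::vector<std::vector<int>>& layer_arrays)
{
    std::string s = "wavenet::Wavenet_Model<float, RTNeural::DefaultMathsProvider";
    for (const auto& params : layer_arrays)
        s += ", " + layer_array_type(params);
    return s + ">";
}
```

The resulting string could be written into a generated header at build time. That covers models known when the plugin is compiled, but, as noted above, not models loaded at run-time.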

On the bright side, in my test example, the RTNeural implementation is running ~2x faster than the NAM implementation on my M1 Macbook. Of course this isn't a fair comparison since the RTNeural implementation isn't correct yet, but it's a good sign! So far I've been using RTNeural's Eigen backend, but I'd love to get the wavenet working with the XSIMD backend as well to see if that might run a bit faster.

Anyway, if anyone's got some time and wants to help with testing/debugging the RTNeural WaveNet, feel free to jump in to the other repo. I'm hoping to have time to get back to it later this week or next weekend.

@MaxPayne86
Contributor

MaxPayne86 commented Oct 27, 2024

> XSIMD backend

I think that would be a blast. At the same time, in my experience this largely depends on the toolchain and target architecture. For example, I get zipper noise with XSIMD on the Mod Dwarf, and I cannot compile at all on the Chaos Audio Stratus, while XSIMD works fine when building AIDA-X in Yocto. So I would be happy to see it running in Eigen; then of course the more the better!

@mikeoliphant
Contributor

> On the bright side, in my test example, the RTNeural implementation is running ~2x faster than the NAM implementation on my M1 Macbook.

That is definitely encouraging!

What Tanh implementation are you using? When we switched to using a Tanh approximation in NAM it made a huge performance difference.

@jatinchowdhury18
Owner

> What Tanh implementation are you using? When we switched to using a Tanh approximation in NAM it made a huge performance difference.

At the moment, the RTNeural implementation is using Eigen's built-in tanh() method, which I think is what NAM uses by default as well. The idea with the MathsProvider template argument is to make it easy to drop in different implementations of tanh() (or other activation functions), without any run-time cost. However, for the moment I'd prefer to use the same operations on both sides (to the extent possible), just to make it easier to compare both accuracy and performance.
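
The general pattern of selecting a maths implementation through a template argument might look like the sketch below. The names `StdMaths`, `ClipMaths` and `TanhActivation` are hypothetical and this is not RTNeural's actual `MathsProvider` interface; the hard clip is only a crude stand-in used to show that the two providers are interchangeable.

```cpp
#include <cmath>

// Hypothetical provider: exact tanh from the standard library.
struct StdMaths
{
    static float tanh(float x) { return std::tanh(x); }
};

// Hypothetical provider with the same interface: a crude hard clip to [-1, 1].
struct ClipMaths
{
    static float tanh(float x) { return x < -1.0f ? -1.0f : (x > 1.0f ? 1.0f : x); }
};

// The provider is a template argument, so the choice is resolved at compile
// time and the call can be inlined; no virtual dispatch or branch is needed.
template <typename MathsProvider>
struct TanhActivation
{
    void forward(const float* in, float* out, int n) const
    {
        for (int i = 0; i < n; ++i)
            out[i] = MathsProvider::tanh(in[i]);
    }
};
```

Swapping `TanhActivation<StdMaths>` for `TanhActivation<ClipMaths>` changes the arithmetic without touching the layer code, which is the property described above.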

@mikeoliphant
Contributor

> At the moment, the RTNeural implementation is using Eigen's built-in tanh() method, which I think is what NAM uses by default as well.

The NAM fast tanh is optional, but the official plugin has been enabling it since I added the option:

https://github.com/sdatkinson/NeuralAmpModelerPlugin/blob/feafd19ffa025c1c54e51f626cb9b2cf64cc5cd4/NeuralAmpModeler/NeuralAmpModeler.cpp#L73

At the time, applying the tanh activation function was the top hit in the hot path and switching to the fast tanh approximation gave about a 40% performance improvement.
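
For readers unfamiliar with the technique, here is a minimal sketch of one common rational tanh approximation, assuming C++17 for `std::clamp`. It is not claimed to be the approximation used by NAM or Eigen; the function names are made up for illustration. The second helper measures how far the approximation deviates from `std::tanh`, which is the accuracy cost paid for the speed-up.

```cpp
#include <algorithm>
#include <cmath>

// Rational approximation tanh(x) ~= x (27 + x^2) / (27 + 9 x^2), with the input
// clamped to [-3, 3], where the expression reaches exactly -1 and +1.
inline float fast_tanh(float x)
{
    x = std::clamp(x, -3.0f, 3.0f);
    const float x2 = x * x;
    return x * (27.0f + x2) / (27.0f + 9.0f * x2);
}

// Largest absolute deviation from std::tanh on a uniform grid over [-range, range].
inline float max_abs_error(float range, int steps)
{
    float worst = 0.0f;
    for (int i = 0; i <= steps; ++i)
    {
        const float x = -range + 2.0f * range * static_cast<float>(i) / static_cast<float>(steps);
        worst = std::max(worst, std::fabs(fast_tanh(x) - std::tanh(x)));
    }
    return worst;
}
```

This particular formula trades a few hundredths of absolute error for the removal of the exponential, so whether it is acceptable depends on how audible that error is for a given model.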

@mikeoliphant
Contributor

Building on Windows (Visual Studio x64 Release) I get:

RTNeural is: 3.95238x faster

If I enable fast tanh for NAM, I get:

RTNeural is: 2.12809x faster

@mikeoliphant
Contributor

PR is here to fix the model weight loading:

jatinchowdhury18/RTNeural-NAM#1

@mikeoliphant
Contributor

@jatinchowdhury18 The latest version of RTNeural-NAM is looking very close to producing the same output as NAM - particularly if you let it warm up a bit.

If I run through the 2048 samples of data 4 times, and just look at the difference on the 4th run, the MSE is 2.36664e-10.
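
A comparison of this kind can be sketched as follows. The function name and the `skip` parameter are hypothetical; the idea is simply to compute the mean squared error between the two implementations' outputs while leaving out an initial warm-up stretch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean squared error between two equally sized buffers, skipping the first
// `skip` samples so that a warm-up period can be excluded from the comparison.
inline double mean_squared_error(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 std::size_t skip = 0)
{
    assert(a.size() == b.size() && skip < a.size());
    double sum = 0.0;
    for (std::size_t i = skip; i < a.size(); ++i)
    {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return sum / static_cast<double>(a.size() - skip);
}
```

Taking the square root of the result gives the RMS error, which is easier to read against the signal's amplitude.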

@jatinchowdhury18
Owner

> @jatinchowdhury18 The latest version of RTNeural-NAM is looking very close to producing the same output as NAM - particularly if you let it warm up a bit.

Yeah, in my local tests I've turned off pre-warming for the NAM implementation, and it's looking like the output is very close. If we add a similar "pre-warming" kind of thing to the RTNeural implementation, then we should be just about there (at least from an accuracy standpoint).

@mikeoliphant
Contributor

The need to pre-warm isn't unique to WaveNet - it is required for LSTM models as well. Probably makes sense to be consistent in how it is handled across network types.

Speaking of that, do you have a good feel for why the pre-warming is required for WaveNet? I get it for LSTM because of the recurrent state, but WaveNet doesn't have that issue...

@jatinchowdhury18
Owner

> The need to pre-warm isn't unique to WaveNet - it is required for LSTM models as well. Probably makes sense to be consistent in how it is handled across network types.

Agreed... I found it easier to test and compare some things with pre-warming turned off, but now that I'm getting equivalent output from both implementations, it's time to bring it back :).

> Speaking of that, do you have a good feel for why the pre-warming is required for WaveNet? I get it for LSTM because of the recurrent state, but WaveNet doesn't have that issue...

The way I've been interpreting it is that the pre-warming gives time for the bias in some of the layers to propagate through the whole network... although there might be a more general way to look at it that I'm not seeing.

@mikeoliphant
Contributor

> The way I've been interpreting it is that the pre-warming gives time for the bias in some of the layers to propagate through the whole network... although there might be a more general way to look at it that I'm not seeing.

I think the NAM code may also be doing a take on this:

https://github.com/tomlepaine/fast-wavenet

It seems to take advantage of the fact that the input is a moving sequence to cache convolution results.

I know that the NAM pre-warm sample size is determined by the receptive field, so that would fit.
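
For reference, the receptive field of a stack of dilated causal convolutions that share one kernel size is 1 plus the sum of (kernel size - 1) × dilation over all layers. The sketch below computes it; the function name is hypothetical, and whether NAM derives its pre-warm length in exactly this way is an assumption not confirmed in this thread.

```cpp
#include <vector>

// Receptive field (in samples) of a stack of causal 1-D convolutions that all
// share one kernel size: 1 + sum over layers of (kernel_size - 1) * dilation.
inline long receptive_field(int kernel_size, const std::vector<int>& dilations)
{
    long field = 1;
    for (int d : dilations)
        field += static_cast<long>(kernel_size - 1) * d;
    return field;
}
```

With dilations 1, 2, 4, ..., 512 and a kernel size of 3 (assumed here to be the 3 that precedes the dilation list in the type definition quoted earlier in the thread), one such stack sees 2047 samples. Until that many input samples have passed through, part of the window is still filled with the initial (typically zero) state, which would be consistent with the output only matching after a warm-up.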

@mikeoliphant
Contributor

Just did my first integration testing of the current RTNeural-NAM code with my nam-lv2 plugin. It works, and sounds right!

Performance-wise, it is currently about the same as the NAM core implementation (with NAM core using fast tanh, and RTNeural WaveNet using the Eigen fast tanh). At least on Windows/x64.

@jatinchowdhury18
Owner

Okay, I think the implementation provided in RTNeural-NAM is mostly done. The only thing missing (at least that I'm aware of) is "gated" activations, but I don't think that should be too hard to implement. At the moment, both RTNeural's XSIMD and Eigen backends are supported.

From an accuracy perspective, I'm getting an RMS error of ~6.8e-8 when using std::tanh as the Tanh activation function in both implementations. In my final tests I was using different Tanh implementations, and ended up with an RMS error of 0.0012, which I think is acceptable.

From a performance perspective, I did some tests on my Windows machine (CPU: AMD Ryzen 7 7840HS), and was seeing somewhere between 1.5-2x improvement over the NAM implementation, depending on the backend. I also set up a little test plugin to compare the two implementations. Testing on my M1 Mac, I'm seeing a nice performance boost there as well, especially with the XSIMD backend.

I do think there's some room for more performance improvements... That said, I think I've done about all I've got time for at the moment. I'm pretty busy with some other projects, and I don't really have an immediate use-case for this NAM code. Besides there's some other things in RTNeural that I'd like to spend some more time on.

Unless there are any other concerns, I'm going to go ahead and close this issue.

@mikeoliphant
Contributor

While I'm seeing about a 1.5x advantage for the RTNeural implementation in the tests, I'm not seeing that advantage in actual practice (ie: running as a plugin).

It took me a while to figure out why. It turns out that making the number of samples the test runs over a constant makes a big difference, presumably because it allows the compiler to unroll the forward() loop and/or do a better job of vectorization.

If I use a variable for N instead of a constant, I see slightly worse performance for RTNeural (Windows/x64/Eigen).

Obviously, the sample buffer size is not going to be fixed in practice, but determined by the context. I wonder, though, if there is any way to tell the compiler that the buffer will be at least a certain size.

@mikeoliphant
Contributor

mikeoliphant commented Nov 13, 2024

Actually, it seems like the const vs variable number of samples was a red herring.

The real difference seems to be that while RTNeural performs linearly with sample buffer size, NAM starts to fall down as you go above 2048 or so. The current test is using a 32768 sample buffer.

In my plugin testing, I'm using a 64 sample buffer.
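
To compare implementations at realistic buffer sizes, a benchmark can feed the same signal in host-sized blocks instead of one large buffer. The harness below is a hypothetical sketch (`BlockTiming` and `time_in_blocks` are made-up names, and `process` stands for whichever model call is being measured); it is not the test code from RTNeural-NAM.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct BlockTiming
{
    std::size_t blocks = 0; // number of process calls made
    double seconds = 0.0;   // wall-clock time spent across those calls
};

// Feeds `signal` to `process(float* data, std::size_t n)` in chunks of at most
// `block_size` samples and measures the total elapsed time.
template <typename ProcessFn>
BlockTiming time_in_blocks(std::vector<float>& signal, std::size_t block_size, ProcessFn&& process)
{
    assert(block_size > 0);
    BlockTiming result;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t pos = 0; pos < signal.size(); pos += block_size)
    {
        const std::size_t n = std::min(block_size, signal.size() - pos);
        process(signal.data() + pos, n);
        ++result.blocks;
    }
    result.seconds = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    return result;
}
```

Running the same harness with block sizes such as 64, 2048 and 32768 would show whether an implementation's cost per sample depends on the buffer size, which is the effect described above.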

9 participants