Is training supposed to take weeks? #50

Open

jhlq opened this issue Jul 4, 2024 · 4 comments

@jhlq
jhlq commented Jul 4, 2024

Hi!
I cloned your repo (I couldn't add it from the package manager because of conflicting compat requirements involving CUDA) and after some tinkering managed to get it to work. However, the initial example in the README takes 10 hours per step (the callback triggered once overnight). If I change the strategy max_iter from 1000 to 1, it takes about 5 minutes per step. I also simplified the model chain to just 3 small layers, so it is surprising that it takes this long...

My laptop is a few years old with 8 GB RAM. Is the initial example supposed to take weeks to complete on such hardware, or did I introduce a bug with my tinkering?
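For context, these are the kinds of knobs I was tinkering with — a sketch assuming the NeuralPDE.jl-style setup from the README (`pde_system` stands in for the README's problem definition; the layer sizes are just my simplified chain):

```julia
using NeuralPDE, Lux, Optimization, OptimizationOptimisers

# Simplified model: just 3 small layers
chain = Lux.Chain(Lux.Dense(3, 16, tanh), Lux.Dense(16, 16, tanh), Lux.Dense(16, 1))

# QuadratureTraining's maxiters is the "strategy max_iter" I changed from 1000 to 1
strategy = NeuralPDE.QuadratureTraining(; maxiters = 1)

discretization = NeuralPDE.PhysicsInformedNN(chain, strategy)
prob = NeuralPDE.discretize(pde_system, discretization)  # pde_system: as in the README

res = Optimization.solve(prob, OptimizationOptimisers.Adam(0.01); maxiters = 100,
    callback = (p, l) -> (println("loss = $l"); false))
```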

@killah-t-cell (Owner)
You'll have more luck with a GPU, but this code is old and unmaintained. You can probably write something better from scratch these days!

@jhlq (Author)
jhlq commented Jul 5, 2024

I simplified your equations and now it finds a solution in seconds.

Are you still involved in plasma research, even if not this repo?

@killah-t-cell (Owner)
How did you simplify them? I'm no longer doing physics work :)

@jhlq (Author)
jhlq commented Jul 6, 2024

I'm actually not sure. At first I simply set E and B to 0 and it was lightning fast, but even with E active in the 1D case, training takes about a second per step. I did use the approximation f(t, x, |v|) and set the integral domain to (0, 1), but that couldn't possibly account for an improvement factor of 30,000... The BFGS algorithm starts out 5x slower than ADAM and sometimes spikes to 100x slower, whereas ADAM is consistent at 1 s/step, so I used ADAM instead.

You can inspect the code here: https://gitlab.com/marcus.appelros/fusion/-/blob/main/neural/Vlasov0.jl
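The optimizer swap itself was just this — a sketch assuming the Optimization.jl interface that NeuralPDE's `discretize` produces problems for:

```julia
using Optimization, OptimizationOptimisers, OptimizationOptimJL

# `prob` is the OptimizationProblem returned by NeuralPDE.discretize (not shown here).
# Adam was consistent at ~1 s/step for me:
res = Optimization.solve(prob, OptimizationOptimisers.Adam(0.01); maxiters = 1000)

# BFGS can be warm-started from Adam's result, but its per-step cost spiked for me:
# prob2 = remake(prob, u0 = res.u)
# res2 = Optimization.solve(prob2, OptimizationOptimJL.BFGS(); maxiters = 200)
```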

I'm still working on the |v| approximation in 3D. If I add v_norm(vs...) as a variable along with the equation v_norm = norm(vs), then the problem construction complains that v_norm in f(t, xs..., v_norm(vs...)) in sys.dvs doesn't have a name, i.e. nameof(v_norm) errors. Do you know of a workaround?
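One thing I'm considering trying (untested; assumes Symbolics.jl's `@register_symbolic` behavior): define v_norm as a plain, named Julia function and register it, rather than introducing it as an anonymous symbolic variable — or skip the extra variable entirely and inline the norm expression into f's arguments:

```julia
using Symbolics

# A named Julia function, so nameof(v_norm) returns :v_norm
v_norm(v1, v2, v3) = sqrt(v1^2 + v2^2 + v3^2)

# Make Symbolics treat v_norm(...) as an opaque symbolic call
# instead of tracing through the sqrt:
@register_symbolic v_norm(v1, v2, v3)
```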
