Derivatives are different from one run to the next #403

Open

abaillod opened this issue Apr 17, 2024 · 4 comments

@abaillod (Contributor)

Hi!

I realized that evaluating the same derivative twice is not guaranteed to give identical results in some cases. For example, running

surf.dgamma_by_dcoeff_vjp(np.ones((1,))) - surf.dgamma_by_dcoeff_vjp(np.ones((1,))) 

for some surf that is a SurfaceRZFourier instance, I do not get zero as expected; for example, it returns

array([ 0.00000000e+00, -5.68434189e-14,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -3.15544362e-30,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -1.57772181e-30,  0.00000000e+00, -6.31088724e-30,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00, -6.31088724e-30,  6.31088724e-30,  0.00000000e+00,
        0.00000000e+00,  6.31088724e-30,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  6.31088724e-30,
        0.00000000e+00,  6.31088724e-30,  0.00000000e+00,  0.00000000e+00,
        2.52435490e-29,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  3.15544362e-30,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00, -3.15544362e-30,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  1.57772181e-30,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00])

Now this difference is negligible on its own; however, when running the single-stage example, examples/3_Advanced/single_stage_optimization.py, these errors accumulate, and after 100 iterations the optimum found can differ by 10^-3 in the degrees of freedom from one run to the next, which is an issue.

To be clear: when running the same script twice, with the same initial guess, I obtain different optima.

Is this a known problem? Does anyone know how to have reproducible runs?

abaillod added the bug label on Apr 17, 2024
@andrewgiuliani (Contributor) commented Apr 18, 2024

Hi Antoine,

I wouldn't say that this is a bug; it is simply an artifact of using multithreaded code. The discrepancy is due to OpenMP and goes away when you only allow simsopt to use one thread:
export OMP_NUM_THREADS=1

Specifically, the critical region here is the source of the problem:
https://github.com/hiddenSymmetries/simsopt/blob/38a20cb8a0bcb6aab6d7250e595fd47b5f720ecf/src/simsoptpp/surfacerzfourier.cpp#L609C2-L609C21

The critical region doesn't guarantee that the threads all write to resptr in the same order, and because floating-point addition is not associative, the order of the additions changes the result by a tiny amount. This would be fixed by adding something that guarantees the threads always write in the same order.
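
As a standalone illustration only (a toy program with made-up data, not the simsopt code), the sketch below mimics the same pattern: each thread builds a private partial sum and then adds it to a shared total inside a critical region. Compiled with OpenMP and more than one thread, the last digits of the printed total can change from run to run:

#include <cstdio>
#include <vector>

int main() {
    const int n = 1000000;
    std::vector<double> x(n);
    for (int i = 0; i < n; ++i) x[i] = 1.0 / (i + 1.0);

    double total = 0.0;
    #pragma omp parallel
    {
        // Each thread accumulates a private partial sum.
        double partial = 0.0;
        #pragma omp for
        for (int i = 0; i < n; ++i) partial += x[i];

        // Threads add their partials in whatever order they reach the
        // critical region, so the rounding of 'total' varies between runs.
        #pragma omp critical
        total += partial;
    }
    std::printf("%.17g\n", total);
    return 0;
}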

@abaillod (Contributor, Author)

You are right, setting export OMP_NUM_THREADS=1 solves the problem.

I think it would be worth fixing, as it is currently impossible to get the same result twice, which is obviously an issue for reproducibility. As an example, consider the examples/3_Advanced/single_stage_optimization.py example, and change line 35 to MAXITER_single_stage = 100.

First run gives

...
fun#131: Objective function = 0.2159
fun#132: Objective function = 0.2159
Aspect ratio after optimization: 7.048877300334167
Mean iota after optimization: -0.9001021369022315
Quasisymmetry objective after optimization: 0.21345053416150483
Magnetic well after optimization: -0.14651386077848788
Squared flux after optimization: 8.485373878526792e-05

While running a second time gives

...
fun#121: Objective function = 0.2159
fun#122: Objective function = 0.2159
Aspect ratio after optimization: 7.04875160522698
Mean iota after optimization: -0.9002470575161438
Quasisymmetry objective after optimization: 0.21346284023142553
Magnetic well after optimization: -0.14653190866717544
Squared flux after optimization: 8.81152422891433e-05

Note that the aspect ratio, mean iota, QS error, magnetic well, and squared flux all differ between the two runs. In addition, the first run required 132 iterations to converge, while the second one took only 122. If you increase the weight on the coils, coils_objective_weight = ..., the difference between the two results grows even larger.

Any suggestions on how to fix that? Something like this? https://stackoverflow.com/questions/27199255/openmp-ordering-critical-sections

@andrewgiuliani (Contributor) commented Apr 20, 2024

There are a couple of ways of remedying this.

  1. Replacing the critical regions with something that guarantees an ordering. I haven't tested this, but you could check if something like:
#pragma omp for ordered
for (int t = 0; t < omp_get_num_threads(); ++t) {
    #pragma omp ordered
    {
        for (int i = 0; i < num_dofs(); ++i) {
            resptr[i] += resptr_private[i];
        }
    }
}

might work and fits the bill. I'd also like to see how much this slows things down, since the threads must now wait and write in the same order.

  2. Each thread could save its contribution to a separate array, e.g. resptr_private[thread_id, i], in the multithreaded region of the code. Then, after the multithreaded region is exited, thread 0 sums the contributions into resptr. This approach uses more buffer memory; a sketch follows below.
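
A minimal sketch of this second approach (again a toy program with assumed names such as ndofs and res_private and a dummy kernel, not the actual simsopt code): every thread accumulates into its own row of a buffer, and the rows are summed serially in a fixed order after the parallel region, so the floating-point additions happen in the same order on every run.

#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    const int ndofs = 8;         // stand-in for num_dofs()
    const int npoints = 100000;  // stand-in for the quadrature-point loop
    std::vector<double> res(ndofs, 0.0);

    const int nthreads = omp_get_max_threads();
    // One private row per thread instead of a shared critical region.
    std::vector<double> res_private(nthreads * ndofs, 0.0);

    #pragma omp parallel
    {
        double* row = &res_private[omp_get_thread_num() * ndofs];
        #pragma omp for
        for (int p = 0; p < npoints; ++p)
            for (int i = 0; i < ndofs; ++i)
                row[i] += 1.0 / (p + i + 1.0);  // dummy kernel
        // No critical region needed: each thread only touches its own row.
    }

    // Serial reduction in a fixed (thread-id) order gives a deterministic
    // result, at the cost of nthreads * ndofs extra doubles of buffer memory.
    for (int t = 0; t < nthreads; ++t)
        for (int i = 0; i < ndofs; ++i)
            res[i] += res_private[t * ndofs + i];

    std::printf("%.17g\n", res[0]);
    return 0;
}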

abaillod removed the bug label on Apr 22, 2024
@abaillod (Contributor, Author)

Thank you @andrewgiuliani, what you proposed fixed the issue. When evaluating the same derivative twice, the results now match to machine precision.

However, when running the examples/3_Advanced/single_stage_optimization.py example with MAXITER_single_stage = 1E3, I still get discrepancies. For example, running with sbatch -N 1 --ntasks-per-node=32, I get the following two results:

< fun#1001: Objective function = 0.1230
< fun#1002: Objective function = 0.1230
< Aspect ratio after optimization: 7.033478399904086
< Mean iota after optimization: -0.0007434078919211339
< Quasisymmetry objective after optimization: 0.09954032247447002
< Magnetic well after optimization: -0.001651371860370606
< Squared flux after optimization: 2.2158571643393266e-05
---
> fun#1070: Objective function = 0.1231
> fun#1071: Objective function = 0.1231
> Aspect ratio after optimization: 7.033602930037235
> Mean iota after optimization: -0.0007454067291868244
> Quasisymmetry objective after optimization: 0.09966106730240758
> Magnetic well after optimization: -0.0016879256487739636
> Squared flux after optimization: 2.207314699559636e-05

Interestingly, running with --ntasks-per-node=1, I get the same result on two different runs, down to machine precision, even after 10k iterations.

There is thus an issue with the OpenMP parallelization somewhere else in simsopt. I would be interested to know if anyone has a suggestion for where to look!
