Update to recent versions of TensorFlow #90
After updating the optimizer to the legacy version, running the validation test on the shape metrics resulted in the following error:
assert ratio_rmse_e1 < tol
E assert 2.450450287128092e-09 < 1e-09
Loosening the tolerance to 1e-8 gives this error instead:
assert ratio_rel_rmse_e1 < tol
E assert 1.647135320670401e-05 < 1e-08 |
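To make the two failures above easier to read, here is a minimal sketch of the tolerance check. The ratio values are copied from the assertion output; the helper `relative_deviation` is hypothetical and only illustrates one way such a ratio could be computed, since how the WaveDiff validation test actually computes it is not shown in this thread.

```python
def relative_deviation(new_value, reference_value):
    """Hypothetical helper: relative difference between a new metric and its reference."""
    return abs(new_value - reference_value) / abs(reference_value)


def passes(ratio, tol):
    """Mirror the shape of the failing assertions: `assert ratio < tol`."""
    return ratio < tol


# Values reported in the assertion output above.
ratio_rmse_e1 = 2.450450287128092e-09
ratio_rel_rmse_e1 = 1.647135320670401e-05

# The absolute-RMSE ratio fails at 1e-9 but would pass at 1e-8.
rmse_ok_at_1e9 = passes(ratio_rmse_e1, 1e-9)
rmse_ok_at_1e8 = passes(ratio_rmse_e1, 1e-8)

# The relative-RMSE ratio is still about three orders of magnitude above 1e-8.
rel_rmse_ok_at_1e8 = passes(ratio_rel_rmse_e1, 1e-8)
```

Read this way, the first failure is a near miss that the looser tolerance absorbs, while the second is a larger discrepancy that the looser tolerance does not cover.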
Thanks @nadamoukaddem for this update. Just to confirm, did you use the same random seed and learning rates for both runs? |
Yes, I used the same random seed and learning rates for both runs. Here is the training_config.yaml file: training:
# Training hyperparameters
|
I remember I used the same TensorFlow 2.15.0 version for both optimizers. It's been a while since I ran these tests, so I need to rerun them. |
Okay. Could you run the tests 2-3 more times using different random numbers to see if the results are consistent? Make sure to use the same set of random numbers for both optimizers. |
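A minimal sketch of the comparison protocol requested above: every seed is paired with both optimizers, so any difference in the metrics can be attributed to the optimizer rather than to the random numbers. The seed values and optimizer labels are hypothetical placeholders, not names taken from the WaveDiff configuration files.

```python
from itertools import product

# Hypothetical seeds and optimizer labels, for illustration only.
seeds = [11, 23, 47]
optimizers = ["adam", "rectified_adam"]

# One run per (seed, optimizer) pair, so both optimizers see the same seeds.
run_plan = [{"seed": seed, "optimizer": name} for seed, name in product(seeds, optimizers)]

# Sanity check: collect the seeds scheduled for each optimizer.
seeds_per_optimizer = {
    name: [run["seed"] for run in run_plan if run["optimizer"] == name]
    for name in optimizers
}
```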
I ran the tests twice with different random numbers, and the metrics changed slightly between the two runs. However, I noticed something strange: within a run, whether I use Adam or Rectified Adam, I get the same numbers for the metrics. I'm using TensorFlow 2.15.0. |
I don't understand what you mean, as your first statement (metrics differ) seems to contradict the second (metrics are the same). Try explaining more clearly and post the outputs. |
I have included the configurations I used in the attached text document. These include configs.yaml, training_config.yaml, and training_config_1.yaml. I didn't make any changes to the other configuration files.
'rmse_e1': 0.023490250341625767, 'std_rmse_e1': 0.015843164999311765, 'rel_rmse_e1': 555.6691819599646, 'std_rel_rmse_e1': 544.2162301831149, 'rmse_e2': 0.009168282438918063, 'std_rmse_e2': 0.008922320388766475, 'rel_rmse_e2': 346.9258300561109, 'std_rel_rmse_e2': 346.38851755041367, 'rmse_R2_meanR2': 0.03669683267550114, 'std_rmse_R2_meanR2': 0.025587624399757175, 'pix_rmse': 9.126631e-05, 'pix_rmse_std': 2.2697419e-05, 'rel_pix_rmse': 6.096496433019638, 'rel_pix_rmse_std': 2.035357244312763, 'rmse_e1': 0.0241800061185133, 'std_rmse_e1': 0.014729668133574656, 'rel_rmse_e1': 550.9202217189359, 'std_rel_rmse_e1': 538.483941515418, 'rmse_e2': 0.00846779738001486, 'std_rmse_e2': 0.008098060249920496, 'rel_rmse_e2': 336.6814546254379, 'std_rel_rmse_e2': 332.86960115973426, 'rmse_R2_meanR2': 0.10814865342870499, 'std_rmse_R2_meanR2': 0.04149496242444061, 'pix_rmse': 9.000804e-05, 'pix_rmse_std': 2.2791739e-05, 'rel_pix_rmse': 6.022337824106216, 'rel_pix_rmse_std': 1.9995957612991333,
rmse_e1': 0.023490250341625767, 'std_rmse_e1': 0.015843164999311765, 'rel_rmse_e1': 555.6691819599646, 'std_rel_rmse_e1': 544.2162301831149, 'rmse_e2': 0.009168282438918063, 'std_rmse_e2': 0.008922320388766475, 'rel_rmse_e2': 346.9258300561109, 'std_rel_rmse_e2': 346.38851755041367, 'rmse_R2_meanR2': 0.03669683267550114, 'std_rmse_R2_meanR2': 0.025587624399757175, 'pix_rmse': 9.126631e-05, 'pix_rmse_std': 2.2697419e-05, 'rel_pix_rmse': 6.096496433019638, 'rel_pix_rmse_std': 2.035357244312763, 'rmse_e1': 0.0241800061185133, 'std_rmse_e1': 0.014729668133574656, 'rel_rmse_e1': 550.9202217189359, 'std_rel_rmse_e1': 538.483941515418, 'rmse_e2': 0.00846779738001486, 'std_rmse_e2': 0.008098060249920496, 'rel_rmse_e2': 336.6814546254379, 'std_rel_rmse_e2': 332.86960115973426, 'rmse_R2_meanR2': 0.10814865342870499, 'std_rmse_R2_meanR2': 0.04149496242444061, 'pix_rmse': 9.000804e-05, 'pix_rmse_std': 2.2791739e-05, 'rel_pix_rmse': 6.022337824106216, 'rel_pix_rmse_std': 1.9995957612991333, |
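One quick way to confirm that the two dumps above really are identical is to diff them key by key. This is a minimal sketch, assuming the two dumps correspond to the Adam and Rectified Adam runs (the thread does not label them); only a few of the posted values are copied in, and `differing_keys` is a hypothetical helper.

```python
import math


def differing_keys(metrics_a, metrics_b, rel_tol=1e-12):
    """Return the sorted keys whose values are missing on one side or differ beyond rel_tol."""
    keys = sorted(set(metrics_a) | set(metrics_b))
    return [
        key
        for key in keys
        if key not in metrics_a
        or key not in metrics_b
        or not math.isclose(metrics_a[key], metrics_b[key], rel_tol=rel_tol)
    ]


# A subset of the values from the first dump (assumed to be one optimizer's run).
run_a = {
    "rmse_e1": 0.023490250341625767,
    "rmse_e2": 0.009168282438918063,
    "pix_rmse": 9.126631e-05,
}
# The same subset from the second dump (assumed to be the other optimizer's run).
run_b = {
    "rmse_e1": 0.023490250341625767,
    "rmse_e2": 0.009168282438918063,
    "pix_rmse": 9.126631e-05,
}

diff = differing_keys(run_a, run_b)
runs_identical = not diff
```

Metrics that agree to every printed digit across two different optimizers would usually suggest that the optimizer setting is not reaching the training code, which is consistent with the question about rebuilding the package that follows.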
Can you confirm whether you rebuilt the package
Otherwise, create two branches: one with Adam and the other with Rectified Adam. |
wf-psf_Adam.log |
The first log looks like a GPU issue. I'm not sure how you're loading TensorFlow 2.11. The second log states that you need to update the optimizer to be an instance of a TensorFlow 2.11+ compatible optimizer, which means replacing any legacy optimizers with their TensorFlow 2.11+ counterparts. Also, what worked for 2.15 doesn't work for 2.11. |
If it's a GPU issue, I should have the same problem when using the legacy optimizer. |
True, I am too busy right now to assist you. You can try looking online for some clues. |
Have a look here: https://stackoverflow.com/questions/71153492/invalid-argument-error-graph-execution-error Sorry, but this is as much as I can help right now. |
I understand. Thank you. I'll take a look. |
psf_pytest_tf2.9.log |
If you look carefully at your log, you will see that the validation tests did not run. They were skipped. |
And it seems you solved your TensorFlow 2.11 issue, but didn't update this issue with how you solved it. It's really important that you share the solution to a reported problem in the issue. |
No, I haven't solved it yet. I read (link) that the following steps can fix the error:
I couldn't install it via Conda. Is this a good solution, and should I keep working on it? |
If you didn't solve it, I don't understand how you were able to run the tests for TensorFlow 2.11 unless you ran it on a different system. You can submit a ticket to Jean-Zay/Idris Support. |
Following the instructions of Idris support, the problem was resolved by adding these lines:
|
WaveDiff runs without error using TensorFlow 2.11 and the Adam optimizer, but with Rectified Adam and the legacy optimizer, there is this error at the end of the cycles: |
The message is cut off. Are you stuck here and unsure how to update the optimiser? |
Aren't we looking to compare the performance of the Adam optimizer with that of the Rectified Adam optimizer in this issue? |
The complete error message was already reported in #88 by Ezequiel. I tried to reproduce it, but I did not encounter this error in my test runs. Below is my implementation:
I looked at the TensorFlow-Addons documentation and issues. Their optimizer points to the
I went through and changed all I decided to use
and did these replacements (although I don't think they are used)
Could you try this and report an update? |
I don't have the |
No, either way you can search for it in your branch and modify it there. |
This function doesn't exist. I can work on what you mentioned yesterday in the meeting so we can discuss this issue later. |
It does exist: wf-psf/src/wf_psf/psf_models/tf_psf_field.py Line 1214 in 87e0c8e
|
ok, so it's in |
Yes, I meant for you to search for the function in your branch to find which module it is in. Apologies if that wasn't clear. I've done some refactoring in the branch where I tested, and I moved the function to |
Thank you. I tried the changes that you made in the |
Thanks. Can you work on a Pull Request to update TensorFlow to 2.11 with the Rectified Adam optimiser, as well as the associated package dependencies, i.e. Keras, TensorFlow-Addons, etc.? Make sure to run the validation tests on training and metrics, which you have to run locally with |
The train_test is failing even though I can run WaveDiff normally. |
I am unable to reproduce your error, nor have I ever encountered it. My steps were:
See the output here: psf_pytest1.txt |
Hi Nada, I can open the PR since I was able to implement the needed change without an issue. Could you work on #133 which is more urgently needed? Let Tobias and me know if you have questions by either asking directly in #133 or in #sgs-sdc-fr-psf. |
Hi Jennifer, Ok. |
WaveDiff uses the Rectified Adam Optimiser from the TensorFlow Addons library, which has stopped development (see details in link). Minimal maintenance releases will continue until May 2024. As a result, it is not compatible with recent versions of TensorFlow (2.11+). Interestingly, 2.9+ is also affected, producing the error reported in Issue #88 when loading a saved checkpoint. The Rectified Adam Optimiser is currently not part of the core library of TensorFlow 2. The fix is to use the tf.keras.optimizers.legacy namespace (see here), which allows the old optimizers to work. This issue is to do a couple of tasks:
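The legacy-namespace fix described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the change made in WaveDiff: the helper `pick_adam_class` is hypothetical, it uses Adam rather than Rectified Adam for simplicity, and stand-in objects replace `tf.keras.optimizers` so the sketch runs without TensorFlow installed.

```python
from types import SimpleNamespace


def pick_adam_class(optimizers_module):
    """Hypothetical helper: prefer `legacy.Adam` when the module exposes a legacy namespace.

    `optimizers_module` is expected to look like `tf.keras.optimizers`.
    """
    legacy = getattr(optimizers_module, "legacy", None)
    if legacy is not None and hasattr(legacy, "Adam"):
        return legacy.Adam
    return optimizers_module.Adam


# Stand-in classes and modules (assumptions for illustration only).
class NewAdam:
    """Stands in for the default Adam class."""


class LegacyAdam:
    """Stands in for the Adam class under the legacy namespace."""


with_legacy = SimpleNamespace(Adam=NewAdam, legacy=SimpleNamespace(Adam=LegacyAdam))
without_legacy = SimpleNamespace(Adam=NewAdam)

chosen_with = pick_adam_class(with_legacy)        # the legacy class is selected
chosen_without = pick_adam_class(without_legacy)  # falls back to the default class
```

With TensorFlow 2.11 to 2.15 installed, the equivalent call would be `pick_adam_class(tf.keras.optimizers)(learning_rate=...)`, which resolves to `tf.keras.optimizers.legacy.Adam`. How the Rectified Adam optimiser from TensorFlow Addons is wired in is not covered by this sketch.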