Travis tests no longer pass. #1123
Comments
Tensorflow recently updated to 2.0, probably related :) |
Sounds plausible ^^ |
Any chance necessary changes will be made for TF2.0 support? |
Yes, I expect to push an update this week or next to fix compatibility with tensorflow 2.0. |
If I try to run the tests locally on a machine without CUDA, the tests also hang forever. In my case, all 8GB of RAM was exhausted and the whole computer froze. Perhaps we have a memory leak with TF2 somewhere, or perhaps TF2 or its TF1 compatibility layer is doing something very inefficiently. I didn't try to pinpoint it to anything specific. I also don't think I'll have time for that anytime soon. |
You need an upgrade :) I was testing only import tensorflow as tf
tf.compat.v1.disable_v2_behavior() Seems to make everything "normal". I don't want to add that code everywhere, but so far it's the only solution I have. I don't have much time to look into this issue further, but I will see if I can find out the underlying issue. In the meantime, @de-vri-es could you verify that disabling v2 behavior fixes the tests on your system too? |
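A minimal sketch of the workaround mentioned above, wrapped so that it is applied once and does nothing when TensorFlow or the compatibility API is missing. The helper name `disable_v2_behavior_if_possible` is hypothetical and not part of this repository. `tf.compat.v1.disable_v2_behavior()` is intended to be called at program start, before any tensors, graphs or models are created.

```python
# Sketch: opt out of TF 2.x behavior once, early in the process.
try:
    import tensorflow as tf  # optional here; the sketch degrades gracefully without it
except ImportError:
    tf = None


def disable_v2_behavior_if_possible():
    """Return True if v2 behavior was disabled, False if TensorFlow or the API is unavailable."""
    if tf is None:
        return False
    # Look the function up defensively: older TensorFlow releases may not provide it.
    compat_v1 = getattr(getattr(tf, "compat", None), "v1", None)
    disable = getattr(compat_v1, "disable_v2_behavior", None)
    if disable is None:
        return False
    disable()
    return True


if __name__ == "__main__":
    print("v2 behavior disabled:", disable_v2_behavior_if_possible())
```

One possible place for such a call is a shared test setup file (for example a `conftest.py`, which is an assumption here), so it does not have to be repeated in every test module.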
I feel like this could be related: tensorflow/tensorflow#32052 . Sounds like there are some memory issues with tf2 and the travis build most likely gets killed because it uses too much memory. Maybe as a temporary fix we can disable the regression tests (the ones that actually train on a bit of data) and only test specific functions for now? |
It was actually 16 GB, though maybe I still should :) Anyway, to set a baseline I tried running the tests again without any modifications, and python segfaulted in some multiprocessing operation 💃
Ran it again, no segfault, but memory usage crept up in However, then I copied the change into
So it looks like we can't just do this for everything blindly. |
hmm, for some reason I'm now getting that error even without modifications. It may not be related. Besides that error, the test suite is passing with the suggested change added only to the densenet and mobilenet tests. |
Ah, I see, pytest isn't running each test in a clean, isolated environment. It appears it already imported the densenet and mobilenet tests, which already called I glanced at |
Actually, I think we had that a few commits ago with pytest-xdist, but at first I thought that was causing issues, so I removed it to test in Travis, and then it got merged a bit prematurely. |
I put the forking of tests back. It stops memory usage from accumulating, but it still doesn't work on Travis (it does run fine locally, for the record). Also, on two comparable systems, but one running tensorflow 2 and the other running tensorflow 1.14, the times for Disabling eager execution in tensorflow reduces the time to 70s, disabling v2 behavior gives approximately the same time (47s) as 1.14. |
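The isolation problem discussed above can be illustrated with a small sketch that uses only the standard library. It is not how pytest or its forking plugins (such as pytest-xdist or pytest-forked) are implemented. It only shows that process-wide state set by one "test", like the effect of `disable_v2_behavior()`, does not leak into the next one when each runs in a fresh interpreter. The names `run_isolated` and `hypothetical_flag` are made up for this example.

```python
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and return (exit code, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout.strip()


# The first "test" mutates process-wide state, much like a global TensorFlow switch.
first = "import sys; sys.hypothetical_flag = True; print(hasattr(sys, 'hypothetical_flag'))"
# The second "test" checks whether it can see that state.
second = "import sys; print(hasattr(sys, 'hypothetical_flag'))"

code1, out1 = run_isolated(first)
code2, out2 = run_isolated(second)
print(out1, out2)  # the second process does not see the flag set by the first
```

Each child process also releases its memory when it exits, which matches the observation that forking stops memory usage from accumulating across tests.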
So what is the advantage of using TF 2.x? |
It's going to be the version of tensorflow many people will use, so we have to support it. But the main advantage of tf 2 is eager execution (which is being annoying at the moment) and the move to tf.keras for their API. |
According to the experiments in #1137, it is a kind of vicious circle: you disable v2 behaviour, which includes eager execution, which is needed for #1123 (comment), and enabling this execution exceeds the memory and time limits... so we would need to disable the behaviour and replace |
Yep, you're right :) I'm hoping keras-team/keras#13476 will get merged soon, then we can just continue with disabling v2 behavior. That seems to me the easiest way forward. |
Well, the time of being merged and the time of being released can be different... 8-) |
Yes, but merging that will break usage when using tensorflow 2 :) At least now it should work with tensorflow 1.14/1.15. |
"Should" is not very convincing. I would merge #1137 even if it is not fully fixed, since it has tests for TF 1.x and 2.x, so it is clear what passes and what does not... |
There's no point in merging it yet without having the fix in Keras to go with it. |
Let me clarify... what do you expect from the Keras fix, that the |
Excuse me, when I read .travis.yml I see the line |
Thanks for the quick reply @Borda. When I run the file train.py, I run into this problem |
You are probably missing a parameter |
When I find the parameter |
same error here /usr/local/lib/python3.6/dist-packages/keras/callbacks/tensorboard_v2.py:92: UserWarning: The TensorBoard callback |
I just reverted some commits, can you try again? Note: You still need keras-team/keras#13476 to pass the tests and in case you are using tensorflow 2.1, you also need tensorflow/tensorflow#34870 (or remove Tensorboard). |
What about the next warning? |
The travis tests no longer pass as can be seen here: https://travis-ci.org/fizyr/keras-retinanet/builds/592460276 .
At a glance, it seems to be a problem with an incompatible version of tensorflow, but I didn't dig very deep.