-
Hi @m-volk, thanks for the question! Your observation is correct: the reason is that the workflow ends BEFORE all records have been sent.
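For example (this is only a generic illustration of the race, not NVFlare's actual code): an asynchronous sender whose queue is abandoned when the main workflow returns will silently drop whatever is still queued.

```python
# Generic illustration only -- NOT NVFlare's implementation.
import queue
import threading
import time

event_queue: "queue.Queue[int]" = queue.Queue()

def background_sender():
    """Drain the queue slowly, simulating per-record network latency."""
    while True:
        record = event_queue.get()
        time.sleep(0.01)
        print(f"delivered record {record}")

threading.Thread(target=background_sender, daemon=True).start()

# The "workflow": enqueue 100 records and finish immediately.
for i in range(100):
    event_queue.put(i)

# The process exits here; the daemon sender thread dies with it and any
# records still sitting in the queue are lost.
```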
We will work on an enhancement later that will make this easier for all users.
-
When using experiment tracking, I notice that every once in a while messages get lost. The underlying problem seems to be that events are not propagated. The problem exists in the latest main (commit d152605). Can you advise on a good way to avoid this? My actual intent is to send log messages from the clients to the server (my code is similar to the experiment-tracking code), and I want to make sure the messages arrive.
To reproduce the problem, add the following at line 154 of NVFlare/examples/advanced/experiment-tracking/pt/learner_with_tb.py.
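Something along these lines (a minimal sketch; the tag name "lost_messages_test" is my own placeholder, and self.writer is the analytics writer the learner already uses for add_scalar):

```python
# Pasted into the learner's training code (learner_with_tb.py, line 154):
# send 100 scalar events back to back with no delay between them.
for i in range(100):
    self.writer.add_scalar("lost_messages_test", i, i)
```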
Reduce the number of epochs to 1 at line 30 of examples/advanced/experiment-tracking/tensorboard/jobs/tensorboard-streaming/app/config/config_fed_client.json. Run the example NVFlare/examples/advanced/experiment-tracking/tensorboard/jobs/tensorboard-streaming in the simulator (see the example's instructions). Then explore the TensorBoard results from Python, e.g. with the sketch below.
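To read the streamed scalars back, something like this works (the log directory is an assumption; point it at wherever the simulator wrote the server-side tb_events):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Path is an example -- adjust to the simulator workspace used for the run.
ea = EventAccumulator("/tmp/nvflare/simulate_job/tb_events")
ea.Reload()

for tag in ea.Tags()["scalars"]:
    steps = sorted(e.step for e in ea.Scalars(tag))
    print(tag, steps)
```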
In an example run, I would expect to find all numbers from 0 to 99, but many are missing.
When I wait a bit between submissions of analytics information (e.g. 0.01 s), no messages get lost. I did so by adding the following to learner_with_tb.py, line 154.
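Roughly like this (same sketch as above, with a short pause after each event; 0.01 s is the value I tested with):

```python
import time

for i in range(100):
    self.writer.add_scalar("lost_messages_test", i, i)
    time.sleep(0.01)  # brief pause so each event can be propagated before the next
```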