Getting stuck after confirm joining federated training session #31

Open
luckystarufo opened this issue Oct 29, 2022 · 0 comments

luckystarufo commented Oct 29, 2022

Hello there,

I am trying to run the code on three different machines within a network. The three machines can communicate properly but the training never starts.
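A successful ping does not by itself show that the ports the tracker listens on are reachable, so "communicate properly" can also be checked at the TCP level. The following is a minimal sketch, not part of federated-xgboost: the helper names `can_connect` and `check_peers` are hypothetical, and the hosts and ports passed in are assumptions that would need to be replaced with the actual tracker address and port.

```python
import socket
from typing import Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_peers(peers: Iterable[Tuple[str, int]],
                timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Check each (host, port) pair and return a reachability map."""
    return {(host, port): can_connect(host, port, timeout) for host, port in peers}
```

Running something like `check_peers([("192.0.2.10", 9091)])` from each client, with the placeholder address and port swapped for the real ones, would show whether a firewall or a wrong bind address is blocking the connection before the training code is involved at all.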

After digging a little, I found that the code gets stuck in the run() function in federated-xgboost/dmlc-core/tracker/dmlc_tracker/rpc.py:

[Image: Screen Shot 2022-10-28 at 5 00 47 PM]

Here's what I saw on the server side:
[Image: Screen Shot 2022-10-28 at 5 02 13 PM]

And here is what I saw on the two clients:

[Image: Screen Shot 2022-10-28 at 5 02 52 PM]

[Image: Screen Shot 2022-10-28 at 5 03 02 PM]

Any suggestions? Thanks!
