Dynamic dataflows can not be stopped on Linux and MacOS for 0.3.6 #647

Open
Hennzau opened this issue Sep 9, 2024 · 5 comments
Labels
bug Something isn't working coordinator daemon

Comments

@Hennzau
Collaborator

Hennzau commented Sep 9, 2024

Describe the bug

Issue #575 has resurfaced, even though it was previously resolved by #583.

When I launch a dataflow that contains a dynamic node, I can't stop it, whether I use CTRL+C when attached or dora stop when detached. Neither the coordinator nor the daemon crashes. The only way to stop the dataflow is to run dora destroy followed by dora up.

This issue also occurs when running the full dataflow. Even after starting it and running all the nodes, the dataflow doesn't terminate as expected.

Steps to Reproduce

  1. Create a directory and cd into it (I was working on some course material).
  2. Start the Dora daemon: dora up.
  3. Start a new dataflow: dora start dataflow.yaml.
    • You can also run the two dynamic nodes if you wish. Everything functions correctly, but the dataflow doesn't stop at the end.
  4. Attempt to stop it using CTRL+C.

At this step, it freezes until I do the following:

  5. Destroy the dataflow: dora destroy.

Environment (please complete the following information):

  • System info: MacOS and Linux
  • Dora version: tag v0.3.6
@Hennzau Hennzau added the bug Something isn't working label Sep 9, 2024
@haixuanTao
Collaborator

haixuanTao commented Sep 10, 2024

I have seen this come back as well!

Super annoying.

I'm going to catch up on certain things for lerobot, but I agree that this should be fixed.

FYI, Philip is on vacation until the 21st, I think.

@haixuanTao
Collaborator

The key issue is that we have to propagate a drop token back to the sender when a node finishes using an input, so that the shared memory can be freed.

However, the drop token sometimes seems to get blocked by the GIL and the garbage collector, which leaves the whole dataflow unable to stop gracefully.
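
To illustrate the failure mode, here is a toy Python model, not dora's actual implementation. It assumes the drop notification is tied to the lifetime of the Python object that holds the input. `SharedInput`, `receive`, and the `dropped` list are hypothetical names invented for this sketch; `weakref.finalize` stands in for sending the drop token.

```python
import gc
import weakref


class SharedInput:
    """Hypothetical stand-in for an input backed by shared memory."""

    def __init__(self, name):
        self.name = name


def receive(name, dropped):
    """Create an input and register a callback that plays the role of the drop token."""
    data = SharedInput(name)
    # The callback runs only once `data` is actually garbage collected.
    weakref.finalize(data, dropped.append, name)
    return data


dropped = []

# Case 1: the node releases the input, so the "drop token" is sent right away.
event = receive("image", dropped)
del event
assert dropped == ["image"]

# Case 2: a lingering reference (here a cache) keeps the input alive,
# so the "drop token" is not sent and the sender would keep waiting.
cache = []
event = receive("pose", dropped)
cache.append(event)
del event
gc.collect()
assert dropped == ["image"]

# Releasing the last reference finally triggers the callback.
cache.clear()
assert dropped == ["image", "pose"]
```

If the real drop tokens behave like this, any Python-side reference that outlives the node's use of an input would delay the token and could keep the dataflow from finishing.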

@haixuanTao
Collaborator

If you set a long enough grace duration, you should see the timeout.
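
For readers unfamiliar with the idea of a grace duration, here is a generic sketch of the pattern on Linux and macOS: ask a process to stop, wait up to a deadline, then kill it. This is not how dora implements it; `stop_with_grace` and the child script are hypothetical, and the child deliberately ignores SIGTERM to mimic a node that never finishes.

```python
import subprocess
import sys


def stop_with_grace(proc, grace_duration):
    """Ask `proc` to stop, wait up to `grace_duration` seconds, then kill it."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_duration)
        return "stopped gracefully"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "killed after timeout"


# Child that ignores SIGTERM, standing in for a node that is stuck.
child_code = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code], stdout=subprocess.PIPE, text=True
)
proc.stdout.readline()  # wait until the child has installed its signal handler
result = stop_with_grace(proc, grace_duration=0.5)
proc.stdout.close()
print(result)
```

With a stuck node, the wait runs into the deadline and the process is killed, which is the timeout referred to above.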

@Hennzau
Collaborator Author

Hennzau commented Sep 10, 2024

all right, I'll take a look after #650

@Hennzau
Collaborator Author

Hennzau commented Sep 11, 2024

probably linked to #625
