hack: Bad process #280

Draft: wants to merge 3 commits into master


laurensvalk (Member) commented Jan 8, 2025

This should not be merged, but serves to document and reproduce an odd bug that I am seeing locally.

Context
I was working on a major revision of the port subsystem, and was running into issues with etimers and polling processes.

After some time, I was able to boil it down to a very small example that also builds from the current master branch. This makes it seem independent from the current work (which is not submitted here).

To reproduce
Build either of the two commits from this PR. If everything is working, you should see Hello World printed in Pybricks Code every second after connecting.

  • In the first commit, nothing is printed at all
  • In the second, if we ignore the ev check, it doesn't work at first, but it starts when kicked externally. For example, start the REPL (state changes trigger an event broadcast).
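To make "ignoring the ev check" concrete, here is a minimal sketch of the two wake-up conditions. It is not the code from the commits: the event names and helper functions are hypothetical stand-ins for what in Contiki would presumably be a comparison against PROCESS_EVENT_TIMER combined with etimer_expired().

```c
#include <stdbool.h>

/* Hypothetical event kinds; a real Contiki build uses process_event_t values. */
enum { EV_TIMER, EV_BROADCAST };

/* Strict condition: only the timer event itself may wake the process. */
static bool wake_strict(int ev, bool expired) {
    return ev == EV_TIMER && expired;
}

/* Relaxed condition (ev check ignored): any delivered event wakes the
 * process once the timer has expired. */
static bool wake_relaxed(int ev, bool expired) {
    (void)ev;
    return expired;
}

/* Counts how often a condition passes for a sequence of delivered events,
 * assuming the timer has already expired for each of them. */
static int count_wakeups(bool (*cond)(int, bool), const int *events, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (cond(events[i], true)) {
            count++;
        }
    }
    return count;
}
```

Under these assumptions, a process whose timer event never arrives stays silent with the strict condition, while the relaxed condition lets any unrelated broadcast (such as one caused by starting the REPL) wake it up.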

Caveats
This may be due to unrelated undefined behavior somewhere else. It seems to vary from build to build. At one point, adding comments made it (dis)appear, presumably because that changes the LC values of the switches. Similarly, experimenting with variations of etimer_set such as etimer_reset doesn't make a real difference; they sometimes appear to make it work, but presumably for the same reason as above.
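For readers unfamiliar with why a comment could change anything: protothread-style processes are commonly built on switch-based local continuations, where the resume point is stored as a source line number. The sketch below assumes that scheme; the macros and the function name are illustrative and not copied from the firmware.

```c
#include <stdbool.h>

/* Hypothetical miniature of switch-based local continuations. */
typedef unsigned short lc_t;

#define LC_RESUME(s) switch (s) { case 0:
#define LC_SET(s)    (s) = __LINE__; case __LINE__:
#define LC_END(s)    }

/* Waits until *ticks reaches 3. Returns true when done, false while waiting. */
static bool wait_for_three_ticks(lc_t *lc, const int *ticks) {
    LC_RESUME(*lc)
    LC_SET(*lc);        /* stores this source line number in *lc */
    if (*ticks < 3) {
        return false;   /* yield; the next call jumps back to the case label */
    }
    LC_END(*lc)
    return true;
}
```

Calling wait_for_three_ticks() with *lc initialized to 0 saves the line number of the LC_SET line on the first call. Inserting a comment above that line shifts the stored value and the generated case labels, which changes the binary even though the logic is identical. That is consistent with a latent bug elsewhere appearing or disappearing between builds.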

This is one reason to create the CI build: to see if it reproduces there too. EDIT: It does. I suppose that is a good thing.

@laurensvalk (Member, Author)

Also, when reproducing, disconnect any UART sensors. These still broadcast events (this will be fixed in #275). So it is possible that we've had this issue for longer, but the continuous stream of UART event broadcasts was masking it?

@laurensvalk laurensvalk marked this pull request as draft January 8, 2025 18:19
@coveralls

Coverage Status

coverage: 56.36%. remained the same
when pulling c9c3f25 on bad-process
into 4898777 on master.

dlech (Member) commented Jan 8, 2025

If this isn't working and we have ruled out overflowing the event queue, then it kind of smells like a memory corruption bug to me.

One way to debug that could be to keep turning off non-essential drivers (PBDRV_CONFIG) and subsystems (PBIO_CONFIG) until this starts working, to narrow down what is interfering with proper operation.

@laurensvalk (Member, Author)

Thank you.

This seems to be a debug print issue rather than a process issue. I went back to apply this patch to previous releases and noticed it has been occurring since b460d6d, which is plausible, because running the event loop from a process is a bad idea.

All the more reason to finish up direct UART access for debugging 😄
