Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] print() combined with UART writes can freeze hub #304

Closed
laurensvalk opened this issue Apr 13, 2021 · 38 comments
Closed

[Bug] print() combined with UART writes can freeze hub #304

laurensvalk opened this issue Apr 13, 2021 · 38 comments
Assignees
Labels
bug Something isn't working hub: primehub/inventorhub Issues related to the LEGO SPIKE Prime hub and LEGO MINDSTORMS Robot Invetor hub software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime)

Comments

@laurensvalk
Copy link
Member

This freezes the Prime Hub consistently.

from pybricks.pupdevices import UltrasonicSensor
from pybricks.parameters import Port
from urandom import randint

lights = UltrasonicSensor(Port.C).lights

count = 0

while True:
    count += 1
    print("Start iteration", count, end="")
    # Turn on all lights.
    for i in range(50):
        lights.on([randint(0, 100) for j in range(4)])
        # wait(50)
    print("; ended")

Originally posted by @BertLindeman in #297 (comment)

@laurensvalk laurensvalk added bug Something isn't working software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime) labels Apr 13, 2021
@laurensvalk
Copy link
Member Author

laurensvalk commented Apr 13, 2021

However, this works fine.

from pybricks.pupdevices import UltrasonicSensor
from pybricks.parameters import Port
from urandom import randint

lights = UltrasonicSensor(Port.C).lights

while True:
    lights.on([randint(0, 100) for j in range(4)])

But this freezes. It appears that combining fast prints with fast sensor writes is an issue.

from pybricks.pupdevices import UltrasonicSensor
from pybricks.parameters import Port
from urandom import randint

lights = UltrasonicSensor(Port.C).lights

while True:
    lights.on([randint(0, 100) for j in range(4)])
    print("a")   # <------- this is only addition. Now it freezes.

@laurensvalk laurensvalk changed the title [Bug] Writing much data to UART device can freeze hub [Bug] print() combined with UART writes can freeze hub Apr 13, 2021
@Novakasa
Copy link

Novakasa commented Apr 13, 2021

I think I reproduced this with a CityHub with an older firmware (before uselect was introduced). I use PybricksHub.write(msg) to communicate with the running program from a PC. I request a "speed measurement" from the program once every second, which prints the result to send it to the PC. The hub program is also constantly reading ColorDistanceSensor.color() values and updating DCMotor.dc()

When running it like this, the hub becomes unresponsive after an inconsistent time of running (not every print freezes the hub). Decreasing the frequency of the "speed measurement" requests as well as increasing the delay of the main loop in the program seems to decrease the chance of the crash.

So this would support the theory that fast prints and sensor readouts are the problem.

Because I did this with an older version of the firmware and I have modified my scripts since then, I am not attaching any specific code here. It seems the above example does represent my case pretty well.

@BertLindeman
Copy link

BertLindeman commented Apr 13, 2021

Have seen freezes in a loop without sensor and only doing randints [EDIT] but it had prints in the loop.
So removed the randint and this freezes in no time:

from pybricks.pupdevices import UltrasonicSensor
from pybricks.parameters import Port
# from urandom import randint

lights = UltrasonicSensor(Port.C).lights

while True:
    # lights.on([randint(0, 100) for j in range(4)])
    lights.on([60, 7, 57, 25])

No print in here.
Maybe the randint provides the sensor some time to process the values?

@laurensvalk
Copy link
Member Author

Does that always happen? That script without prints does not freeze for me.

@BertLindeman
Copy link

It did on here.
Maybe environment?
Now running win10 / MS Edge
Will reboot windows and re-try

@BertLindeman
Copy link

Does that always happen? That script without prints does not freeze for me.

Did not test for always, but the first four attempts froze within about seconds (including program load).
The last one so soon I did not start counting yet.

program:

from pybricks.pupdevices import UltrasonicSensor
from pybricks.parameters import Port
# from urandom import randint

lights = UltrasonicSensor(Port.C).lights

while True:
    # lights.on([randint(0, 100) for j in range(4)])
    lights.on([60, 7, 57, 25])

versions

  • win10, just after reboot
  • Microsoft Edge Versie 89.0.774.75 (Officiële build) (64-bits)
  • Pybricks Code v1.0.0-beta.14
  • Spike PrimeHub; setup like Hardware Module 1
  • Firmware ('primehub', '3.0.0b5', 'e5858276 on 2021-04-12')

@BertLindeman
Copy link

Same Spike setup (as in previous comment) tested on Linux Mint 20.1
on Google Chrome Version 89.0.4389.114 (Official Build) (64-bit)
Pybricks Code v1.0.0-beta.14

All 4 tests froze up.
In about 10, 5, 3 and 19 seconds (eye-counted)

@laurensvalk
Copy link
Member Author

It's possible there is some variation between sensors. I'd have to try another to see if I can reproduce it.

I also tried both the dual-boot and single-boot just to be sure, but that makes no difference, as expected.

@BertLindeman
Copy link

Good idea, I do have two of them with different codes.

The Spike one (if I kept them correctly boxed) is coded 424N9 and the Mindstorms one 316N0
Have used up to now the 316N0 one.
Up to the other...

@BertLindeman
Copy link

The 424N9 definitely reacts differently.
The program with only in the loop:

lights.on([60, 7, 57, 25])

does NOT freeze (well not in a few minutes).

But if the loop only contains:

lights.on([randint(0, 100) for j in range(4)])

It freezes the Spike prime hub in 5 to 10 seconds.

Strange . . . .

Is there a way to report the sensor characteristics in python?
They report both the same device type 62 = 0x3E

Or should we be looking at the random function randint?

@laurensvalk
Copy link
Member Author

We don’t currently expose the device hardware and firmware version, but we could add it to the PUPDevice.info() dictionary in a future release.

If we get to a point where we need this for this release, we can give you a debug build which just prints all that information when you plug it in.

I think I’ll try to find a sensor that has this issue. Thanks for the heads up 👍

@laurensvalk
Copy link
Member Author

So these might be two different issues:

  1. Fast uart sensor write + Fast printing
    • Appears to affect all Ultrasonic Sensors.
    • Possibly Color sensors?
    • Only when using lights?
  2. Fast uart sensor write
    • Appears to affect only some sensors.

Case (1) seems to be the most interesting for now: Can we get print freezes in other scenarios too, without lights? It appears that @Novakasa may have seen this once or twice. If so, this is probably worth fixing for the upcoming v3.0. It would be great if we could find a reproducable case.

Case (2) may be less critical for now. We can just delete the light feature from the v3.0 documentation (just like we did with the IMU) and bring it back for v3.1 when we've investigated it properly.

@Novakasa
Copy link

Novakasa commented Apr 14, 2021

Since I don't have an ultrasonic sensor, I tried something similar to the above examples with my ColorDistanceSensor yesterday, but I couldn't reproduce the problem. Maybe someone can test if the Ultrasonic Sensor also has the problem with the CityHub or TechnicHub?
Once I experience the problem again, I will try to narrow it down to get something reproducible with ColorDistanceSensor.

@laurensvalk
Copy link
Member Author

Were you using the internal IMU sensor, by any chance? That has known issues too.

@Novakasa
Copy link

No, I was using the CityHub when I encountered the problem (which doesn't have the IMU, please definitely correct me if I'm wrong 😅 )

@BertLindeman
Copy link

Sorry, this has become a long post...

So these might be two different issues:

  1. Fast uart sensor write + Fast printing

    • Appears to affect all Ultrasonic Sensors.
    • Possibly Color sensors?
    • Only when using lights?

Program similar to that of the US Distance sensor changed for the color sensor,
freezes in 5, 3 and, 3 iterations of the range(50) loop.
The freeze visible on the hub display
and the color sensor blinks like on the "US distance sensor freeze": 1/2 second OFF ~2 or 3 seconds ON

versions

  • win10
  • Microsoft Edge Versie 89.0.774.76 (Officiële build) (64-bits)
  • Spike prime hub at ('primehub', '3.0.0b5', 'e5858276 on 2021-04-12')
  • Pybricks Code v1.0.0-beta.14
  • Hardware module 1 with 2 sensors and motor cable connected

The program with comment lines removed (program color sensor 1):

from pybricks import version
print(version)
from pybricks.pupdevices import ColorSensor
from pybricks.parameters import Port
from urandom import randint
lights = ColorSensor(Port.B).lights

count = 0
print("") # start at a new line
while True:
    count += 1
    print("Start iteration", count, end="")
    for i in range(50):
        lights.on([randint(0, 100) for j in range(3)])
    print("; ended")
  • I realize that there is NO print in the lights-on loop.
    Still, the freeze is so soon.
    After the freeze it is enough to remove the battery from the hub and add it in again.
    The code webpage notices the disconnect and shows that correctly and Bluetooth connect goes fine.

Sorry, have to add to the confusion.
Was trying to skip the use of randint.
With fixed values I was too impatient, so saw no freeze.
But it might come if I waited some more.

The program below caused this:
After the print on the log froze, the display animation on the hub went ON for some time.
Log:
image

This program froze the print and the display sometime later (program color sensor 2):

from pybricks import version
print(version)
from pybricks.pupdevices import ColorSensor
from pybricks.parameters import Port
from urandom import randint
lights = ColorSensor(Port.B).lights

count = 0
print("") # start at a new line
while True:
    count += 1
    print("Start iteration", count, end="")
    for i in range(50):
        # lights.on([randint(0, 100) for j in range(3)])
        lights.on([count % 5, count % 95, count % 20])
        print(".", end="")
    print("; ended")

After this freeze had to wait for error message:

Connection failed for unknown reason.

The hub was thinking it was still connected and the code webpage did not.
Did crtl-F5 and reboot the hub.
Had (win10) to switch Bleutooth OFF and ON to get a connected hub

Interestingly, the second time some dots were not shown in the log, so the print was skipped?
See iteration 54 and 55:

Start iteration 53..................................................; ended
Start iteration 54.................................................; ended
Start iteration 55.................................................; ended
Start iteration 56..................................................; ended
Start iteration 57..................................................; ended
Start iteration 58..................................................; ended
Start iteration 59..................................................; ended
Start iteration 60..................................................; ended
Start iteration 61...........

@BertLindeman
Copy link

@laurensvalk
Similar behaviour found for the hub lights of

  • technichub
  • primehub
  • movehub

Sometimes the printing stops and the inventorhub display animation goes on.

Would it be better to create another issue for that?
This item is getting a bit overloaded I think.

@laurensvalk
Copy link
Member Author

Would it be better to create another issue for that?

No worries @BertLindeman, we appreciate all your findings.

We can split it into different issues once we find out what's wrong 😄

@BertLindeman
Copy link

Oke @laurensvalk Thank you for your fast responses.
Here it comes:

One program fit for technichub, movehub, and inventor hub.
Can easily be adapted for the cityhub I think, but I do not have one.

from pybricks import version
print(version)
hw_type = version[0]

if str(hw_type) == "technichub":
    from pybricks.hubs import TechnicHub
    hub = TechnicHub()
elif hw_type == "movehub":
    from pybricks.hubs import MoveHub
    hub = MoveHub()
elif hw_type == "primehub":
    from pybricks.hubs import InventorHub
    hub = InventorHub()

from pybricks.parameters import Color
from pybricks.tools import wait
# if not hw_type == "movehub":
#     from urandom import randint

counter = 0
print("")
while True:
    counter += 1
    print("Start iteration", counter, end="")
    for i in range(50):
        hub.light.off()
        # x = [randint(0, 100) for j in range(3)]
        x = [counter % 90, counter % 111, counter % 99]
        hub.light.on(Color(h=x[0], s=x[1], v=x[2]))
        print(".", end="")
    print("; ended")

Technichub results
`'technichub', '3.0.0b5', 'bf4bf0e on 2021-04-12'

Start iteration 18..................................................; ended
Start iteration 19..................................................; ended
Start iteration 20...................

Second try:

Start iteration 157..................................................; ended
Start iteration 158......................

This hub has no display, so cannot comment on that.
The color or intensity of the hub light does not change. So freeze is assumed.
The Bluetooth connection seems to keep active.
No change seen in the buttons on the code webpage.
Recovery:
On the code-webpage disconnect Bluetooth (works best to do this first)
long press on the hub button to stop
and short press to reboot.

Inventorhub results
This time the dark azure one, with nothing attached.
Start the code page with CTRL-F5 to start clean.
Firmware 'primehub', '3.0.0b5', '8f22a995 on 2021-04-13'

Start iteration 2429..................................................; ended
Start iteration 2430..................................................; ended
Start iteration 2431..................................................; ended
Start iter
  • Web page shows Bluetooth connected
  • Hub display keeps animated
  • Button color stays the same.
    This was at 14:10
    At 14:25 I pressed the web-page Bluetooth button to disconnect.
    The hub display keeps animated.
    And the button color was also animated and not the blue "trying to connect" blink.
    So the program keeps on running as designed.

Recovery:
On the code-webpage disconnect Bluetooth (works best to do this first)
Long press the hub middle button to switch off
Long press the hub middle button to reboot.

movehub results
Firmware 'movehub', '3.0.0b5', 'bf4bf0e on 2021-04-12'
CTRL-F5 - clean start
The freeze occurs and in a few seconds the Bluetooth connection drops

Start iteration 599..................................................; ended
Start iteration 600..................................................; ended
Start iteration 601..................................................; ended
Start iteration 602.................................................

As we saw on the inventor hub the drop of the Bluetooth connection "revived" the hub
Hub light color changes animated like at the start.
Recovery:
On the code-webpage Bluetooth already was disconnected
Long press the hub button to switch off
Short press the hub button to reboot.

General remark
There is no need to take the battery OUT and IN to revive the hub, like in the other scenario's here.
A long press on the hub stops the program.

@Novakasa
Copy link

Novakasa commented Apr 14, 2021

I extended the above example for the CityHub and reproduce the same behaviour as reported for the TechnicHub.
Stuck at iteration 88, 37, 38, 144.
'cityhub', '3.0.0b5', 'bf4bf0e on 2021-04-12'

Once the hub freezes, the LED stays at it's color. When pressing the hub button it resets to continuous Blue, but pressing it again doesn't seem to start any program. (Which usually would start the default hello world program I think)

@dlech
Copy link
Member

dlech commented Apr 14, 2021

Once the hub freezes, the LED stays at it's color. When pressing the hub button it resets to continuous Blue, but pressing it again doesn't seem to start any program. (Which usually would start the default hello world program I think)

For TechnicHub/CityHub, we've narrowed down a Bluetooth issue that is unrelated to UART and fits this description that is not present on PrimeHub (haven't tested MoveHub yet). We are tracking it in #306.

@dlech dlech added the hub: primehub/inventorhub Issues related to the LEGO SPIKE Prime hub and LEGO MINDSTORMS Robot Invetor hub label Apr 26, 2021
@BertLindeman
Copy link

BertLindeman commented Apr 27, 2021

Attempting to prove this issue is not yet fixed like #306 just to be sure.

The firmware flashed by install.pybricks.com at the moment (still) is:
('primehub', '3.0.0b5', '8f22a995 on 2021-04-13')
And this still freezes running this program (as in Comment )

from pybricks import version
print(version)
from pybricks.pupdevices import ColorSensor
from pybricks.parameters import Port
from urandom import randint
lights = ColorSensor(Port.B).lights

count = 0
print("") # start at a new line
while True:
    count += 1
    print("Start iteration", count, end="")
    for i in range(50):
        # lights.on([randint(0, 100) for j in range(3)])
        lights.on([count % 5, count % 95, count % 20])
        print(".", end="")
    print("; ended")

Tried to install firmware from build#1006 via the LEGO Mindstorms app on win10.
Got:
image

So waiting for install.pybricks.com to provide a newer firmware level.

@laurensvalk
Copy link
Member Author

Thanks Bert --- indeed, we expect that this issue still exists. It happens on all hubs at the moment.

#306 was something specific to Technic Hub and City Hub.

@dlech
Copy link
Member

dlech commented Apr 27, 2021

I'm not able to reproduce on Technic hub since #306 is fixed (I've only tried one sensor so far though).

@laurensvalk
Copy link
Member Author

It does seem to happen still on Prime Hub. I also just saw the same with the Color Sensor.

And this seems to be a total freeze, including the stop button.

@BertLindeman
Copy link

Hi guys,
Since my previous post I have my technic hub running the color-sensor lights tests as in that post.

So it is now running over 7 hours without problem.
No idea if a roll-over occured on the counter, but at stop it showed:

Start iteration 147305..................Traceback (most recent call last):
  File "main.py", line 14, in <module>
SystemExit: stop button pressed

@dlech
Copy link
Member

dlech commented Apr 30, 2021

I have reproduced this on the City hub. However, since the city hub is the only hub that currently has the watchdog timer enabled, it will reboot instead of freezing. As a first step, we should be able to enable the watchdog time on other hubs so we at least won't have to pull the batteries.

@dlech
Copy link
Member

dlech commented Apr 30, 2021

This might just be my imagination or a concidence, but it seems like the crash (on the City hub at least) happens sooner when the battery voltage is lower.

UPDATE: Probably my imagination. Still crashing with fully charged batteries.

@dlech
Copy link
Member

dlech commented Apr 30, 2021

Since just now (not in any released firmware yet) all Powered Up hubs have the watchdog timer enabled, so we can now do this to test if the most recent reboot was due to a crash:

if hub.reset_reason() == 2:
    print('reset due to watchdog timeout')

(numeric values come from here)

@dlech
Copy link
Member

dlech commented Apr 30, 2021

Problem also reproduces on Move hub.

@dlech
Copy link
Member

dlech commented May 1, 2021

Well... I think I accidentally fix this. While I was looking at code that could possibly be related, I did pybricks/pybricks-micropython@7d14182 just to clean things up a bit - but it seems to have fixed the issue. I've run City hub for about 30 minutes (up to a count of 6000) without hanging.

I can kind of see how this could make a difference. The hang was probably caused by somehow getting into an infinite IRQ handler loop (interrupt isn't cleared, so handler is called again as soon as it exits).

@dlech
Copy link
Member

dlech commented May 1, 2021

Move hub (which shares the same UART driver as City hub) got up to 3378 before locking up. However, this was not the total lock/watchdog reboot as seen before, but rather just the Bluetooth locked up like #306.

@dlech dlech added hub: cityhub Issues related to the LEGO Powered Up Smart hub (Hub No. 4) hub: movehub Issues related to the LEGO BOOST Move hub labels May 1, 2021
dlech added a commit to pybricks/pybricks-micropython that referenced this issue May 1, 2021
It appears that reading the USART SR register clears some flags. In
certain cases, this would cause the interrupt handler to go into an
infinite loop.

By reading the SR register only once, we ensure that all flags are
seen and handled properly.

Issue: pybricks/support#304
@dlech
Copy link
Member

dlech commented May 1, 2021

Applying the same fix to Prime hub got it to run much longer, but I still hit the watchdog reset after 2488 counts.

@dlech dlech removed hub: cityhub Issues related to the LEGO Powered Up Smart hub (Hub No. 4) hub: movehub Issues related to the LEGO BOOST Move hub labels May 20, 2021
@dlech
Copy link
Member

dlech commented May 20, 2021

Recapping as I understand it:

  • Issues with hubs other than Prime/Robot Inventor turned out to be unrelated to UART and have mostly been fixed and have separate issues open for the few cases where there is still funny behavoir.
  • With the watchdog timer reset implemented on all hubs now, we can identify this issue by the fact that the hub reboots itself. If it does not reboot itself, it is a different issue.
  • This issue happens less frequently on Prime/Robot Inventor since pybricks/pybricks-micropython@c661f6d but is still not completely fixed.

@laurensvalk
Copy link
Member Author

In cases where UART is involved, could there be some race condition with uart writes, to set values and to send a keep-alive message at nearly the same time?

@dlech
Copy link
Member

dlech commented Sep 15, 2021

If we are getting a watchdog reset, then we are getting into an infinite loop somewhere. Most likely, the interrupt handler.

pbdrv_uart_write_begin() has code to protect against concurrent calls (returns PBIO_ERROR_AGAIN), so unless that code is incorrect, this shouldn't be a problem.

@laurensvalk
Copy link
Member Author

Issues with hubs other than Prime/Robot Inventor turned out to be unrelated to UART and have mostly been fixed and have separate issues open for the few cases where there is still funny behavoir.

Just to confirm, I'll add that this still seems to be the case. The Color Light Matrix has problems on the Prime Hun and the Essential Hub, but I've not seen them on Technic Hub yet.

dlech added a commit to pybricks/pybricks-micropython that referenced this issue Sep 17, 2021
There were a couple of places where we are looping waiting for
!PBIO_ERROR_AGAIN without calling the Contiki event loop. When
this condition was hit, no events would be processed, resulting
in an infinite loop leading to a watchdog timer reset.

Issue: pybricks/support#304
@dlech dlech self-assigned this Sep 17, 2021
@dlech
Copy link
Member

dlech commented Sep 17, 2021

so unless that code is incorrect

Found the incorrect code causing the infinite loop at a slightly higher level. The fact that it only happened reliably while printing on hubs using btstack seems to be a coincidence of the timing of those specific hubs and Pybricks firmware config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hub: primehub/inventorhub Issues related to the LEGO SPIKE Prime hub and LEGO MINDSTORMS Robot Invetor hub software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime)
Projects
None yet
Development

No branches or pull requests

4 participants