HARDWARE ISSUE: Failures with SOQuartz & TuringPi2 #40
Comments
Do they have schematics available somewhere? The way the CM4 image works might not be compatible with this carrier board, and I'd need to take a closer look at how things are wired up to figure out what's going on. Though I assume the SATA is done through a PCIe SATA controller, which might make this related to the PCIe ranges bug which I should finally upstream the fix for.
Reaching out to folks to see if they can provide schematics. I also forgot to mention an issue that seems related: wenyi0421/turing-pi#13, although that image appears to use u-boot and is able to load successfully.
Hi! Node 3 indeed has a SATA controller hooked up via PCIe. The chip used is the ASM1061. It works fine (and out of the box) with CM4 + Raspberry Pi OS and CM4 + DietPi. Raspberry Pi + Ubuntu needs an additional package ( The schematics are not available publicly, but do not hesitate to ask any follow-up questions.
Okay, I'll make a patched plebian devicetree package with the PCIe ranges fix in the coming days and have you try that out. It's a shot in the dark, but it might be related. Basically, right now, the memory ranges set for PCIe are a bit scuffed in the mainline kernel, which wreaks havoc with some PCIe devices.
Okay, here's a devicetree deb for you to try out with fixed PCIe ranges: https://overviewer.org/~pillow/up/75bea78e59/devicetrees-plebian-quartz64-20230601130309-arm64.deb Install with Let me know if this improves things in any way.
Reimaged each of the SOQuartz modules and applied that package. Kubernetes and Longhorn have been running stable for the last 2 hours. Will follow up tomorrow and let you know if anything comes up.
Had a couple of errors this morning. Decided to swap some of the modules around and noticed the problem following one specific module. Installed it into slot 1 which has HDMI output and captured this. From there I also grabbed some of the logs from
and another snippet
Currently running some memtester commands to test the memory.
Closing ticket. This problem is a hardware failure on one SOQuartz module. The same module fails no matter which slot it is in, and running memtester on it made the node hang/crash. I rotated the other 2 modules I have through slot 3 in the TuringPi2, and each one ran 3+ hours without any issue. Appreciate your help troubleshooting this issue!
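For anyone repeating this kind of swap test, the reasoning above (does the failure follow the module or the slot?) can be written down as a small check. This is only a sketch: the module labels and the run list are hypothetical placeholders, not logs from this ticket, and `likely_culprit` is an illustrative helper rather than part of any tool.

```python
def likely_culprit(observations):
    """Guess whether failures follow a module or a slot.

    observations: iterable of (module, slot, failed) tuples, one per test run.
    Returns ("module", name) or ("slot", number) when every run involving that
    candidate failed and every run without it passed; otherwise returns None.
    """
    obs = list(observations)
    # Check modules first, then slots (tuple index 0 is module, 1 is slot).
    for kind, index in (("module", 0), ("slot", 1)):
        for candidate in sorted({o[index] for o in obs}, key=str):
            with_c = [o[2] for o in obs if o[index] == candidate]
            without_c = [o[2] for o in obs if o[index] != candidate]
            if with_c and without_c and all(with_c) and not any(without_c):
                return (kind, candidate)
    return None


# Hypothetical runs shaped like the rotation described above:
# module "A" fails in two different slots, "B" and "C" are fine in slot 3.
runs = [
    ("A", 3, True),
    ("A", 1, True),
    ("B", 3, False),
    ("C", 3, False),
]
print(likely_culprit(runs))
```

With runs shaped like these, the check points at the module rather than the slot, which matches the conclusion reached here by hand. It needs at least one passing run without the suspect to say anything at all.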
EDIT: PROBLEM WAS ISOLATED TO ONE COMPUTE MODULE. IDENTIFIED TO BE HARDWARE ISSUE.
Hey!
I recently got started with Plebian on SOQuartz compute modules hosted in a TuringPi2. Everything appears to work except when using a SOQuartz in slot 3 of the TuringPi2, where the compute module crashes and also brings down the rest of the networking on the TuringPi2.
Able to reproduce with the following:
The compute module crashes and all devices on the TuringPi2 become unreachable. Everything recovers when the module in slot 3 is powered off. Powering the compute module in slot 3 back on causes another crash within ~15 minutes.
I have no issues with SOQuartz modules in slots 1, 2, and 4. The major difference between those slots and slot 3 is that slot 3 has two SATA ports exposed; see https://help.turingpi.com/hc/en-us/articles/8685766680477-Specifications-and-I-O-Ports for the port layout. M.2 is not exposed on any slot because there is only one PCIe lane on the SOQuartz. The SOQuartz modules are connected using https://turingpi.com/product/cm4-adapter/ as the adapter.
Unsure if it is related, but there is a warning during the open-iscsi installation.