p16s; unstable after resuming from sleep #1152

Open
130s opened this issue Jan 27, 2025 · 6 comments
Labels
comp_ubuntu problem_broken-unfunctional-cando Can't do XX because existing functionality is broken

Comments

Collaborator

130s commented Jan 27, 2025

As far as I'm aware, this is the sequence of what I did and the events I saw.

  1. Put p16s to sleep.
  2. Swapped the Thunderbolt-3 cable for a new, longer one.
  3. Resumed p16s. It didn't show graphics, just an underscore at the top-left.
  4. I found ssh was active, so I logged in remotely. Ran apt update && apt dist-upgrade and saw several pkgs get upgraded.
  5. Forced a reboot with the power key.
  6. Before or while the device was booting, I connected another device, an external SSD (Connect SATA SSD from kudu1 to p16s #871 (comment)), via a USB3.1-PCIe conversion cable.
  7. p16s showed an underscore at the top-left again after the initial BIOS screen; I don't think I even saw the Ubuntu logo.
  8. p16s was accepting ssh connections, so I logged in and ran sudo shutdown.
  9. I unplugged both the Thunderbolt-3 (eGPU) and the USB3.1 (external SSD) cables.
  10. p16s booted w/o issue.
  11. (Cont'd)
@130s 130s added problem_broken-unfunctional-cando Can't do XX because existing functionality is broken comp_ubuntu labels Jan 27, 2025
Collaborator Author

130s commented Jan 27, 2025

So Ubuntu recognizes the eGPU device as "eGPU". Didn't know that.
$ ack -ir egpu syslog            
Jan 27 10:59:55 130s-p16s systemd[1]: Starting EGPU Service...                
Jan 27 10:59:55 130s-p16s egpu-switcher[1826]: [info] looking for eGPU...     
Jan 27 10:59:56 130s-p16s egpu-switcher[1826]: [info] the egpu is connected   
Jan 27 10:59:56 130s-p16s egpu-switcher[1826]: [info] egpu has been added to X.Org config        
Jan 27 10:59:56 130s-p16s egpu-switcher[1826]: [ok] switch completed            
Jan 27 10:59:56 130s-p16s systemd[1]: egpu.service: Deactivated successfully. 
Jan 27 10:59:56 130s-p16s systemd[1]: Finished EGPU Service.              
Jan 27 11:06:08 130s-p16s systemd[1]: Starting EGPU Service...                
Jan 27 11:06:09 130s-p16s egpu-switcher[1829]: [info] looking for eGPU...                  
Jan 27 11:06:14 130s-p16s egpu-switcher[1829]: [info] giving up after 6 retries                  
Jan 27 11:06:14 130s-p16s egpu-switcher[1829]: [info] the egpu is disconnected
Jan 27 11:06:14 130s-p16s egpu-switcher[1829]: [info] egpu has been removed from X.Org config    
Jan 27 11:06:14 130s-p16s egpu-switcher[1829]: [ok] switch completed          
Jan 27 11:06:14 130s-p16s systemd[1]: egpu.service: Deactivated successfully.      
Jan 27 11:06:14 130s-p16s systemd[1]: Finished EGPU Service.       
Jan 27 11:55:29 130s-p16s systemd[1]: Starting EGPU Service...    
Jan 27 11:55:30 130s-p16s egpu-switcher[1867]: [info] looking for eGPU...
Jan 27 11:55:32 130s-p16s egpu-switcher[1867]: [info] the egpu is connected
Jan 27 11:55:32 130s-p16s egpu-switcher[1867]: [info] egpu has been added to X.Org config
Jan 27 11:55:32 130s-p16s egpu-switcher[1867]: [ok] switch completed
Jan 27 11:55:32 130s-p16s systemd[1]: egpu.service: Deactivated successfully.
Jan 27 11:55:32 130s-p16s systemd[1]: Finished EGPU Service.
Jan 27 12:02:38 130s-p16s systemd[1]: Starting EGPU Service...
Jan 27 12:02:39 130s-p16s egpu-switcher[1833]: [info] looking for eGPU...
Jan 27 12:02:44 130s-p16s egpu-switcher[1833]: [info] giving up after 6 retries
Jan 27 12:02:44 130s-p16s egpu-switcher[1833]: [info] the egpu is disconnected
Jan 27 12:02:44 130s-p16s egpu-switcher[1833]: [info] egpu has been removed from X.Org config
Jan 27 12:02:44 130s-p16s egpu-switcher[1833]: [ok] switch completed
Jan 27 12:02:44 130s-p16s systemd[1]: egpu.service: Deactivated successfully.
Jan 27 12:02:44 130s-p16s systemd[1]: Finished EGPU Service.
Jan 27 12:13:54 130s-p16s systemd[1]: Starting EGPU Service...
Jan 27 12:13:54 130s-p16s egpu-switcher[2046]: [error] unable to read pci information from sysfs: got error while scanning device '0000:50:00.0': the pci 'config' file has an invalid format
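To make the pattern in the log above easier to see, here is a minimal Python sketch that reduces each egpu-switcher run (keyed by PID) to a single outcome. The function name `summarize_runs` and the outcome labels are my own, not part of egpu-switcher, and the regex only assumes the syslog line shape shown above.

```python
import re

# Matches egpu-switcher lines in the syslog format shown above, e.g.
# "Jan 27 10:59:56 130s-p16s egpu-switcher[1826]: [info] the egpu is connected"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+egpu-switcher\[(?P<pid>\d+)\]:\s+"
    r"\[(?P<level>\w+)\]\s+(?P<msg>.*\S)\s*$"
)


def summarize_runs(lines):
    """Return {pid: outcome} for each egpu-switcher run, in log order.

    The last matching line of a run decides its outcome; lines from other
    programs (systemd etc.) are ignored.
    """
    runs = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        pid, level, msg = m["pid"], m["level"], m["msg"]
        runs.setdefault(pid, "unknown")
        if level == "error":
            runs[pid] = "error"
        elif msg == "the egpu is connected":
            runs[pid] = "connected"
        elif msg == "the egpu is disconnected":
            runs[pid] = "disconnected"
    return runs


# A few lines copied from the log above.
SAMPLE = [
    "Jan 27 10:59:55 130s-p16s systemd[1]: Starting EGPU Service...",
    "Jan 27 10:59:56 130s-p16s egpu-switcher[1826]: [info] the egpu is connected   ",
    "Jan 27 10:59:56 130s-p16s egpu-switcher[1826]: [ok] switch completed",
    "Jan 27 11:06:14 130s-p16s egpu-switcher[1829]: [info] giving up after 6 retries",
    "Jan 27 11:06:14 130s-p16s egpu-switcher[1829]: [info] the egpu is disconnected",
    "Jan 27 12:13:54 130s-p16s egpu-switcher[2046]: [error] unable to read pci "
    "information from sysfs: got error while scanning device '0000:50:00.0': "
    "the pci 'config' file has an invalid format",
]

for pid, outcome in summarize_runs(SAMPLE).items():
    print(pid, outcome)
```

On the real host the input could be the lines of the syslog file instead of `SAMPLE`.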

Actually, I doubt Ubuntu itself recognizes the eGPU as "eGPU". I guess the "eGPU" name comes from the custom tool I used when enabling the eGPU for the first time (#1128 (comment)).

So re-running it now.

$ sudo egpu-switcher enable     
[info] created egpu bootup service to autorun 'egpu-switcher switch'
[ok] setup successful

Hm, the output is way simpler this time. nvidia-smi still doesn't show the eGPU device.

Rebooting OS.

Collaborator Author

130s commented Jan 27, 2025

The OS got stuck at the BIOS + Ubuntu logo. I rebooted, but this time with the Thunderbolt cable unplugged. It came up w/o issues, as expected.

Without the eGPU's Thunderbolt cable:
$ lspci
00:00.0 Host bridge: Intel Corporation Device 4621 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
00:04.0 Signal processing controller: Intel Corporation Alder Lake Innovation Platform Framework Processor Participant (rev 02)
00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02)
00:06.2 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #2 (rev 02)
00:07.0 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #0 (rev 02)
00:07.2 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #2 (rev 02)
00:0a.0 Signal processing controller: Intel Corporation Platform Monitoring Technology (rev 01)
00:0d.0 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller (rev 02)
00:0d.2 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0 (rev 02)
00:0d.3 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #1 (rev 02)
00:14.0 USB controller: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller (rev 01)
00:14.2 RAM memory: Intel Corporation Alder Lake PCH Shared SRAM (rev 01)
00:14.3 Network controller: Intel Corporation Alder Lake-P PCH CNVi WiFi (rev 01)
00:15.0 Serial bus controller: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 (rev 01)
00:16.0 Communication controller: Intel Corporation Alder Lake PCH HECI Controller (rev 01)
00:16.3 Serial controller: Intel Corporation Device 51e3 (rev 01)
00:19.0 Serial bus controller: Intel Corporation Alder Lake-P Serial IO I2C Controller #0 (rev 01)
00:1f.0 ISA bridge: Intel Corporation Alder Lake PCH eSPI Controller (rev 01)
00:1f.3 Audio device: Intel Corporation Alder Lake PCH-P High Definition Audio Controller (rev 01)
00:1f.4 SMBus: Intel Corporation Alder Lake PCH-P SMBus Host Controller (rev 01)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (16) I219-LM (rev 01)
02:00.0 Non-Volatile memory controller: SK hynix Device 1959
03:00.0 3D controller: NVIDIA Corporation Device 1fb7 (rev a1)
Now, with the Thunderbolt cable plugged back in:
$ lspci
00:00.0 Host bridge: Intel Corporation Device 4621 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
00:04.0 Signal processing controller: Intel Corporation Alder Lake Innovation Platform Framework Processor Participant (rev 02)
00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02)
00:06.2 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #2 (rev 02)
00:07.0 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #0 (rev 02)
00:07.2 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #2 (rev 02)
00:0a.0 Signal processing controller: Intel Corporation Platform Monitoring Technology (rev 01)
00:0d.0 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller (rev 02)
00:0d.2 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0 (rev 02)
00:0d.3 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #1 (rev 02)
00:14.0 USB controller: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller (rev 01)
00:14.2 RAM memory: Intel Corporation Alder Lake PCH Shared SRAM (rev 01)
00:14.3 Network controller: Intel Corporation Alder Lake-P PCH CNVi WiFi (rev 01)
00:15.0 Serial bus controller: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 (rev 01)
00:16.0 Communication controller: Intel Corporation Alder Lake PCH HECI Controller (rev 01)
00:16.3 Serial controller: Intel Corporation Device 51e3 (rev 01)
00:19.0 Serial bus controller: Intel Corporation Alder Lake-P Serial IO I2C Controller #0 (rev 01)
00:1f.0 ISA bridge: Intel Corporation Alder Lake PCH eSPI Controller (rev 01)
00:1f.3 Audio device: Intel Corporation Alder Lake PCH-P High Definition Audio Controller (rev 01)
00:1f.4 SMBus: Intel Corporation Alder Lake PCH-P SMBus Host Controller (rev 01)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (16) I219-LM (rev 01)
02:00.0 Non-Volatile memory controller: SK hynix Device 1959
03:00.0 3D controller: NVIDIA Corporation Device 1fb7 (rev a1)
50:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
51:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
52:00.0 VGA compatible controller: NVIDIA Corporation Device 2805 (rev a1)
52:00.1 Audio device: NVIDIA Corporation Device 22bd (rev a1)

Difference of the 2 results above:

$ diff /tmp/without-thunderbolt-egpu.txt /tmp/pci_with-egpu.txt 
26a27,30
> 50:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
> 51:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
> 52:00.0 VGA compatible controller: NVIDIA Corporation Device 2805 (rev a1)
> 52:00.1 Audio device: NVIDIA Corporation Device 22bd (rev a1)
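The same comparison can be sketched in Python, which is handy if the two lspci outputs are captured by a script rather than saved to files. `new_pci_devices` is a hypothetical helper name; the sample strings are shortened copies of the outputs above.

```python
def new_pci_devices(before, after):
    """Return lspci lines that appear in `after` but not in `before`, in order."""
    seen = {line.strip() for line in before.splitlines() if line.strip()}
    return [
        line.strip()
        for line in after.splitlines()
        if line.strip() and line.strip() not in seen
    ]


# Shortened copies of the two lspci outputs above.
WITHOUT_EGPU = """\
02:00.0 Non-Volatile memory controller: SK hynix Device 1959
03:00.0 3D controller: NVIDIA Corporation Device 1fb7 (rev a1)
"""

WITH_EGPU = WITHOUT_EGPU + """\
50:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
51:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
52:00.0 VGA compatible controller: NVIDIA Corporation Device 2805 (rev a1)
52:00.1 Audio device: NVIDIA Corporation Device 22bd (rev a1)
"""

for line in new_pci_devices(WITHOUT_EGPU, WITH_EGPU):
    print(line)
```

Unlike diff, this ignores ordering and only reports additions, which is all that matters for spotting the Thunderbolt bridges and the eGPU.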

"Custom X11 config" refers to the path /usr/share/egpu-switcher/x11-template.conf, but that path doesn't exist on my host. Instead I see /usr/share/egpu-switcher/egpu.service, owned by root.

$ sudo ls -lath /usr/share/egpu-switcher/
total 20K
-rw-r--r--   1 root root  217 Jan 27 13:01 egpu.service
drwxr-xr-x 388 root root  12K Jan 21 07:47 ..
drw-r--r--   2 root root 4.0K Dec 17 12:29 .

$ sudo more /usr/share/egpu-switcher/egpu.service 
# generated by egpu-switcher
[Unit]
Description=EGPU Service
Before=display-manager.service
After=bolt.service

[Service]
Type=oneshot
ExecStart=/usr/bin/egpu-switcher switch auto

[Install]
WantedBy=graphical.target
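One thing worth noting in this unit: it is Type=oneshot and ordered Before=display-manager.service, so systemd waits for egpu-switcher to exit before starting the display manager. That might be relevant to the missing graphics if the command hangs, though I haven't confirmed it. A small Python sketch that pulls those fields out of the unit text (parsing with `configparser` is my shortcut, not how systemd reads units; it works here because this unit has no repeated keys):

```python
import configparser

# Unit text copied from /usr/share/egpu-switcher/egpu.service above.
UNIT_TEXT = """\
# generated by egpu-switcher
[Unit]
Description=EGPU Service
Before=display-manager.service
After=bolt.service

[Service]
Type=oneshot
ExecStart=/usr/bin/egpu-switcher switch auto

[Install]
WantedBy=graphical.target
"""


def unit_ordering(unit_text):
    """Extract service type, ordering and ExecStart from a simple unit file."""
    # interpolation=None: '%' is meaningful to systemd, not to configparser.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case (Before, After, ...)
    parser.read_string(unit_text)
    return {
        "type": parser.get("Service", "Type", fallback=None),
        "before": parser.get("Unit", "Before", fallback="").split(),
        "after": parser.get("Unit", "After", fallback="").split(),
        "exec_start": parser.get("Service", "ExecStart", fallback=None),
    }


info = unit_ordering(UNIT_TEXT)
# True when the display manager has to wait for this oneshot to finish.
delays_display_manager = (
    info["type"] == "oneshot" and "display-manager.service" in info["before"]
)
print(info)
print("display manager waits for this unit:", delays_display_manager)
```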

I don't know if this issue is caused by a bad config, and if so, which config files need updating.

For now, switching back to the old Thunderbolt cable to see if that fixes it.

Collaborator Author

130s commented Jan 27, 2025

For now, switching back to the old Thunderbolt cable to see if that fixes it.

Surprisingly, the old cable worked w/o issues; Ubuntu recognized the eGPU.
Then I ran sudo poweroff, swapped the cable, and started the computer, and it got stuck again at the Lenovo-Ubuntu logo.

Different Thunderbolt cable causing this issue??? 🤯

Collaborator Author

130s commented Jan 27, 2025

Running out of time, sticking w/the old cable for now.

Collaborator Author

130s commented Jan 27, 2025

This new long red Thunderbolt-3 cable seems to work OK as a USB-3 cable; at least my Android phone is detected (as a USB device, not as a PCIe device, of course).
$ lsusb
Bus 004 Device 004: ID 05e3:0749 Genesys Logic, Inc. SD Card Reader and Writer
Bus 004 Device 003: ID 05e3:0616 Genesys Logic, Inc. hub
Bus 004 Device 002: ID 05e3:0616 Genesys Logic, Inc. hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 006: ID 058f:9540 Alcor Micro Corp. AU9540 Smartcard Reader
Bus 003 Device 011: ID 046d:c404 Logitech, Inc. TrackMan Wheel
Bus 003 Device 012: ID 17ef:60ee Lenovo TrackPoint Keyboard II
Bus 003 Device 008: ID 05e3:0610 Genesys Logic, Inc. Hub
Bus 003 Device 005: ID 05e3:0610 Genesys Logic, Inc. Hub
Bus 003 Device 010: ID 2109:2817 VIA Labs, Inc. USB2.0 Hub             
Bus 003 Device 007: ID 2109:0102 VIA Labs, Inc. USB 2.0 BILLBOARD             
Bus 003 Device 004: ID 2109:2817 VIA Labs, Inc. USB2.0 Hub             
Bus 003 Device 003: ID 174f:1812 Syntek Integrated Camera
Bus 003 Device 002: ID 06cb:00f9 Synaptics, Inc. 
Bus 003 Device 009: ID 8087:0033 Intel Corp. 
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 004: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 002 Device 005: ID 18d1:4ee1 Google Inc. Nexus/Pixel Device (MTP)
Bus 002 Device 003: ID 2109:0817 VIA Labs, Inc. USB3.0 Hub             
Bus 002 Device 002: ID 2109:0817 VIA Labs, Inc. USB3.0 Hub             
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
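A rough way to check which kind of bus the phone landed on is to look up the root hub of its bus in the lsusb output. The Python sketch below does that; the helper names are hypothetical, and it relies only on the two root hub IDs visible above (1d6b:0002 for 2.0, 1d6b:0003 for 3.0).

```python
import re

LSUSB_RE = re.compile(
    r"^Bus (?P<bus>\d{3}) Device (?P<dev>\d{3}): "
    r"ID (?P<vid>[0-9a-f]{4}):(?P<pid>[0-9a-f]{4})\s*(?P<name>.*?)\s*$"
)

# Root hub product IDs as they appear in the listing above.
ROOT_HUB_VERSION = {"0002": "2.0", "0003": "3.0"}


def parse_lsusb(text):
    """Parse lsusb output into a list of dicts (bus, dev, vid, pid, name)."""
    return [m.groupdict() for m in map(LSUSB_RE.match, text.splitlines()) if m]


def root_hub_version(devices, bus):
    """Return '2.0' or '3.0' for the root hub of `bus`, or None if unknown."""
    for d in devices:
        if d["bus"] == bus and d["dev"] == "001" and d["vid"] == "1d6b":
            return ROOT_HUB_VERSION.get(d["pid"])
    return None


# A few lines copied from the lsusb output above.
SAMPLE = """\
Bus 003 Device 002: ID 06cb:00f9 Synaptics, Inc. 
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 005: ID 18d1:4ee1 Google Inc. Nexus/Pixel Device (MTP)
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
"""

devices = parse_lsusb(SAMPLE)
phone = next(d for d in devices if (d["vid"], d["pid"]) == ("18d1", "4ee1"))
print(phone["name"], "is on bus", phone["bus"],
      "with a", root_hub_version(devices, phone["bus"]), "root hub")
```

This only says something about USB enumeration; it says nothing about whether the cable carries Thunderbolt/PCIe correctly.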

Collaborator Author

130s commented Jan 27, 2025

I'm seeing Ubuntu get stuck at boot with one Thunderbolt-3 cable, while with a different Thunderbolt-3 cable it finishes booting. Does Ubuntu distinguish between Thunderbolt cables?

Jan 27 12:02:39 130s-p16s egpu-switcher[1833]: [info] looking for eGPU...
Jan 27 12:02:44 130s-p16s egpu-switcher[1833]: [info] giving up after 6 retries
Jan 27 12:02:44 130s-p16s egpu-switcher[1833]: [info] the egpu is disconnected
Jan 27 12:02:44 130s-p16s egpu-switcher[1833]: [info] egpu has been removed from X.Org config
Jan 27 12:02:44 130s-p16s egpu-switcher[1833]: [ok] switch completed
Jan 27 12:02:44 130s-p16s systemd[1]: egpu.service: Deactivated successfully.

Clearly the eGPU service created by the tool (#1152 (comment)) is failing to find an eGPU device, and that seems to be somehow blocking the entire OS from finishing the bootup/initialization process.

Jan 27 12:13:54 130s-p16s egpu-switcher[2046]: [error] unable to read pci information from sysfs: got error while scanning device '0000:50:00.0': the pci 'config' file has an invalid format
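I don't know what exactly egpu-switcher treats as an "invalid format", but the sysfs config file can be sanity-checked by hand. A minimal Python sketch, assuming only standard PCI facts: the config header is at least 64 bytes, the vendor ID is the little-endian 16-bit value at offset 0, and a vendor ID of ffff means the device did not respond. `inspect_pci_config` is a hypothetical helper and the sample bytes are synthetic, not read from my host.

```python
import struct


def inspect_pci_config(data: bytes) -> dict:
    """Basic sanity checks on the raw bytes of a sysfs PCI 'config' file."""
    result = {"length": len(data), "has_header": len(data) >= 64}
    if len(data) >= 4:
        # Vendor ID and device ID are the first two little-endian 16-bit fields.
        vendor, device = struct.unpack_from("<HH", data, 0)
        result["vendor_id"] = f"{vendor:04x}"
        result["device_id"] = f"{device:04x}"
        # All-ones vendor ID is what a read returns when no device answers.
        result["responding"] = vendor != 0xFFFF
    return result


# Synthetic examples (the device ID 1234 is made up for illustration).
healthy = struct.pack("<HH", 0x8086, 0x1234) + bytes(60)
not_responding = b"\xff" * 64
truncated = b""

for label, blob in [("healthy", healthy),
                    ("not responding", not_responding),
                    ("truncated", truncated)]:
    print(label, inspect_pci_config(blob))

# On the real host the bytes would come from something like:
#   pathlib.Path("/sys/bus/pci/devices/0000:50:00.0/config").read_bytes()
```

If the file for 0000:50:00.0 comes back short or all ff with the new cable, that would point at the Thunderbolt link rather than at the egpu-switcher config.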

I guess part of the challenge with the new Thunderbolt cable is that Ubuntu gets stuck if the eGPU is connected through the new cable at boot. Because it gets stuck, Ubuntu doesn't get far enough for me to reconfigure it with whatever new settings are needed.
So I'll have to bypass the point where Ubuntu gets stuck in order to adjust to whatever the new cable provides (although I still don't understand whether a new Thunderbolt cable could cause this issue).

Based on this, the course of action to try:

  1. Shut down Ubuntu.
  2. Unplug the eGPU + the new Thunderbolt cable.
  3. Boot Ubuntu and let it finish so that the user is logged in.
  4. Wipe out the eGPU settings made by the custom tool (#1152 (comment)).
  5. Plug the eGPU + the new Thunderbolt cable back in.
  6. Re-run the custom eGPU tool (#1152 (comment)).
