Dell XPS 15 9560: System crashes when turning GPU on/off repeatedly #148
Comments
Those issues are quite known/expected I think. Did you take a look at #140? |
Something else is at work here. I added the suggested workaround pcie_port_pm=off to my grub config and it didn't help. The system still freezes during bootup, probably when bumblebeed is loaded. |
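When a workaround such as pcie_port_pm=off appears to have no effect, it can help to first confirm that the parameter actually reached the running kernel. The following is a minimal sketch, not part of bbswitch or Bumblebee: the helper names `parse_cmdline` and `read_cmdline` are hypothetical, and it assumes the usual Linux `/proc/cmdline` file.

```python
import shlex
from collections import defaultdict


def parse_cmdline(text):
    """Split a kernel command line into {name: [values]}.

    Flags without '=' get the value None. Repeated parameters keep
    every value, in the order they appear.
    """
    params = defaultdict(list)
    # shlex handles values quoted with double quotes, e.g. key="two words"
    for token in shlex.split(text):
        name, sep, value = token.partition("=")
        params[name].append(value if sep else None)
    return dict(params)


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return parse_cmdline(handle.read())
```

Calling `read_cmdline().get("pcie_port_pm")` should return `["off"]` if the grub change took effect, and `None` if the parameter never reached the kernel (for example because the grub configuration was not regenerated).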
Update: starting X will lock up the machine if the Nvidia device is OFF. So bbswitch itself seems to work, but X won't start. Neither the nouveau nor the nvidia kernel module is loaded. Does X try to talk to the PCI device somehow? |
Sorry, answered in a hurry. I have two laptops at the moment: – Dell XPS 9530, which works OK with Bumblebee; I'm not sure about bbswitch alone and should test that. Both are affected by #140 btw. I didn’t have the time until now to look into those issues, but I’ll do so next week. I did some tests with nouveau too, but don’t remember the results, and a new kernel requires new tests. ;) I’ll keep you updated, and we’ll see with @Lekensteyn what can be done then. |
Something loads the nvidia kernel module when I start sddm. I blame X for that, yet the Xorg logs mention nothing. That seems to be the reason why my laptop hangs when starting X with the dGPU turned off. From bug #140 I gather that support for PCIe PM is in the works and that will eventually replace the DSM method? |
The hang issue is possibly not related to pcie_port_pm, but https://bugzilla.kernel.org/show_bug.cgi?id=156341 (Bumblebee-Project/Bumblebee#764). Try the acpi_osi workaround listed there. |
@Lekensteyn OK, that’s definitely it for me (my machine is already listed by someone else). Looks like I’ve been missing a lot of fun lately (my comment review queue for Bumblebee is at 1098 right now…). I’ve added myself to the CC list. |
For me Github still shows 143 unread notifications for Bumblebee (none for bbswitch), but there are lots of open issues that still need a response (luckily users are helping each other). Hmm |
I think I've given answers to all Ubuntu/Debian related issues, if I missed some please do tag me |
I am also seeing a hang on startup on my Precision 5200 running Ubuntu with 4.9.8. If I do not install the module and load it manually, everything is fine, but if I install it and reboot, my machine hangs before getting to the login prompt with some ACPI-related errors. acpi_osi=! acpi_osi=Windows 2009 got me booting, but things like the backlight did not work. For now I am just running a |
I also have the startup issue on Dell XPS 9560, and it seems to be mainly related to X, as it does not seem to affect Wayland. |
I can confirm this, @zeraien. I just installed 3.22 and moved over to it today using Wayland. I can boot fine with bbswitch installed, although my system does lock up a short time after booting. |
I also have issues with freezing while in Wayland, but it seems to be related to launching older X-based applications (e.g. steam, wine), which probably start the X machinery, which in turn probably triggers the nvidia driver/bbswitch and causes the crash. One way I avoided a crash with X is by forcing bbswitch to turn the dGPU ON before logging into X. So the crash seems to happen when the card is OFF and X then starts. I'm on Fedora 25 (kernel 4.9.8), bumblebee 3.2.1, Nvidia 1050, Dell XPS 9560 |
If I just manually load bbswitch |
@stefansedich can you explain what exactly you're doing with |
Sure @pronobis so basically I did not run make install, I booted to recovery mode, then ran |
Does that mean that you can run the commands that cause freeze after you load bbswitch? For instance, in my case it's nvidia-smi, lspci, lshw etc. |
I don't seem to get any freeze running lspci or lshw. I could be wrong here, mind you, but I can tell from powertop that my nvidia card is turned off, so I have assumed that it was all working as expected.
|
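@stefansedich Thanks for the info. Just to summarize:
- cat /proc/acpi/bbswitch gives OFF
- you are on X, not wayland
- you can run lspci several times in a row and no freeze happens?
What is your hardware? |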
I am actually on Wayland now but was working under X too. cat /proc/acpi/bbswitch gives me a file not found, but looking at gpu-manager I can see it turns the card off, and judging by my idle watts the card appears to be off: if I reboot running the nvidia driver, my watts are 5 W or so higher. And yes, I can run lspci many times fine. My hardware is a Precision 5200, so it might be a different story due to the different card.
On Wed, Feb 15, 2017 at 1:12 PM Andrzej Pronobis wrote:
@stefansedich Thanks for the info. Just to summarize:
- cat /proc/acpi/bbswitch gives OFF
- you are on X, not wayland
- you can run lspci several times in a row and no freeze happens?
What is your hardware?
|
I am using |
@stefansedich The fact that /proc/acpi/bbswitch gives you file not found means your bbswitch module is likely not loaded. I think I also did not have the problem when no NVIDIA driver was installed or in use. The problem exists only if you power down a card that is managed by an nvidia driver (nvidia-smi displays the card) and X is using the Intel card with nvidia-prime. The bug is likely to be the same as or related to: https://bugzilla.kernel.org/show_bug.cgi?id=156341 I would urge all Kaby Lake Dell laptop users to post there, since this issue has previously been reported mostly for Skylake laptops. |
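To check which of the relevant kernel modules are actually loaded before comparing results, something like the sketch below may help. It assumes the standard `/proc/modules` layout, where the module name is the first whitespace-separated field on each line; the function names and the `GPU_MODULES` tuple are hypothetical and only cover the modules mentioned in this thread.

```python
GPU_MODULES = ("bbswitch", "nvidia", "nouveau")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules content."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first field of each line is the module name.
            names.add(fields[0])
    return names


def gpu_module_report(proc_modules_text, wanted=GPU_MODULES):
    """Map each module of interest to True (loaded) or False (not loaded)."""
    loaded = loaded_modules(proc_modules_text)
    return {name: name in loaded for name in wanted}
```

On a running system the input would come from `open("/proc/modules").read()`. Matching is exact, so companion modules such as nvidia_modeset are not counted as nvidia.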
There is a patch in https://bugzilla.kernel.org/show_bug.cgi?id=156341 that solves the freeze when applied against kernel 4.10. |
I tried the patch with Linux 4.10 and my Dell XPS 9560 is not freezing anymore. Nice! However, the nvidia driver version 378.13 doesn't seem to work. It gives some errors and I couldn't make them go away yet. The errors are:
I should add that the nvidia module is not unloaded and the card is not turned off again if primusrun fails. |
Downgrading the nvidia driver to 375.39 worked. I can now play games such as Dota 2 with my Nvidia GTX 1050. |
Interesting. Since upgrading to 4.10, I just installed the module again and rebooted, and everything is working as expected! No lockup on boot for me now. |
@stefansedich that's weird... apparently the fix has not been accepted upstream:
|
For what it's worth, it doesn't solve the issue for me - kernel 4.9.11, bios 1.1.3. I still get the hard lock when I switch the nvidia card off. |
OK, I tried the kernel option |
For those playing at home...your Kernel must include the
One kernel rebuild later, and setting |
On NixOS I was getting kernel panics as well on XPS 9560. Following works:
|
I also own the dell xps 15 9560, running ubuntu 17.04. Everything I read about managing the built-in graphic chip and the nvidia graphic card confuses me. Is there a working solution today that is safe and does not require expert linux skills? Could someone write a summary of the steps to take? Thanks a lot! |
@saroele can't help with Ubuntu specifics but essentially what you currently need (and keep in mind that this is a dynamic situation) is to install the latest NVIDIA drivers, install bumblebee and install bbswitch. This is documented in the Arch Linux wiki and at least the general idea can be applied to any distro, you only have to then use your own package manager, etc. Once you have these three moving parts, your dedicated GPU should be off by default and thus not using power. Whenever you want to run an app on the dedicated GPU, you run it with "optirun <whatever_app>". So if you'd like to run Chrome with dedicated GPU, you'd run optirun chrome. This should get you a basic working system. What some users experience is that sometimes switching the GPU on and/or off will cause the system to freeze. To get rid of that, you can set the kernel parameter Personally I'm running Arch Linux which is great because it polishes your Linux skills and gives you a better understanding of what's running where. It does also cost quite some initial time to understand it so tread carefully. An alternative is Manjaro Linux, which is Arch with training wheels. You'll get all the benefits of Arch and quite some helpers to get you up and running. |
I've got the 9560 FHD running Ubuntu 17.04 and nvidia-375 and I can confirm that adding the Here are the benchmarking results: http://openbenchmarking.org/result/1706111-TR-XPS95602060 I'm posting here because this came up as one of the search results while googling :) BTW this guy https://www.reddit.com/r/Dell/comments/63cavx/fixed_nvidia_1050_freezing_in_ubuntu_linux/ has got good instructions for whoever wants to try this out. I'm not sure what the side effects of this kernel option are, though, so I consider this to be a temporary issue until Ubuntu ships us an updated kernel with the proper patches. |
Have the same bug on 4.12_rc6 kernel on Gentoo: https://bugs.freedesktop.org/show_bug.cgi?id=101553 |
Nobody in kernel/nouveau seems to be working on this (not even requests for debugging/testing/more information, whatever). So do not expect any fixes soon. See this bug for tracking: https://bugs.acpica.org/show_bug.cgi?id=963 For those who want to try nouveau on this laptop, there are issues too: https://bugs.freedesktop.org/show_bug.cgi?id=100228 and https://bugzilla.kernel.org/show_bug.cgi?id=156341 |
I just wanted to report that this has changed recently for me. While the kernel parameter used to work for me as well, it no longer does. I'm on Manjaro stable with kernel 4.13, and since the last update my system hangs on boot with the override set to 1. Luckily I have not experienced the original issues again since I disabled it to be able to boot. Just wanted to leave this here in case others find themselves in a similar situation. |
I can report that 4.13 does break again and for me |
My Dell XPS 9560 works without problems with 4.13.x. I'm just using |
@skoehler so |
On my Dell XPS 9560 with Linux 4.13.3-1 I have Both attempts yield the same output in
|
My kernel parameters are |
Apparently kernel 4.13.6 doesn't work for me, could be unrelated to bumblebee since it won't boot even with ON or OFF. |
@domenkozar sounds like an unrelated problem? Unless "won't boot" means that it does not get to the login screen or something. What was the last working kernel version? |
FWIW, my brother also has this laptop (Dell Inc. XPS 15 9560/05FFDN, BIOS 1.3.3 05/08/2017) now and I can confirm issues with it (using Arch Linux with Linux 4.13.7). The One major issue is that it drains a lot of battery during system sleep (51% in 9h, approx. 49Wh with the extended battery model). To be investigated later. |
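For a sense of scale, the sleep drain figures quoted above (49 Wh, 51%, 9 hours) can be turned into an average power draw and an implied full battery capacity. This is only back-of-the-envelope arithmetic on those three numbers; the function names are hypothetical.

```python
def average_drain_watts(energy_wh, hours):
    """Average power draw in watts for a given energy loss over time."""
    return energy_wh / hours


def implied_capacity_wh(energy_wh, percent_lost):
    """Full battery capacity implied by an energy loss and a percentage."""
    return energy_wh * 100 / percent_lost


print(f"{average_drain_watts(49, 9):.2f} W average during sleep")
print(f"{implied_capacity_wh(49, 51):.1f} Wh implied full capacity")
```

That works out to roughly 5.4 W on average while the machine is supposed to be asleep.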
Just upgraded from 4.13.4 to 4.13.11 and my system hangs during boot, probably due to this issue. So disregard my above comment that everything is working fine with 4.13.x. Clearly, something is broken after 4.13.4. |
See also this pull request - should improve the situation after landing in the kernel acpica/acpica#330 |
I've reproduced this with a blacklisted nouveau and no bbswitch/bumblebee. I use the following to turn off the GPU:
logs
|
For everyone who had the ACPI issues that required acpi_rev_override: important fixes were merged into the mainline kernel git tree, so starting from kernel 4.17-rc2 you can try booting without the ACPI override option. I have tried it on a Dell XPS 9560 and it really works. |
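A rough way to decide whether a given machine is worth testing without acpi_rev_override is to compare its kernel release against 4.17. The sketch below is only a hint, not a guarantee: other reports in this thread still needed the override on 4.17.2. The function names are hypothetical, and release candidates are not distinguished, so a 4.17-rc1 kernel would also pass the check.

```python
import re


def parse_release(release):
    """Extract (major, minor, patch) from a string such as '4.13.3-1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '4.17-rc2') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def may_skip_override(release, first_fixed=(4, 17)):
    """True if the kernel is at least the series said to carry the fixes."""
    return parse_release(release)[:2] >= first_fixed
```

On a live system the release string can be obtained with `platform.release()` from the Python standard library, which returns the same text as `uname -r`.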
@XVilka That's great news! Do you happen to know which exact patches from 4.17-rc2 fix the issue? |
@chenxiaolong I don't know the exact commit. The MLC fixes landed in multiple commits, but it is mostly updated |
@XVilka I just upgraded from 4.16 to 4.17.2 (Fedora 28), but unfortunately it still doesn't work without acpi_rev_override. The system still hangs before the login screen. |
Very strange. Can you record dmesg for the exact ACPI error?
|
@XVilka Actually there aren't any errors. The log shows exactly the same messages regarding ACPI (except the boot parameter of course). The last couple of messages from a failed boot:
And from a successful boot:
The only error in the whole log is
It's the same for a successful boot though. |
This commit documents the `acpi_rev_override` kernel parameter. Setting this parameter is necessary on some computers to avoid a black screen at boot time. See Bumblebee-Project#148
Using Arch Linux, updated to the latest versions, I still had to use |
On my brand new Dell XPS 9560, the bbswitch kernel module loads, but repeatedly turning the GPU on and off makes the system hang. I don't know why and, unfortunately, I can't see any kernel output. I'm using kernel 4.9.7.
My distro is Gentoo. When booting with systemd, my system would actually hang before I could reach the login prompt (probably because the bumblebee daemon tries to disable the GPU). When booting with OpenRC (a very serial boot process), I could reach the graphical login prompt, but turning the GPU on and off repeatedly would result in a system freeze and crash. I just used echo ON/OFF > /proc/acpi/bbswitch.
It might be that the new 10 series GPUs come with new ACPI tables for turning the GPU on/off. On a Dell XPS 15 9550 (960M GPU) everything was working fine.
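The repeated ON/OFF toggling described above could be scripted roughly as follows, which may help others reproduce the hang in a controlled way. This is a sketch under assumptions: the status line read back from /proc/acpi/bbswitch is assumed to look like `0000:01:00.0 OFF` (PCI address, then state), and all function names are hypothetical. Writing to the file requires root, and on affected hardware `stress_toggle` is expected to freeze the machine, so save your work first.

```python
import time

BBSWITCH = "/proc/acpi/bbswitch"


def parse_status(text):
    """Split a status line such as '0000:01:00.0 OFF' into (device, state)."""
    fields = text.split()
    if len(fields) != 2 or fields[1] not in ("ON", "OFF"):
        raise ValueError(f"unexpected bbswitch status: {text!r}")
    return fields[0], fields[1]


def toggle_sequence(cycles):
    """Yield OFF, ON, OFF, ON, ... for the given number of off/on cycles."""
    for _ in range(cycles):
        yield "OFF"
        yield "ON"


def stress_toggle(cycles, path=BBSWITCH, delay=1.0):
    """Write each state to bbswitch, wait, then print what it reports."""
    for state in toggle_sequence(cycles):
        with open(path, "w") as handle:
            handle.write(state + "\n")
        time.sleep(delay)
        with open(path) as handle:
            print(state, "->", parse_status(handle.read()))
```

Running `stress_toggle(10)` as root would perform ten OFF/ON cycles; the last line printed before a freeze shows which transition triggered it.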