Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failures setting frequency scaling on Android devices. #281

Open
jcgregorio opened this issue Oct 5, 2016 · 9 comments
Open

Failures setting frequency scaling on Android devices. #281

jcgregorio opened this issue Oct 5, 2016 · 9 comments

Comments

@jcgregorio
Copy link

We are seeing failures setting the frequency scaling, and this is consistent across the 3 Nexus 9's we are running. A quick spot check and I found this happening on other devices also, such as the Nexus 7:

https://chromium-swarm.appspot.com/newui/bot?id=skia-rpi-057&selected=1&sort_stats=total%3Adesc
https://chromium-swarm.appspot.com/newui/bot?id=skia-rpi-002&selected=1&sort_stats=total%3Adesc

Nexus 9 event log:

Message Type Timestamp Task ID Version
Periodic reboot: running longer than 44976s bot_rebooting 10/5/2016, 12:23:46 PM (EDT) e4166244
1/19.PullContent(/system/build.prop): Could not send data to (timeout 10000ms): USBErrorIO() bot_error 10/5/2016, 8:28:21 AM (EDT) e4166244
<Device 1/14 HT496JT00270>: Failed to set CPU frequency bot_log 10/5/2016, 6:55:33 AM (EDT) 31ad4c82d2c49f11 e4166244
Periodic reboot: running longer than 39776s bot_rebooting 10/4/2016, 10:40:38 PM (EDT) e4166244
<Device 1/13 HT496JT00270>: Failed to set CPU frequency bot_log 10/4/2016, 6:27:04 PM (EDT) 31aa9f28f250da11 e4166244
<Device 1/5 HT496JT00270>: Failed to set CPU frequency bot_log 10/4/2016, 10:36:22 AM (EDT) 31a8f0f29b30ca11 e4166244
Periodic reboot: running longer than 49023s bot_rebooting 10/4/2016, 10:31:59 AM (EDT) e4166244
<Device 1/26 HT496JT00270>: Failed to set CPU frequency bot_log 10/4/2016, 4:50:29 AM (EDT) 31a7b47af83a4f11 e4166244
<Device 1/18 HT496JT00270>: Failed to set CPU frequency bot_log 10/3/2016, 8:44:50 PM (EDT) 31a5f7957581db11 e4166244
Bot self-quarantined request_update 10/3/2016, 8:36:04 PM (EDT) 290a16d3
<Device 1/14 HT496JT00270>: Failed to set CPU frequency bot_log 10/3/2016, 4:10:32 PM (EDT) 31a4fbf909d06411 290a16d3
Periodic reboot: running longer than 44851s bot_rebooting 10/3/2016, 7:34:41 AM (EDT) 02aa9441
Periodic reboot: running longer than 40289s bot_rebooting 10/2/2016, 7:05:30 PM (EDT) 02aa9441
Periodic reboot: running longer than 46965s bot_rebooting 10/2/2016, 7:52:57 AM (EDT) 02aa9441
Periodic reboot: running longer than 35116s bot_rebooting 10/1/2016, 6:48:39 PM (EDT) 02aa9441

Nexus 7 event log:

Message Type Timestamp Task ID Version
Bot self-quarantined request_update 10/5/2016, 4:40:21 PM (EDT) e4166244
Periodic reboot: running longer than 35336s bot_rebooting 10/5/2016, 6:22:24 AM (EDT) e4166244
Periodic reboot: running longer than 39934s bot_rebooting 10/4/2016, 7:48:24 PM (EDT) e4166244
Periodic reboot: running longer than 39144s bot_rebooting 10/4/2016, 8:35:10 AM (EDT) e4166244
Bot self-quarantined request_update 10/3/2016, 9:24:25 PM (EDT) 290a16d3
<Device 1/10 AG86044202A04GC>: Failed to set CPU frequency bot_log 10/3/2016, 4:32:29 PM (EDT) 31a510914794f811 290a16d3
Execution failed: HTTPSConnectionPool(host='isolateserver.storage.googleapis.com', port=443): Read timed out. task_error 10/3/2016, 4:29:37 PM (EDT) 31a4f1c7aced8811 02aa9441
Periodic reboot: running longer than 40250s bot_rebooting 10/3/2016, 2:03:28 PM (EDT) 02aa9441

@nodirt
Copy link
Member

nodirt commented Oct 5, 2016

@maruel

@maruel
Copy link
Member

maruel commented Oct 5, 2016

@erock2112 can you take a look? @bpastene is it something still often observed?

@erock2112
Copy link
Contributor

There are two problems here: failing to set the scaling mode, and the fact that the device is allowed to run tasks even though setting the CPU scale failed. As for the first, here's a snippet from adb.log:

733 2016-10-06 04:04:39.794 D: 1/27.Write(WRTE, 24, 520, 'RECV5\x00\x00\x00/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor')
733 2016-10-06 04:04:39.802 D: 1/27.Read(OKAY, 520, 24)
733 2016-10-06 04:04:39.811 D: 1/27.Read(WRTE, 520, 24, 'DATA\x0c\x00\x00\x00interactive\nDONE\x00\x00\x00\x00')
733 2016-10-06 04:04:39.820 D: 1/27.Write(OKAY, 24, 520)
733 2016-10-06 04:04:39.828 D: 1/27.Write(CLSE, 24, 520)
733 2016-10-06 04:04:39.835 D: 1/27.Read(CLSE, 0, 24)
733 2016-10-06 04:04:39.843 D: 1/27.Write(OPEN, 25, 0, 'sync:\x00')
733 2016-10-06 04:04:39.850 D: 1/27.Read(OKAY, 521, 25)
733 2016-10-06 04:04:39.859 D: 1/27.Write(WRTE, 25, 521, 'SEND;\x00\x00\x00/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor,33272DATA\n\x00\x00\x00userspace\nDONEW\xcd\xf5W')
733 2016-10-06 04:04:39.868 D: 1/27.Read(OKAY, 521, 25)
733 2016-10-06 04:04:39.876 D: 1/27.Read(WRTE, 521, 25, 'FAIL\x11\x00\x00\x00Permission denied')
733 2016-10-06 04:04:39.886 D: 1/27.Write(OKAY, 25, 521)
733 2016-10-06 04:04:39.886 I: 1/27.SetCPUScalingGovernor(): Failed to push userspace in /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
733 2016-10-06 04:04:39.895 D: 1/27.Write(OPEN, 26, 0, 'shell:echo "userspace" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor ;echo -e "\n$?"\x00')
733 2016-10-06 04:04:39.903 D: 1/27.Read(OKAY, 522, 26)
733 2016-10-06 04:04:39.911 D: 1/27.Read(WRTE, 522, 26, "/system/bin/sh: can't create /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: Permission denied\r\n")

Are we sure we have root access on this device? @kjlubick

@erock2112
Copy link
Contributor

We're getting to the point where we have enough devices that it's hard to notice when this fails. IMO the device should be quarantined in this situation.

@kjlubick
Copy link
Contributor

kjlubick commented Oct 7, 2016

All but one of our devices had root access. I am fixing the one that does not right now.

@erock2112
Copy link
Contributor

The Nexus 7 on skia-rpi-002 is still failing to set CPU frequency. This fails for me:

$ adb shell "echo "userspace" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"

But the same command succeeds after "adb root". From what I can tell in adb.log, we never even attempt to restart adbd as root, despite multiple attempts to restart adbd as non-root. Note that this is an old device running K. Is it possible to upgrade it?

@erock2112
Copy link
Contributor

Okay, I think I found the issue. We explicitly reboot our Android devices at the end of our recipes. The Swarming code does not expect this; it does the "adb root" whenever it reboots the device and otherwise expects that adbd is already running as root. When we reboot the device during the swarming task, we don't run "adb root", so we're in non-root mode. I think the right solution is to change Swarming to always reboot Skia Android devices after every task and stop issuing the reboot during the task.

@erock2112
Copy link
Contributor

Okay, removing the reboot from our recipe and having Swarming do it after every task seems to have fixed the CPU scaling on the N7 on skia-rpi-002. I'm still not satisfied with the level of visibility of these errors. IMO any failure in on_before_task should prevent the task from running on that bot, if not quarantine the bot outright. Maybe we should start querying for bot events and alerting for some failures?

@erock2112
Copy link
Contributor

Okay, the N7s are still failing but with a different error:

733 2016-10-17 12:28:58.049 D: 1/6.Write(OPEN, 24, 0, 'sync:\x00')
733 2016-10-17 12:28:58.056 D: 1/6.Read(OKAY, 9, 24)
733 2016-10-17 12:28:58.065 D: 1/6.Write(WRTE, 24, 9, 'SEND;\x00\x00\x00/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed,33272DATA\x07\x00\x00\x00640000\nDONE\n\xc4\x04X')
733 2016-10-17 12:28:58.073 D: 1/6.Read(OKAY, 9, 24)
733 2016-10-17 12:28:58.081 D: 1/6.Read(WRTE, 9, 24, 'OKAY\x00\x00\x00\x00')
733 2016-10-17 12:28:58.090 D: 1/6.Write(OKAY, 24, 9)
733 2016-10-17 12:28:58.097 D: 1/6.Write(CLSE, 24, 9)
733 2016-10-17 12:28:58.104 D: 1/6.Read(CLSE, 9, 24)
733 2016-10-17 12:28:58.112 D: 1/6.Write(OPEN, 25, 0, 'sync:\x00')
733 2016-10-17 12:28:58.119 D: 1/6.Read(OKAY, 10, 25)
733 2016-10-17 12:28:58.127 D: 1/6.Write(WRTE, 25, 10, 'RECV5\x00\x00\x00/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed')
733 2016-10-17 12:28:58.136 D: 1/6.Read(OKAY, 10, 25)
733 2016-10-17 12:28:58.144 D: 1/6.Read(WRTE, 10, 25, 'DATA\x08\x00\x00\x001200000\nDONE\x00\x00\x00\x00')
733 2016-10-17 12:28:58.153 D: 1/6.Write(OKAY, 25, 10)
733 2016-10-17 12:28:58.161 D: 1/6.Write(CLSE, 25, 10)
733 2016-10-17 12:28:58.168 D: 1/6.Read(CLSE, 10, 25)
733 2016-10-17 12:28:58.169 I: 1/6.SetCPUSpeed(): Failed to push 640000 in /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

We successfully set the scaling governor, and writing the frequency file apparently succeeds, but when we read it back it's still set to the previous value. This seems to be the case for all 3 of our N7s, and I have no idea how to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

6 participants
@maruel @erock2112 @nodirt @jcgregorio @kjlubick and others