Getting suspend to work properly on A06

If I run echo freeze > /sys/power/state or use the Xfce suspend option, it suspends and resumes when I press the power button, but it drains the battery in a few hours (with protected cells).

echo mem>/sys/power/state blanks the screen for a moment then fails with bash: echo: write error: Connection timed out. The dmesg log has errors about xhci-hcd but adding it to SUSPEND_MODULES under /etc/pm/config.d didn’t help.
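
For anyone following along, here is a minimal sketch for checking which suspend-to-RAM variant the kernel will attempt. /sys/power/state lists the supported sleep states, and /sys/power/mem_sleep lists the variants of "mem" with the selected one in brackets. The helper name selected_mem_sleep is hypothetical, and the optional file argument exists only so the function can be tried on a sample file.

```shell
# Print the bracketed (currently selected) entry from a mem_sleep-style file.
# Default path is the standard sysfs file; the helper name is made up.
selected_mem_sleep() {
    file="${1:-/sys/power/mem_sleep}"
    # The file looks like "s2idle [deep]"; keep only what is inside [ ].
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# Typical use on the device:
#   cat /sys/power/state      # supported states, e.g. "freeze mem disk"
#   selected_mem_sleep        # variant used by "echo mem > /sys/power/state"
```

If this prints s2idle, then echo mem behaves like echo freeze; if it prints deep, the kernel will try a full firmware-assisted suspend.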

Has anyone had any more luck with this?

Same issue.

xhci-hcd is compiled into the kernel so it cannot be unloaded…
I’ve checked the source tree, and it seems that the rockchip usb 3.0 driver is not even included.
I’m trying to disable all xhci features in the kernel config. Compiling the kernel now;

I removed xhci from the kernel. Suspend now works. Hibernation doesn’t – can go into hibernation but has issues resuming.

Nice! I added kernel argument to avoid having to recompile:

initcall_blacklist=xhci_hcd_init,xhci_pci_init,xhci_plat_init

Cool, today I learned something. :slight_smile:

Thanks pkr. I tried adding that to extraargs in /boot/armbianEnv.txt and I’m no longer getting any errors in dmesg, but suspend still doesn’t work: it blanks the screen for a moment, then returns to the desktop. There’s this (not very helpful) message in syslog:

systemd-sleep[3324]: Failed to suspend system. System resumed again: Operation not supported
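
A rough sketch of adding a kernel argument to the extraargs line of an armbianEnv.txt-style file, assuming the usual key=value layout with at most one extraargs= line. The function name add_extraarg is hypothetical, and the argument is assumed not to contain the characters | or &, which sed would treat specially here. Back up the file before trying this on a real /boot.

```shell
# Append a kernel argument to the extraargs= line of a key=value env file.
# Does nothing if the argument is already there; creates the line if missing.
add_extraarg() {
    env_file="$1"
    arg="$2"
    # Already present on the extraargs line: leave the file untouched.
    if grep '^extraargs=' "$env_file" | grep -qF -- "$arg"; then
        return 0
    fi
    if grep -q '^extraargs=' "$env_file"; then
        # "&" in the sed replacement stands for the matched original line.
        sed "s|^extraargs=.*|& $arg|" "$env_file" > "$env_file.new" &&
            mv "$env_file.new" "$env_file"
    else
        printf 'extraargs=%s\n' "$arg" >> "$env_file"
    fi
}

# Example (run as root, after making a backup):
#   add_extraarg /boot/armbianEnv.txt \
#       "initcall_blacklist=xhci_hcd_init,xhci_pci_init,xhci_plat_init"
```

After a reboot, cat /proc/cmdline shows whether the argument actually reached the kernel.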

Yeah, I just noticed mine wasn’t going into deep suspend either. On mine, it tries deep suspend, fails, and silently goes to s2idle (freeze). Debug messages aren’t helpful at all. I even tried disconnecting all USB devices including keyboard and it’s still the same result :eyes:

[   46.736347] PM: suspend entry (deep)
[   48.017450] Filesystems sync: 1.280 seconds
[   48.017482] PM: Preparing system for sleep (deep)
[   48.125025] Freezing user space processes ... (elapsed 0.003 seconds) done.
[   48.128362] OOM killer disabled.
[   48.128369] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   48.130050] PM: Suspending system (deep)
[   49.743033] PM: suspend of devices complete after 1612.087 msecs
[   49.743068] PM: start suspend of devices complete after 1613.005 msecs
[   49.743079] PM: suspend devices took 1.610 seconds
[   49.744492] PM: late suspend of devices complete after 1.403 msecs
[   49.745819] PM: noirq suspend of devices complete after 1.191 msecs
[   49.746008] Disabling non-boot CPUs ...
[   49.749606] psci: CPU1 killed (polled 0 ms)
[   49.752635] psci: CPU2 killed (polled 0 ms)
[   49.755523] psci: CPU3 killed (polled 0 ms)
[   49.757308] psci: CPU4 killed (polled 0 ms)
[   49.759720] psci: CPU5 killed (polled 0 ms)
[   49.760681] PM: Checking wakeup interrupts
[   49.760694] PM: Calling ledtrig_cpu_syscore_suspend+0x0/0x40
[   49.760717] PM: Calling sched_clock_suspend+0x0/0x40
[   49.760733] PM: Calling timekeeping_suspend+0x0/0x2e0
[   49.760733] PM: Calling irq_gc_suspend+0x0/0x80
[   49.760733] PM: Calling kvm_suspend+0x0/0x40
[   49.760733] PM: Calling fw_suspend+0x0/0x20
[   49.760733] PM: Calling cpu_pm_suspend+0x0/0xe0
[   49.760733] PM: Calling its_save_disable+0x0/0xf0
[   49.760733] PM: Calling its_restore_enable+0x0/0x160
[   49.760733] PM: Calling cpu_pm_resume+0x0/0x70
[   49.760733] PM: Calling kvm_resume+0x0/0x34
[   49.760733] PM: Calling irq_gc_resume+0x0/0x84
[   49.760733] PM: Calling irq_pm_syscore_resume+0x0/0x20
[   49.760733] PM: Calling timekeeping_resume+0x0/0x15c
[   49.760733] PM: Calling sched_clock_resume+0x0/0x60
[   49.760735] PM: Calling ledtrig_cpu_syscore_resume+0x0/0x20
[   49.760774] Enabling non-boot CPUs ...
[   49.761032] Detected VIPT I-cache on CPU1
[   49.761071] GICv3: CPU1: found redistributor 1 region 0:0x00000000fef20000
[   49.761142] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[   49.761847] CPU1 is up
[   49.762036] Detected VIPT I-cache on CPU2
[   49.762067] GICv3: CPU2: found redistributor 2 region 0:0x00000000fef40000
[   49.762122] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[   49.762930] CPU2 is up
[   49.763107] Detected VIPT I-cache on CPU3
[   49.763136] GICv3: CPU3: found redistributor 3 region 0:0x00000000fef60000
[   49.763188] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
[   49.764141] CPU3 is up
[   49.764341] Detected PIPT I-cache on CPU4
[   49.764370] GICv3: CPU4: found redistributor 100 region 0:0x00000000fef80000
[   49.764422] CPU4: Booted secondary processor 0x0000000100 [0x410fd082]
[   49.765625] CPU4 is up
[   49.765844] Detected PIPT I-cache on CPU5
[   49.765867] GICv3: CPU5: found redistributor 101 region 0:0x00000000fefa0000
[   49.765906] CPU5: Booted secondary processor 0x0000000101 [0x410fd082]
[   49.767154] CPU5 is up
[   49.768849] PM: noirq resume of devices complete after 1.679 msecs
[   49.769831] PM: early resume of devices complete after 0.827 msecs
[   50.838253] PM: resume of devices complete after 1068.403 msecs
[   50.840841] PM: resume devices took 1.070 seconds
[   50.840869] PM: Finishing wakeup.
[   50.840878] OOM killer enabled.
[   50.840883] Restarting tasks ... done.
[   50.851450] PM: suspend exit
[   50.853645] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[   50.919339] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 50000000Hz, actual 50000000HZ div = 0)

Looks like it’s something between its_save_disable and its_restore_enable
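
The turning point can also be picked out mechanically. This is a small sketch that reads dmesg text on stdin and prints the last "PM: Calling" suspend step followed by the first resume step, assuming the lines look like the ones in the log above. The function name suspend_turning_point is hypothetical, and treating any callback whose name contains "resume" or "restore" as the start of the resume path is a heuristic.

```shell
# Read a dmesg dump on stdin and report where the syscore callbacks
# switch from the suspend direction to the resume direction.
suspend_turning_point() {
    awk '/PM: Calling / {
        name = $NF                 # e.g. "its_save_disable+0x0/0xf0"
        sub(/\+.*/, "", name)      # strip the "+offset/size" suffix
        if (name ~ /resume|restore/) {
            print prev " -> " name
            exit
        }
        prev = name
    }'
}

# Example:
#   dmesg | suspend_turning_point
```

This only narrows down where the transition happens in the log; it says nothing about why the platform came straight back.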

Oh well… I tried to suspend last night with DT under my pillow…
A few hours later I touched something hot 0.0

I disabled all my USB ports (lsusb displays nothing) and still no deep suspend. Probably not related to USB :expressionless:

But on the bright side, with all my poking and twiddling, my screen ended up black, USB is dead, and it’s now drawing 470mA from USB-C while still connected to wifi! Now that’s what I call reasonable power consumption!

Could you try lowering it to Gear 1 (maybe even lower, 1 LITTLE core @ 400MHz) and measure power consumption?

I believe it will be almost exactly the same, or at least too small a difference for my meter to measure (<10mA). One little thing I noticed is the big cores are stuck at 816MHz minimum instead of going down to 408MHz, so manually setting the frequency or even disabling the big cores should affect power consumption on light load.
In Manjaro you can use cpu-temp-speed to check current clock speeds.
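
If cpu-temp-speed is not available, the same information can be read from the cpufreq sysfs files. This is a minimal sketch, assuming the standard policy* directories (typically one per cluster on a big.LITTLE SoC) with values in kHz. The function name show_cpu_freqs is hypothetical, and the optional directory argument is only there for trying it against sample data.

```shell
# Print current and minimum scaling frequency for each cpufreq policy.
show_cpu_freqs() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        # Skip anything that does not look like a readable policy directory.
        [ -r "$policy/scaling_cur_freq" ] || continue
        cur=$(cat "$policy/scaling_cur_freq")   # kHz
        min=$(cat "$policy/scaling_min_freq")   # kHz
        printf '%s: cur=%d MHz min=%d MHz\n' \
            "${policy##*/}" "$((cur / 1000))" "$((min / 1000))"
    done
}

# Example:
#   show_cpu_freqs
```

Comparing cur against min on an idle system shows whether a cluster is really sitting above its lowest allowed frequency.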

I see. Indeed the lower clock states are marked as “inefficient”.
We may have some luck if we enable ftrace for the kernel and poke around…

Edit: note to self:

| loc | comment |
| --- | --- |
| kernel/power/suspend.c:L391 | suspend_enter |
| L425 | call to pm_sleep_disable_secondary_cpus, then cpuidle_pause. This is where we see the log sequence that shuts down the non-boot CPUs. |
| L434 | call to syscore_suspend |
| drivers/base/syscore.c:L47 | syscore_suspend – goes through registered syscore_ops. cpu_pm is one of them (maybe irrelevant). |
| arch/arm64/kernel/sleep.S, arm64/kernel/suspend.c | arm64 sleep. It’s hooked up to cpuidle. |

In between, there are two paths that may cause an immediate resume:

  • pm_sleep_disable_secondary_cpus fails, or suspend_test fails (update: that’s for pm_test).
  • syscore_suspend() fails.
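
The pm_test facility mentioned above can be used to exercise the kernel side of this path without calling into firmware. Here is a sketch, assuming a kernel built with PM debugging support so that /sys/power/pm_test exists. With the level set to core, the kernel runs the suspend sequence up to that point, waits a few seconds, and resumes without calling suspend_ops->enter. The function name run_pm_test is hypothetical, and the optional directory argument is only for dry runs.

```shell
# Run one suspend cycle in pm_test mode, then switch test mode off again.
# Levels accepted by the kernel: freezer devices platform processors core.
run_pm_test() {
    level="$1"
    power_dir="${2:-/sys/power}"
    if [ ! -w "$power_dir/pm_test" ]; then
        echo "pm_test not available (kernel built without PM debug?)" >&2
        return 1
    fi
    echo "$level" > "$power_dir/pm_test" &&
        echo mem > "$power_dir/state"
    status=$?
    # Always restore normal behaviour, even if the test cycle failed.
    echo none > "$power_dir/pm_test"
    return "$status"
}

# Example (as root):
#   run_pm_test core
```

If a core-level test cycle completes cleanly while a real suspend bounces back, that points away from the kernel's device and syscore callbacks and towards the final platform call.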

Update:

After a few rounds of printk hacking…

	error = syscore_suspend();
	pr_err("after syscore_suspend, error=%d", error);
	if (!error) {
		*wakeup = pm_wakeup_pending();
		pr_err("after pm_wakeup_pending");
		if (!(suspend_test(TEST_CORE) || *wakeup)) {
			trace_suspend_resume(TPS("machine_suspend"),
				state, true);
			pr_err("before suspend_ops->enter");
			error = suspend_ops->enter(state);
			pr_err("after suspend_ops->enter");
			trace_suspend_resume(TPS("machine_suspend"),
				state, false);
		} else if (*wakeup) {
			error = -EBUSY;
			pr_err("wakeup BUSY");
		}
		syscore_resume();
	} else {
		pr_err("syscore_suspend fails: %d", error);
	}

[  965.639523] PM: suspend entry (deep)
[  965.930618] Filesystems sync: 0.291 seconds
[  965.974514] Freezing user space processes ... (elapsed 0.003 seconds) done.
[  965.978488] OOM killer disabled.
[  965.978494] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  965.980077] printk: Suspending console(s) (use no_console_suspend to debug)
[  966.526141] PM: suspend devices took 0.550 seconds
[  966.529183] PM: after platform_suspend_prepare_noirq
[  966.529248] Disabling non-boot CPUs ...
[  966.532391] psci: CPU1 killed (polled 0 ms)
[  966.536375] psci: CPU2 killed (polled 0 ms)
[  966.539808] psci: CPU3 killed (polled 0 ms)
[  966.542967] psci: CPU4 killed (polled 0 ms)
[  966.545572] psci: CPU5 killed (polled 0 ms)
[  966.546524] PM: after syscore_suspend, error=0
[  966.546524] PM: after pm_wakeup_pending
[  966.546524] PM: before suspend_ops->enter
[  966.546524] PM: after suspend_ops->enter
[  966.546543] Enabling non-boot CPUs ...
[  966.547097] Detected VIPT I-cache on CPU1

So it’s suspend_ops->enter that isn’t doing its job.

On arm64, suspend should be handled by PSCI (drivers/firmware/psci/psci.c).
Down there (PSCI firmware) it’s really out of my reach, but I’ll finish the tracing anyway.

[    0.000000] psci: psci_init_smccc: feature = 0
[    0.000000] psci: SMC Calling Convention v1.2
[    0.000000] psci: psci_init_cpu_suspend: feature = 0
[    0.000000] psci: psci_init_system_suspend: ret = 0
[    0.000000] psci: psci_system_reset2: ret = -1
[   80.919482] Disabling non-boot CPUs ...
[   80.923640] psci: CPU1 killed (polled 0 ms)
[   80.927784] psci: CPU2 killed (polled 0 ms)
[   80.931379] psci: CPU3 killed (polled 0 ms)
[   80.935126] psci: CPU4 killed (polled 10 ms)
[   80.938110] psci: CPU5 killed (polled 0 ms)
[   80.939568] PM: after syscore_suspend, error=0
[   80.939568] PM: after pm_wakeup_pending
[   80.939568] PM: before suspend_ops->enter
[   80.939568] psci: enter psci_system_suspend_enter
[   80.939568] psci: enter psci_system_suspend, pa_cpu_resume=312F490
[   80.939568] PM: after suspend_ops->enter
[   80.939599] Enabling non-boot CPUs ...

So… PSCI is correctly initialized. But invoke_psci_fn(PSCI_FN_NATIVE(1_0, SYSTEM_SUSPEND), pa_cpu_resume, 0, 0); bounces back immediately indicating an error.
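
For reading raw PSCI return values such as the ret = -1 in the trace above, here is a small lookup sketch. The numeric codes are the standard PSCI return values, as also defined in the kernel's include/uapi/linux/psci.h. The function name psci_ret_name is hypothetical, and it assumes the values printed by the debug printk calls are unmodified firmware return codes.

```shell
# Map a raw PSCI return value to its symbolic name.
psci_ret_name() {
    case "$1" in
        0)  echo SUCCESS ;;
        -1) echo NOT_SUPPORTED ;;
        -2) echo INVALID_PARAMS ;;
        -3) echo DENIED ;;
        -4) echo ALREADY_ON ;;
        -5) echo ON_PENDING ;;
        -6) echo INTERNAL_FAILURE ;;
        -7) echo NOT_PRESENT ;;
        -8) echo DISABLED ;;
        -9) echo INVALID_ADDRESS ;;
        *)  echo "UNKNOWN($1)" ;;
    esac
}

# Example:
#   psci_ret_name -1    # the psci_system_reset2 result in the trace
```

Printing the return value of the SYSTEM_SUSPEND call in the same way would show which of these the firmware hands back when it refuses to suspend.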

I suspect PSCI is not properly configured in the device tree.

This is the original patch: [PATCH 0/2] PSCI: system suspend support


psci_system_suspend_enter -> cpu_suspend -> psci_system_suspend

arm64 suspend.c, L119

	if (__cpu_suspend_enter(&state)) { // state is 0 for system suspend
		/* Call the suspend finisher */
		ret = fn(arg); // but PSCI doesn't like it...

		/*
		 * Never gets here, unless the suspend finisher fails.
		 * Successful cpu_suspend() should return from cpu_resume(),
		 * returning through this code path is considered an error
		 * If the return value is set to 0 force ret = -EOPNOTSUPP
		 * to make sure a proper error condition is propagated
		 */
		if (!ret)
			ret = -EOPNOTSUPP;
	} else {
		RCU_NONIDLE(__cpu_suspend_exit());
	}

Possibly related: 0012-add-suspend-to-rk3399-PBP.patch · master · manjaro-arm / packages / core / linux-pinebookpro · GitLab

The patch looks promising. It uses a rockchip-specific way of talking to the trusted firmware (ATF).
It cannot be directly applied to DevTerm, because it uses the ACPI pm_power_off_prepare hook. (That will be removed soon anyway: [v7,20/20] reboot: Remove pm_power_off_prepare() - Patchwork)

If we want to do it quick and dirty, we can just plug it into psci.c without even touching the device tree ((
If clockworkpi cares enough, they should be able to pick it up and polish it…

Heh, I was just about to suggest that you check the work on Pinebook Pro, but looks like you already started :slight_smile:. They still haven’t gotten suspend to work for everyone, but for certain configurations it works perfectly.

Unfortunately I’m a bit computer illiterate, so I can’t help much here :sweat_smile:

Unfortunately I’m a bit computer illiterate, so I can’t help much here :sweat_smile:

Says the one who taught me about initcall_blacklist :stuck_out_tongue:

4AM now. I need to suspend.

Research led me to initcall_blacklist (and initcall_debug). I figured a little bit of research and hackery was much faster than trying to compile a kernel.
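
A quick way to confirm that a blacklisted initcall really was skipped is to boot with initcall_debug and look for its completion line. This is a sketch, assuming the usual "initcall NAME+0x... returned N after M usecs" message format. The function name initcall_ran is hypothetical, and it reads dmesg text on stdin.

```shell
# Succeed if the given initcall shows up in the initcall_debug output.
initcall_ran() {
    name="$1"
    # -F: fixed string, so "+" and "." in the pattern are taken literally.
    grep -qF "initcall ${name}+0x"
}

# Example (boot with "initcall_debug" on the kernel command line first):
#   for fn in xhci_hcd_init xhci_pci_init xhci_plat_init; do
#       if dmesg | initcall_ran "$fn"; then
#           echo "$fn ran"
#       else
#           echo "$fn did not run"
#       fi
#   done
```

Note that an early-boot message may already have dropped out of the kernel log buffer, so "did not run" is only meaningful if the buffer still reaches back to boot.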

Update.

I managed to patch the kernel with the rockchip pm code. Kernel is compiling, let’s see what will happen.

Update.

Fixed some errors in DT, power-key confirmed to be: <&gpio0 RK_PA5 GPIO_ACTIVE_HIGH>; (see A06 schematics, 2 of 15, RK3399_C

Kernel module loads fine. And… nothing happened. As expected. Maybe we have to use “virtual poweroff”?

Update.

As the rockchip pm config code loads after PSCI, we can reliably hijack the syscore suspend descriptor – after that, we intercept suspend requests that were meant to go to PSCI.

Turns out the rockchip patch is really half-baked. That “virtual poweroff” command will try to parse a device tree node for… IR keys, which seems to be about a remote controller that comes with a development board. You point the IR LED at the dev board and shoot a resume command. Lazy devs :smiley: :smiley: (edit: wait, an rk3399 android TV box?)

The core of that function is very similar to what we have in PSCI driver.

PSCI ver:

static int psci_system_suspend(unsigned long unused)
{
	phys_addr_t pa_cpu_resume = __pa_symbol(function_nocfi(cpu_resume));

	return invoke_psci_fn(PSCI_FN_NATIVE(1_0, SYSTEM_SUSPEND),
			      pa_cpu_resume, 0, 0);
}

rockchip ver:

int sip_smc_virtual_poweroff(void)
{
	struct arm_smccc_res res;

	res = __invoke_sip_fn_smc(PSCI_FN_NATIVE(1_0, SYSTEM_SUSPEND), 0, 0, 0);
	return res.a0;
}

… which is only worse, right.

Result? Doesn’t even go into s2idle, because it thinks the suspend is successful :man_facepalming:

[   48.432366] rockchip_pm_virt_pwroff: begin
[   48.432366] rockchip_pm_virt_pwroff: end. ret = 0

So the real deal may be hiding inside ATF.
I really hope clockworkpi can chime in here to assist. They might have a BSP uboot that bundles proper suspend ATF.

Update. Looking into upstream ATF, many ROCKCHIP_SIP smc functions are not handled at all.
Rockchip does not provide source code for their ATF.

We can try their binary version.

Update: Tried rkbin bl31.elf + mainline uboot, no banana

At this point we’re out of (obvious) options. I’ll back off for a while to recharge myself. If anyone is interested, the next target will be to examine the upstream ATF code for rockchip platform. TLDR: It’s not compatible with the pm code in the PBP patch, and it’s known that cpu suspend works, but system suspend, which asks to suspend the last active cpu, fails.

One possible reason would be that power domains and the regulators are not properly shut down.