clockworkpi

1.2GHz and beyond

First, I’m not responsible for anything if you want to try what I did. Everything is at your own risk.

Clockworkpi is advertised as “Allwinner R16-J Quad-core Cortex-A7 CPU @1.2GHz”, but if you check the cpufreq-info output with the kernel that came with OS v0.5 you will see that the maximum frequency is, in fact, 1.008GHz, which is ~16% less. So I thought that maybe something was wrong with it and looked for more information.

The Allwinner R16 datasheet/user manual doesn’t specify the maximum clock frequency of the SOC, but if we go to the linux-sunxi wiki we see that the R16 is actually a rebranded version of the A33, and if you go to:

https://linux-sunxi.org/R16

You will end up on the A33 page. There you see that the maximum frequency is not even 1.2GHz, but 1.5GHz :crazy_face:

So if the R16 is basically the A33, we are actually underclocking the R16 and could get ~50% more horsepower??? :thinking:
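
As a quick sanity check on those percentages, here is a minimal sketch that only redoes the arithmetic with the three clock figures quoted above. The function name `percent_change` is made up for this example.

```python
# Clock figures quoted in the post, in MHz.
STOCK_MHZ = 1008       # maximum reported by cpufreq-info on the stock kernel
ADVERTISED_MHZ = 1200  # advertised clock
A33_MAX_MHZ = 1500     # maximum listed on the linux-sunxi A33 page


def percent_change(new: float, old: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    print(f"stock vs advertised: {percent_change(STOCK_MHZ, ADVERTISED_MHZ):+.1f}%")
    print(f"advertised vs stock: {percent_change(ADVERTISED_MHZ, STOCK_MHZ):+.1f}%")
    print(f"A33 max vs stock:    {percent_change(A33_MAX_MHZ, STOCK_MHZ):+.1f}%")
```

So the stock kernel runs about 16% below the advertised clock, and the A33 maximum would be a bit under 50% above stock.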

Well, I looked into it a bit more and compared our board to others with the same SOC. The closest board is the Banana Pi M2M. It uses the same R16 SOC, same PMIC, same Wifi module… If we look at the board dts file for the Banana Pi M2M:

We see that it is practically the same as the one we use for clockworkpi, almost a copy and paste:

As you can see, both boards are based on the A33 code, which is included at the very beginning:

/dts-v1/;
#include "sun8i-a33.dtsi"

The big difference between them, and the part we are interested in, is the following code:

&cpu0 {
	cpu-supply = <&reg_dcdc3>;
};

&cpu0_opp_table {
	opp-1104000000 {
		opp-hz = /bits/ 64 <1104000000>;
		opp-microvolt = <1320000>;
		clock-latency-ns = <244144>; /* 8 32k periods */
	};

	opp-1200000000 {
		opp-hz = /bits/ 64 <1200000000>;
		opp-microvolt = <1320000>;
		clock-latency-ns = <244144>; /* 8 32k periods */
	};
};

Here we can see that on the Banana Pi M2M the SOC can go to 1.1GHz and 1.2GHz. The other property, opp-microvolt, is the CPU supply voltage that the PMIC is asked to deliver when switching to that clock.

So, where does our 1.008GHz maximum come from?

This information is written in the file:

There we have all “basic” supported clock speeds: 120MHz, 240MHz, …, 1.008GHz.

So if the Banana Pi M2M can go to 1.2GHz, also without any heatsink, why can’t clockworkpi?

There is another difference in our board dts file. The voltages for the CPU in our regulator are fixed:

&reg_dcdc3 {
	regulator-always-on;
	regulator-min-microvolt = <1200000>;
	regulator-max-microvolt = <1200000>;
	regulator-name = "vdd-cpu";
};

whereas for the Banana Pi they can vary from 0.9V to 1.4V:

&reg_dcdc3 {
	regulator-always-on;
	regulator-min-microvolt = <900000>;
	regulator-max-microvolt = <1400000>;
	regulator-name = "vdd-cpu";
};

Well, this means that even if you underclock the cpi to 240MHz, it still applies 1.2V to the CPU. I think it would be better to apply less voltage when we underclock, since that could save some battery. On the other hand, while we are locked to 1.2V we can’t go to higher frequencies, at least not if we plan to use the same code as the Bpi board. So I changed the voltages for dcdc3, dc5ldo, and dcdc2, which differ between our board and the Bpi board. Now the kernel can change the voltage when it changes the clock speed.

Is it safe to change the voltage applied to the CPU to 1.4V? Well, if we look at the R16 datasheet (https://linux-sunxi.org/images/b/b3/R16_Datasheet_V1.4_(1).pdf), on page 25 you will see the recommended voltages. They recommend a CPU voltage from 0.9V to 1.4V. These numbers match what we have for the Bpi. So if we stay in this range we are actually using the Allwinner recommended values. If you look at the “Absolute maximum ratings” you will see that a real “overclocking” voltage would be between 1.4V and 1.5V.

Another thing we have to consider if we want to do this experiment is the thermals of the CPU. I saw some posts from a few years ago saying that clockworkpi was overheating, and some people even asked to underclock the board.

So, what temperature is safe? Especially if we don’t want to use a heatsink? To answer that we can go to the same table in the R16 datasheet. There you will see that the maximum “recommended” case (silicon die) temperature is 90 degrees Celsius. So everything below that is perfectly fine.

To monitor the temperature I compiled the kernel with the information provided by @r043v. I enabled the following flags:

CONFIG_CAN=y
CONFIG_CAN_SUN4I=y
CONFIG_MFD_SUN4I_GPADC=y
CONFIG_SUN4I_GPADC=y

and with this new kernel (I’m working on kernel 5.5; you can get a patch for 5.5 on my github https://github.com/wolfallein/clokworkpi-kernel, though the patch doesn’t yet include the thermal readings that I mentioned) I was able to see the CPU temperature with:

cpi@clockworkpi:~/src/kernel/linux$ cat /sys/class/thermal/thermal_zone0/temp
47304
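
The sysfs thermal interface reports millidegrees Celsius, so the value above is about 47.3 C. A minimal sketch of the conversion follows. The helper names are made up for this example, and the zone path is simply the one used in the command above (other setups may expose a different zone number).

```python
from pathlib import Path

# Path used in the post; the zone number may differ on other kernels or boards.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs thermal reading (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp(path: Path = THERMAL_ZONE) -> float:
    """Read the current zone temperature in degrees Celsius."""
    return millidegrees_to_celsius(path.read_text())


if __name__ == "__main__":
    # Sample reading taken from the post.
    print(f"sample:  {millidegrees_to_celsius('47304'):.1f} C")
    # Only try the live reading when the thermal driver is actually present.
    if THERMAL_ZONE.exists():
        print(f"current: {read_cpu_temp():.1f} C")
```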

So, I went a bit further because I wanted to see if we could get to 1.5GHz. I added operating points for 1.3GHz (1.32V), 1.4GHz, 1.45GHz, and 1.5GHz, the last three at 1.4V (the maximum recommended voltage).
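
To make the shape of those extra entries concrete, here is a hedged sketch that prints OPP nodes in the same layout as the Banana Pi snippet shown earlier. The exact opp-hz values used in the actual dts are not given in the post, so the round numbers below are assumptions (the real PLL may only hit nearby values), and the clock-latency-ns value is simply copied from the Banana Pi entries. The function name `opp_node` is made up for this example.

```python
# Recommended CPU voltage range from the R16 datasheet, as quoted in the post.
VDD_MIN_UV = 900_000
VDD_MAX_UV = 1_400_000

# (frequency in Hz, voltage in microvolts). ASSUMPTION: round frequencies;
# the post only gives them as 1.3GHz, 1.4GHz, 1.45GHz and 1.5GHz.
EXTRA_OPPS = [
    (1_300_000_000, 1_320_000),
    (1_400_000_000, 1_400_000),
    (1_450_000_000, 1_400_000),
    (1_500_000_000, 1_400_000),
]


def opp_node(freq_hz: int, microvolt: int) -> str:
    """Render one OPP node, refusing voltages outside the recommended range."""
    if not VDD_MIN_UV <= microvolt <= VDD_MAX_UV:
        raise ValueError(f"{microvolt} uV is outside the recommended range")
    return (
        f"\topp-{freq_hz} {{\n"
        f"\t\topp-hz = /bits/ 64 <{freq_hz}>;\n"
        f"\t\topp-microvolt = <{microvolt}>;\n"
        f"\t\tclock-latency-ns = <244144>; /* 8 32k periods */\n"
        f"\t}};\n"
    )


if __name__ == "__main__":
    print("&cpu0_opp_table {")
    print("\n".join(opp_node(hz, uv) for hz, uv in EXTRA_OPPS), end="")
    print("};")
```

The range check is only a guard against typos; it does not say anything about whether a given frequency is stable at that voltage.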

At 1.5GHz the system froze sometimes. At 1.45GHz it worked well, but when I tried to compile some massive code it started to print corrupted kernel messages. So I tried 1.4GHz at 1.4V, and it is working very well.

At 1.4GHz and 1.4V I recompiled the kernel as an experiment. All 4 cores were at 100% for about half an hour. I didn’t use any fan, and I used the same “motherboard container” without any holes in it. I didn’t get any crash; everything was fine. The maximum temperature was 76 degrees Celsius. I was also watching the clock frequency, and I saw that the CPU was being thermally throttled to lower clocks. Sometimes, when the temperature was ~76, the CPU was clocked down to 1.3GHz or 1.2GHz, but not lower. This is very good news. It means that the kernel is taking care of the thermals and decreasing the clock when it sees that the CPU is getting hot.

I did some performance tests, and the increase in clock speed really makes a big difference. I can try to compile some results in another post.

Another thing that is good to have in our kernel is the ondemand profile. It decreases the clock when the CPU is not being used, and with the voltage modifications to the dts file it could save battery.

Since the clock is increased the battery will also take a hit, so the already small battery will last for less time, but maybe with these tweaks we can play some more games that were only almost playable :grin: The ondemand CPU profile would increase the clock just when we need it. I noticed that some emulators don’t make use of all cores and instead use only one, so a higher clock is very welcome.
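
For a rough idea of how big that battery hit could be, here is a back-of-the-envelope sketch using the usual first-order model for dynamic CMOS power, P ∝ V² · f. This is an estimate under that assumption only, not a measurement: it ignores leakage, the rest of the board (screen, wifi), and the fact that the CPU is rarely at full load. The function name is made up for this example.

```python
def relative_dynamic_power(freq_mhz: float, volts: float,
                           base_freq_mhz: float, base_volts: float) -> float:
    """Dynamic power relative to a baseline, using P ~ V^2 * f (first-order model)."""
    return (volts / base_volts) ** 2 * (freq_mhz / base_freq_mhz)


if __name__ == "__main__":
    # Stock: 1.008GHz at the fixed 1.2V. Tweaked: 1.4GHz at 1.4V.
    ratio = relative_dynamic_power(1400, 1.4, 1008, 1.2)
    print(f"estimated CPU dynamic power at full load: {ratio:.2f}x stock")
```

Under this model the CPU would draw a bit less than twice the stock dynamic power at full load, which is why letting a governor drop the clock and voltage when idle matters.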

Maybe the clockworkpi developers could tell us what is really safe to do, and explain why we don’t get 1.2GHz out of the box. After all, 1.2GHz seems to be a very, very safe clock; they even advertise it.

awesome research, great !!

the “ondemand” governor does not really adjust clock speed, it puts cores at min/max,
the “schedutil” and “conservative” governors are way better and adjust speed smoothly
fantasti launcher lets you adjust the current governor & cpu speed

thanks a lot for your work, i’ll adjust arch launcher & kernel to have a default speed way beyond the current setting !

also thanks for the 5.5 patch link with the backlight driver rewritten :slight_smile:

This is awesome! Would it be possible to get a binary kernel build that could be installed with these changes? I saw the thermal additions and considered building my own, but your latest experiments might push me to build my own anyway. Still, having a binary available could be useful, and it’s probably more likely to get into the next OS update that way anyway, as was done with 0.5 and @shell’s kernel.

Some benchmark results. I set the frequency with:
cpi@clockworkpi:~$ cpufreq-set -f 1GHz

I compiled the “userspace” governor into my kernel to be able to change it.

Results:

Official max clock speed: 1GHz.

cpi@clockworkpi:~$ cpufreq-info | grep "current CPU"
current CPU frequency is 1.01 GHz.

cpi@clockworkpi:~$ sysbench --test=cpu --num-threads=4 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          51.3161s
    total number of events:              10000
    total time taken by event execution: 205.2042
    per-request statistics:
         min:                                 20.36ms
         avg:                                 20.52ms
         max:                                 64.26ms
         approx.  95 percentile:              20.54ms

Threads fairness:
    events (avg/stddev):           2500.0000/23.80
    execution time (avg/stddev):   51.3010/0.01

The maximum temp was: 54 C. No temperature throttling.

1.2GHz

cpi@clockworkpi:~$ cpufreq-info | grep "current CPU"
  current CPU frequency is 1.20 GHz.

cpi@clockworkpi:~$ sysbench --test=cpu --num-threads=4 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          43.1258s
    total number of events:              10000
    total time taken by event execution: 172.4514
    per-request statistics:
         min:                                 17.10ms
         avg:                                 17.25ms
         max:                                 46.66ms
         approx.  95 percentile:              17.27ms

Threads fairness:
    events (avg/stddev):           2500.0000/22.73
    execution time (avg/stddev):   43.1128/0.01

The maximum temp was: 59 C. No temperature throttling.

1.3GHz

cpi@clockworkpi:~$ cpufreq-info | grep "current CPU"
current CPU frequency is 1.30 GHz. 
cpi@clockworkpi:~$ sysbench --test=cpu --num-threads=4 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          39.9097s
    total number of events:              10000
    total time taken by event execution: 159.6102
    per-request statistics:
         min:                                 15.83ms
         avg:                                 15.96ms
         max:                                 43.39ms
         approx.  95 percentile:              15.97ms

Threads fairness:
    events (avg/stddev):           2500.0000/16.69
    execution time (avg/stddev):   39.9025/0.01

The maximum temp was: 61 C. No temperature throttling.

1.4GHz

cpi@clockworkpi:~$ sudo cpufreq-set -f 1.4GHz 
cpi@clockworkpi:~$ cpufreq-info | grep "current CPU"
  current CPU frequency is 1.40 GHz.  
cpi@clockworkpi:~$ sysbench --test=cpu --num-threads=4 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          37.1670s
    total number of events:              10000
    total time taken by event execution: 148.6283
    per-request statistics:
         min:                                 14.74ms
         avg:                                 14.86ms
         max:                                 46.39ms
         approx.  95 percentile:              14.87ms

Threads fairness:
    events (avg/stddev):           2500.0000/22.28
    execution time (avg/stddev):   37.1571/0.01

The maximum temp was: 67 C. No temperature throttling.

Keep in mind that all of these are quick tests, so none of them ran long enough to reach thermal throttling. Which is good. It means that our SOC can handle higher clocks for a considerable amount of time.
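
To put the four runs side by side, here is a small sketch that compares the measured speedup (from the sysbench total times above) with the clock ratio. The 1.3GHz and 1.4GHz figures are taken from the rounded cpufreq-info output, so treating them as exactly 1300MHz and 1400MHz is an assumption. The function name `scaling` is made up for this example.

```python
# sysbench "total time" in seconds for each clock (MHz), copied from the runs above.
# ASSUMPTION: 1300 and 1400 are the rounded values shown by cpufreq-info.
RESULTS = {1008: 51.3161, 1200: 43.1258, 1300: 39.9097, 1400: 37.1670}


def scaling(results: dict, base_mhz: int = 1008) -> list:
    """Return (mhz, clock_ratio, speedup) tuples relative to the base clock."""
    base_time = results[base_mhz]
    return [
        (mhz, mhz / base_mhz, base_time / seconds)
        for mhz, seconds in sorted(results.items())
    ]


if __name__ == "__main__":
    print("  MHz  clock ratio  speedup")
    for mhz, clock_ratio, speedup in scaling(RESULTS):
        print(f"{mhz:5d}  {clock_ratio:11.3f}  {speedup:7.3f}")
```

For this CPU-bound test the speedup stays within about 1% of the clock ratio at every step, so the gain is essentially linear in frequency.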

I tried another test at 1.4GHz, and after ~2 minutes the CPU was clocked down to 1.3GHz when it reached ~75 C.

So, in the end, it is safe in my opinion. The system switches to lower clocks automatically when it gets hot, even if I manually set a higher clock. This is good when you only need a short boost of power and normally don’t need everything at the maximum.

Another thing is what I mentioned before. Most emulators don’t work with many cores; instead, they only use one core. So, one core at 1.4GHz at 100% could run for a lot longer than all 4 at 100%, because it wouldn’t increase the temperature as much.

Thanks :slight_smile:

Thanks for explaining.

Well, it was an easy fix. I preferred to keep it in our code rather than modify the kernel files, so I removed the include and copied what we needed from the last kernel, before they modified it.

They also modified a lot of things in the upcoming RC. It was a bit more complicated to fix, but I managed. When they release the next RC I will send the patch for cpi.

I removed all the fancy stuff (boot logo) that was taking up a lot of the file. I also removed the cursor changes that they made. I actually prefer the cursor blinking on my lcd.

I really recommend that you build your own; it is very nice to see it running afterwards.

For the new clock frequencies you only need to change the dtb file, I suppose. You can still use the same kernel that you have. If you have the v0.5 it should work ok.

For the thermal reading and the new governors, I needed to enable them in the configuration.

I’ll try to upload my kernel and dtb. As I mentioned, my kernel doesn’t have the fancy logo; instead you will see the traditional linux Tux (3 of them).

Another thing is that this is very experimental, use at your own risk.

How “easy” is it to patch the kernel with your changes ?
You know, for normal people like me :slight_smile:

(as the tux count reflects the cpu core number, i think cropping is not supported, as the 4th one is hidden)
for cpu governors the arch wiki page is a great resource that helped me a lot

for the kernel conf i use this :

CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

from my tests, ondemand makes variations that are too hard, great for a server but not viable for playing; conservative works much better as it is stepped, but it does not really change in real time, so there are perf drops visible for seconds and it is not really viable either; schedutil works better again and changes faster, but as with conservative the drops are still visible. so by default i set maximum perf at boot using the cpupower util, and take cores down with a manual speed depending on what i need (2 cores @ 600MHz is great)
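
For anyone who wants to try the governors without cpupower, here is a hedged sketch that uses the standard cpufreq sysfs files directly. It only looks at cpu0 and assumes the other cores follow the same policy, which is an assumption about this board and not something confirmed in the thread. The helper names are made up for this example, and writing the governor needs root.

```python
from pathlib import Path

# Standard cpufreq sysfs directory for the first core.
# ASSUMPTION: the other cores share this policy.
CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def available_governors(base: Path = CPUFREQ) -> list:
    """List the governors compiled into the running kernel."""
    return (base / "scaling_available_governors").read_text().split()


def current_governor(base: Path = CPUFREQ) -> str:
    """Return the governor currently in use."""
    return (base / "scaling_governor").read_text().strip()


def set_governor(name: str, base: Path = CPUFREQ) -> None:
    """Switch governor; raises ValueError if the kernel does not offer it."""
    if name not in available_governors(base):
        raise ValueError(f"governor {name!r} is not available")
    (base / "scaling_governor").write_text(name)


if __name__ == "__main__":
    if CPUFREQ.exists():
        print("available:", ", ".join(available_governors()))
        print("current:  ", current_governor())
```

A governor only shows up in the available list if its CONFIG_CPU_FREQ_GOV_* option was enabled when the kernel was built.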

I uploaded the dts, dtb, my config, and the uImage binary

My kernel boots with the ondemand governor by default, but you can manually change the frequency or switch to performance if you want.

If you want to compile your kernel in the future, I tried to explain a bit here:

Exactly what I thought.

I read about the difference between them. I’m a bit surprised; I would have expected ondemand to be better for gaming because it changes fast, but I didn’t think about the “visual effect”.

I will enable the conservative mode in my next build.

Thanks

haha I didn’t want to say that it was too easy, just that I didn’t make a proper change to the code. As @r043v mentioned, the better option would be to rewrite the entire driver, avoiding references to deprecated calls. What I did was a “fix”. It works, but just for a while.

When you download the kernel sources you have access to all the kernel code, and you can customize it as you wish, like opening a text file and writing your name into it. The easiest way is to change which kernel drivers are loaded or specify new drivers. After the modifications, which can be just enabling a new driver or writing a new one, you can create a patch with the changes that you made.

it works pretty well at 1.4GHz !!
the speed up is awesome, now In the Hunt is playable under fbneo :video_game:

Sorry, my English is bad…
I did not mean that you said it was easy; what I meant is, how would someone like me, without the knowledge you have, apply the patch ?

Could you provide a small step by step ?

Damn! I want this too!!!
Would you be kind enough to test spcinv95.zip under FBNeo ?

Amazing!

We should mainline the kernel development at some point, and update the cpi git repo too.

So for the steps…
I am waiting anxiously.

To be honest I don’t know everything about patching. I’m learning.

Basically you download two sources and work in only one, keeping the other source without changing. After your modifications you can use a command called “diff” to show the differences between your version, and the original. With a diff file you can apply the same changes that you did in another clean source. The idea is that you don’t need to send all the source files, and it is easier for others to understand what you did.
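
To show what such a diff looks like, here is a small sketch that uses Python’s standard difflib module instead of the diff command, applied to the dcdc3 regulator block from the first post. The file names in the header are hypothetical, and the real patch was made with git diff, not with this script.

```python
import difflib

# The regulator block as shipped (fixed at 1.2V), copied from the first post.
ORIGINAL = [
    "&reg_dcdc3 {",
    "\tregulator-always-on;",
    "\tregulator-min-microvolt = <1200000>;",
    "\tregulator-max-microvolt = <1200000>;",
    '\tregulator-name = "vdd-cpu";',
    "};",
]

# The same block with the Banana Pi voltage range.
MODIFIED = [
    "&reg_dcdc3 {",
    "\tregulator-always-on;",
    "\tregulator-min-microvolt = <900000>;",
    "\tregulator-max-microvolt = <1400000>;",
    '\tregulator-name = "vdd-cpu";',
    "};",
]

# Hypothetical file names, used only for the patch header.
patch = list(
    difflib.unified_diff(
        ORIGINAL, MODIFIED,
        fromfile="a/clockworkpi.dts", tofile="b/clockworkpi.dts",
        lineterm="",
    )
)

if __name__ == "__main__":
    print("\n".join(patch))
```

Lines starting with “-” exist only in the original and lines starting with “+” exist only in the modified version; everything else is context so the change can be located in a clean source tree.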

If you are working in a git repository, you can use the “git diff” command. You can find more information here:

https://git-scm.com/docs/git-diff

https://www.thegeekstuff.com/2014/03/git-patch-create-and-apply/

I’m really sorry for not writing a step by step guide. I’m not very good at explaining things, and actually I don’t know very well how to do it. For my patch, I just followed the git diff manual.

your compile guide from the git is pretty complete !
(glad to see you recommend mounting /boot)

@Lix for a first try, install the binary; you will not get any real benefit (other than gaining knowledge) from compiling it yourself, and it took about one hour to compile it on the gs

spcinv95.zip crashes using the latest fbneo retroarch from buildbot, my old manually compiled one ran it fine, strange

as we gain ~400MHz/core the speed up is proportional, that’s not magic,
in the hunt now runs at a bit more than 40fps, still slow but playable, i finished it yesterday, pretty hard game

ok. In the Hunt runs flawlessly in MAME though
I only have issues with FBNeo

Yeah, I will try to understand what’s in the git and try to learn.
I’ll do it on another card though…lol