R01, fbturbo: Accelerated 2D graphics in X11

Logging my attempt to get the fbturbo driver working on the Allwinner D1.

fbturbo repo is here: GitHub - ssvb/xf86-video-fbturbo: Xorg DDX driver for ARM devices (Allwinner, RPi and others)
A portal with nice build instructions: Xorg - linux-sunxi.org

Edit src/Makefile.am to remove the ARM assembly sources and everything related to BackingStore and libUMP/Mali GPU.

Clean up fbdev.c etc. to remove the references to hardware resources we don’t have, along with the related configuration options.

Run make, then sudo make install, and the fbturbo driver will be installed.

Troubleshooting:

Update /etc/X11/xorg.conf.d/10-d1.conf:

Section "Module"
        Load    "shadow"
EndSection

Section "Device"
        Identifier      "FBDEV"
        Driver          "fbturbo"
        Option          "fbdev" "/dev/fb0"

        Option          "SwapbuffersWait" "true"
        Option          "OffTime" "0"
        Option          "Rotate" "CW"
EndSection

At this point, startx should be able to bring it up correctly. Still no G2D.
If the screen goes black, revert to the original 10-d1.conf, blind-type startx, and the screen will be initialized again.

To avoid being left with no idea what went wrong when the screen goes black: startx -- -logverbose 6 > startx.log 2>&1

Current issue:

The G2D part is meant to be used with the legacy sunxi 3.x kernel. Some work has to be done in sunxi_disp.c and sunxi_x_g2d.c to bring it forward.

The initialization fails early, here:

So the ioctl semantics have totally changed. Edit: … the semantics did change, but the real problem here is permissions. /dev/disp and /dev/g2d are root:root with mode rw-------, so a non-root X server cannot open them.
Need to find the latest ioctl definitions first.
Some docs wrt latest (5.4 kernel) driver interface:

As for where to find the actual driver header… I’m cloning the kernel repo to see if I can find anything.

Update

New sunxi_display2 ioctl interfaces are found. drivers/video/fbdev/sunxi/disp2/disp/dev_disp.c:L3648
sunxi_disp_ioctl.h is obsolete. Copying sunxi_display2.h from kernel tree.
Structs and commands have been renamed and need to be aligned in sunxi_disp.c.
The 32-bit ioctl calls need to be ported to 64-bit.
wip repo: GitHub - yatli/xf86-video-fbturbo: Xorg DDX driver for ARM devices (Allwinner, RPi and others)
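
A minimal sketch of what the 32-bit to 64-bit port involves, under loudly stated assumptions: that the ioctl arguments are passed as a small array of integers, and that the 64-bit kernel reads that array as unsigned long, so a pointer stored in a 32-bit slot would be truncated. pack_disp_args and unpack_ptr are hypothetical helper names, not functions from fbturbo or the kernel.

```c
#include <stdint.h>

/* Hypothetical helper: fill a 4-slot argument array for a disp-style ioctl.
 * Assumption: the 64-bit kernel side reads four unsigned long values, so
 * each slot must be wide enough to hold a user-space pointer. */
static void pack_disp_args(unsigned long args[4], unsigned long screen,
                           const void *ptr)
{
    args[0] = screen;
    /* Go through uintptr_t so the pointer-to-integer cast is well defined. */
    args[1] = (unsigned long)(uintptr_t)ptr;
    args[2] = 0;
    args[3] = 0;
}

/* Recover the pointer from slot 1, as the receiving side would. */
static void *unpack_ptr(const unsigned long args[4])
{
    return (void *)(uintptr_t)args[1];
}
```

With uint32_t slots the same round trip would lose the upper half of a 64-bit pointer, which is the kind of breakage to look for when porting.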

Update

Some playground code that actually displays something on LCD:

I was planning to test things from g2d_driver.h (fill rectangle etc.) but it started to feel wrong very quickly: pointers are 32-bit, the docs are wrong, and so on.
It turns out there are two versions of g2d:

  • v1: drivers/char/sunxi_g2d/g2d_driver.c
  • v2: drivers/char/sunxi_g2d/g2d_rcq/g2d.c

And v2 is not as feature rich as v1 yet. The only relevant part in the official G2D doc starts from section 3.2.2-2.0 版本接口 (“version 2.0 interfaces”).

Worse, not all v2 interfaces are documented. For example, G2D_CMD_FILLRECT_H is missing, and we have to go to drivers/char/sunxi_g2d/g2d_rcq/g2d_mixer.c:L82 to find out what’s really happening.

Update

Okay okay, got G2D_CMD_FILLRECT_H working after a few kernel oopses.
Another version of the doc (I don’t know why there are so many different versions!) asks to put physical address into the “align” field.

[image: 这合理吗.jpg (“Is this reasonable?”)]

These guys got it right: T113-S3 G2D 裸机测试 | 全志在线开发者论坛 (“T113-S3 G2D bare-metal test”, Allwinner online developer forum)
We need use_phys_addr = 1 because we are operating on the framebuffer directly. If not set, kernel panic awaits.

So I guess this is the first time the g2d engine is running on R01.
Preliminary test data is ready, benchmarking rectangle filling performance of g2d vs. optimized software:

 ========== fill_test, w=10 h=10 n=10000 ===========
 >>> software fillrect: TIME = 33.590000ms.
 >>> hardware fillrect: TIME = 1588.535000ms.
 ========== fill_test, w=30 h=30 n=10000 ===========
 >>> software fillrect: TIME = 275.237000ms.
 >>> hardware fillrect: TIME = 1625.978000ms.
 ========== fill_test, w=50 h=50 n=10000 ===========
 >>> software fillrect: TIME = 692.817000ms.
 >>> hardware fillrect: TIME = 1835.902000ms.
 ========== fill_test, w=70 h=70 n=10000 ===========
 >>> software fillrect: TIME = 1322.177000ms.
 >>> hardware fillrect: TIME = 1862.888000ms.
 ========== fill_test, w=90 h=90 n=10000 ===========
 >>> software fillrect: TIME = 2188.596000ms.
 >>> hardware fillrect: TIME = 2223.052000ms.
 ========== fill_test, w=100 h=100 n=10000 ===========
 >>> software fillrect: TIME = 2715.522000ms.
 >>> hardware fillrect: TIME = 2230.658000ms.
 ========== fill_test, w=200 h=200 n=10000 ===========
 >>> software fillrect: TIME = 10805.467000ms.
 >>> hardware fillrect: TIME = 3146.057000ms.
 ========== fill_test, w=300 h=300 n=10000 ===========
 >>> software fillrect: TIME = 24263.317000ms.
 >>> hardware fillrect: TIME = 4816.747000ms.

Guys, actual acceleration once the rectangle is bigger than about 90x90 (w, h > 90)!

Full screen rotation also looks very promising. The following test simulates rotating a 1280x480 shadow buffer into the 480x1280 display buffer.

 ========== rotate_test ===========
 >>> software rotate: TIME = 8205.986000ms. FPS = 7.311736.
 >>> hardware rotate: TIME = 125.690000ms. FPS = 477.364946.
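
For reference, the software side of a test like this boils down to a per-pixel transpose. The sketch below is a plain, unoptimized 90-degree clockwise rotation, assuming 32 bits per pixel and tightly packed rows; it is not the benchmark code used above, and rotate_cw_32bpp is a hypothetical name.

```c
#include <stdint.h>

/* Rotate a row-major src_w x src_h image 90 degrees clockwise into dst,
 * which must hold src_h x src_w pixels (so 1280x480 becomes 480x1280).
 * Assumes 32bpp and no row padding. */
static void rotate_cw_32bpp(const uint32_t *src, uint32_t *dst,
                            int src_w, int src_h)
{
    for (int y = 0; y < src_h; y++) {
        for (int x = 0; x < src_w; x++) {
            /* Source pixel (x, y) lands at column (src_h - 1 - y), row x.
             * The destination row stride is src_h pixels. */
            dst[x * src_h + (src_h - 1 - y)] = src[y * src_w + x];
        }
    }
}
```

The writes to dst jump by a full destination row on every inner-loop step, which is cache-unfriendly and is one plausible reason a naive software rotation is slow.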

Update

I managed to wire up the DDX driver to reserve a display engine layer, point it at framebuffer memory, and initialize G2D, and sunxi_g2d_blt is now reached. However, the g2d blt did not happen.
Two issues here.

  1. It fails this check:

    Either the source or destination is not within the framebuffer.
  2. Preferred blt size. This threshold is set to 1000 in the original fbturbo, but according to our test the hardware is only faster for w, h > 90 (8100 pixels). I logged the blt operations and they’re pretty small: 32x32, 100x14, 1x1, and so on.
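
The size threshold in the second point can be sketched as a simple area check. This is a sketch under assumptions: G2D_MIN_ACCEL_PIXELS and should_use_g2d are hypothetical names, the 8100 figure comes from the square fill_test runs above, and using area as the criterion for non-square blits is a guess rather than something that was measured.

```c
/* Assumed name; 90 x 90 pixels, the rough break-even point in fill_test. */
#define G2D_MIN_ACCEL_PIXELS 8100

/* Return 1 if the operation is big enough that the hardware path is
 * expected to win, 0 if the software fallback should be used. */
static int should_use_g2d(int w, int h)
{
    if (w <= 0 || h <= 0)
        return 0;
    /* Widen before multiplying so large sizes cannot overflow int. */
    return (long)w * (long)h > G2D_MIN_ACCEL_PIXELS;
}
```

With this check, the logged sizes (32x32, 100x14, 1x1) would all take the software path.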

So I decided to work out hardware-backed rotation first.

Update:

Rotation works. Could be improved, since DDX exposes the damaged rectangles, but since our rotation engine is fast I just do fullscreen rotations.
The critical step is to hack the shadow buffer into the framebuffer memory so we get the physical address. There are other ways to do this but anyway :smiley:

After this, the failed range check I mentioned in the previous update passes. And it reveals that the Xorg server is calling bitblt to copy from shadow to shadow. The problem though is that bitblt causes buffer glitches. Not sure why.

Performance-wise I don’t see a huge improvement going from FALLBACK_BITBLT to G2D_CMD_BITBLT_H. Bottleneck elsewhere?

So far, this brings up the DDX driver to a workable state with rotation acceleration.

Primary goal reached.

Update

The bitblt problem turns out to be trivial. fbturbo wants a return status 1 for success and 0 for failure. Bazinga.
With that fixed we now have super smooth window moving. Still glitches sometimes. Need to find out when.
It seems that there are some unsupported overlapped bitblt modes in the g2d engine.
The original fbturbo covered some cases and I added a few more.
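
To make the return-status mismatch concrete, here is a minimal sketch of the conversion. It assumes the G2D call follows the usual ioctl convention (non-negative on success, negative on failure); g2d_ret_to_fbturbo_status is a hypothetical name, not a function from the driver.

```c
/* Convert an ioctl-style result (assumed: >= 0 ok, < 0 error) into the
 * convention fbturbo expects from a blt hook: 1 = success, 0 = failure. */
static int g2d_ret_to_fbturbo_status(int g2d_ret)
{
    return g2d_ret >= 0 ? 1 : 0;
}
```

Returning the raw ioctl result would make every successful blit (0) look like a failure to the caller, so the work could end up being redone by the fallback path.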

You will see that a window moves to the left very smoothly (accelerated) but moves to the right with lag (not accelerated).

Secondary goal reached.

@andypiper if you’re waiting for signal, the time is now :slight_smile:

If you want to test this accelerated driver:

  1. Back up your /etc/X11/xorg.conf.d/10-d1.conf !!
  2. Download https://nextcloud.yatao.info:10443/s/cJbbpto4TX3NMJn
  3. Unpack, cd into it, sudo make install
  4. startx

What remains to be done (polish)

  • It takes over the framebuffer. No fbcon :frowning:
  • Glitches in bitblt. I haven’t figured out the exact condition, but it may be related to buffer alignment and overflow.

Thanks for working on this. Watching with interest and will give things a try as it gets further along!

wow
this is really something

nice job

Okay, progress!

This summarizes the error patterns of overlapped bitblt:

  /* SIZE(432,808) ORIGIN(i,217) OFFSET(8,0) results:
   * 0..8
   * 9..15 X
   * 16..24
   * 25..31 X
   * 32..40
   * 41..47 X
   * 48..56
   * 57..63 X
   * ...
   * X + 1 doesn't change the pattern.
   * Width + 1 doesn't change the pattern.
   * Delta X + 1 DOES change the pattern.
   * TODO Moving Y?
   *
   * ... OFFSET(9,0) results:
   * 0..7
   * 8..15 X
   * ...
   *
   * ... OFFSET(1,0) results:
   * 0..255 OK
   *
   * ... OFFSET(2,0) results:
   * 0..14 OK
   * 15 X
   * 16..30 OK
   * 31 X
   *
   * ... OFFSET(14,0) results:
   * 0..2 OK
   * 3..15 X
   *
   * ... OFFSET(16,0) results:
   * 0 OK
   * 1..15 X
   *
   * ... OFFSET(17,0) results:
   * 0..255 X
   *
   * ... Negative X offset ALL OK.
   * ... Applying both X & Y offsets destroys the pattern.
   */

So… glitches appear when offset_x + (src_x % 16) >= 17.
To work around this, we can split the bitblt into multiple operations with smaller offsets.
Window movement is super smooth in all directions now.
There are still some problems: part of the buffer gets damaged when a blit is split into multiple operations like this. Need to save the damaged parts.
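
A minimal sketch of the rule and the splitting idea, derived only from the table above and not taken from the driver. blt_offset_glitches and plan_safe_steps are hypothetical names. The planner assumes the steps are chained, so each step reads from where the previous step wrote and src_x advances with every step; it only handles positive X offsets, since negative ones were all OK.

```c
#include <stddef.h>

/* Empirical rule: a rightward overlapped blit glitches when
 * offset_x + (src_x % 16) >= 17. Non-positive offsets never glitched. */
static int blt_offset_glitches(int src_x, int offset_x)
{
    if (src_x < 0 || offset_x <= 0)
        return 0;
    return offset_x + (src_x % 16) >= 17;
}

/* Split a rightward move of offset_x pixels into steps that each satisfy
 * step + (current_x % 16) <= 16. Returns the number of steps written to
 * steps[], or -1 if max_steps is too small or the input is out of range. */
static int plan_safe_steps(int src_x, int offset_x, int *steps,
                           size_t max_steps)
{
    size_t n = 0;

    if (src_x < 0 || offset_x <= 0)
        return -1;
    while (offset_x > 0) {
        int limit = 16 - (src_x % 16);  /* largest safe step from here */
        int step = offset_x < limit ? offset_x : limit;

        if (n >= max_steps)
            return -1;
        steps[n++] = step;
        src_x += step;       /* next step starts at this destination */
        offset_x -= step;
    }
    return (int)n;
}
```

For example, moving by 8 from src_x = 9 becomes a step of 7 (up to the 16-pixel boundary) followed by a step of 1. Each intermediate copy overwrites pixels between source and destination, which fits the damage described above.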

All four damage cases are shown here:

Further tests showed that the bitblt area height cannot be larger than 128. This is also worked around with multiple ops.
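
The height workaround can be sketched as cutting the area into horizontal bands of at most 128 rows. This is an illustration only: G2D_MAX_BLT_HEIGHT, blt_band and split_into_bands are hypothetical names, and the sketch does not decide the order in which bands must be issued when source and destination overlap vertically.

```c
/* Assumed name; the 128-row limit is the one found by testing. */
#define G2D_MAX_BLT_HEIGHT 128

typedef struct {
    int y;  /* first row of the band */
    int h;  /* number of rows, at most G2D_MAX_BLT_HEIGHT */
} blt_band;

/* Cut rows [y, y + h) into bands of at most G2D_MAX_BLT_HEIGHT rows.
 * Returns the number of bands written, or -1 if out[] is too small. */
static int split_into_bands(int y, int h, blt_band *out, int max_out)
{
    int n = 0;

    while (h > 0) {
        int band_h = h < G2D_MAX_BLT_HEIGHT ? h : G2D_MAX_BLT_HEIGHT;

        if (n >= max_out)
            return -1;
        out[n].y = y;
        out[n].h = band_h;
        n++;
        y += band_h;
        h -= band_h;
    }
    return n;
}
```

For the 808-row area from the test above, this yields six bands of 128 rows and a final band of 40 rows.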

Released version 0.1 – everything should be working.

  • Accelerated full-screen rotation
  • Accelerated window movement
  • Accelerated window scrolling
  • Glitches in alpha are fixed
  • Framebuffer console is not disabled on Xorg exit

Installation

  • Download, unpack, cd into it
  • sudo make install
  • sudo systemctl enable devterm-r01-dispfd-daemon.service
  • sudo systemctl start devterm-r01-dispfd-daemon.service

@guu @andypiper

cpi@devterm-R01:~/fbturbo$ sudo make install
/usr/bin/mkdir -p '/usr/lib/xorg/modules/drivers'
/bin/bash ./libtool   --mode=install /usr/bin/install -c   fbturbo_drv.la '/usr/lib/xorg/modules/drivers'
libtool: install: /usr/bin/install -c .libs/fbturbo_drv.so /usr/lib/xorg/modules/drivers/fbturbo_drv.so
/usr/bin/install: cannot stat '.libs/fbturbo_drv.so': No such file or directory
make: *** [Makefile:3: install] Error 1

Do I need to compile fbturbo_drv.so myself?

ooh, I need to pack .libs into the tarball. One minute.

try fbturbo-r01-v0.1-3.tar.gz

This package is built with the v0.1 base OS image. I guess it will be compatible with other versions though.

tested

works fantastic!!

I will include this driver in next OS Image

I guess R01 is perfect now

I used it for a full day and still find some glitches:

  • Kernel oops about buffer overflow
  • Terminal scrollback sometimes damages a line or two when the scrolled height is >= 128

I’ll let you know when I fix these issues. Let’s make R01 perfect :slight_smile:

Any plans for an OS image update? (Even better if it can be updated via the package manager, rather than a full reinstall…)

If you are using the v0.2a R01 OS image or later, you can do it with the package manager:

sudo apt update
sudo apt install -y xf86-video-fbturbo-r01
sudo reboot

According to feedback, the v0.1 OS image will have issues.

I’ve tested these commands on the v0.2a OS image.

If the upgrade fails, just ssh into the DevTerm, or mount the SD card on any Linux PC, and remove these files:

/etc/X11/xorg.conf.d/10-d1.conf 
/usr/lib/xorg/modules/drivers/fbturbo_drv.*
/usr/local/bin/r01-dispfd-daemon.elf

This is really cool. x feels speedy. thanks! i’ve included this in my R01 setup script: GitHub - katmai/r01: Modifications for the ClockworkPi DevTerm R01

is there any chance that this driver could be configured to run in 8-bit or lower colour depth? it might be a long shot, but reducing the number of bits used per pixel could improve performance further, and given the lack of a GPU on the r01 and the… errr… lacklustre capabilities of the fbcon, it might be best to try and make x11 a glorified terminal multiplexer for most use cases. trying to run the server with 2/4/8 bit depths fails immediately. 16 starts, but only on the lower half of the framebuffer and with a weird green tint, and it then segfaults on any sort of button event. suggestions?

edit:
also, would there be any benefit from switching from fbturbo under xorg to something like Xfbdev, which is a stripped-down X server that just draws straight to the framebuffer? TinyCore and Puppy already use it since they target low-power devices, and I’m sure they would appreciate having more eyes on the codebase.