R01, fbturbo: Accelerated 2D graphics in X11

Logging my attempt to get the fbturbo driver working on the Allwinner D1.

fbturbo repo is here: GitHub - ssvb/xf86-video-fbturbo: Xorg DDX driver for ARM devices (Allwinner, RPi and others)
A portal with nice build instructions: Xorg - linux-sunxi.org

Edit src/Makefile.am to remove the ARM assembly sources and the BackingStore and LibUMP/MaliGPU related stuff.

Clean up fbdev.c etc. to remove the references to the hardware resources we don’t have. Also remove the related configuration options.

Run make, then sudo make install, and the fbturbo driver will be installed.

Troubleshooting:

Update /etc/X11/xorg.conf.d/10-d1.conf:

Section "Module"
        Load    "shadow"
EndSection

Section "Device"
        Identifier      "FBDEV"
        Driver          "fbturbo"
        Option          "fbdev" "/dev/fb0"

        Option          "SwapbuffersWait" "true"
        Option          "OffTime" "0"
        Option          "Rotate" "CW"
EndSection

At this point, startx should be able to bring it up correctly. Still no G2D acceleration.
If the screen goes black, revert to the original 10-d1.conf, type startx blind, and the screen will be initialized again.

To find out what went wrong when the screen goes black, capture a verbose log: startx -- -logverbose 6 > startx.log 2>&1
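With startx.log captured, the interesting lines are the ones Xorg tags with (EE) for errors and (WW) for warnings. Here is a minimal sketch for pulling those out. The helper name xorg_problems and the sample log lines are made up for illustration and are not real fbturbo output.

```python
import re

# Xorg tags log lines with markers such as (II) info, (WW) warning, (EE) error.
MARKER = re.compile(r"\((EE|WW)\)")

def xorg_problems(log_text):
    """Return (marker, line) pairs for every warning or error line."""
    found = []
    for line in log_text.splitlines():
        match = MARKER.search(line)
        if match:
            found.append((match.group(1), line.strip()))
    return found

# Made-up sample, only to show the shape of the result.
sample = """\
[    10.001] (II) LoadModule: "fbturbo"
[    10.002] (WW) hypothetical warning line
[    10.003] (EE) hypothetical error line
"""

for marker, line in xorg_problems(sample):
    print(marker, line)
```

On a real log the marker legend in the header also matches, so expect one harmless extra hit near the top.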

Current issue:

The G2D part is meant to be used with the legacy sunxi 3.x kernel. Some work has to be done in sunxi_disp.c and sunxi_x_g2d.c to bring it forward.

The initialization fails early, here:

So the ioctl semantics totally changed. Edit: … the semantics did change, but the real problem here is permissions: /dev/disp and /dev/g2d are root:root rw-------.
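A quick way to confirm the permission problem before digging into ioctl definitions is to check whether the current user can open the nodes read-write. This is only a sketch: describe_access is a hypothetical helper, and on a machine without these nodes it just reports that they are missing.

```python
import os
import stat

def describe_access(path):
    """Report mode and owner of a node, and whether we may open it read-write."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),  # e.g. "crw-------" for a 0600 char device
        "uid": st.st_uid,
        "gid": st.st_gid,
        "rw_ok": os.access(path, os.R_OK | os.W_OK),
    }

for node in ("/dev/disp", "/dev/g2d"):
    if os.path.exists(node):
        print(node, describe_access(node))
    else:
        print(node, "not present on this machine")
```

If rw_ok comes back False for the user running X, the open fails long before any ioctl semantics matter.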
Need to find the latest ioctl definitions first.
Some docs on the latest (5.4 kernel) driver interface:

As for where to find the actual driver header… I’m cloning the kernel repo to see if I can find anything.

Update

Found the new sunxi_display2 ioctl interfaces: drivers/video/fbdev/sunxi/disp2/disp/dev_disp.c:L3648
sunxi_disp_ioctl.h is obsolete. Copying sunxi_display2.h from the kernel tree.
Structs and commands have been renamed, so they need to be aligned in sunxi_disp.c.
The 32-bit ioctl calls need to be ported to 64-bit.
wip repo: GitHub - yatli/xf86-video-fbturbo: Xorg DDX driver for ARM devices (Allwinner, RPi and others)
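To illustrate why the 32-bit ioctl calls need porting: if the argument block is built from 32-bit slots (my assumption about how the legacy code passes arguments, not a quote from it), any pointer stored in a slot loses its upper bits on a 64-bit system. The sketch below shows the truncation with the struct module. The address is hypothetical and both helper names are made up.

```python
import struct

def pack_args_32(args):
    """Legacy-style layout: four 32-bit slots (fine on ILP32, lossy on LP64)."""
    return struct.pack("<4I", *(a & 0xFFFFFFFF for a in args))

def pack_args_64(args):
    """Ported layout: four 64-bit slots, wide enough for a user-space pointer."""
    return struct.pack("<4Q", *args)

ptr = 0x3F_FFE0_1000  # hypothetical 64-bit user-space address

legacy = struct.unpack("<4I", pack_args_32([0, ptr, 0, 0]))
ported = struct.unpack("<4Q", pack_args_64([0, ptr, 0, 0]))

print("32-bit slot:", hex(legacy[1]))  # upper bits are gone
print("64-bit slot:", hex(ported[1]))  # pointer survives
```

The block also doubles in size (16 bytes to 32 bytes), so both sides have to agree on the new layout.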

Update

Some playground code that actually displays something on LCD:

I was planning to test things from g2d_driver.h (fill rectangle etc.), but it started to feel wrong very quickly: pointers are 32-bit, the docs are wrong, and so on.
It turns out there are two versions of g2d:

  • v1: drivers/char/sunxi_g2d/g2d_driver.c
  • v2: drivers/char/sunxi_g2d/g2d_rcq/g2d.c

And v2 is not as feature-rich as v1 yet. The only relevant part of the official G2D doc starts at section 3.2.2, "2.0 版本接口" (version 2.0 interface).

Worse, not all v2 interfaces are documented. For example, G2D_CMD_FILLRECT_H is missing, and we have to go to drivers/char/sunxi_g2d/g2d_rcq/g2d_mixer.c:L82 to find out what’s really happening.

Update

Okay okay, got G2D_CMD_FILLRECT_H working after a few kernel oopses.
Another version of the doc (I don’t know why there are so many different versions!) asks to put physical address into the “align” field.


(Is this reasonable?.jpg)

These guys got it right: T113-S3 G2D 裸机测试 (T113-S3 G2D bare-metal test) | 全志在线开发者论坛 (Allwinner online developer forum)
We need use_phys_addr = 1 because we are operating on the framebuffer directly. If not set, kernel panic awaits.

So I guess this is the first time the g2d engine is running on R01.
Preliminary test data is ready, benchmarking rectangle filling performance of g2d vs. optimized software:

 ========== fill_test, w=10 h=10 n=10000 ===========
 >>> software fillrect: TIME = 33.590000ms.
 >>> hardware fillrect: TIME = 1588.535000ms.
 ========== fill_test, w=30 h=30 n=10000 ===========
 >>> software fillrect: TIME = 275.237000ms.
 >>> hardware fillrect: TIME = 1625.978000ms.
 ========== fill_test, w=50 h=50 n=10000 ===========
 >>> software fillrect: TIME = 692.817000ms.
 >>> hardware fillrect: TIME = 1835.902000ms.
 ========== fill_test, w=70 h=70 n=10000 ===========
 >>> software fillrect: TIME = 1322.177000ms.
 >>> hardware fillrect: TIME = 1862.888000ms.
 ========== fill_test, w=90 h=90 n=10000 ===========
 >>> software fillrect: TIME = 2188.596000ms.
 >>> hardware fillrect: TIME = 2223.052000ms.
 ========== fill_test, w=100 h=100 n=10000 ===========
 >>> software fillrect: TIME = 2715.522000ms.
 >>> hardware fillrect: TIME = 2230.658000ms.
 ========== fill_test, w=200 h=200 n=10000 ===========
 >>> software fillrect: TIME = 10805.467000ms.
 >>> hardware fillrect: TIME = 3146.057000ms.
 ========== fill_test, w=300 h=300 n=10000 ===========
 >>> software fillrect: TIME = 24263.317000ms.
 >>> hardware fillrect: TIME = 4816.747000ms.


Guys, actual acceleration once width and height go above 90!
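To make the break-even point easier to see, here is a small sketch that computes the software/hardware ratio for each size from the numbers in the log above. The helper names are mine. The per-call figure at the end is just the 10x10 hardware time divided by n, which suggests (but does not prove) a fixed overhead per G2D call.

```python
# (side length, software ms, hardware ms) for n=10000 fills, copied from the log.
RESULTS = [
    (10, 33.590, 1588.535),
    (30, 275.237, 1625.978),
    (50, 692.817, 1835.902),
    (70, 1322.177, 1862.888),
    (90, 2188.596, 2223.052),
    (100, 2715.522, 2230.658),
    (200, 10805.467, 3146.057),
    (300, 24263.317, 4816.747),
]
N = 10000

def speedups(results):
    """Software time divided by hardware time; above 1.0 means G2D wins."""
    return [(side, sw / hw) for side, sw, hw in results]

def first_win(results):
    """Smallest tested side length where hardware is faster than software."""
    for side, sw, hw in results:
        if hw < sw:
            return side
    return None

for side, ratio in speedups(RESULTS):
    print(f"{side:>3}x{side:<3} speedup = {ratio:.2f}x")
print("hardware first wins at side =", first_win(RESULTS))
print("hardware cost per 10x10 call: %.1f us" % (RESULTS[0][2] / N * 1000))
```

At 90x90 the two are nearly tied, and 100x100 is the first tested size where the hardware comes out ahead.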

Full screen rotation also looks very promising. The following test simulates rotating a 1280x480 shadow buffer into the 480x1280 display buffer.

 ========== rotate_test ===========
 >>> software rotate: TIME = 8205.986000ms. FPS = 7.311736.
 >>> hardware rotate: TIME = 125.690000ms. FPS = 477.364946.
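For reference, this is roughly what the software path has to do per frame: touch every pixel and write it to a transposed position. The sketch below is a plain (unoptimized) clockwise rotation on a row-major buffer, not the code used in the test. The clockwise direction is my assumption based on the Rotate "CW" option, and the frame count of 60 is inferred from the logged TIME and FPS values.

```python
def rotate_cw(src, width, height):
    """Rotate a row-major width x height buffer 90 degrees clockwise.

    The result is row-major with the new width equal to the old height.
    """
    dst = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            dst_x = height - 1 - y  # source rows become columns, right to left
            dst_y = x               # source columns become rows
            dst[dst_y * height + dst_x] = src[y * width + x]
    return dst

def fps(frames, elapsed_ms):
    """Frames per second from a frame count and elapsed milliseconds."""
    return frames * 1000.0 / elapsed_ms

# Tiny 3x2 demo buffer.
src = [1, 2, 3,
       4, 5, 6]
print(rotate_cw(src, 3, 2))

# 60 frames is an inference: it reproduces both logged FPS values.
print(fps(60, 8205.986), fps(60, 125.690))
```

For 1280x480 that inner loop runs 614400 times per frame, which is why handing the whole thing to the rotation engine pays off so clearly.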


Update

I managed to wire up the DDX driver to reserve a display engine layer, point it at the framebuffer memory, and initialize G2D, and sunxi_g2d_blt is reached. However, the G2D blt never actually happens.
Two issues here.

  1. It fails this check:

    Either the source or destination is not within the framebuffer.
  2. Preferred blt size. This value is set to 1000 in the original fbturbo, but according to our test the hardware is only faster above 90x90 (8100 pixels). I logged the blt operations and they're pretty small: 32x32, 100x14, 1x1, and so on.
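The size issue can be sketched as a simple area threshold. I am assuming here that the threshold is compared against the pixel count (width times height). use_hardware is a made-up name, not the function in fbturbo.

```python
HW_MIN_PIXELS = 90 * 90  # 8100, the rough break-even area from the fill benchmark

def use_hardware(width, height, min_pixels=HW_MIN_PIXELS):
    """Send a blt to G2D only if it covers more than min_pixels pixels."""
    return width * height > min_pixels

# Sizes seen in the blt log.
logged = [(32, 32), (100, 14), (1, 1)]
for w, h in logged:
    path = "g2d" if use_hardware(w, h) else "software"
    print(f"{w}x{h} = {w * h} px -> {path}")
```

With the threshold at 8100 every logged blt stays in software, while the original value of 1000 would push the 32x32 and 100x14 ones to hardware, where they are slower.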

So I decided to work out hardware-backed rotation first.

Update

Rotation works. Could be improved, since DDX exposes the damaged rectangles, but since our rotation engine is fast I just do fullscreen rotations.
The critical step is to hack the shadow buffer into the framebuffer memory so we get the physical address. There are other ways to do this but anyway :smiley:
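Why this gives us a physical address: the fbdev interface reports the physical start of the framebuffer memory (smem_start in fb_fix_screeninfo), so anything placed inside that memory has a physical address of base plus byte offset. The sketch below assumes the shadow buffer sits right after the visible screen, which is one possible layout and not necessarily the one in my driver. All numbers are hypothetical.

```python
def shadow_phys_addr(smem_start, smem_len, visible_bytes, shadow_bytes):
    """Physical address of a shadow buffer placed right after the visible screen.

    Assumed layout: [visible screen][shadow buffer] inside framebuffer memory.
    """
    offset = visible_bytes
    if offset + shadow_bytes > smem_len:
        raise ValueError("framebuffer memory too small for a shadow buffer")
    return smem_start + offset

# Hypothetical numbers: 480x1280 at 32 bpp, framebuffer memory twice the screen size.
screen_bytes = 480 * 1280 * 4
addr = shadow_phys_addr(0x4400_0000, 2 * screen_bytes, screen_bytes, screen_bytes)
print(hex(addr))
```

The size check matters: if the framebuffer allocation is not large enough, the engine would be pointed at memory it does not own.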

After this, the failed range check I mentioned in the previous update passes. It also reveals that the Xorg server is calling bitblt to copy from shadow to shadow. The problem, though, is that bitblt causes buffer glitches. Not sure why.

Performance-wise I don't see a huge improvement going from FALLBACK_BITBLT to G2D_CMD_BITBLT_H. Is the bottleneck elsewhere?

So far, this brings up the DDX driver to a workable state with rotation acceleration.

Primary goal reached.

Update

The bitblt problem turns out to be trivial. fbturbo wants a return status 1 for success and 0 for failure. Bazinga.
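The mismatch in one picture: I am assuming the G2D call reports success the usual ioctl way (0 for success, negative for error), while fbturbo expects 1 for success and 0 for failure. Passing the raw result through makes every successful blt look like a failure. The function name below is made up.

```python
def g2d_status_to_fbturbo(ioctl_ret):
    """Map an ioctl-style result (0 = ok, negative = error) to fbturbo's 1 = ok, 0 = fail."""
    return 1 if ioctl_ret == 0 else 0

# Raw pass-through would hand fbturbo a 0 (failure) for a blt that succeeded.
for raw in (0, -1):
    print("ioctl returned", raw, "-> fbturbo sees", g2d_status_to_fbturbo(raw))
```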
With that fixed we now have super smooth window moving. Still glitches sometimes. Need to find out when.
It seems that there are some unsupported overlapped bitblt modes in the g2d engine.
The original fbturbo covered some cases and I added a few more.

You will see that a window moves to the left very smoothly (accelerated) but lags when moving to the right (non-accelerated).
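One way to think about the left/right difference is a direction check for overlapping copies. The sketch assumes an engine that reads and writes pixels in ascending order (left to right, top to bottom), which I have not verified for G2D. Under that assumption an overlapping copy is only safe when the destination comes before the source. can_accelerate is a hypothetical name and not the actual check in the driver.

```python
def rects_overlap(sx, sy, dx, dy, w, h):
    """True if the w x h source and destination rectangles share any pixel."""
    return abs(sx - dx) < w and abs(sy - dy) < h

def can_accelerate(sx, sy, dx, dy, w, h):
    """True if the copy is safe for an engine walking pixels in ascending order."""
    if not rects_overlap(sx, sy, dx, dy, w, h):
        return True
    # Overlapping: a destination pixel may only be written after the
    # source pixel at that same location has already been read.
    if dy < sy:
        return True               # moving up: earlier rows are already consumed
    if dy == sy and dx <= sx:
        return True               # moving left within the same rows
    return False                  # moving right or down: would clobber unread source

print("move left :", can_accelerate(100, 0, 90, 0, 50, 50))
print("move right:", can_accelerate(100, 0, 110, 0, 50, 50))
```

Anything the check rejects falls back to the software path, which would match the laggy feel when dragging to the right.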

Secondary goal reached.

@andypiper if you’re waiting for a signal, the time is now :slight_smile:

If you want to test this accelerated driver:

  1. Back up your /etc/X11/xorg.conf.d/10-d1.conf !!
  2. Download https://nextcloud.yatao.info:10443/s/cJbbpto4TX3NMJn
  3. Unpack, cd into it, sudo make install
  4. startx

What remains to be done (polish)

  • It takes over the framebuffer. No fbcon :frowning:
  • Glitches in bitblt. I haven’t figured out the exact condition, but it may be related to buffer alignment and overflow.

Thanks for working on this. Watching with interest and will give things a try as it gets further along!


wow
this is really something

nice job