Ideas for improving LCD speed

What is the current fastest approach for writing pixels to the ILI9488 LCD on pico/pico2?

On one hand, Luckfox Lyra has reasonable framerate (20-30 fps) which allows playing games.
On the other hand, PicoMite has an abysmal framerate unless overclocked. MicroPython has bad framerates too, but that’s potentially due to the use of Python for implementing hardware interaction. The picocalc_helloworld and picocalc_lvgl_graphics_demo examples rely on the same slow implementation (especially scrolling, which reads back from LCD memory).

My own driver is based on the Lyra driver (picocalc_lua/drivers at master · benob/picocalc_lua · GitHub), with SPI clocked at 75 MHz, 16-bit color and 16-bit SPI writes. Yet the framerate is really bad (e.g. clearing the 320x320 screen takes around 0.3 s).

Some other options I am looking into:

  • An in-memory framebuffer, using the second core to send data in the background (e.g. characters for a terminal, since a pixel framebuffer would leave very little free memory)
  • DMA, but my understanding is that spi_write_blocking is already using DMA
  • PIO, but I read everywhere that it’s not fast enough
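
To put rough numbers on the memory concern in the first bullet, here is a small sketch comparing a full pixel framebuffer with a character-cell buffer. The SRAM sizes are the documented 264 KB (RP2040) and 520 KB (RP2350); the 6x8 font cell and the 2 bytes per cell are assumptions made only for illustration.

```python
WIDTH, HEIGHT = 320, 320

def pixel_framebuffer_bytes(bits_per_pixel):
    """Bytes needed to hold one full frame at the given colour depth."""
    return WIDTH * HEIGHT * bits_per_pixel // 8

def text_buffer_bytes(cell_w=6, cell_h=8, bytes_per_cell=2):
    """Bytes for a character grid (glyph code + attribute per cell).
    The 6x8 cell and 2 bytes/cell are illustrative assumptions."""
    cols, rows = WIDTH // cell_w, HEIGHT // cell_h
    return cols * rows * bytes_per_cell

# Documented on-chip SRAM sizes.
SRAM = {"RP2040 (pico)": 264 * 1024, "RP2350 (pico2)": 520 * 1024}

fb16 = pixel_framebuffer_bytes(16)
txt = text_buffer_bytes()
for chip, total in SRAM.items():
    print(f"{chip}: 16 bpp framebuffer uses {fb16 / total:.0%} of SRAM, "
          f"text buffer uses {txt / total:.1%}")
```

Under these assumptions a 16-bit framebuffer takes about three quarters of the RP2040’s SRAM, while the character buffer is a few kilobytes.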

Do you have other ideas / implementations I could look at?

Also, it’s possible that my implementation is buggy, I’d be happy if someone with knowledge of those things could look at it.

There is some discussion about the PSRAM and LCD limitations here.

I just received my PicoCalc today, so I haven’t been able to try anything I’ve talked about in that topic yet.

With a bare-metal approach, yes, we can use the other core (or DMA) to send data to the dedicated SPI. The complexity will be in sharing memory between two asynchronous cores.
About SPI, I’m not sure about the LCD limitation… On paper (ILI9488 datasheet), the maximum SPI clock is about 20 MHz. But I think you can push it to 40-50 MHz at the cost of data integrity (because the combined setup and hold time corresponds to 50 MHz).
You said it took 0.3 s to render a frame; this is more or less my worst estimate for 20 MHz.
So I guess the SPI is still running at 25 MHz or thereabouts, not 75, because you set the SPI baudrate but not the SPI clock. If I’m not mistaken, set_baudrate takes a requested frequency, but if the SPI clock source is not fast enough, it uses the highest rate it can. (Sorry, I haven’t experimented enough with the RPxxxx clock tree to be more precise about how to set it.)
Still on paper, the screen at its maximum SPI configuration could refresh 320x320x16 at 12.2 FPS. Maybe 15-16 if you push the SPI without corrupting data.
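
The 12.2 FPS figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes the bus is kept busy 100% of the time and ignores command bytes, chip-select toggling and gaps between transfers, so it gives an upper bound only.

```python
def max_fps(spi_hz, width=320, height=320, bits_per_pixel=16):
    """Upper bound on full-screen refreshes per second over a 1-bit SPI bus."""
    bits_per_frame = width * height * bits_per_pixel
    return spi_hz / bits_per_frame

def frame_time_s(spi_hz, **kwargs):
    """Minimum time to push one full frame, in seconds."""
    return 1.0 / max_fps(spi_hz, **kwargs)

for mhz in (20, 25, 40, 50, 75):
    hz = mhz * 1_000_000
    print(f"{mhz:>3} MHz: {max_fps(hz):5.1f} fps at 16 bpp, "
          f"{max_fps(hz, bits_per_pixel=24):5.1f} fps at 3 bytes/pixel")
```

By the same formula, a 0.3 s full-screen clear at 16 bits per pixel corresponds to only about 5.5 Mbit/s of effective throughput, which suggests that per-transfer overhead may matter as much as the clock rate.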

DMA is a good friend for tasks that need to move data around in memory without involving the CPU.
spi_write_blocking doesn’t use DMA, only a simple while loop that fills the transmit/receive buffer.
You can find an example here of how to use SPI with DMA. It can be quite confusing, so don’t hesitate to experiment with this example first.

As for PIO, it depends on the requirements… In general, the dedicated peripheral does the job better, but in some cases, where you need some bit manipulation or something like QSPI, the SPI peripheral doesn’t suffice and PIO comes to the rescue!
Don’t take this statement at face value; it’s just a generalisation about dedicated circuits vs. programmable circuits. You’d need test benches to compare actual numbers.

Also beware that the Luckfox Lyra uses a Rockchip chip, not an RP2xxx, so no PIO is available and the pico-sdk is not compatible. I saw someone using Linux DRM directly to render things on screen but can’t find it…

I too was a bit disappointed with the screen update speed. The LCD driver code in the picocalc repo isn’t the prettiest code I’ve ever seen but I assume most of the execution time is spent in the spi write functions anyway so optimizing it might not help much.

Having said that though, there is an 8 color mode in the hardware: it updates two pixels per byte sent, instead of one pixel per three bytes sent, and so could theoretically be six times faster. I haven’t set up a coding environment yet though, so I can’t tell whether this does the “right” thing and bit-repeats the color into the framebuffer. If it did, then region fills of common colors could be faster at least.
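
As a rough illustration of the “two pixels per byte” idea, here is a sketch that packs 3-bit pixels and compares the bytes per frame. The bit layout (first pixel in bits 5..3, second in bits 2..0, top two bits unused) is an assumption that should be checked against the ILI9488 datasheet; the function names are made up for this example.

```python
def pack_rgb111(pixels):
    """Pack 3-bit pixels (0..7: bit 2 = R, bit 1 = G, bit 0 = B) two per byte.
    ASSUMED layout: bits 5..3 = first pixel, bits 2..0 = second pixel,
    bits 7..6 unused. Verify against the controller datasheet."""
    pixels = list(pixels)
    if len(pixels) % 2:
        pixels.append(0)  # pad an odd-length run with a black pixel
    out = bytearray()
    for i in range(0, len(pixels), 2):
        first, second = pixels[i] & 0x7, pixels[i + 1] & 0x7
        out.append((first << 3) | second)
    return bytes(out)

def bytes_per_frame(width=320, height=320):
    """Bytes on the wire for one frame in each pixel format."""
    n = width * height
    return {"rgb666": n * 3, "rgb111": n // 2}

sizes = bytes_per_frame()
print(sizes, "ratio:", sizes["rgb666"] / sizes["rgb111"])
```

For a 320x320 frame this gives 307,200 bytes in the 3-bytes-per-pixel mode against 51,200 bytes in the 8 color mode, which is where the factor of six comes from.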

Also, if it’s scrolling via readback, text modes at least could be improved by keeping a shadow copy and just drawing what’s changed.
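
A minimal sketch of the shadow-copy idea for a text mode, assuming a hypothetical draw_cell(col, row, ch) callback that renders one character on the LCD. Scrolling happens in RAM, and only cells whose content differs from what is already on screen get redrawn.

```python
class ShadowTextScreen:
    def __init__(self, cols, rows):
        self.cols, self.rows = cols, rows
        # What the LCD currently shows, and what we want it to show.
        self.shown = [[" "] * cols for _ in range(rows)]
        self.wanted = [[" "] * cols for _ in range(rows)]

    def put(self, col, row, ch):
        self.wanted[row][col] = ch

    def scroll_up(self):
        """Scroll one line in RAM only; nothing is read back from the LCD."""
        self.wanted.pop(0)
        self.wanted.append([" "] * self.cols)

    def flush(self, draw_cell):
        """Call draw_cell(col, row, ch) for changed cells; return how many."""
        changed = 0
        for r in range(self.rows):
            for c in range(self.cols):
                ch = self.wanted[r][c]
                if ch != self.shown[r][c]:
                    draw_cell(c, r, ch)
                    self.shown[r][c] = ch
                    changed += 1
        return changed
```

On a mostly empty terminal, a scroll then costs a handful of character redraws instead of a full-screen readback and rewrite.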

I just updated my WebMite/PicoMite builds to include the new display driver code the official PicoMite developer included in RC19. I ended up having to remove all the original Clockwork display driver code, but I think that’s probably for the better. This new implementation isn’t bug free (yet), but it seems to be faster.

I don’t really have any experience with display driver code though, so if anyone else wants to take a look at this and has suggestions, please do! Even better if you can submit a PR, or create an issue to help track something specific. I could only get the new PicoMite code to work by using their init commands for the display. Trying to use the original Clockwork init code sort of worked, but scrolling caused corruption.

More details and links to additional info at the release link below…

I plan to rewrite the TFT driver at some point. For my usage, @ethern0t’s remark about using the 8 color mode is a good start, as I need a good refresh rate rather than accurate colors.
I don’t know PicoMite very well, but if it can work with 8 colors only, this could be a great way to improve responsiveness (and scrolling).

I’m stuck with STM32 code for the moment (adding permanent backlight memory, tuning the clock for power efficiency, etc.), so it may take a while before I can share any progress on the screen topic.

Based on the video posted yesterday, the hardware scrolling support seems viable as well. There are optional fixed zones at the top and bottom of the screen.

What’s not clear from the hardware docs is what happens when you run out of scroll space. I assume it wraps back around so when scrolling we only ever need to draw the new part.

The memory wraps at the end of GRAM (at byte 345,600). Depending on bpp, that might not be at a row boundary, which is sad.

I have played around with random writes and scrolling, and it sometimes crashes under conditions that I do not fully understand. It might be best to not write outside of the basic range.

In picocalc_lua, I resort to wrapping my draw calls manually.
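
One way such manual wrapping could look, as a sketch only: this is not the picocalc_lua code, and the function and parameter names are hypothetical. It assumes a scroll area of area_rows physical rows starting at top_fixed, with offset being the physical row (relative to the area) currently shown at the top. A draw that crosses the end of the area is split into two spans so that no write runs past the area.

```python
def split_wrapped_rows(y, h, offset, area_rows, top_fixed=0):
    """Map the logical row span [y, y+h) inside the scroll area to one or two
    physical (start_row, row_count) spans that never cross the area end."""
    if y < 0 or h < 0 or y + h > area_rows:
        raise ValueError("span must lie inside the scroll area")
    start = (y + offset) % area_rows
    first = min(h, area_rows - start)      # rows that fit before the wrap
    spans = [(top_fixed + start, first)]
    if first < h:                          # remainder continues at the top
        spans.append((top_fixed, h - first))
    return spans

# A 10-row draw near the bottom after scrolling by 5 rows is split in two.
print(split_wrapped_rows(310, 10, offset=5, area_rows=320))
```

Each returned span would then be sent as its own window write, which keeps every access inside the basic range.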

Interesting! I guess if there’s enough RAM for 320x480 but the screen is only 320x320, there is some extra.

If I remember correctly the vram is 18 bits per pixel. So if we configure the 8 color mode is that changing how vram is interpreted, or just how spi writes work?

I would say it changes how it is interpreted, since you can read the memory back and get consistent results. As an experiment, one could write at one bpp and then configure a different bpp to read the content.

Has anyone tried HSTX on rp2350? There is an example showing how to interact with a similar display pico-examples/hstx/spi_lcd/hstx_spi_lcd.c at master · raspberrypi/pico-examples · GitHub

This example uses a screen with an ST7789 chip, and from what I can read from its datasheet, it can support up to 65 MHz SPI clock (maybe 70 MHz).
But the major issue isn’t the SPI clock on the pico side; it’s the SPI requirement of the actual screen chip (ILI9488). Currently it limits us to 20 MHz/Mbps for the SPI. :confused:

If we can find another screen with performance similar to the ST7789 chip that fits in place of the current one, and probably make an adapter for the flex connector… This would unlock a lot of possibilities!

I am talking about obtaining performance similar to that of the Luckfox Lyra, not 60fps. I read somewhere that the ILI9488 can run at 40MHz.

I’m not sure I understand. What “kind” of performance are we talking about? The Luckfox Lyra should have the same SPI constraints as the pico 2…
The point I wanted to make was that the problem isn’t with the processor/daughterboard, but with the screen itself.

Maybe the datasheet only covers the typical case, and maybe some screens can handle 40 MHz.
What I was saying earlier was that it might be possible to reach 50 MHz, precisely because the limit implied by the data setup and hold times is higher than the clock limit.

But this is above the limits stated in the datasheet. I’ve already seen screens fry from overheating because of an excessive clock and no protection on the screen IC.
It’s not certain, but there is a risk.

If somebody has done some testing, I’m interested too!

With touch panel :smiling_face_with_sunglasses: (same bus)

From my testing on the Lyra, the average is around 45 fps. But Hispoot, who made the CFW, has released another LCD driver that should improve the refresh rate; I have yet to test it.

new screen driver for testing.
It uses rgb565 mode by default, instead of rgb666 mode, and can use 3bit mode by passing p_3bit_mode=1 p_dither=1 or p_3bit_mode=1 p_dither=0

If you want high refresh rate, you can try 3bit mode.

I reviewed the micropython port code about SPI…

Yes, it seems to be initialised with 40 MHz, but for the same reason I pointed out to you in the first post, I think the 40 MHz “requested” is in fact the default one (25 MHz).

To check whether 40 MHz is reached, there are two ways:

  • check the clock signal with an oscilloscope,
  • look at the value returned by the spi_init() function.

I should look at it!
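
To get a feel for why a requested SPI rate may not be the rate you get, here is a small model of the SPI clock divider. It assumes the PL022-style arrangement (output = clk_peri / (prescale × postdiv), prescale even in 2..254, postdiv in 1..256) and a clk_peri of 125 MHz or 150 MHz. It is a brute-force model of the hardware constraint, not the pico-sdk source, so on real hardware the value returned by spi_init() remains the reference.

```python
def achievable_spi_hz(clk_peri_hz, requested_hz):
    """Highest SPI clock <= requested_hz that the divider chain can produce.
    Model: f = clk_peri / (prescale * postdiv), with prescale even in 2..254
    and postdiv in 1..256. Returns 0 if no setting is slow enough."""
    best = 0
    for prescale in range(2, 255, 2):
        for postdiv in range(1, 257):
            f = clk_peri_hz // (prescale * postdiv)
            if best < f <= requested_hz:
                best = f
    return best

for clk in (125_000_000, 150_000_000):
    for req in (25_000_000, 40_000_000, 75_000_000):
        got = achievable_spi_hz(clk, req)
        print(f"clk_peri {clk / 1e6:.0f} MHz, requested {req / 1e6:.0f} MHz "
              f"-> {got / 1e6:.2f} MHz")
```

In this model, requesting 40 MHz from a 125 MHz peripheral clock yields 31.25 MHz, and requesting 75 MHz yields 62.5 MHz, because the total divisor is always an even number.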

I believe the “default” in micropython is closer to 24MHz, but you can overclock via machine.freq(x,x) and be able to see a difference in refresh rate.

I’m the one who set 40MHz in the driver, but I really have no idea what I’m doing, and I never tested the difference (except for the overclock stuff just then) :sweat_smile:

You cannot use a ILI9488 display in 16-bit RGB565 mode (2 bytes per pixel) over SPI, only RGB666 (3 bytes per pixel) and RGB111 (2 pixels per byte)

I asked the question in another topic (LCD ILI9488 really doesn't support 565 RGB mode?) because I’m unsure about what the datasheet says about the ILI9488 pixel formats.
Were you able to try writing directly in the RGB565 format? I’d be interested to know!

Yes, doesn’t work. You can use it in RGB565 in 8 or 16-bit parallel mode.
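
If a driver keeps its pixels as RGB565 internally but the SPI link only accepts the 3-bytes-per-pixel format, each pixel has to be expanded before sending. A minimal sketch, assuming each wire byte carries its 6 color bits in bits 7..2 with the low two bits ignored by the controller (worth confirming in the datasheet); the function names are made up for this example.

```python
def rgb565_to_rgb666_bytes(pixel):
    """Expand one 16-bit RGB565 value into the three bytes sent per pixel.
    ASSUMED wire format: 6 colour bits in bits 7..2 of each byte."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Replicate the top bit into the new low bit so that 31 maps to 63.
    r6 = (r5 << 1) | (r5 >> 4)
    b6 = (b5 << 1) | (b5 >> 4)
    return bytes((r6 << 2, g6 << 2, b6 << 2))

def convert_line(pixels565):
    """Convert a sequence of RGB565 pixels into one contiguous byte string."""
    out = bytearray()
    for p in pixels565:
        out += rgb565_to_rgb666_bytes(p)
    return bytes(out)

print(convert_line([0xF800, 0x07E0, 0x001F]).hex())
```

The cost is 50% more bytes on the wire than a true 16-bit transfer would need, on top of the conversion itself.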