Ideas for improving LCD speed

What’s the secret to getting 111 mode working? I was hoping it would be as simple as changing the 0x55 interface mode to 0x01 and then sending down count/2 (packed) bytes instead of count*2 bytes of 565 halfwords. But it’s not working for me.

Try 0x61. I think that is what I used.

Yeah, I tried that already after noticing somebody else had it in their code. Didn’t help. Theoretically it shouldn’t matter because the upper bits are documented as ignored if you’re using SPI. Any other differences in init code or handling? I’m always updating an even number of pixels at present.

I kept my modifications as minimal as possible. I got it down to the lcd_draw function. If I send the packed data four times, everything runs (with obviously garbled screen data). If I send the packed data once, and only change the init mode from 0x55 to either 0x61 or 0x01, something hard faults.

And to confirm, the pixel data is two pixels per byte in bits 0-2 and bits 3-5, with the upper two bits being zero?

EDIT: I’ve spent several hours playing with this, trying different things. It seems like as soon as I set the interface mode to 0x01 or 0x61, writing any sort of pixel data to the screen locks up. I’ve tried modifying a few different otherwise functional programs, no luck. I also tried forcing idle mode (0x39), which seemed like it might be a prerequisite, since idle mode only pays attention to the upper bits of the color data.

I even went as far as bringing up my own code from scratch, fixing all the dumb mistakes, and then checking 16bpp mode works fine. 3bpp just… doesn’t do anything. The LCD inits properly, and my code runs, but nothing ever shows up on the screen.

I added code to scroll the screen vertically just to “prove” the LCD isn’t dead, and the screen does scroll in 8 color mode (using whatever was in VRAM from my last run)

I also added support for 18bpp and 24bpp since it was just a few extra lines of code. For what it’s worth, the memory footprint of the LCD appears to be fixed at 18/24 bits per pixel: when I change the memory access mode and don’t clear the entire screen, the rest of the screen stays the same color.

This means that there isn’t magically more memory for page flipping in some of the lower color bit modes.

When I tested I used 480x320 and always updated the entire screen in one go. i.e. set the display area to the whole screen. I seem to remember it didn’t work if setting smaller areas.

Interesting idea! Didn’t help me though. Thank you anyway for the suggestion.

Dave

@picouser, can you share init code for your 3-bit experiments? I am also having trouble making 3-bit mode work.

Ok, got it working with lcd_write_reg(0x3A, 0x22);

Wow, so a major doc bug then?

Ben, I can confirm that works as well for me. How did you figure that out? Just guessing different init values? Didn’t occur to me to try that.

Has anybody figured out how the scrolling regions work? The top and middle are pretty simple.

If top is nonzero, the top N lines of VRAM are shown there. Then the next middle lines are shown, starting at whatever the scroll base (in register 0x37) is set to.

I can’t figure out how bottom works though. The docs say it’s a fixed number of lines from the end of video memory, but I’ve tried a bunch of different values there and they don’t make sense. I can get a part of the bottom to be fixed, but I don’t understand why.

Basically with these fixed regions, you can scroll in either direction really quickly while only updating one line of text, which combined with the 3bpp could actually give you usably fast text editing.

EDIT - FINALLY figured it out. top is the number of lines to fix at the top, middle is 320 - top - bottom, and you have to write 160 + bottom to the bottom register.

160 is 480 - 320.

This gives you a fixed top and bottom and a cleanly wrapping scrolling region.
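
The arithmetic above can be sketched as a small helper. This is only an illustration of the numbers described in this thread (320 visible lines, 480 lines of VRAM); the struct and function names are made up, and sending the three values as the parameters of a scroll area definition command (0x33 on many MIPI-DCS-style controllers) is an assumption, since only register 0x37 is named here.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the thread: 320 visible lines, 480 lines of VRAM. */
enum { VISIBLE_LINES = 320, VRAM_LINES = 480 };

/* Hypothetical container for the three scroll-area parameters. */
typedef struct { uint16_t top, middle, bottom; } scroll_area_t;

/* Compute the values to write for a fixed top and fixed bottom region. */
static scroll_area_t scroll_area(uint16_t top_fixed, uint16_t bottom_fixed)
{
    scroll_area_t a;
    /* There must be at least one scrolling line left in the middle. */
    assert(top_fixed + bottom_fixed < VISIBLE_LINES);
    a.top = top_fixed;
    a.middle = (uint16_t)(VISIBLE_LINES - top_fixed - bottom_fixed);
    /* The bottom register also has to cover the 160 off-screen lines. */
    a.bottom = (uint16_t)((VRAM_LINES - VISIBLE_LINES) + bottom_fixed);
    return a;
}
```

With this rule the three values always add up to 480, which would explain why writing only the visible bottom height doesn’t behave sensibly.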

If you don’t need a bottom fixed region, you can just have scrolling wrap at the end of VRAM, I think.

It’s getting late and I’m probably doing something stupid, but the 565 mode packing seems really weird.

My color struct is { uint8_t r, g, b, unused; };

I have the BGR register set to 0x48, as this is necessary to get the correct color mappings in 18 and 24 bpp modes, where I just pass the first three bytes of the struct into the SPI hardware.

8 color mode also works fine; the bits are packed (binary) 00RGBRGB.
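
For anyone following along, here is a minimal sketch of that 00RGBRGB packing. The function names are hypothetical, the color struct mirrors the one above, and putting the first pixel of each pair in bits 5..3 is an assumption (swap the two halves if your panel shows pixel pairs reversed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Color struct laid out as described in the thread. */
typedef struct { uint8_t r, g, b, unused; } color_t;

/* Reduce one color to 3 bits (R, G, B) by keeping the top bit of each channel. */
static uint8_t color_to_rgb111(color_t c)
{
    return (uint8_t)(((c.r >> 7) << 2) | ((c.g >> 7) << 1) | (c.b >> 7));
}

/* Pack `count` pixels into count/2 bytes laid out as binary 00RGBRGB.
 * Assumption: the first pixel of each pair lands in bits 5..3.
 * Returns the number of bytes written. */
static size_t pack_rgb111(const color_t *src, size_t count, uint8_t *dst)
{
    assert(count % 2 == 0); /* pixels are sent in pairs */
    for (size_t i = 0; i < count; i += 2) {
        dst[i / 2] = (uint8_t)((color_to_rgb111(src[i]) << 3) |
                               color_to_rgb111(src[i + 1]));
    }
    return count / 2;
}
```

A red pixel followed by a cyan pixel packs to 0x23 (binary 00 100 011) under these assumptions.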

But as far as I can tell, 565 mode is something like (binary) BBBBBRRRRRRGGGGG unless I have some sort of offsetting bug in my code.

Did you swap the bytes before sending?

Nope. I remembered seeing that in your code and didn’t really understand it, and since that code only worked in black and white, I couldn’t see any difference.

Once I set the SPI mode to 16 bits and used spi_write16_blocking, I got the mapping I was expecting, which is r5g6b5.
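
One plausible explanation (not confirmed in this thread) is byte order: the Pico is little-endian, so handing a buffer of 16-bit pixels to an 8-bit SPI write sends the low byte of each pixel first. A rough sketch of r5g6b5 packing plus an explicit high-byte-first serialization, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit channels into a 16-bit r5g6b5 value. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Serialize pixels high byte first, for use with 8-bit SPI writes.
 * Assumption: the panel wants the most significant byte first.
 * Returns the number of bytes written. */
static size_t rgb565_to_bytes_be(const uint16_t *px, size_t count, uint8_t *dst)
{
    assert(px != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[2 * i] = (uint8_t)(px[i] >> 8);
        dst[2 * i + 1] = (uint8_t)(px[i] & 0xFF);
    }
    return count * 2;
}
```

Switching the SPI peripheral to 16-bit frames, as described above, avoids the extra copy because the hardware then shifts out each halfword most significant bit first.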

I didn’t check your recent release, did you send down register 0x39 for 3 bit mode? It’s supposed to save power.

EDIT - while trying to power off my picocalc for the night I realized that some time in the past few hours I’d crashed the keyboard driver somehow and hadn’t noticed. Explains why none of my attempts to read the keyboard were working.

I had tried to use the C64 “matrix mode” but eventually noticed in the official source release the iomatrix isn’t ever actually modified.

Couple more things I’ve figured out.

Highest SPI speed for LCD I can set seems to be 62’500’000. Setting it lower than that affects overall performance, obviously.

With the SPI speed maxed out, here are the times for a full 320x480 screen fill on a stock original Pico (using 16 pixel runs in 3bpp, 8 pixel runs in 16bpp, and a dumb simple one-at-a-time loop for 18/24):

21.2ms for 3bpp
59.6ms for 16bpp
167ms for 18/24bpp

EDIT - I played with unrolling the loop more, and kept getting steady wins, which surprised me unless there is a LOT of internal setup in spi_write_blocking.

128x - 12.1ms (curve is finally starting to flatten out)
64x - 12.6ms
32x - 14.0ms
16x - 17.1ms
8x - 21.2ms

(This is already two pixels per byte, so 8x is already 16 pixels per run)
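
The run-based fill being timed here looks roughly like the sketch below. It is not the actual code from my project: the sink callback is a hypothetical stand-in so the loop can be tried on a host, and on the Pico it would wrap something like spi_write_blocking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sink: on hardware this would wrap the SPI write call. */
typedef void (*spi_sink_fn)(const uint8_t *data, size_t len, void *ctx);

/* Bytes per write call; 128 matches the "128x" row above. */
enum { RUN_BYTES = 128 };

/* Host-side stand-in sink that only counts the bytes it is given. */
static void count_sink(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Send `total` copies of `value` in runs of up to RUN_BYTES bytes.
 * Returns the number of sink calls made. */
static size_t fill_bytes(uint8_t value, size_t total, spi_sink_fn sink, void *ctx)
{
    uint8_t run[RUN_BYTES];
    size_t calls = 0;
    assert(sink != NULL);
    memset(run, value, sizeof run); /* build the run once, reuse it */
    while (total > 0) {
        size_t n = total < sizeof run ? total : sizeof run;
        sink(run, n, ctx);
        total -= n;
        calls++;
    }
    return calls;
}
```

A 3bpp fill of 320x480 is 76800 bytes, so 128-byte runs need 600 write calls where 8-byte runs need 9600, which fits the idea that per-call overhead dominates at small run sizes.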

The theoretical limit at 62,500,000 is 1000 x 320 x 480 x 8 / 2 / 62,500,000 = 9.8304 ms, so 12ms is not too bad. Did you try with looping DMA to remove all setup time?
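
The same arithmetic generalizes to the other modes. A small sketch with a made-up function name, assuming 4 bits on the wire per pixel in 3bpp (two pixels per byte), 16 bits in 565 mode, and 24 bits in the 18/24bpp modes (three bytes per pixel, as described earlier in the thread):

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound on fill time in milliseconds, ignoring all per-call overhead. */
static double min_fill_ms(uint32_t width, uint32_t height,
                          double wire_bits_per_pixel, double spi_hz)
{
    assert(spi_hz > 0.0);
    return 1000.0 * (double)width * (double)height * wire_bits_per_pixel / spi_hz;
}
```

At 62.5 MHz this gives about 9.83 ms for 3bpp, 39.32 ms for 16bpp, and 58.98 ms for 18/24bpp on a 320x480 fill, so the measured 59.6 ms for 16bpp still has more headroom than the 3bpp case.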

Nah, not yet. I did implement an idea I had for opaque text where I expand it from a bitmap to a pixel instead of the terrible “pixel at a time” routine the base class uses.

16bpp text fill (40x40 characters) went from 800ms to 50.1ms when I did that. That’s fast enough for rapidly paging through code in a text editor.
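
In case it helps anyone, the bitmap expansion is roughly the following. This is a simplified sketch rather than my actual routine: it handles one 8-pixel-wide glyph row, the function name is made up, and treating the most significant bit as the leftmost pixel is an assumption about the font format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one 8-pixel row of a 1bpp glyph into eight 16-bit pixels.
 * Set bits become `fg`, clear bits become `bg` (opaque text).
 * Assumption: bit 7 is the leftmost pixel.
 * Returns the number of pixels written. */
static size_t expand_glyph_row(uint8_t bits, uint16_t fg, uint16_t bg, uint16_t *dst)
{
    assert(dst != NULL);
    for (int x = 0; x < 8; x++) {
        dst[x] = (bits & (0x80u >> x)) ? fg : bg;
    }
    return 8;
}
```

Filling a buffer this way and sending it in one write replaces eight separate set-window-and-write-pixel operations per glyph row.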

3bpp is still twice as fast (but only twice as fast considering it’s pushing a quarter the data).

EDIT: using 8 bit writes for 16bpp mode took 52.0ms. 16 bit writes took 50.9ms, slightly faster (and the colors are correctly unpacked).

-Dave

Good news is I made a line-oriented version of my output routine that is about twice as fast (one region per line instead of one region per character).

Bad news is that while switching out my case for a really cool one that holds the debug probe, I finally got unlucky and messed up my LCD like so many other folks here. I took it apart and re-seated it a few times (both the screen itself and the fragile connector) and got it to the point where only the first 16 lines are screwed up. Every other line in the first 16 is bad, rest of display is fine.

-Dave

When I get my PicoCalc I am going to be super-careful with the screen, and for that matter am going to tape it down with black electrical tape to keep it from moving around, as that seems like a major source of problems.

Yup. I wish I’d done that. Good luck!

Black tapes: magic solver!