Geometry Vibes 3D Port - R-Pi Pico 1@35 FPS!

Keith_Dobbelaere · March 3, 2026, 6:46am

Check out my game in development:

KeithDobbelaere/GeometryVibes3D-PicoCalc

A faithful RP2040 port of Geometry Vibes 3D, targeting the ClockworkPi PicoCalc.

Current state: basic gameplay loop (scrolling level, ship movement, and collision) rendered in wireframe “fake 3D” using fixed-point math.

Features (so far)

Fixed-point 3D camera + projection (no frame-time floats required)
Streaming level playback (GVL1 / 56-bit column format), reads columns on demand from storage
Ship controls (keyboard input; 45° up/down travel like the original)
Collision detection against level geometry, including rotation/inversion modifiers
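To illustrate the fixed-point camera + projection item in the list above, here is a minimal sketch of a perspective projection done purely in integer math. The Q16.16 format, the names (fx, Vec3, Point2, project) and the focal-length parameter are assumptions made for this example and are not taken from the repository.

```cpp
#include <cassert>
#include <cstdint>

// Assumed format: Q16.16 (16 integer bits, 16 fractional bits).
using fx = int32_t;
constexpr int FX_SHIFT = 16;
constexpr fx  FX_ONE   = 1 << FX_SHIFT;

constexpr fx fx_from_int(int v) { return static_cast<fx>(v) * FX_ONE; }

struct Vec3   { fx x, y, z; };   // camera-space position, Q16.16
struct Point2 { int x, y; };     // screen position in whole pixels

// Perspective projection: screen = centre + (coord * focal) / z.
// Returns false when the point is on or behind the camera plane.
inline bool project(const Vec3& p, fx focal, int cx, int cy, Point2& out) {
    assert(focal > 0);
    if (p.z <= 0) return false;
    // Q16.16 * Q16.16 is Q32.32 in 64 bits; dividing by Q16.16 gives Q16.16.
    const int64_t sx = (static_cast<int64_t>(p.x) * focal) / p.z;
    const int64_t sy = (static_cast<int64_t>(p.y) * focal) / p.z;
    // Arithmetic right shift drops the fraction (rounds toward -infinity).
    out.x = cx + static_cast<int>(sx >> FX_SHIFT);
    out.y = cy - static_cast<int>(sy >> FX_SHIFT);  // screen y grows downward
    return true;
}
```

With a focal length of 160 and the screen centre at (160, 160), a point at (1, 1, 2) lands on pixel (240, 80). The 64-bit divide keeps the sketch short; on a Cortex-M0+, which has no divide instruction, a real renderer might prefer a 32-bit divide or a reciprocal table instead.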

Hardware integration highlights

ILI9488 320×320 display: dual-core line raster + slab binning + SPI DMA streaming (~35 FPS)
SD card + FAT32: stream levels/*.BIN columns on demand (no full level in RAM)
PicoCalc keyboard: polled input via the device driver layer

Toolchain

Raspberry Pi Pico SDK v2.2.0
ARM GCC 14.2
VS Code Pico Project extension

Keith_Dobbelaere · March 10, 2026, 3:47am

Current state: playable wireframe “fake 3D” implementation with a level-select menu, HUD, portal/ship effects, and stable fixed-rate rendering on the ILI9488.

Features (Updated)

Fixed-point 3D camera + projection (no frame-time floats required)
Streaming level playback using the GVL1 / 56-bit column format
Level-select menu with highlighted selection
HUD layer with:
- controls hint
- level label
- progress bar + percentage
- fading file-load event text
Ship controls with 45° up/down travel like the original
Collision detection against level geometry, including rotation/inversion modifiers
Wireframe effects, including:
- animated portal rays
- ship trail
- ship explosion chunks

Hardware / platform highlights

ILI9488 320×320 display
- dual-core slab renderer
- SPI DMA streaming
- screen-space text + fill-rect overlay primitives
- stable ~30 FPS pacing
SD card + FAT32
- streams levels/*.BIN columns on demand
- no full level load into RAM
PicoCalc keyboard
- polled through the platform/input layer

Rendering notes

Fixed-capacity render lists using static storage
Major-axis slab line rasterization for cleaner wireframe output
ROM-resident 8×8 bitmap font
Cached screen-space text objects for HUD and menu rendering
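The major-axis slab rasterization mentioned above can be sketched roughly as follows: the line is stepped one pixel at a time along whichever axis is longer, and only pixels that fall inside the current slab are written. This is a host-side illustration; the Slab type, the drawLineClipped name and the per-pixel clipping are assumptions, and std::vector is used for brevity where the game reportedly uses static storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical slab: a horizontal band of the screen, `rows` tall, starting at y0.
struct Slab {
    int width, y0, rows;
    std::vector<uint16_t> px;  // rows * width pixels, 16 bits each

    Slab(int w, int y, int r)
        : width(w), y0(y), rows(r), px(static_cast<size_t>(w) * r, 0) {
        assert(w > 0 && r > 0);
    }

    // Write one pixel if it lies inside this slab; otherwise ignore it.
    void plot(int x, int y, uint16_t c) {
        if (x < 0 || x >= width || y < y0 || y >= y0 + rows) return;
        px[static_cast<size_t>(y - y0) * width + x] = c;
    }
};

// Integer line drawing that always steps along the longer (major) axis,
// so every column (X-major) or every row (Y-major) gets exactly one pixel.
void drawLineClipped(Slab& s, int x0, int y0, int x1, int y1, uint16_t c) {
    const int dx = std::abs(x1 - x0), dy = std::abs(y1 - y0);
    const int sx = (x0 < x1) ? 1 : -1, sy = (y0 < y1) ? 1 : -1;
    if (dx >= dy) {                        // X-major: step x, occasionally y
        int err = dx / 2;
        for (int i = 0; i <= dx; ++i) {
            s.plot(x0, y0, c);
            x0 += sx;
            err -= dy;
            if (err < 0) { y0 += sy; err += dx; }
        }
    } else {                               // Y-major: step y, occasionally x
        int err = dy / 2;
        for (int i = 0; i <= dy; ++i) {
            s.plot(x0, y0, c);
            y0 += sy;
            err -= dx;
            if (err < 0) { x0 += sx; err += dy; }
        }
    }
}
```

Clipping every pixel is the simplest correct approach, not the fastest; a tuned version would more likely clip the line to the slab once and then run the inner loop without bounds checks.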

Tools

Level editor: tools/level_editor/level_editor.py
- tkinter-based 9×N obstacle editor
- place ship start position
- paint obstacles and modifiers
- locked auto-generated endcap + portal preview
- rectangle select, copy, cut, paste, and undo
- import/export JSON
- export packed GVL1 binary files for the game

Toolchain

Raspberry Pi Pico SDK v2.2.0
ARM GCC 14.2
VS Code Pico Project extension

Keith_Dobbelaere · March 15, 2026, 9:37am

First release ready to download and try out!

Geometry Vibes 3D for PicoCalc - v0.4.0-beta.1

First public PicoCalc release with animated obstacle groups, updated tooling, and gameplay progression.

Highlights

Playable wireframe Geometry Vibes 3D experience on ClockworkPi PicoCalc
Title screen, level select, HUD, portal effects, ship trail, and explosion effects
Animated obstacle groups with grouped primitive definitions in the level format
Runtime rendering and collision support for animated groups
Updated Python level editor with:
- animation-group authoring
- live preview
- pivot and step editing
- JSON Open / Save / Save As workflow
- GVL2 binary export

Included

UF2 build for PicoCalc with Raspberry Pi Pico 1
Current level set
Title screen in raw RGB565

Controls

Space: thrust / confirm
Enter: confirm
Esc / Power: pause
F1: toggle status overlay

Notes

This is an early public release and is being published as a pre-release while broader testing continues.
Animated obstacle collision appears solid in current testing, but more gameplay testing is still planned.

Installation

Put the PicoCalc into BOOTSEL mode
Copy the included .uf2 to the device
Make sure the SD card contains the required files:
- levels/L01.BIN through L07.BIN
- assets/title.rgb565

JackCarterSmith · March 16, 2026, 4:42pm

Nice job on optimizing the screen rendering!

Keith_Dobbelaere · March 22, 2026, 5:08pm

Thank you. That display class took the longest, but without it, the game wouldn’t be possible. Utilizing both cores to ping-pong slabs using DMA over SPI was a challenge.

Keith_Dobbelaere · March 22, 2026, 5:21pm

The game is fully featured now and fun to play, so check it out: v0.5.0-beta.1 is posted on GitHub. I think we squeezed quite a bit out of the Pico 1 on this platform, and the SPI display was a real challenge. Thanks to everyone who has shown an interest so far. Thanks to BlairLeduc for his driver code and to Kuratius for the camera optimizations.

Keith_Dobbelaere · March 22, 2026, 5:31pm

I should also mention, there’s a full-featured editor if you want to create custom levels with custom animated groups, primitive painting, star placement, custom colors, etc. Then it’s as simple as clicking Export Bin and you have a new level. All 10 default levels are in the repo as *.json files and can be loaded into the editor if you prefer to just modify those.

Kuratius · March 22, 2026, 8:12pm

I think you could even implement some rudimentary 3D, as long as it’s limited to scenes where overdraw isn’t a problem and only solid-color triangles are used. I think a Duff’s device (jumping into an unrolled loop via a switch statement) and some assembler that keeps values in fixed registers so it can use stm instructions could help with that. You have nearly 10x the raw horsepower of the GBA, and people manage to get 10-15 fps using software rendering on it for fairly complex scenes, like with OpenLara. Thumb is on average maybe 2-3x slower than ARM because the instructions aren’t very powerful, but the higher clock might compensate for it.
It’s probably not necessary for this project except for maybe the rectfill function, but it might be useful if the renderer is used for other things.
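For readers unfamiliar with the trick, a Duff's device applied to a 16-bit fill might look like the sketch below. The function name fill16_duff and the unroll factor of 8 are assumptions for illustration; the stm-based assembler variant is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fill `count` 16-bit pixels with `color` using a Duff's device:
// the switch jumps into the middle of an 8x unrolled loop so the
// remainder (count % 8) is handled without a separate tail loop.
void fill16_duff(uint16_t* dst, uint16_t color, size_t count) {
    assert(dst != nullptr || count == 0);
    if (count == 0) return;
    size_t passes = (count + 7) / 8;   // number of trips through the loop body
    switch (count % 8) {               // every case deliberately falls through
        case 0: do { *dst++ = color;
        case 7:      *dst++ = color;
        case 6:      *dst++ = color;
        case 5:      *dst++ = color;
        case 4:      *dst++ = color;
        case 3:      *dst++ = color;
        case 2:      *dst++ = color;
        case 1:      *dst++ = color;
                } while (--passes > 0);
    }
}
```

For count = 17 the switch enters at case 1 (one store), then the loop runs two full passes of eight stores each. Whether this beats what the compiler generates for a plain loop is something to measure rather than assume.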

Also, I left some explanations on the commit in response to the comments you added to my code. I think you were confused about how it works, so maybe they’re helpful.

It might be worth checking if the game code can be compressed enough to execute from ram, although it’s probably not a very high priority in any case.
Another thing I noticed is that gcc seems to be very reluctant to use the uxth instruction for extracting the lower 16 bits of an unsigned int, even though it ought to be faster than shifting or applying a bitmask via bic. I don’t really know what’s up with that. Even a cast to uint16_t seems to get compiled to bit shifts instead of uxth. Maybe it’s hoping to just use strh instead and skip the instruction?

Keith_Dobbelaere · March 24, 2026, 12:47am

Which commit are you referring to?

Kuratius · March 24, 2026, 10:14am

aa8748b, but I think you found it already.
Regarding the uxth issue, a gcc developer responded to my bug report: apparently cortex-m0plus has a broken cost model.

Also, I have a question about how the renderer works. My understanding is that the cores render into their own buffers and then kick off a DMA transfer when they are done.
Does this actually require two cores to work?
Like, can the DMA transfers from both cores run simultaneously?
Or is the issue that a single core can’t run DMA while rendering to a separate part of the buffer due to access contention?
I thought I remembered reading that the memory banks of the Pi Pico are interleaved, so memory contention should be minimal.

https://forums.raspberrypi.com/viewtopic.php?f=145&t=311811

Keith_Dobbelaere · March 24, 2026, 2:55pm

The DMA is separate hardware from the CPUs. It’s a ping-pong/double-buffered setup. While one buffer is being sent out through the display pipeline, the other is being built. Then they swap. The second core helps manage that pipeline, but the actual SPI data movement is still done by DMA. When I first started (single core, no DMA), I was only getting ~19 FPS pushing very, very rudimentary graphics via SPI. If you increase SLAB_ROWS past 32, you run out of RAM. So it’s all to work around SPI and RAM limitations.
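The ping-pong ordering described above can be modelled on a host machine as in the following sketch. FakeDma is a stand-in invented for this example (a real build would use the Pico SDK DMA and SPI APIs), the "rendering" just writes each pixel's row number, and 16-bit pixels are assumed. Only W = 320, H = 320 and SLAB_ROWS = 32 come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int W = 320, H = 320, SLAB_ROWS = 32;

// Hypothetical DMA stand-in: start() only remembers the source pointer;
// the data is "transferred" when wait_idle() is called. If the renderer
// overwrote a buffer that was still in flight, the output would be wrong.
struct FakeDma {
    std::vector<uint16_t> sent;
    const uint16_t* src = nullptr;
    size_t n = 0;
    void start(const uint16_t* p, size_t count) {
        assert(src == nullptr && count <= static_cast<size_t>(W) * SLAB_ROWS);
        src = p;
        n = count;
    }
    void wait_idle() {
        if (src) { sent.insert(sent.end(), src, src + n); src = nullptr; }
    }
};

void render_frame(FakeDma& dma) {
    static uint16_t buf[2][W * SLAB_ROWS];  // two slabs: 2 * 320 * 32 * 2 bytes
    int ping = 0;
    for (int y0 = 0; y0 < H; y0 += SLAB_ROWS) {
        const int rows = (H - y0 < SLAB_ROWS) ? (H - y0) : SLAB_ROWS;
        uint16_t* slab = buf[ping];
        // Build this slab while the other buffer may still be in flight.
        for (int i = 0; i < W * rows; ++i)
            slab[i] = static_cast<uint16_t>(y0 + i / W);
        dma.wait_idle();                    // previous slab must be fully sent
        dma.start(slab, static_cast<size_t>(W) * rows);
        ping ^= 1;                          // swap buffers
    }
    dma.wait_idle();                        // flush the last slab
}
```

Under the 16-bit assumption the two slab buffers alone occupy 2 × 320 × 32 × 2 = 40,960 bytes, and that figure doubles each time SLAB_ROWS doubles.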

Keith_Dobbelaere · March 24, 2026, 3:16pm

There’s probably a lot of room for improvement. I just quit when I got the frame rates I wanted.

Keith_Dobbelaere · March 24, 2026, 4:12pm

To this point, maybe we could put the hottest display routines into SRAM. I’d never considered that before.

Kuratius · March 24, 2026, 5:41pm

I think it depends on the access pattern. If it’s a very slow routine that takes a lot of time but isn’t hot in the sense of being called often enough to remain in cache (like in a loop), then RAM will make it faster.
For functions inside loops it will probably not be noticeable, unless they contain extremely large unrolled loops or switch statements, because the cache will handle the accesses.
It will probably make performance more predictable, though.

I’m also not entirely certain how static (rather than malloced) memory is handled on the Pi Pico. In the absolute worst case they might set the cache to write-through mode and keep it resident in flash or something. I would hope only static const arrays are in flash.

Keith_Dobbelaere · March 24, 2026, 6:08pm

I haven’t done any profiling, so I have no idea where the bottlenecks would be. On the display side, Core0 handles binning, and Core1 is basically the renderer. I might try marking some functions for SRAM storage, maybe some of the more self-contained draw functions:

drawLineIntoSlab()

drawLineXMajorIntoSlab()

drawLineYMajorIntoSlab()

I’d do them one at a time and see if it improves anything.

So, per Google, we’d just wrap the function in __no_inline_not_in_flash_func(…), right?
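As a rough sketch of what that looks like: in the Pico SDK the macro wraps the function name in the definition, not the whole function. The fillSpan routine below is hypothetical, and the fallback #define exists only so the example also compiles on a host without the SDK.

```cpp
#include <cassert>
#include <cstdint>

#if __has_include("pico/platform.h")
#include "pico/platform.h"   // Pico SDK: provides __no_inline_not_in_flash_func
#endif
#ifndef __no_inline_not_in_flash_func
// Host fallback (assumption for this sketch): the macro becomes a no-op.
#define __no_inline_not_in_flash_func(name) name
#endif

// Hypothetical hot routine. On the Pico the macro places it in a RAM section
// and marks it noinline, so it runs from SRAM instead of flash.
void __no_inline_not_in_flash_func(fillSpan)(uint16_t* dst, int count, uint16_t color) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) dst[i] = color;
}
```

Callers use fillSpan(...) as normal. Anything the function calls that still lives in flash is fetched from flash as before, so self-contained routines are the easiest candidates to try.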

Kuratius · March 24, 2026, 6:56pm

From what I understand, yes. But there’s a good chance this isn’t very noticeable on functions that are called more than once per frame. The Nintendo DS has a similar situation: there, ITCM (static instruction memory hooked up directly to the CPU, running at the same speed as cache) is the equivalent of RAM here, and its RAM is the equivalent of flash, since they are hooked up to different bus clocks (the cache clock is tied to the CPU clock at 60 to 120 MHz, RAM is tied to the 33 MHz bus clock) and there is an additional cache system on top of it. ITCM is typically most beneficial for things that run on an interrupt or that need to run during a DMA, since on the Nintendo DS starting a DMA blocks access to anything other than cache and ITCM.

Keith_Dobbelaere · March 24, 2026, 10:57pm

I wired pins 2, 3, and 21 to Core0, Core1, and DMA respectively, and hooked them up to my scope. Unsurprisingly, Core1 is doing the vast majority of the work in the display class but does have a tiny bit of downtime. More importantly, though, DMA stays pegged the whole time. So, I think we’re maxed on throughput.

void Ili9488Display::renderAndFlushFrame(const Frame& f) {
    probe_on(PIN_PROBE_CORE1);
        …
        if (slabIndex != 0) {
            wait_for_spi_dma_idle();
            probe_off(PIN_PROBE_DMA);
        }
        start_dma_slab(slab, W * rows);
        probe_on(PIN_PROBE_DMA);
        ping ^= 1;
    }

    wait_for_spi_dma_idle();
    spi_set_format(spi1, 8, SPI_CPOL_0, SPI_CPHA_0, SPI_MSB_FIRST);
    gpio_put(PIN_CS, 1);
    probe_off(PIN_PROBE_CORE1);
}
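The probe_on / probe_off helpers are not shown in the post. A minimal sketch, assuming they simply drive a GPIO high or low for the scope to pick up, could look like this. The pin numbers follow the post (2, 3 and 21); probe_init and the host fallback that records pin levels in an array are inventions for this example.

```cpp
#include <cassert>
#include <cstdint>

#if __has_include("hardware/gpio.h")
#include "hardware/gpio.h"   // Pico SDK: gpio_init, gpio_set_dir, gpio_put, gpio_get
#else
// Host fallback (assumption): remember pin levels so the sketch runs off-target.
static bool g_pin_level[30] = {};
constexpr bool GPIO_OUT = true;
inline void gpio_init(unsigned) {}
inline void gpio_set_dir(unsigned, bool) {}
inline void gpio_put(unsigned pin, bool v) { assert(pin < 30); g_pin_level[pin] = v; }
inline bool gpio_get(unsigned pin) { assert(pin < 30); return g_pin_level[pin]; }
#endif

constexpr unsigned PIN_PROBE_CORE0 = 2;
constexpr unsigned PIN_PROBE_CORE1 = 3;
constexpr unsigned PIN_PROBE_DMA   = 21;

// Configure the three probe pins as outputs, initially low.
inline void probe_init() {
    const unsigned pins[] = {PIN_PROBE_CORE0, PIN_PROBE_CORE1, PIN_PROBE_DMA};
    for (unsigned pin : pins) {
        gpio_init(pin);
        gpio_set_dir(pin, GPIO_OUT);
        gpio_put(pin, 0);
    }
}

// Raise or lower a probe pin; the scope shows the time spent between the two.
inline void probe_on(unsigned pin)  { gpio_put(pin, 1); }
inline void probe_off(unsigned pin) { gpio_put(pin, 0); }
```

The duty cycle of each trace on the scope then approximates how busy the bracketed section is.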

Keith_Dobbelaere · March 24, 2026, 11:09pm

I had my daughter playing while I watched the scope, and when entering denser regions, Core1 does become 100% utilized. So there is room to improve!
