Compiling custom keyboard firmware

Every time I bump the keyboard and it gets disconnected and I have to pop the front off to do the plug/unplug dance, I play a little more with getting my custom firmware onto the keyboard. (I use acme, so I need a proper middle-mouse button, so I am using v0.1, but I want some of the bug fixes from newer versions, like yatli’s fix: Keyboard stuck in Bootloader mode - #2 by yatli. Also, there are some things I want to play with; hacking the keyboard firmware is fun!)

So I have gotten the Arduino SDK installed, I have gotten my changes in, I think it should work, I know how to get .bin files loaded (I use the programmer from the repo rather than trying to get the Arduino SDK to do it; it would be nice not to have to use the Arduino SDK), and now I’m stuck at the last piece:

In file included from /tmp/arduino_build_81750/sketch/devterm.h:5:0,
                 from /tmp/arduino_build_81750/sketch/keyboard.h:9,
                 from DevTerm/Code/devterm_keyboard/devterm_keyboard.ino:1:
state.h:1:26: fatal error: USBHID_Types.h: No such file or directory
 #include <USBHID_Types.h>
                          ^
compilation terminated.
Alternatives for USBHID_Types.h:
ResolveLibrary(USBHID_Types.h)
  → candidates:
exit status 1
USBHID_Types.h: No such file or directory

I think this is probably a missing library, something I didn’t click, something I wasn’t supposed to click but did, something I didn’t install; I do not know, though.

I’ve got the settings correct, I believe, or at least what I have matches the wiki ( Compile keyboard bootloader and firmware · clockworkpi/DevTerm Wiki · GitHub ):

settings

I’m sure I’m doing something trivially wrong (or at least I hope the solution is simple and I have zero confidence that I understand what is going on and I really wish I was dealing with a Makefile).


I jostled my DevTerm and the keyboard went into the busy-wait loop again. This ate another 15 minutes; I needed a break anyway.

TL;DR: 0.1 keyboard firmware with a delay added to fix the busy-wait race condition: https://media.freespeechextremist.com/rvl/full/22f4e1e35c95803479306abf0fa1e0170d99697cd93308bac5a981394ca10bfd?name=devterm_keyboard.bin . This will be useful to basically no one but people that also use acme on the DevTerm and also can’t get the firmware to compile. That may be just me.

The remainder is an explanation because it was an adventure. Gory details follow. Most hackers’ blood runs cold if they hear a sentence that starts with “Can’t you just⋯” but it’s a little harder to recognize when you are doing it to yourself with “We could just⋯” or “Hey, I’ll just⋯”.

The mouse support on the 0.3 firmware is much nicer and I’d really like to be able to use it, but it changes the behavior of middle-click (it simulates a scroll wheel if you move the mouse while holding it; I use acme and this breaks acme). So I’ve got a .ino with that behavior fixed, but I cannot get it to compile. (Same versions of everything listed in the README, etc.; the balky Arduino IDE refuses to compile the code, I blow away ~/.arduino15 and start from scratch and grab the .) I really like the uConsole and the PicoCalc is fun so far, but I still use the DevTerm literally every day. I still absolutely love this machine.

void setup() {
  USBComposite.setManufacturerString("ClockworkPI");
  USBComposite.setProductString("DevTerm");
  USBComposite.setSerialString(SER_NUM_STR);
  
  dev_term.Keyboard = new HIDKeyboard(HID);
  dev_term.Joystick = new HIDJoystick(HID);
  dev_term.Mouse    = new HIDMouse(HID);
  dev_term.Consumer = new HIDConsumer(HID);

  dev_term.Keyboard->setAdjustForHostCapsLock(false);

  dev_term.state = new State();

  dev_term.Keyboard_state.layer = 0;
  dev_term.Keyboard_state.prev_layer = 0;
  dev_term.Keyboard_state.fn_on = 0;
  dev_term.Keyboard_state.shift = 0;
  
  dev_term._Serial = new  USBCompositeSerial;
  
  HID.begin(*dev_term._Serial,reportDescription, sizeof(reportDescription));

  while(!USBComposite); // All we need to do is add delay(10); here.

  keyboard_init(&dev_term);
  keys_init(&dev_term);
  trackball_init(&dev_term);
  
  dev_term._Serial->println("setup done");

  pinMode(PD2,INPUT);// switch 2 in back 
  
  delay(1000);
}

So, I thought, it’s one call: I should just patch the binary! It’s easy-mode for reverse-engineering because the source is right there: this is essentially open-book! This turns out to have been more of an adventure than intended. (But still less pain than arguing with the Arduino SDK.)

The .bin files in the repo (which, with a small amount of effort, you can extract from the shell scripts in Code/devterm_keyboard/bin in the clockworkpi/DevTerm repo on GitHub) are, of course, without symbols. So, xxd! Easy! I’ll just find the strings in the binary. ARM uses constant pools, so the strings are probably right next to the code! Not too many strings in there, but you can see “ClockworkPI” and “DevTerm” and “setup done” starting at 0xa8d4, and a few other string constants before that. Since stmduino puts the firmware at 0x8000000, this means that the address in memory is the file offset plus 0x8000000, so those strings start at 0x800a8d4. Also, it turns out that stmduino is Cortex-M3 and Thumb-only, and Thumb is way less like ARM than I expected. I’ve never written Thumb by hand; just ARM. This turns out to be relevant.
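
The offset-to-address step above can be sketched in a few lines of Ruby. This is only an illustration: strings_with_addresses is a made-up helper, and it assumes the .bin is a raw image whose first byte lands at 0x08000000.

```ruby
# Return [file_offset, flash_address, string] for each run of at least
# min_len printable ASCII bytes in a binary blob.
# Assumption: the blob is a raw image loaded at `base`.
def strings_with_addresses(blob, min_len: 5, base: 0x0800_0000)
  found = []
  blob.b.scan(/[\x20-\x7e]{#{min_len},}/n) do |s|
    offset = Regexp.last_match.begin(0) # byte offset, since the copy is binary
    found << [offset, base + offset, s]
  end
  found
end

# Only runs when a file name is given on the command line.
if ARGV[0]
  strings_with_addresses(File.binread(ARGV[0])).each do |off, addr, s|
    puts format('%06x  %08x  %s', off, addr, s)
  end
end
```

Run against the firmware image, this prints each string with its file offset and the address it would have in flash, which is the number to look for in constant pools.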

This will save you some trouble if you want to dump firmware for stmduino devices:

arm-none-eabi-objdump -b binary -m arm -M force-thumb -D --adjust-vma=0x08000000 devterm_keyboard.ino.bin

ARM vs. Thumb is interesting. I fell in love with the ISA back when the Game Boy Advance (ARM7TDMI) was current-gen; I was somewhat late to the game, as the Acorn RISC Machine desktops first shipped around the time of the 80286. (There is a really fascinating series of articles at A history of ARM, part 1: Building the first chip - Ars Technica.) I learned ARM assembly from a very opinionated site from a devoted ARM fan, ARM ASSEMBLER PROGRAMMING; tutorial, resources, and examples (which was, back then, heyrick.co.uk, if you have the urge to inspect it in the Wayback Machine). The assembly language was super expressive and wonderful and I was seduced and largely ignored Thumb. Every instruction is the same size, one 32-bit word, so unlike x86, where you start disassembling at the wrong offset and you get some plausible-looking gibberish, you’re always aligned when disassembling ARM. You could also do a lot of fun things, like immediate values in registers could be shifted and then you could do a relative load with an offset, and this was how you usually interacted with I/O registers. You could keep registers around for this: you’d set one to the video memory or wherever and do some loads and stores. It turns out that if you’re restricted to Thumb-only, immediate values are so scarce that you chase pointers to pointers to pointers just to get a 32-bit number into a 32-bit register.

So, what you do for convenience and instruction-bumming in ARM mode becomes a constant necessity in Thumb mode: the compiler holds onto those registers for dear life. Consequently, although I was looking for “setup done”, and figured it would be straightforward to find its address in a constant pool, it was loaded relative to r6, which was set up approximately 292 bytes before that string was used, and I spent some time not realizing this.

 8000b36:	4e58      	ldr	r6, [pc, #352]	@ (0x8000c98)
 ⋯
 // dev_term._Serial->println("setup done");
 8000c5a:	6970      	ldr	r0, [r6, #20]
 8000c5c:	4915      	ldr	r1, [pc, #84]	@ (0x8000cb4)
 8000c5e:	f003 fc68 	bl	0x8004532

(If you haven’t read much disassembly: on the left is the address, then the encoding of the 16-bit Thumb instruction, with a little space for double-wide instructions, then the assembly mnemonic. The addresses after the @s are provided by objdump, but the mapping to lines in the code is my own annotation. If you know ARM, it’s probably pretty legible, but there are gotchas.)
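
As a rough sketch of where those @ addresses come from: for the 16-bit ldr rX, [pc, #imm] form, the PC reads as the instruction's address plus 4, rounded down to a multiple of 4, and the immediate is added to that. literal_address is a hypothetical helper name, not anything from objdump.

```ruby
# Address that a 16-bit Thumb "ldr rX, [pc, #imm]" reads from.
# In Thumb state the PC reads as the instruction address plus 4, and this
# instruction rounds it down to a multiple of 4 before adding imm.
def literal_address(insn_addr, imm)
  ((insn_addr + 4) & ~3) + imm
end

# The two loads from the listing above.
[[0x8000b36, 352], [0x8000c5c, 84]].each do |addr, imm|
  puts format('%07x: ldr [pc, #%d] -> 0x%07x', addr, imm, literal_address(addr, imm))
end
```

The rounding is one of the gotchas: the load at 8000b36 sits on a half-word boundary, so its base is 8000b38 rather than 8000b3a.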

One of the reasons I like to deal with acme is more or less the same reason people like Jupyter notebooks: it’s really easy to intersperse code and annotations, you run a shell and you can make notes. So I was already sitting with one foot in Plan 9 and one in Linux and was pretty sure there was a hex editor for Plan 9 (never enjoyed xxd -r). Cloning and installing vexed(1) was the easiest part of this entire process: shithub: vexed .

Eventually, after spending a lot of time blindly stumbling through the objdump output, I grepped the disassembly for every occurrence of #1000 and found what I was looking for:

 // delay(1000)
 8000c6a:	f44f 707a 	mov.w	r0, #1000	@ 0x3e8
 8000c6e:	f003 fd91 	bl	0x8004794

The bit that made it stand out was that you can see that the pointer dev_term is used in three functions a few lines above the delay, for keyboard_init, keys_init, and trackball_init.

After some careful reading backwards, I annotated enough of the function to figure out what I had to work with:

 // while(!USBComposite);
*8000c36:	4b1e      	ldr	r3, [pc, #120]	@ (0x8000cb0)
 8000c38:	f897 1074 	ldrb.w	r1, [r7, #116]	@ 0x74
 8000c3c:	681a      	ldr	r2, [r3, #0]
*8000c3e:	b901      	cbnz	r1, 0x8000c42
*8000c40:	e7fe      	b.n	0x8000c40
 8000c42:	7b13      	ldrb	r3, [r2, #12]
 8000c44:	2b05      	cmp	r3, #5
*8000c46:	d1fa      	bne.n	0x8000c3e
 // keyboard_init(&dev_term);
*8000c48:	4813      	ldr	r0, [pc, #76]	@ (0x8000c98)
*8000c4a:	f7ff fba5 	bl	0x8000398
 // keys_init(&dev_term);
*8000c4e:	4812      	ldr	r0, [pc, #72]	@ (0x8000c98)
*8000c50:	f7ff fe9a 	bl	0x8000988
 // trackball_init(&dev_term);
*8000c54:	4810      	ldr	r0, [pc, #64]	@ (0x8000c98)
*8000c56:	f7ff ff1b 	bl	0x8000a90
 // dev_term._Serial->println("setup done");
 8000c5a:	6970      	ldr	r0, [r6, #20]
*8000c5c:	4915      	ldr	r1, [pc, #84]	@ (0x8000cb4)
*8000c5e:	f003 fc68 	bl	0x8004532
 // pinMode(PD2,INPUT)
 8000c62:	2102      	movs	r1, #2
 8000c64:	2030      	movs	r0, #48	@ 0x30
*8000c66:	f003 fc83 	bl	0x8004570
 // delay(1000)
 8000c6a:	f44f 707a 	mov.w	r0, #1000	@ 0x3e8
*8000c6e:	f003 fd91 	bl	0x8004794
 // return
 8000c72:	b00c      	add	sp, #48	@ 0x30
 8000c74:	e8bd 81f0 	ldmia.w	sp!, {r4, r5, r6, r7, r8, pc}

The return was implicit, but have a look at which registers it’s restoring. Incidentally, although the .ino file may have initially looked like friendly C code, there was something more sinister lurking: it’s actually C++. I spent some time scratching my head: the beginning of setup() calls, for example, USBComposite.setManufacturerString("ClockworkPI");, but then there’s while(!USBComposite). It may look like we’re just waiting for a variable to stop being null but that’s actually calling something like explicit operator bool() const somewhere.

Most of the code is pretty straightforward. Entirely too straightforward: -Os is pretty good at Thumb, I guess. To do a delay(10), we’ll need at least six bytes (two for a movs to set r0, four for the bl) and there doesn’t appear to be anywhere to put them. No space in the constant pool. If we’re patching in-place, we can’t just insert it, because that’ll screw up all of the offsets everywhere, and if you look at that series of instructions a little closer, you’ll notice that of the 24 instructions where we’re operating, 14 of them (marked with asterisks) are PC-relative: not just the PC-relative loads from the constant pool, but all of the branches, too. Look at, for example, 8000c3e, which is a conditional relative jump (cbnz, “compare and branch if non-zero”). The code looks as simple as we can make it, as compact as it could be, and extremely sensitive to being moved. It is probably satisfying if you’re the author of the compiler, but if you’re trying to find a place to cram some new instructions, it’s nightmarish.

I spent some time trying to figure out 8000c40, which is just an endless loop: it jumps to 8000c40. The instruction right before it jumps over it unless r1==0, and I don’t know what that actually signifies, so why it would choose that time to lock up is a mystery to me. I spent an inordinate amount of time thinking “Maybe I could write some cleverly compressed code, shove it onto the stack, and execute it there”.

A sacrifice has to be made, clearly. Thankfully, we have dev_term._Serial->println("setup done"); and that’s debugging output. If we take that out, we get eight extra bytes, more than the six we need. Then we could just move everything down. So, the first step, we make some space:

 // while(!USBComposite);
 8000c36:	4b1e      	ldr	r3, [pc, #120]	@ (0x8000cb0)
 8000c38:	f897 1074 	ldrb.w	r1, [r7, #116]	@ 0x74
 8000c3c:	681a      	ldr	r2, [r3, #0]
 8000c3e:	b901      	cbnz	r1, 0x8000c42
 8000c40:	e7fe      	b.n	0x8000c40
 8000c42:	7b13      	ldrb	r3, [r2, #12]
 8000c44:	2b05      	cmp	r3, #5
 8000c46:	d1fa      	bne.n	0x8000c3e
 8000c48:	bf00      	nop
 8000c4a:	bf00      	nop
 8000c4c:	bf00      	nop
 8000c4e:	bf00      	nop
 // keyboard_init(&dev_term);
 8000c50:	4811      	ldr	r0, [pc, #68]	@ (0x8000c98)
 8000c52:	f7ff fba1 	bl	0x8000398
 // keys_init(&dev_term);
 8000c56:	4810      	ldr	r0, [pc, #64]	@ (0x8000c98)
 8000c58:	f7ff fe96 	bl	0x8000988
 // trackball_init(&dev_term);
 8000c5c:	480e      	ldr	r0, [pc, #56]	@ (0x8000c98)
 8000c5e:	f7ff ff17 	bl	0x8000a90
 // pinMode(PD2,INPUT)
 8000c62:	2102      	movs	r1, #2
 8000c64:	2030      	movs	r0, #48	@ 0x30
 8000c66:	f003 fc83 	bl	0x8004570
 // delay(1000)
 8000c6a:	f44f 707a 	mov.w	r0, #1000	@ 0x3e8
 8000c6e:	f003 fd91 	bl	0x8004794
 // return
 8000c72:	b00c      	add	sp, #48	@ 0x30
 8000c74:	e8bd 81f0 	ldmia.w	sp!, {r4, r5, r6, r7, r8, pc}

Eight bytes never felt so luxurious. Of course, it was easy enough to adjust the PC-relative ldrs, but the branches were and are a huge pain. The offsets for the ldrs were easy enough: those instructions all moved down eight bytes, so subtract eight from the offset. For the encoding (which matters if you’re going to be making the change with a hex editor), the offset is always a multiple of four, so Thumb doesn’t encode the two lowest bits, and the offset is at the end of the instruction, so you just subtract 2: 4813 becomes 4811. For the bl, though, the encoding is a mess. You have 24 immediate bits (sign plus 23 bits), counted in half-words, offset from the PC and split up across the two half-words, and two of them are XOR’d with the sign and inverted, and I do not know why. They are corrected in the above, but at this stage, I actually just wrote #WRONG next to them until the end.

# Some Ruby that does the bl calculation:
def bl_calc bl_addr, target_addr
	pc = bl_addr + 4 # in Thumb state the PC reads as the instruction address + 4
	off = (target_addr - pc) >> 1 # signed offset in half-words
	imm11 = off & 0x7FF # bits[10:0]
	imm10 = (off >> 11) & 0x3FF # bits[20:11]
	i2 = (off >> 21) & 1 # bit 21
	i1 = (off >> 22) & 1 # bit 22
	s = (off >> 23) & 1 # bit 23, the sign
	j1 = (~(i1 ^ s)) & 1
	j2 = (~(i2 ^ s)) & 1
	hi = (0b11110 << 11) | (s << 10) | imm10
	lo = (0b11 << 14) | (j1 << 13) | (1 << 12) | (j2 << 11) | imm11
	'%04x %04x' % [hi, lo]
end
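
One way to double-check relocated instructions of both kinds is to go the other way: recompute the ldr encoding from the old and new addresses, and decode a bl back into its target. The sketch below does that; move_ldr_literal and bl_target are hypothetical helper names, and the bit layout is the Thumb-2 bl encoding as I understand it, so treat it as a cross-check rather than a reference.

```ruby
# Re-encode a 16-bit Thumb "ldr rX, [pc, #imm]" (0x4800 | rt << 8 | imm8)
# after moving it from old_addr to new_addr, keeping the same literal.
def move_ldr_literal(encoding, old_addr, new_addr)
  raise ArgumentError, 'not an ldr-literal encoding' unless (encoding & 0xF800) == 0x4800
  target = ((old_addr + 4) & ~3) + (encoding & 0xFF) * 4 # where the literal lives
  new_imm = target - ((new_addr + 4) & ~3)               # byte offset from the new spot
  unless new_imm.between?(0, 1020) && (new_imm % 4).zero?
    raise ArgumentError, 'literal out of range'
  end
  (encoding & 0xFF00) | (new_imm / 4)
end

# Decode a bl (two half-words, as objdump prints them) to its target.
def bl_target(bl_addr, hi, lo)
  unless (hi & 0xF800) == 0xF000 && (lo & 0xD000) == 0xD000
    raise ArgumentError, 'not a bl encoding'
  end
  s     = (hi >> 10) & 1
  imm10 = hi & 0x3FF
  j1    = (lo >> 13) & 1
  j2    = (lo >> 11) & 1
  imm11 = lo & 0x7FF
  i1 = (~(j1 ^ s)) & 1
  i2 = (~(j2 ^ s)) & 1
  offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
  offset -= (1 << 25) if s == 1 # sign-extend the 25-bit byte offset
  bl_addr + 4 + offset
end

puts format('%04x', move_ldr_literal(0x4813, 0x8000c48, 0x8000c50)) # 4811
puts format('%07x', bl_target(0x8000c52, 0xf7ff, 0xfba1))           # 8000398
```

If a moved bl still decodes to the address it pointed at before the move, the #WRONG next to it can come off.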

So, where’s the semicolon in that while loop? That is, where can we safely insert the call? That’s complicated, but in a somewhat less tedious way than the encoding of bl.

First it loads a constant from the pool into r3: 8000cb0 has 38 06 00 20, which is little-endian, so the address is 20000638. On a Cortex-M3, addresses from 0x20000000 up are SRAM (the peripheral registers start at 0x40000000), so that’s probably a variable in RAM holding a pointer to the USB state: the loop loads that pointer into r2 and then checks the byte at offset 12 from it to see if the USB is ready. So we probably need to make sure that the loop reads from wherever that is. r1 and r2 are both caller-saved registers, which means that if we call delay(10), we’ll need to restore them. For r1, we can just make sure it’s reset every time by making the bne.n point back up to 8000c38, which loads r1 from wherever r7 is pointed. r2 is a bigger problem, but since r4 is callee-saved and none of the instructions below us use it until it gets restored at 8000c74, we can just replace r2 with r4. Instead of figuring out how that’s encoded, since it’s a big file, we just find an occurrence of ldr r4, [r3, #0] elsewhere (8001196 has one). I didn’t find any other occurrences of ldrb r3, [r4, #12], but it was easy to find the bit by being lazy and xoring the encodings for ldrb r3, [r2], ldrb r3, [r4], and ldrb r3, [r2, #12]: 0x7823^(0x7813^0x7b13), and that yields 7b23. For the bne.n, I had irb open and just read the disassembly file, looked for other bne.n occurrences, then subtracted the destination address from the address of the instruction and looked for 0x16: d1f3.
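
The two encodings worked out above can also be computed directly instead of hunted for. A minimal sketch, assuming the usual 16-bit Thumb conditional-branch layout (0xD000, a 4-bit condition, and a signed 8-bit count of half-words from the instruction address plus 4); bcond and COND are illustrative names.

```ruby
# 16-bit Thumb conditional branch: 0xD000 | cond << 8 | imm8, where imm8
# is a signed count of half-words from the instruction address plus 4.
COND = { eq: 0x0, ne: 0x1 }.freeze

def bcond(cond, insn_addr, target_addr)
  offset = target_addr - (insn_addr + 4)
  raise ArgumentError, 'target must be half-word aligned' if offset.odd?
  raise ArgumentError, 'out of range for a 16-bit branch' unless offset.between?(-256, 254)
  0xD000 | (COND.fetch(cond) << 8) | ((offset >> 1) & 0xFF)
end

# Swapping the base register by XOR, as in the text: the difference
# between "no offset" and "#12" for r2 is applied to the r4 form.
ldrb_r3_r4_12 = 0x7823 ^ (0x7813 ^ 0x7b13)

puts format('%04x', ldrb_r3_r4_12)                    # 7b23
puts format('%04x', bcond(:ne, 0x8000c4e, 0x8000c38)) # d1f3
puts format('%04x', bcond(:ne, 0x8000c46, 0x8000c3e)) # d1fa, the original
```

For comparison, bcond(:ne, 0x8000c4e, 0x8000c36) gives d1f2, a branch back to the ldr r3 at 8000c36. That variant may be worth considering, since the ldrb at 8000c4a overwrites r3 and the ldr r4, [r3, #0] at 8000c3c is inside the loop once the branch targets 8000c38; it is untested here.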

Finally, we have the code:

 // while(!USBComposite);
 8000c36:	4b1e      	ldr	r3, [pc, #120]	@ (0x8000cb0 = 0x20000638)
 8000c38:	f897 1074 	ldrb.w	r1, [r7, #116]	@ 0x74
 8000c3c:	681c      	ldr	r4, [r3, #0]
 8000c3e:	b901      	cbnz	r1, 0x8000c42
 8000c40:	e7fe      	b.n	0x8000c40
 8000c42:	bf00      	nop
 8000c44:	200a      	movs	r0, #10
 8000c46:	f003 fda5 	bl	0x8004794
 8000c4a:	7b23      	ldrb	r3, [r4, #12]
 8000c4c:	2b05      	cmp	r3, #5
 8000c4e:	d1f3      	bne.n	0x8000c38
 // keyboard_init(&dev_term);
 8000c50:	4811      	ldr	r0, [pc, #68]	@ (0x8000c98)
 8000c52:	f7ff fba1 	bl	0x8000398
 // keys_init(&dev_term);
 8000c56:	4810      	ldr	r0, [pc, #64]	@ (0x8000c98)
 8000c58:	f7ff fe96 	bl	0x8000988
 // trackball_init(&dev_term);
 8000c5c:	480e      	ldr	r0, [pc, #56]	@ (0x8000c98)
 8000c5e:	f7ff ff17 	bl	0x8000a90
 // pinMode(PD2,INPUT)
 8000c62:	2102      	movs	r1, #2
 8000c64:	2030      	movs	r0, #48	@ 0x30
 8000c66:	f003 fc83 	bl	0x8004570
 // delay(1000)
 8000c6a:	f44f 707a 	mov.w	r0, #1000	@ 0x3e8
 8000c6e:	f003 fd91 	bl	0x8004794
 // return
 8000c72:	b00c      	add	sp, #48	@ 0x30
 8000c74:	e8bd 81f0 	ldmia.w	sp!, {r4, r5, r6, r7, r8, pc}

We end up with an extra nop but it doesn’t hurt to have an extra nop in an idle loop in an init function. The modified section summed to 54 bytes, so I tapped it into the hex editor (remember: little-endian!) and then disassembled the result and diffed that against the original assembly to see if it really did look right:

@@ -1345,21 +1345,21 @@
  8000c32:      f001 fc49       bl      0x80024c8
  8000c36:      4b1e            ldr     r3, [pc, #120]  @ (0x8000cb0)
  8000c38:      f897 1074       ldrb.w  r1, [r7, #116]  @ 0x74
- 8000c3c:      681a            ldr     r2, [r3, #0]
+ 8000c3c:      681c            ldr     r4, [r3, #0]
  8000c3e:      b901            cbnz    r1, 0x8000c42
  8000c40:      e7fe            b.n     0x8000c40
- 8000c42:      7b13            ldrb    r3, [r2, #12]
- 8000c44:      2b05            cmp     r3, #5
- 8000c46:      d1fa            bne.n   0x8000c3e
- 8000c48:      4813            ldr     r0, [pc, #76]   @ (0x8000c98)
- 8000c4a:      f7ff fba5       bl      0x8000398
- 8000c4e:      4812            ldr     r0, [pc, #72]   @ (0x8000c98)
- 8000c50:      f7ff fe9a       bl      0x8000988
- 8000c54:      4810            ldr     r0, [pc, #64]   @ (0x8000c98)
- 8000c56:      f7ff ff1b       bl      0x8000a90
- 8000c5a:      6970            ldr     r0, [r6, #20]
- 8000c5c:      4915            ldr     r1, [pc, #84]   @ (0x8000cb4)
- 8000c5e:      f003 fc68       bl      0x8004532
+ 8000c42:      bf00            nop
+ 8000c44:      200a            movs    r0, #10
+ 8000c46:      f003 fda5       bl      0x8004794
+ 8000c4a:      7b23            ldrb    r3, [r4, #12]
+ 8000c4c:      2b05            cmp     r3, #5
+ 8000c4e:      d1f3            bne.n   0x8000c38
+ 8000c50:      4811            ldr     r0, [pc, #68]   @ (0x8000c98)
+ 8000c52:      f7ff fba1       bl      0x8000398
+ 8000c56:      4810            ldr     r0, [pc, #64]   @ (0x8000c98)
+ 8000c58:      f7ff fe96       bl      0x8000988
+ 8000c5c:      480e            ldr     r0, [pc, #56]   @ (0x8000c98)
+ 8000c5e:      f7ff ff17       bl      0x8000a90
  8000c62:      2102            movs    r1, #2
  8000c64:      2030            movs    r0, #48 @ 0x30
  8000c66:      f003 fc83       bl      0x8004570
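
The "remember: little-endian!" step can be scripted rather than typed into a hex editor. A minimal sketch, assuming the image is a raw .bin loaded at 0x08000000; halfwords_to_bytes and patch are hypothetical helpers, and nothing here touches a file on its own.

```ruby
# Turn objdump-style half-words ("f003 fda5") into the bytes as they sit
# in the file: each 16-bit half-word is stored low byte first.
def halfwords_to_bytes(text)
  text.split.map { |h| Integer(h, 16) }.pack('v*')
end

# Return a patched copy of image with the given half-words written at a
# flash address. The image length never changes.
# Assumption: the image is a raw binary loaded at `base`.
def patch(image, addr, halfwords, base: 0x0800_0000)
  bytes = halfwords_to_bytes(halfwords)
  offset = addr - base
  if offset.negative? || offset + bytes.bytesize > image.bytesize
    raise ArgumentError, 'patch falls outside the image'
  end
  out = image.b # binary copy, so indexing below is by byte
  out[offset, bytes.bytesize] = bytes
  out
end

# The first four new half-words from the diff, as file bytes.
puts halfwords_to_bytes('bf00 200a f003 fda5').unpack1('H*').scan(/../).join(' ')
# 00 bf 0a 20 03 f0 a5 fd
```

Writing the result of patch(File.binread(...), 0x8000c42, '...') to a new file and disassembling it again gives the same diff-against-the-original check as above.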

This might actually come in handy, unexpectedly, as it looks like the PicoCalc uses the same CPU, and I have some dumb PicoCalc stuff to try shortly.
