I jostled my DevTerm and the keyboard went into the busy-wait loop again. This ate another 15 minutes; I needed a break anyway.
TL;DR: 0.1 keyboard firmware with a delay added to fix the busy-wait race condition: https://media.freespeechextremist.com/rvl/full/22f4e1e35c95803479306abf0fa1e0170d99697cd93308bac5a981394ca10bfd?name=devterm_keyboard.bin . This will be useful to basically no one but people that also use acme on the DevTerm and also can’t get the firmware to compile. That may be just me.
The remainder is an explanation because it was an adventure. Gory details follow. Most hackers’ blood runs cold if they hear a sentence that starts with “Can’t you just⋯” but it’s a little harder to recognize when you are doing it to yourself with “We could just⋯” or “Hey, I’ll just⋯”.
The mouse support on the 0.3 firmware is much nicer and I’d really like to be able to use it, but it changes the behavior of middle-click (simulating a scroll wheel if you move the mouse while holding it), and that breaks acme, which I use. So I’ve got a .ino with that behavior fixed but I cannot get it to compile. (Same versions of everything listed in the README, etc., balky Arduino IDE refuses to compile the code, I blow away ~/.arduino15 and start from scratch and grab the .) I really like the uConsole and the PicoCalc is fun so far, but I still use the DevTerm literally every day. I still absolutely love this machine.
void setup() {
USBComposite.setManufacturerString("ClockworkPI");
USBComposite.setProductString("DevTerm");
USBComposite.setSerialString(SER_NUM_STR);
dev_term.Keyboard = new HIDKeyboard(HID);
dev_term.Joystick = new HIDJoystick(HID);
dev_term.Mouse = new HIDMouse(HID);
dev_term.Consumer = new HIDConsumer(HID);
dev_term.Keyboard->setAdjustForHostCapsLock(false);
dev_term.state = new State();
dev_term.Keyboard_state.layer = 0;
dev_term.Keyboard_state.prev_layer = 0;
dev_term.Keyboard_state.fn_on = 0;
dev_term.Keyboard_state.shift = 0;
dev_term._Serial = new USBCompositeSerial;
HID.begin(*dev_term._Serial,reportDescription, sizeof(reportDescription));
while(!USBComposite); // All we need to do is add delay(10); here.
keyboard_init(&dev_term);
keys_init(&dev_term);
trackball_init(&dev_term);
dev_term._Serial->println("setup done");
pinMode(PD2,INPUT);// switch 2 in back
delay(1000);
}
So, I thought, it’s one call: I should just patch the binary! It’s easy-mode for reverse-engineering because the source is right there: this is essentially open-book! This turns out to have been more of an adventure than intended. (But still less pain than arguing with the Arduino SDK.)
The .bin files in the repo (which, with a small amount of effort, you can extract from the shell scripts in Code/devterm_keyboard/bin in the clockworkpi/DevTerm repo on GitHub) are, of course, without symbols. So, xxd! Easy! I’ll just find the strings in the binary. ARM uses constant pools, so the strings are probably right next to the code! Not too many strings in there, but you can see “ClockworkPI” and “DevTerm” and “setup done” starting at 0xa8d4, and a few other string constants before that. Since stmduino puts the firmware at 0x8000000, this means that the address in memory is the file offset plus 0x8000000. Also it turns out that stmduino is Cortex-M3 and Thumb-only and Thumb is way less like ARM than I expected. I’ve never written Thumb by hand; just ARM. This turns out to be relevant.
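A rough sketch of that offset arithmetic, in Ruby: find a string in the image and convert its file offset into the address the MCU sees. The helper name string_addresses and the sample bytes are made up for illustration; the only thing taken from the real firmware is the 0x08000000 base.

```ruby
FLASH_BASE = 0x0800_0000 # where the image is mapped (same value as --adjust-vma)

# Return the in-memory address of every occurrence of `needle` in `blob`.
# `blob` is the raw contents of the .bin as a binary String.
def string_addresses(blob, needle, base = FLASH_BASE)
  addrs = []
  offset = 0
  while (idx = blob.index(needle, offset))
    addrs << base + idx
    offset = idx + 1
  end
  addrs
end

# Stand-in for File.binread("devterm_keyboard.ino.bin"): 16 filler bytes
# followed by two of the strings mentioned above. This is NOT the real image.
blob = ("\x00" * 16 + "ClockworkPI\x00DevTerm\x00").b
string_addresses(blob, "DevTerm".b).each { |a| puts format("0x%08x", a) }
```

With the real image, something at file offset 0xa8d4 comes out as 0x0800a8d4 by the same addition.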
This will save you some trouble if you want to dump firmware for stmduino devices:
arm-none-eabi-objdump -b binary -m arm -M force-thumb -D --adjust-vma=0x08000000 devterm_keyboard.ino.bin
ARM vs. Thumb is interesting. I fell in love with the ISA back when the Game Boy Advance (ARM7TDMI) was current-gen; I was somewhat late to the game as the Acorn RISC Machine desktops first shipped around the time of the 80286. (There is a really fascinating series of articles at A history of ARM, part 1: Building the first chip - Ars Technica .) I learned ARM assembly from a very opinionated site from a devoted ARM fan, ARM ASSEMBLER PROGRAMMING; tutorial, resources, and examples (which was, back then, heyrick.co.uk, if you have the urge to inspect it in the Wayback Machine). The assembly language was super expressive and wonderful and I was seduced and largely ignored Thumb. Every instruction is the same size, one 32-bit word, so unlike x86 where you start disassembling at the wrong offset and you get some plausible-looking gibberish, you’re always aligned when disassembling ARM. You could also do a lot of fun things, like immediate values in registers could be shifted and then you could do a relative load with an offset and this was how you usually interacted with I/O registers. You could keep registers around for this, you’d set one to the video memory or wherever and do some loads and stores. It turns out that if you’re restricted to Thumb-only, immediate values are so scarce that you chase pointers to pointers to pointers just to get a 32-bit number into a 32-bit register.
So, what you do for convenience and instruction-bumming in ARM mode becomes a constant necessity in Thumb mode: the compiler holds onto those registers for dear life. Consequently, although I was looking for “setup done”, and figured it would be straightforward to find its address in a constant pool, it was loaded relative to r6, which was set up approximately 292 bytes before that string was used, and I spent some time not realizing this.
8000b36: 4e58 ldr r6, [pc, #352] @ (0x8000c98)
⋯
// dev_term._Serial->println("setup done");
8000c5a: 6970 ldr r0, [r6, #20]
8000c5c: 4915 ldr r1, [pc, #84] @ (0x8000cb4)
8000c5e: f003 fc68 bl 0x8004532
(If you haven’t read much disassembly, on the left is the address, then the 16-bit Thumb instruction, with a little space for double-wide instructions, then the assembly mnemonic. The addresses after the @s are provided by objdump, but the mapping to lines in the code is my own annotation. If you know ARM, it’s probably pretty legible but there are gotchas.)
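The address objdump prints after the @ for a PC-relative ldr can be reproduced by hand, which is handy when checking a patch. A minimal sketch, assuming the usual Thumb rule that the PC reads as the instruction address plus 4, rounded down to a multiple of 4 for literal loads (ldr_literal_target is a name I made up):

```ruby
# Address that a Thumb `ldr rX, [pc, #imm]` reads from.
# The PC reads as insn_addr + 4, and literal loads word-align it
# before adding the immediate.
def ldr_literal_target(insn_addr, imm)
  ((insn_addr + 4) & ~3) + imm
end

puts format("0x%x", ldr_literal_target(0x8000b36, 352)) # ldr r6, [pc, #352]
puts format("0x%x", ldr_literal_target(0x8000c5c, 84))  # ldr r1, [pc, #84]
```

The rounding is one of the gotchas: 8000b36 is not word-aligned, so the base is 8000b38 rather than 8000b3a.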
One of the reasons I like to deal with acme is more or less the same reason people like Jupyter notebooks: it’s really easy to intersperse code and annotations, you run a shell and you can make notes. So I was already sitting with one foot in Plan 9 and one in Linux and was pretty sure there was a hex editor for Plan 9 (never enjoyed xxd -r). Cloning and installing vexed(1) was the easiest part of this entire process: shithub: vexed .
Eventually, after spending a lot of time blindly stumbling through the objdump output, I grepped the disassembly for every occurrence of #1000 and found what I was looking for:
// delay(1000)
8000c6a: f44f 707a mov.w r0, #1000 @ 0x3e8
8000c6e: f003 fd91 bl 0x8004794
The bit that made it stand out was that you can see that the pointer dev_term is used in three functions a few lines above the delay, for keyboard_init, keys_init, and trackball_init.
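The same search can be done a bit more precisely than a plain grep. This is only a sketch of the idea: look for a line that loads the constant into r0 immediately followed by a bl, which is what a one-argument call like delay(1000) looks like in this listing. The function name calls_with_constant is hypothetical, and the sample lines are copied from the disassembly in this post.

```ruby
# Find "put a constant in r0, then bl" pairs in objdump output.
def calls_with_constant(disasm_lines, constant)
  pattern = /\br0, ##{constant}\b/
  disasm_lines.each_cons(2).select do |first, second|
    first.match?(pattern) && second.match?(/\bbl\b/)
  end
end

sample = [
  "8000c64: 2030 movs r0, #48 @ 0x30",
  "8000c66: f003 fc83 bl 0x8004570",
  "8000c6a: f44f 707a mov.w r0, #1000 @ 0x3e8",
  "8000c6e: f003 fd91 bl 0x8004794",
]

calls_with_constant(sample, 1000).each { |pair| puts pair }
```

On a real dump you would pass File.readlines on the objdump output instead of the sample array.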
After some careful reading backwards, I annotated enough of the function to figure out what I had to work with:
// while(!USBComposite);
*8000c36: 4b1e ldr r3, [pc, #120] @ (0x8000cb0)
8000c38: f897 1074 ldrb.w r1, [r7, #116] @ 0x74
8000c3c: 681a ldr r2, [r3, #0]
*8000c3e: b901 cbnz r1, 0x8000c42
*8000c40: e7fe b.n 0x8000c40
8000c42: 7b13 ldrb r3, [r2, #12]
8000c44: 2b05 cmp r3, #5
*8000c46: d1fa bne.n 0x8000c3e
// keyboard_init(&dev_term);
*8000c48: 4813 ldr r0, [pc, #76] @ (0x8000c98)
*8000c4a: f7ff fba5 bl 0x8000398
// keys_init(&dev_term);
*8000c4e: 4812 ldr r0, [pc, #72] @ (0x8000c98)
*8000c50: f7ff fe9a bl 0x8000988
// trackball_init(&dev_term);
*8000c54: 4810 ldr r0, [pc, #64] @ (0x8000c98)
*8000c56: f7ff ff1b bl 0x8000a90
// dev_term._Serial->println("setup done");
8000c5a: 6970 ldr r0, [r6, #20]
*8000c5c: 4915 ldr r1, [pc, #84] @ (0x8000cb4)
*8000c5e: f003 fc68 bl 0x8004532
// pinMode(PD2,INPUT)
8000c62: 2102 movs r1, #2
8000c64: 2030 movs r0, #48 @ 0x30
*8000c66: f003 fc83 bl 0x8004570
// delay(1000)
8000c6a: f44f 707a mov.w r0, #1000 @ 0x3e8
*8000c6e: f003 fd91 bl 0x8004794
// return
8000c72: b00c add sp, #48 @ 0x30
8000c74: e8bd 81f0 ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
The return was implicit, but have a look at which registers it’s restoring. Incidentally, although the .ino file may have initially looked like friendly C code, there was something more sinister lurking: it’s actually C++. I spent some time scratching my head: the beginning of setup() calls, for example, USBComposite.setManufacturerString("ClockworkPI");, but then there’s while(!USBComposite). It may look like we’re just waiting for a variable to stop being null but that’s actually calling something like explicit operator bool() const somewhere.
Most of the code is pretty straightforward. Entirely too straightforward: -Os is pretty good at Thumb, I guess. To do a delay(10), we’ll need at least six bytes (two for the movs, four for the bl) and there doesn’t appear to be anywhere to put them. No space in the constant pool. If we’re patching in-place, we can’t just insert it, because that’ll screw up all of the offsets everywhere, and if you look at that series of instructions a little closer, you’ll notice that of the 24 instructions where we’re operating, 14 of them (marked with asterisks) are PC-relative: not just the PC-relative loads from the constant pool, but also, for example, 8000c3e, which is a conditional relative jump (cbnz, “compare and branch if non-zero”), and in fact all of the branches are relative. The code looks as simple as we can make it, as compact as it could be, and extremely sensitive to being moved. It is probably satisfying if you’re the author of the compiler, but if you’re trying to find a place to cram some new instructions, it’s nightmarish.
I spent some time trying to figure out 8000c40, which is just an endless loop: it jumps to 8000c40. The instruction right before it jumps over it unless r1==0, and I don’t know what that actually signifies, so why it would choose that time to lock up is a mystery to me. I spent an inordinate amount of time thinking “Maybe I could write some cleverly compressed code, shove it onto the stack, and execute it there”.
A sacrifice has to be made, clearly. Thankfully, we have dev_term._Serial->println("setup done"); and that’s debugging output. If we take that out, we get eight extra bytes, more than the six we need. Then we could just move everything down. So, the first step, we make some space:
// while(!USBComposite);
8000c36: 4b1e ldr r3, [pc, #120] @ (0x8000cb0)
8000c38: f897 1074 ldrb.w r1, [r7, #116] @ 0x74
8000c3c: 681a ldr r2, [r3, #0]
8000c3e: b901 cbnz r1, 0x8000c42
8000c40: e7fe b.n 0x8000c40
8000c42: 7b13 ldrb r3, [r2, #12]
8000c44: 2b05 cmp r3, #5
8000c46: d1fa bne.n 0x8000c3e
8000c48: bf00 nop
8000c4a: bf00 nop
8000c4c: bf00 nop
8000c4e: bf00 nop
// keyboard_init(&dev_term);
8000c50: 4811 ldr r0, [pc, #68] @ (0x8000c98)
8000c52: f7ff fba1 bl 0x8000398
// keys_init(&dev_term);
8000c56: 4810 ldr r0, [pc, #64] @ (0x8000c98)
8000c58: f7ff fe96 bl 0x8000988
// trackball_init(&dev_term);
8000c5c: 480e ldr r0, [pc, #56] @ (0x8000c98)
8000c5e: f7ff ff17 bl 0x8000a90
// pinMode(PD2,INPUT)
8000c62: 2102 movs r1, #2
8000c64: 2030 movs r0, #48 @ 0x30
8000c66: f003 fc83 bl 0x8004570
// delay(1000)
8000c6a: f44f 707a mov.w r0, #1000 @ 0x3e8
8000c6e: f003 fd91 bl 0x8004794
// return
8000c72: b00c add sp, #48 @ 0x30
8000c74: e8bd 81f0 ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
Eight bytes never felt so luxurious. Of course, it was easy enough to adjust the PC-relative ldrs but the branches were and are a huge pain. The offsets for the ldrs were easy enough: those instructions all moved down eight bytes, so subtract eight from the offset. For the encoding (which matters if you’re going to be making the change with a hex editor), ldr is always word-aligned so Thumb doesn’t encode the two lowest bits, and the offset is at the end of the instruction, so you just subtract 2: 4813 becomes 4811. For the bl, though, the encoding is a mess. You have 24 immediate bits (sign plus 23 bits) offset from the PC, split up across the two half-words and shifted, and two of them are XOR’d with the sign and I do not know why. They are corrected in the above, but at this stage, I actually just wrote #WRONG next to them until the end.
# Some Ruby that does the bl calculation:
def bl_calc bl_addr, target_addr
  p = (bl_addr + 4) # PC as seen by the bl
  imm32 = (target_addr - p) >> 1 # signed offset in halfwords
  imm11 = imm32 & 0x7FF # bits[10:0]
  imm10 = (imm32 >> 11) & 0x3FF # bits[20:11]
  i2 = (imm32 >> 21) & 1 # bit 21
  i1 = (imm32 >> 22) & 1 # bit 22
  s = (imm32 >> 23) & 1 # sign bit
  j1 = (~(i1 ^ s)) & 1
  j2 = (~(i2 ^ s)) & 1
  hi = (0b11110 << 11) | (s << 10) | imm10
  lo = (0b11 << 14) | (j1 << 13) | (1 << 12) | (j2 << 11) | imm11
  '%04x %04x' % [hi, lo]
end
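Going the other direction is a useful sanity check when you have #WRONG scribbled everywhere. Below is a sketch of two helpers, both with names I made up: shift_ldr_literal applies the "subtract 2" rule to a literal ldr that has moved, and bl_target decodes a two-halfword bl back into its destination so the hand-edited bytes can be checked against the address you meant. The field layout is my reading of the Thumb-2 bl encoding (S, imm10 in the first halfword; J1, J2, imm11 in the second).

```ruby
# Re-encode a Thumb `ldr rX, [pc, #imm]` (01001 Rt imm8) after the
# instruction has moved `bytes_moved` bytes toward higher addresses.
# imm8 counts words, so moving down 8 bytes subtracts 2. Only multiples
# of 4 are handled, so the word-alignment of the PC does not change.
def shift_ldr_literal(encoding, bytes_moved)
  raise ArgumentError, "not a literal ldr" unless (encoding >> 11) == 0b01001
  raise ArgumentError, "move must be a multiple of 4" unless (bytes_moved % 4).zero?
  imm8 = (encoding & 0xFF) - bytes_moved / 4
  raise ArgumentError, "offset out of range" unless (0..255).cover?(imm8)
  (encoding & 0xFF00) | imm8
end

# Decode a Thumb-2 `bl` (two halfwords) back to its target address.
def bl_target(insn_addr, hi, lo)
  s     = (hi >> 10) & 1
  imm10 = hi & 0x3FF
  j1    = (lo >> 13) & 1
  j2    = (lo >> 11) & 1
  imm11 = lo & 0x7FF
  i1 = (~(j1 ^ s)) & 1
  i2 = (~(j2 ^ s)) & 1
  offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
  offset -= (1 << 25) if s == 1 # sign-extend the 25-bit value
  insn_addr + 4 + offset
end

puts format("%04x", shift_ldr_literal(0x4813, 8))            # moved ldr r0
puts format("0x%x", bl_target(0x8000c4a, 0xf7ff, 0xfba5))    # original bl
puts format("0x%x", bl_target(0x8000c52, 0xf7ff, 0xfba1))    # same bl, moved
```

If the last two lines print the same address, the moved bl still calls the same function.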
So, where’s the semicolon in that while loop? That is, where can we safely insert the call? That’s complicated, but in a somewhat less tedious way than the encoding of bl.
First it loads a constant from the pool into r3: 8000cb0 has 38 06 00 20, which is little-endian, so the address is 20000638. On these chips that’s in SRAM (which starts at 20000000) rather than a memory-mapped I/O register, so it’s probably a pointer to wherever the USB code keeps its state, and that’s what the loop checks to see if the USB is ready. So we probably need to make sure that the loop reads from wherever that is. r1 and r2 are both caller-saved registers, which means that if we call delay(10), we’ll need to restore them. For r1, we can just make sure it’s reset every time by making the bne.n point back up to 8000c38, which loads r1 from wherever r7 is pointed. r2 is a bigger problem, but since r4 is callee-saved and none of the instructions below us use it until it gets restored at 8000c74, we can just replace r2 with r4. Instead of figuring out how that’s encoded, since it’s a big file, we just find an occurrence of ldr r4, [r3, #0] elsewhere (8001196 has one). I didn’t find any other occurrences of ldrb r3, [r4, #12], but it was easy to find the bits by being lazy and xoring the encodings for ldrb r3, [r2], ldrb r3, [r4], and ldrb r3, [r2, #12]: 0x7823^(0x7813^0x7b13), and that yields 7b23. For the bne.n, I had irb open and just read the disassembly file, looked for other bne.n occurrences, then subtracted the destination address from the address of the instruction and looked for 0x16: d1f3.
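Both of those lazy lookups can also be computed directly. A sketch, assuming the 16-bit conditional branch layout 1101 cond imm8 with the target at instruction address + 4 + 2 × signed imm8; bcond_encode and the NE constant are names I made up.

```ruby
# Encode a 16-bit Thumb conditional branch (1101 cond imm8).
# Condition 0b0001 is "ne".
def bcond_encode(cond, insn_addr, target)
  delta = target - (insn_addr + 4)
  raise ArgumentError, "target must be halfword-aligned" if delta.odd?
  imm8 = delta / 2
  raise ArgumentError, "branch out of range" unless (-128..127).cover?(imm8)
  0xD000 | (cond << 8) | (imm8 & 0xFF)
end

NE = 0b0001

# The XOR trick: isolate the bits that differ between [r2] and [r2, #12],
# then flip the same bits in the [r4] encoding.
offset_bits = 0x7813 ^ 0x7b13
puts format("%04x", 0x7823 ^ offset_bits)                    # ldrb r3, [r4, #12]
puts format("%04x", bcond_encode(NE, 0x8000c4e, 0x8000c38))  # the new bne.n
puts format("%04x", bcond_encode(NE, 0x8000c46, 0x8000c3e))  # the original bne.n
```

The XOR works because the register and offset fields occupy separate bits of the encoding, so changing one does not disturb the other.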
Finally, we have the code:
// while(!USBComposite);
8000c36: 4b1e ldr r3, [pc, #120] @ (0x8000cb0 = 0x20000638)
8000c38: f897 1074 ldrb.w r1, [r7, #116] @ 0x74
8000c3c: 681c ldr r4, [r3, #0]
8000c3e: b901 cbnz r1, 0x8000c42
8000c40: e7fe b.n 0x8000c40
8000c42: bf00 nop
8000c44: 200a movs r0, #10
8000c46: f003 fda5 bl 0x8004794
8000c4a: 7b23 ldrb r3, [r4, #12]
8000c4c: 2b05 cmp r3, #5
8000c4e: d1f3 bne.n 0x8000c38
// keyboard_init(&dev_term);
8000c50: 4811 ldr r0, [pc, #68] @ (0x8000c98)
8000c52: f7ff fba1 bl 0x8000398
// keys_init(&dev_term);
8000c56: 4810 ldr r0, [pc, #64] @ (0x8000c98)
8000c58: f7ff fe96 bl 0x8000988
// trackball_init(&dev_term);
8000c5c: 480e ldr r0, [pc, #56] @ (0x8000c98)
8000c5e: f7ff ff17 bl 0x8000a90
// pinMode(PD2,INPUT)
8000c62: 2102 movs r1, #2
8000c64: 2030 movs r0, #48 @ 0x30
8000c66: f003 fc83 bl 0x8004570
// delay(1000)
8000c6a: f44f 707a mov.w r0, #1000 @ 0x3e8
8000c6e: f003 fd91 bl 0x8004794
// return
8000c72: b00c add sp, #48 @ 0x30
8000c74: e8bd 81f0 ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
We end up with an extra nop but it doesn’t hurt to have an extra nop in an idle loop in an init function. The modified section summed to 54 bytes, so I tapped it into the hex editor (remember: little-endian!) and then disassembled the result and diffed that against the original assembly to see if it really did look right:
@@ -1345,21 +1345,21 @@
8000c32: f001 fc49 bl 0x80024c8
8000c36: 4b1e ldr r3, [pc, #120] @ (0x8000cb0)
8000c38: f897 1074 ldrb.w r1, [r7, #116] @ 0x74
- 8000c3c: 681a ldr r2, [r3, #0]
+ 8000c3c: 681c ldr r4, [r3, #0]
8000c3e: b901 cbnz r1, 0x8000c42
8000c40: e7fe b.n 0x8000c40
- 8000c42: 7b13 ldrb r3, [r2, #12]
- 8000c44: 2b05 cmp r3, #5
- 8000c46: d1fa bne.n 0x8000c3e
- 8000c48: 4813 ldr r0, [pc, #76] @ (0x8000c98)
- 8000c4a: f7ff fba5 bl 0x8000398
- 8000c4e: 4812 ldr r0, [pc, #72] @ (0x8000c98)
- 8000c50: f7ff fe9a bl 0x8000988
- 8000c54: 4810 ldr r0, [pc, #64] @ (0x8000c98)
- 8000c56: f7ff ff1b bl 0x8000a90
- 8000c5a: 6970 ldr r0, [r6, #20]
- 8000c5c: 4915 ldr r1, [pc, #84] @ (0x8000cb4)
- 8000c5e: f003 fc68 bl 0x8004532
+ 8000c42: bf00 nop
+ 8000c44: 200a movs r0, #10
+ 8000c46: f003 fda5 bl 0x8004794
+ 8000c4a: 7b23 ldrb r3, [r4, #12]
+ 8000c4c: 2b05 cmp r3, #5
+ 8000c4e: d1f3 bne.n 0x8000c38
+ 8000c50: 4811 ldr r0, [pc, #68] @ (0x8000c98)
+ 8000c52: f7ff fba1 bl 0x8000398
+ 8000c56: 4810 ldr r0, [pc, #64] @ (0x8000c98)
+ 8000c58: f7ff fe96 bl 0x8000988
+ 8000c5c: 480e ldr r0, [pc, #56] @ (0x8000c98)
+ 8000c5e: f7ff ff17 bl 0x8000a90
8000c62: 2102 movs r1, #2
8000c64: 2030 movs r0, #48 @ 0x30
8000c66: f003 fc83 bl 0x8004570
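For anyone who would rather not type bytes into a hex editor, here is a sketch of doing the same edit programmatically. It assumes the image is loaded at 0x08000000 and that each 16-bit halfword from the listing is stored little-endian in the file; halfwords_to_bytes and patch are hypothetical helper names, and the 8-byte "image" below is a toy, not the firmware.

```ruby
IMAGE_BASE = 0x0800_0000 # address of the first byte of the .bin

# Turn 16-bit Thumb halfwords (as printed by objdump) into the
# little-endian byte string that actually sits in the file.
def halfwords_to_bytes(halfwords)
  halfwords.pack("v*") # "v" = 16-bit unsigned, little-endian
end

# Return a patched copy of a binary image string; the input is not modified.
def patch(image, addr, halfwords, base = IMAGE_BASE)
  bytes = halfwords_to_bytes(halfwords)
  patched = image.dup
  patched[addr - base, bytes.bytesize] = bytes
  patched
end

# Toy example: drop "bl 0x8004794" (f003 fda5) two bytes into a blank image.
image = ("\x00" * 8).b
patched = patch(image, IMAGE_BASE + 2, [0xf003, 0xfda5])
puts patched.unpack1("H*")
```

Note how f003 fda5 lands in the file as 03 f0 a5 fd, which is the "remember: little-endian!" part.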