I was recently searching my hard drive for projects that highlight my love of low-level programming, and the one I present here is a good example.
THUMB is an instruction set that was added to some ARM processors to reduce code size, which matters because ARM processors are often used in very constrained devices. THUMB instructions are 16 bits wide rather than the 32 bits of ARM instructions, and the processor can switch between the two operating modes at the program's request, changing from decoding ARM instructions to decoding the smaller THUMB ones.
I found THUMB particularly interesting because its instruction set is clean and simple, which could make it a nice basis for a virtual machine. My idea was to compile programs written in C to THUMB and then execute them in a virtual machine, with the whole pipeline serving as a powerful scripting system for games.
Ultimately, I switched to a MIPS processor as the basis for this project, but not before I had written a THUMB disassembler. It was the first disassembler I ever wrote, so it marks a big milestone for me, and I have since written a much nicer MIPS disassembler using much of what I learned here. The code for the THUMB disassembler can be found here:
To test a disassembler, you need something to feed it. For that purpose, I used GCC to compile a small program targeting the THUMB instruction set. The program, listed below, is deliberately nonsensical; it only needs to be valid C so that the compiler generates some instructions:
int littleFunc( void )
{
    return 2;
}

int testA( int v )
{
    return v << 1;
}

int testC( int v )
{
    int c[] = { 1, 0 };
    return c[v];
}

int entry ( void )
{
    int i=0;
    while ( i<10 )
        i += testC( i );
    return i;
}
At the time I didn’t have any kind of parser for the ELF format, so I had to compile to a raw binary instead. Out of laziness, since I didn’t want to write any file-loading code, the source stores all of the THUMB machine code to be decoded in a large C array. During development I made frequent checks against ‘objdump’ to help catch bugs. The raw output of the disassembler, fed with the program above, can be seen below:
0: B580 push { r7 lr }
2: AF00 add r7, sp, #0
4: 2302 mov r3, #2
6: 1C18 add r0, r3, ?r0
8: 46BD mov ?r5, ?r7
A: BC80 pop { r7 }
C: BC02 pop { r1 }
E: 4708 bx ?r0, ?r1
10: B580 push { r7 lr }
12: B082 sub sp, #8
14: AF00 add r7, sp, #0
16: 6078 str r0, [r7, #4]
18: 687B ldr r3, [r7, #4]
1A: 005B lsl r3, r3, #1
1C: 1C18 add r0, r3, ?r0
1E: 46BD mov ?r5, ?r7
20: B002 add sp, #8
22: BC80 pop { r7 }
24: BC02 pop { r1 }
26: 4708 bx ?r0, ?r1
28: B580 push { r7 lr }
2A: B084 sub sp, #16
2C: AF00 add r7, sp, #0
2E: 6078 str r0, [r7, #4]
30: 1C3B add r3, r7, ?r0
32: 3308 add r3, #8
34: 2201 mov r2, #1
36: 601A str r2, [r3, #0]
38: 1C3B add r3, r7, ?r0
3A: 3308 add r3, #8
3C: 2200 mov r2, #0
3E: 605A str r2, [r3, #4]
40: 1C3B add r3, r7, ?r0
42: 3308 add r3, #8
44: 687A ldr r2, [r7, #4]
46: 0092 lsl r2, r2, #2
48: 58D3 ldr r3, [r2, r3]
4A: 1C18 add r0, r3, ?r0
4C: 46BD mov ?r5, ?r7
4E: B004 add sp, #16
50: BC80 pop { r7 }
52: BC02 pop { r1 }
54: 4708 bx ?r0, ?r1
56: 46C0 mov ?r0, ?r0
58: B580 push { r7 lr }
5A: B082 sub sp, #8
5C: AF00 add r7, sp, #0
5E: 2300 mov r3, #0
60: 607B str r3, [r7, #4]
62: E005 b #-16374
64: F7FF bl2 #63487
66: FFCC bl1 #65484
68: 1C03 add r3, r0, ?r0
6A: 687A ldr r2, [r7, #4]
6C: 18D3 add r3, r2, ?r3
6E: 607B str r3, [r7, #4]
70: 687B ldr r3, [r7, #4]
72: 2B09 cmp r3, #9
74: DDF6 ble 30680
76: 687B ldr r3, [r7, #4]
78: 1C18 add r0, r3, ?r0
7A: 46BD mov ?r5, ?r7
7C: B002 add sp, #8
7E: BC80 pop { r7 }
80: BC02 pop { r1 }
82: 4708 bx ?r0, ?r1
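For comparison, here is the output of objdump. Comparing the two listings shows where my disassembler still had gaps: the ‘mov’ and ‘bx’ instructions that can use high registers (‘mov sp, r7’, ‘bx r1’), the three-bit immediate form of ‘add’ (‘adds r0, r3, #0’), the two halves of ‘bl’, and branch targets were not yet decoded correctly. Note also that the machine code in ‘entry’ differs from address 0x62 onward, so the two listings were not produced from exactly the same binary: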
Disassembly of section .text:

00000000 <littleFunc>:
0: b580 push {r7, lr}
2: af00 add r7, sp, #0
4: 2302 movs r3, #2
6: 1c18 adds r0, r3, #0
8: 46bd mov sp, r7
a: bc80 pop {r7}
c: bc02 pop {r1}
e: 4708 bx r1

00000010 <testA>:
10: b580 push {r7, lr}
12: b082 sub sp, #8
14: af00 add r7, sp, #0
16: 6078 str r0, [r7, #4]
18: 687b ldr r3, [r7, #4]
1a: 005b lsls r3, r3, #1
1c: 1c18 adds r0, r3, #0
1e: 46bd mov sp, r7
20: b002 add sp, #8
22: bc80 pop {r7}
24: bc02 pop {r1}
26: 4708 bx r1

00000028 <testC>:
28: b580 push {r7, lr}
2a: b084 sub sp, #16
2c: af00 add r7, sp, #0
2e: 6078 str r0, [r7, #4]
30: 1c3b adds r3, r7, #0
32: 3308 adds r3, #8
34: 2201 movs r2, #1
36: 601a str r2, [r3, #0]
38: 1c3b adds r3, r7, #0
3a: 3308 adds r3, #8
3c: 2200 movs r2, #0
3e: 605a str r2, [r3, #4]
40: 1c3b adds r3, r7, #0
42: 3308 adds r3, #8
44: 687a ldr r2, [r7, #4]
46: 0092 lsls r2, r2, #2
48: 58d3 ldr r3, [r2, r3]
4a: 1c18 adds r0, r3, #0
4c: 46bd mov sp, r7
4e: b004 add sp, #16
50: bc80 pop {r7}
52: bc02 pop {r1}
54: 4708 bx r1
56: 46c0 nop ; (mov r8, r8)

00000058 <entry>:
58: b580 push {r7, lr}
5a: b082 sub sp, #8
5c: af00 add r7, sp, #0
5e: 2300 movs r3, #0
60: 607b str r3, [r7, #4]
62: e007 b.n 74 <entry+0x1c>
64: 687b ldr r3, [r7, #4]
66: 1c18 adds r0, r3, #0
68: f7ff fffe bl 28 <testC>
6c: 1c03 adds r3, r0, #0
6e: 687a ldr r2, [r7, #4]
70: 18d3 adds r3, r2, r3
72: 607b str r3, [r7, #4]
74: 687b ldr r3, [r7, #4]
76: 2b09 cmp r3, #9
78: ddf4 ble.n 64 <entry+0xc>
7a: 687b ldr r3, [r7, #4]
7c: 1c18 adds r0, r3, #0
7e: 46bd mov sp, r7
80: b002 add sp, #8
82: bc80 pop {r7}
84: bc02 pop {r1}
86: 4708 bx r1