THUMB Disassembler

I was recently searching through my hard drive for projects that highlight my love of low-level programming, and this is one of them.

THUMB is a compact instruction set that was added to some ARM processors to reduce code size, which matters because ARM chips are often used in very constrained devices.  A program can ask the processor to switch between two operating modes, decoding either 32-bit ARM instructions or the smaller 16-bit THUMB instructions.

I found THUMB particularly interesting because the instruction set is clean and simple, which could make it a nice basis for a virtual machine.  My idea was to write programs in C, compile them to THUMB, and execute them in the virtual machine, with the whole pipeline serving as a powerful scripting system for games.

Ultimately, I switched to a MIPS processor as the basis for this project, but I still got as far as making a THUMB disassembler.  This was the first disassembler I ever wrote, so it marks a large milestone for me.  Since then I have made a very nice MIPS disassembler using much of what I learned on this project.  The code for the THUMB disassembler can be found here:

thumb_disassm.cpp

To test a disassembler there needs to be something to feed it.  For that purpose, I compiled a small program with GCC, targeting the THUMB instruction set.  The program is listed below.  It is a nonsense program, designed only to be valid C that generates some instructions:

int littleFunc( void )
{
    return 2;
}

int testA( int v )
{
    return v << 1;
}

int testC( int v )
{
    int c[] = { 1, 0 };
    return c[v];
}

int entry ( void )
{

    int i=0;
    while ( i<10 )
        i += testC( i );
    return i;
}

At the time I didn’t have any kind of parser for the ELF format, so I had to compile the program and output it as a raw binary.  In the source provided you can see a large C array that stores all the THUMB machine instructions to be decoded; this was out of laziness, since I didn’t want to write any file loading.  During development I made frequent checks against ‘objdump’ to help catch bugs.  The raw output of the disassembler, fed with the given program, can be seen below:
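To give a feel for what decoding from an embedded array involves, here is a minimal sketch. It is not taken from thumb_disassm.cpp: the array `kProgram` and the functions `decodeImmediateOp` and `dumpProgram` are hypothetical names, and only one instruction group (move/compare/add/subtract with an 8-bit immediate) is handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the embedded instruction array: three halfwords
// that also appear in the listing in this post.
static const uint16_t kProgram[] = { 0x2302, 0x3308, 0x2B09 };

// Decode the "move/compare/add/subtract immediate" group, whose top three
// bits are 001. Returns an empty string for any other encoding.
std::string decodeImmediateOp(uint16_t insn)
{
    if ((insn >> 13) != 0x1)
        return "";
    static const char* const kNames[] = { "mov", "cmp", "add", "sub" };
    unsigned op  = (insn >> 11) & 0x3;  // bits 12..11 select the operation
    unsigned rd  = (insn >> 8) & 0x7;   // bits 10..8 hold the register
    unsigned imm = insn & 0xFF;         // bits 7..0 hold the immediate
    char buf[32];
    std::snprintf(buf, sizeof buf, "%s r%u, #%u", kNames[op], rd, imm);
    return buf;
}

// Walk the array one 16-bit instruction at a time, printing the byte offset,
// the raw halfword and the decoded text.
void dumpProgram()
{
    for (std::size_t i = 0; i < sizeof kProgram / sizeof kProgram[0]; ++i)
        std::printf("%4zX:  %04X    %s\n", i * 2,
                    static_cast<unsigned>(kProgram[i]),
                    decodeImmediateOp(kProgram[i]).c_str());
}
```

A full disassembler would dispatch on the top bits of each halfword to one such decoder per instruction group.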

    0:  B580    push    { r7 lr }
    2:  AF00    add     r7, sp, #0
    4:  2302    mov     r3, #2
    6:  1C18    add     r0, r3, ?r0
    8:  46BD    mov     ?r5, ?r7
    A:  BC80    pop     { r7 }
    C:  BC02    pop     { r1 }
    E:  4708    bx      ?r0, ?r1
   10:  B580    push    { r7 lr }
   12:  B082    sub     sp, #8
   14:  AF00    add     r7, sp, #0
   16:  6078    str     r0, [r7, #4]
   18:  687B    ldr     r3, [r7, #4]
   1A:  005B    lsl     r3, r3, #1
   1C:  1C18    add     r0, r3, ?r0
   1E:  46BD    mov     ?r5, ?r7
   20:  B002    add     sp, #8
   22:  BC80    pop     { r7 }
   24:  BC02    pop     { r1 }
   26:  4708    bx      ?r0, ?r1
   28:  B580    push    { r7 lr }
   2A:  B084    sub     sp, #16
   2C:  AF00    add     r7, sp, #0
   2E:  6078    str     r0, [r7, #4]
   30:  1C3B    add     r3, r7, ?r0
   32:  3308    add     r3, #8
   34:  2201    mov     r2, #1
   36:  601A    str     r2, [r3, #0]
   38:  1C3B    add     r3, r7, ?r0
   3A:  3308    add     r3, #8
   3C:  2200    mov     r2, #0
   3E:  605A    str     r2, [r3, #4]
   40:  1C3B    add     r3, r7, ?r0
   42:  3308    add     r3, #8
   44:  687A    ldr     r2, [r7, #4]
   46:  0092    lsl     r2, r2, #2
   48:  58D3    ldr     r3, [r2, r3]
   4A:  1C18    add     r0, r3, ?r0
   4C:  46BD    mov     ?r5, ?r7
   4E:  B004    add     sp, #16
   50:  BC80    pop     { r7 }
   52:  BC02    pop     { r1 }
   54:  4708    bx      ?r0, ?r1
   56:  46C0    mov     ?r0, ?r0
   58:  B580    push    { r7 lr }
   5A:  B082    sub     sp, #8
   5C:  AF00    add     r7, sp, #0
   5E:  2300    mov     r3, #0
   60:  607B    str     r3, [r7, #4]
   62:  E005    b       #-16374
   64:  F7FF    bl2     #63487
   66:  FFCC    bl1     #65484
   68:  1C03    add     r3, r0, ?r0
   6A:  687A    ldr     r2, [r7, #4]
   6C:  18D3    add     r3, r2, ?r3
   6E:  607B    str     r3, [r7, #4]
   70:  687B    ldr     r3, [r7, #4]
   72:  2B09    cmp     r3, #9
   74:  DDF6    ble     30680
   76:  687B    ldr     r3, [r7, #4]
   78:  1C18    add     r0, r3, ?r0
   7A:  46BD    mov     ?r5, ?r7
   7C:  B002    add     sp, #8
   7E:  BC80    pop     { r7 }
   80:  BC02    pop     { r1 }
   82:  4708    bx      ?r0, ?r1
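
The branch operands in this listing (`#-16374`, `30680` and the two `bl` halves) are printed as raw values rather than target addresses. The following is a sketch of how the targets could be computed, assuming the original 16-bit THUMB encodings; the function names are illustrative and not from thumb_disassm.cpp.

```cpp
#include <cstdint>

// Sign-extend the low `bits` bits of `value` (upper bits must be zero).
static int32_t signExtend(uint32_t value, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    return static_cast<int32_t>(value ^ sign) - static_cast<int32_t>(sign);
}

// Unconditional branch (top five bits 11100): signed 11-bit offset counted
// in halfwords, relative to the instruction address plus 4.
uint32_t branchTarget(uint32_t addr, uint16_t insn)
{
    return addr + 4 + signExtend(insn & 0x7FF, 11) * 2;
}

// Conditional branch (top four bits 1101): signed 8-bit offset counted in
// halfwords, relative to the instruction address plus 4.
uint32_t condBranchTarget(uint32_t addr, uint16_t insn)
{
    return addr + 4 + signExtend(insn & 0xFF, 8) * 2;
}

// BL is split over two halfwords: the first carries the upper 11 bits of a
// 22-bit halfword offset, the second carries the lower 11 bits. `addr` is
// the address of the first halfword.
uint32_t blTarget(uint32_t addr, uint16_t first, uint16_t second)
{
    uint32_t raw = (static_cast<uint32_t>(first & 0x7FF) << 11) | (second & 0x7FF);
    return addr + 4 + signExtend(raw, 22) * 2;
}
```

Applied to the listing, `branchTarget(0x62, 0xE005)` gives 0x70, `condBranchTarget(0x74, 0xDDF6)` gives 0x64, and `blTarget(0x64, 0xF7FF, 0xFFCC)` gives 0x0, the start of the first function.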

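For comparison, here is the output of objdump.  Note that the two listings do not match exactly around the loop in entry: the objdump listing has two extra instructions before the bl, so it appears to come from a slightly different build of the test program.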

Disassembly of section .text:

00000000 <littleFunc>:
   0:    b580          push    {r7, lr}
   2:    af00          add    r7, sp, #0
   4:    2302          movs    r3, #2
   6:    1c18          adds    r0, r3, #0
   8:    46bd          mov    sp, r7
   a:    bc80          pop    {r7}
   c:    bc02          pop    {r1}
   e:    4708          bx    r1

00000010 <testA>:
  10:    b580          push    {r7, lr}
  12:    b082          sub    sp, #8
  14:    af00          add    r7, sp, #0
  16:    6078          str    r0, [r7, #4]
  18:    687b          ldr    r3, [r7, #4]
  1a:    005b          lsls    r3, r3, #1
  1c:    1c18          adds    r0, r3, #0
  1e:    46bd          mov    sp, r7
  20:    b002          add    sp, #8
  22:    bc80          pop    {r7}
  24:    bc02          pop    {r1}
  26:    4708          bx    r1

00000028 <testC>:
  28:    b580          push    {r7, lr}
  2a:    b084          sub    sp, #16
  2c:    af00          add    r7, sp, #0
  2e:    6078          str    r0, [r7, #4]
  30:    1c3b          adds    r3, r7, #0
  32:    3308          adds    r3, #8
  34:    2201          movs    r2, #1
  36:    601a          str    r2, [r3, #0]
  38:    1c3b          adds    r3, r7, #0
  3a:    3308          adds    r3, #8
  3c:    2200          movs    r2, #0
  3e:    605a          str    r2, [r3, #4]
  40:    1c3b          adds    r3, r7, #0
  42:    3308          adds    r3, #8
  44:    687a          ldr    r2, [r7, #4]
  46:    0092          lsls    r2, r2, #2
  48:    58d3          ldr    r3, [r2, r3]
  4a:    1c18          adds    r0, r3, #0
  4c:    46bd          mov    sp, r7
  4e:    b004          add    sp, #16
  50:    bc80          pop    {r7}
  52:    bc02          pop    {r1}
  54:    4708          bx    r1
  56:    46c0          nop            ; (mov r8, r8)

00000058 <entry>:
  58:    b580          push    {r7, lr}
  5a:    b082          sub    sp, #8
  5c:    af00          add    r7, sp, #0
  5e:    2300          movs    r3, #0
  60:    607b          str    r3, [r7, #4]
  62:    e007          b.n    74 <entry+0x1c>
  64:    687b          ldr    r3, [r7, #4]
  66:    1c18          adds    r0, r3, #0
  68:    f7ff fffe     bl    28 <testC>
  6c:    1c03          adds    r3, r0, #0
  6e:    687a          ldr    r2, [r7, #4]
  70:    18d3          adds    r3, r2, r3
  72:    607b          str    r3, [r7, #4]
  74:    687b          ldr    r3, [r7, #4]
  76:    2b09          cmp    r3, #9
  78:    ddf4          ble.n    64 <entry+0xc>
  7a:    687b          ldr    r3, [r7, #4]
  7c:    1c18          adds    r0, r3, #0
  7e:    46bd          mov    sp, r7
  80:    b002          add    sp, #8
  82:    bc80          pop    {r7}
  84:    bc02          pop    {r1}
  86:    4708          bx    r1
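
Comparing the two listings, the operands marked with `?` in my output are the ones that differ: `1C18` should read `r0, r3, #0`, and `46BD` should read `sp, r7`. Below is a sketch of how those two instruction groups could be decoded so that the results agree with objdump. The helper names are hypothetical, and the mnemonics are printed without the flag-setting `s` suffix that objdump adds.

```cpp
#include <cstdint>
#include <string>

// Name a register 0..15, using the aliases objdump prints for r13..r15.
static std::string regName(unsigned r)
{
    switch (r) {
    case 13: return "sp";
    case 14: return "lr";
    case 15: return "pc";
    default: return "r" + std::to_string(r);
    }
}

// Add/subtract group (top five bits 00011). Bit 10 says whether bits 8..6
// hold a 3-bit immediate or a register number.
std::string decodeAddSub(uint16_t insn)
{
    if ((insn >> 11) != 0x3)
        return "";
    bool immediate   = ((insn >> 10) & 1) != 0;
    const char* name = ((insn >> 9) & 1) ? "sub" : "add";
    unsigned field   = (insn >> 6) & 0x7;
    unsigned rs      = (insn >> 3) & 0x7;
    unsigned rd      = insn & 0x7;
    std::string last = immediate ? "#" + std::to_string(field) : regName(field);
    return std::string(name) + " " + regName(rd) + ", " + regName(rs) + ", " + last;
}

// Hi-register operations and BX (top six bits 010001). H1 (bit 7) and
// H2 (bit 6) extend the 3-bit register fields to 4 bits, reaching r8..r15.
std::string decodeHiReg(uint16_t insn)
{
    if ((insn >> 10) != 0x11)
        return "";
    static const char* const kNames[] = { "add", "cmp", "mov", "bx" };
    unsigned op = (insn >> 8) & 0x3;
    unsigned rs = (insn >> 3) & 0xF;                   // H2 sits directly above Rs
    unsigned rd = (insn & 0x7) | ((insn >> 4) & 0x8);  // H1 becomes bit 3 of Rd
    if (op == 3)
        return "bx " + regName(rs);                    // BX takes a single register
    return std::string(kNames[op]) + " " + regName(rd) + ", " + regName(rs);
}
```

With these, `decodeAddSub(0x1C18)` yields `add r0, r3, #0`, `decodeHiReg(0x46BD)` yields `mov sp, r7`, and `decodeHiReg(0x46C0)` yields `mov r8, r8`, which objdump shows as its `nop`.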
