Crap JIT

November 30, 2015 by BitHack

Another weekend has passed, which meens another weekend project has been finished. Im not sure how I managed this as I currently spend most of my spare time shooting super mutants and ghouls in the face.

The project this time… A really sucky JIT compiler.

My recent success writing a small compiler and VM for a made up language really got me interested in this area once again. My compiler would generate bytecode for a stack based architecture and then my VM would emulate this.

While this aproach works, its not very efficient; The bytecode is never going to change after its compiled, yet we still pay the code of dynamic dispatch. I had an idea… could I generate sequences of host assembly instruction that directly encapsulate each of my bytecodes? If I could do this, then I could just string these little assembly blocks together in the same order as the bytecode and invoke it directly on the host.

I chose to target x86 code for my proof of concept because I am most familiar with it. The same aproach should in theory work for almost any architecture however.

I kept the bytecode basicaly the same as before for simplicity. Next up I wrote an assembly listing for nasm that contained all of these atomic little assembly blocks. A quick and dirty python script takes the binary output from nasm and the generated map file and outputs a generated cpp file containing all of the assembly chunks.

The assemly chunks start off life as follows:

ins_const:
    mov  eax, 0xaabbccdd
    push eax
ins_getl:
    mov  eax, [ebp+0xaabbccdd]
    push eax
ins_setl:
    pop eax
    mov [ebp+0xaabbccdd], eax
ins_add:
    pop eax;
    add [esp], eax;

And after assembly and passing threw my script I end up with the following:

{ ins_const   , 6 ,  1, -1, "\xb8\xdd\xcc\xbb\xaa\x50" },
{ ins_getl    , 7 ,  2, -1, "\x8b\x85\xdd\xcc\xbb\xaa\x50" },
{ ins_setl    , 7 ,  3, -1, "\x58\x89\x85\xdd\xcc\xbb\xaa" },
{ ins_add     , 4 , -1, -1, "\x58\x01\x04\x24" },

Some instructions take operands however, which the script deduces by looking for the operand constant 0xaabbccdd in the output bytestream after assembly. If the least significant bytes have changed after assembly then its fairly clear this is a relative operand, which are slightly more involved to fixup.

Most of the effort was spent wrapping up the interface to crapjit to make it east to build a codebuffer and handle relocations and labels.

As usual to prove things work I decided to jit a function to compute the factoral of a number. The code to jit this looks as follows:

label_t jit_factoral(crapjit_t & j) {
    reloc_t l0;
    label_t fn;
    // prologue
    fn = j.emit_label();
    j.emit_frame(0);
    // if (val <= 1)
    j.emit_getl(2);
    j.emit_const(1);
    j.emit_leq();
    l0 = j.emit_jz();
    // return 1
    j.emit_const(1);
    j.emit_return(0);
    // else
    l0.set(j.emit_label());
    // return val * factoral(val - 1)
    j.emit_getl(2);
    j.emit_getl(2);
    j.emit_const(1);
    j.emit_sub();
    j.emit_call().set(fn);
    j.emit_sink(1);
    j.emit_mul();
    j.emit_return(0);
    return fn;
}

Bare in mind that this is a stack architecture we are emulating so its fairly inefficient to start with. I would have felt deeply ashamed if I didnt at least include one kind of optimization in crap jit. Noticing that many assembly chunks start with ‘pop eax’ and end with ‘push eax’; when chaining such chunks, these instructions can be removed since they basicaly nop.

The generated code is absolutely horrible, yet functional:

00320000  push        ebp
00320001  mov         ebp,esp
00320003  sub         esp,0
00320009  mov         eax,dword ptr [ebp+8]
0032000F  push        eax
00320010  mov         eax,1
00320015  pop         edx
00320016  cmp         edx,eax
00320018  setle       al
0032001B  and         eax,1
0032001E  cmp         eax,0
00320021  je          00320034
00320027  mov         eax,1
0032002C  pop         ebp
0032002D  add         esp,0
00320033  ret
00320034  mov         eax,dword ptr [ebp+8]
0032003A  push        eax
0032003B  mov         eax,dword ptr [ebp+8]
00320041  push        eax
00320042  mov         eax,1
00320047  sub         dword ptr [esp],eax
0032004A  call        00320000
0032004F  add         esp,4
00320055  mul         eax,dword ptr [esp]
00320058  mov         dword ptr [esp],eax
0032005B  pop         eax
0032005C  pop         ebp
0032005D  add         esp,0
00320063  ret

Once side benefit of targeting x86 is that the ABI is trivialy compatable with my stack based architecture, so I was able to directly call into my JIT code without any kind of interface. That was a nice freebie.

All the code for this is on github and has been tested on windows and linux.

There is loads of scope to improve the generated output from crapjit. One aproach would be to store all of the IR instructions in a buffer, and then as a final synthesis step use a maximal munch tiling algorythm to cover the IR in assembly chunks. Some of the worst case code could be improved usign this method. Furthermore, the IR in the buffer itself could have all the usual optimization passes executed on it. Lastly a final peephole optimizer could clean up a ton of redundant operations in the output assembly.

It was fun to see that a simple JIT compiler could be made during the breaks from playing fallout 4 over a single weekend. Even if the output code is shockingly poor, its functional, and certainly faster then interpreting bytecode, so lets not be too quick to poke fun at it.

A simpler tinier expression parser

November 6, 2015 by BitHack

In my previous post I showed the expression parser that I have been using in some of my hobby projects. I always thought of this as fairly small and neat in its operation. It made sense to me and I was happy with it. However… Since writing that post, I have been thinking about expression parsing again and I have made some discoveries.

Last night I tried to eliminate the stack from my expression parse and use simply recursion and the main stack itself. I came up with the following algorithm, which works great, and I was amazed at how tiny it is:

int prec(char c) {
    switch (c) {
    case ('=') : return 1;
    case ('&') : return 2;
    case ('+') : return 3;
    case ('*') : return 4;
    default:     return 0;
    }
}

void level(int l) {
    while (char t = peek()) {
        if (t == ')') return;
        if (prec(t) > l) {
            level(prec(t));
        }
        else if (prec(t) == l) {
            t = next();
            ident();
            level(l);
            putc(t);
        }
        else {
            return;
        }
    }
}

void expr() {
    ident();
    level(1);
}

Since nobody can come up with anything original these days, I wanted to know more about the algorithm I have just found, since it surely has been discovered by those who came before me. It seems this is a crappy weird version of a precedence climbing parser. This link here does a very good job of explaining how it works, and how to extend it to handle things like differing association, which I had not considered.

After seeing that article I wanted to try out the jist of the algorithm as they present it. Here is the full source for my little test program I made to play around with it:

#include <stdio.h>
#include <assert.h>

// the source expession
static const char * g = "z=a+b*c+d=e";
//static const char * g = "a+b&c";
//static const char * g = "a+(b&c)";

// lexer duties
char next()         { if (*g) return *g++; else return '\0'; }
char peek()         { return *g; }
void putc(char c)   { putchar(c); }
void expect(char c) { char t = next(); assert(t == c); }
void expr(int v);

// munch an itentifier or parenthesized expression
void ident() {
    char n = next();
    if (n == '(') { expr(1); expect(')'); return; }
    assert(n >= 'a' && n <= 'z');
    putc(n);
}

// the precedence table
int prec(char c) {
    switch (c) {
    case ('=') : return 1;
    case ('&') : return 2;
    case ('+') : return 3;
    case ('*') : return 4;
    default:     return 0;
    }
}

// precedence climbing expression parser
void expr(int min_prec) {
    ident();
    while (prec(peek()) >= min_prec) {
        char c = next();
        expr(prec(c));
        putc(c);
    }
}

int main() {
    expr(1);
    return 0;
}

This version is much much smaller and neater then either of the versions I came up with. I will definitely be ditching my strange methods for this one. I am always happy to learn something new, and especially when its better then what I was previously doing. I also boldly said expressions were the hardest part of a recursive decent compiler in my previous post… well I certainly don’t think that’s true anymore!

A simple tiny expression parser

November 4, 2015 by BitHack

I mentioned in my previous post that I parse expressions in my mini language using a strange shunting yard algorithm. It mixes explicit stack management and recursive decent. Its operation makes intuitive sense to me which is why I use it, and the code to implement it happens to also be very short.

An expression parser is one of the hardest parts of a recursive descent parser to write, but with a little practice it doesn’t have to be hard at all. I’m sure there are much better ways out there to implement an expression parser so I would be interested to see them if anyone has any good links. Are they smaller then the ~100 lines (including comments) this one takes up?

The code presented below is almost identical in operation to the one from my mini parser. I added a tiny Lexer and REPL loop to it so it can be used as a very simple infix to postfix expression evaluator. I also used the C libraries setjmp() function to add very simple error handling.

It supports normal operators [+ – * /], parenthesis, the unary not operator [ ! ], and function calls. Only single character variables and literals are supported, so that the focus remains on the parser rather then the Lexer. Spaces are not supported either.

Try a simple expression such as:
> A*B-C(1+3)
> 1*(3+4)
> 2*!A

In the event that wordpress feels the need to mangle my source code here is an alternative link which will maintain the formatting: http://pastebin.com/TMkT17C6

#include <assert.h>
#include <stdint.h>
#include <vector>
#include <setjmp.h>
#include <stdio.h>

typedef char token_e;

namespace {
// return the precedence of an operator
int op_prec(token_e o) {
	switch (o) {
	case ('&'): return  1;
	case ('+'): 
	case ('-'): return  2;
	case ('*'): 
	case ('/'): return  3;
	default:    return  0;
	}
}

// return true if token is an operator
bool is_op(token_e o) {
	return op_prec(o) != 0;
}

// return true if token is a variable identifier
bool is_var(token_e o) {
	return (o>='A' && o<='Z') || (o>='a' && o<='z');
} 

// return true if token is a numeric literal 
bool is_num(token_e o) { 
	return (o>='0' && o<='9');
}
} // namespace {}

struct parser_t {

	std::vector<token_e> op_;
	const char * s_;
	jmp_buf error_buf;

	void error() {
		longjmp (error_buf, 1);
	}

	parser_t(const char * stream)
		: op_()
		, s_(stream)
	{}

	char found_op () { return (is_op  (*s_)) ? *(s_++) : 0; }
	char found_var() { return (is_var (*s_)) ? *(s_++) : 0; }
	char found_num() { return (is_num (*s_)) ? *(s_++) : 0; }

	// check for specific token in the token stream
	bool found(char ch) {
		if (*s_ == ch) {
			s_++;
			return true;
		}
		return false;
	}

	// unconditionally expect a token in the token stream
	void expect(char ch) {
		if (*(s_++) != ch) {
			printf ("\nexpected '%c'\n", ch);
			error ();
		}
	}

	// ---- ---- ---- ---- ---- ---- ---- ---- <expression parser>

	void pop_op() {
		assert(op_.size() > 0);
		token_e t = op_.back();
		putchar (t);
		op_.pop_back();
	}

	void push_op(uint32_t tide, token_e tok) {
		while (op_.size() > tide) {
			if (op_prec (op_.back ()) > op_prec (tok))
				pop_op ();
			else
				break;
		}
		op_.push_back(tok);
	}

	void pop_all_op(uint32_t tide) {
		while (op_.size() > tide)
			pop_op();
	}

	void parse_lhs(uint32_t tide) {
		// parenthesis
		if (found('(')) {
			parse_expr ();
			expect (')');
		}
		// identifier
		else if (char var = found_var()) {
			// identifier name
			putchar (var);
			// index an object
			while (found('.')) {
				if (char ix = found_var()) {
					putchar (ix);
					putchar ('@');
				}
				else {
					printf ("\nexpecting identifier, got '%c'\n", *s_);
					error ();
				}
			}
			// function call
			if (found('(')) {
				//
				uint32_t num_args = 0;
				// argument list
				if (!found(')')) {
					do {
						++num_args;
						parse_expr();
					} while (found(','));
					expect (')');
				}
				// call instruction
				printf("<%d>",-num_args);
			}
		}
		// numeric literal
		else if (char num = found_num()) {
			putchar (num);
		} 
		// unexpected token
		else {
			printf ("\nunexpected token '%c'\n", *s_);
			error ();
		}
	}

	// parse an expression with extended control
	void parse_expr_ex(uint32_t tide) {
		bool unary_not = found ('!');
		parse_lhs(tide);
		if (unary_not) {
			// logical not instruction
			putchar('!');
		}
		if (char op = found_op()) {
			push_op (tide, op);
			parse_expr_ex(tide);
		}
	}

	// parse a full expression
	void parse_expr() {
		uint32_t tide = op_.size();
		parse_expr_ex(tide);
		pop_all_op(tide);
	}

	// ---- ---- ---- ---- ---- ---- ---- ---- </expression parser>

	// parse root
	void parse() {
		if (!setjmp(error_buf)) {
			parse_expr ();
		} else {
			printf ("unable to parse\n");
		}
	}
};

// supported operators: 
//      &       and
//      + -     add sub
//      * /     mul div
//      !       unary not
// example input:
//      A+B*C*(0-3)
//      A&!B
//      2+A(B,C+1)
int main() {
	char buffer[1024];
	// little REPL loop
	for (;;) {
		printf ("expr: ");
		fgets(buffer, sizeof(buffer), stdin);
		if (buffer[0] == '\0')
			break;
		parser_t parser(buffer);
		printf ("post: ");
		parser.parse();
		printf ("\n\n");
	}
	return 0;
}

Mini Compiler and Stack Machine VM

November 4, 2015 by BitHack

This weekend past, I achieved a long-standing goal… I created a working compiler and Virtual Machine for a made up mini language. The target mini language is intentionally tiny and minimal.

Here are some buzzwords: Its a entirely handwritten recursive decent compiler. For expressions I use a slightly strange shunting yard implementation that I dreamed up with a while back. The target ISA is stack based similar to JAVA, Pascal or early LUA. There is only one data type… the 32bit integer.

At this point I’ll just throw up a link to the repo: https://github.com/8BitPimp/ccml
Be warned however that there a pungent reek of code smell emanating from there (it was a weekend project after all). So gas masks are advisable.

Some important constraints of the language are: Only integers are supported, there is also no globals or arrays or any such thing, also semantic checking is completely ignored.

It also helps to see a code fragment of this target language. Here is my factorial example I uses to prove to myself that recursion worked fine:

function factoral(val)
    if (val <= 1)
        return 1
    else
        return val * factoral(val - 1)
    end
end

function main()
    return factoral(4)
end

The above code currently translates into the following instructions:

-- factoral
0000 INS_GETV   -1
0005 INS_CONST  1
0010 INS_LEQ   
0011 INS_NOT   
0012 INS_CJMP   37
0017 INS_CONST  1
0022 INS_RET    1
0027 INS_CONST  1
0032 INS_CJMP   64
0037 INS_GETV   -1
0042 INS_GETV   -1
0047 INS_CONST  1
0052 INS_SUB   
0053 INS_CALL   0
0058 INS_MUL   
0059 INS_RET    1
0064 INS_CONST  0
0069 INS_RET    1

-- main
0074 INS_CONST  4
0079 INS_CALL   0
0084 INS_RET    0
0089 INS_CONST  0
0094 INS_RET    0

The syntax of my mini language is somewhat similar to LUA, who’s code I always found nice to look at. I first learned real programming using BlitzBasic so I have a soft spot for the BASIC syntax. Also the author of BlitzBasic, Mark Sibley is one of my heros!

One of the more interesting aspects of this little project, was the design and implementation of the stack machine the VM emulates. I have read a fair bit about stack machines in the past, but somehow I never quite groked their operation. My main sticking point was the ABI, how are parameters passed/used, how are local stored/used, how is a frame constructed?

I had visions that I would need loads of stacks for different things, locals, return points, arguments etc. This turned out to be way off, and I was able to make a stack machine that used just two stacks, however even this could certainly be reduced to just with a bit of rework. I think there are advantages to maintaining multiple stacks however.

Things clicked for me when I realized I could use one stack for computing expressions, as well as storing locals and arguments. The second stack stored procedure frames, which captured a ‘frame pointer’ and a ‘return location’. The top procedure frame on the stack always refers to the function currently being executed.

In my system, the frame pointer is somewhat conventional, and procedure arguments can be found immediately bellow it on the stack, and local variables can be found immediately above it on the stack. This makes code generation simple, as accessing locals and arguments, becomes independent of the head of the stack, and each other.

During the course of execution the stack may look as follows:

08 scratch <-- top of stack is scratch for expression
07 var0    <-- FP (top)
06 arg1
05 arg0
04 scratch <-- may have scratch space if call part of incomplete expression
03 var1
02 var0    <-- FP (older)
01 arg1
00 arg0

Scratch space is the term I’m assigning to the fragments of a not yet fully evaluated expression. For something like ‘a+b’, both a and b will live on the stack before they are combined via the addition. If a function is called as part of an expression, these fragments will still hand around until it returns, and evaluation resumes.
My stack machine has two instructions GETV and SETV that provide everything I need to work with arguments and locals. Lets looks at the operation of GETV:

    GETV <int32_t operand>:
    1. index = frame.fp + operand
    2. push(stack[index])

GETV will index the stack relative to FP+<operand> and place that element on the top of the stack. This has the effect that operands >= 0 refer to locals and operands < 0 refer to arguments.

    SETV <int32_t operand>:
    1. index = frame.fp + operand
    2. stack[index] = pop()

SETV is much the same, however it pops the top item on the stack and overwrites the element on the stack that it indexes.

With these two instructions, my code can easily read and write arguments and locals. Perfect!

Unlike a regular register ISA, local variables are not allocated on the stack by moving the stack pointer, all that that is needed to allocate a local on the stack is to push an item on the stack and not pop it. This has the effect of incrementing the top of the stack, reserving space which can now be indexed relative to the fp. In my mini language, a variable declaration is preceded by the ‘var’ keyword which makes it really easy to know that we don’t need to pop anything after this statement has finished. It will happily sit there on the stack and can now be assigned to and read from.

One more trouble point is returning from a function… how do we know how to clean up the stack?

I took the easy route and made my RET instruction fairly complex. RET takes one operand, which is the number of items to discard from the stack. In this manner, RET can discard all local variables and arguments, similar to a callee save function. The parser knows how many arguments and locals have been provided, so is becomes trivial to emit this instruction.

There was one kink in this plan however… If we want to return a value from a function, how do we do that. Since the return value will be sitting on the top of the stack at the point that we return we cant simply discard all of the top elements.

My RET function operates as follows:

RET <int32_t operand>
    1. save the top item on the stack (return value)
    2. discard <operand> items from the stack
    3. push the previously saved top item
    4. pc = frame.return_value
    5. discard the top procedure frame

That is a fairly meaty RET function, but I didn’t spend too long thinking about it and it seemed justified.

To simplify things in my compiler I enforce that every function must return a value, which defaults to 0 if no return statement is issued in the program. This is crude and implemented by issuing a { CONST 0, RET } instruction pair at the end of every function.

I believe this is the same trick that LUA uses. It my seem wasteful, but it keeps things simple stupid.

So to conclude, how good is my mini language. Well…

…Its crap…

…but it works…

…and I learned a ton!
I would chalk this up as a complete success.

The code is highly smelly, but hey! this was a weekend project. Its up on github.

Kentucky route zero – Behind the scenes

September 30, 2015 by BitHack

I had a bit of fun here making several renders using the assets from the beautiful Kentucky Route Zero by Cardboard Computer; the aesthetic of which captivates me immensely. (click for hi-res)

I used 3DRipper to grab a frame from the game in .obj format, and then went at it with Cinema4D. The game plays out in pseudo 2D, which presented a challenge since a scene seems to be composed of many flat 2d layers as shown below. This could be me misunderstanding 3DRipper and its dumped post projection, but I imagine such a tool should correct for that and transform back into modelview space before dumping.

I’m still not sure how they achieve their flat shaded 2D look. I imagine they could disconnect a model based on color groups and set their vertex colour, or use UV’s to index a colour texture atlas.

The modeling of Conway (the main character) is fantastic. I have so much to learn.

Grep’n for macros

August 18, 2015 by BitHack

A colleague of mine spent a fair bit of time using grep to track down some functions in a very large body of source code. His greps were turning up a blank, and after some time discovered the functions were generated by macros at compile time.

Its fun to think that this could have been solved a little quicker if he had grep’d the output of the preprocessor.

gcc -E source_file.c | grep -B3 -A3 "thing to search for"

Pico-8

May 5, 2015 by BitHack

Pico-8 is a fantasy console from the creator of Voxatron http://www.lexaloffle.com/. Its a virtual console, similar in many respects to the game boy color, since it features tile/sprite based graphics, 128×128 screen, 4 channel synth.

One part of the Pico-8 that is particularly cool is that games written for it can be distributed as regular PNG images. Here are two examples:

Its not immediately obvious how the programs are stored inside the images themselves, however it seems this is a fine example of steganography:

Steganography (US ⁱ/ˌstɛ.ɡʌnˈɔː.ɡrʌ.fi/, UK /ˌstɛɡ.ənˈɒɡ.rə.fi/) is the practice of concealing a file, message, image, or video within another file, message, image, or video. The word steganography combines the Ancient Greek words steganos (στεγανός), meaning “covered, concealed, or protected”, and graphein (γράφειν) meaning “writing”.

A simple scheme for hiding data in pictures is to hijack the least significant bits of the color channels, and use them to transport the data, which will have the lowest visually perceivable impact. I had a hunch that was what was happening here.

The quickest way I could think to validate my hypothesis was to take two cartridges (the above two) and blend them using difference mode in Artweaver. I hoped that would cut out some of the cartridge surround, and make the data somewhat more visible. I also upped the contrast for good measure so those least significant bits move towards the more significant bits and become brighter.

Here is the result:

Structured noise is readily apparent, and is most certainly the hidden program data. Taking a quick look at the range of colors present in the noise, I would guess that two bits of each color channel are being hijacked, and for RGBA that would allow one full byte to be hidden per pixel. At an image size of 160×205 that would allow ~32k of data to be stored. That middle band of data also seems very random, leading me to guess it may be compressed.

I might one day whip up a program to try and extract and decode the actual data from these cartridges.

I love the idea of a fantasy virtual console, and I love the idea of distributing picture “cartridges” with their programs embedded. Hats off to lexaloffle, I think this is just fantastic stuff.

Simple safe printf snippet

May 1, 2015 by BitHack

A little prototype for an idea I had to use some C++11 features to make printf a bit more safe.


#include <initializer_list>
#include <stdio.h>

struct val_t {

    enum type_t { 

        e_int, 
        e_float, 
        e_cstr 
    } type_;

    union {

        int          int_;
        float        float_;
        const char * cstr_;
    };

    val_t( int v )          : type_( e_int   ), int_  ( v ) {}
    val_t( float v )        : type_( e_float ), float_( v ) {}
    val_t( const char * v ) : type_( e_cstr  ), cstr_ ( v ) {}
};

bool print( const char * fmt, std::initializer_list<val_t> vargs ) {

    // itterate over the strings
    for ( ; *fmt;fmt++ ) {
        // check for format specifier
        if ( *fmt == '%' ) {
            // argument index
            int index = fmt[1] - '0';
            if (index < 0 || index >= int(vargs.size()))
                return false;
            // access argument
            const val_t & v = vargs.begin()[index];
            switch (v.type_) {
            case (val_t::e_int  ): printf("%d", v.int_   ); break;
            case (val_t::e_float): printf("%f", v.float_ ); break;
            case (val_t::e_cstr ): printf("%s", v.cstr_  ); break;
            default:
                return false;
            }
            // skip argument index
            fmt++;
        }
        else {
            // output character
            putchar(*fmt);
        }
    }
    return true;
}

int main( void ) {
    print( "%0 and %1, %0, %2", { 3, 0.1f, "hello world" } );
    getchar();
    return 0;
}

The output is:
3 and 0.100000, 3, hello world

OpenGL Rasterizer – Part1 – The Setup

February 14, 2015 by BitHack

I love software rendering and I also love pushing myself to learn new thing. True two myself, I hatched a crazy idea a number of weeks ago. Could I find a game that uses a small subset of OpenGL, and implement it myself using software rasterization for all of the drawing operations. I had read a lot about OpenGL but I lacked practical experience with it, and theory and practice rarely overlap completely. This project would force me to learn OpenGL in every detail and from an unusual perspective.

The game I picked was Quake2, since it was one of the first OpenGL enabled games, is very well documented, open source and still actively developed. Quake2 uses a small subset of OpenGL1.1 which is entirely immediate mode and about as minimal as I could have found.
I chose the yquake2 fork, a very clean highly portable, 64bit compatible version of the quake2 engine.

I checked out the repo, built the source via mingw using their makefiles and soon had my own 64bit executable to become my victim.

The first thing I needed was an idea of just how much GL I would have to be implementing. I fired up dependency walker, so that I could have a look at what functions Quake2 would be importing from OpenGL32.dll. Just 59 functions, from an api which contains hundreds. That was a good start, and I was sure that not all of them would be needed before I could have something up and running on my screen.

My first task was to create an OpenGL32.dll that satisfies yquake2’s import requirements, and place it in the same directory which will cause yquake2 to load my OpenGL instead.

Function can be exported from a dll easily as follows:

extern "C" {
__declspec( dllexport ) void __stdcall glEnable( GLenum );
}

There are a few things going on here which I will explain. C++ uses a mechanism known as name mangling, where it will modify a functions symbolic name to append details of its arguments and nesting to avoid name collisions with other functions. I am sure its much more complicated then this but I’m not overly familiar with the details of mangling. The extern “C” directive tells the compiler this function is externally visible outside of its compilation unit and should use C style name conventions, which are relatively unmangled, but I will come back to this in just a moment.

The next interesting part is the __declspec( dllexport ) which instructs the compiler to add this function to the dll’s export table. As windows loads a programs it tries to resolve all functions in the executables dll import table, which involves walking the export table of the required dll and looking for a matching function. If a match is found then that dll import is said to be resolved.

On windows, the OpenGL API uses __stdcall calling conventions just like the rest of the windows api. A calling convention is an agreement between two functions, the caller and the callee, about how they will share registers and stack space during a function call. __cdecl is the default calling convention used by visual studio so I have to explicitly state that want to use a different one.

After I had exported all 59 opengl functions in this way, I fired up dependency walker so that I could inspect my dll’s export table. Unfortunately all of my functions were still being mangled in the C convention. C mangling will prepend and ‘_’ at the beginning of a function and append the sum number of bytes of all arguments. This mangling can be circumvented using a .def file, which tells the linker how to construct the export table of a dll. For my purposes the .def file need only contain a list of function names, and they will be exported without any mangling applied.

The .def file is no more complicated then this:

EXPORTS
glAccum
glActiveTexture
glAlphaFunc
…

The work flow I would need is something like follows:

Write code
Compile dll
Copy to yquake2 folder
Execute yquake2
Attach and debug

Visual studio allows us to specify the debug target when you execute a project, which is handy because I cant execute my dll directly. I the debug target to yquake2, so that when it launch quake2 which in turn will load my dll, at which point visual studio will resolve the debug symbols for it, allowing me to debug the code in my dll normally.

One tricky point in the workflow is the copy step, where I needed to take my freshly compiled dll and transfer it to the yquake2 folder. This would be a real bottle neck to my development and be really annoying. Some friends at work suggested the perfect solutions…. symlinks.

Windows has the command ‘mklink’ which will create a file in the current directory but that file actually resides elsewhere on disk. I could use symlinks to effectively place an OpenGL32.dll in the same directory as yquake2 but have it point to the opengl32.dll in my compilation directory.

The command for this is:

cd d:/my_yquake2_dir/bin
mklink opengl32.dll d:/my_project_dir/bin/opengl32.dll

Thus my work flow would be reduced to the following fairly easy steps:

Write code
Compile dll
Debug project

At this stage I had an stub OpenGL32.dll file which satisfies the windows loader, and a neat work flow for my development. Now the real development could begin. The video below shows some early progress of my OpenGL implementation.

My current implementation moves far beyond this and I will try to document the steps I went through, adding features such as:

Adding threading and screen binning.
Perspective correct texture mapping.
Opengl blend modes.
SSE vectorization.
Avoiding combinatorial explosion.

Here are some great links that were invaluable during my development:

the-barycentric-conspiracy
http://www.fabiensanglard.net/quake2/index.php
http://www.dependencywalker.com/
https://github.com/yquake2/yquake2
https://technet.microsoft.com/en-us/library/cc753194.aspx

In the next article I will explain why OpenGL is not enough by itself, and why I needed to dip into the windows API. Stay tuned.

Paramatric Hexapod Animation Controller

October 28, 2014 by BitHack

I made a fun procedural animation controller for a hexapod like creature. I plan to soon write some detailed articles about how it works, the inverse kinematic algorithm, etc.

I’m likely going to combine this with my multi agent path finding algorithm, perhaps even as part of a procedural RTS for the proc jam.

Bit-Hacks Pixel Dungeon

Development log from the bit-hack