Kentucky Route Zero – Behind the scenes

I had a bit of fun here making several renders using the assets from the beautiful Kentucky Route Zero by Cardboard Computer, the aesthetic of which captivates me immensely.

I used 3DRipper to grab a frame from the game in .obj format, and then went at it with Cinema4D.  The game plays out in pseudo 2D, which presented a challenge since a scene seems to be composed of many flat 2D layers as shown below.  This could be me misunderstanding 3DRipper, which may dump geometry after projection, but I would have expected such a tool to correct for that and transform back into model-view space before dumping.

I’m still not sure how they achieve their flat-shaded 2D look.  I imagine they could split a model into groups by colour and set the vertex colour of each group, or use UVs to index a colour texture atlas.

The modeling of Conway (the main character) is fantastic.  I have so much to learn.


Grep’n for macros

A colleague of mine spent a fair bit of time using grep to track down some functions in a very large body of source code.  His greps were turning up a blank, and after some time he discovered the functions were generated by macros at compile time.

It’s fun to think that this could have been solved a little quicker if he had grep’d the output of the preprocessor.

gcc -E source_file.c | grep -B3 -A3 "thing to search for"


Pico-8 is a fantasy console from the creator of Voxatron.  It’s a virtual console, similar in many respects to the Game Boy Color, since it features tile/sprite based graphics, a 128×128 screen and a 4 channel synth.

One part of the Pico-8 that is particularly cool is that games written for it can be distributed as regular PNG images.  Here are two examples:


It’s not immediately obvious how the programs are stored inside the images themselves, however it seems this is a fine example of steganography:

Steganography (US /ˌstɛ.ɡʌnˈɔː.ɡrʌ.fi/, UK /ˌstɛɡ.ənˈɒɡ.rə.fi/) is the practice of concealing a file, message, image, or video within another file, message, image, or video. The word steganography combines the Ancient Greek words steganos (στεγανός), meaning “covered, concealed, or protected”, and graphein (γράφειν) meaning “writing”.

A simple scheme for hiding data in pictures is to hijack the least significant bits of the color channels, and use them to transport the data, which will have the lowest visually perceivable impact.  I had a hunch that was what was happening here.

The quickest way I could think to validate my hypothesis was to take two cartridges (the above two) and blend them using difference mode in Artweaver.  I hoped that would cut out some of the cartridge surround, and make the data somewhat more visible.  I also upped the contrast for good measure so those least significant bits move towards the more significant bits and become brighter.

Here is the result:


Structured noise is readily apparent, and is most certainly the hidden program data.  Taking a quick look at the range of colors present in the noise, I would guess that two bits of each color channel are being hijacked, and for RGBA that would allow one full byte to be hidden per pixel.  At an image size of 160×205 that would allow ~32k of data to be stored.  That middle band of data also seems very random, leading me to guess it may be compressed.

I might one day whip up a program to try and extract and decode the actual data from these cartridges.

I love the idea of a fantasy virtual console, and I love the idea of distributing picture “cartridges” with their programs embedded.  Hats off to lexaloffle, I think this is just fantastic stuff.

Simple safe printf snippet

A little prototype for an idea I had to use some C++11 features to make printf a bit safer.

#include <initializer_list>
#include <stdio.h>

// tagged union that records the type of each argument
struct val_t {

    enum type_t {
        e_int,
        e_float,
        e_cstr
    } type_;

    union {
        int          int_;
        float        float_;
        const char * cstr_;
    };

    val_t( int v )          : type_( e_int   ), int_  ( v ) {}
    val_t( float v )        : type_( e_float ), float_( v ) {}
    val_t( const char * v ) : type_( e_cstr  ), cstr_ ( v ) {}
};

bool print( const char * fmt, std::initializer_list<val_t> vargs ) {

    // iterate over the format string
    for ( ; *fmt; fmt++ ) {
        // check for format specifier
        if ( *fmt == '%' ) {
            // argument index (a single digit, 0 to 9)
            int index = fmt[1] - '0';
            if ( index < 0 || index > 9 || index >= int( vargs.size() ) )
                return false;
            // access argument
            const val_t & v = vargs.begin()[index];
            switch ( v.type_ ) {
            case ( val_t::e_int   ): printf( "%d", v.int_   ); break;
            case ( val_t::e_float ): printf( "%f", v.float_ ); break;
            case ( val_t::e_cstr  ): printf( "%s", v.cstr_  ); break;
            default:
                return false;
            }
            // skip argument index
            fmt++;
        }
        else {
            // output character
            putchar( *fmt );
        }
    }
    return true;
}

int main( void ) {
    print( "%0 and %1, %0, %2", { 3, 0.1f, "hello world" } );
    return 0;
}
The output is:

3 and 0.100000, 3, hello world

OpenGL Rasterizer – Part1 – The Setup

I love software rendering and I also love pushing myself to learn new things.  True to myself, I hatched a crazy idea a number of weeks ago: could I find a game that uses a small subset of OpenGL, and implement that subset myself using software rasterization for all of the drawing operations?  I had read a lot about OpenGL but I lacked practical experience with it, and theory and practice rarely overlap completely.  This project would force me to learn OpenGL in every detail and from an unusual perspective.

The game I picked was Quake2, since it was one of the first OpenGL enabled games, is very well documented, open source and still actively developed.  Quake2 uses a small subset of OpenGL 1.1 which is entirely immediate mode and about as minimal as I could have found.
I chose the yquake2 fork, a very clean, highly portable, 64bit compatible version of the Quake2 engine.

I checked out the repo, built the source via MinGW using their makefiles and soon had my own 64bit executable to become my victim.

The first thing I needed was an idea of just how much GL I would have to be implementing.  I fired up Dependency Walker, so that I could have a look at what functions Quake2 would be importing from OpenGL32.dll.  Just 59 functions, from an API which contains hundreds.  That was a good start, and I was sure that not all of them would be needed before I could have something up and running on my screen.

My first task was to create an OpenGL32.dll that satisfies yquake2’s import requirements, and place it in the same directory as the executable, which causes yquake2 to load my OpenGL instead of the system one.

Functions can be exported from a dll easily as follows:

extern "C" {
__declspec( dllexport ) void __stdcall glEnable( GLenum );
}

There are a few things going on here which I will explain.  C++ uses a mechanism known as name mangling, where it will modify a function’s symbolic name to append details of its arguments and nesting to avoid name collisions with other functions.  I am sure it’s much more complicated than this but I’m not overly familiar with the details of mangling. The extern "C" directive tells the compiler to give this function C linkage, so its symbol uses C style naming conventions, which are relatively unmangled, but I will come back to this in just a moment.

The next interesting part is the __declspec( dllexport ) which instructs the compiler to add this function to the dll’s export table.  As Windows loads a program it tries to resolve all functions in the executable’s dll import table, which involves walking the export table of the required dll and looking for a matching function.  If a match is found then that dll import is said to be resolved.

On Windows, the OpenGL API uses the __stdcall calling convention just like the rest of the Windows API. A calling convention is an agreement between two functions, the caller and the callee, about how they will share registers and stack space during a function call. __cdecl is the default calling convention used by Visual Studio so I have to explicitly state that I want to use a different one.

After I had exported all 59 OpenGL functions in this way, I fired up Dependency Walker so that I could inspect my dll’s export table.  Unfortunately all of my functions were still being mangled in the C convention.  For a __stdcall function, C mangling will prepend an ‘_’ to the function name and append an ‘@’ followed by the total number of bytes of all arguments, so glEnable ends up as _glEnable@4.  This mangling can be circumvented using a .def file, which tells the linker how to construct the export table of a dll.  For my purposes the .def file need only contain a list of function names, and they will be exported without any mangling applied.

The .def file is no more complicated than this:
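
As a minimal sketch, assuming only glEnable from the example above is being exported, such a file needs little more than an EXPORTS statement followed by one name per line.  The real file would list all 59 function names in the same way.

```
; module definition file: names under EXPORTS are exported undecorated
EXPORTS
    glEnable
```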


The workflow I would need is something like the following:

Write code
Compile dll
Copy to yquake2 folder
Execute yquake2
Attach and debug

Visual Studio lets you specify the debug target used when you execute a project, which is handy because I can’t execute my dll directly.  I set the debug target to yquake2, so that Visual Studio launches Quake2, which in turn loads my dll, at which point Visual Studio resolves the debug symbols for it, allowing me to debug the code in my dll normally.

One tricky point in the workflow is the copy step, where I needed to take my freshly compiled dll and transfer it to the yquake2 folder.  This would be a real bottleneck to my development and be really annoying.  Some friends at work suggested the perfect solution: symlinks.

Windows has the command ‘mklink’, which creates a symbolic link: an entry in the current directory that refers to a file which actually resides elsewhere on disk.  I could use symlinks to effectively place an OpenGL32.dll in the same directory as yquake2 but have it point to the opengl32.dll in my compilation directory.

The command for this is:

cd d:\my_yquake2_dir\bin
mklink opengl32.dll d:\my_project_dir\bin\opengl32.dll

Thus my work flow would be reduced to the following fairly easy steps:

Write code
Compile dll
Debug project

At this stage I had a stub OpenGL32.dll file which satisfies the Windows loader, and a neat workflow for my development.  Now the real development could begin.  The video below shows some early progress of my OpenGL implementation.

My current implementation moves far beyond this and I will try to document the steps I went through, adding features such as:

Adding threading and screen binning.
Perspective correct texture mapping.
OpenGL blend modes.
SSE vectorization.
Avoiding combinatorial explosion.

Here are some great links that were invaluable during my development:


In the next article I will explain why OpenGL is not enough by itself, and why I needed to dip into the Windows API.  Stay tuned.

Parametric Hexapod Animation Controller

I made a fun procedural animation controller for a hexapod-like creature. I plan to soon write some detailed articles about how it works, the inverse kinematic algorithm, etc.

I’m likely going to combine this with my multi agent path finding algorithm, perhaps even as part of a procedural RTS for the proc jam.

Prototyping for Procjam 2014

Procjam (Procedural Generation Game Jam) is coming up very soon, and I’m going to take part.  I have started to prototype some things to see if my idea is feasible.  So far I have been playing around with different ways to software render a unit sphere.  I’ve got some funky ideas for these things.


Rapid multi agent pathfinding

I have been trying to design a good path finding algorithm that will support a large number of agents by default. A* is not very good at supporting more than one agent at a time, where their paths may intersect, and can require lots of fudges to get it to work correctly in these situations.  Also when A* is implemented over a grid based world, it can require additional processing to produce non grid aligned paths (path smoothing).

For this demo, I threw A* out of the window and took a potential field approach.  I used bi-linear interpolation when sampling the potential field so that each agent wouldn’t have to be grid aligned. The results are really cool, and I can simulate upward of 1000 agents without any noticeable performance hit. Perfect for a nice hack and slash dungeon crawler.

Coroutines, x64 and Visual Studio

I really like co-routines, finding them really useful for programming game AI and scripting.  Since Visual Studio has dropped support for inline assembly when compiling x64 code, I thought I would be out of luck trying to implement co-routines for this target.  Fortunately Visual Studio can still compile assembly files using masm in x64 mode.  In order to implement co-routines I had to familiarise myself with the x64 calling convention, which has changed quite a bit from x86.  The first 4 arguments are passed in registers (RCX, RDX, R8 and R9) rather than on the stack, there are many more callee-saved registers, and more obviously all registers and pointers have been extended to 64 bits.

Below is a small demo showing how to implement co-routines in visual studio for x64 targets.

#include <stdio.h>

typedef unsigned long long u64;
typedef void * user_t;
typedef void *stack_t;

typedef void (*cofunc_t)( stack_t *token, user_t arg );

/* coroutine functions (implemented in the assembly file) */
extern "C" void yield_( stack_t *token );
extern "C" void enter_( stack_t *token, user_t arg );

/* artificial stack, 16 byte aligned as the x64 ABI expects */
const int nSize = 1024 * 1024;
alignas( 16 ) static char stack[ nSize ] = { 0 };

/* prepare a coroutine stack */
void prepare( stack_t *token, void *stack, u64 size, cofunc_t func )
{
    u64 *s64     = (u64*)( (char*)stack + size);
         s64    -= 14;                 // 10 items plus 4 slots of shadow space
         s64[0]  = 0;                  // R15
         s64[1]  = 0;                  // R14
         s64[2]  = 0;                  // R13
         s64[3]  = 0;                  // R12
         s64[4]  = 0;                  // RSI
         s64[5]  = 0;                  // RDI
         s64[6]  = (u64) s64 + 64;     // RBP
         s64[7]  = 0;                  // RBX
         s64[8]  = (u64) func;         // return address
         s64[9]  = (u64) yield_;       // coroutine return address
                                       // s64[10..13] shadow space for func
          *token = (stack_t) s64;      // save the stack for yield
}

/* coroutine function */
void threadFunc( stack_t *token, user_t arg )
{
    for ( int i=0; i<10; i++ )
    {
        printf( "  coroutine %d\n", i );
        yield_( token );
    }
}

/* program entry point */
int main( )
{
    stack_t token = nullptr;

    /* prepare the stack */
    prepare( &token, stack, nSize, threadFunc );
    /* enter the coroutine */
    enter_( &token, (void*)0x12345678 );
    /* simple test loop */
    for ( int i=0; i<10; i++ )
    {
        printf( "main thread %d\n", i );
        yield_( &token );
    }
    /* program done */
    printf( "program exit\n" );
    getchar( );
    return 0;
}



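;---- ---- ---- ---- ---- ---- ---- ----
; assemble with masm in x64 mode (ml64)
.code

;---- ---- ---- ---- ---- ---- ---- ----
; coroutine yield function
;   : void yield_( void * token );
;   'token' -> RCX
yield_ proc

    ; save the callee-saved registers of the current context
    push RBX
    push RBP
    push RDI
    push RSI
    push R12
    push R13
    push R14
    push R15

    ; swap the stack pointer with the one stored in the token
    mov  RAX ,  RSP
    mov  RSP , [RCX]
    mov [RCX],  RAX

    ; restore the callee-saved registers of the new context
    pop R15
    pop R14
    pop R13
    pop R12
    pop RSI
    pop RDI
    pop RBP
    pop RBX

    ; return into the new context
    ret

yield_ endp

;---- ---- ---- ---- ---- ---- ---- ----
; enter a co-routine
;   : void enter_( void * token, void * arg1, ... );
;   'token'     -> RCX
;   'arg1, ...' -> RDX, R8, and R9
enter_ proc

    jmp yield_

enter_ endp

end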