random.h is a header only random number generation library for games, image processing and procedural content generation.

A few years back I wrote some handy floating point random number generation routines for a raytracer.  Time and again, I have needed these routines for other programs I have been working on.  It seems like a good idea to wrap them up in a small header and release them in the public domain for others to use too.

Each RNG function takes an 64bit integer as a seed value and permutes it in place.  The resulting bits are used for the generation of the result.  I like this design as it keeps the RNG functions stateless and thread safe.  You just have to keep track of the uint64_t you have been using as your RNG stream.

Included are various 1, 2 and 3 dimentional random number generators with a range of different distributions.

Triangular random noise is very usefull when dithering audio and images during the process of bitdepth conversion.  Typicaly, triangular noise is added to the signal immediately before type casting (at the same magnitude as the LSB of the destination bit depth).

The 2d and 3d vector RNG functions generate statisticaly unbiased vectors inside and on the unit circle or sphere (Generating vectors by randomizing the components and normalizing would bias the vector to the corners of unit cube).  I have used these functions successfully in the past for generating ambient occlusion maps.

I wrote a little program to generate some images to display the output distribution of each function.  A 3d point is generated by filling its component from the RNG function under test, and the resulting point is projected in space, as well as projected on each component axis (the red, blue and green planes).  Hopefully the images help a little to convey the distribution of each function.

Again, you can find the library here:

randfu() - unsigned random 
range [0,1]


randfs() - signed random 
range [-1,+1]


trandfs() - triangular random 
range[-1,+1], tends to 0


grandfs() - gaussian random 
range ~[-1,+1], tends to 0


vrand3d() - 3d vector random 
magnitude range: [0, 1]


nvrand3d() - normalized 3d vector random 
magnitude = 1


vrand2d() - 2d vector random
magnitude range: [0,1]


nvrand2d() - normalized 2d vector random
magnitude = 1


Binary Heap

Despite lack of activity here I have been very busy programming cool stuff which I hope to get around documenting and sharing with you all.

I need to implement the A* algorythm for one of my projects. Its a fairly common to speed up of the implemention to use a binary heap for the open list because of its O(log(n)) insertion and O(1) removal properties.  As I code it up I thought it would be nice to share my binary heap implementation, something which can be a bit fiddly to get right in practice.  I wrote a little fuzz tester to nuke it, so that I can be fairly sure it works correctly.

Gist: BinaryHeap

I chose to make the size fixed at compile time for simplicity, and becaues I am happy to terminate a pathfinding search that exceeds a fixed open list size, as old school games used to do.  It should however not require much work at all to grow the heap size when a node is pushed when currently at full capacity.

I will also leave you with this picture from my latest project:


Coroutines, x64 and Visual Studio

I really like co-routines, finding them really useful for programming game AI and scripting.  Since visual studio has dropped support for inline assembly when compiling x64 code I thought I would be out of luck trying to implement co-routines for this target.  Fortunately visual studio can still compile assembly files using masm in x64 mode.  In order to implement co-routines I had to familiarities myself with x64 calling conventions, something that has changed quite a bit from x86.  The first 4 arguments are passed on the stack, there are many more callee save registers, and more obviously all registers and pointers have been extended to 64bits.

Below is a small demo showing how to implement co-routines in visual studio for x64 targets.

#include <stdio.h>

typedef unsigned long long u64;
typedef void * user_t;
typedef void *stack_t;

typedef void (*cofunc_t)( stack_t *token, user_t arg );

/* coroutine functions */
extern "C" void yield_( stack_t *token );
extern "C" void enter_( stack_t *token, user_t arg );

/* artificial stack */
const int nSize = 1024 * 1024;
static char stack[ nSize ] = { 0 };

/* prepare a coroutine stack */
void prepare( stack_t *token, void *stack, u64 size, cofunc_t func )
    u64 *s64     = (u64*)( (char*)stack + size);
         s64    -= 10;                 // 10 items exist on stack
         s64[0]  = 0;                  // R15
         s64[1]  = 0;                  // R14
         s64[2]  = 0;                  // R13
         s64[3]  = 0;                  // R12
         s64[4]  = 0;                  // RSI
         s64[5]  = 0;                  // RDI
         s64[6]  = (u64) s64 + 64;     // RBP
         s64[7]  = 0;                  // RBX
         s64[8]  = (u64) func;         // return address
         s64[9]  = (u64) yield_;       // coroutine return address
          *token = (stack_t) s64;      // save the stack for yield

/* coroutine function */
void threadFunc( stack_t *token, user_t arg )
    for ( int i=0; i<10; i++ )
        printf( "  coroutine %d\n", i );
        yield_( token );

/* program entry point */
int main( )
    stack_t token = nullptr;

    /* prepare the stack */
    prepare( &token, stack, nSize, threadFunc );
    /* enter the coroutine */
    enter_( &token, (void*)0x12345678 );
    /* simple test loop */
    for ( int i=0; i<10; i++ )
        printf( "main thread %d\n", i );
        yield_( &token );
    /* program done */
    printf( "program exit\n" );
    getchar( );



;---- ---- ---- ---- ---- ---- ---- ----
; coroutine yield function
;   : void yield_( void * token );
;   'token' -&amp;gt; RCX
yield_ proc

    push RBX
    push RBP
    push RDI
    push RSI
    push R12
    push R13
    push R14
    push R15

    mov  RAX ,  RSP
    mov  RSP , [RCX]
    mov [RCX],  RAX

    pop R15
    pop R14
    pop R13
    pop R12
    pop RSI
    pop RDI
    pop RBP
    pop RBX


yield_ endp

;---- ---- ---- ---- ---- ---- ---- ----
; enter a co-routine
;   : void enter_( void * token, void * arg1, ... );
;   'token'     -&amp;gt; RCX
;   'arg1, ...' -&amp;gt; RDX, R8, and R9
enter_ proc

    jmp yield_

enter_ endp


Fuzzy Compression


I recenty programmed a very strange and almost worthless image compressor for fun.

The image on the left is 230400 bytes and when reduced to greyscale takes up 76800 bytes.  The image on the right was reconstructed from 3072 bytes… thats 4% of the original size.  The compression time was somewhere in the region of an hour running on a single core.  It is very very lossy.

The process is fairly cool however.  The compressed image is reconstructed by performing 512 random walks over the screen, each leaving a pixel trail behind.  If the start locations and random number seeds for each random walk is computed correctly then the final result can be trained to produce any desired image.  The training I used for this demo was a very basic genetic algorythm / progressive refinement technique.  I imagine if left to its own devices to learn for a few days, the output image could look much cleaner, but I dont have that kind of patience.

The concept has been done before and much better:

see here: http://www.pouet.net/prod.php?which=62917

It seems the creator of this demo managed to fit that into 256 bytes including the unpacker.  Here are some of the things he says about his demo:

There are 64 pseudo random brush strokes of cycling color. Each stroke is shorter than previous to increase details. Intro code generates set of random numbers starting from initial 32bit seed with Galois LFSRs as a PRNG. This set is used to control direction of brush movement and and starting position. Each stroke has initial 16bit value that influences the seed. Therefore data is 4 bytes (initial seed) + 64*2 bytes of stroke PRNG seeds. Picture of Mona Lisa (3072 bytes) was compressed to those random brush strokes by external optimization program (a few days of work for modern CPU/GPU combo). The process can be seen as lossy compression (23x) of picture into random brush movements.

The seeds were found with iterative brute-force approach. Each brush stroke is a layer which covers the previous one, therefore is mostly independent in calculations. For each layer there is a 16bit seed to be found that generates the best picture possible. Therefore there are 64 layers * 65536 possible values per layer and the program iterates through all of them (4 million of combinations – acceptable). This should be done for each initial 32bit seed (means *4 billion combinations) but can be performed in parallel. I’ve checked only a few thousands of randomly chosen initial seeds to find a nice looking picture. Details mask had to be used to multiply difference error on critical parts of the picture (eyes, nose, mouth). Otherwise too many details were put in the background.

Minecraft 4K C++ Port

A while ago I discovered Minecraft4k, a java demo written by Notch fitting into a 4kb object file.  He used software raycasting to rasterize a large 3d array of voxels. The original demo can be played here as an applet:


At the same time I found that Notch had made a Javascript port of the same basic engine, minus user input.


One awesome thing about these two demos is that all of the textures are synthesized proceduraly for space saving reasons.

Seeing his work, I got a strong urge to port the code to C++ and SDL.

Grab the source code here:


Playing with MicroThreads

I recently bought Game Programming Gems 2 and was captivated by an article titled ‘Micro-Threads for Game Object AI’.

A really neat technique is describes that can performing very fast cooperative multitasking, also sometimes referred to as co routines.  The heart of this technique involves manipulating the processor stack pointer to switch between multiple contexts of execution.  Its very similar in concept to true multitasking, however each thread decides when to share the processor.  The concept is simple enough, but really so powerful and radically different from many other techniques for AI I have used.  I knew I would have to program up a text to play with this idea.  Using this method, each object in my game can pretend it is running in its own thread and I am sure this can find application elsewhere too.

The article provides a nice description, but doesn’t provide enough information to directly copy an implementation, and my copy arrived without the CD, but that is fine however and I am having fun of working it out my own.  I have made a few projects in the past which manipulated the CPU registers to perform various tricks, so I can build upon that knowledge.

My first implementation appeared to operate correctly until exit where my call to delete[] threw up an assert.  This is because the fake stack was getting sprayed with bad data somehow from within my virtual thread. I will be spending lots of time in the debugger until I find out the cause.

Update 1:

I believe that the debug run time version of printf() is writing past the end of the 1kb stack, causing a buffer overflow. Hmm….

update 2:

I went back and read the article all over again and there were many hints that I simply didn’t have enough stack space at all.  The author also proposed a very nice improvement, where we effectively extend the stack size by copying each micro thread into much larger stack before execution.  Thus the Micro Thread code can make use of a large stack space, but individually they only take up a small amount of space.  After execution, we extract the smaller part of the stack again.  This places the limitation that _yield() cant be called deep in some code in the thread, but that is hardly a restriction at all.

Here is my demo code for anyone that want to have a look at its implementation.  I wrapped it up with a little API as well so it should be somewhat readable.


update 3:

My brain must have been thinking of improvements overnight since I woke up with three ideas I had to experiment with. In each case it was to address the stability and protect against stack overflows. The most major improvement is the use of VirtualProtect(), which is used here to set the bottom of the shared stack region to a guard page. This wastes 4k of stack space, but it is well worth it since now a nice exception is thrown if we write to the stack anywhere in the bottom 4k.

The second improvement, is to check the value of the stack pointer upon return from a micro thread. If it falls within the space allocated to each thread then everything is fine, otherwise we will need to allocate more stack space for that thread. If this check fails, and esp lies outside of our thread space, then we can be sure that we wont be saving all of that threads state during the task switch. That would lead to very very bad things.

A final improvement is to address what happen when the micro thread function returns. Currently, there is no return address on the stack, so execution could jump literally anywhere.
The solution is to ensure that we have a valid address for this function to return to. We can set the return address to a custom function that can handle the situation on one of two ways. We can setup the stack and processor state so that it will execute the micro thread function once more, thus it will feel like our micro thread function is permanently called. Alternatively we can mark that thread as finished, switch back to our main thread, and then kill the micro thread cleanly. Since both options have merit, I will implement both of then and let the user specify the handling during thread creation.

update 4:

I am now playing with Vectored Exception handlers to see if I can recover from stack overflows, and automatically terminate a malfunctioning thread.  I have confirmed that an exception is being thrown when I intentionally overflow the stack from within a micro thread.  It shouldn’t be too hard to switch the context back from within a vectored exception handler.  I remember having played around with something similar, having self modifying code from an exception handler if a debugger is detected.  According to the original article, it states that structured exceptions no longer work correctly from micro threads since no handlers are on the stack, perhaps this is something I can look into addressing also.

update 5:

Here is the latest version of the library provided as a source release.  The vectored exception handler really works amazingly well, terminating any micro thread that attempts to overwrite its stack.  If a micro thread returns from its function, the thread will terminate as well.  There are many more stability improvements and validity check making this quite a viable option for use in any project now.  I have included a small test program also so that the functionality can be demonstrated.


Lode Storm – 7 Day RTS Compo

I have for the last few days been squinting in the dark at my monitor in the early hours of the morning.  I couldn’t resist entering this months Mini Ludum Dare since the challenge was to make a RTS game in 7 days.  An RTS game is something that I have always have wanted to make, from a technical point of view as much as anything.  I spent countless hours playing C&C, Dune2 and dungeon keeper in my childhood and the AI in this genre in particular always fascinated me.

My initial research on RTS AI directed me to some great papers written about potential fields and their use in the RTS genre.  This is a method I had read about previously, but these papers really gave me the vision for just how powerful they could be.  The papers can be found a the bottom of this great article at AIGameDev. http://aigamedev.com/open/tutorials/potential-fields/.

All the programming I had done on my framework is now being tested to the limit for this project, and so far (touch wood) it hasn’t failed me. Additionally, I am particularly proud of my efforts on this project because I have made all of the graphics myself from scratch and this really is my most complete game to date, since I usually get sidetracked while working on the technical details of a project.  Provided below is a screenshot of the current work in progress.  Several potential fields are being displayed for debugging by eye in the lower left corners.  Top left shows the fog of war maps for both the human player and the AI player respectively.


Binding Tiny C Compiler

Years ago I found a interesting library called TinyCC, which provided a simple and small C compiler with one major twist…  it can be embedded inside another application.  Because of this, a program can compile new functions for itself at run time and call them using function pointers.  In a sense it can be used like an immensely powerful scripting language, with the benefit that there is no speed hit due to interpretation or emulation.  Its weakness as a scripting language is that you have to account for portability and security problems due to the execution of these ‘scripts’ running as native code on the host processor.

So far, I have used TinyCC for a few projects in the past and recently in a texture generator as well as my text adventure engine.  The reason for this post however, is that TinyCC is not the most intuitive thing to embed into a windows application, so I wanted to take a moment to explain one clean method to do so.

There is no static library for use with visual C++ so that option is out, however since a fully compiled DLL file is included in the download package, it seemed that it would be easiest to dynamically link the DLL through code at run time.  This requires loading the DLL and requesting all of the functions that are exported from it which is a fairly easy operation on any platform and generally differnt OS’s have much the same interface for doing so. The only tricky part is actually specifying all of the annoying function pointer casts correctly during binding.

Without farther talk, here is some straight forward code to load TinyCC and bind the interface for you. Which I hope it should all be fairly straight forward and easy to understand.



Let the adventures begin

Recently my girlfriend asked me if I would help her program a text adventure.  She has wanted to learn the program in C for a long time and also has been looking for a project we could work on together.  I thought this was an awesome idea, so I set about programming a basic shell of an adventure engine for her to work with.  I wanted to take the load off her so she can focus on developing the story rather then dealing with blitting, messaging, etc.

I managed to pack the entire DOS font of 256 characters in to 3k of space and embed it into the program, which was pretty cool, so each bit encodes only one pixel.  I could perhaps have achieved more using RLE but there really isn’t any need to go that far.

My library emulated a console screen of 40×20 characters.  It has a cursor that can be moved, text and symbols can be written to the screen and it handles user input,  so in short, all of the basics for a text adventure.  There is no scrolling implemented yet however since that didn’t seem necessary.  Conceivably an awesome rogue like could be easily written with this library.

Plain text however just seems too digital and static.  I wanted something more like and old analogue CRT monitor with a little too much interference, and misaligned deflection coils.  So a second achievement was to finally get around to writing my ‘ghetto blur’ as suggested long ago for my yet unnamed dungeon game.  I managed to do this efficiently using some cool tricks, using only ANDs, ORs and shifts.  It looks fairly retro and below you can see it for yourself, however the magic is seeing it animated as it subtly flickers and wobbles.