Wednesday, June 26, 2013

Implementing a generic single value container in c++

In my previous post I explained the format of the script system for ZEngine. Each Puzzle has a Results section which essentially stores function names and their arguments:
results {
    action:assign(5985, 0)
    background:music:5252(1 a000h1tc.raw 1)    
I wanted to be able to store each action inside a struct, and then have a linked list of all the structs. However, the problem is that both the number of arguments and the size of the arguments are variable. Marisa Chan's solution was to store all the arguments in a space delimited char array. IE:
char arguments[25] = "1 a00h1tc.raw 1";

Simple, but not without it's problems.
  1. Since the char array is in a struct, the size is fixed. In order to make sure we never overflow, we have to allocate a fairly large array. That said, in this particular case, each 'large' array in this case would only be ~30 bytes per struct.
  2. By storing everything as strings, we put off parsing till the action function is actually called. At first glace, this doesn't seem too bad, since the data will have to be parsed anyway. However, this method forces it to be parsed at every call to that action function.

Another option was to have everything stored in a linked list of void pointers. However, I don't think I need to convince anyone that void pointers are just gross and using them would be just asking for problems.

What I really wanted was a typed way to store a variably typed (and sized) value. Therefore I created what I'm calling the "Object" class. (I'm up for suggestions for a better name)

The heart of the class is a union that stores a variety of pointers to different types and an enum that defines what type is being stored:
class Object {
    enum ObjectType : byte {

    ObjectType _objectType;

    union {
        bool *boolVal;
        byte *byteVal;
        int16 *int16Val;
        uint16 *uint16Val;
        int32 *int32Val;
        uint32 *uint32Val;
        float *floatVal;
        double *doubleVal;
        Common::String *stringVal;
    } _value;
_objectType keeps track of what type of data the object is storing and _value points to the actual data. If _value were instead to hold the actual data value, the union would be forced to sizeof(Common::String), which is quite large (~34 bytes), due to internal caching. Then we're back to the argument of storing things in containers much larger than what they need. By putting the data on the heap and only storing pointers to the data, we save the wasted space, but at the CPU cost of heap allocation.

Now that the data is stored, how do we get it back? My original idea was to have implicit cast operators:
operator bool();
operator byte();
operator int16();
However, LordHoto, one of the GSoC mentors and ScummVM developers, brought my attention to the problems that can arise when using implicit casting. For example, a user could try to cast the data to a type that wasn't stored in the Object and the cast would work, but the data would be completely corrupted. Also, from a user point of view, it wasn't intuitive.

Therefore, I removed the cast operators and created accessor methods:
bool getBoolValue(bool *returnValue) const;
bool getByteValue(byte *returnValue) const;
bool getInt16Value(int16 *returnValue) const;

bool Object::getBoolValue(bool *returnValue) const {
    if (_objectType !=  BOOL) {
        warning("'Object' not of type bool.");
        return false;

    *returnValue = *_value.boolVal;
    return true;
This adds a layer of type semi-protection to the class.

Lastly, I added assigment operators to the class, but rather than making this post even longer, I'll just link the full source here and here.

Advantages of 'Object' class
  • Can store relatively 'any' type of data. (Any type not currently supported could be trivially added)
  • Only uses as much space as needed.
  • Transforms dynamically typed data into a statically typed 'box' that can be stored in arrays, linked lists, hashmaps, etc. and can be iterated upon
Disadvantages of 'Object' class
  • Adds a small memory overhead per object. ( 1 byte + sizeof(Operating System pointer) )
  • Adds one heap memory allocation per object

So is it better than Marisa Chan's implementation? It really depends on what you define as better. While it does save memory, only requires data to be parsed once, and, in my opinion, adds a great deal of elegance to handling the Results arguments, it does so at the cost of heap storage. Not only the cost of the initial allocation, but the cost of potential defragmentation runs. But then again, is the cost of heap storage really that big, especially since the data should have a relatively long life? (On average, the time an end user spends in a room in the game) That I don't know, since it all depends on the memory allocator implementation.

In the end, I believe both methods perform well, and as such I choose the eloquence of using the 'Object' class. I am very much open to your thoughts on both the class as a whole or on your take of the problem. Also, if I misspoke about something please, please, please let me know.

Thanks for reading and have fun coding,

Edit: Upon further inspection I noticed that by using Common::String I'm not only negating any memory size benefits from using 'Object', but potentially even using more memory, since Common::String has such a huge size.
Marisa Chan:
char arguments[25] = "1 a00h1tc.raw 1";
size = 25;

Object arg1 = 1;
Object arg2 = "a00h1tc.raw";
Object arg3 = 1;

size = (3 *sizeof(Object)) + sizeof(byte) + sizeof(Common::String) + sizeof(byte);
size = 15 + 1 + 34 + 1;
size = 51;
I could instead store the data in a char array, but it would mean that the Object class would need a stringLength member, which adds another 1 - 4 bytes on every instance of the class. Even with this new insight, I think I will continue to use 'Object', again for the added eloquence it adds. The memory difference is still rather small.


  1. How I remember, in original game engine parser creates action objects(with abstract base class) with constructor that parse arguments and store it in internal variables.
    Yeah, original game engine was writen on c++ with classes and other things. Why I created it like C code, because I don't know how OOP code looks in assmbler and I want's to write it with minimal use of memory + optimized rendering speed.

  2. If you're just using this for per-room puzzle results then indeed it probably doesn't matter too much. And as I said before, it's the kind of thing you worry about later. :)

    Note that you can avoid the heap memory allocation for non-string objects by just putting them directly in the union: you end up using a little more memory on the base class on 32-bit architectures if you do it with double too (since that is (almost universally) 8 bytes), but the overhead of the individual allocations would be more than that.

    Also -- while I'm rambling -- I imagine that your object type member is likely to end up actually using 4/8 bytes whether you try and force it to be a byte or not, whether it gets padded in the class itself or whether it gets padded by the allocator.

    1. Ooooo, good ideas! And, yea... I totally forgot about alignment and padding.

  3. Yes, in the original engine each action is a seperate class which parses itself and stores the arguments in normal class members. It then has a virtual method (something like "execute") which then does the actual work.

    1. Hmmm, that could work nicely. But damn that's that a lot of classes... lol

    2. However it's good practice and classes will contain two/three functions - constructor and execute action. After GSoC I want to make changes for up level of compatibility with original engine (I found some new things but I don't want to clog your brain).
      Making results actions and controls as classes - it's right way :)

  4. Remember Knuth's immortal words: premature optimization is the root of all evil.
    Especially in something not performance critical as this.