Friday, September 13, 2013

I never knew moving a lever could be so hard

In the Zork games, there are various switches, twist knobs, and turntables:

These are all controlled by the Lever control:
control:624 lever {
    descfile(knocker.lev)
    cursor(handpt)
}

The knocker.lev file looks like this:
animation_id:631~
filename:te2ea21c.rlf~
skipcolor:0~
anim_coords:200 88 343 315~
mirrored:1~
frames:11~
elsewhere:0 0 511 319~
out_of_control:0 0 511 319~
start_pos:0~
hotspot_deltas:42 39~
0:241 252 D=1,90 ^=P(0 to 1) P(1 to 0) P(0 to 1) P(1 to 0) E(0)~
1:234 260 D=2,90 D=0,270 ^=P(1 to 0) E(0)~
2:225 258 D=3,90 D=1,270 ^=P(2 to 0) P(0 to 1) P(1 to 0) E(0)~
3:216 255 D=4,90 D=2,270 ^=P(3 to 0) P(0 to 1) P(1 to 0) E(0)~
4:212 234 D=5,90 D=3,270 ^=P(4 to 0) P(0 to 2) P(2 to 0) E(0)~
5:206 213 D=6,90 D=4,270 ^=P(5 to 0) P(0 to 3) P(3 to 0) E(0)~
6:212 180 D=7,90 D=5,270 ^=P(6 to 0) P(0 to 3) P(3 to 0) E(0)~
7:214 147 D=8,90 D=6,270 ^=P(7 to 0) P(0 to 4) P(4 to 0) E(0)~
8:222 114 D=9,90 D=7,270 ^=P(8 to 0) P(0 to 5) P(4 to 0) E(0)~
9:234 106 D=10,90 D=8,270 ^=P(9 to 0) P(0 to 5) P(4 to 0) E(0)~
10:234 98 D=9,270~
  • animation_id is unused.
  • filename refers to the animation file used.
  • skipcolor is unused.
  • anim_coords refers to the location where the control will be rendered.
  • mirrored says that the reverse of the animation is appended to the end of the file. Ex: 0, 1, 2, 3, 3, 2, 1, 0
  • frames refers to how many animation frames there are. (If mirrored = 1, frames = animationFile::frameCount / 2)
  • elsewhere is unused.
  • out_of_control is unused.
  • start_pos refers to the first animation frame used by the control.
  • hotspot_deltas refers to the width and height of the hotspots used to grab the control with the mouse.

The last section is a bit tricky. It's formatted like so:
[frameNumber]:[hotspotX] [hotspotY] D=[directionToFrame],[directionAngle] .....(potentially more directions) ^=P([from] to [to]) P([from] to [to]) ... (potentially more return paths) E(0)~

  • frameNumber corresponds to the animationFile frame that should be displayed when the lever is in that state
  • hotspotX is the X coordinate of the hotspot rectangle in which the user can grab the control
  • hotspotY is the Y coordinate of the hotspot rectangle in which the user can grab the control

D refers to "Direction". Let's say we're at frame 0. D=1,90 means: "To get to frame 1, the mouse needs to be moving at a 90 degree angle." (I'll cover how the angles work in a bit)

P refers to "Path". This describes which frames should be rendered after the user lets go of a control. For example, let's say we let go of the knocker at frame 6. The .lev file reads: ^=P(6 to 0) P(0 to 3) P(3 to 0). This says to render every frame from 6 to 0, then every frame from 0 to 3, then every frame from 3 to 0. So written out:
6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 3, 2, 1, 0

This allows for some cool effects such as the knocker returning to the lowest position and bouncing as though it had gravity.
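
To make the path playback concrete, here's a minimal sketch of how the return paths could be expanded into a list of frames to render, assuming they've already been parsed into (from, to) pairs. The names here are hypothetical, not the actual engine code:

#include <vector>
#include <utility>

// Hypothetical helper: expand parsed return paths into an ordered list of
// frame numbers to render after the user releases the lever.
std::vector<int> expandReturnPaths(const std::vector<std::pair<int, int> > &paths) {
    std::vector<int> framesToRender;
    for (size_t i = 0; i < paths.size(); ++i) {
        int from = paths[i].first;
        int to = paths[i].second;
        int step = (to >= from) ? 1 : -1;
        // Walk from 'from' to 'to' inclusive, e.g. P(6 to 0) -> 6, 5, 4, 3, 2, 1, 0
        for (int frame = from; frame != to + step; frame += step) {
            framesToRender.push_back(frame);
        }
    }
    return framesToRender;
}

// For frame 6 of knocker.lev, {{6, 0}, {0, 3}, {3, 0}} expands to
// 6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 3, 2, 1, 0 -- the same sequence as above.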


So what is that angle I was talking about? It refers to the direction the mouse is moving while the user is holding down the left mouse button.
So let's go over a typical user interaction:
  1. User hovers over the control. The cursor changes to a hand.
  2. User presses down the left mouse button
  3. Test if the mouse is within the current frame's hotspot
  4. If so, begin a drag:
    1. Calculate the distance between the last mouse position and the current one
    2. If it's over 64 (a heuristic), calculate the angle. (Only calculating the angle when we're sufficiently far from the last mouse position saves calculations and makes the lever less "twitchy".)
    3. Test the angle against the directions (see the sketch after this list)
    4. If one passes, render the new frame
  5. User moves the mouse a couple more times
  6. User releases the left mouse button
    1. Follow any return paths set out in the .lev file

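Here's a rough sketch of what the distance check and direction test could look like in code. The "64" threshold comes from the description above; everything else (the names, the angle convention, the tolerance) is my own guess, not the actual engine source:

#include <cmath>
#include <cstdlib>
#include <vector>

struct Point { int x, y; };

struct Direction {
    int toFrame; // frame to move to
    int angle;   // degrees, as written in the .lev file (e.g. 90, 270)
};

// Hypothetical helper: decide whether the mouse movement since the last
// sample should move the lever to a new frame.
bool testDrag(Point lastPos, Point currentPos,
              const std::vector<Direction> &directions, int &newFrame) {
    int dx = currentPos.x - lastPos.x;
    int dy = currentPos.y - lastPos.y;

    // Only react once the mouse has moved far enough (the "64" heuristic).
    // Comparing squared distances avoids a sqrt.
    if (dx * dx + dy * dy < 64 * 64) {
        return false;
    }

    // Angle of movement in degrees. Screen y grows downward, so dy is negated
    // to get a conventional counter-clockwise angle (an assumption on my part).
    const double kPi = 3.14159265358979323846;
    int angle = (int)(std::atan2((double)-dy, (double)dx) * 180.0 / kPi);
    if (angle < 0) {
        angle += 360;
    }

    // Accept a direction if the movement is within some tolerance of it
    // (the 30 degree tolerance here is a guess).
    for (size_t i = 0; i < directions.size(); ++i) {
        int diff = std::abs(angle - directions[i].angle);
        if (diff > 180) {
            diff = 360 - diff;
        }
        if (diff <= 30) {
            newFrame = directions[i].toFrame;
            return true;
        }
    }
    return false;
}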

And that's it! Let me know if you have any questions or comments. The full source code can be found here and here. Until next time,

-RichieSams

One frame at a time

So, we're entering the final weekend before the soft 'pencils down' for GSoC. It's been very busy since university here in the US started 3 weeks ago; I've been juggling homework, labs, and work on this project. But you're not here for that, so let's launch into the actual post:

Animations in-game come in two formats: AVI and a custom format called RLF. AVI is simple because I can use the ScummVM AVI decoder. But I had to reverse engineer the RLF file format so it can be played as a video or frame by frame.

Before I go into the format of the file, I want to explain the general schema of animations, or more specifically, video frame compression techniques. (For another reference, the article here is pretty good.) The first frame of an animation has to include every pixel, aka, no compression. These frames are called I-frames or key frames. For the next frame, we could store every pixel, but that seems kind of wasteful. Instead, we store only the pixels that changed between the last frame and this frame. These are called P-frames. Optionally, a frame can also store the pixels that changed between the next frame and this frame. These are called B-frames. This allows animations to be played forwards or backwards. With P-frames, we can only go forwards.

In order to seek within an animation, we have to find the closest I-frame, then add P/B-frames until we're at the frame we want. To make this less painful for long animations, video encoders insert I-frames every 120 frames or so. (About 1 I-frame every 5 seconds, assuming 24 fps: 24 * 5 = 120.)

RLF files only use I-frames and P-frames. If they need to go backwards, the whole animation is encoded both forwards and backwards. For example: 0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0
It seems pretty wasteful in my opinion, but that's what they do.

The RLF file starts off with a header to describe the information it contains:
bool RlfAnimation::readHeader() {
    if (_file.readUint32BE() != MKTAG('F', 'E', 'L', 'R')) {
        return false;
    }

    // Read the header
    _file.readUint32LE();                // Size1
    _file.readUint32LE();                // Unknown1
    _file.readUint32LE();                // Unknown2
    _frameCount = _file.readUint32LE();  // Frame count

    // Since we don't need any of the data, we can just seek right to the
    // entries we need rather than read in all the individual entries.
    _file.seek(136, SEEK_CUR);

    //// Read CIN header
    //_file.readUint32BE();          // Magic number FNIC
    //_file.readUint32LE();          // Size2
    //_file.readUint32LE();          // Unknown3
    //_file.readUint32LE();          // Unknown4
    //_file.readUint32LE();          // Unknown5
    //_file.seek(0x18, SEEK_CUR);    // VRLE
    //_file.readUint32LE();          // LRVD
    //_file.readUint32LE();          // Unknown6
    //_file.seek(0x18, SEEK_CUR);    // HRLE
    //_file.readUint32LE();          // ELHD
    //_file.readUint32LE();          // Unknown7
    //_file.seek(0x18, SEEK_CUR);    // HKEY
    //_file.readUint32LE();          // ELRH

    //// Read MIN info header
    //_file.readUint32BE();          // Magic number FNIM
    //_file.readUint32LE();          // Size3
    //_file.readUint32LE();          // OEDV
    //_file.readUint32LE();          // Unknown8
    //_file.readUint32LE();          // Unknown9
    //_file.readUint32LE();          // Unknown10
    _width = _file.readUint32LE();   // Width
    _height = _file.readUint32LE();  // Height

    // Read time header
    _file.readUint32BE();                    // Magic number EMIT
    _file.readUint32LE();                    // Size4
    _file.readUint32LE();                    // Unknown11
    _frameTime = _file.readUint32LE() / 10;  // Frame time in microseconds

    return true;
}

The magic number 'FELR' refers to the run-length encoding used in the file. I'll explain the specifics later on. I'm kind of curious what all the extra information in the header is used for, so if you guys have any ideas, I'm all ears. The useful information is pretty self-explanatory.

After the header is the actual frame data. Each frame also has a header.
RlfAnimation::Frame RlfAnimation::readNextFrame() {
    RlfAnimation::Frame frame;

    _file.readUint32BE();                        // Magic number MARF
    uint32 size = _file.readUint32LE();          // Size
    _file.readUint32LE();                        // Unknown1
    _file.readUint32LE();                        // Unknown2
    uint32 type = _file.readUint32BE();          // Either ELHD or ELRH
    uint32 headerSize = _file.readUint32LE();    // Offset from the beginning of this frame to the frame data. Should always be 28
    _file.readUint32LE();                        // Unknown3

    frame.encodedSize = size - headerSize;
    frame.encodedData = new int8[frame.encodedSize];
    _file.read(frame.encodedData, frame.encodedSize);

    if (type == MKTAG('E', 'L', 'H', 'D')) {
        frame.type = Masked;
    } else if (type == MKTAG('E', 'L', 'R', 'H')) {
        frame.type = Simple;
        _completeFrames.push_back(_lastFrameRead);
    } else {
        warning("Frame %u doesn't have type that can be decoded", _lastFrameRead);
    }

    _lastFrameRead++;
    return frame;
}

If a frame is of type ELHD, it's a P-frame; if it's of type ELRH, it's an I-frame. We hold off decoding until we actually need to render the frame, which keeps memory use down.
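
For reference, the Frame struct used above is just a small container that keeps the still-encoded data around until render time. It looks roughly like this (the exact definition may differ slightly):

enum FrameType {
    Masked,   // P-frame: only the pixels that changed (type ELHD)
    Simple    // I-frame: a complete frame (type ELRH)
};

struct Frame {
    FrameType type;
    int8 *encodedData;   // raw run-length-encoded pixel data
    uint32 encodedSize;  // size of encodedData in bytes
};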

So now we've read in all our data. How do we render a frame? The simplest case is to render the next frame. Note: _currentFrameBuffer is a Graphics::Surface that stores the current frame.
const Graphics::Surface *RlfAnimation::getNextFrame() {
    assert(_currentFrame + 1 < (int)_frameCount);

    if (_stream) {
        applyFrameToCurrent(readNextFrame());
    } else {
        applyFrameToCurrent(_currentFrame + 1);
    }

    _currentFrame++;
    return &_currentFrameBuffer;
}

void RlfAnimation::applyFrameToCurrent(uint frameNumber) {
    if (_frames[frameNumber].type == Masked) {
        decodeMaskedRunLengthEncoding(_frames[frameNumber].encodedData, (int8 *)_currentFrameBuffer.getPixels(), _frames[frameNumber].encodedSize, _frameBufferByteSize);
    } else if (_frames[frameNumber].type == Simple) {
        decodeSimpleRunLengthEncoding(_frames[frameNumber].encodedData, (int8 *)_currentFrameBuffer.getPixels(), _frames[frameNumber].encodedSize, _frameBufferByteSize);
    }
}

void RlfAnimation::applyFrameToCurrent(const RlfAnimation::Frame &frame) {
    if (frame.type == Masked) {
        decodeMaskedRunLengthEncoding(frame.encodedData, (int8 *)_currentFrameBuffer.getPixels(), frame.encodedSize, _frameBufferByteSize);
    } else if (frame.type == Simple) {
        decodeSimpleRunLengthEncoding(frame.encodedData, (int8 *)_currentFrameBuffer.getPixels(), frame.encodedSize, _frameBufferByteSize);
    }
}

The decode...() functions decode the frame data we read in earlier and blit it directly on top of the _currentFrameBuffer pixels in a single pass. I'll explain the details of each function further down.

You might be wondering what the _stream variable refers to. I've created the RlfAnimation class so that it can decode in two different ways: it can load all the data from the file into memory and then do all decoding/blitting from memory, or it can stream the data from the file, one frame at a time. The first option allows you to seek within the animation, but it uses quite a bit of memory (roughly the size of the file). The second option uses far less memory, but you can only play the animation forwards and cannot seek.
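
As a hedged usage sketch (I'm assuming a constructor that takes the file name and a streaming flag; the exact signature may differ):

// Fully loaded into memory: supports seeking, at the cost of
// roughly the file size in RAM.
RlfAnimation seekable("te2ea21c.rlf", false);
seekable.seekToFrame(5);

// Streamed from disk: minimal memory, but forward playback only.
RlfAnimation streamed("te2ea21c.rlf", true);
const Graphics::Surface *frame = streamed.getNextFrame();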

On to the decoding functions:

I-frames contain every single pixel within a frame. Again, we could store every one of these, but that would be kind of expensive. So we use a simple compression algorithm called Run Length Encoding. (There are tons of frame compression algorithms out there. This is just the one they chose to use). Consider this image:


And then, let's choose a specific line of pixels:


If we were to encode each pixel, we would need to store:
YYYYYYYBYYYYYYBYYYYYYY
where Y means yellow and B means black. That's a lot of repeated yellows. Let's instead store this:
7Y1B6Y1B7Y
The numbers represent how many of the following pixels are of the same color. So the decoder would interpret that as: render 7 yellow pixels, 1 black pixel, 6 yellow pixels, 1 black pixel, then 7 yellow pixels.
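
As a toy illustration of that scheme (not the real decoder, which works on 16-bit pixels and is shown further down), a decoder for these count-plus-color strings could look like this:

#include <string>

// Toy decoder for strings like "7Y1B6Y1B7Y": each digit is a run length,
// followed by the color character to repeat that many times.
// (Run lengths over 9 would obviously need more than a single digit.)
std::string decodeToyRle(const std::string &encoded) {
    std::string decoded;
    for (size_t i = 0; i + 1 < encoded.size(); i += 2) {
        int runLength = encoded[i] - '0';
        decoded.append(runLength, encoded[i + 1]);
    }
    return decoded;
}

// decodeToyRle("7Y1B6Y1B7Y") == "YYYYYYYBYYYYYYBYYYYYYY"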

The RLF files take this idea further. Consider this line of data, where G means green, R means red:
YYYYYBGRYBYGBYYYYYY
If we use the same method as before, we get:
5Y1B1G1R1Y1B1Y1G1B6Y
It's actually longer than the original data! If a color doesn't repeat, run-length encoding takes up more space than the raw pixels. To counter that, the RLF files do the following:
5Y-8BGRYBYGB6Y
If the number is negative, the next N pixels are copied directly to the destination. If it's positive, the next N pixels are filled with the color directly following the number.

Here's that algorithm in code form:
void RlfAnimation::decodeSimpleRunLengthEncoding(int8 *source, int8 *dest, uint32 sourceSize, uint32 destSize) const {
    uint32 sourceOffset = 0;
    uint32 destOffset = 0;

    while (sourceOffset < sourceSize) {
        int8 numberOfSamples = source[sourceOffset];
        sourceOffset++;

        // If numberOfSamples is negative, the next abs(numberOfSamples) samples should
        // be copied directly from source to dest
        if (numberOfSamples < 0) {
            numberOfSamples = ABS(numberOfSamples);

            while (numberOfSamples > 0) {
                if (sourceOffset + 1 >= sourceSize) {
                    return;
                } else if (destOffset + 1 >= destSize) {
                    return;
                }

                byte r, g, b;
                _pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);
                uint16 destColor = _pixelFormat565.RGBToColor(r, g, b);
                WRITE_UINT16(dest + destOffset, destColor);

                sourceOffset += 2;
                destOffset += 2;
                numberOfSamples--;
            }

        // If numberOfSamples is >= 0, copy one sample from source to the 
        // next (numberOfSamples + 2) dest spots
        } else {
            if (sourceOffset + 1 >= sourceSize) {
                return;
            }

            byte r, g, b;
            _pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);
            uint16 sampleColor = _pixelFormat565.RGBToColor(r, g, b);
            sourceOffset += 2;

            numberOfSamples += 2;
            while (numberOfSamples > 0) {
                if (destOffset + 1 >= destSize) {
                    return;
                }

                WRITE_UINT16(dest + destOffset, sampleColor);
                destOffset += 2;
                numberOfSamples--;
            }
        }
    }
}

To encode the P-frames, we use a method similar to the one above. Remember that P-frames are partial frames; they only include the pixels that changed from the last frame. An example pixel line could look like this, where O is a placeholder for an unchanged pixel (empty space):
OOOOBRGOOYYBRGOOOOO
To encode this we do the following:
4-3BRG2-5YYBRG5
If the number read is positive, the next N pixels should be skipped. If the number is negative, the next N pixels should be copied directly to the destination.

Here is that algorithm in code form:
void RlfAnimation::decodeMaskedRunLengthEncoding(int8 *source, int8 *dest, uint32 sourceSize, uint32 destSize) const {
    uint32 sourceOffset = 0;
    uint32 destOffset = 0;

    while (sourceOffset < sourceSize) {
        int8 numberOfSamples = source[sourceOffset];
        sourceOffset++;

        // If numberOfSamples is negative, the next abs(numberOfSamples) samples should
        // be copied directly from source to dest
        if (numberOfSamples < 0) {
            numberOfSamples = ABS(numberOfSamples);

            while (numberOfSamples > 0) {
                if (sourceOffset + 1 >= sourceSize) {
                    return;
                } else if (destOffset + 1 >= destSize) {
                    return;
                }

                byte r, g, b;
                _pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);
                uint16 destColor = _pixelFormat565.RGBToColor(r, g, b);
                WRITE_UINT16(dest + destOffset, destColor);

                sourceOffset += 2;
                destOffset += 2;
                numberOfSamples--;
            }

        // If numberOfSamples is >= 0, move destOffset forward ((numberOfSamples * 2) + 2)
        // This function assumes the dest buffer has been memset with 0's.
        } else {
            if (sourceOffset + 1 >= sourceSize) {
                return;
            } else if (destOffset + 1 >= destSize) {
                return;
            }

            destOffset += (numberOfSamples * 2) + 2;
        }
    }
}

Whew! Almost there. The last thing to talk about is frame seeking. This requires that you're not streaming directly from disk. (Well, you could do it, but it would probably be more trouble than it's worth.) As we read in the frames, we stored which frames were I-frames. So to seek to a frame, we iterate through that list of I-frames and find the one closest to (but not past) our destination frame. Then we use applyFrameToCurrent() to move from that I-frame to the destination frame:
void RlfAnimation::seekToFrame(int frameNumber) {
    assert(!_stream);
    assert(frameNumber < (int)_frameCount && frameNumber >= -1);

    if (frameNumber == -1) {
        _currentFrame = -1;
        return;
    }

    int closestFrame = _currentFrame;
    int distance = (int)frameNumber - _currentFrame;
    for (Common::List<uint>::const_iterator iter = _completeFrames.begin(); iter != _completeFrames.end(); iter++) {
        int newDistance = (int)frameNumber - (int)(*iter);
        if (newDistance > 0 && (closestFrame == -1 || newDistance < distance)) {
            closestFrame = (*iter);
            distance = newDistance;
        }
    }

    for (int i = closestFrame; i <= frameNumber; i++) {
        applyFrameToCurrent(i);
    }

    _currentFrame = frameNumber;
}
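
As a quick worked example: suppose nothing has been decoded yet (the current frame is -1), the complete frames (I-frames) are at 0 and 6, and we call seekToFrame(9). The loop picks 6 as the closest I-frame before 9, applyFrameToCurrent(6) decodes it (a Simple frame overwrites the whole buffer), and then the masked frames 7, 8, and 9 are applied on top of it.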

That's it! If you want to look at the full class, you can find it here and here. And as always, if you have ANY questions, feel free to comment. Happy coding!

-RichieSams