<h3>
Woes of the number 1.0</h3>
So, here I am, programming away on my path tracer. All is well in the world, when all of a sudden, I notice that after ~3 seconds of rendering (it's a progressive path tracer), black pixels start popping up on the screen. (One on the green sphere, a whole chain to the left of the bottom blue sphere, etc.)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://imgur.com/s1QyLky.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="360" src="https://imgur.com/s1QyLky.png" width="640" /></a></div>
<br />
<br />
Uhhhh, WHAT?!? WHY?!?!?<br />
<br />
This is really odd, since a progressive path tracer is pretty much a running average, so how the hell can a perfectly black pixel pop up out of nowhere? So, I slapped a conditional breakpoint on the framebuffer color splat, using the exact pixel coordinates of the black pixels, and sure enough, NaN color.<br />
<br />
Ok, that's all fine and dandy, but how am I getting a NaN color from the integrator? "Okay, look for all the things that can make a NaN......... AHA!!"<br />
<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px;"><span style="color: green;">// Get the new ray direction</span>
<span style="color: green;">// Choose the direction based on the material</span>
<span style="color: blue;">float</span> pdf;
<span style="color: #216f85;">float3a</span> normal = <span style="color: #850000;">normalize</span>(ray.Ng);
<span style="color: #216f85;">float3a</span> wi = material-><span style="color: #850000;">Sample</span>(ray.dir, normal, sampler, &pdf);
<span style="color: green;">// Accumulate the diffuse/specular weight</span>
weights <span style="color: #850000;">=</span> weights <span style="color: #850000;">*</span> material-><span style="color: #850000;">Eval</span>(wi, <span style="color: #850000;">normalize</span>(ray.dir), normal) <span style="color: #850000;">/</span> pdf;</pre>
<br />
That divide by pdf looks suspicious. If pdf is zero, then weights would be NaN. Ok, so let's dive into Sample()<br />
<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px;"><span style="color: green;">/**</span>
<span style="color: green;"> * Creates a random direction in the hemisphere defined by the normal, weighted by a cosine lobe</span>
<span style="color: green;"> *</span>
<span style="color: green;"> * Based on http://www.rorydriscoll.com/2009/01/07/better-sampling/</span>
<span style="color: green;"> *</span>
<span style="color: green;"> * @param wi The direction of the incoming light</span>
<span style="color: green;"> * @param normal The normal that defines the hemisphere</span>
<span style="color: green;"> *</span>
<span style="color: green;"> * @param sampler The sampler to use for internal random number generation</span>
<span style="color: green;"> * @return A cosine weighted random direction in the hemisphere</span>
<span style="color: green;"> */</span>
<span style="color: #216f85;">float3a</span> <span style="color: #850000;">Sample</span>(<span style="color: #216f85;">float3a</span> wi, <span style="color: #216f85;">float3a</span> normal, <span style="color: #216f85;">UniformSampler</span> *sampler, <span style="color: blue;">float</span> *pdf) <span style="color: blue;">override</span> {
<span style="color: green;">// Create random coordinates in the local coordinate system</span>
<span style="color: blue;">float</span> rand = sampler-><span style="color: #850000;">NextFloat</span>();
<span style="color: blue;">float</span> r = <span style="color: #216f85;">std</span>::<span style="color: #850000;">sqrtf</span>(rand);
<span style="color: blue;">float</span> theta = sampler-><span style="color: #850000;">NextFloat</span>() * 6.28318530718f <span style="color: green;">/* 2 PI */</span>;
<span style="color: blue;">float</span> x = r * <span style="color: #216f85;">std</span>::<span style="color: #850000;">cosf</span>(theta);
<span style="color: blue;">float</span> y = r * <span style="color: #216f85;">std</span>::<span style="color: #850000;">sinf</span>(theta);
<span style="color: blue;">float</span> z = <span style="color: #216f85;">std</span>::<span style="color: #850000;">sqrtf</span>(1.0f - x * x - y * y);
<span style="color: green;">// Find an axis that is not parallel to normal</span>
<span style="color: #216f85;">float3</span> majorAxis;
<span style="color: blue;">if</span> (<span style="color: #850000;">abs</span>(normal.x) < 0.57735026919f <span style="color: green;">/* 1 / sqrt(3) */</span>) {
majorAxis <span style="color: #850000;">=</span> <span style="color: #216f85;">float3</span>(1, 0, 0);
} <span style="color: blue;">else</span> <span style="color: blue;">if</span> (<span style="color: #850000;">abs</span>(normal.y) < 0.57735026919f <span style="color: green;">/* 1 / sqrt(3) */</span>) {
majorAxis <span style="color: #850000;">=</span> <span style="color: #216f85;">float3</span>(0, 1, 0);
} <span style="color: blue;">else</span> {
majorAxis <span style="color: #850000;">=</span> <span style="color: #216f85;">float3</span>(0, 0, 1);
}
<span style="color: green;">// Use majorAxis to create a coordinate system relative to world space</span>
<span style="color: #216f85;">float3</span> u = <span style="color: #850000;">normalize</span>(<span style="color: #850000;">cross</span>(normal, majorAxis));
<span style="color: #216f85;">float3</span> v = <span style="color: #850000;">cross</span>(normal, u);
<span style="color: #216f85;">float3</span> w = normal;
<span style="color: green;">// Transform from local coordinates to world coordinates</span>
<span style="color: #216f85;">float3</span> direction = <span style="color: #850000;">normalize</span>(u <span style="color: #850000;">*</span> x <span style="color: #850000;">+</span>
v <span style="color: #850000;">*</span> y <span style="color: #850000;">+</span>
w <span style="color: #850000;">*</span> z);
*pdf = <span style="color: #850000;">dot</span>(direction, normal) * <span style="color: #6f008a;">M_1_PI</span>;
<span style="color: blue;">return</span> direction;
}</pre>
<br />
It creates a cosine-weighted random direction in the hemisphere. Hmmm, the only way for pdf to be zero is if dot(direction, normal) is zero. Aka, the new direction is completely perpendicular to the normal.<br />
<br />
Ok, so if direction is perpendicular to normal, then z == 0.0 (since x, y, z are local coordinates relative to the normal, where the normal == z). How can we get z == 0.0?<br />
<br />
Like this:<br />
<pre style="background: white; color: black; font-family: Consolas; overflow-x: scroll; padding: 3px 8px; font-size: 13;"><span style="color: blue;">float</span> rand = sampler-><span style="background: white; color: #850000;">NextFloat</span>(); </pre>
NextFloat() will return a float in the range [0.0, 1.0]. So, let's suppose it returns 1.0<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13;"><span style="color: blue;">float</span> r = <span style="color: #216f85;">std</span>::<span style="color: #850000;">sqrtf</span>(rand);
</pre>
r = 1.0, since sqrt(1.0) == 1.0<br />
<br />
<pre style="background: white; color: black; font-family: Consolas; overflow-x: scroll; padding: 3px 8px; font-size: 13;"><span style="color: blue;">float</span> theta = sampler-><span style="color: #850000;">NextFloat</span>() * 6.28318530718f <span style="color: green;">/* 2 PI */</span>;
<span style="color: blue;">float</span> x = r * <span style="color: #216f85;">std</span>::<span style="color: #850000;">cosf</span>(theta);
<span style="color: blue;">float</span> y = r * <span style="color: #216f85;">std</span>::<span style="color: #850000;">sinf</span>(theta);
<span style="color: blue;">float</span> z = <span style="color: #216f85;">std</span>::<span style="color: #850000;">sqrtf</span>(1.0f - x * x - y * y);</pre>
Since r = 1.0, x * x + y * y will always equal 1.0 (or, with floating point rounding, even slightly more than 1.0, which makes the sqrt return NaN directly), so z == 0.0<br />
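To see it concretely, here is a minimal CPU-side sketch of the same math (plain C++ instead of the renderer's float3a types; PdfForSample is a hypothetical helper name, not real code from the project):<br />

```cpp
#include <cmath>

// Mirrors the local-coordinate math from Sample() on the CPU.
// Returns the pdf that Sample() would compute for a given rand/theta pair.
// In local coordinates the normal is (0, 0, 1), so dot(direction, normal)
// is just z.
float PdfForSample(float rand, float theta) {
    float r = std::sqrt(rand);
    float x = r * std::cos(theta);
    float y = r * std::sin(theta);
    float z = std::sqrt(1.0f - x * x - y * y);
    return z * (1.0f / 3.14159265f);  // cos(theta) / PI
}
```

With rand = 1.0 and theta = 0, x comes out exactly 1.0, z comes out exactly 0.0, and the pdf is 0.0; any rand strictly below 1.0 gives a positive pdf.<br />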
<br />
This is really annoying! It makes perfect sense for 1.0 to be a valid random number, but it completely destroys this particular algorithm....<br />
<br />
So, we can either check for 1.0 and reject it, or redefine our random number generator to only generate on [0.0, 1.0). Sample() is going to get called a LOT, so adding a branch is no fun. Granted, branch prediction will mostly hide the cost, but it still seems kind of gross.<br />
<br />
So I went for the latter approach, and fixed the random number generator to generate on [0.0, 1.0). A half-open range also seems to be the standard for most library random number generators, so perhaps there are other algorithms that don't play well with 1.0 either?<br />
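For reference, one common way to get [0.0, 1.0) out of a 32-bit integer generator (this is a general technique; I'm not claiming it's exactly what my NextFloat() does now) is to keep only the top 24 bits, which map exactly onto the float grid below 1.0:<br />

```cpp
#include <cstdint>

// Hypothetical helper: map a raw 32-bit random integer to [0.0, 1.0).
// Using the top 24 bits guarantees the result is strictly less than 1.0,
// because the largest possible value is (2^24 - 1) / 2^24.
float UintToFloat01(uint32_t raw) {
    return (raw >> 8) * (1.0f / 16777216.0f);  // 16777216 == 2^24
}
```

Since a float has a 24-bit mantissa, every one of these values is exactly representable, and 1.0 itself can never occur.<br />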
<br />
So in the end, random black pixels on the screen were caused by:
<br />
<ol>
<li>Random number generator procs a 1.0</li>
<li>The cosine-weighted sampler can't handle 1.0, causing the pdf to be 0.0</li>
<li>The sample is later divided by the pdf (as per Monte Carlo integration)</li>
<li>Which causes a NaN</li>
<li>And since anything added to NaN is NaN, the pixel is now permanently NaN.</li>
<li>When the frame buffer is passed to OpenGL to render, it interprets NaN as (0.0, 0.0, 0.0, 1.0)</li>
</ol>
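Steps 4 and 5 are easy to verify on the CPU (the helper name here is mine); a zero-over-zero divide yields NaN, and NaN then poisons every later addition, which is exactly why the running average never recovers:<br />

```cpp
#include <cmath>

// 0/0 produces NaN, and NaN contaminates every subsequent accumulation,
// which is why one bad sample blackens the pixel forever.
bool NanPoisonsAverage() {
    float pdf = 0.0f;
    float weight = 0.0f / pdf;   // 0/0 == NaN
    float runningAverage = 0.5f;
    runningAverage += weight;    // NaN + anything == NaN
    return std::isnan(runningAverage);
}
```
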
<br />
-<span style="color: #dd7700;">RichieSams</span><br />
<br />
<h3>
Making Our First Pretty Picture</h3>
This is the fourth post in a series documenting my adventures in creating a GPU path tracer from scratch:<br />
1. <a href="http://richiesams.blogspot.com/2015/03/tracing-light-in-virtual-world.html">Tracing Light in a Virtual World</a><br />
2. <a href="http://richiesams.blogspot.com/2015/03/creating-randomness-and-acummulating.html">Creating Randomness and Accumulating Change</a><br />
3. <a href="http://richiesams.blogspot.com/2015/03/shooting-objects-from-across-way.html">Shooting Objects from Across the Way</a><br />
<br />
<br />
So we're finally here! We've learned everything we need to actually implement a path tracer! First off, let me apologize for taking so long to get this post up. Things started getting busy at work again, and I was in a car accident last week, so I've been recovering from that. Also, the implementation I showed a picture of in the last post turned out to be mathematically incorrect. (The current simulation is still far from physically correct, but I'll get to that later in the post.) But that's in the past now; let's get started.<br />
<br />
<br />
<h3>
A Quick Review</h3>
Before we launch into the code, let's first go over what we're trying to do. For each pixel, we want to simulate the light that bounces around the scene in order to calculate a color. This calculation can be described by the Rendering Equation:<br />
\[L_{\text{o}}(p, \omega_{\text{o}}) = L_{e}(p, \omega_{\text{o}}) + \int_{\Omega} f(p, \omega_{\text{o}}, \omega_{\text{i}}) L_{\text{i}}(p, \omega_{\text{i}}) \left | \cos \theta_{\text{i}} \right | d\omega_{\text{i}} \]<br />
Wow, that looks intimidating. But once we break it down, it's actually a pretty simple concept. In English, the equation is:<br />
<blockquote>
The outgoing light from a point equals the emissive light from the point itself, plus all the incoming light attenuated by a function of the material (the <a href="http://en.wikipedia.org/wiki/Bidirectional_scattering_distribution_function">BSDF</a>) and the direction of the incoming light.</blockquote>
The integral is just a mathematical way of adding up all the incoming light coming from the hemisphere around the point.<br />
<br />
That said, once you start taking into account light reflected off of things, absorbed by things, refracted by things, etc., the problem becomes very complex. For almost all scenes, this integral cannot be solved analytically.<br />
<br />
A very simple approximation of this integral is to only take into account the light that is directly coming from light sources. With this, the integral collapses into a simple sum over all the lights in the scene. Aka:<br />
\[L_{\text{o}}(p, \omega_{\text{o}}) = L_{e}(p, \omega_{\text{o}}) + \sum_{k = 0}^{n} f(p, \omega_{\text{o}}, \omega_{\text{i}}) L_{\text{i, k}}(p, \omega_{\text{i}}) \left | \cos \theta_{\text{i}} \right |\]<br />
This is what Direct Lighting Ray Tracers do. They shoot rays from the point to each light, and test for obscurance between them.<br />
<br />
However, you can simplify it further by also ignoring the obscurance checks. That is, you light the pixel with the assumption that it is the only thing in the scene. This is exactly what most real-time renderers do. It is known as a <a href="http://penguin.ewu.edu/cscd570/2014/PDFNotes/GlobalLocalModels.pdf">local lighting model</a>. It's extremely fast because each pixel can be lit independently; the incoming light at the pixel is determined only by the lights in the scene, not by the scene itself. However, the speed comes at a cost. By disregarding the scene, you lose all global illumination effects. Aka, shadows, light bleeding, indirect light, specular reflections, etc. These have to be approximated using tricks and/or hacks. (Shadow mapping, light baking, screen space reflections, etc.)<br />
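As a concrete illustration, the local lighting model for a single Lambert surface point reduces to a loop over the lights, with no shadow rays and no bounces. (This is a toy CPU sketch with hypothetical names, not code from the renderer; the Lambert BSDF is albedo / PI.)<br />

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Local lighting model: each light contributes independently of the
// rest of the scene. For a Lambert surface, f(p, wo, wi) == albedo / PI,
// and each term is weighted by the cosine of the incoming angle.
float ShadeLocalLambert(const Vec3 &normal, float albedo,
                        const Vec3 *lightDirs, const float *lightIntensities,
                        int numLights) {
    const float kInvPi = 0.318309886f;
    float outgoing = 0.0f;
    for (int k = 0; k < numLights; ++k) {
        float cosTheta = std::max(0.0f, Dot(normal, lightDirs[k]));
        outgoing += albedo * kInvPi * lightIntensities[k] * cosTheta;
    }
    return outgoing;
}
```

Note how a light below the surface (negative cosine) is clamped to zero, but a light hidden behind another object still contributes; that's the missing shadowing the tricks above try to paper over.<br />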
<br />
<br />
Another way to approximate the integral is by using <a href="http://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo Integration</a>.<br />
\[\int f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{pdf(x_{i})}\]<br />
It says that you can approximate an integral by averaging successive random samples from the function. As $N$ gets large, the approximation gets closer and closer to the solution. $pdf(x_{i})$ represents the <a href="http://en.wikipedia.org/wiki/Probability_density_function">probability density function</a> of each random sample.<br />
<br />
Path tracing uses Monte Carlo Integration to solve the rendering equation. We sample by tracing randomly selected light paths around the scene.<br />
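Before applying it to light transport, it's worth sanity-checking Monte Carlo integration on something we can solve by hand. This toy CPU sketch (my own example, not kernel code; it uses a tiny LCG instead of cuRAND) estimates the integral of x<sup>2</sup> over [0, 1], whose exact answer is 1/3. With a uniform sampler, the pdf is 1, so the estimator is just the average of the samples:<br />

```cpp
#include <cstdint>

// Tiny deterministic LCG (Numerical Recipes constants) so the estimate
// is reproducible. Returns uniform floats in [0, 1).
struct Lcg {
    uint32_t state;
    float Next() {
        state = state * 1664525u + 1013904223u;
        return (state >> 8) * (1.0f / 16777216.0f);
    }
};

// Monte Carlo estimate of the integral of x^2 over [0, 1].
float EstimateIntegralOfXSquared(int numSamples) {
    Lcg rng{12345u};
    double sum = 0.0;  // double accumulator to avoid float drift
    for (int i = 0; i < numSamples; ++i) {
        float x = rng.Next();
        sum += x * x;  // f(x_i) / pdf(x_i), with pdf == 1
    }
    return static_cast<float>(sum / numSamples);
}
```

With 100,000 samples the estimate lands within a fraction of a percent of 1/3, and the error shrinks as \(1 / \sqrt{N}\), which is exactly the convergence behavior we'll see as the render accumulates frames.<br />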
<br />
<br />
<h3>
The Path Tracing Kernel</h3>
I figure the best way to understand the path tracing algorithm is just to post the whole kernel, and then go over each line / function, explaining as we go. So here's the whole thing:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #6f008a;">__global__</span> <span style="color: blue;">void</span> <span style="color: #880000;">PathTraceKernel</span>(<span style="color: blue;">unsigned</span> <span style="color: blue;">char</span> *textureData, <span style="color: #216f85;">uint</span> width, <span style="color: #216f85;">uint</span> height, <span style="color: #216f85;">size_t</span> pitch, <span style="color: #216f85;">DeviceCamera</span> *g_camera, <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">SceneObjects</span> *g_sceneObjects, <span style="color: #216f85;">uint</span> hashedFrameNumber) {
<span style="color: green;">// Create a local copy of the arguments</span>
<span style="color: #216f85;">DeviceCamera</span> camera = *g_camera;
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">SceneObjects</span> sceneObjects = *g_sceneObjects;
<span style="color: green;">// Global threadId</span>
<span style="color: blue;">int</span> threadId = (blockIdx.<span style="color: purple;">x</span> + blockIdx.<span style="color: purple;">y</span> * gridDim.<span style="color: purple;">x</span>) * (blockDim.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">y</span>) + (threadIdx.<span style="color: purple;">y</span> * blockDim.<span style="color: purple;">x</span>) + threadIdx.<span style="color: purple;">x</span>;
<span style="color: green;">// Create random number generator</span>
<span style="color: #216f85;">curandState</span> randState;
<span style="color: #880000;">curand_init</span>(hashedFrameNumber + threadId, 0, 0, &randState);
<span style="color: blue;">int</span> x = blockIdx.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">x</span> + threadIdx.<span style="color: purple;">x</span>;
<span style="color: blue;">int</span> y = blockIdx.<span style="color: purple;">y</span> * blockDim.<span style="color: purple;">y</span> + threadIdx.<span style="color: purple;">y</span>;
<span style="color: green;">// Calculate the first ray for this pixel</span>
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Ray</span> ray = {camera.<span style="color: purple;">Origin</span>, <span style="color: #880000;">CalculateRayDirectionFromPixel</span>(x, y, width, height, camera, &randState)};
<span style="color: #216f85;">float3</span> pixelColor = <span style="color: #880000;">make_float3</span>(0.0f, 0.0f, 0.0f);
<span style="color: #216f85;">float3</span> accumulatedMaterialColor = <span style="color: #880000;">make_float3</span>(1.0f, 1.0f, 1.0f);
<span style="color: green;">// Bounce the ray around the scene</span>
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> bounces = 0; bounces < 10; ++bounces) {
<span style="color: green;">// Initialize the intersection variables</span>
<span style="color: blue;">float</span> closestIntersection = <span style="color: #6f008a;">FLT_MAX</span>;
<span style="color: #216f85;">float3</span> normal;
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">LambertMaterial</span> material;
<span style="color: #880000;">TestSceneIntersection</span>(ray, sceneObjects, &closestIntersection, &normal, &material);
<span style="color: green;">// Find out if we hit anything</span>
<span style="color: blue;">if</span> (closestIntersection < <span style="color: #6f008a;">FLT_MAX</span>) {
<span style="color: green;">// We hit an object</span>
<span style="color: green;">// Add the emmisive light</span>
pixelColor += accumulatedMaterialColor * material.<span style="color: purple;">EmmisiveColor</span>;
<span style="color: green;">// Shoot a new ray</span>
<span style="color: green;">// Set the origin at the intersection point</span>
ray.<span style="color: purple;">Origin</span> = ray.<span style="color: purple;">Origin</span> + ray.<span style="color: purple;">Direction</span> * closestIntersection;
<span style="color: green;">// Offset the origin to prevent self intersection</span>
ray.<span style="color: purple;">Origin</span> += normal * 0.001f;
<span style="color: green;">// Choose the direction based on the material</span>
<span style="color: blue;">if</span> (material.<span style="color: purple;">MaterialType</span> == <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">MATERIAL_TYPE_DIFFUSE</span>) {
ray.Direction = CreateUniformDirectionInHemisphere(normal, &randState);
<span style="color: green;">// Accumulate the diffuse color</span>
accumulatedMaterialColor *= material.MainColor <span style="color: green;">/* * (1 / PI) <- this cancels with the PI in the pdf */</span> * dot(ray.Direction, normal);
<span style="color: green;">// Divide by the pdf</span>
accumulatedMaterialColor *= 2.0f; <span style="color: green;">// pdf == 1 / (2 * PI)</span>
} <span style="color: blue;">else</span> <span style="color: blue;">if</span> (material.<span style="color: purple;">MaterialType</span> == <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">MATERIAL_TYPE_SPECULAR</span>) {
ray.<span style="color: purple;">Direction</span> = <span style="color: #880000;">reflect</span>(ray.<span style="color: purple;">Direction</span>, normal);
<span style="color: green;">// Accumulate the specular color</span>
accumulatedMaterialColor *= material.<span style="color: purple;">MainColor</span>;
}
<span style="color: green;">// Russian Roulette</span>
<span style="color: blue;">if</span> (bounces > 3) {
<span style="color: blue;">float</span> p = <span style="color: #880000;">max</span>(accumulatedMaterialColor.<span style="color: purple;">x</span>, <span style="color: #880000;">max</span>(accumulatedMaterialColor.<span style="color: purple;">y</span>, accumulatedMaterialColor.<span style="color: purple;">z</span>));
<span style="color: blue;">if</span> (<span style="color: #880000;">curand_uniform</span>(&randState) > p) {
<span style="color: blue;">break</span>;
}
accumulatedMaterialColor *= 1 / p;
}
} <span style="color: blue;">else</span> {
<span style="color: green;">// We didn't hit anything, return the sky color</span>
pixelColor += accumulatedMaterialColor * <span style="color: #880000;">make_float3</span>(0.846f, 0.933f, 0.949f);
<span style="color: blue;">break</span>;
}
}
<span style="color: blue;">if</span> (x < width && y < height) {
<span style="color: green;">// Get a pointer to the pixel at (x,y)</span>
<span style="color: blue;">float</span> *pixel = (<span style="color: blue;">float</span> *)(textureData + y * pitch) + 4 <span style="color: green;">/*RGBA*/</span> * x;
<span style="color: green;">// Write pixel data</span>
pixel[0] += pixelColor.<span style="color: purple;">x</span>;
pixel[1] += pixelColor.<span style="color: purple;">y</span>;
pixel[2] += pixelColor.<span style="color: purple;">z</span>;
<span style="color: green;">// Ignore alpha, since it's hardcoded to 1.0f in the display</span>
<span style="color: green;">// We have to use a RGBA format since CUDA-DirectX interop doesn't support R32G32B32_FLOAT</span>
}
}</pre>
<br />
<br />
<h3>
Initializing all the Variables</h3>
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: green;">// Create a local copy of the arguments</span>
<span style="color: #216f85;">DeviceCamera</span> camera = *g_camera;
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">SceneObjects</span> sceneObjects = *g_sceneObjects;
<span style="color: green;">// Global threadId</span>
<span style="color: blue;">int</span> threadId = (blockIdx.<span style="color: purple;">x</span> + blockIdx.<span style="color: purple;">y</span> * gridDim.<span style="color: purple;">x</span>) * (blockDim.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">y</span>) + (threadIdx.<span style="color: purple;">y</span> * blockDim.<span style="color: purple;">x</span>) + threadIdx.<span style="color: purple;">x</span>;
<span style="color: green;">// Create random number generator</span>
<span style="color: #216f85;">curandState</span> randState;
<span style="color: #880000;">curand_init</span>(hashedFrameNumber + threadId, 0, 0, &randState);</pre>
<br />
The first thing we do is create a local copy of some of the pointer arguments that are passed into the kernel. It means one less indirection when fetching from global memory. We fetch multiple variables from the camera and we fetch the scene objects many times.<br />
<br />
Next we calculate the global id for the thread and use it to create a random state so we can generate random numbers. We covered this in the <a href="http://richiesams.blogspot.com/2015/03/creating-randomness-and-acummulating.html">second post</a> of this series.<br />
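The flattening math is easy to sanity-check on the CPU. Here's a plain C++ mirror of the kernel's formula (the parameter names mimic the CUDA built-ins, but this is ordinary host code of my own, not part of the kernel):<br />

```cpp
// CPU mirror of the kernel's threadId computation for a 2D grid of 2D
// blocks: blocks are numbered row-major across the grid, and threads
// row-major within each block.
int GlobalThreadId(int blockIdxX, int blockIdxY, int gridDimX,
                   int blockDimX, int blockDimY,
                   int threadIdxX, int threadIdxY) {
    int blockId = blockIdxX + blockIdxY * gridDimX;
    int threadsPerBlock = blockDimX * blockDimY;
    return blockId * threadsPerBlock + threadIdxY * blockDimX + threadIdxX;
}
```

For example, with 16x16 blocks in a grid 4 blocks wide, thread (2, 3) of block (1, 0) gets id 256 + 3*16 + 2 = 306, and every thread in the launch gets a unique id, which is what we need for unique random seeds.<br />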
<br />
<br />
<h3>
Shooting the First Ray</h3>
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: blue;">int</span> x = blockIdx.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">x</span> + threadIdx.<span style="color: purple;">x</span>;
<span style="color: blue;">int</span> y = blockIdx.<span style="color: purple;">y</span> * blockDim.<span style="color: purple;">y</span> + threadIdx.<span style="color: purple;">y</span>;
<span style="color: green;">// Calculate the first ray for this pixel</span>
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Ray</span> ray = {camera.<span style="color: purple;">Origin</span>, <span style="color: #880000;">CalculateRayDirectionFromPixel</span>(x, y, width, height, camera, &randState)};</pre>
<br />
Then we shoot the initial ray from the camera origin through the pixel for this thread. We covered this in the <a href="http://richiesams.blogspot.com/2015/03/tracing-light-in-virtual-world.html">first post</a>. However, I added one thing to <span style="background-color: white;"><span style="color: #880000;">CalculateRayDirectionFromPixel</span><span style="color: black;">()</span></span>.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #6f008a;">__device__</span> <span style="color: #216f85;">float3</span> <span style="color: #880000;">CalculateRayDirectionFromPixel</span>(<span style="color: #216f85;">uint</span> x, <span style="color: #216f85;">uint</span> y, <span style="color: #216f85;">uint</span> width, <span style="color: #216f85;">uint</span> height, <span style="color: #216f85;">DeviceCamera</span> &camera, <span style="color: #216f85;">curandState</span> *randState) {
<span style="color: #216f85;">float3</span> viewVector = <span style="color: #880000;">make_float3</span>((((x + <span style="color: #880000;">curand_uniform</span>(randState)) / width) * 2.0f - 1.0f) * camera.<span style="color: purple;">TanFovXDiv2</span>,
-(((y + <span style="color: #880000;">curand_uniform</span>(randState)) / height) * 2.0f - 1.0f) * camera.<span style="color: purple;">TanFovYDiv2</span>,
1.0f);
<span style="color: green;">// Matrix multiply</span>
<span style="color: blue;">return</span> <span style="color: #880000;">normalize</span>(<span style="color: #880000;">make_float3</span>(<span style="color: #880000;">dot</span>(viewVector, camera.<span style="color: purple;">ViewToWorldMatrixR0</span>),
<span style="color: #880000;">dot</span>(viewVector, camera.<span style="color: purple;">ViewToWorldMatrixR1</span>),
<span style="color: #880000;">dot</span>(viewVector, camera.<span style="color: purple;">ViewToWorldMatrixR2</span>)));
}</pre>
<br />
In the first post, we always shot the ray through the center of the pixel, i.e., (x + 0.5, y + 0.5). Now we use a random number to jitter the ray within the pixel. Why would we do this? By jittering the camera ray, we effectively <a href="http://en.wikipedia.org/wiki/Supersampling">supersample</a> the scene. Since we're going to shoot the same number of rays either way, we get supersampled anti-aliasing for free.<br />
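A quick way to convince yourself jittering works: estimate the coverage of an edge that cuts partway through a pixel. Sampling the fixed pixel center gives a hard 0-or-1 answer forever, while jittered samples converge to the true fractional coverage. (Toy CPU sketch with hypothetical names; a tiny LCG stands in for cuRAND.)<br />

```cpp
#include <cstdint>

// Tiny deterministic LCG for reproducible jitter; returns [0, 1).
struct JitterRng {
    uint32_t state;
    float Next() {
        state = state * 1664525u + 1013904223u;
        return (state >> 8) * (1.0f / 16777216.0f);
    }
};

// An edge covers the left 30% of the pixel. Sampling only the pixel
// center (u = 0.5) always misses it; jittered samples average out to
// the true coverage of 0.3.
float EstimateCoverage(int numSamples, bool jitter) {
    JitterRng rng{42u};
    float hits = 0.0f;
    for (int i = 0; i < numSamples; ++i) {
        float u = jitter ? rng.Next() : 0.5f;
        if (u < 0.3f) {
            hits += 1.0f;
        }
    }
    return hits / numSamples;
}
```

The center-only estimate stays at exactly 0.0 no matter how many frames we accumulate, which is precisely why the un-jittered render keeps its jaggies.<br />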
<br />
Let's look at one of the renders from the third post with and without the jitter:<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<a href="http://3.bp.blogspot.com/-KkFXZ6SK_4Q/VRMNPqX5f1I/AAAAAAAAATA/esDMYUg1yLo/s1600/no%2Bjitter.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://3.bp.blogspot.com/-KkFXZ6SK_4Q/VRMNPqX5f1I/AAAAAAAAATA/esDMYUg1yLo/s1600/no%2Bjitter.png" height="374" width="640" /></a><br />
<div style="text-align: center;">
<span style="font-size: x-small;">No jitter: Ewwwwwww. Look at all those jaggies...... </span></div>
<br />
<a href="http://1.bp.blogspot.com/-IEPQGQQ95mQ/VRMNPgKEb_I/AAAAAAAAAS8/Wq1BXrMNIME/s1600/with%2Bjitter.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://1.bp.blogspot.com/-IEPQGQQ95mQ/VRMNPgKEb_I/AAAAAAAAAS8/Wq1BXrMNIME/s1600/with%2Bjitter.png" height="374" width="640" /></a><br />
<div style="text-align: center;">
<span style="font-size: x-small;">With jitter: Silky smooth :)
</span></div>
<br />
<br />
<h3>
Start Bouncing!</h3>
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: #216f85;">float3</span> pixelColor = <span style="color: #880000;">make_float3</span>(0.0f, 0.0f, 0.0f);
<span style="color: #216f85;">float3</span> accumulatedMaterialColor = <span style="color: #880000;">make_float3</span>(1.0f, 1.0f, 1.0f);
<span style="color: green;">// Bounce the ray around the scene</span>
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> bounces = 0; bounces < 10; ++bounces) {</pre>
<br />
Here we initialize two float3's used to accumulate color as we bounce. I'll cover them in detail below. Then we start a for loop that determines the maximum number of times the ray can bounce around the scene.<br />
<br />
<br />
<h3>
Shooting the Scene</h3>
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: green;">// Initialize the intersection variables</span>
<span style="color: blue;">float</span> closestIntersection = <span style="color: #6f008a;">FLT_MAX</span>;
<span style="color: #216f85;">float3</span> normal;
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">LambertMaterial</span> material;
<span style="color: #880000;">TestSceneIntersection</span>(ray, sceneObjects, &closestIntersection, &normal, &material);</pre>
<br />
We initialize the intersection variables and then shoot the ray through the scene. <span style="background-color: white;"><span style="color: #880000;">TestSceneIntersection</span><span style="color: black;">()</span></span> loops through each object in the scene and tries to intersect it. If it hits, it tests if the new intersection is closer than the current. If so, it replaces the current intersection with the new one. (And updates the normal / material).<br />
<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #6f008a;">__device__</span> <span style="color: blue;">void</span> <span style="color: #880000;">TestSceneIntersection</span>(<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Ray</span> &ray, <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">SceneObjects</span> &sceneObjects, <span style="color: blue;">float</span> *closestIntersection, <span style="color: #216f85;">float3</span> *normal, <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">LambertMaterial</span> *material) {
<span style="color: green;">// Try to intersect with the planes</span>
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> j = 0; j < sceneObjects.<span style="color: purple;">NumPlanes</span>; ++j) {
<span style="color: green;">// Make a local copy</span>
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Plane</span> plane = sceneObjects.<span style="color: purple;">Planes</span>[j];
<span style="color: #216f85;">float3</span> newNormal;
<span style="color: blue;">float</span> intersection = <span style="color: #880000;">TestRayPlaneIntersection</span>(ray, plane, newNormal);
<span style="color: blue;">if</span> (intersection > 0.0f && intersection < *closestIntersection) {
*closestIntersection = intersection;
*normal = newNormal;
*material = sceneObjects.<span style="color: purple;">Materials</span>[plane.<span style="color: purple;">MaterialId</span>];
}
}
<span style="color: green;">// Try to intersect with the rectangles</span>
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> j = 0; j < sceneObjects.<span style="color: purple;">NumRectangles</span>; ++j) {
<span style="color: green;">// Make a local copy</span>
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Rectangle</span> rectangle = sceneObjects.<span style="color: purple;">Rectangles</span>[j];
<span style="color: #216f85;">float3</span> newNormal;
<span style="color: blue;">float</span> intersection = <span style="color: #880000;">TestRayRectangleIntersection</span>(ray, rectangle, newNormal);
<span style="color: blue;">if</span> (intersection > 0.0f && intersection < *closestIntersection) {
*closestIntersection = intersection;
*normal = newNormal;
*material = sceneObjects.<span style="color: purple;">Materials</span>[rectangle.<span style="color: purple;">MaterialId</span>];
}
}
<span style="color: green;">// Try to intersect with the circles;</span>
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> j = 0; j < sceneObjects.<span style="color: purple;">NumCircles</span>; ++j) {
<span style="color: green;">// Make a local copy</span>
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Circle</span> circle = sceneObjects.<span style="color: purple;">Circles</span>[j];
<span style="color: #216f85;">float3</span> newNormal;
<span style="color: blue;">float</span> intersection = <span style="color: #880000;">TestRayCircleIntersection</span>(ray, circle, newNormal);
<span style="color: blue;">if</span> (intersection > 0.0f && intersection < *closestIntersection) {
*closestIntersection = intersection;
*normal = newNormal;
*material = sceneObjects.<span style="color: purple;">Materials</span>[circle.<span style="color: purple;">MaterialId</span>];
}
}
<span style="color: green;">// Try to intersect with the spheres;</span>
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> j = 0; j < sceneObjects.<span style="color: purple;">NumSpheres</span>; ++j) {
<span style="color: green;">// Make a local copy</span>
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Sphere</span> sphere = sceneObjects.<span style="color: purple;">Spheres</span>[j];
<span style="color: #216f85;">float3</span> newNormal;
<span style="color: blue;">float</span> intersection = <span style="color: #880000;">TestRaySphereIntersection</span>(ray, sphere, newNormal);
<span style="color: blue;">if</span> (intersection > 0.0f && intersection < *closestIntersection) {
*closestIntersection = intersection;
*normal = newNormal;
*material = sceneObjects.<span style="color: purple;">Materials</span>[sphere.<span style="color: purple;">MaterialId</span>];
}
}
}</pre>
<br />
<br />
I added support for two more intersection types since the last post: <a href="https://github.com/RichieSams/rapt/blob/78db0e323dab01891be79f1c6816faebdb747b70/source/scene/object_intersection.cuh#L165">Ray-Circle</a> and <a href="https://github.com/RichieSams/rapt/blob/78db0e323dab01891be79f1c6816faebdb747b70/source/scene/object_intersection.cuh#L107">Ray-Rectangle</a>. (The latter actually supports any parallelogram, not just rectangles.) The idea for both is to first test for intersection with the plane containing the shape, then test whether the intersection point lies inside the shape itself. The code/comments should be self-explanatory, but feel free to comment if you have a question.<br />
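To sketch the second half of that test in isolation (a standalone C++ sketch; `PointInParallelogram` and the `Vec3` helpers are hypothetical names, not the code from the repo): given a parallelogram defined by a corner `p0` and two edge vectors `e1` and `e2`, we can solve for the hit point's coordinates along the two edges and check that both lie in [0, 1]:

```cpp
struct Vec3 { float x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Test whether a point (already known to lie on the parallelogram's plane)
// is inside the parallelogram spanned by edge vectors e1 and e2 from p0.
// Solves d = u*e1 + v*e2 for (u, v) via Cramer's rule on the dot products,
// which works even when e1 and e2 are not perpendicular.
bool PointInParallelogram(Vec3 point, Vec3 p0, Vec3 e1, Vec3 e2) {
    Vec3 d = Sub(point, p0);
    float e1e1 = Dot(e1, e1);
    float e2e2 = Dot(e2, e2);
    float e1e2 = Dot(e1, e2);
    float denom = e1e1 * e2e2 - e1e2 * e1e2;
    float u = (e2e2 * Dot(d, e1) - e1e2 * Dot(d, e2)) / denom;
    float v = (e1e1 * Dot(d, e2) - e1e2 * Dot(d, e1)) / denom;
    return u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f;
}
```

A rectangle is the special case where e1·e2 = 0, in which case u and v reduce to simple projections onto the edges.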
<br />
<br />
<h3>
Direct Hit!!!! Or Perhaps Not....</h3>
This is where the true path tracing starts to come in. Once we've traced the ray through the scene, there are two outcomes: either we hit something, or we missed.<br />
<br />
For real-life materials, light doesn't just bounce off the surface of the material, but rather, it also enters the material itself, bouncing and reflecting off the molecules.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-8QyGKgCDlY8/VSGc-mX8JCI/AAAAAAAAATg/HtQEkAs5b_w/s1600/actual%2Blight%2Binteraction.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-8QyGKgCDlY8/VSGc-mX8JCI/AAAAAAAAATg/HtQEkAs5b_w/s1600/actual%2Blight%2Binteraction.png" height="172" width="400" /></a></div>
<div style="text-align: center;">
<span style="font-size: x-small;">Image from Naty Hoffman's Siggraph 2013 Presentation on Physically Based Shading</span></div>
<br />
As you can imagine, it would be prohibitively expensive to try to calculate the exact scattering of light within the material. Therefore, a common simplification is to calculate the interaction of light with the surface in two parts: diffuse and specular.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-oyozXukNVpk/VSGd4v8vKXI/AAAAAAAAATo/xAYV2SuG0nI/s1600/split%2Blighting%2Bmodel.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-oyozXukNVpk/VSGd4v8vKXI/AAAAAAAAATo/xAYV2SuG0nI/s1600/split%2Blighting%2Bmodel.png" height="171" width="400" /></a></div>
<div style="text-align: center;">
<span style="font-size: x-small;">Image from Naty Hoffman's Siggraph 2013 Presentation on Physically Based Shading</span></div>
<br />
The diffuse term represents the refraction, absorption, and scattering of light within the material. It is convenient to ignore the distance between the entry and exit points for the diffuse term; doing so allows us to compute the diffuse lighting at a single point.<br />
<br />
The specular term represents the reflected light off the surface, aka, mirror-like reflections.<br />
<br />
For now, to keep things simple, we're going to assume a material is either a perfectly diffuse <a href="http://en.wikipedia.org/wiki/Lambert%27s_cosine_law">Lambertian</a> material or a perfect specular mirror.<br />
<br />
<br />
First, let's look at the case where we hit a perfectly diffuse surface lit by a single light source, $L_{\text{i}}$. If we plug that into the Monte Carlo Estimate of the Rendering Equation for a single sample, $\text{k}$, we get:<br />
<br />
\[L_{\text{o, k}} = L_e + \frac{L_{\text{i, k}} \: \frac{materialColor_{\text{diffuse}}}{\pi} \: (n \cdot v)}{pdf}\]<br />
where $n$ is the surface normal at the hit point and $v$ is the unit vector pointing from the surface toward the incoming light. We use the fact that $\left | \cos \theta \right | \equiv (n \cdot v)$, since both $n$ and $v$ are unit length.<br />
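As a standalone illustration of that estimator (scalar luminance instead of RGB for brevity; `EstimateOutgoingRadiance` is a hypothetical name, not renderer code), a single sample is just a handful of multiplies:

```cpp
#include <cmath>

// Single-sample Monte Carlo estimate of the rendering equation for a
// Lambertian surface: Lo = Le + (Li * (albedo / pi) * cosTheta) / pdf.
// Scalar version for brevity; the real renderer works per RGB channel.
float EstimateOutgoingRadiance(float Le, float Li, float albedo,
                               float cosTheta, float pdf) {
    const float kPi = 3.14159265358979f;
    return Le + (Li * (albedo / kPi) * cosTheta) / pdf;
}
```

Note that with a uniform-hemisphere pdf of $\frac{1}{2\pi}$, the $\frac{1}{\pi}$ in the Lambertian BRDF cancels, leaving a net factor of 2 on the cosine term; that's exactly where the 2.0f later in this post comes from.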
<br />
For path tracing, $L_{\text{i}}$ represents the light coming from the <i>next</i> path. ie:<br />
\[L_{\text{i, k}} = L_{\text{o, k + 1}}\]<br />
It's a classic recursive algorithm. ie:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #216f85;">float3</span> <span style="color: #880000;">CalculateColorForRay</span>(<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Ray</span> &ray, <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">SceneObjects</span> &sceneObjects) {
<span style="color: green;">// Initialize the intersection variables</span>
<span style="color: blue;">float</span> closestIntersection = <span style="color: #6f008a;">FLT_MAX</span>;
<span style="color: #216f85;">float3</span> normal;
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">LambertMaterial</span> material;
<span style="color: #880000;">TestSceneIntersection</span>(ray, sceneObjects, &closestIntersection, &normal, &material);
<span style="color: green;">// Find out if we hit anything</span>
<span style="color: blue;">if</span> (closestIntersection < <span style="color: #6f008a;">FLT_MAX</span>) {
<span style="color: green;">// We hit an object</span>
<span style="color: green;">// Shoot a new ray</span>
ray.<span style="color: purple;">Origin</span> = ray.<span style="color: purple;">Origin</span> + ray.<span style="color: purple;">Direction</span> * closestIntersection;
<span style="color: blue;">if</span> (material.<span style="color: purple;">MaterialType</span> == <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">MATERIAL_TYPE_DIFFUSE</span>) {
<span style="color: green;">// We hit a diffuse surface</span>
ray.<span style="color: purple;">Direction</span> = <span style="color: #880000;">CreateNewRayDirection</span>();
<span style="color: blue;">return</span> material.<span style="color: purple;">EmmisiveColor</span> + <span style="color: #880000;">CalculateColorForRay</span>(ray, sceneObjects) * material.<span style="color: purple;">MainColor</span> * <span style="color: #880000;">dot</span>(ray.<span style="color: purple;">Direction</span>, normal) * 2.0f <span style="color: green;">/* the 1/PI in the BRDF cancels against pdf == 1 / (2 * PI) */</span>;
} <span style="color: blue;">else</span> {
<span style="color: green;">// We hit a specular surface; a perfect mirror reflection has pdf == 1</span>
ray.<span style="color: purple;">Direction</span> = <span style="color: #880000;">reflect</span>(ray.<span style="color: purple;">Direction</span>, normal);
<span style="color: blue;">return</span> material.<span style="color: purple;">EmmisiveColor</span> + <span style="color: #880000;">CalculateColorForRay</span>(ray, sceneObjects) * material.<span style="color: purple;">MainColor</span>;
}
} <span style="color: blue;">else</span> {
<span style="color: green;">// We didn't hit anything, return the sky color</span>
<span style="color: blue;">return</span> <span style="color: #880000;">make_float3</span>(0.846f, 0.933f, 0.949f);
}
}</pre>
<br />
Unfortunately for us, GPUs don't handle recursion well. In fact, it's forbidden in most, if not all, GPU programming languages. That said, since multiplication is both <a href="http://en.wikipedia.org/wiki/Associative_property">associative</a> and <a href="http://en.wikipedia.org/wiki/Distributive_property">distributive</a>, we can factor the $\frac{\frac{materialColor_{\text{diffuse}}}{\pi} \: (n \cdot v)}{pdf}$ terms out of the recursion and turn our path tracing algorithm into an iterative solution:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #216f85;">float3</span> pixelColor = <span style="color: #880000;">make_float3</span>(0.0f, 0.0f, 0.0f);
<span style="color: #216f85;">float3</span> accumulatedMaterialColor = <span style="color: #880000;">make_float3</span>(1.0f, 1.0f, 1.0f);
<span style="color: green;">// Bounce the ray around the scene</span>
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> bounces = 0; bounces < 10; ++bounces) {
<span style="color: green;">// Initialize the intersection variables</span>
<span style="color: blue;">float</span> closestIntersection = <span style="color: #6f008a;">FLT_MAX</span>;
<span style="color: #216f85;">float3</span> normal;
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">LambertMaterial</span> material;
<span style="color: #880000;">TestSceneIntersection</span>(ray, sceneObjects, &closestIntersection, &normal, &material);
<span style="color: green;">// Find out if we hit anything</span>
<span style="color: blue;">if</span> (closestIntersection < <span style="color: #6f008a;">FLT_MAX</span>) {
<span style="color: green;">// We hit an object</span>
<span style="color: green;">// Add the emmisive light</span>
pixelColor += accumulatedMaterialColor * material.<span style="color: purple;">EmmisiveColor</span>;
<span style="color: green;">// Shoot a new ray</span>
<span style="color: green;">// Set the origin at the intersection point</span>
ray.<span style="color: purple;">Origin</span> = ray.<span style="color: purple;">Origin</span> + ray.<span style="color: purple;">Direction</span> * closestIntersection;
<span style="color: green;">// Offset the origin to prevent self intersection</span>
ray.<span style="color: purple;">Origin</span> += normal * 0.001f;
<span style="color: green;">// Choose the direction based on the material</span>
<span style="color: blue;">if</span> (material.<span style="color: purple;">MaterialType</span> == <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">MATERIAL_TYPE_DIFFUSE</span>) {
ray.Direction = CreateUniformDirectionInHemisphere(normal, &randState);
<span style="color: green;">// Accumulate the diffuse color</span>
accumulatedMaterialColor *= material.MainColor <span style="color: green;">/* * (1 / PI) <- this cancels with the PI in the pdf */</span> * dot(ray.Direction, normal);
<span style="color: green;">// Divide by the pdf</span>
accumulatedMaterialColor *= 2.0f; <span style="color: green;">// pdf == 1 / (2 * PI)</span>
} <span style="color: blue;">else</span> <span style="color: blue;">if</span> (material.<span style="color: purple;">MaterialType</span> == <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">MATERIAL_TYPE_SPECULAR</span>) {
ray.<span style="color: purple;">Direction</span> = <span style="color: #880000;">reflect</span>(ray.<span style="color: purple;">Direction</span>, normal);
<span style="color: green;">// Accumulate the specular color</span>
accumulatedMaterialColor *= material.<span style="color: purple;">MainColor</span>;
}
} <span style="color: blue;">else</span> {
<span style="color: green;">// We didn't hit anything, return the sky color</span>
pixelColor += accumulatedMaterialColor * <span style="color: #880000;">make_float3</span>(0.846f, 0.933f, 0.949f);
<span style="color: blue;">break</span>;
}
}</pre>
<br />
<br />
<h3>
<a href="https://www.youtube.com/watch?v=J0r482PIbzw">Bouncing is What <strike>Tiggers</strike> Rays Do Best</a></h3>
The last thing to do is to choose the direction for the next path. For mirror specular reflections, this is as simple as using the <a href="http://mathworld.wolfram.com/Reflection.html">reflect function</a>:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;">ray.<span style="color: purple;">Direction</span> = <span style="color: #880000;">reflect</span>(ray.<span style="color: purple;">Direction</span>, normal);
</pre>
<br />
Since it is a perfect reflection, the probability of the reflection in the chosen direction is 1. ie, $pdf_{\text{specular}} = 1$<br />
<br />
<br />
However, for diffuse reflections, it's a different story. By definition, the diffuse term can represent any internal reflection. Therefore, the light can come out in any direction within the unit hemisphere defined by the surface normal:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-hmLTyROQgU8/VSG-cFaPMRI/AAAAAAAAAT4/_GNZAKGNzGo/s1600/hemisphere.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-hmLTyROQgU8/VSG-cFaPMRI/AAAAAAAAAT4/_GNZAKGNzGo/s1600/hemisphere.png" height="259" width="320" /></a></div>
<br />
So, we need to randomly pick a direction in the unit hemisphere. For now, we'll use a uniform distribution of directions. In a later post, we'll explore other distributions.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #6f008a;">__device__</span> <span style="color: #216f85;">float3</span> <span style="color: #880000;">CreateUniformDirectionInHemisphere</span>(<span style="color: #216f85;">float3</span> normal, <span style="color: #216f85;">curandState</span> *randState) {
<span style="color: green;">// Create a random coordinate in spherical space</span>
<span style="color: green;">// Then calculate the cartesian equivalent</span>
<span style="color: blue;">float</span> z = <span style="color: #880000;">curand_uniform</span>(randState);
<span style="color: blue;">float</span> r = <span style="color: #880000;">sqrt</span>(1.0f - z * z);
<span style="color: blue;">float</span> phi = <span style="color: #6f008a;">TWO_PI</span> * <span style="color: #880000;">curand_uniform</span>(randState);
<span style="color: blue;">float</span> x = <span style="color: #880000;">cos</span>(phi) * r;
<span style="color: blue;">float</span> y = <span style="color: #880000;">sin</span>(phi) * r;
<span style="color: green;">// Find an axis that is not parallel to normal</span>
<span style="color: #216f85;">float3</span> majorAxis;
<span style="color: blue;">if</span> (<span style="color: #880000;">abs</span>(normal.<span style="color: purple;">x</span>) < <span style="color: #6f008a;">INV_SQRT_THREE</span>) {
majorAxis = <span style="color: #880000;">make_float3</span>(1, 0, 0);
} <span style="color: blue;">else</span> <span style="color: blue;">if</span> (<span style="color: #880000;">abs</span>(normal.<span style="color: purple;">y</span>) < <span style="color: #6f008a;">INV_SQRT_THREE</span>) {
majorAxis = <span style="color: #880000;">make_float3</span>(0, 1, 0);
} <span style="color: blue;">else</span> {
majorAxis = <span style="color: #880000;">make_float3</span>(0, 0, 1);
}
<span style="color: green;">// Use majorAxis to create a coordinate system relative to world space</span>
<span style="color: #216f85;">float3</span> u = <span style="color: #880000;">normalize</span>(<span style="color: #880000;">cross</span>(majorAxis, normal));
<span style="color: #216f85;">float3</span> v = <span style="color: #880000;">cross</span>(normal, u);
<span style="color: #216f85;">float3</span> w = normal;
<span style="color: green;">// Transform from spherical coordinates to the cartesian coordinates space</span>
<span style="color: green;">// we just defined above, then use the definition to transform to world space</span>
<span style="color: blue;">return</span> <span style="color: #880000;">normalize</span>(u * x +
v * y +
w * z);
}</pre>
<br />
We use two uniformly distributed random numbers to create a random point in the spherical coordinate system. Then we use some trig to convert to Cartesian coordinates. Finally, we transform the random point from local coordinates (ie, relative to the normal) to world coordinates.<br />
<br />
To create the transformation coordinate system, we first find the world axis that is <i>most</i> perpendicular to the normal by comparing each component of the normal to $\frac{1}{\sqrt{3}}$. $\frac{1}{\sqrt{3}}$ is significant because it's the component value at which all three axes contribute equally (since the normal is a unit vector). At least one component of a unit vector must be less than or equal to $\frac{1}{\sqrt{3}}$ (otherwise the squared components would sum to more than 1), so the comparisons are guaranteed to pick an axis that isn't parallel to the normal.<br />
<br />
Next, we use the cross product to find an axis that is perpendicular to the normal. And finally, we use the cross product again to find an axis that is perpendicular to both the new axis and the normal.<br />
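To sanity-check that construction, here's a host-side C++ sketch of the same logic (the `Vec3` helpers are stand-ins for CUDA's float3 functions); the resulting u, v, and w should be mutually perpendicular unit vectors:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 Normalize(Vec3 a) {
    float len = std::sqrt(Dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Build an orthonormal basis {u, v, w} around a unit normal, using the
// "most perpendicular world axis" trick from the kernel above.
void BuildBasis(Vec3 normal, Vec3 *u, Vec3 *v, Vec3 *w) {
    const float kInvSqrtThree = 0.57735026919f; // 1 / sqrt(3)
    Vec3 majorAxis;
    if (std::fabs(normal.x) < kInvSqrtThree) {
        majorAxis = {1.0f, 0.0f, 0.0f};
    } else if (std::fabs(normal.y) < kInvSqrtThree) {
        majorAxis = {0.0f, 1.0f, 0.0f};
    } else {
        majorAxis = {0.0f, 0.0f, 1.0f};
    }
    *u = Normalize(Cross(majorAxis, normal));
    *v = Cross(normal, *u);
    *w = normal;
}
```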
<br />
<br />
The $pdf$ for a uniformly distributed direction in a hemisphere is:<br />
\[pdf_{\text{hemi}} = \frac{1}{2 \pi}\]<br />
This corresponds to the <a href="http://mathworld.wolfram.com/SolidAngle.html">solid angle</a> that the random directions can come from. A hemisphere has a solid angle of $2\pi$ steradians.<br />
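A quick standalone Monte Carlo check of that pdf (a C++ sketch, not renderer code): with uniform hemisphere sampling, $\cos \theta$ is just the $z$ coordinate of the sampled direction, so averaging $\cos \theta$ divided by the pdf $\frac{1}{2 \pi}$ should converge to $\int_{hemi} \cos \theta \: d\omega = \pi$:

```cpp
#include <cmath>
#include <random>

// Monte Carlo estimate of the hemisphere integral of cos(theta).
// With uniform hemisphere sampling, cos(theta) is the z coordinate,
// and dividing each sample by the pdf 1/(2*pi) should converge to pi.
float IntegrateCosineOverHemisphere(int numSamples) {
    std::mt19937 rng(42); // fixed seed so the estimate is reproducible
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    const float kTwoPi = 6.28318530718f;
    double sum = 0.0;
    for (int i = 0; i < numSamples; ++i) {
        float z = uniform(rng); // cos(theta) of a uniform hemisphere direction
        sum += z * kTwoPi;      // f(x) / pdf(x) = cos(theta) * 2*pi
    }
    return static_cast<float>(sum / numSamples);
}
```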
<br />
<br />
<h3>
Shooting Yourself in the Head</h3>
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: green;">// Russian Roulette</span>
<span style="color: blue;">if</span> (bounces > 3) {
<span style="color: blue;">float</span> p = <span style="color: #880000;">max</span>(accumulatedMaterialColor.<span style="color: purple;">x</span>, <span style="color: #880000;">max</span>(accumulatedMaterialColor.<span style="color: purple;">y</span>, accumulatedMaterialColor.<span style="color: purple;">z</span>));
<span style="color: blue;">if</span> (<span style="color: #880000;">curand_uniform</span>(&randState) > p) {
<span style="color: blue;">break</span>;
}
accumulatedMaterialColor /= p;
}</pre>
<br />
During path tracing, we keep bouncing until we miss the scene or exceed the maximum number of bounces. But if the scene is closed and maxBounces is large, we could end up bouncing for a very long time. Remember, though, that at each bounce we accumulate the light attenuation of the material we hit, ie: <span style="font-family: Consolas; font-size: 13px;">accumulatedMaterialColor</span> gets closer and closer to zero the more bounces we do. And since <span style="font-family: Consolas; font-size: 13px;">accumulatedMaterialColor</span> ends up being multiplied by the emitted light color, it makes very little sense to keep bouncing after a while, since the result won't add much to the final pixel color.<br />
<br />
However, we can't arbitrarily stop bouncing after <span style="font-family: Consolas; font-size: 13px;">accumulatedMaterialColor</span> goes below a certain threshold, since this would create bias in the Monte Carlo Integration. What we <b><i>can</i></b> do, though, is use a random probability to determine the stopping point, as long as we divide out the $pdf$ of the random sample. This is known as Russian Roulette.<br />
<br />
In this implementation, we create a uniform random sample and compare it against the max component of <span style="font-family: Consolas; font-size: 13px;">accumulatedMaterialColor</span>. If the random sample is greater than the max component, we terminate the path. If not, we keep the integration unbiased by dividing by the $pdf$ of continuing. Since we used a uniform random variable, the $pdf$ is just:<br />
\[pdf = \text{p}\]<br />
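Here's a standalone sketch of that idea (a hypothetical `RussianRoulette` helper with scalar throughput, not the kernel code): the path survives with probability p and is boosted by 1/p, so the expected value of the result equals the original throughput:

```cpp
#include <cmath>
#include <random>

// Apply one round of Russian Roulette to a path throughput value.
// Returns 0 if the path is terminated; otherwise returns the throughput
// boosted by 1/p. This keeps the estimator unbiased: E[result] = throughput.
float RussianRoulette(float throughput, float p, std::mt19937 &rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    if (uniform(rng) > p) {
        return 0.0f;       // terminated
    }
    return throughput / p; // survived; divide out the pdf
}
```

Averaged over many trials, the result converges to the input throughput, which is exactly the unbiasedness property we need.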
<br />
<br />
<h3>
Success!</h3>
If we whip up a simple scene with a plane and 9 balls, we get:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/qC5t1Un.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/qC5t1Un.png" height="374" width="640" /></a></div>
<br />
Whooo!! Look at those nice soft shadows! And the color bleeding! Houston, this is our first small step towards a battle-ready photo-realistic GPU renderer.<br />
<br />
Let's do another! Obligatory Cornell Box:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/bqe68uT.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/bqe68uT.png" height="374" width="640" /></a></div>
<br />
<br />
One thing to note: since the Cornell Box scene has a light with a very small surface area compared to the rest of the scene, the probability that a ray will hit the light is relatively low. Therefore, it takes a <i><b>long</b></i> time to converge; the render above took around 30 minutes. In comparison, the 9 balls scene took about 60 seconds to resolve to an acceptable level.<br />
<br />
<br />
<h3>
Closing Words</h3>
There we go. Our very own path tracer. <a href="https://www.youtube.com/watch?v=R8i7uMmDxuE">I shall call him Squishy and he shall be mine, and he shall be my Squishy.</a> (I really need to stop watching videos right before writing this blog)<br />
<br />
The code for everything in this post is on <a href="https://github.com/RichieSams/rapt">GitHub</a>. It's open source under the Apache license, so feel free to use it in your own projects.<br />
<br />
As always, feel free to ask questions, make comments, and if you find an error, please let me know.<br />
<br />
Happy coding!<br />
-<span style="color: #dd7700;">RichieSams</span><br />
<br />
<h3>
Shooting Objects from Across the Way</h3>
This is the third post in a series documenting my adventures in creating a GPU path tracer from scratch:<br />
1. <a href="http://richiesams.blogspot.com/2015/03/tracing-light-in-virtual-world.html">Tracing Light in a Virtual World</a><br />
2. <a href="http://richiesams.blogspot.com/2015/03/creating-randomness-and-acummulating.html">Creating Randomness and Accumulating Change</a><br />
<br />
<br />
If you recall, the very high-level algorithm for path tracing is:<br />
<ol>
<li>From each pixel, fire a ray out into the scene from the eye.</li>
<li>If the ray hits an object</li>
<ol type="a">
<li>Use the material properties to accumulate attenuation</li>
<li>Bounce the ray in a new direction</li>
<li>GOTO 2</li>
</ol>
<li>If the ray hits a light</li>
<ol type="a">
<li>Multiply the light and the attenuation</li>
<li>Add the result to the accumulation buffer</li>
<li>GOTO 5</li>
</ol>
<li>If the ray doesn't hit anything</li>
<ol type="a">
<li>GOTO 5</li>
</ol>
<li>GOTO 1 until sufficiently converged</li>
</ol>
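The numbered steps above can be sketched as a single loop (all names here are hypothetical stand-ins, not the real renderer types; color is a scalar for brevity):

```cpp
// Sketch of the path tracing loop described in the numbered steps above.
// traceStep stands in for "intersect the scene, sample the material, and
// bounce the ray"; the real kernels are developed over these posts.
enum HitType { HIT_NOTHING, HIT_OBJECT, HIT_LIGHT };

struct Hit {
    HitType type;
    float attenuation; // material attenuation if we hit an object
    float emitted;     // light value if we hit a light
};

float AccumulateOnePath(Hit (*traceStep)(int bounce), int maxBounces) {
    float attenuation = 1.0f;
    for (int bounce = 0; bounce < maxBounces; ++bounce) {
        Hit hit = traceStep(bounce);
        if (hit.type == HIT_OBJECT) {
            attenuation *= hit.attenuation; // step 2a: accumulate, then bounce
        } else if (hit.type == HIT_LIGHT) {
            return attenuation * hit.emitted; // step 3a: light * attenuation
        } else {
            return 0.0f; // step 4: missed everything
        }
    }
    return 0.0f; // path never reached a light within maxBounces
}
```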
<div>
<br /></div>
<div>
The first post in the series covered step 1, and the second post covered step 3b. This post is going to cover how to test whether a ray hits an object in the scene (aka, steps 2 and 3).</div>
<div>
<br /></div>
<div>
<br /></div>
<h3>
Ray - Sphere Intersection</h3>
<div>
<span style="font-size: x-small;">Foreword: Scratchapixel has an <a href="http://www.scratchapixel.com/old/lessons/3d-basic-lessons/lesson-7-intersecting-simple-shapes/ray-sphere-intersection/">excellent lesson</a> covering ray-object intersections. Much of what I say below is based on their lessons. Much props to the authors.</span></div>
<div>
<br /></div>
<div>
<br /></div>
A sphere can be represented mathematically with a vector representing the location of the sphere's center, and its radius. (You can save a multiplication in a later calculation if you store the radius squared instead.)<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: blue;">struct</span> <span style="color: #216f85;">Sphere</span> {
<span style="color: #216f85;">float3</span> <span style="color: navy;">Center</span>;
<span style="color: blue;">float</span> <span style="color: navy;">RadiusSquared</span>;
};</pre>
<br />
A ray can be represented as a vector representing its origin and a unit vector representing its direction:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: blue;">struct</span> <span style="color: #216f85;">Ray</span> {
<span style="color: #216f85;">float3</span> <span style="color: navy;">Origin</span>;
<span style="color: #216f85;">float3</span> <span style="color: navy;">Direction</span>;
};</pre>
<br />
Let's look at a picture of the most common type of ray - sphere intersection:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-PM6udP7qWa0/VQNUaLuEWQI/AAAAAAAAASI/E_wlSlrvvP4/s1600/figure0.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-PM6udP7qWa0/VQNUaLuEWQI/AAAAAAAAASI/E_wlSlrvvP4/s1600/figure0.png" height="357" width="400" /></a></div>
\[\begin{align}<br />
\overrightarrow{R_{o}} &= ray \: origin \\<br />
\overrightarrow{R_{d}} &= ray \: direction \\<br />
\overrightarrow{S_{c}} &= sphere \: center \\<br />
S_{r} &= sphere \: radius \\<br />
\overrightarrow{P_{0}} &= first \: intersection \: point \\<br />
\overrightarrow{P_{1}} &= second \: intersection \: point \\<br />
t_{0} &= distance \: from \: \overrightarrow{R_{o}} \: to \: \overrightarrow{P_{0}} \\<br />
t_{1} &= distance \: from \: \overrightarrow{R_{o}} \: to \: \overrightarrow{P_{1}} \\<br />
\end{align}\]<br />
<br />
We would like to find $\overrightarrow{P_{0}}$ and $\overrightarrow{P_{1}}$. Mathematically, they are defined as:<br />
\[\begin{align}<br />
\overrightarrow{P_{0}} &= \overrightarrow{R_{o}} + t_{0} \overrightarrow{R_{d}} \\<br />
\overrightarrow{P_{1}} &= \overrightarrow{R_{o}} + t_{1} \overrightarrow{R_{d}} \\<br />
\end{align}\]<br />
<br />
We already know $\overrightarrow{R_{o}}$ and $\overrightarrow{R_{d}}$, so we just need to find $t_{0}$ and $t_{1}$. In order to do so, let's define a few new variables.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-OaLbo9F2pCc/VQNUaPQ7wWI/AAAAAAAAASU/AjgS7gM9tag/s1600/figure1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-OaLbo9F2pCc/VQNUaPQ7wWI/AAAAAAAAASU/AjgS7gM9tag/s1600/figure1.png" height="347" width="400" /></a></div>
<br />
<br />
And then define $t_{0}$ and $t_{1}$ in terms of $t_{ca}$ and $t_{hc}$:<br />
\[\begin{align}<br />
t_{0} &= t_{ca} - t_{hc} \\<br />
t_{1} &= t_{ca} + t_{hc} \\<br />
\end{align}\]<br />
<br />
<br />
Now we can begin solving for our unknown variables. To start, let's look at the right triangle formed by $t_{ca}$, $\overrightarrow{L}$, and $d$.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-Uu9PBg1OD6g/VQNUaKZiegI/AAAAAAAAASM/NfhRZeoVoEY/s1600/figure2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-Uu9PBg1OD6g/VQNUaKZiegI/AAAAAAAAASM/NfhRZeoVoEY/s1600/figure2.png" height="231" width="400" /></a></div>
<br />
<br />
We can solve for $t_{ca}$ by using the definition of cosine and the dot product:<br />
\[\cos\left ( \theta \right ) = \frac{adjacent}{hypotenuse} = \frac{t_{ca}}{\left \| \overrightarrow{L} \right \|}\]<br />
\[\overrightarrow{m} \cdot \overrightarrow{n} = \left \| m \right \| \left \| n \right \| \cos \left ( \theta \right )\]
<br />
<hr width="50%" />
\[\begin{split} ie: \end{split} \qquad \qquad<br />
\begin{split}<br />
\overrightarrow{R_{d}} \cdot \overrightarrow{L} &= \left \| \overrightarrow{R_{d}} \right \| \left \| \overrightarrow{L} \right \| \cos \left ( \theta \right ) \\<br />
&= \frac{\left \| \overrightarrow{L} \right \| t_{ca}}{\left \| \overrightarrow{L} \right \|} \\<br />
&= t_{ca}<br />
\end{split}\]<br />
<br />
$\overrightarrow{R_{d}}$ is a unit vector. Therefore, $\left \| \overrightarrow{R_{d}} \right \| = 1$ and cancels out. Then, if we replace $\cos \left ( \theta \right )$ with its definition, we can cancel $\left \| \overrightarrow{L} \right \|$ from top and bottom. Thus, we're left with just $t_{ca}$.<br />
<br />
<br />
<br />
Using the Pythagorean Theorem and a trick with the dot product, we can solve for $d^{2}$:<br />
\[\overrightarrow{m} \cdot \overrightarrow{m} \equiv \left \| \overrightarrow{m} \right \| ^{2}\]
<br />
<hr width="50%" />
\[\begin{align} t_{ca} \: ^{2} + d^{2} &= \left \| \overrightarrow{L} \right \| ^{2}\\<br />
t_{ca} \: ^{2} + d^{2} &= \overrightarrow{L} \cdot \overrightarrow{L} \\<br />
d^{2} &= \overrightarrow{L} \cdot \overrightarrow{L} - t_{ca} \: ^{2}<br />
\end{align}\]<br />
<br />
<br />
To solve for $t_{hc}$, let's look at the triangle formed by $S_{r}$, $t_{hc}$, and $d$.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-cYTcFIPg128/VQNUasZDIWI/AAAAAAAAASE/Oimw01SSH7o/s1600/figure3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-cYTcFIPg128/VQNUasZDIWI/AAAAAAAAASE/Oimw01SSH7o/s1600/figure3.png" height="320" width="302" /></a></div>
<br />
<br />
Using the Pythagorean Theorem again, we can solve for $t_{hc}$:<br />
\[\begin{align}<br />
t_{hc} \: ^{2} + d^{2} &= S_{r} \: ^{2} \\<br />
t_{hc} &= \sqrt{S_{r} \: ^{2} - d^{2}}<br />
\end{align}\]<br />
<br />
<br />
Whew! We made it! Using the above equations, we can find the two intersection points. Let's look at some code that implements the above and walk through some special cases / early outs.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: green;">/**</span>
<span style="color: green;"> * Test for the intersection of a ray with a sphere</span>
<span style="color: green;"> *</span>
<span style="color: green;"> * @param ray The ray to test</span>
<span style="color: green;"> * @param sphere The sphere to test</span>
<span style="color: green;"> * @param normal_out Filled with normal of the surface at the intersection point. Not changed if no intersection.</span>
<span style="color: green;"> * @return The distance from the ray origin to the nearest intersection. -1.0f if no intersection</span>
<span style="color: green;"> */</span>
<span style="color: #6f008a;">__device__</span> <span style="color: blue;">float</span> <span style="color: #880000;">TestRaySphereIntersection</span>(<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Ray</span> &ray, <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Sphere</span> &sphere, <span style="color: #216f85;">float3</span> &normal_out) {
<span style="color: #216f85;">float3</span> L = sphere.<span style="color: purple;">Center</span> - ray.<span style="color: purple;">Origin</span>;
<span style="color: blue;">float</span> t_ca = <span style="color: #880000;">dot</span>(L, ray.<span style="color: purple;">Direction</span>);
<span style="color: green;">// Ray points away from the sphere</span>
<span style="color: blue;">if</span> (t_ca < 0) {
<span style="color: blue;">return</span> -1.0f;
}
<span style="color: blue;">float</span> d_squared = <span style="color: #880000;">dot</span>(L, L) - t_ca * t_ca;
<span style="color: green;">// Ray misses the sphere</span>
<span style="color: blue;">if</span> (d_squared > sphere.<span style="color: purple;">RadiusSquared</span>) {
<span style="color: blue;">return</span> -1.0f;
}
<span style="color: blue;">float</span> t_hc = <span style="color: #880000;">sqrt</span>(sphere.<span style="color: purple;">RadiusSquared</span> - d_squared);
<span style="color: blue;">float</span> t_0 = t_ca - t_hc;
<span style="color: blue;">float</span> t_1 = t_ca + t_hc;
<span style="color: blue;">float</span> nearestIntersection;
<span style="color: blue;">float</span> normalDirection;
<span style="color: blue;">if</span> (t_0 > 0 && t_1 > 0) {
<span style="color: green;">// Two intersections</span>
<span style="color: green;">// Return the nearest of the two</span>
nearestIntersection = <span style="color: #880000;">min</span>(t_0, t_1);
normalDirection = 1.0f;
} <span style="color: blue;">else</span> {
<span style="color: green;">// Ray starts inside the sphere</span>
<span style="color: green;">// Return the far side of the sphere</span>
        nearestIntersection = <span style="color: #880000;">max</span>(t_0, t_1);
<span style="color: green;">// We reverse the direction of the normal, since we are inside the sphere</span>
normalDirection = -1.0f;
}
normal_out = <span style="color: #880000;">normalize</span>(((ray.<span style="color: purple;">Origin</span> + (ray.<span style="color: purple;">Direction</span> * nearestIntersection)) - sphere.<span style="color: purple;">Center</span>) * normalDirection);
<span style="color: blue;">return</span> nearestIntersection;
}</pre>
<br />
<br />
The algorithm can be summarized as follows:<br />
<ol>
<li>Calculate $t_{ca}$</li>
<li>If $t_{ca}$ is negative, $\overrightarrow{R_{d}}$ is pointing away from the sphere. Thus, there cannot be an intersection</li>
<li>Calculate $d^{2}$</li>
<li>If $d^{2}$ is greater than $S_{r} \: ^{2}$, the ray misses the sphere.</li>
<li>Calculate $t_{hc}$</li>
<li>Calculate $t_{0}$ and $t_{1}$.</li>
<li>If $t_{0}$ and $t_{1}$ are both positive, the ray starts outside the sphere and intersects it twice. Choose the closest of the two intersections</li>
<li>If either $t_{0}$ or $t_{1}$ is negative, the ray starts inside the sphere and intersects it on the way out. Choose the positive intersection.</li>
<ul>
<li>They can't both be negative, since that would mean the ray is pointing away from the sphere, and we already checked for that.</li>
<li>See following picture.</li>
</ul>
</ol>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-pGN44UUK3Ts/VQNUbP2d1WI/AAAAAAAAASA/gHO4ctDA9qM/s1600/figure4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-pGN44UUK3Ts/VQNUbP2d1WI/AAAAAAAAASA/gHO4ctDA9qM/s1600/figure4.png" height="328" width="400" /></a></div>
<br /></div>
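To sanity-check the algorithm on the CPU, here's a minimal host-side C++ sketch of the same test. The <code>float3h</code> struct and <code>dot</code> helper are stand-ins for the CUDA types, and the normal output is omitted to keep it short:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Minimal stand-ins for the CUDA float3 type and dot() helper
struct float3h { float x, y, z; };
static float dot(float3h a, float3h b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { float3h Origin, Direction; };    // Direction must be unit length
struct Sphere { float3h Center; float RadiusSquared; };

// Same algorithm as the kernel: distance to the nearest hit, or -1.0f on a miss
float TestRaySphereIntersection(const Ray &ray, const Sphere &sphere) {
    float3h L = { sphere.Center.x - ray.Origin.x,
                  sphere.Center.y - ray.Origin.y,
                  sphere.Center.z - ray.Origin.z };
    float t_ca = dot(L, ray.Direction);
    // Early out: the ray points away from the sphere's center
    // (note this also rejects rays that start inside and point away from the center)
    if (t_ca < 0.0f) {
        return -1.0f;
    }
    float d_squared = dot(L, L) - t_ca * t_ca;
    if (d_squared > sphere.RadiusSquared) {
        return -1.0f; // the ray misses the sphere
    }
    float t_hc = std::sqrt(sphere.RadiusSquared - d_squared);
    float t_0 = t_ca - t_hc;
    float t_1 = t_ca + t_hc;
    if (t_0 > 0.0f && t_1 > 0.0f) {
        return std::min(t_0, t_1); // started outside: return the near hit
    }
    return std::max(t_0, t_1);     // started inside: return the far side
}
```

For a unit sphere centered at (0, 0, 5) and a ray from the origin along +z, the near hit lands at t = 4; a ray starting at the sphere's center exits at t = 1.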
<br />
<br />
<br />
<h3>
Ray - Plane Intersection</h3>
A plane can be represented using any point on the plane, and a unit normal vector from the plane.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: blue;">struct</span> <span style="color: #216f85;">Plane</span> {
<span style="color: #216f85;">float3</span> <span style="color: navy;">Point</span>;
<span style="color: #216f85;">float3</span> <span style="color: navy;">Normal</span>;
};</pre>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-RplBceCOW84/VQNUbuWY3iI/AAAAAAAAAR8/V5VKI8-mC0I/s1600/figure5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-RplBceCOW84/VQNUbuWY3iI/AAAAAAAAAR8/V5VKI8-mC0I/s1600/figure5.png" height="400" width="262" /></a></div>
\[\begin{align}<br />
\overrightarrow{P_{p}} &= plane \: point \\<br />
\overrightarrow{P_{n}} &= plane \: normal<br />
\end{align}\]<br />
<br />
Let's look at the intersection from the top.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-czxD85q0Wcg/VQNUcGsR_jI/AAAAAAAAAR0/S41Pc6TuMLs/s1600/figure6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-czxD85q0Wcg/VQNUcGsR_jI/AAAAAAAAAR0/S41Pc6TuMLs/s1600/figure6.png" height="280" width="400" /></a></div>
<br />
<br />
We would like to find $d$. We can start by taking the dot product between $\overrightarrow{R_{d}}$ and $\overrightarrow{P_{n}}$. We get $\cos \left ( \theta_{1} \right )$, since both $\overrightarrow{R_{d}}$ and $\overrightarrow{P_{n}}$ are unit vectors.<br />
\[\begin{align}<br />
\overrightarrow{R_{d}} \cdot \overrightarrow{P_{n}} &= \left \| \overrightarrow{R_{d}} \right \| \left \| \overrightarrow{P_{n}} \right \| \cos \left ( \theta_{1} \right ) \\<br />
&= \cos \left ( \theta_{1} \right )<br />
\end{align}\]<br />
<br />
Let's look at the triangle formed by $\overrightarrow{P_{n}}$ and $\overrightarrow{L}$, where $\overrightarrow{L}$ is the vector between $\overrightarrow{P_{p}}$ and $\overrightarrow{R_{o}}$.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-X7vmMPpC16Y/VQNUcamTuiI/AAAAAAAAARw/mU9NOzQfYQU/s1600/figure7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-X7vmMPpC16Y/VQNUcamTuiI/AAAAAAAAARw/mU9NOzQfYQU/s1600/figure7.png" height="242" width="320" /></a></div>
<br />
<br />
If we take the dot product between $\overrightarrow{P_{n}}$ and $\overrightarrow{L}$, we get $a$.<br />
<br />
\[\begin{align}<br />
\overrightarrow{P_{n}} \cdot \overrightarrow{L} &= \left \| \overrightarrow{P_{n}} \right \| \left \| \overrightarrow{L} \right \| \cos \left ( \theta_{2} \right ) \\<br />
&= \frac{\left \| \overrightarrow{L} \right \| a}{\left \| \overrightarrow{L} \right \|} \\<br />
&= a<br />
\end{align}\]<br />
<br />
<br />
Finally, let's look at the two triangles formed between $\overrightarrow{P_{n}}$, $\overrightarrow{R_{d}}$, and the plane itself.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-8Co06_Hq4Ws/VQNUc3_WNfI/AAAAAAAAAR4/MKNKPMnLtu8/s1600/figure8.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-8Co06_Hq4Ws/VQNUc3_WNfI/AAAAAAAAAR4/MKNKPMnLtu8/s1600/figure8.png" height="320" width="267" /></a></div>
<br />
<br />
They are similar triangles, so we can solve for $d$ using the <a href="http://www.mathopenref.com/similartriangles.html">Law of Similar Triangles</a> and the definition of cosine:<br />
\[\cos \left ( \theta_{1} \right ) = \frac{b}{c}\]
<br />
<hr width="50%" />
\[\begin{align}<br />
\frac{a + b}{c + d} &= \frac{b}{c} \\<br />
c \left ( a + b \right ) &= b \left ( c + d \right ) \\<br />
ac + bc &= bc + bd \\<br />
ac &= bd \\<br />
d &= \frac{ac}{b} \\<br />
d &= \frac{a}{\cos \left ( \theta_{1} \right )}<br />
\end{align}\]<br />
<br />
<br />
Let's look at the code implementation of the above and walk through it.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: green;">/**</span>
<span style="color: green;"> * Test for the intersection of a ray with a plane </span>
<span style="color: green;"> *</span>
<span style="color: green;"> * @param ray The ray to test</span>
<span style="color: green;"> * @param plane The plane to test</span>
<span style="color: green;"> * @param normal_out Filled with normal of the surface at the intersection point. Not changed if no intersection.</span>
<span style="color: green;"> * @return The distance from the ray origin to the nearest intersection. -1.0f if no intersection</span>
<span style="color: green;"> */</span>
<span style="color: #6f008a;">__device__</span> <span style="color: blue;">float</span> <span style="color: #880000;">TestRayPlaneIntersection</span>(<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Ray</span> &ray, <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Plane</span> &plane, <span style="color: #216f85;">float3</span> &normal_out) {
<span style="color: blue;">float</span> cos_theta1 = <span style="color: #880000;">dot</span>(plane.<span style="color: purple;">Normal</span>, ray.<span style="color: purple;">Direction</span>);
 <span style="color: green;">// If cos_theta1 is greater than -epsilon,</span>
 <span style="color: green;">// the ray is parallel to the plane or points away from the plane normal</span>
<span style="color: blue;">if</span> (cos_theta1 > -1.0e-6f) {
<span style="color: blue;">return</span> -1.0f;
}
normal_out = plane.<span style="color: purple;">Normal</span>;
<span style="color: blue;">float</span> a = <span style="color: #880000;">dot</span>(plane.<span style="color: purple;">Normal</span>, plane.<span style="color: purple;">Point</span> - ray.<span style="color: purple;">Origin</span>);
<span style="color: blue;">return</span> a / cos_theta1;
}</pre>
<br />
<br />
The algorithm can be summarized as follows:<br />
<ol>
<li>Calculate $\cos \left ( \theta_{1} \right )$</li>
<li>If $\cos \left ( \theta_{1} \right )$ is greater than or equal to zero, the ray is parallel to the plane, or faces away from it</li>
<ul>
<li>In the code, we use a small epsilon to account for floating-point inaccuracies</li>
</ul>
<li>Calculate a</li>
<li>Return $\frac{a}{\cos \left ( \theta_{1} \right )}$</li>
</ol>
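As with the sphere, a host-side C++ sketch (stand-in types, normal output omitted) makes the plane test easy to verify on the CPU:

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-ins for the CUDA float3 type and dot() helper
struct float3h { float x, y, z; };
static float dot(float3h a, float3h b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { float3h Origin, Direction; };  // Direction must be unit length
struct Plane { float3h Point, Normal; };    // Normal must be unit length

// Same algorithm as the kernel: distance to the hit,
// or -1.0f if the ray is parallel to the plane or faces away from it
float TestRayPlaneIntersection(const Ray &ray, const Plane &plane) {
    float cos_theta1 = dot(plane.Normal, ray.Direction);
    if (cos_theta1 > -1.0e-6f) {
        return -1.0f;
    }
    float3h L = { plane.Point.x - ray.Origin.x,
                  plane.Point.y - ray.Origin.y,
                  plane.Point.z - ray.Origin.z };
    float a = dot(plane.Normal, L);
    return a / cos_theta1; // a and cos_theta1 are both negative for a hit from the normal-facing side
}
```

A ray dropped straight down from (0, 2, 0) onto the y = 0 ground plane should hit at t = 2.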
<br />
<br />
<h3>
Testing Them Out</h3>
In order to test the code out, I added on to the eye ray code we created in the first post. After creating the ray, I tried to intersect it with an object in the scene. If it hit, I did some basic Lambertian shading with a hard-coded directional light. If it didn't, I colored the pixel black. Here are the results:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/81PigOR.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/81PigOR.png" height="374" width="640" /></a></div>
<br />
Very nice! If I do say so myself. I don't have a plane in the scene because I didn't implement the plane intersection code until I had skipped ahead to the path tracing itself. (I got too excited. Ha ha!) So you'll have to take my word for it that the code does work. Below is the kernel code used to create the image above:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #6f008a;">__global__</span> <span style="color: blue;">void</span> <span style="color: #880000;">PathTraceKernel</span>(<span style="color: blue;">unsigned</span> <span style="color: blue;">char</span> *textureData, <span style="color: #216f85;">uint</span> width, <span style="color: #216f85;">uint</span> height, <span style="color: #216f85;">size_t</span> pitch, <span style="color: #216f85;">DeviceCamera</span> *g_camera, <span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Sphere</span> *g_spheres, <span style="color: #216f85;">uint</span> numSpheres, <span style="color: #216f85;">uint</span> hashedFrameNumber) {
<span style="color: blue;">int</span> x = blockIdx.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">x</span> + threadIdx.<span style="color: purple;">x</span>;
<span style="color: blue;">int</span> y = blockIdx.<span style="color: purple;">y</span> * blockDim.<span style="color: purple;">y</span> + threadIdx.<span style="color: purple;">y</span>;
<span style="color: green;">// Create a local copy of the camera</span>
<span style="color: #216f85;">DeviceCamera</span> camera = *g_camera;
<span style="color: green;">// Global threadId</span>
<span style="color: blue;">int</span> threadId = (blockIdx.<span style="color: purple;">x</span> + blockIdx.<span style="color: purple;">y</span> * gridDim.<span style="color: purple;">x</span>) * (blockDim.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">y</span>) + (threadIdx.<span style="color: purple;">y</span> * blockDim.<span style="color: purple;">x</span>) + threadIdx.<span style="color: purple;">x</span>;
<span style="color: green;">// Create random number generator</span>
<span style="color: #216f85;">curandState</span> randState;
<span style="color: #880000;">curand_init</span>(hashedFrameNumber + threadId, 0, 0, &randState);
<span style="color: green;">// Calculate the first ray for this pixel</span>
<span style="color: #216f85;">Scene</span>::<span style="color: #216f85;">Ray</span> ray = {camera.<span style="color: purple;">Origin</span>, <span style="color: #880000;">CalculateRayDirectionFromPixel</span>(x, y, width, height, camera, &randState)};
<span style="color: green;">// Generate a uniform random number</span>
<span style="color: green;">//float randNum = curand_uniform(&randState);</span>
<span style="color: green;">// Try to intersect with the spheres;</span>
<span style="color: blue;">float</span> closestIntersection = <span style="color: #6f008a;">FLT_MAX</span>;
<span style="color: #216f85;">float3</span> normal;
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> i = 0; i < numSpheres; ++i) {
<span style="color: #216f85;">float3</span> newNormal;
<span style="color: blue;">float</span> intersection = <span style="color: #880000;">TestRaySphereIntersection</span>(ray, g_spheres[i], newNormal);
<span style="color: blue;">if</span> (intersection > 0.0f && intersection < closestIntersection) {
closestIntersection = intersection;
normal = newNormal;
}
}
<span style="color: #216f85;">float3</span> pixelColor;
<span style="color: blue;">if</span> (closestIntersection < <span style="color: #6f008a;">FLT_MAX</span>) {
  <span style="color: blue;">float</span> attenuation = <span style="color: #880000;">max</span>(<span style="color: #880000;">dot</span>(normal, <span style="color: #880000;">make_float3</span>(0.70710678118f, 0.70710678118f, -0.70710678118f)), 0.0f);
  pixelColor = <span style="color: #880000;">make_float3</span>(0.846, 0.933, 0.949) * attenuation + <span style="color: #880000;">make_float3</span>(0.15f, 0.15f, 0.15f);
} <span style="color: blue;">else</span> {
pixelColor = <span style="color: #880000;">make_float3</span>(0.0f, 0.0f, 0.0f);
}
<span style="color: blue;">if</span> (x < width && y < height) {
<span style="color: green;">// Get a pointer to the pixel at (x,y)</span>
<span style="color: blue;">float</span> *pixel = (<span style="color: blue;">float</span> *)(textureData + y * pitch) + 4 <span style="color: green;">/*RGBA*/</span> * x;
<span style="color: green;">// Write out pixel data</span>
pixel[0] += pixelColor.<span style="color: purple;">x</span>;
pixel[1] += pixelColor.<span style="color: purple;">y</span>;
pixel[2] += pixelColor.<span style="color: purple;">z</span>;
<span style="color: green;">// Ignore alpha, since it's hardcoded to 1.0f in the display</span>
<span style="color: green;">// We have to use a RGBA format since CUDA-DirectX interop doesn't support R32G32B32_FLOAT</span>
}
}</pre>
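The shading in the kernel above is the standard Lambertian cosine term: brightness is proportional to the cosine of the angle between the surface normal and the direction to the light, clamped at zero. A tiny host-side C++ sketch (with a stand-in float3 type; both vectors assumed unit length):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Minimal stand-in for the CUDA float3 type and dot() helper
struct float3h { float x, y, z; };
static float dot(float3h a, float3h b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Lambertian term: cos(angle between normal and light direction),
// clamped to zero for surfaces facing away from the light
float Lambert(float3h normal, float3h toLight) {
    return std::max(dot(normal, toLight), 0.0f);
}
```

A surface facing the light head-on gets 1.0, a surface at 90 degrees or more gets 0.0, and a 45-degree angle gives about 0.707.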
<br />
<br />
<h3>
Conclusion</h3>
There we go! We now have everything we need to do path tracing. The next post will be just that: creating a basic path tracing kernel. I've already implemented it in code, so it's just a matter of how quickly I can write up the post. (<a href="http://i.imgur.com/Z7XHne6.png">teaser picture</a>). Keep an eye out!<br />
<br />
The code for everything in this post is on <a href="https://github.com/RichieSams/rapt">GitHub</a>. It's open source under the Apache license, so feel free to use it in your own projects. The path tracing code is already pushed to the repo, if you're feeling antsy, you can give that a look.<br />
<br />
As always, feel free to ask questions, make comments, and if you find an error, please let me know.<br />
<br />
<br />
Happy coding!<br />
-<span style="color: #dd7700;">RichieSams</span><br />
<br />
<h2>
Creating Randomness and Accumulating Change</h2>
<i>Posted 2015-03-06</i><br />
<br />
This is the second post in a series documenting my adventures in creating a GPU path tracer from scratch. If you missed it, the first post is <a href="http://richiesams.blogspot.com/2015/03/tracing-light-in-virtual-world.html" target="_blank">here</a>.<br />
<br />
<br />
Path tracing uses <a href="http://en.wikipedia.org/wiki/Monte_Carlo_integration" target="_blank">Monte Carlo Integration</a> to estimate the Global Illumination in the scene. In our case, Monte Carlo Integration boils down to taking a large number of <i><b>random</b></i> samples of the scene, and averaging them together. Random is the key word here. If we don't randomly sample, the resulting image will have artifacts in the form of patterns, banding, etc.<br />
<br />
<br />
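As a concrete (if toy) illustration of what "averaging random samples" buys us, here's a tiny host-side C++ function that estimates $\int_{0}^{1} x^{2} \, dx = \frac{1}{3}$ by averaging random samples of the integrand; the renderer does the same thing, just with radiance instead of $x^{2}$:

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Estimate the integral of f(x) = x^2 over [0, 1] by averaging
// uniform random samples of the integrand. The true value is 1/3,
// and the estimate converges as the sample count grows.
double MonteCarloEstimate(int numSamples, unsigned int seed) {
    srand(seed);
    double sum = 0.0;
    for (int i = 0; i < numSamples; ++i) {
        double x = (double)rand() / RAND_MAX; // uniform sample in [0, 1]
        sum += x * x;                         // sample the integrand
    }
    return sum / numSamples;                  // the Monte Carlo average
}
```

With a million samples, the estimate lands within a fraction of a percent of 1/3; with only a handful, it's noisy, which is exactly the grain you see in an unconverged path-traced image.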
<h3>
Creating Randomness from the Non-Random</h3>
So how do we create random numbers? This is a really old topic that has been extensively researched, so rather than reiterate it here, I'll just point you to <a href="http://lmgtfy.com/?q=random+number+generator" target="_blank">Google</a>. The point we do care about, though, is <i><b>where </b></i>we create the random numbers. As far as I can see, our options are as follows:<br />
<ol>
<li>Generate a large number of random numbers on the CPU using classic pseudo-random number generators (PRNGs), and transfer them to the GPU to be consumed as needed.</li>
<li>Create a random number generator on the GPU, and access it from each thread</li>
<li>Create a random number generator per thread on the GPU</li>
</ol>
<div>
<br /></div>
<div>
While option 1 looks simple and straightforward, path tracing will use a large number of random numbers, and each number consumed has to be transferred across the bus from the CPU to the GPU. This is going to be SLOW.</div>
<div>
<br /></div>
<div>
Ok, with the CPU out of the picture, we need to find a PRNG for the GPU. Luckily, CUDA comes with a library that does just that: <a href="http://docs.nvidia.com/cuda/curand/" target="_blank">curand</a>. But, how many PRNGs should we have, and where should they live?</div>
<div>
<br /></div>
<div>
To better understand the problem, let's briefly go over how PRNGs work. PRNGs use math to simulate random sequences. In order to get different numbers, they need to access and store internal state, aka data. The size of this state depends on the PRNG algorithm. </div>
<div>
<br /></div>
<div>
If we choose option 2, every single thread will want access to a single generator. There will be massive contention, and random number generation will degrade to a serial operation. Again, since path tracing requires a large number of random numbers, this option will be slow.</div>
<div>
<br /></div>
<div>
So option 3 it is! It turns out that the size of the state for the default curand generator isn't that big, so storing state per thread isn't too bad. </div>
<div>
<br /></div>
<div>
<br /></div>
<h3>
Implementing curand in the Kernel</h3>
<div>
In order to create a generator in the kernel, you have to create a curandState object and then call curand_init on it, passing in a seed, a sequence number, and an offset.<br />
<br /></div>
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: #216f85;">curandState</span> randState;
<span style="color: #880000;">curand_init</span>(seed, sequenceNum, offset, &randState);</pre>
<br />
Then, you can generate random numbers using:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: #216f85;">uint64</span> randInteger = <span style="color: #880000;">curand</span>(&randState);
<span style="color: blue;">float</span> randNormalizedFloat = <span style="color: #880000;">curand_uniform</span>(&randState);</pre>
<br />
Two states with different seeds will create a different sequence of random numbers. Two states with the same seed will create the same sequence of random numbers.<br />
<br />
Two states with the same seed, but different sequenceNum will use the same sequence, but be offset to different blocks of the sequence. (Specifically, in increments of 2<sup>67</sup>) Why would you want this? According to the documentation, "Sequences generated with the same seed and different sequence numbers will not have statistically correlated values."<br />
<br />
Offset just manually skips ahead <i>n </i>in the sequence.<br />
<br />
In the curand documentation, the creators mention that curand_init() can be relatively slow, so if you're launching the same kernel multiple times, it's usually better to keep one curandState per thread, but to store it in global memory between kernel launches. i.e.:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: #6f008a;">__global__</span> <span style="color: blue;">void</span> <span style="color: #880000;">setupRandStates</span>(<span style="color: #216f85;">curandState</span> *state) {
<span style="color: blue;">int</span> id = threadIdx.<span style="color: purple;">x</span> + blockIdx.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">x</span>;
<span style="color: green;">// Each thread gets same seed, a different sequence number, no offset</span>
<span style="color: #880000;">curand_init</span>(1234, id, 0, &state[id]);
}
<span style="color: #6f008a;">__global__</span> <span style="color: blue;">void</span> <span style="color: #880000;">useRandStates</span>(<span style="color: #216f85;">curandState</span> *state) {
<span style="color: blue;">int</span> id = threadIdx.<span style="color: purple;">x</span> + blockIdx.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">x</span>;
<span style="color: green;">// Copy state to local memory for efficiency </span>
<span style="color: #216f85;">curandState</span> localState = state[id];
<span style="color: green;">// Use localState to generate numbers....</span>
<span style="color: green;">// Copy state back to global memory </span>
state[id] = localState;
}
<span style="color: blue;">int</span> <span style="color: #880000;">main</span>() {
<span style="color: blue;">int</span> numThreads = 64;
<span style="color: blue;">int</span> numBlocks = 64;
<span style="color: green;">// Allocate space on the device to store the random states</span>
<span style="color: #216f85;">curandState</span> *d_randStates;
<span style="color: #880000;">cudaMalloc</span>(&d_randStates, numBlocks * numThreads * <span style="color: blue;">sizeof</span>(<span style="color: #216f85;">curandState</span>));
<span style="color: green;">// Setup and use the randStates</span>
<span style="color: #880000;">setupRandStates</span><<<numBlocks, numThreads>>>(d_randStates);
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> i = 0; i < NUM_ITERATIONS; ++i) {
<span style="color: #880000;">useRandStates</span><<<numBlocks, numThreads>>>(d_randStates);
}
<span style="color: blue;">return</span> 0;
}</pre>
<br />
This would be really nice, but there is one problem: storing all the states:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: #880000;">cudaMalloc</span>(&d_randStates, numBlocks * numThreads * <span style="color: blue;">sizeof</span>(<span style="color: #216f85;">curandState</span>));
</pre>
<br />
In our case, we'll be launching a thread for every pixel on the screen, i.e., millions of threads. While curandState isn't that large, storing millions of them is not feasible. So what can we do instead? It turns out that curand_init() is only slow <a href="https://devtalk.nvidia.com/default/topic/492200/trying-to-understand-curand-curand_init-sequence-input-parameter/" target="_blank">if you use sequenceNum and offset</a>. This is quite intuitive, since using those requires the generator to skip ahead a large amount. So if we keep both sequenceNum and offset equal to zero, curand_init() is quite fast.<br />
<br />
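To put rough numbers on it: the default XORWOW curandState is about 48 bytes (an assumption; check sizeof(curandState) against your CUDA toolkit), so one state per pixel at 1080p costs on the order of 100 MB:

```cpp
#include <cassert>

// Rough memory cost (in bytes) of storing one PRNG state per pixel.
// The 48-byte state size is an assumption for curand's default XORWOW
// generator; verify with sizeof(curandState) on your toolkit version.
long long StateMemoryBytes(long long width, long long height, long long stateSize) {
    return width * height * stateSize;
}
```

StateMemoryBytes(1920, 1080, 48) comes out to roughly 99.5 million bytes, which is why re-seeding a cheap per-thread state each frame is attractive.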
In order to give each thread different random numbers, we give them unique seeds. A simple method I came up with is to hash the frameNumber and then add the id of the thread.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #216f85;">uint32</span> <span style="color: #880000;">WangHash</span>(<span style="color: #216f85;">uint32</span> a) {
a = (a ^ 61) ^ (a >> 16);
a = a + (a << 3);
a = a ^ (a >> 4);
a = a * 0x27d4eb2d;
a = a ^ (a >> 15);
<span style="color: blue;">return</span> a;
}
<span style="color: #6f008a;">__global__</span> <span style="color: blue;">void</span> <span style="color: #880000;">generateRandNumbers</span>(<span style="color: #216f85;">uint</span> hashedFrameNumber) {
<span style="color: green;">// Global threadId</span>
<span style="color: blue;">int</span> threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
<span style="color: green;">// Create random number generator</span>
curandState randState;
curand_init(hashedFrameNumber + threadId, 0, 0, &randState);
<span style="color: green;">// Use randState to generate numbers...</span>
}
<span style="color: blue;">int</span> <span style="color: #880000;">main</span>() {
<span style="color: #216f85;">uint</span> frameNumber = 0;
<span style="color: #216f85;">uint</span> width = 1024;
<span style="color: #216f85;">uint</span> height = 256;
<span style="color: #216f85;">dim3</span> Db = <span style="color: #216f85;">dim3</span>(16, 16); <span style="color: green;">// block dimensions are fixed to be 256 threads</span>
<span style="color: #216f85;">dim3</span> Dg = <span style="color: #216f85;">dim3</span>((width + Db.<span style="color: purple;">x</span> - 1) / Db.<span style="color: purple;">x</span>, (height + Db.<span style="color: purple;">y</span> - 1) / Db.<span style="color: purple;">y</span>);
<span style="color: blue;">for</span> (<span style="color: #216f85;">uint</span> i = 0; i < NUM_ITERATIONS; ++i) {
<span style="color: #880000;">generateRandNumbers</span><<<Dg, Db >>>(WangHash(frameNumber++));
}
<span style="color: blue;">return</span> 0;
}
</pre>
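One reason to hash rather than use the frame number directly: consecutive integers make poor seeds for many generators, producing visibly correlated sequences across frames. A host-side copy of the same hash (plain C++) shows that consecutive inputs scatter to very different outputs:

```cpp
#include <cassert>
#include <cstdint>

// Host-side copy of the kernel's integer hash (Thomas Wang's 32-bit mix).
// Every step is invertible, so distinct inputs always map to distinct outputs.
uint32_t WangHash(uint32_t a) {
    a = (a ^ 61u) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
```

The hash is deterministic, so the same frame number always produces the same seed, but frame n and frame n + 1 end up far apart in seed space.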
<br />
<br />
Yay! Now we can generate lots of random numbers! If we generate a random number on each thread and output it as a greyscale color, we can make some nice white noise.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/LBnBXo0.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/LBnBXo0.jpg" height="372" width="640" /></a></div>
<br />
<br />
<h3>
Accumulating and Averaging Colors</h3>
The last part of Monte Carlo Integration is the averaging of all the samples taken. The simplest solution is to just accumulate the colors from each frame, adding one frame to the next. Then, at the end, we divide the color at each pixel by the number of frames. I integrated this into my code by passing the frame number into the pixel shader that draws the texture to the screen.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: blue;">cbuffer</span> constants {
<span style="color: blue;">float</span> gInverseNumPasses;
};
<span style="color: blue;">Texture2D</span><<span style="color: blue;">float3</span>> gHDRInput : <span style="color: blue;">register</span>(t0);
<span style="color: blue;">float4</span> CopyCudaOutputToBackbufferPS(CalculatedTrianglePixelIn input) : SV_TARGET {
<span style="color: blue;">return</span> <span style="color: blue;">float4</span>(gHDRInput[input.positionClip.xy] * gInverseNumPasses, 1.0f);
}</pre>
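Since the framebuffer only ever accumulates, multiplying by gInverseNumPasses at display time is exactly computing the mean of all passes so far. A small host-side C++ sketch (with hypothetical sample values) shows the idea:

```cpp
#include <cassert>
#include <cmath>

// Progressive averaging: each pass adds its sample into the accumulator
// (the kernel's "pixel[i] += ..."), and the display divides by the pass
// count (the shader's multiply by gInverseNumPasses).
float DisplayValue(const float *samples, int numPasses) {
    float accumulated = 0.0f;
    for (int i = 0; i < numPasses; ++i) {
        accumulated += samples[i];
    }
    return accumulated * (1.0f / numPasses);
}
```

Four passes of {1, 0, 0, 1} display as 0.5, the true mean of those samples.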
<br />
This way, I can see the accumulated output as it's being generated. If we create a simple kernel that outputs either pure red, green, or blue, depending on a random number, we can test if the accumulation buffer is working.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #6f008a;">__global__</span> <span style="color: blue;">void</span> <span style="color: #880000;">AccumulationBufferTest</span>(<span style="color: blue;">unsigned</span> <span style="color: blue;">char</span> *textureData, <span style="color: #216f85;">uint</span> width, <span style="color: #216f85;">uint</span> height, <span style="color: #216f85;">size_t</span> pitch, <span style="color: #216f85;">DeviceCamera</span> camera, <span style="color: #216f85;">uint</span> hashedFrameNumber) {
<span style="color: blue;">int</span> x = blockIdx.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">x</span> + threadIdx.<span style="color: purple;">x</span>;
<span style="color: blue;">int</span> y = blockIdx.<span style="color: purple;">y</span> * blockDim.<span style="color: purple;">y</span> + threadIdx.<span style="color: purple;">y</span>;
<span style="color: blue;">if</span> (x >= width || y >= height) {
<span style="color: blue;">return</span>;
}
<span style="color: green;">// Global threadId</span>
<span style="color: blue;">int</span> threadId = (blockIdx.<span style="color: purple;">x</span> + blockIdx.<span style="color: purple;">y</span> * gridDim.<span style="color: purple;">x</span>) * (blockDim.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">y</span>) + (threadIdx.<span style="color: purple;">y</span> * blockDim.<span style="color: purple;">x</span>) + threadIdx.<span style="color: purple;">x</span>;
<span style="color: green;">// Create random number generator</span>
<span style="color: #216f85;">curandState</span> randState;
<span style="color: #880000;">curand_init</span>(hashedFrameNumber + threadId, 0, 0, &randState);
<span style="color: green;">// Generate a uniform random number</span>
<span style="color: blue;">float</span> randNum = <span style="color: #880000;">curand_uniform</span>(&randState);
<span style="color: green;">// Get a pointer to the pixel at (x,y)</span>
<span style="color: blue;">float</span> *pixel = (<span style="color: blue;">float</span> *)(textureData + y * pitch) + 4 <span style="color: green;">/*RGBA*/</span> * x;
<span style="color: blue;">if</span> (x < width && y < height) {
<span style="color: green;">// Write out pixel data</span>
<span style="color: blue;">if</span> (randNum < 0.33f) {
pixel[0] += 1.0f;
pixel[1] += 0.0f;
pixel[2] += 0.0f;
pixel[3] = 1.0f;
} <span style="color: blue;">else</span> <span style="color: blue;">if</span> (randNum < 0.66f) {
pixel[0] += 0.0f;
pixel[1] += 1.0f;
pixel[2] += 0.0f;
pixel[3] = 1.0f;
} <span style="color: blue;">else</span> {
pixel[0] += 0.0f;
pixel[1] += 0.0f;
pixel[2] += 1.0f;
pixel[3] = 1.0f;
}
}
}</pre>
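Since the kernel only ever adds into the buffer, displaying it means dividing each accumulated sum by the number of frames rendered so far. Here's a minimal host-side sketch of that resolve step (plain C++ rather than CUDA, and the function name is mine, not from the actual project):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Resolve an RGBA accumulation buffer (running sums) into a displayable
// image by dividing each color channel by the number of frames summed.
std::vector<float> ResolveAccumulationBuffer(const std::vector<float> &accum,
                                             unsigned int frameCount) {
    std::vector<float> resolved(accum.size());
    for (std::size_t i = 0; i < accum.size(); i += 4) {
        resolved[i + 0] = accum[i + 0] / frameCount;  // R
        resolved[i + 1] = accum[i + 1] / frameCount;  // G
        resolved[i + 2] = accum[i + 2] / frameCount;  // B
        resolved[i + 3] = 1.0f;                       // A is written, not accumulated
    }
    return resolved;
}
```

With the kernel above, a pixel that was colored red once, green once, and blue once over three frames resolves to the expected (0.33, 0.33, 0.33) gray.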
<br />
After the first 60 frames, the image is quite noisy:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/WEy9lRy.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/WEy9lRy.jpg" height="374" width="640" /></a></div>
<br />
However, if we let it sit for a bit, the image converges to a gray (0.33, 0.33, 0.33), as expected:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/FsOknVy.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/FsOknVy.jpg" height="374" width="640" /></a></div>
<br />
<br />
<h3>
Conclusion</h3>
Well, there we go! We can generate random numbers and average the results from several frames. The next post will cover ray-object intersections and maybe start in on path tracing itself. Stay tuned!<br />
<br />
The code for everything in this post is on <a href="https://github.com/RichieSams/rapt">GitHub</a>. It's open source under Apache license, so feel free to use it in your own projects.<br />
<br />
As always, feel free to ask questions, make comments, and if you find an error, please let me know.<br />
<br />
<br />
Happy coding!<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com3tag:blogger.com,1999:blog-4016790357096156934.post-541964319456978162015-03-05T20:35:00.000-06:002015-03-09T22:54:19.900-05:00Tracing Light in a Virtual WorldBoy it sure has been a while since I've written a blog post! Apologies for that.<br />
<br />
I'm still working on <a href="https://github.com/RichieSams/thehalflingproject" target="_blank">The Halfling Project</a>, with the newest project being Physically Based Rendering. I tried to implement environment maps in order to have specular reflections, but I got frustrated. So, I decided to put The Halfling Project aside for a bit and try something new, specifically, Path Tracing.<br />
<br />
I had a basic idea of how path tracing worked:<br />
<ol>
<li>Fire a crap-load of rays around the scene from the eye.</li>
<li>At each bounce, use the material properties to accumulate attenuation</li>
<li>If the ray hits a light, add the light to the pixel, taking into account the attenuation</li>
</ol>
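In code form, that mental model really is just a few lines. Below is a deliberately naive sketch (no real ray casting, just canned per-bounce attenuations, and all the names are mine) showing how the attenuation piles up until a light is hit:

```cpp
#include <cassert>
#include <cmath>

// Step 2: the path throughput starts at 1.0 and gets multiplied by the
// surface attenuation at every bounce. Step 3: when the ray hits a
// light, the light's emission arrives scaled by that throughput.
float AccumulateLightAlongPath(const float *attenuations, int numBounces,
                               float lightEmission) {
    float throughput = 1.0f;
    for (int i = 0; i < numBounces; ++i) {
        throughput *= attenuations[i];  // Accumulate attenuation per bounce
    }
    return throughput * lightEmission;  // Weight the light by the attenuation
}
```

Two bounces off 50%-reflective surfaces into a light of intensity 2.0 contribute 0.5 * 0.5 * 2.0 = 0.5 to the pixel.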
<div>
Sweet. Now give me 2 hours and I'll have the next Arnold Renderer, right?!? HA!<br />
<br /></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-wywxw-At-cc/VPk1LLep55I/AAAAAAAAAQs/TMipFpt73vE/s1600/37879091.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-wywxw-At-cc/VPk1LLep55I/AAAAAAAAAQs/TMipFpt73vE/s1600/37879091.jpg" height="200" width="200" /></a></div>
<br /></div>
<div>
<br /></div>
<div>
Ok. For real though. Googling pulled up lots of graphics papers and a few example projects. However, most, if not all, of the graphics papers covered advanced topics that expand on path tracing, rather than explaining basic path tracing (e.g. Metropolis Light Transport, Half Vector Space Light Transport, Manifold Exploration Metropolis Light Transport). While they are really interesting (and some have example code), I felt they were way too much to take on all at once. So, I was left with the example projects I was able to find.<br />
<br />
<br /></div>
<h3>
Example Projects</h3>
<div>
The first is <a href="http://kevinbeason.com/smallpt/" target="_blank">smallpt</a>, or more specifically, the expanded <a href="https://docs.google.com/file/d/0B8g97JkuSSBwUENiWTJXeGtTOHFmSm51UC01YWtCZw/edit" target="_blank">code/presentation</a> done by Dr. David Cline at Oklahoma State University. Smallpt is true to its name, i.e. small; as a learning tool, though, the condensed form is not very readable. However, Dr. Cline expanded the original code out into a more readable form and created an excellent presentation going over each portion of the code. </div>
<div>
<br /></div>
<div>
The next example project I found was <a href="http://www.hxa.name/minilight/" target="_blank">Minilight</a>. It has excellent documentation and ports to many different languages. It also presents the algorithm overview at various levels of detail, which is really nice.</div>
<div>
<br /></div>
<div>
At this point, I realized that I had two choices. I could either implement the path tracer on the CPU (as in smallpt or Minilight), or on the GPU. Path tracing on the GPU is a bit of a recent advance, but it is possible and can work very <a href="http://render.otoy.com/" target="_blank">well</a>. So, being a bit of a masochist, and enjoying GPU programming thus far, I chose the GPU path.</div>
<div>
<br /></div>
<div>
The last extremely useful <a href="https://github.com/peterkutz/GPUPathTracer" target="_blank">example project</a> I found was a class project done by two students at The University of Pennsylvania, (Peter Kutz and Yining Karl Li), for a computer graphics class they took. The really great part, though, is that they kept a <a href="http://gpupathtracer.blogspot.com/" target="_blank">blog </a>and documented their progress and the hurdles they had to overcome. This is extremely useful, because I can see some of the decisions they made as they added on features. It also allowed me to create a series of milestones in creating a project using their progress as a model.</div>
<div>
<br /></div>
<div>
For example:</div>
<div>
<ol>
<li>Be able to cast a ray from the eye through a pixel and display a color representation of the rays.</li>
<li>Implement random number generation</li>
<li>Implement an accumulation buffer</li>
<li>Implement ray intersection tests</li>
<li>Implement basic path tracing with a fixed number of bounces.</li>
<li>Implement Russian Roulette termination</li>
<li>Implement ray/thread compaction</li>
<li>Implement Specular / BRDFs</li>
<li>Etc.</li>
</ol>
<div>
<br /></div>
<h3>
Choosing a GPGPU API</h3>
<div>
With a basic plan set out, my last choice was in the GPGPU programming API. My choices were:</div>
<div>
<ul>
<li>DirectX / OpenGL Compute Shaders</li>
<li>CUDA</li>
<li>OpenCL</li>
</ul>
</div>
<div>
I did quite a bit of searching around, but I wasn't really able to find a clear winner. A few people believe that CUDA is a bit faster. However, a lot of the posts are old-ish, so I don't really know how they stack up against newer versions of hlsl/glsl Compute Shaders. I ended up choosing CUDA, but Compute Shaders or OpenCL could probably perform just as well. I chose CUDA mostly to learn something new. Also, many existing GPU path tracing examples happen to be in CUDA, so it is easier to compare their code with mine.</div>
<div>
<br />
<br /></div>
</div>
<h3>
Off to the Realm of CUDA</h3>
<div>
A first program calls for a "Hello World!" But how do you do a hello world in a massively parallel environment? I mean, I guess we could write out "Hello World!" in every thread, but that's kind of boring, right? So let's store each letter used in one array and the indices of the letters in another array, and then calculate "Hello World!" in the threads. Now that's more like it!<br />
<br />
<br /></div>
We only store the necessary letters. "Hello World!" is stored as indices into the character array.<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: blue;">char</span> h_inputChars[10] = <span style="color: #a31515;">"Helo Wrd!"</span>;
<span style="color: #216f85;">uint</span> h_indexes[13] = {0, 1, 2, 2, 3, 4, 5, 3, 6, 2, 7, 8, 9};</pre>
<br />
Then the kernel calculates the output string using the thread index:<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; padding: 3px 8px;"><span style="color: #6f008a;">__global__</span> <span style="color: blue;">void</span> <span style="color: #880000;">helloWorldKernel</span>(<span style="color: blue;">char</span> *inputChars, <span style="color: #216f85;">uint</span> *indices, <span style="color: blue;">char</span> *output) {
<span style="color: #216f85;">uint</span> index = blockIdx.<span style="color: purple;">x</span> * blockDim.<span style="color: purple;">x</span> + threadIdx.<span style="color: purple;">x</span>;
output[index] = inputChars[indices[index]];
}</pre>
<br />
Yea, yea. Super inefficient. But hey, at least we're doing something.<br />
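One nice property of a toy like this is that it's trivial to verify: run the same gather on the CPU and compare against what the kernel wrote back. A quick host-side mirror of the kernel (my own helper, not part of the actual project):

```cpp
#include <cassert>
#include <string>

// CPU mirror of helloWorldKernel: output[i] = inputChars[indices[i]]
std::string GatherString(const char *inputChars, const unsigned int *indices,
                         int count) {
    std::string output(count, '\0');
    for (int i = 0; i < count; ++i) {
        output[i] = inputChars[indices[i]];
    }
    return output;
}
```

(The 13th index points at the null terminator, so only the 12 visible characters need comparing.)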
<br />
Ok, so now we know how to create the basic CUDA boilerplate code and do some calculations. The next step is figuring out how to talk between CUDA and DirectX so we can display something on the screen.<br />
<br />
<br />
<h3>
Bringing DirectX to the Party</h3>
Thankfully, the folks over at nVidia have made a library to do just that. They also have quite a few example projects to help explain the API. So, using one of the example projects as a guide and taking some boilerplate rendering code from The Halfling Project, I was able to create some cool pulsing plaid patterns:<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Mwel5YIJBp4/0.jpg" frameborder="0" height="266" src="http://www.youtube.com/embed/Mwel5YIJBp4?feature=player_embedded" width="320"></iframe></div>
Untz! Untz! Untz! Untz! Huh? Oh.. this isn't the crazy new club?<br />
<br />
<br />
<h3>
Casting the First Rays</h3>
Ok, so now we can do some computation in CUDA, save it in an array, and DirectX will render the array as colors. On to rays!<br />
<br />
In path tracing, the first rays we shoot out are the ones that go from the eye through each pixel on the virtual screen.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-i2n6pgHH82o/VPknGpCkdhI/AAAAAAAAAP8/deUQ1DrnDps/s1600/shootingARay.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-i2n6pgHH82o/VPknGpCkdhI/AAAAAAAAAP8/deUQ1DrnDps/s1600/shootingARay.png" height="320" width="298" /></a></div>
<br />
<br />
In order to create the ray, we need to know the distances <i>a</i> and <i>b</i> in <b>world units</b>. Therefore, we need to convert pixel units into world units. To do this, we need to define a camera. Let's define an example camera as follows:<br />
<div>
\[origin = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}\]<br />
\[coordinateSystem = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}\]<br />
\[fov_{x} = 90^{\circ}\]<br />
\[fov_{y} = \frac{fov_{x}}{aspectRatio}\]<br />
\[nearClipPlaneDist = 1\]<br />
<br />
The field of view, or fov, is an indirect way of specifying the ratio of pixel units to view units. Specifically, it is the viewing angle that is seen by the camera.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-bKbjgxpl3SQ/VPkx_xWv2GI/AAAAAAAAAQg/8TcZifxRiv0/s1600/fovEyePlane.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-bKbjgxpl3SQ/VPkx_xWv2GI/AAAAAAAAAQg/8TcZifxRiv0/s1600/fovEyePlane.png" height="320" width="311" /></a></div>
<br />
The higher the angle, the more of the scene is seen. But remember, changing the fov does not change the size of the screen; it merely squishes more or less of the scene into the same number of pixels.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-Ijhx2FAg2ZQ/VPkqzuwz2_I/AAAAAAAAAQI/rZjzpy3ccT8/s1600/fovExplanation.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-Ijhx2FAg2ZQ/VPkqzuwz2_I/AAAAAAAAAQI/rZjzpy3ccT8/s1600/fovExplanation.png" height="201" width="640" /></a></div>
<br />
<br /></div>
Let's look at the triangle formed by fov<sub>x</sub> and the x-axis:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-a6aUdM9pwFI/VPkrii6awaI/AAAAAAAAAQQ/nRdMGKLyqSw/s1600/triangle.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-a6aUdM9pwFI/VPkrii6awaI/AAAAAAAAAQQ/nRdMGKLyqSw/s1600/triangle.png" height="180" width="400" /></a></div>
<br />
<br />
We can use the definition of tangent to calculate the screenWidth in view units<br />
\[\tan \left (\theta \right) = \frac{opposite}{adjacent}\]<br />
\[screenWidth_{view}= 2 \: \cdot \: nearClipPlaneDist \: \cdot \: \tan \left (\frac{fov_{x}}{2} \right)\]<br />
<br />
Using that, we can calculate the view units of the pixel.<br />
\[x_{homogeneous}= 2 \: \cdot \: \frac{x}{width} \: - \: 1\]<br />
\[x_{view} = nearClipPlaneDist \: \cdot \: x_{homogenous} \: \cdot \: \tan \left (\frac{fov_{x}}{2} \right)\]<br />
<br />
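As a quick numerical sanity check (my own worked example), take a 640-pixel-wide screen with fov<sub>x</sub> = 90° and nearClipPlaneDist = 1, and look at the pixel column x = 160:<br />
\[x_{homogeneous}= 2 \: \cdot \: \frac{160}{640} \: - \: 1 = -0.5\]<br />
\[x_{view} = 1 \: \cdot \: \left ( -0.5 \right ) \: \cdot \: \tan \left ( 45^{\circ} \right ) = -0.5\]<br />
So a pixel a quarter of the way across the screen sits half a view unit left of center, which is exactly what a 90° horizontal field of view implies.<br />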
The last thing to do to get the ray is to transform from view space to world space. This boils down to a simple matrix transform. We negate y<sub>view</sub> because pixel coordinates go from the top left of the screen to the bottom right, but homogeneous coordinates go from (-1, -1) at the bottom left to (1, 1) at the top right.<br />
\[ray_{world}= \begin{bmatrix} x_{view} & -y_{view} & nearClipPlaneDist \end{bmatrix}\begin{bmatrix} & & \\ & cameraCoordinateSystem & \\ & & \end{bmatrix}\]<br />
<br />
The code for the whole process is below. (In the code, I assume the nearClipPlaneDist is 1, so it cancels out)<br />
<pre style="background: white; color: black; font-family: Consolas; font-size: 13; overflow-x: scroll; padding: 3px 8px; word-wrap: normal;"><span style="color: #6f008a;">__device__</span> <span style="color: #216f85;">float3</span> <span style="color: #880000;">CalculateRayDirectionFromPixel</span>(<span style="color: #216f85;">uint</span> x, <span style="color: #216f85;">uint</span> y, <span style="color: #216f85;">uint</span> width, <span style="color: #216f85;">uint</span> height, <span style="color: #216f85;">DeviceCamera</span> &camera) {
<span style="color: #216f85;">float3</span> viewVector = <span style="color: #880000;">make_float3</span>((((x + 0.5f) / width) * 2.0f - 1.0f) * camera.<span style="color: purple;">tanFovDiv2_X</span>,
-(((y + 0.5f) / height) * 2.0f - 1.0f) * camera.<span style="color: purple;">tanFovDiv2_Y</span>,
1.0f);
<span style="color: green;">// Matrix multiply</span>
<span style="color: blue;">return</span> <span style="color: #880000;">normalize</span>(<span style="color: #880000;">make_float3</span>(<span style="color: #880000;">dot</span>(viewVector, camera.<span style="color: purple;">x</span>),
<span style="color: #880000;">dot</span>(viewVector, camera.<span style="color: purple;">y</span>),
<span style="color: #880000;">dot</span>(viewVector, camera.<span style="color: purple;">z</span>)));
}</pre>
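Since this is pure arithmetic, it's easy to sanity-check on the CPU. Here's a host-side C++ transcription of the same math (the Float3 and Camera types here are stand-ins I wrote for the CUDA ones, with nearClipPlaneDist again assumed to be 1):

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

static float Dot(Float3 a, Float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Float3 Normalize(Float3 v) {
    float len = std::sqrt(Dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Stand-in camera: basis vectors in world space plus precomputed tan(fov/2)
struct Camera {
    Float3 x, y, z;
    float tanFovDiv2_X;
    float tanFovDiv2_Y;
};

// Same math as the CUDA kernel, on the CPU
Float3 CalculateRayDirectionFromPixel(unsigned int px, unsigned int py,
                                      unsigned int width, unsigned int height,
                                      const Camera &camera) {
    Float3 viewVector = {(((px + 0.5f) / width) * 2.0f - 1.0f) * camera.tanFovDiv2_X,
                         -(((py + 0.5f) / height) * 2.0f - 1.0f) * camera.tanFovDiv2_Y,
                         1.0f};
    // Matrix multiply: rotate the view-space vector into world space
    return Normalize({Dot(viewVector, camera.x),
                      Dot(viewVector, camera.y),
                      Dot(viewVector, camera.z)});
}
```

With an identity camera basis, the center of the screen maps to a ray straight down the camera z-axis, and every generated direction comes out unit length.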
<br />
<br />
If we normalize the ray directions to renderable color ranges (add 1.0f and divide by 2.0f) and render out the resulting rays, we get a pretty gradient that varies from corner to corner.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/2vKgbEA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/2vKgbEA.png" height="374" width="640" /></a></div>
<br />
<br />
<br />
<br />
Well, as this post is getting pretty long, I feel like this is a nice stopping point. The next post will be a short one on generating random numbers on the GPU and implementing an accumulation buffer. I've already implemented the code, so the post should be pretty soon.<br />
<br />
After that, I will start implementing the object intersection algorithms, then the path tracing itself! Expect them in the coming week or so!<br />
<br />
The code for everything in this post is on <a href="https://github.com/RichieSams/rapt" target="_blank">GitHub</a>. It's open source under Apache license, so feel free to use it in your own projects.<br />
<br />
<br />
As always, feel free to ask questions, make comments, and if you find an error, please let me know.<br />
<br />
<br />
Happy coding!<br />
<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com6tag:blogger.com,1999:blog-4016790357096156934.post-58422458087460990642014-05-18T17:28:00.001-05:002014-05-18T17:28:05.977-05:00[HLSL] Turning float4's into a float4x4In one of my vertex shaders I need to turn a couple float4's into a float4x4. Specifically, I'm building a world matrix. (For those that are curious, it's instance data. The design I'm using is very similar to the one put forth by <a href="http://www.slideshare.net/DICEStudio/directx-11-rendering-in-battlefield-3" target="_blank">DICE on slide 29</a>)<br />
<br />
If the float4's are rows, then building a float4x4 is really easy:<br />
<br />
<pre class="brush:cpp">float4x4 CreateMatrixFromRows(float4 r0, float4 r1, float4 r2, float4 r3) {
    return float4x4(r0, r1, r2, r3);
}
</pre>
<br />
float4x4 has a constructor that takes vectors as arguments. However, as you can see above, it assumes these are rows. I find this a bit odd, since internally DirectX stores matrices as column-major by default. Therefore, behind the scenes, it has to do lots of swizzle copy-swaps.<br />
<br />
If the float4's are columns, building the float4x4 becomes a bit more icky for our viewing, since we have to manually pick off each element and send it to the full float4x4 constructor. However, I suspect behind the scenes the compiler will know better.<br />
<br />
<pre class="brush:cpp">float4x4 CreateMatrixFromCols(float4 c0, float4 c1, float4 c2, float4 c3) {
    return float4x4(c0.x, c1.x, c2.x, c3.x,
                    c0.y, c1.y, c2.y, c3.y,
                    c0.z, c1.z, c2.z, c3.z,
                    c0.w, c1.w, c2.w, c3.w);
}</pre>
<br />
<br />
I wasn't happy with just guessing what the compiler would and wouldn't do, so I created a simple HLSL vertex shader to see how many OPs each function produced. hlsl_util.hlsli contains the two functions defined above. (Yes, I know the position isn't being transformed to clip space. It's just a trivial shader.)<br />
<br />
<pre class="brush:cpp">#include "hlsl_util.hlsli"

cbuffer cbPerObject : register(b1) {
    uint gStartVector;
    uint gNumVectorsPerInstance;
};

StructuredBuffer<float4> gInstanceBuffer : register(t0);

float4 main(float3 pos : POSITION, uint instanceId : SV_INSTANCEID) : SV_POSITION {
    uint worldMatrixOffset = instanceId * gNumVectorsPerInstance + gStartVector;

    float4 c0 = gInstanceBuffer[worldMatrixOffset];
    float4 c1 = gInstanceBuffer[worldMatrixOffset + 1];
    float4 c2 = gInstanceBuffer[worldMatrixOffset + 2];
    float4 c3 = gInstanceBuffer[worldMatrixOffset + 3];

    float4x4 instanceWorldCol = CreateMatrixFromCols(c0, c1, c2, c3);
    //float4x4 instanceWorldRow = CreateMatrixFromRows(c0, c1, c2, c3);

    return mul(float4(pos, 1.0f), instanceWorldCol);
}
</pre>
<br />
I compiled the shader as normal and then used the following command to disassemble the compiled byte code:<br />
<br />
<pre class="brush:text">fxc.exe /dumpbin /Fc <outputfile.txt> <compiledshader.cso>
</pre>
<br />
<br />
HUGE DISCLAIMER: This is the intermediate asm that fxc creates. The final number/form of OPs will depend on the final compile done by the graphics driver. However, I feel the intermediate asm will generally be <i>close-ish</i> to what is finally produced, and therefore, can be used as a rough gauge.<br />
<br />
<br />
Here is the asm code for creating the matrix from columns. I'll include the register signature for this one. The other asm code samples use the same register signature.<br />
<br />
<pre class="brush:cpp">// Resource Bindings:
//
// Name Type Format Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// gInstanceBuffer texture struct r/o 0 1
// cbPerObject cbuffer NA NA 1 1
//
//
// Input signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// POSITION 0 xyz 0 NONE float xyz
// SV_INSTANCEID 0 x 1 INSTID uint x
//
//
// Output signature:
//
// Name Index Mask Register SysValue Format Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_POSITION 0 xyzw 0 POS float xyzw
//
imad r0.x, v1.x, cb1[8].y, cb1[8].x
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r1.xyzw, r0.x, l(0), t0.xyzw
iadd r0.xyz, r0.xxxx, l(1, 2, 3, 0)
mov r2.xyz, v0.xyzx
mov r2.w, l(1.000000)
dp4 o0.x, r2.xyzw, r1.xyzw
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r1.xyzw, r0.x, l(0), t0.xyzw
dp4 o0.y, r2.xyzw, r1.xyzw
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r1.xyzw, r0.y, l(0), t0.xyzw
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r0.xyzw, r0.z, l(0), t0.xyzw
dp4 o0.w, r2.xyzw, r0.xyzw
dp4 o0.z, r2.xyzw, r1.xyzw
ret
// Approximately 13 instruction slots used
</pre>
<br />
Creating the matrix from columns was actually very clean. The compiler knew what we wanted and completely got rid of all the swizzles; instead it directly copied each column and did a dot product to get the final position.<br />
<br />
Here is the asm for creating the matrix from rows:<br />
<br />
<pre class="brush:cpp">imad r0.x, v1.x, cb1[8].y, cb1[8].x
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r1.xyzw, r0.x, l(0), t0.xyzw
iadd r0.xyz, r0.xxxx, l(1, 2, 3, 0)
mov r2.x, r1.x
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r3.xyzw, r0.x, l(0), t0.xzyw
mov r2.y, r3.x
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r4.xyzw, r0.y, l(0), t0.xywz
ld_structured_indexable(structured_buffer, stride=16)(mixed,mixed,mixed,mixed) r0.xyzw, r0.z, l(0), t0.xyzw
mov r2.z, r4.x
mov r2.w, r0.x
mov r5.xyz, v0.xyzx
mov r5.w, l(1.000000)
dp4 o0.x, r5.xyzw, r2.xyzw
mov r2.y, r3.z
mov r2.z, r4.y
mov r2.w, r0.y
mov r2.x, r1.y
dp4 o0.y, r5.xyzw, r2.xyzw
mov r4.y, r3.w
mov r3.z, r4.w
mov r3.w, r0.z
mov r4.w, r0.w
mov r3.x, r1.z
mov r4.x, r1.w
dp4 o0.w, r5.xyzw, r4.xyzw
dp4 o0.z, r5.xyzw, r3.xyzw
ret
// Approximately 27 instruction slots used
</pre>
<br />
Wow! Look at all those mov OPs. So even though the HLSL constructor expects rows, giving it rows leads to a huge number of mov's because the GPU uses column-major matrix representation.<br />
<br />
I also tried manually specifying the swizzles to see if that would help:<br />
<br />
<pre class="brush:cpp">float4x4 CreateMatrixFromRows(float4 r0, float4 r1, float4 r2, float4 r3) {
    return float4x4(r0.x, r0.y, r0.z, r0.w,
                    r1.x, r1.y, r1.z, r1.w,
                    r2.x, r2.y, r2.z, r2.w,
                    r3.x, r3.y, r3.z, r3.w);
}
</pre>
<br />
However, the asm generated was identical to the constructor with 4 row vectors.<br />
<br />
So I guess the lesson learned here today is to always try to construct matrices in the way that they are stored. I want to mention that you can tell the compiler to use row-major matrices, but column-major matrices are generally favored because they can simplify some matrix math.<br />
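To make the layout argument concrete: in column-major storage, element (row r, column c) lives at index c*4 + r, so filling that storage from column vectors is a straight sequential copy, while filling it from row vectors is a strided scatter, which is exactly what all those mov's are doing. A small C++ illustration of the two access patterns (not HLSL, and purely illustrative):

```cpp
#include <cassert>

// Column-major storage: element (row r, column c) lives at c * 4 + r.
void MatrixFromCols(const float cols[4][4], float out[16]) {
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[c * 4 + r] = cols[c][r];  // Contiguous write per column
}

void MatrixFromRows(const float rows[4][4], float out[16]) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[c * 4 + r] = rows[r][c];  // Strided scatter per row
}
```

Both functions produce the same matrix when fed transposed inputs; only the memory access pattern differs.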
<br />
My question to you all is: Are there better ways to do either of these two operations? As always, feel free to comment, ask questions, correct any errors I may have made, etc.<br />
<br />
Happy coding<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com2tag:blogger.com,1999:blog-4016790357096156934.post-10838880906422441452014-05-06T22:50:00.000-05:002014-05-06T22:50:57.741-05:00How do you implement Geometry Instancing?So at some point in the graphics pipeline, you have a list of models that need to be rendered. Normal scenario:<br />
<ol>
<li>Set the CBuffer variables</li>
<li>Call DrawIndexed</li>
</ol>
<div>
<br />
Simple. OK, the next scenario is when the models are instanced. Yes, I can use DrawIndexedInstanced, but my question is: what's the best way to send the instance data to the GPU?</div>
<br />
<h4>
So far, I can think of 3 ways: </h4>
<b>Option 1 </b>- Storing and using one Instance Buffer per model<br />
<br />
The render loop would then be something like this:
<br />
<br />
<pre class="brush:cpp">for (model in scene) {
    if (model.hasInstances()) {
        if (model.isDynamic()) {
            model.UpdateInstanceBuffer(....);
        }
        model.DrawIndexedInstanced(....);
    } else {
        model.DrawIndexed(...);
    }
}
</pre>
<br />
<b>Option 2 - </b>Using a single Instance buffer for the entire scene and updating it for each draw call<br />
<br />
The render loop would then be something like this:
<br />
<br />
<pre class="brush:cpp">for (model in scene) {
    if (model.hasInstances()) {
        UpdateInstanceBuffer(&model.InstanceData, ....);
        model.DrawIndexedInstanced(....);
    } else {
        model.DrawIndexed(...);
    }
}
</pre>
<br />
<b>Option 3</b> - Caching instances into a buffer for an entire batch (or if memory requirements aren't a problem, the whole frame). Directly inspired by <a href="http://www.slideshare.net/DICEStudio/directx-11-rendering-in-battlefield-3" target="_blank">Battlefield 3 - slide 30</a>.<br />
<br />
The render loop would then be something like this:<br />
<br />
<pre class="brush:cpp">std::vector<float4> instanceData;
std::vector<uint> offsets;
for (model in scene) {
    offsets.push_back(instanceData.size());
    for (float4 data in model.InstanceData) {
        instanceData.push_back(data);
    }
}
BindInstanceBufferAsVSResource(&instanceData);
for (uint i = 0; i < scene.size(); ++i) {
    UpdateVSCBuffer(offsets[i], ....);
    scene[i].DrawIndexedInstanced(....);
}
</pre>
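The CPU-side packing for Option 3 is straightforward enough to sketch properly. This is what I have in mind (the Float4 type and the function names are illustrative, not from any actual codebase):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float4 { float x, y, z, w; };

// Pack per-model instance data into one flat array, recording where each
// model's data starts. The vertex shader can then fetch its data with
// instanceId * vectorsPerInstance + startOffset.
std::vector<std::size_t> PackInstanceData(
        const std::vector<std::vector<Float4>> &perModelData,
        std::vector<Float4> *packed) {
    std::vector<std::size_t> offsets;
    for (const auto &modelData : perModelData) {
        offsets.push_back(packed->size());
        packed->insert(packed->end(), modelData.begin(), modelData.end());
    }
    return offsets;
}
```

For two models carrying 2 and 3 float4's of instance data, the packed buffer holds 5 float4's and the offsets come out as {0, 2}.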
<br />
<h4>
Pros and Cons: </h4>
<b>Option 1 - Individual InstanceBuffers per model</b><br />
Pros: <br />
<ol type="a">
<li>Static instancing is all cached, ie. you only have to map/update/unmap a buffer once.</li>
</ol>
Cons: <br />
<ol type="a">
<li>A ton of instance buffers. I may be overthinking things, but this seems like a lot of memory, especially since all the buffers are a static size. So you either have to define exactly how many instances of an object can exist, or include some extra memory for wiggle room.</li>
</ol>
<b>Option 2 - Single InstanceBuffer for all models</b><br />
Pros: <br />
<ol type="a">
<li>Only one instance buffer. Potentially a much smaller memory footprint than Option 1. However, we need it to be as large as our largest number of instances.</li>
</ol>
Cons: <br />
<ol type="a">
<li>Requires a map/update/unmap for every model that needs to be instanced. I have no idea if this is expensive or not.</li>
</ol>
<b>Option 3 - CBuffer array with all the instances for a frame/batch</b><br />
Pros: <br />
<ol type="a">
<li>Much less map/update/unmap than Option 2</li>
<li>Can support multiple types of instance buffers (as long as they are multiples of float4)</li>
</ol>
Cons: <br />
<ol type="a">
<li>Static instances still need to be updated every frame.</li>
<li>Indexes out of a cbuffer. (Can cause memory contention)</li>
</ol>
<div>
<br /></div>
<div>
So those are my thoughts. What are your thoughts? Would you choose any of the three options? Or is there a better option? Let me know in the comments below or on <a href="https://twitter.com/adastley" target="_blank">Twitter</a> or with a Pastebin/Gist. </div>
<div>
<br /></div>
<div>
Happy coding</div>
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com0tag:blogger.com,1999:blog-4016790357096156934.post-83021530911642643942014-05-05T17:39:00.003-05:002014-05-05T18:19:29.449-05:00Loading more interesting scenes - Part 2: The Halfling Model File Format Well, it's been quite a long time since my last post. School is in its last week and I've been quite busy, but you don't want to hear about that. You're here to see what I've been working on.<br />
<br />
In my last post I finished by showing how I loaded obj models directly into the engine. I also complained that it was taking a horrendously long time to load (especially for Debug builds). I looked around for faster ways to load obj's, but there really weren't any... (sort of*) Why aren't there any obj loader libraries?<br />
<div style="font-size: 10px; text-align: right;">
*There is assimp, but I'll get to that further down</div>
<br />
One answer would be that OBJs weren't designed for run-time model loading. Computers don't like parsing text. They would rather read binary; things have set sizes and can be read in chunks rather than single characters at a time. So next I looked around for a binary file format that would be faster to load. "Why re-invent the wheel?" I thought.<br />
<br />
The problem is that standardized run-time binary file formats don't really exist either. This is when it really dawned on me. For run-time, there's no point in storing things your engine doesn't need. And more than that, it would be great if the data you store is in the correct format for your engine. For example, you could store the raw vertex buffer data so you can directly cast it into DirectX vertex buffer data. Obviously, it would be extremely hard to get people to agree upon a set standard of what is "necessary", so it's common practice to have a specific binary file format for the engine that is specifically tailored to make loading the data as fast and easy as possible.<br />
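To make the "set sizes, read in chunks" point concrete: with a fixed-layout binary header, loading is a single chunked read straight into a struct, instead of parsing one character at a time. A hypothetical sketch (this struct is purely an illustration, not the actual Halfling Model File header):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size header: because every field has a known size
// and offset, the whole thing can be read in one chunk (one fread or
// memcpy) instead of being parsed a character at a time like an OBJ.
#pragma pack(push, 1)
struct ModelFileHeader {
    uint32_t fileId;
    uint32_t numVertices;
    uint32_t numIndices;
    uint32_t numSubsets;
};
#pragma pack(pop)

ModelFileHeader ReadHeader(const unsigned char *fileBytes) {
    ModelFileHeader header;
    std::memcpy(&header, fileBytes, sizeof(header));  // One chunked read
    return header;
}
```

The same idea scales to the vertex and index buffers: store them exactly as the engine consumes them, and loading becomes a handful of bulk reads.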
<br />
Therefore, I set out to make my own binary file format, which, to stay with the Halfling theme, I dubbed the 'Halfling Model File'. Every indent represents a member variable of the level above it. 'String data' and 'Subset data' are arrays. (The format of the blog template makes the following table a bit hard to read. There is an ASCII version of the table<a href="https://raw.githubusercontent.com/RichieSams/thehalflingproject/7815e58dbe6c393b2a45d1cdc200d65beb435c6b/documentation/Halfling%20Model%20File%20format.txt" target="_blank"> here</a> if that's easier to read)<br />
<br />
<style id="Halfling Model File format_3009_Styles">
<!--table
{mso-displayed-decimal-separator:"\.";
mso-displayed-thousand-separator:"\,";}
.xl633009
{padding-top:1px;
padding-right:4px;
padding-left:4px;
border: 1px solid #888888;
mso-ignore:padding;
font-size:12.0pt;
font-weight:400;
font-style:normal;
text-decoration:none;
font-family:"Times New Roman", serif;
mso-font-charset:0;
mso-number-format:General;
text-align:general;
vertical-align:bottom;
mso-background-source:auto;
mso-pattern:auto;
white-space:nowrap;}
.xl643009
{padding-top:1px;
padding-right:4px;
padding-left:4px;
border: 1px solid #888888;
mso-ignore:padding;
font-size:16.0pt;
font-weight:700;
font-style:normal;
text-decoration:none;
font-family:"Times New Roman", serif;
mso-font-charset:0;
mso-number-format:General;
text-align:general;
vertical-align:bottom;
mso-background-source:auto;
mso-pattern:auto;
white-space:nowrap;}
.xl653009
{padding-top:1px;
padding-right:4px;
padding-left:4px;
border: 1px solid #888888;
mso-ignore:padding;
font-size:12.0pt;
font-weight:400;
font-style:normal;
text-decoration:none;
font-family:"Times New Roman", serif;
mso-font-charset:0;
mso-number-format:General;
text-align:center;
vertical-align:bottom;
mso-background-source:auto;
mso-pattern:auto;
white-space:nowrap;}
.xl663009
{padding-top:1px;
padding-right:4px;
padding-left:4px;
border: 1px solid #888888;
mso-ignore:padding;
font-size:12.0pt;
font-weight:700;
font-style:normal;
text-decoration:none;
font-family:"Times New Roman", serif;
mso-font-charset:0;
mso-number-format:General;
text-align:general;
vertical-align:bottom;
border-top:none;
border-right:none;
border-bottom:1.0pt solid windowtext;
border-left:none;
mso-background-source:auto;
mso-pattern:auto;
white-space:nowrap;}
.xl673009
{padding-top:1px;
padding-right:4px;
padding-left:4px;
border: 1px solid #888888;
mso-ignore:padding;
font-size:12.0pt;
font-weight:700;
font-style:normal;
text-decoration:none;
font-family:"Times New Roman", serif;
mso-font-charset:0;
mso-number-format:General;
text-align:left;
vertical-align:bottom;
border-top:none;
border-right:none;
border-bottom:1.0pt solid windowtext;
border-left:none;
mso-background-source:auto;
mso-pattern:auto;
white-space:nowrap;}
.xl683009
{padding-top:1px;
padding-right:4px;
padding-left:4px;
border: 1px solid #888888;
mso-ignore:padding;
font-size:12.0pt;
font-weight:400;
font-style:normal;
text-decoration:none;
font-family:"Times New Roman", serif;
mso-font-charset:0;
mso-number-format:General;
text-align:left;
vertical-align:bottom;
mso-background-source:auto;
mso-pattern:auto;
white-space:nowrap;}
-->
</style>
<table border="0" cellpadding="0" cellspacing="0" class="xl633009" style="background-color: #464646; border-collapse: collapse; table-layout: fixed; width: 1159px;">
<colgroup><col class="xl633009" style="mso-width-alt: 8667; mso-width-source: userset; width: 178pt;" width="237"></col>
<col class="xl653009" style="mso-width-alt: 6509; mso-width-source: userset; width: 134pt;" width="178"></col>
<col class="xl653009" style="mso-width-alt: 2633; mso-width-source: userset; width: 54pt;" width="72"></col>
<col class="xl633009" style="mso-width-alt: 24576; mso-width-source: userset; width: 504pt;" width="672"></col>
</colgroup><tbody>
<tr height="22" style="height: 16.5pt;">
<td class="xl663009" height="22" style="height: 16.5pt;">Item</td>
<td class="xl673009">Type</td>
<td class="xl673009">Required</td>
<td class="xl663009">Description</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">File Id</td>
<td class="xl653009">'\0FMH'</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">Little-endian "HMF\0"</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">File format version</td>
<td class="xl653009">byte</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">Version of the HMF format that this file uses</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Flags</td>
<td class="xl653009">uint64</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">Bitwise-OR of flags used in the file. See the flags below</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"></td>
<td class="xl653009"></td>
<td class="xl653009"></td>
<td class="xl633009"></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">String Table</td>
<td class="xl653009"></td>
<td class="xl653009" style="background: red; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">F</td>
<td class="xl633009"></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Num strings</td>
<td class="xl653009">uint32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The number of strings in the table</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>String data</td>
<td class="xl653009"></td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009"></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>String length</td>
<td class="xl653009">uint16</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">Length of the string</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>String</td>
<td class="xl653009">char[]</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The string characters. DOES NOT HAVE A NULL TERMINATION</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"></td>
<td class="xl653009"></td>
<td class="xl653009"></td>
<td class="xl633009"></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Num Vertices</td>
<td class="xl653009">uint32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The number of vertices in the file</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Num Indices</td>
<td class="xl653009">uint32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The number of indices in the file</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"></td>
<td class="xl653009"></td>
<td class="xl653009"></td>
<td class="xl633009"></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">NumVertexElements</td>
<td class="xl653009">uint16</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The number of elements in the vertex description</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"></td>
<td class="xl653009"></td>
<td class="xl653009"></td>
<td class="xl633009"></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Vertex Buffer Desc</td>
<td class="xl653009">D3D11_BUFFER_DESC</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">A hard cast of the vertex buffer description</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Index Buffer Desc</td>
<td class="xl653009">D3D11_BUFFER_DESC</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">A hard cast of the index buffer description<span style="mso-spacerun: yes;"> </span></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Instance Buffer Desc</td>
<td class="xl653009">D3D11_BUFFER_DESC</td>
<td class="xl653009" style="background: red; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">F</td>
<td class="xl633009">A hard cast of the instance buffer description<span style="mso-spacerun: yes;"> </span></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"></td>
<td class="xl653009"></td>
<td class="xl653009"></td>
<td class="xl633009"></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Vertex data</td>
<td class="xl653009">void[]</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">Will be read in a single block using
VertexBufferDesc.ByteWidth</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Index data</td>
<td class="xl653009">void[]</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">Will be read in a single block using
IndexBufferDesc.ByteWidth</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Instance buffer data</td>
<td class="xl653009">void[]</td>
<td class="xl653009" style="background: red; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">F</td>
<td class="xl633009">Will be read in a single block using
InstanceBufferDesc.ByteWidth</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"></td>
<td class="xl653009"></td>
<td class="xl653009"></td>
<td class="xl633009"></td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Num Subsets</td>
<td class="xl653009">uint32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The number of subsets in the file</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;">Subset data</td>
<td class="xl653009">Subset[]</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">Will read in a single block to a Subset[]</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Vertex Start</td>
<td class="xl653009">uint64</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The index to the first vertex used by the subset</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Vertex Count</td>
<td class="xl653009">uint64</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The number of vertices used by the subset (all used
vertices must lie within the range [VertexStart, VertexStart + VertexCount))</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Index Start</td>
<td class="xl653009">uint64</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The index to the first index used by the subset</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Index Count</td>
<td class="xl653009">uint64</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The number of indices used by the subset (all used indices
must lie within the range [IndexStart, IndexStart + IndexCount))</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Material Ambient Color</td>
<td class="xl653009">float[3]</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The RGB ambient color values of the material</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Material Specular Intensity</td>
<td class="xl653009">float</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The Specular Intensity</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Material Diffuse Color</td>
<td class="xl653009">float[4]</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The RGBA diffuse color values of the material</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Material Specular Color</td>
<td class="xl653009">float[3]</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The RGB specular color values of the material</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Material Specular Power</td>
<td class="xl653009">float</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">The Specular Power</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Diffuse Color Map Filename</td>
<td class="xl653009">int32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">An index to the string table. -1 if it doesn't exist.</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Specular Color Map Filename</td>
<td class="xl653009">int32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">An index to the string table. -1 if it doesn't exist.</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Specular Power Map Filename</td>
<td class="xl653009">int32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">An index to the string table. -1 if it doesn't exist.</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Alpha Map Filename</td>
<td class="xl653009">int32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">An index to the string table. -1 if it doesn't exist.</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Bump Map Filename</td>
<td class="xl653009">int32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">An index to the string table. -1 if it doesn't exist.
Mutually exclusive with Normal Map</td>
</tr>
<tr height="21" style="height: 15.75pt;">
<td class="xl633009" height="21" style="height: 15.75pt;"><span style="mso-spacerun: yes;"> </span>Normal Map Filename</td>
<td class="xl653009">int32</td>
<td class="xl653009" style="background: #00B050; color: black; font-family: "Times New Roman"; font-size: 12.0pt; font-weight: 400; mso-pattern: black none; text-decoration: none; text-line-through: none; text-underline-style: none;">T</td>
<td class="xl633009">An index to the string table. -1 if it doesn't exist.
Mutually exclusive with Bump Map</td>
</tr>
<tr height="0" style="display: none;">
<td style="width: 178pt;" width="237"></td>
<td style="width: 134pt;" width="178"></td>
<td style="width: 54pt;" width="72"></td>
<td style="width: 504pt;" width="672"></td>
</tr>
</tbody></table>
<br />
<br />
I designed the file format to make it as easy as possible to read large chunks of data directly from disk and cast them to arrays or usable engine structures. For example, the subset data is read in one giant chunk and cast directly to an array.<br />
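The payoff of that layout is that loading a whole section is a couple of fread calls. Below is a minimal sketch of reading the subset block; the struct layout is my reading of the table above (not code from the project), and this particular field order happens to have no internal padding on typical compilers, so a single block read works.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Field names are illustrative, taken from the table above
struct Subset {
	uint64_t VertexStart;
	uint64_t VertexCount;
	uint64_t IndexStart;
	uint64_t IndexCount;
	float MatAmbientColor[3];
	float MatSpecIntensity;
	float MatDiffuseColor[4];
	float MatSpecColor[3];
	float MatSpecPower;
	int32_t DiffuseColorMap;   // Indices into the string table, -1 if absent
	int32_t SpecColorMap;
	int32_t SpecPowerMap;
	int32_t AlphaMap;
	int32_t BumpMap;
	int32_t NormalMap;
};

std::vector<Subset> ReadSubsets(FILE *file) {
	// 'Num Subsets' (uint32) immediately precedes the subset array
	uint32_t numSubsets = 0;
	fread(&numSubsets, sizeof(numSubsets), 1, file);

	// One read, no per-field parsing: the whole point of the format
	std::vector<Subset> subsets(numSubsets);
	if (numSubsets != 0) {
		fread(subsets.data(), sizeof(Subset), numSubsets, file);
	}
	return subsets;
}
```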
<br />
There's only one problem: binary is not really human-readable. It would be extremely arduous to create an HMF file manually, so I created a tool to automate the task. While my hand-written obj parser fulfilled its purpose, it was pretty bare-bones and made quite a few assumptions. Rather than spend the time to beef it up to what was necessary, I leveraged the wonderful tool <a href="http://assimp.sourceforge.net/" target="_blank">ASSIMP</a>. ASSIMP is a C++ library for loading arbitrary model file formats into a standard internal representation. It also has a number of algorithms for optimizing the model data, for example, calculating normals, triangulating meshes, or removing duplicate vertices. Therefore, I use ASSIMP to load and optimize the model, then write ASSIMP's mesh data out in the HMF format. The source code is a bit too long to post directly here, so instead I'll link you to <a href="https://github.com/RichieSams/thehalflingproject/blob/76194d57fe9d5521e206849de41591c2d8b62df8/source/obj_hfm_converter/hmf_converter.cpp" target="_blank">it on GitHub</a>. I'll also point you to a pre-compiled binary of the tool <a href="https://github.com/RichieSams/thehalflingproject/tree/7815e58dbe6c393b2a45d1cdc200d65beb435c6b/precompiled_binaries/hmf_converter" target="_blank">here</a>.<br />
<br />
As I was writing the code for the tool, it became apparent that I needed a way for the user to tell the tool certain parameters about the model. For example, what textures do you want to use? I could have passed these in with command-line arguments, but that's not very readable. Therefore, I put all the possible arguments into an ini file and have the user pass the path to the ini file as a command-line arg. Below is the ini file for the sponza.obj model:<br />
<br />
<pre><span style="color: #587ac5;">[Post-Processing]</span>
<span style="color: #88b788;">; If normals already exist, setting GenNormals to true will do nothing</span>
GenNormals = true
<span style="color: #88b788;">; If tangents already exist, setting CalcTangents to true will do nothing</span>
CalcTangents = true
<span style="color: #88b788;">; The booleans represent a high level override for these material properties.
; If the boolean is false, the property will be set to NULL, even if the property
; exists within the input model file
; If the boolean is true, but the value doesn't exist within the input model file,
; the property will be set to NULL</span>
<span style="color: #587ac5;">[MaterialPropertyOverrides]</span>
AmbientColor = true
DiffuseColor = true
SpecColor = true
Opacity = true
SpecPower = true
SpecIntensity = true
<span style="color: #88b788;">; The booleans represent a high level override for these textures.
; If the boolean is false, the texture will be excluded, even if the texture
; exists within the input model file
; If the boolean is true, but the texture doesn't exist within the input model
; file properties, the texture will still be excluded</span>
<span style="color: #587ac5;">[TextureOverrides]</span>
DiffuseColorMap = true
NormalMap = true
DisplacementMap = true
AlphaMap = true
SpecColorMap = true
SpecPowerMap = true
<span style="color: #88b788;">; Usages can be 'default', 'immutable', 'dynamic', or 'staging'
; In the case of a mis-spelling, immutable is assumed</span>
<span style="color: #587ac5;">[BufferDesc]</span>
VertexBufferUsage = immutable
IndexBufferUsage = immutable
<span style="color: #88b788;">; TextureMapRedirects allow you to interpret certain textures as other kinds
; For example, OBJ doesn't directly support normal maps. Often, you will then see
; the normal map in the height (bump) map slot. These options allow you to specify
; what texture goes where.
;
; Any Maps that are excluded are treated as mapping to their own kind
; IE. excluding DiffuseColorMap is interpreted as:
; DiffuseColorMap = diffuse
;
; The available kinds are: 'diffuse', 'normal', 'height', 'displacement', 'alpha',
; 'specColor', and 'specPower'</span>
<span style="color: #587ac5;">[TextureMapRedirects]</span>
DiffuseColorMap = diffuse
NormalMap = height
DisplacementMap = displacement
AlphaMap = alpha
SpecColorMap = specColor
SpecPowerMap = specPower
</pre>
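For illustration, the kind of parsing involved is simple: section headers, key/value pairs, and ';' comments. Here is a minimal sketch of such a parser; this is not the converter's actual code (it may well use an ini library), and the function name is mine.

```cpp
#include <istream>
#include <map>
#include <string>

// Parse "[Section]" headers and "Key = value" lines, skipping ';' comments.
// Returns a flat map keyed as "Section.Key".
std::map<std::string, std::string> ParseIni(std::istream &input) {
	std::map<std::string, std::string> values;
	std::string line, section;
	while (std::getline(input, line)) {
		// Strip comments and surrounding whitespace
		size_t comment = line.find(';');
		if (comment != std::string::npos) line.erase(comment);
		size_t begin = line.find_first_not_of(" \t\r");
		if (begin == std::string::npos) continue;
		size_t end = line.find_last_not_of(" \t\r");
		line = line.substr(begin, end - begin + 1);

		if (line.front() == '[' && line.back() == ']') {
			// New section
			section = line.substr(1, line.size() - 2);
		} else {
			size_t eq = line.find('=');
			if (eq == std::string::npos) continue;
			std::string key = line.substr(0, eq);
			std::string value = line.substr(eq + 1);
			// Trim whitespace around the '='
			key.erase(key.find_last_not_of(" \t") + 1);
			value.erase(0, value.find_first_not_of(" \t"));
			values[section + "." + key] = value;
		}
	}
	return values;
}
```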
<br />
So with that, we now have a fully functioning binary file format! And more than that, with a few changes to the engine code, we can load the scene cold in less than 2 seconds! (It's almost instant if your file cache is still hot.) (Pre-compiled binaries <a href="https://github.com/RichieSams/thehalflingproject/tree/7815e58dbe6c393b2a45d1cdc200d65beb435c6b/precompiled_binaries/obj_loader_demo" target="_blank">here</a>).<br />
<br />
Well that's it for now. As always, feel free to ask questions and comment.<br />
<br />
Happy coding<br />
-<span style="color: #dd7700;">RichieSams</span><br />
<br />
<h3>
Loading more interesting scenes - Part 1 (2014-03-14)</h3>
When I finished the <a href="http://richiesams.blogspot.com/2014/03/introducing-halfling-project.html#deferredshadingdemo" target="_blank">Deferred Shading Demo</a>, I started looking at how forward and deferred shading differed in terms of frame time and frame quality. I couldn't see any noticeable differences in frame quality, which is good. But forward shading was upwards of 2 ms cheaper than deferred (depending on the camera position)!! I thought deferred shading was supposed to be better than forward?!?!?<br />
<br />
At first I thought my implementation was wrong, or that there was a bug in my code. But then it slowly dawned on me what the problem was. If you recall, the whole point of deferred is reducing the number of pixels that are shaded. A large part of that saving comes from not shading pixels that are occluded (they fail the z-test). However, with my simple geometry, there are very few camera positions in which ANY geometry is occluded. Thus deferred shading just adds overhead.<br />
<br />
So with this in mind, I started looking for more complex scenes to test my code against. After a bit of searching, I found Morgan McGuire's amazing <a href="http://graphics.cs.williams.edu/data/about.xml" target="_blank">Computer Graphics Data</a> database. He personally maintains a good 20 some-odd scenes (most of them are no longer even available in their original form). Huge props to him and any others involved in the project.<br />
<br />
Anyway, I downloaded the popular Crytek Sponza Scene in obj form. Awesome. Now what? Well, now I needed to load the obj data into vertex and index buffers. I looked around to see if there was a library to do it for me (why re-invent the wheel?), but I only found a smattering of thrown-together code. Well, that and <a href="http://assimp.sourceforge.net/" target="_blank">assimp</a>. But assimp seemed a bit large for a temporary obj loader. More on that later. So with that said, I used the code <a href="http://www.braynzarsoft.net/index.php?p=D3D11OBJMODEL" target="_blank">here</a> as a starting point, and created my own obj loader.<br />
<br />
First off, obj files are HARD to parse, mostly because they're so flexible. And being a text-based format, the parsing itself is just not fun. Ok, so maybe they're not HARD, but they're not easy either. The first major roadblock is that obj allows you to specify all the parts of a vertex separately.<br />
<br />
<h4>
Separate vertex definitions</h4>
For example:<br />
<br />
<pre class="brush:text">v 476.1832 128.5883 224.4587
vn -0.9176 -0.3941 -0.0529
vt 0.1674 0.8760 0.0000
</pre>
<br />
The 'v' represents the vertex position, the 'vn' represents the vertex normal, and the 'vt' represents the vertex texture coordinates. Indices can then choose whichever grouping of position, normal, and texture coordinates they need. Like this:<br />
<br />
<pre class="brush:text">f 140/45/140 139/18/139 1740/17/1852
</pre>
<br />
This is especially handy if large portions of your scene have the same surface normals, like square buildings. Then you only have to store a single 'vn' for all the vertices sharing the same normal.<br />
<br />
HOWEVER, while this is great for storage, DirectX expects a vertex to be a singular unit: position, normal, AND texture coordinate all together. (Yes, you <i>can </i>store them in separate Vertex Buffers, but then you pay for it in cache misses and poor data locality.) I chose to work around it like this:<br />
<br />
I use these data structures to hold the data:<br />
<pre class="brush:cpp">std::vector<Vertex> vertices;
std::vector<uint> indices;
</pre>
<pre class="brush:cpp">typedef std::tuple<uint, uint, uint> TupleUInt3;
std::unordered_map<TupleUInt3, uint> vertexMap;
std::vector<DirectX::XMFLOAT3> vertPos;
std::vector<DirectX::XMFLOAT3> vertNorm;
std::vector<DirectX::XMFLOAT2> vertTexCoord;
</pre>
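One note if you try to compile the declarations above: std::unordered_map has no default std::hash for std::tuple keys, so the map needs a hash functor supplied (or a std::hash specialization). A minimal boost-style hash combiner, which is my own sketch rather than code from the post, might look like:

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <unordered_map>

typedef unsigned int uint;
typedef std::tuple<uint, uint, uint> TupleUInt3;

// Hash each tuple element and mix, boost::hash_combine-style
struct TupleUInt3Hash {
	size_t operator()(const TupleUInt3 &t) const {
		size_t seed = std::hash<uint>()(std::get<0>(t));
		seed ^= std::hash<uint>()(std::get<1>(t)) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
		seed ^= std::hash<uint>()(std::get<2>(t)) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
		return seed;
	}
};

// The third template argument supplies the custom hash
std::unordered_map<TupleUInt3, uint, TupleUInt3Hash> vertexMap;
```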
<br />
<br />
<ul>
<li>When reading vertex data ('v', 'vn', 'vt'), the data is read into its corresponding vector. </li>
<li>Then, when reading indices, the code creates a true Vertex and adds it to vertices. I use the unordered_map to check if the vertex already exists before creating a new one:</li>
</ul>
<br />
<pre class="brush:cpp">TupleUInt3 vertexTuple{posIndex, texCoordIndex, normalIndex};
auto iter = vertexMap.find(vertexTuple);
if (iter != vertexMap.end()) {
// We found a match
indices.push_back(iter->second);
} else {
// No match. Make a new one
uint index = (uint)vertices.size();
vertexMap[vertexTuple] = index;
DirectX::XMFLOAT3 position = posIndex == 0 ? DirectX::XMFLOAT3(0.0f, 0.0f, 0.0f) : vertPos[posIndex - 1];
DirectX::XMFLOAT3 normal = normalIndex == 0 ? DirectX::XMFLOAT3(0.0f, 0.0f, 0.0f) : vertNorm[normalIndex - 1];
DirectX::XMFLOAT2 texCoord = texCoordIndex == 0 ? DirectX::XMFLOAT2(0.0f, 0.0f) : vertTexCoord[texCoordIndex - 1];
vertices.push_back(Vertex(position, normal, texCoord));
indices.push_back(index);
}
</pre>
<br />
Success! On to the next roadblock!<br />
<br />
<h4>
N-gons</h4>
Obj supports all polygons; you just add more indices to the face definition:<br />
<br />
<pre class="brush:text">f 140/45/140 139/18/139 1740/17/1852 1784/25/429 1741/35/141
</pre>
<br />
Again, this is extremely handy for reducing storage space. For example, if two triangles are are co-planar, you can combine them into a quad, etc. HOWEVER, DirectX only supports triangles. Therefore, we have to triangulate any faces that have more than 3 vertices. Triangulation can be quite complicated, depending on what assumptions you choose to make. However, I chose to assume that all polygons are convex, which makes life significantly easier. Following the algorithm in <a href="http://www.braynzarsoft.net/index.php?p=D3D11OBJMODEL" target="_blank">Braynzar Soft's code</a>, you can triangulate by making triangles with the first vertex, the next vertex and the previous vertex. For example, let's choose this pentagon:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-uYwKEb5hIls/UyOZHqdr9VI/AAAAAAAAALE/wiPQ6pWgZtA/s1600/Triangulate0.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-uYwKEb5hIls/UyOZHqdr9VI/AAAAAAAAALE/wiPQ6pWgZtA/s1600/Triangulate0.png" height="320" width="320" /></a></div>
We would then form triangles like so:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-Gf_FFVRsaXU/UyOaTSVBccI/AAAAAAAAALM/R7n2wka9JvM/s1600/Triangulate1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-Gf_FFVRsaXU/UyOaTSVBccI/AAAAAAAAALM/R7n2wka9JvM/s1600/Triangulate1.png" height="320" width="320" /></a></div>
So the triangles are:<br />
<br />
<ul>
<li>0 1 2</li>
<li>0 2 3</li>
<li>0 3 4</li>
</ul>
<br />
The code can be found <a href="https://github.com/RichieSams/thehalflingproject/blob/e29087f74e251c0964b4b9923fa7857d306f4239/source/common/geometry_generator.cpp#L448" target="_blank">here</a>. One note before I move on: this way of triangulating is definitely not optimal for high N-gons; it creates long, skinny triangles, which are bad for rasterizers. However, it serves its purpose for now, so it will stay.<br />
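The fan approach is only a few lines. Here is a sketch (a hypothetical <code>FanTriangulate</code> helper, not the exact code from the repo), assuming the face's indices are already parsed into a vector and the polygon is convex:<br />
<br />

```cpp
#include <cstddef>
#include <vector>

// Fan-triangulate a convex polygon given as a list of vertex indices.
// Emits (n - 2) triangles: (v0, v1, v2), (v0, v2, v3), ...
// Assumes the polygon is convex and the indices are in winding order.
std::vector<unsigned> FanTriangulate(const std::vector<unsigned> &face) {
    std::vector<unsigned> triangles;
    if (face.size() < 3) {
        return triangles; // Degenerate face; nothing to emit
    }
    for (std::size_t i = 1; i + 1 < face.size(); ++i) {
        triangles.push_back(face[0]);
        triangles.push_back(face[i]);
        triangles.push_back(face[i + 1]);
    }
    return triangles;
}
```

For the pentagon above, <code>FanTriangulate({0, 1, 2, 3, 4})</code> produces exactly the three triangles listed: 0 1 2, 0 2 3, 0 3 4.<br />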
<br />
<h4>
Normals</h4>
It's perfectly legal for a face in an obj file to omit normals:<br />
<br />
<pre class="brush:text">f 1270/3828 1261/3831 1245/3829
</pre>
<br />
Similarly, you can have a face that doesn't use texture coordinates:<br />
<br />
<pre class="brush:text">f -486096//-489779 -482906//-486570 -482907//-486571
</pre>
<br />
(You'll also notice that you can use negative indices. These are relative to the end of the list parsed so far: -1 refers to the most recently defined element, -2 to the one before it, and so on. That's an easy thing to work around.) The real problem is missing normals. My shader code assumes that every vertex has a valid normal. If it doesn't, the default initialization to (0.0f, 0.0f, 0.0f) makes the whole object black. Granted, I could add a check in the shader to fall back to the material color when the normal is all zero, but that adds dynamic branching, and in reality,<i> there shouldn't be any faces that don't have normals.</i><br />
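The negative-index workaround really is a one-liner. A sketch (hypothetical helper; <code>count</code> is how many positions/texcoords/normals have been parsed so far):<br />
<br />

```cpp
// Convert an obj face index to a 0-based array index.
// Positive obj indices are 1-based; negative indices count back
// from the most recently defined element (-1 == last one).
int ResolveObjIndex(int objIndex, int count) {
    if (objIndex > 0) {
        return objIndex - 1; // 1-based -> 0-based
    }
    return count + objIndex; // e.g. -1 -> count - 1
}
```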
<br />
So the first thing I tried is 'manually' calculating the vertex normals using <a href="http://www.lighthouse3d.com/opengl/terrain/index.php3?normals" target="_blank">this</a> approach. The approach uses the cross product of two sides of a triangle to get the face normal, then averages all the face normals for faces sharing the same vertex. Simple, but it takes FOREVER. The first time I tried it, it ran for 10 minutes.... Granted, it is O(N<sub>1</sub><sup>2</sup> + N<sub>2</sub>), where N<sub>1</sub> is the number of vertices and N<sub>2</sub> is the number of faces. The Sponza scene has 184,330 triangles and 262,267 faces. Therefore, I resolved to do the normal calculations once, and then re-create the obj with those normals. I'll get to that in a bit.<br />
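The accumulate-and-average idea can also be written as a single pass over the faces, which avoids the per-vertex search entirely. Here is a sketch with a minimal stand-in <code>Float3</code> type (my own simplified version, not the Halfling Project code): accumulate each face's unnormalized cross product into its three vertices, then normalize the sums.<br />
<br />

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Float3 {
    float x, y, z;
};

static Float3 Sub(Float3 a, Float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Float3 Cross(Float3 a, Float3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// positions: one entry per vertex; indices: 3 per triangle.
// Returns one averaged unit normal per vertex.
std::vector<Float3> ComputeVertexNormals(const std::vector<Float3> &positions,
                                         const std::vector<unsigned> &indices) {
    std::vector<Float3> normals(positions.size(), Float3{0.0f, 0.0f, 0.0f});

    // Accumulate each face's (unnormalized) normal into its three vertices
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        unsigned tri[3] = {indices[i], indices[i + 1], indices[i + 2]};
        Float3 faceNormal = Cross(Sub(positions[tri[1]], positions[tri[0]]),
                                  Sub(positions[tri[2]], positions[tri[0]]));
        for (unsigned idx : tri) {
            normals[idx].x += faceNormal.x;
            normals[idx].y += faceNormal.y;
            normals[idx].z += faceNormal.z;
        }
    }

    // Normalize the sums
    for (Float3 &n : normals) {
        float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
        if (len > 0.0f) {
            n.x /= len; n.y /= len; n.z /= len;
        }
    }
    return normals;
}
```

A nice side effect: because the cross product is left unnormalized, each face's contribution is implicitly weighted by its area, which tends to give better results than a plain average.<br />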
<br />
<h4>
Vectors</h4>
After creating the basic obj loader, I did some crude profiling and found some interesting behavior: when compiled for "Release", the obj loader ran <b>1 to 2 orders of magnitude</b> faster. After much searching, I found out that in "Release", the VC++ compiler turns off a bunch of run-time checks on vectors. These checks are genuinely useful; they catch invalid iterator usage and other debug-only mistakes. However, they're really, really slow. You <i>can</i> turn them off with preprocessor defines, but I wouldn't; it's just something to be aware of.<br />
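For reference, the switch in question on modern VC++ is <code>_ITERATOR_DEBUG_LEVEL</code> (older versions used <code>_SECURE_SCL</code> and <code>_HAS_ITERATOR_DEBUGGING</code>). If you ever do need to turn the checks off, it looks like this, with the caveat that it must be defined before any standard header and must match across every translation unit and linked library, or you'll get linker errors:<br />
<br />

```cpp
// MSVC-specific. 0 = no checks (the Release default),
// 1 = basic bounds checking, 2 = full iterator debugging (the Debug default).
// Must appear before the first standard-library include.
#define _ITERATOR_DEBUG_LEVEL 0

#include <vector>
```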
<br />
<br />
So that's obj files. With all that done, I can now load interesting models! Yay!!<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/AX8XR78.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/AX8XR78.jpg" height="372" width="640" /></a></div>
<br />
But even in "Release", the scene still takes ~4 seconds to load on my beefy computer. Hmmm.... Well the first thing I did was to put the obj parsing into a separate thread so the main window was still interactive.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/4hOel1K.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/4hOel1K.png" height="372" width="640" /></a></div>
<br />
I also sleep the main thread in 50 ms intervals to give the background thread as many cycles as possible. I need to do some further testing to see whether sleeping the main thread affects child threads (this is using std::thread). I wouldn't think it would, but it doesn't hurt to test. Let me know your thoughts.<br />
<br />
Well, that's it for now. I'll cover some of the specifics of what changed in the renderer from Deferred Shading Demo to Obj Loader Demo in the next post, but this post is getting to be a bit long. As always, feel free to comment or leave suggestions.<br />
<br />
-<span style="color: #dd7700;">RichieSams</span><br />
<br />
<br />
<h3>
Introducing the Halfling Project</h3>
<i>2014-03-10</i><br />
Hello everyone!<br />
<br />
It's been entirely too long since I've posted about what I've been working on. Granted, I did make a post a couple weeks ago about Git, but that was mostly for my class. So here goes!<br />
<br />
We last left off with me wrapping up GSoC with ScummVM. I have since joined the ScummVM dev team (Yay!) and my current progress on the ZVision engine was merged into the master branch. Unfortunately, due to school keeping me quite busy and another project, I haven't had much time to work more on the engine. That said, it's not abandoned! I'm planning on working more on it after I graduate in August. <br />
<br />
I have always been quite fascinated by computer graphics, especially in the algorithms that make real-time graphics possible. Wanting to get into the field, I started teaching myself DirectX 11 last December using Frank Luna's wonderful book, <a href="http://www.amazon.com/Introduction-3D-Game-Programming-DirectX/dp/1936420228" target="_blank">An Introduction to 3D Game Programming with DirectX 11</a>. However, rather than just using his base code, I chose to create my own rendering framework, and thus <a href="https://github.com/RichieSams/thehalflingproject" target="_blank">The Halfling Project</a> was born.<br />
<br />
"Why re-invent the wheel?", you ask? Because it forces me to fully understand the graphics concepts, rather than just copy-pasting cookie-cutter code. Also, no matter how recent a tutorial is, there is bound to be some code that is out of date. For example, Frank Luna's code uses .fx files and the D3DX library. Effect files <i>can</i> still be used, but Microsoft discourages it. And the D3DX library doesn't exist anymore. Granted it has a replacement (DirectXMath), but it has a slightly different API. Thus, even if I were to 'copy-paste', I would still have to change the code to fit the new standards.<br />
<br />
That said, I didn't come up with everything from scratch. The Halfling Project is heavily influenced by Luna's code, <a href="http://mynameismjp.wordpress.com/" target="_blank">MJP's sample framework</a>, and Glenn Fiedler's <a href="https://gafferongames.com/" target="_blank">blog posts</a>. Overall, The Halfling Project is just a collection of demos that happen to use the same base framework. So, with that in mind, let me describe some of the demos and what I plan for the future.<br />
<br />
(If you would like to try out the demos for yourself, there are compiled binaries in my <a href="https://github.com/RichieSams/thehalflingproject/archive/master.zip" target="_blank">Git repo</a>. You will need a DirectX 11 capable graphics card or integrated graphics, and you will need to install the Visual C++ 2013 (v120) redistributable, which is included with the demos.)<br />
<br />
<br />
<h3 id="cratedemo">
Crate Demo:</h3>
<div style="text-align: center;">
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/zEnsaFz.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/zEnsaFz.png" height="497" width="640" /></a></div>
<br /></div>
My "Hello World" of DirectX 11! Ha ha! So much code for a colored box.... I can't tell you how happy I was when it worked though!<br />
<br />
Me: "Look! Look what I made!"<br />
My roommate: "What? It's a box."<br />
Me: "But.... it was hard..." <br />
<br />
I guess he had a point though. On to more interesting things!<br />
<br />
<br />
<h3 id="wavesimulationdemo">
Wave Simulation Demo:</h3>
So the next step was to make the geometry a bit more interesting. I borrowed a wave simulation algorithm from Frank Luna's code and created this demo. Each update applies the wave equation to every vertex and then updates the vertex buffer.<br />
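The update itself is a finite-difference form of the 2D wave equation. Here is a sketch of one simulation step (my own simplified version, not Luna's exact code; <code>k1</code>, <code>k2</code>, <code>k3</code> are constants precomputed from the wave speed, damping, grid spacing, and time step):<br />
<br />

```cpp
#include <vector>

// One finite-difference step of the 2D wave equation on a height grid.
// prev holds the heights from two steps ago, curr the current heights.
// After the call, curr holds the newest heights and prev the old curr.
void WaveStep(std::vector<float> &prev, std::vector<float> &curr,
              int rows, int cols, float k1, float k2, float k3) {
    for (int i = 1; i < rows - 1; ++i) {
        for (int j = 1; j < cols - 1; ++j) {
            // Overwrite prev in place; it becomes the new "current" after the swap
            prev[i * cols + j] =
                k1 * prev[i * cols + j] +
                k2 * curr[i * cols + j] +
                k3 * (curr[(i + 1) * cols + j] + curr[(i - 1) * cols + j] +
                      curr[i * cols + j + 1] + curr[i * cols + j - 1]);
        }
    }
    prev.swap(curr); // curr now holds the newest heights
}
```

Each frame you run one of these steps and then copy the new heights into the dynamic vertex buffer.<br />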
<div style="text-align: center;">
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/hj5ZG3W3Nhg?feature=player_embedded' frameborder='0'></iframe><br />
<br /></div>
<br />
<h3 id="lightingdemo">
Lighting Demo:</h3>
Now that we had some interesting geometry, it was time for some lights! Well, one light...<br />
<div style="text-align: center;">
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/OFBm6Is.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/OFBm6Is.png" height="500" width="640" /></a></div>
<br /></div>
I actually didn't use the wave simulation geometry because it requires a dynamic vertex buffer. (Yes, I know you could do it with a static buffer and transformations, but baby steps.) Instead, I borrowed another function from Frank Luna's code that uses sin/cos to create hills. The lighting is a forward renderer using Lambert diffuse lighting and Blinn-Phong specular lighting. Rather than bore you with my own re-hash of what's already been written, I will point you to <a href="http://lmgtfy.com/?q=lambert+and+blinn-phong+lighting" target="_blank">Google</a>.<br />
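For the curious, the core of the two lighting models fits in a few lines. A CPU-side sketch (the real thing lives in HLSL; the names here are my own):<br />
<br />

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Normalize(Vec3 v) {
    float len = std::sqrt(Dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Lambert diffuse + Blinn-Phong specular factors for one light.
// All input vectors must be normalized; results are scalars to be
// multiplied by the light and material colors.
void LambertBlinnPhong(Vec3 normal, Vec3 toLight, Vec3 toEye, float specPower,
                       float *outDiffuse, float *outSpecular) {
    float nDotL = std::max(Dot(normal, toLight), 0.0f);
    *outDiffuse = nDotL; // Lambert: proportional to the cosine of incidence
    *outSpecular = 0.0f;

    if (nDotL > 0.0f) {
        // Blinn-Phong: specular peaks when the half vector aligns with the normal
        Vec3 half = Normalize(Vec3{toLight.x + toEye.x,
                                   toLight.y + toEye.y,
                                   toLight.z + toEye.z});
        float nDotH = std::max(Dot(normal, half), 0.0f);
        *outSpecular = std::pow(nDotH, specPower);
    }
}
```

Lambert scales the diffuse color by n·l; Blinn-Phong raises n·h (h being the half vector between the light and eye directions) to a specular power, with higher powers giving tighter highlights.<br />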
<br />
<br />
<h3 id="deferredshadingdemo">
Deferred Shading Demo:</h3>
This is where I diverged from Frank Luna's book and started off on my own. I like to read graphics white papers and talks on my bus ride to and from school. One that I really liked was Andrew Lauritzen's <a href="http://visual-computing.intel-research.net/art/publications/deferred_rendering/" target="_blank">talk about Tiled Shading</a>. In my head, deferred shading was the next logical step after traditional forward shading, so I launched in, skipping right to tiled deferred shading. However, it wasn't long before I was in way over my head. I guess I should have seen that coming, but hindsight is 20/20. Therefore, I resolved to first implement naïve deferred shading, and THEN think about tiled (and perhaps <a href="http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading" target="_blank">clustered</a>).<br />
<br />
<h4>
So how is deferred shading different than forward shading? </h4>
Traditional Forward:<br />
<ol>
<li>The application submits all the triangles it wants rendered to the GPU.</li>
<li>The hardware rasterizer turns the triangles into pixels and sends them off to the pixel shader</li>
<li>The pixel shader applies any lighting equations you have</li>
<ul>
<li>Assuming no light culling, this means the lighting equation is invoked <br />((# pixels from submitted triangles) x (# lights)) times</li>
</ul>
<li>The output merger rejects pixels that fail the depth test and does pixel blending if blending is enabled</li>
</ol>
<div>
<br />
Traditional Deferred:</div>
<ul>
<li>GBuffer Pass:</li>
</ul>
<ol><ol>
<li>The application submits all the triangles it wants rendered to the GPU.</li>
<li>The hardware rasterizer turns the triangles into pixels and sends them off to the pixel shader</li>
<li>The pixel shader stores the pixel data in a series of texture buffers called Geometry Buffers or GBuffers for short</li>
<ul>
<li>GBuffer contents vary by implementation, mostly depending on your lighting equation in the second pass</li>
<li>Common data is World Position, Surface Normal, Diffuse Color, Specular Color, and Specular Power</li>
</ul>
<li>The output merger rejects pixels that fail the depth test. Blending is NOT allowed.</li>
</ol>
</ol>
<ul>
<li>Lighting Pass:</li>
<ol>
<li>The application renders a fullscreen quad, guaranteeing a pixel shader thread for every pixel on the screen</li>
<li>The pixel shader samples the GBuffers for the data it needs to light the pixel</li>
<li>Then applies the lighting equation and returns the final color</li>
<ul>
<li>Assuming no light culling, this means the lighting equation is invoked <br />((# pixels on screen) x (# lights)) times</li>
</ul>
<li>The output merger is pretty much a pass-through, as we don't use a depth buffer for this pass.</li>
</ol>
</ul>
<div>
<div>
<br /></div>
</div>
<div>
<h4>
So what's the difference? Why go through all that extra work?</h4>
</div>
<div style="text-align: center;">
<b>Deferred Shading invokes the lighting equation fewer times <span style="font-size: 8px;">(generally)</span></b></div>
<br />
In the past 10 years, there has been a push to make real-time graphics more and more realistic. A massive part of realism is lighting, but lighting is usually THE most expensive calculation for a scene. In forward shading, you calculate lighting for each and every pixel that the rasterizer creates. However, depending on your scene, a large number of these pixels will be rejected by the depth test, so a large number of calculations were *wasted* in a sense. Granted, there are <a href="https://www.google.com/search?q=early+z-culling" target="_blank">ways around this</a>, but they aren't perfect, and I'll leave that for future exploration. <b>Thus, deferred shading effectively decouples scene complexity from lighting complexity.</b><br />
<br />
This all said, deferred shading isn't a cure-all for everything; it does have some significant drawbacks:<br />
<ol>
<li>It requires a large* amount of bandwidth and memory to store the GBuffers</li>
<ul>
<li>Large is a relative term. It ultimately depends on what platform you're targeting</li>
</ul>
<li>It requires hardware that allows multiple render targets</li>
<ul>
<li>Somewhat of a moot point with today's hardware, but still something to watch for</li>
</ul>
<li>No hardware anti-aliasing.</li>
<li>No transparent geometry / blending</li>
</ol>
<div>
<br /></div>
<h4>
So how is my deferred shading demo implemented?</h4>
GBuffers:<br />
<table style="border-collapse: collapse; border-style: solid; border-width: 1px;">
<tbody>
<tr>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">Albedo-MaterialIndex</td>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">DXGI_FORMAT_R8G8B8A8_UNORM</td>
</tr>
<tr>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">Normal </td>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">DXGI_FORMAT_R16G16_FLOAT</td>
</tr>
<tr>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">Depth</td>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">DXGI_FORMAT_R32_FLOAT</td>
</tr>
</tbody></table>
<br />
<table style="border-collapse: collapse; border-style: solid; border-width: 1px; text-align: center;">
<tbody>
<tr>
<th style="border-style: solid; border-width: 1px; padding: 2px 6px; width: 25%;">8 bits</th>
<th style="border-style: solid; border-width: 1px; padding: 2px 6px; width: 25%;">8 bits</th>
<th style="border-style: solid; border-width: 1px; padding: 2px 6px; width: 25%;">8 bits</th>
<th style="border-style: solid; border-width: 1px; padding: 2px 6px; width: 25%;">8 bits</th>
</tr>
<tr>
<td style="background-color: red; border: 1px solid #cccccc; color: white; padding: 2px 6px;">Albedo Red</td>
<td style="background-color: green; border: 1px solid #cccccc; color: white; padding: 2px 6px;">Albedo Green</td>
<td style="background-color: blue; border: 1px solid #cccccc; color: white; padding: 2px 6px;">Albedo Blue</td>
<td style="background-color: #43c6db; border: 1px solid #cccccc; color: white; padding: 2px 6px;">Material Index</td>
</tr>
<tr>
<td colspan="2" style="background-color: yellow; border: 1px solid #cccccc; color: black; padding: 2px 6px;">Normal Phi</td>
<td colspan="2" style="background-color: lime; border: 1px solid #cccccc; color: black; padding: 2px 6px;">Normal Theta</td>
</tr>
<tr>
<td colspan="4" style="background-color: #330033; border: 1px solid #cccccc; color: white; padding: 2px 6px;">Depth</td>
</tr>
</tbody></table>
<br />
<table style="border-collapse: collapse; border-style: solid; border-width: 1px;">
<tbody>
<tr>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">Albedo</td>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">Stores the RGB diffuse color read from texture mapping</td>
</tr>
<tr>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">MaterialIndex</td>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">An offset index to a global material array in the shader</td>
</tr>
<tr>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">Normal</td>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">The fragment surface unit normal stored in spherical coordinates. (We don't store radius since we know it's 1 for a unit normal)</td>
</tr>
<tr>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">Depth</td>
<td style="border-style: solid; border-width: 1px; padding: 2px 6px;">The hardware depth buffer. It stores (1 - z/w). By swapping the depth planes, we <a href="http://mynameismjp.wordpress.com/2010/03/22/attack-of-the-depth-buffer/" target="_blank">spread the depth precision out more evenly</a>.</td>
</tr>
</tbody></table>
<br />
<div>
<br />
Converting the normal to/from spherical coordinates is just some trig, but <a href="https://github.com/RichieSams/thehalflingproject/blob/e29087f74e251c0964b4b9923fa7857d306f4239/source/obj_loader_demo/hlsl_util.hlsli#L13" target="_blank">here</a> is the code I use. Note: my code assumes that the GBuffer format can store non-normalized data (that is, values potentially outside the range [0, 1]).<br />
<br /></div>
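The encode/decode itself is just the standard Cartesian-to-spherical conversion. A CPU-side sketch of the idea (the linked code is HLSL and may differ in conventions):<br />
<br />

```cpp
#include <cmath>

struct N3 { float x, y, z; };

// Pack a unit normal into two angles (spherical coordinates) and back.
// The radius is implicitly 1, so only phi and theta need to be stored.
void EncodeSphericalNormal(N3 n, float *phi, float *theta) {
    *phi = std::atan2(n.y, n.x); // [-pi, pi]
    *theta = std::acos(n.z);     // [0, pi]
}

N3 DecodeSphericalNormal(float phi, float theta) {
    float sinTheta = std::sin(theta);
    return {sinTheta * std::cos(phi), sinTheta * std::sin(phi), std::cos(theta)};
}
```

Round-tripping any unit normal through encode/decode returns the original vector, and the two angles fit in the R16G16_FLOAT GBuffer above.<br />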
I use the depth buffer to calculate the world position of each pixel. The basic principle is this: since we know the pixel's position on the screen, we can combine that with the depth value and the inverse ViewProjection matrix to recover the world position. I'll point you<a href="http://mynameismjp.wordpress.com/2010/03/22/attack-of-the-depth-buffer/" target="_blank"> here</a> and <a href="https://github.com/RichieSams/thehalflingproject/blob/e29087f74e251c0964b4b9923fa7857d306f4239/source/obj_loader_demo/hlsl_util.hlsli#L39" target="_blank">here</a> for more information.<br />
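The reconstruction boils down to: build a clip-space point from the pixel's NDC coordinates and depth, multiply by the inverse view-projection matrix, and undo the perspective divide. A CPU-side sketch (conventions such as the row-major layout and the screen-space y-flip are assumptions here and must match your actual projection setup):<br />
<br />

```cpp
struct P3 { float x, y, z; };

// Reconstruct world-space position from screen UV (each axis in [0, 1]),
// a depth-buffer value, and the inverse view-projection matrix.
// The matrix is row-major float[16]; vectors are treated as column vectors.
P3 WorldPosFromDepth(float u, float v, float depth, const float invViewProj[16]) {
    // Screen UV -> normalized device coordinates
    float ndcX = u * 2.0f - 1.0f;
    float ndcY = (1.0f - v) * 2.0f - 1.0f; // y is flipped in screen space
    float clip[4] = {ndcX, ndcY, depth, 1.0f};

    // Multiply by the inverse view-projection matrix
    float world[4];
    for (int row = 0; row < 4; ++row) {
        world[row] = 0.0f;
        for (int col = 0; col < 4; ++col) {
            world[row] += invViewProj[row * 4 + col] * clip[col];
        }
    }

    // Undo the perspective divide
    return {world[0] / world[3], world[1] / world[3], world[2] / world[3]};
}
```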
<br />
Since you managed to get through all that, let me reward you with a video and some screenshots. :)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/Dl2-30p2P6w?feature=player_embedded' frameborder='0'></iframe></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/D18zzt4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/D18zzt4.png" height="502" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
With 500 point lights and 500 spot lights<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/ea2KIsB.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/ea2KIsB.png" height="500" width="640" /></a></div>
<br />
Visualizing the GBuffers<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/tqqkA28.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/tqqkA28.png" height="500" width="640" /></a></div>
<br />
And one last one to show you that the depth buffer does actually have data in it:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/aoF6ig4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/aoF6ig4.png" height="498" width="640" /></a></div>
<br />
<br />
Well that's it for now! I have another demo I'm working on right now, but I'll leave that for another post. If you want a sneak peak, there is a build of it <a href="https://github.com/RichieSams/thehalflingproject/archive/master.zip" target="_blank">in my repo</a>.<br />
<br />
As always, feel free to ask questions and leave comments or suggestions.<br />
<br />
-<span style="color: #dd7700;">RichieSams</span><br />
<br />
<br />
<h3>
Getting Started with Git</h3>
<i>2014-01-22</i><br />
We're using Git in my Elements of Databases class this semester, so I thought I would put together a crash course for Git. So here goes!<br />
<br />
<h3>
What is Git?</h3>
<a href="http://en.wikipedia.org/wiki/Wikipedia:Too_long;_didn't_read" target="_blank">TL;DR</a> explanation of what Git is:<br />
Git was designed to allow multiple users to work on the same project at the same time.<br />
It also serves as a way to save and display your work history.<br />
<br />
<h3>
First things first</h3>
There are various ways you can use git (command line, SourceTree, GitHub client, TortoiseGit, or some combination). My personal preference is SourceTree for mostly everything, TortoiseGit for merge conflicts, and command line only when necessary.<br />
<br />
So the first step is to download and install the software that you would like to use. I am going to be showing SourceTree, but it should be a similar process for other programs.<br />
<br />
Go to this link: <a href="http://www.sourcetreeapp.com/download/" target="_blank">http://www.sourcetreeapp.com/download/</a><br />
The download should start within a couple seconds.<br />
<br />
Run the exe and follow the directions.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/dsOlkU1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/dsOlkU1.png" height="247" width="320" /></a></div>
<br />
<h3>
Setting up SourceTree</h3>
<br />
<ol>
<li>When you first start SourceTree, it will ask you where git is installed. <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/Mg7tq8a.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/Mg7tq8a.png" height="320" width="305" /></a></div>
</li>
<li>If it's not installed, then it can do it for you if you click "Download an embedded version of Git"</li>
<li>Next it will ask you about Mercurial. You can just say "I don't want to use Mercurial"</li>
<li>You will then be presented with this: <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/Uwqp56w.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/Uwqp56w.png" height="320" width="300" /></a></div>
</li>
<li>Fill out your name and email. This is the information that will show up when you commit.</li>
<li>Leave the two checkboxes checked. </li>
<ul>
<li>The first allows SourceTree to automatically update git configurations when you change options within SourceTree. </li>
<li>The second makes sure all your line endings are the same, so there are no conflicts if you move from Windows to Mac, Linux to Windows, etc.</li>
</ul>
<li>Accept SourceTree's License Agreement and click "Next" <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/56M37qi.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/56M37qi.png" height="320" width="304" /></a></div>
</li>
<li>This next dialog box is for if you use SSH. This can be set up later if you choose to use it. In the meantime, just press "Next" and then "No"</li>
<li>The last dialog box gives you the opportunity to sign into any repository sites you use. This makes cloning repositories much easier and faster. <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/tSTlhLa.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/tSTlhLa.png" height="320" width="300" /></a></div>
</li>
<li><div class="separator" style="clear: both; text-align: left;">
Click "Finish" and you should be in SourceTree proper: </div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/32bAlJ4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/32bAlJ4.png" height="266" width="400" /></a></div>
</li>
</ol>
<br />
<h3>
Creating a Repository</h3>
So now you have everything installed, let's actually get into usage. The first thing you'll want to do is create a repository. You can think of this as a giant box to hold all your code and changes. So let's head over to GitHub. Once you've logged in, you should see something similar to this:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/86h9X4P.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/86h9X4P.png" height="386" width="400" /></a></div>
<br />
<br />
<ol>
<li>Click the green, "New repository" button on the right-hand side of the web page. The page should look something like this: <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/oriQBeT.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/oriQBeT.png" height="246" width="320" /></a></div>
</li>
<li>Name the repository and, if you would like, add a description.</li>
<li>Click the radio button next to "Private", since all our class repos need to be private</li>
<li>Click on the combobox labelled "Add git ignore" and select Python. GitHub will then automatically create a .gitignore file for us.</li>
<ul>
<li>A '.gitignore' file tells git what type of files or directories we <i><b>don't</b></i> want to store in our repository.</li>
</ul>
<li>Finally, click "Create repository"</li>
</ol>
<div>
<br /></div>
<h3>
Cloning a Repository</h3>
Now that we've created the repository, we want a copy of it on our local machine.<br />
<ol>
<li>Open up SourceTree</li>
<li>Click the button in the top left corner of the program called "Clone/New" <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/R8shdKm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/R8shdKm.png" /></a></div>
</li>
<li>You should get something that looks like this: <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/Bs4nNHx.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/Bs4nNHx.png" height="200" width="400" /></a></div>
</li>
<li>If you logged in with your GitHub account earlier, you can press the Globe-looking button to list all your repositories. </li>
<ol type="a">
<li>Just select the one you want to clone and press OK. <a href="http://i.imgur.com/dHqZH7Z.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="http://i.imgur.com/dHqZH7Z.png" height="130" width="400" /></a></li>
<li>Otherwise, go to the repository on GitHub and copy the url labelled "HTTPS clone url" <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/6fe6ZA3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/6fe6ZA3.png" height="320" width="291" /></a></div>
</li>
<ul>
<li>(You can use SSH if you want, but that's beyond the scope of this tutorial)</li>
</ul>
<li>Paste the url into SourceTree <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/KQ1zu3L.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/KQ1zu3L.png" height="198" width="400" /></a></div>
</li>
</ol>
<li>Click on the ellipses button next to "Destination path" and select an ***EMPTY*** folder where you want your local copy to reside. </li>
<li>Click "Clone" <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/TiuNEjQ.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/TiuNEjQ.png" height="110" width="400" /></a></div>
</li>
</ol>
<br />
<h3>
<b>Basic Git Usage</b></h3>
Now let's get into the basic usage of git<br />
<br />
Let's add a python file with some basic code. So browse to the folder that you just created and create a file called hello.py. Open it with your favorite editor and write a basic Hello World.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/73BCOhH.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/73BCOhH.png" height="132" width="320" /></a></div>
<br />
Ok now that we've created this file, let's add it to our repository. So let's go over to SourceTree.<br />
<br />
<ol>
<li><span style="text-align: center;">Make sure you're in the "File Status" tab </span></li>
<ul>
<li>This tab lists all the changes that you've done since your last commit with a preview window on the right
</li>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/8rVQPZS.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="http://i.imgur.com/8rVQPZS.png" height="320" width="237" /></a></div>
</ul>
<li>Click on hello.py</li>
<li>Add the file to the "Stage" by clicking "Stage file" or by using the arrows in the left column.
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/tvcTEyw.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/tvcTEyw.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/xkx49Pz.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/xkx49Pz.png" /></a></div>
</li>
<br />
Just what is the stage? Think of it as a temporary storage area where you prepare a set of changes before committing. Only the items that are on the stage will be committed. This comes in handy when you want to break changes into multiple commits. We'll see an example of that later.<br />
<br />
<li>Press the "Commit" button in the top left of SourceTree. <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/LgYuh2S.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/LgYuh2S.png" /></a></div>
You should get something like this: <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/XfVtyWd.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/XfVtyWd.png" height="185" width="400" /></a></div>
</li>
<li>Add a message to your commit and click the "Commit" button at the bottom right-hand corner. I'll explain message formatting later. <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/UYj5alV.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/UYj5alV.png" height="90" width="320" /></a></div>
</li>
<li>Now if you go to the "Log/History" tab, you will see your new commit: <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/lzagykU.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/lzagykU.png" height="118" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/Hndxby4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/Hndxby4.png" height="122" width="320" /></a></div>
</li>
</ol>
<div>
<br /></div>
You might notice that SourceTree tells you that "master is 1 ahead". What does this mean?<br />
<div>
<br /></div>
<div>
When you commit, everything is local; nothing is transmitted to GitHub. So SourceTree is telling you that your local master branch is one commit ahead of the master branch on GitHub.</div>
<div>
<br />
So let's fix that! </div>
<div>
<ol>
<li>Click the "Push" button. <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/FuOMuKg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/FuOMuKg.png" height="85" width="400" /></a></div>
</li>
<li>And press "Ok"</li>
</ol>
<i><b>Now</b></i> everything is synced to GitHub.</div>
<div>
<br /></div>
<h3>
Commit Style and Commit Message Formatting</h3>
<div>
Before I go any further I want to make a few comments on commit style and commit message formatting.</div>
<div>
<br /></div>
<div>
Commits should be treated as small, logical changes. A stranger should be able to look at your history and roughly follow your thought process, and they should be able to look at each commit and know exactly what you changed. Some examples: "Fixed a typo in the output message", "Added an iteration counter for debug purposes".</div>
<div>
<br /></div>
<div>
With that in mind, Git has a standard commit message format:</div>
<div>
<br /></div>
<pre class="brush:text"><SYSTEM_NAME_IN_ALL_CAPS>: <Commit message>
[Commit body / Any additional information]
</pre>
<br />
So an example would be:
<br />
<pre class="brush:text">COLLATZ: Added an iteration counter for debug purposes
I wanted to know how many times the function was being called
in each loop.
</pre>
<br />
<span style="color: #a64d79;">SYSTEM_NAME</span> refers to whatever part of the project the commit affects. IE. SOUND_SYSTEM, GRAPHICS_MANAGER, CORE. For our class projects, we probably won't have subsystems, so we can just use the project name, ie. for this first project COLLATZ.<br />
The <span style="color: #a64d79;">commit message</span> should be short and to the point. Any details should be put in the body of the commit.<br />
If you have a <span style="color: #a64d79;">commit body</span>, there should be a blank line between it and the commit message.<br />
<br />
<br />
<h3>
More Git Usage Examples</h3>
Let's do another example commit<br />
<br />
<ol>
<li>Modify your hello.py file to add these lines: <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/wGD9mI8.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/wGD9mI8.png" /></a></div>
</li>
<li>Save</li>
</ol>
<br />
<br />
Now, let's commit<br />
<br />
<ol>
<li>Go back to the "File Status" tab in SourceTree <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/9FJXPD0.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/9FJXPD0.png" height="86" width="320" /></a></div>
</li>
<li>If you look at the preview pane, you'll see the lines we added highlighted in green <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/loRzyz6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/loRzyz6.png" height="141" width="400" /></a></div>
</li>
</ol>
However, it would make sense to split the changes into two commits. How do we do that?<br />
<ol>
<li>Click on the first line you would like to add to the Stage. Holding down shift, click on the last line you want to add to the stage. <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/lXVFWiO.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/lXVFWiO.png" height="167" width="400" /></a></div>
</li>
<li>Now click, "Stage Selected Lines" <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/wQdieCt.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/wQdieCt.png" /></a></div>
</li>
<li>The changes moved to the Stage! <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/4jYkdr9.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/4jYkdr9.png" height="107" width="400" /></a></div>
</li>
<li>Commit the changes using the same instructions as before<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/kquiMMO.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/kquiMMO.png" height="162" width="320" /></a></div>
</li>
<li>Now let's stage and commit the remaining changes. You can once again select the lines you want and use "Stage Selected Lines", or you can stage the entire chunk. <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/VcMRlOO.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/VcMRlOO.png" height="185" width="400" /></a></div>
</li>
<ul>
<li>A chunk is just a group of changes that happen to be near each other.</li>
</ul>
<li>Now there's an extra space that I accidentally added. <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/sJOMwYj.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/sJOMwYj.png" height="206" width="400" /></a></div>
</li>
<li>Rather than going to my editor to delete it, I can let git do the work.</li>
<li>Select the lines you want to discard and press "Discard Selected lines"</li>
</ol>
<div style="background-color: red;">
<div style="text-align: center;">
<b><span style="color: white;">************ WARNING *************</span></b></div>
<span style="color: white;">Once you discard changes, they are gone <b><i>forever</i></b>. As in, no getting them back. So be VERY VERY careful using discard.</span><br />
<div style="text-align: center;">
<b><span style="color: white;">************ WARNING *************</span></b></div>
</div>
<br />
<h3>
Pulling</h3>
So far, we've been the only ones on our repository. However, the whole point of using a repository is so that multiple people can work at the same time.<br />
<br />
This is a portion of the commit history for an open source project I'm part of called ScummVM:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/MjAH5aL.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/MjAH5aL.png" height="107" width="640" /></a></div>
As you can see, there are many changes going on at the same time.<br />
<h4>
Let's imagine a scenario:</h4>
You and your partner Joe are working on some code at the same time. You make some changes and commit them. In the meantime, however, Joe also made some changes, committed them, and pushed them to the repository. If you try to push, git will complain, and rightfully so: you don't have the most up-to-date version of the repository. Therefore, in order to push your changes to the repository, you first need to pull Joe's changes and merge any conflicts.<br />
<br />
How do you pull?<br />
Just click the "Pull" button in SourceTree. Click ok and wait for git to do its work. Once it finishes, you'll notice Joe's new commit have shown up in your history. *Now* you can push.<br />
<br />
Therefore, it's common practice to always pull before you push. Nothing will go wrong if you don't, since git will catch the error, but it's a good habit to get into.<br />
<br />
<h3>
Tips and Tricks</h3>
<h4>
Stashing</h4>
Say you have a group of changes that you're working on, but you want to try a different way to fix the problem. One way to approach this is "stashing". Stashing stores all your current changes and then reverts your code back to your last commit. At a later time, you can restore the stash back onto your code.<br />
<br />
<br />
<ol>
<li>To stash changes, just press the stash button in SourceTree <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/n3xPm21.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/n3xPm21.png" /></a></div>
</li>
<li>To bring your changes back, right click on the stash you want and click "Apply" <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/XJiY0UZ.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/XJiY0UZ.png" /></a></div>
</li>
<li>It will bring up a dialog box like this: <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/kpilHjc.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/kpilHjc.png" height="117" width="400" /></a></div>
</li>
<li>If you leave the "Delete after applying" checkbox unchecked, the stash will stay, even after it's been restored. I usually delete a stash after applying, but it can be useful to keep it if you want to apply it somewhere else.</li>
</ol>
<br />
Stashing can also be done on the command line with:<br />
<br />
<ul>
<li>git stash</li>
<li>git stash pop</li>
</ul>
<br />
The first command stashes your changes; the second restores the most recent stash and then deletes it.<br />
<br />
<h4>
Going back in history</h4>
Say you want to go back to a certain state in your history, perhaps because that was the last time your code worked, or maybe to check whether a certain state also had a certain bug.<br />
<br />
<ol>
<li>First, stash or commit all your current changes. If you don't, you could lose some or all of your work.</li>
<li>Then, in the Log/History tab of SourceTree, double click on the commit you would like to move to. You should get a dialog box like this: <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/02w3nki.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/02w3nki.png" height="140" width="320" /></a></div>
</li>
<li>That's to confirm that you want to move. Click yes.</li>
<li>Now your code should have changed to reflect the state of the commit you clicked.</li>
<li>If you want to make any changes here, first create a branch. That's covered in the next section.</li>
<li>To move back to the end, just double click the last commit you were on.</li>
</ol>
<br />
<br />
<h4>
Branching</h4>
Consider that you and Joe are both trying to come up with a solution to a bug. Rather than both working in 'master' and potentially messing up each other's code, it would make more sense if you each had a separate instance of the code. This can be solved with branching.<br />
<br />
So, for example, you could work in a branch called 'solution1' and Joe could work in a branch called 'solution2'. Then, when everything is finished, you choose the branch you like best and use git to merge that branch back into 'master'.<br />
<br />
So to start, let's create a branch.<br />
<br />
<ol>
<li>Easy enough. Just click the "Branch" button <div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/BAmPmg2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/BAmPmg2.png" /></a></div>
</li>
<li>Name the branch and press "Create Branch". Branch names cannot contain spaces and are case-sensitive.</li>
<li>You should now be in your new branch. Any commits you do will commit to this branch.</li>
</ol>
<br />
To move to another branch, or "checkout" a branch, simply double-click it in your commit history or in the branch list in the left column.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/ikvsVti.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/ikvsVti.png" /></a></div>
<br />
Now that you've committed some changes to another branch, let's merge it back into master<br />
<br />
<ol>
<li>Double click on master to check it out</li>
<li>Right click on the last commit of the branch you would like to merge in and select "Merge..."</li>
<li>Click "Ok"</li>
<li>If there are no conflicts, the merge will be successful and master will contain all the changes from the other branch</li>
<li>Remember to push!</li>
</ol>
<div>
<br /></div>
<div>
<br /></div>
<div>
Well, that's pretty much all the basics. There are <i>many many many</i> more things you can do with Git, but you can worry about those when the situation arises. </div>
<div>
<br /></div>
<div>
You are more than welcome to leave a comment if you have any questions or if you have any suggestions for improving what I've written or the structure of how it's organized. Also, please let me know if you find any errors.</div>
<div>
<br /></div>
<div>
Have fun coding!</div>
<div>
-<span style="color: #dd7700;">RichieSams</span></div>
<h3>
I never knew moving a lever could be so hard</h3>
In the Zork games, there are switches/twist knobs/turn-tables:<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/-h357Pmn1Gc?feature=player_embedded' frameborder='0'></iframe></div>
<br />
These are all controlled by the Lever control:<br />
<pre class="brush:text">control:624 lever {
descfile(knocker.lev)
cursor(handpt)
}
</pre>
<br />
The knocker.lev file looks like this:
<br />
<pre class="brush:text">animation_id:631~
filename:te2ea21c.rlf~
skipcolor:0~
anim_coords:200 88 343 315~
mirrored:1~
frames:11~
elsewhere:0 0 511 319~
out_of_control:0 0 511 319~
start_pos:0~
hotspot_deltas:42 39~
0:241 252 D=1,90 ^=P(0 to 1) P(1 to 0) P(0 to 1) P(1 to 0) E(0)~
1:234 260 D=2,90 D=0,270 ^=P(1 to 0) E(0)~
2:225 258 D=3,90 D=1,270 ^=P(2 to 0) P(0 to 1) P(1 to 0) E(0)~
3:216 255 D=4,90 D=2,270 ^=P(3 to 0) P(0 to 1) P(1 to 0) E(0)~
4:212 234 D=5,90 D=3,270 ^=P(4 to 0) P(0 to 2) P(2 to 0) E(0)~
5:206 213 D=6,90 D=4,270 ^=P(5 to 0) P(0 to 3) P(3 to 0) E(0)~
6:212 180 D=7,90 D=5,270 ^=P(6 to 0) P(0 to 3) P(3 to 0) E(0)~
7:214 147 D=8,90 D=6,270 ^=P(7 to 0) P(0 to 4) P(4 to 0) E(0)~
8:222 114 D=9,90 D=7,270 ^=P(8 to 0) P(0 to 5) P(4 to 0) E(0)~
9:234 106 D=10,90 D=8,270 ^=P(9 to 0) P(0 to 5) P(4 to 0) E(0)~
10:234 98 D=9,270~</pre>
<ul>
<li><span style="color: #a64d79;">animation_id</span> is unused.</li>
<li><span style="color: #a64d79;">filename refers</span> to the animation file used.</li>
<li><span style="color: #a64d79;">skip color</span> is unused.</li>
<li><span style="color: #a64d79;">anim_coords</span> refers to the location the control will be rendered</li>
<li><span style="color: #a64d79;">mirrored</span> says that the reverse of the animation is appended to the end of the file. Ex: 0, 1, 2, 3, 3, 2, 1, 0</li>
<li><span style="color: #a64d79;">frames</span> refers to how many animation frames there are (If mirrored = 1, frames = animationFile::frameCount / 2)</li>
<li><span style="color: #a64d79;">elsewhere</span> is unused</li>
<li><span style="color: #a64d79;">out_of_control</span> is unused</li>
<li><span style="color: #a64d79;">start_pos</span> refers to the first animation frame used by the control</li>
<li><span style="color: #a64d79;">hotspot_deltas</span> refers to the width and height of the hotspots used to grab a control with the mouse</li>
</ul>
<br />
The last section is a bit tricky. It's formatted like so:
<br />
<pre class="brush:text">`
[frameNumber]:[hotspotX] [hotspotY] D=[directionToFrame],[directionAngle] .....(potentially more directions) ^=P([from] to [to]) P([from] to [to]) ... (potentially more return paths) E(0)~
</pre>
<br />
<ul>
<li><span style="color: #a64d79;">frameNumber</span> corresponds the animationFile frame that should be displayed when the lever is in that state</li>
<li><span style="color: #a64d79;">hotspotX</span> is the X coordinate of the hotspot rectangle in which the user can grab the control</li>
<li><span style="color: #a64d79;">hotspotY</span> is the Y coordinate of the hotspot rectangle in which the user can grab the control</li>
</ul>
<br />
D refers to "Direction". Let's say we're at frame 0. D=1,90 means: "To get to frame 1, the mouse needs to be moving at a 90 degree angle." (I'll cover how the angles work in a bit)
<br />
<br />
P refers to "Path". This is what frames should be rendered after the user lets go of a control. For example, lets say we let go of the knocker at frame 6. The .lev file reads: ^=P(6 to 0) P(0 to 3) P(3 to 0). This says to render every frame from 6 to 0, then every frame from 0 to 3, then every frame from 3 to 0. So written out:
<br />
<div style="text-align: center;">
6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 3, 2, 1, 0
</div>
<br />
This allows for some cool effects such as the knocker returning to the lowest position and bouncing as though it had gravity.
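The path expansion described above is simple to sketch in code. Here's a toy Python version (a hypothetical helper for illustration only; the actual engine code is C++):

```python
def expand_paths(paths):
    """Expand return paths such as P(6 to 0) P(0 to 3) P(3 to 0)
    into the full sequence of frames to render, inclusive on both ends."""
    frames = []
    for start, end in paths:
        step = 1 if end >= start else -1
        frames.extend(range(start, end + step, step))
    return frames

# The knocker example from above: ^=P(6 to 0) P(0 to 3) P(3 to 0)
print(expand_paths([(6, 0), (0, 3), (3, 0)]))
# → [6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 3, 2, 1, 0]
```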
<br />
<br />
<br />
So what is that angle I was talking about? It refers to the direction the mouse is moving while the user is holding down left mouse button.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-0XzrWhXNjZY/UjO09ZiAEsI/AAAAAAAAAIc/MRR-he05x8Y/s1600/angle.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://2.bp.blogspot.com/-0XzrWhXNjZY/UjO09ZiAEsI/AAAAAAAAAIc/MRR-he05x8Y/s320/angle.png" width="320" /></a></div>
So let's go over a typical user interaction:<br />
<ol>
<li>User hovers over the control. The cursor changes to a hand.</li>
<li>User presses down the left mouse button</li>
<li>Test if the mouse is within the current frame's hotspot</li>
<li>If so, begin a drag:</li>
<ol>
<li>Calculate the distance between the last mouse position and the current</li>
<li>If over 64 (a heuristic), calculate the angle. (Only calculating the angle when we're sufficiently far from the last mouse position saves calculations and makes the lever less "twitchy".)</li>
<li>Test the angle against the directions</li>
<li>If one passes, render the new frame</li>
</ol>
<li>User moves a couple more times</li>
<li>User releases the left mouse button</li>
<ol>
<li>Follow any return paths set out in the .lev file</li>
</ol>
</ol>
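The heart of step 4 is the angle test. Here's a minimal Python sketch of how the drag angle could be computed. Note my assumptions: the "over 64" heuristic is treated as a squared-distance threshold, and 90 degrees means "straight up" per the diagram, even though screen y grows downward:

```python
import math

def drag_angle(last_pos, cur_pos, min_sq_dist=64):
    """Return the drag angle in degrees, or None if the mouse hasn't
    moved far enough from the last sampled position (the 64 heuristic)."""
    dx = cur_pos[0] - last_pos[0]
    dy = cur_pos[1] - last_pos[1]
    if dx * dx + dy * dy <= min_sq_dist:
        return None
    # Negate dy because screen y grows downward, but the angle
    # convention treats "up" as 90 degrees
    return math.degrees(math.atan2(-dy, dx)) % 360

print(drag_angle((100, 100), (100, 80)))  # mouse moved straight up → 90.0
```

The returned angle would then be compared against the D=frame,angle entries for the current frame (with some tolerance) to decide which frame to move to.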
<br />
<br />
And that's it! Let me know if you have any questions or comments. The full source code can be found <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/lever_control.h" target="_blank">here</a> and <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/lever_control.cpp" target="_blank">here</a>. Until next time<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>
<h3>
One frame at a time</h3>
So, we're entering the final weekend before the soft pencils-down for GSoC. It's been a very busy couple of weeks since university here in the US started 3 weeks ago, so I've been juggling homework, labs, and this project. But you're not here for that, so let's launch into the actual post:<br />
<br />
Animations in-game come in two formats: AVI and a custom format called RLF. AVI is simple because I can use the ScummVM AVI decoder. But I had to reverse engineer the RLF file format so it can be played as a video or frame by frame.<br />
<br />
Before I go into the format of the file, I want to explain the general schema of animations, or more specifically, video frame compression techniques. (For another reference, the article <a href="https://en.wikipedia.org/wiki/Video_compression_picture_types" target="_blank">here</a> is pretty good) The first frame of an animation has to include every pixel, aka, no compression. These frames are called I-frames or key frames. For the next frame, we <i><b>could</b></i> store every pixel, but that seems kind of wasteful. Instead, we store only the pixels that changed between the last frame and this frame. These are called P-frames. Optionally, a frame can also store the pixels that changed between the next frame and this frame. These are called B-frames. This allows animations to be played forwards <i><b>or</b></i> backwards. With P-frames, we can only go forwards. In order to seek within an animation, we have to find the closest I-frame, then add P/B-frames until we're at the frame we want. To make this less painful for long animations, video encoders insert I-frames every 120 frames or so. (About 1 I-frame every 5 seconds. Assuming 24 fps, 24 * 5 = 120).<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/I_P_and_B_frames.svg/2000px-I_P_and_B_frames.svg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="120" src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/I_P_and_B_frames.svg/2000px-I_P_and_B_frames.svg.png" width="640" /></a></div>
<br />
RLF files only use I-frames and P-frames. If they need to go backwards, the whole animation is encoded both forwards and backwards. For example: 0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0<br />
It seems pretty wasteful in my opinion, but that's what they do.<br />
<br />
The RLF file starts off with a header to describe the information it contains:<br />
<pre class="brush:cpp">bool RlfAnimation::readHeader() {
if (_file.readUint32BE() != MKTAG('F', 'E', 'L', 'R')) {
return false;
}
// Read the header
_file.readUint32LE(); // Size1
_file.readUint32LE(); // Unknown1
_file.readUint32LE(); // Unknown2
_frameCount = _file.readUint32LE(); // Frame count
// Since we don't need any of the data, we can just seek right to the
// entries we need rather than read in all the individual entries.
_file.seek(136, SEEK_CUR);
//// Read CIN header
//_file.readUint32BE(); // Magic number FNIC
//_file.readUint32LE(); // Size2
//_file.readUint32LE(); // Unknown3
//_file.readUint32LE(); // Unknown4
//_file.readUint32LE(); // Unknown5
//_file.seek(0x18, SEEK_CUR); // VRLE
//_file.readUint32LE(); // LRVD
//_file.readUint32LE(); // Unknown6
//_file.seek(0x18, SEEK_CUR); // HRLE
//_file.readUint32LE(); // ELHD
//_file.readUint32LE(); // Unknown7
//_file.seek(0x18, SEEK_CUR); // HKEY
//_file.readUint32LE(); // ELRH
//// Read MIN info header
//_file.readUint32BE(); // Magic number FNIM
//_file.readUint32LE(); // Size3
//_file.readUint32LE(); // OEDV
//_file.readUint32LE(); // Unknown8
//_file.readUint32LE(); // Unknown9
//_file.readUint32LE(); // Unknown10
_width = _file.readUint32LE(); // Width
_height = _file.readUint32LE(); // Height
// Read time header
_file.readUint32BE(); // Magic number EMIT
_file.readUint32LE(); // Size4
_file.readUint32LE(); // Unknown11
_frameTime = _file.readUint32LE() / 10; // Frame time in microseconds
return true;
}
</pre>
<br />
The magic number 'FELR' refers to the run-length encoding used in the file. I'll explain the specifics later on. I'm kind of curious what all the extra information in the header is used for, so if you guys have any ideas, I'm all ears. The useful information is pretty self-explanatory.<br />
<br />
After the header is the actual frame data. Each frame also has a header.<br />
<pre class="brush:cpp">RlfAnimation::Frame RlfAnimation::readNextFrame() {
RlfAnimation::Frame frame;
_file.readUint32BE(); // Magic number MARF
uint32 size = _file.readUint32LE(); // Size
_file.readUint32LE(); // Unknown1
_file.readUint32LE(); // Unknown2
uint32 type = _file.readUint32BE(); // Either ELHD or ELRH
uint32 headerSize = _file.readUint32LE(); // Offset from the beginning of this frame to the frame data. Should always be 28
_file.readUint32LE(); // Unknown3
frame.encodedSize = size - headerSize;
frame.encodedData = new int8[frame.encodedSize];
_file.read(frame.encodedData, frame.encodedSize);
if (type == MKTAG('E', 'L', 'H', 'D')) {
frame.type = Masked;
} else if (type == MKTAG('E', 'L', 'R', 'H')) {
frame.type = Simple;
_completeFrames.push_back(_lastFrameRead);
} else {
warning("Frame %u doesn't have type that can be decoded", _lastFrameRead);
}
_lastFrameRead++;
return frame;
}
</pre>
<br />
If a frame is of type ELHD, it is a P-frame; if it is of type ELRH, it's an I-frame. We hold off decoding until we actually need to render the frame, which reduces memory use.<br />
<br />
So now we've read in all our data. How do we render a frame? The simplest case is to render the next frame. Note: _currentFrameBuffer is a Graphics::Surface that stores the current frame.<br />
<pre class="brush:cpp">const Graphics::Surface *RlfAnimation::getNextFrame() {
assert(_currentFrame + 1 < (int)_frameCount);
if (_stream) {
applyFrameToCurrent(readNextFrame());
} else {
applyFrameToCurrent(_currentFrame + 1);
}
_currentFrame++;
return &_currentFrameBuffer;
}
void RlfAnimation::applyFrameToCurrent(uint frameNumber) {
if (_frames[frameNumber].type == Masked) {
decodeMaskedRunLengthEncoding(_frames[frameNumber].encodedData, (int8 *)_currentFrameBuffer.getPixels(), _frames[frameNumber].encodedSize, _frameBufferByteSize);
} else if (_frames[frameNumber].type == Simple) {
decodeSimpleRunLengthEncoding(_frames[frameNumber].encodedData, (int8 *)_currentFrameBuffer.getPixels(), _frames[frameNumber].encodedSize, _frameBufferByteSize);
}
}
void RlfAnimation::applyFrameToCurrent(const RlfAnimation::Frame &frame) {
if (frame.type == Masked) {
decodeMaskedRunLengthEncoding(frame.encodedData, (int8 *)_currentFrameBuffer.getPixels(), frame.encodedSize, _frameBufferByteSize);
} else if (frame.type == Simple) {
decodeSimpleRunLengthEncoding(frame.encodedData, (int8 *)_currentFrameBuffer.getPixels(), frame.encodedSize, _frameBufferByteSize);
}
}
</pre>
<br />
The decode....() functions simultaneously decode the frame data we read in earlier, and then blit it directly on-top of the _currentFrameBuffer pixels. I'll explain the details of each function further down.<br />
<br />
You might be wondering what the _stream variable refers to? I've created the RlfAnimation class so that it can decode in two different ways: it can load all the data from the file into memory and then do all decoding/blitting from memory, or it can stream the data from file, one frame at a time. The first option allows you to seek within the animation, but it uses quite a bit of memory (roughly the size of the file). The second option uses far less memory, but you can only play the animation forwards and can not seek.<br />
<br />
On to the decoding functions:<br />
<br />
I-frames contain every single pixel within a frame. Again, we <b><i>could</i></b> store every one of these, but that would be kind of expensive. So we use a simple compression algorithm called <i>Run Length Encoding</i>. (There are tons of frame compression algorithms out there. This is just the one they chose to use). Consider this image:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-yiXNvOXJuqI/UjNq24UBRvI/AAAAAAAAAH0/2tq8jJRgpG0/s1600/smilieface.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://4.bp.blogspot.com/-yiXNvOXJuqI/UjNq24UBRvI/AAAAAAAAAH0/2tq8jJRgpG0/s320/smilieface.png" width="320" /></a></div>
<br />
And then, let's choose a specific line of pixels:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-5_kKukBpWes/UjNscS8pZSI/AAAAAAAAAIA/saDGHSWLXmA/s1600/smilieFaceLine.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-5_kKukBpWes/UjNscS8pZSI/AAAAAAAAAIA/saDGHSWLXmA/s1600/smilieFaceLine.png" /></a></div>
<br />
If we were to encode each pixel we would need to store:<br />
<div style="text-align: center;">
YYYYYYYBYYYYYYBYYYYYYY</div>
where Y means yellow and B means black. That's a lot of repeated yellows. Let's instead store this:<br />
<div style="text-align: center;">
7Y1B6Y1B7Y</div>
The numbers represent how many of the following pixels are of the same color. So the decoder would interpret that as: render 7 yellow pixels, 1 black pixel, 6 yellow pixels, 1 black pixel, then 7 yellow pixels.<br />
<br />
The RLF files take this idea further. Consider this line of data, where G means green, R means red:<br />
<div style="text-align: center;">
YYYYYBGRYBYGBYYYYYY</div>
If we use the same method as before we get:<br />
<div style="text-align: center;">
5Y1B1G1R1Y1B1Y1G1B6Y</div>
It's almost as long as the original data! If a color doesn't have any repetition, using encoding actually takes up more space. To counter that, the RLF files do the following:<br />
<div style="text-align: center;">
5Y-8BGRYBYGB6Y</div>
If the number is negative, the next N pixels are copied directly to the destination. If it's positive, the next N pixels are filled with the color directly following the number.<br />
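As a toy illustration of this scheme (counts and pixel values interleaved in a flat list; the real format works on 16-bit pixels and biases positive run lengths by 2, as the C++ below shows), a decoder might look like:

```python
def decode_simple_rle(encoded):
    """Toy I-frame decoder: a positive count n means "repeat the next
    value n times"; a negative count n means "copy the next abs(n)
    values verbatim"."""
    out = []
    i = 0
    while i < len(encoded):
        n = encoded[i]
        i += 1
        if n < 0:  # literal run: copy abs(n) values directly
            out.extend(encoded[i:i - n])
            i += -n
        else:      # repeat run: one value, n copies
            out.extend([encoded[i]] * n)
            i += 1
    return out

# 5Y -8 B G R Y B Y G B 6Y decodes back to the original pixel line:
print("".join(decode_simple_rle(
    [5, "Y", -8, "B", "G", "R", "Y", "B", "Y", "G", "B", 6, "Y"])))
# → YYYYYBGRYBYGBYYYYYY
```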
<br />
Here's that algorithm in code form:<br />
<pre class="brush:cpp">void RlfAnimation::decodeSimpleRunLengthEncoding(int8 *source, int8 *dest, uint32 sourceSize, uint32 destSize) const {
uint32 sourceOffset = 0;
uint32 destOffset = 0;
while (sourceOffset < sourceSize) {
int8 numberOfSamples = source[sourceOffset];
sourceOffset++;
// If numberOfSamples is negative, the next abs(numberOfSamples) samples should
// be copied directly from source to dest
if (numberOfSamples < 0) {
numberOfSamples = ABS(numberOfSamples);
while (numberOfSamples > 0) {
if (sourceOffset + 1 >= sourceSize) {
return;
} else if (destOffset + 1 >= destSize) {
return;
}
byte r, g, b;
_pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);
uint16 destColor = _pixelFormat565.RGBToColor(r, g, b);
WRITE_UINT16(dest + destOffset, destColor);
sourceOffset += 2;
destOffset += 2;
numberOfSamples--;
}
// If numberOfSamples is >= 0, copy one sample from source to the
// next (numberOfSamples + 2) dest spots
} else {
if (sourceOffset + 1 >= sourceSize) {
return;
}
byte r, g, b;
_pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);
uint16 sampleColor = _pixelFormat565.RGBToColor(r, g, b);
sourceOffset += 2;
numberOfSamples += 2;
while (numberOfSamples > 0) {
if (destOffset + 1 >= destSize) {
return;
}
WRITE_UINT16(dest + destOffset, sampleColor);
destOffset += 2;
numberOfSamples--;
}
}
}
}
</pre>
<br />
To encode the P-frames, we use a similar method as above. Remember that P-frames are partial frames. They only include the pixels that changed from the last frame. An example pixel line could look like this, where O is a placeholder for empty space:<br />
<div style="text-align: center;">
OOOOBRGOOYYBRGOOOOO</div>
To encode this we do the following:<br />
<div style="text-align: center;">
4-3BRG2-5YYBRG5</div>
If the number read is positive, the next N pixels should be skipped. If the number is negative, the next N pixels should be copied directly to the destination.<br />
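The masked scheme can be sketched the same way (again a toy version on single-character pixels, ignoring the run-length bias and 16-bit pixels of the real format):

```python
def decode_masked_rle(encoded, dest):
    """Toy P-frame decoder: a positive count n skips n pixels, leaving
    the previous frame's values in place; a negative count n copies
    the next abs(n) changed values over the destination."""
    pos = 0
    i = 0
    while i < len(encoded):
        n = encoded[i]
        i += 1
        if n < 0:  # changed pixels: overwrite abs(n) values
            dest[pos:pos - n] = encoded[i:i - n]
            pos += -n
            i += -n
        else:      # unchanged pixels: skip n values
            pos += n
    return dest

# 4 -3 B R G 2 -5 Y Y B R G 5 applied over the previous frame's pixels:
prev = list("O" * 19)
print("".join(decode_masked_rle(
    [4, -3, "B", "R", "G", 2, -5, "Y", "Y", "B", "R", "G", 5], prev)))
# → OOOOBRGOOYYBRGOOOOO
```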
<br />
Here is that algorithm in code form:<br />
<pre class="brush:cpp">void RlfAnimation::decodeMaskedRunLengthEncoding(int8 *source, int8 *dest, uint32 sourceSize, uint32 destSize) const {
    uint32 sourceOffset = 0;
    uint32 destOffset = 0;

    while (sourceOffset < sourceSize) {
        int8 numberOfSamples = source[sourceOffset];
        sourceOffset++;

        // If numberOfSamples is negative, the next abs(numberOfSamples) samples should
        // be copied directly from source to dest
        if (numberOfSamples < 0) {
            numberOfSamples = ABS(numberOfSamples);

            while (numberOfSamples > 0) {
                if (sourceOffset + 1 >= sourceSize) {
                    return;
                } else if (destOffset + 1 >= destSize) {
                    return;
                }

                byte r, g, b;
                _pixelFormat555.colorToRGB(READ_LE_UINT16(source + sourceOffset), r, g, b);
                uint16 destColor = _pixelFormat565.RGBToColor(r, g, b);
                WRITE_UINT16(dest + destOffset, destColor);

                sourceOffset += 2;
                destOffset += 2;
                numberOfSamples--;
            }

        // If numberOfSamples is >= 0, move destOffset forward ((numberOfSamples * 2) + 2)
        // This function assumes the dest buffer has been memset with 0's.
        } else {
            if (sourceOffset + 1 >= sourceSize) {
                return;
            } else if (destOffset + 1 >= destSize) {
                return;
            }

            destOffset += (numberOfSamples * 2) + 2;
        }
    }
}</pre>
<br />
Whew! Almost there. The last thing to talk about is frame seeking. This requires that you're <i><b>not</b></i> streaming directly from disk. (Well, you <i style="font-weight: bold;">could</i> do it, but it would probably be more trouble than it was worth). As we read in the frames, we stored which frames were I-frames. So to seek to a frame, we iterate through that list of I-frames and find the I-frame closest to our destination frame. Then we use applyFrameToCurrent() to move from the I-frame to the destination frame:<br />
<pre class="brush:cpp">void RlfAnimation::seekToFrame(int frameNumber) {
    assert(!_stream);
    assert(frameNumber < (int)_frameCount && frameNumber >= -1);

    if (frameNumber == -1) {
        _currentFrame = -1;
        return;
    }

    int closestFrame = _currentFrame;
    int distance = (int)frameNumber - _currentFrame;
    for (Common::List<uint>::const_iterator iter = _completeFrames.begin(); iter != _completeFrames.end(); iter++) {
        int newDistance = (int)frameNumber - (int)(*iter);
        if (newDistance > 0 && (closestFrame == -1 || newDistance < distance)) {
            closestFrame = (*iter);
            distance = newDistance;
        }
    }

    for (int i = closestFrame; i <= frameNumber; i++) {
        applyFrameToCurrent(i);
    }

    _currentFrame = frameNumber;
}
</pre>
<br />
That's it! If you want to look at the full class, you can find it <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/rlf_animation.h" target="_blank">here</a> and <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/rlf_animation.cpp" target="_blank">here</a>. And as always, if you have ANY questions, feel free to comment. Happy coding!<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com0tag:blogger.com,1999:blog-4016790357096156934.post-47393688525768900462013-08-30T02:51:00.002-05:002014-01-22T20:28:15.735-06:00One pixel at a timeOver the course of this project, the way I've rendered images to the screen has rather drastically changed. Well, let me clarify. Everything is blitted to the screen in exactly the same way ( OSystem::copyRectToScreen ), however, how I get/modify the pixels that I pass to copyRectToScreen() has changed. (Disclaimer: From past experiences, we know that panorama images are stored transposed. However, in this post, I'm not really going to talk about it, though you may see some snippets of that in code examples) So a brief history:<br />
<br />
In my first iteration an image would be rendered to the screen as such:
<br />
<ol>
<li>Load the image from file to a pixel buffer</li>
<li>Choose where to put the image.</li>
<li>Choose what portion of the image we want to render to the screen. We don't actually specify the height/width, just the (x,y) top-left corner<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-JbvsErgfQcE/UiAIPjv_YpI/AAAAAAAAAE8/8KCTOObJJK4/s1600/Untitled-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-JbvsErgfQcE/UiAIPjv_YpI/AAAAAAAAAE8/8KCTOObJJK4/s1600/Untitled-1.png" /></a></div>
<br /><div class="separator" style="clear: both; text-align: center;">
</div>
</li>
<li>Call renderSubRect(buffer, destinationPoint, Common::Point(200, 0))</li>
<li>Create a subRect of the image by clipping the entire image width/height with the boundaries of the window and the boundaries of the image size</li>
<li>If we're in Panorama or Tilt RenderState, then warp the pixels of the subRect. (See <a href="http://richiesams.blogspot.com/2013/08/the-making-of-psychedelic-pictures-aka.html" target="_blank">post about the panorama system</a>)</li>
<li>Render the final pixels to the screen using OSytem::copyRectToScreen()</li>
<li>If we're rendering a background image (boolean passed in the arguments), check if the dimensions of the subRect completely fill the window boundaries. If they don't, then we need to wrap the image so it seems like it is continuous.</li>
<li>If we need to wrap, calculate a wrappedSubRect and a wrappedDestination point from the subRect dimensions and the window dimensions.</li>
<li>Call renderSubRect(buffer, wrappedDestination, wrappedSubRect)</li>
</ol>
At first glance, this seems like it would work well; however, it had some major flaws. The biggest problem stemmed from the <a href="http://richiesams.blogspot.com/2013/08/the-making-of-psychedelic-pictures-aka.html" target="_blank">Z-Vision technology</a>.<br />
<br />
To understand why, let's review how pixel warping works:<br />
<ol>
<li>We use math to create a table of (x, y) offsets.</li>
<li>For each pixel in the subRect:</li>
<ol>
<li>Look up the offsets for the corresponding (x, y) position</li>
<li>Add those offsets to the actual coordinates</li>
<li>Look up the pixel color at the new coordinates</li>
<li>Write that pixel color to the destination buffer at the original coordinates</li>
</ol>
</ol>
Let's give a specific example:<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<ol>
<li>We want to render a pixel located at (183, 91)<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-658PLyJFZIw/UiAaV56pfEI/AAAAAAAAAFw/I5NInolKQNA/s1600/Untitled-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-658PLyJFZIw/UiAaV56pfEI/AAAAAAAAAFw/I5NInolKQNA/s1600/Untitled-2.png" /></a></div>
</li>
<li>We go to the RenderTable and look up the offsets at location (183, 91)<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/tUc4xBE.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/tUc4xBE.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
</li>
<li>Add (52, 13) to (183, 91) to get (235, 104)</li>
<li>Look up the pixel color at (235, 104). In this example, the color is FFFC00 (Yellow).<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-XcBj1oc0CIU/UiAadcPmL8I/AAAAAAAAAF8/0Vq0k8glL-8/s1600/lookUpPixelColor.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-XcBj1oc0CIU/UiAadcPmL8I/AAAAAAAAAF8/0Vq0k8glL-8/s1600/lookUpPixelColor.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
</li>
<li>Write the color FFFC00 to (183, 91) in the destination buffer</li>
</ol>
The problem occurs when you're at the edges of an image. Let's consider the same scenario, but the image is shifted to the left:<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-FiepKsVhrCc/UiAdMpCAaMI/AAAAAAAAAGk/y-YO-TmtfBk/s1600/edgeCase.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-FiepKsVhrCc/UiAdMpCAaMI/AAAAAAAAAGk/y-YO-TmtfBk/s1600/edgeCase.png" /></a></div>
<br />
Let's skip to step 4:<br />
When we try to look up the pixel color at (235, 104) we have a problem. (235, 104) is outside the boundaries of the image.<br />
<br />
So, after discussing the problem with wjp, we thought that we could let the pixel warping function ( mutateImage() ) do the image wrapping, instead of doing it in renderSubRectToScreen. Therefore, in renderSubRectToScreen(), instead of clipping subRect to the boundaries of the image, I expand it to fill the entire window. Then inside of mutateImage, if the final pixel coordinates are larger or smaller than the actual image dimensions, I just keep adding or subtracting image widths/heights until the coordinates are in the correct range.<br />
<pre class="brush:cpp">void RenderTable::mutateImage(uint16 *sourceBuffer, uint16* destBuffer, int16 imageWidth, int16 imageHeight, int16 destinationX, int16 destinationY, const Common::Rect &subRect, bool wrap) {
    for (int16 y = subRect.top; y < subRect.bottom; y++) {
        int16 normalizedY = y - subRect.top;
        int32 internalColumnIndex = (normalizedY + destinationY) * _numColumns;
        int32 destColumnIndex = normalizedY * _numColumns;

        for (int16 x = subRect.left; x < subRect.right; x++) {
            int16 normalizedX = x - subRect.left;
            int32 index = internalColumnIndex + normalizedX + destinationX;

            // RenderTable only stores offsets from the original coordinates
            int16 sourceYIndex = y + _internalBuffer[index].y;
            int16 sourceXIndex = x + _internalBuffer[index].x;

            if (wrap) {
                // If the indices are outside of the dimensions of the image, shift the indices until they are in range
                while (sourceXIndex >= imageWidth) {
                    sourceXIndex -= imageWidth;
                }
                while (sourceXIndex < 0) {
                    sourceXIndex += imageWidth;
                }
                while (sourceYIndex >= imageHeight) {
                    sourceYIndex -= imageHeight;
                }
                while (sourceYIndex < 0) {
                    sourceYIndex += imageHeight;
                }
            } else {
                // Clamp the yIndex to the size of the image
                sourceYIndex = CLIP<int16>(sourceYIndex, 0, imageHeight - 1);
                // Clamp the xIndex to the size of the image
                sourceXIndex = CLIP<int16>(sourceXIndex, 0, imageWidth - 1);
            }

            destBuffer[destColumnIndex + normalizedX] = sourceBuffer[sourceYIndex * imageWidth + sourceXIndex];
        }
    }
}
</pre>
<br />
With these changes, rendering worked well and wrapping/scrolling worked well. However, the way in which Zork games calculate background position forced me to slightly change the model.<br />
<br />
Script files change location by calling "change_location(<<span style="color: #a64d79;">world</span>> <<span style="color: #a64d79;">room</span>> <<span style="color: #a64d79;">nodeview</span>> <<span style="color: #a64d79;">location</span>>)".<span style="color: #a64d79;"> location</span> refers to the initial position of the background image. Originally I thought this referred to the distance from the top-left corner of the image. So for example, location = 200 would create the following image:<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-6m-ozIyIcdM/UiA_o9sbWLI/AAAAAAAAAHI/FM7-QeCWqmM/s1600/Untitled-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-6m-ozIyIcdM/UiA_o9sbWLI/AAAAAAAAAHI/FM7-QeCWqmM/s1600/Untitled-1.png" /></a></div>
However, it turns out that this is not the case. <span style="color: #a64d79;">location</span> refers to distance the top-left corner is from the center line of the window:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-0xvyGUzoc4A/UiBBZLnDurI/AAAAAAAAAHU/d6A4iKsI_zU/s1600/Untitled-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-0xvyGUzoc4A/UiBBZLnDurI/AAAAAAAAAHU/d6A4iKsI_zU/s1600/Untitled-1.png" /></a></div>
Therefore, rather than worry about a subRect at all, I just pass in the destination coordinate, and then try to render the entire image (clipping it to window boundaries):<br />
<pre class="brush:cpp">void RenderManager::renderSubRectToScreen(Graphics::Surface &surface, int16 destinationX, int16 destinationY, bool wrap) {
    int16 subRectX = 0;
    int16 subRectY = 0;

    // Take care of negative destinations
    if (destinationX < 0) {
        subRectX = -destinationX;
        destinationX = 0;
    } else if (destinationX >= surface.w) {
        // Take care of extreme positive destinations
        destinationX -= surface.w;
    }

    // Take care of negative destinations
    if (destinationY < 0) {
        subRectY = -destinationY;
        destinationY = 0;
    } else if (destinationY >= surface.h) {
        // Take care of extreme positive destinations
        destinationY -= surface.h;
    }

    if (wrap) {
        _backgroundWidth = surface.w;
        _backgroundHeight = surface.h;

        if (destinationX > 0) {
            // Move destinationX to 0
            subRectX = surface.w - destinationX;
            destinationX = 0;
        }
        if (destinationY > 0) {
            // Move destinationY to 0
            subRectY = surface.h - destinationY;
            destinationY = 0;
        }
    }

    // Clip subRect to working window bounds
    Common::Rect subRect(subRectX, subRectY, subRectX + _workingWidth, subRectY + _workingHeight);

    if (!wrap) {
        // Clip to image bounds
        subRect.clip(surface.w, surface.h);
    }

    // Check destRect for validity
    if (!subRect.isValidRect() || subRect.isEmpty())
        return;

    if (_renderTable.getRenderState() == RenderTable::FLAT) {
        _system->copyRectToScreen(surface.getBasePtr(subRect.left, subRect.top), surface.pitch, destinationX + _workingWindow.left, destinationY + _workingWindow.top, subRect.width(), subRect.height());
    } else {
        _renderTable.mutateImage((uint16 *)surface.getPixels(), _workingWindowBuffer, surface.w, surface.h, destinationX, destinationY, subRect, wrap);

        _system->copyRectToScreen(_workingWindowBuffer, _workingWidth * sizeof(uint16), destinationX + _workingWindow.left, destinationY + _workingWindow.top, subRect.width(), subRect.height());
    }
}
</pre>
<br />
So to walk through it:<br />
<br />
<ol>
<li>If destinationX/Y is less than 0, the image is off the screen to the left/top. Therefore get the top left corner of the subRect by <i>subtracting </i>destinationX/Y.</li>
<li>If destinationX/Y is greater than the image width/height respectively, the image is off the screen to the right/bottom. Therefore shift destinationX/Y back by one image width/height.</li>
<li>If we're wrapping and destinationX/Y is still positive at this point, it means that the image will be rendered like this:<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-1c-w5rmO9zY/UiBHOQRU6KI/AAAAAAAAAHk/CdfX9T817Gc/s1600/Untitled-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-1c-w5rmO9zY/UiBHOQRU6KI/AAAAAAAAAHk/CdfX9T817Gc/s1600/Untitled-1.png" /></a></div>
</li>
<li>We want it to fully wrap, so we offset the image to the left one imageWidth, and then let mutateImage() take care of actually wrapping.</li>
</ol>
The last change to the render system was not due to a problem with the system, but due to a problem with the pixel format of the images. All images in Zork Nemesis and Zork Grand Inquisitor are encoded in RGB 555. However, a few of the ScummVM backends do not support RGB 555. Therefore, it was desirable to convert all images to RGB 565 on the fly. To do this, all image pixel data is first loaded into a Surface, then converted to RGB 565. After that, it is passed to renderSubRectToScreen().<br />
<br />
Since I was already preloading the pixel data into a Surface for RGB conversion, I figured that was a good place to do 'un-transpose-ing', rather than having to do it within mutateImage().<br />
<br />
So, with all the changes, this is the current state of the render system:<br />
<br />
<ol>
<li>Read image pixel data from file and dump it into a Surface buffer. In the case of a background image, the surface buffer is stored so we only have to read the file once.
<pre class="brush:cpp">void RenderManager::readImageToSurface(const Common::String &fileName, Graphics::Surface &destination) {
    Common::File file;

    if (!file.open(fileName)) {
        warning("Could not open file %s", fileName.c_str());
        return;
    }

    // Read the magic number
    // Some files are true TGA, while others are TGZ
    uint32 fileType = file.readUint32BE();

    uint32 imageWidth;
    uint32 imageHeight;
    Graphics::TGADecoder tga;
    uint16 *buffer;
    bool isTransposed = _renderTable.getRenderState() == RenderTable::PANORAMA;
    // All ZEngine images are in RGB 555
    Graphics::PixelFormat pixelFormat555 = Graphics::PixelFormat(2, 5, 5, 5, 0, 10, 5, 0, 0);
    destination.format = pixelFormat555;

    bool isTGZ;

    // Check for TGZ files
    if (fileType == MKTAG('T', 'G', 'Z', '\0')) {
        isTGZ = true;

        // TGZ files have a header and then Bitmap data that is compressed with LZSS
        uint32 decompressedSize = file.readSint32LE();
        imageWidth = file.readSint32LE();
        imageHeight = file.readSint32LE();

        LzssReadStream lzssStream(&file);
        buffer = (uint16 *)(new uint16[decompressedSize]);
        lzssStream.read(buffer, decompressedSize);
    } else {
        isTGZ = false;

        // Reset the cursor
        file.seek(0);

        // Decode
        if (!tga.loadStream(file)) {
            warning("Error while reading TGA image");
            return;
        }

        Graphics::Surface tgaSurface = *(tga.getSurface());
        imageWidth = tgaSurface.w;
        imageHeight = tgaSurface.h;
        buffer = (uint16 *)tgaSurface.getPixels();
    }

    // Flip the width and height if transposed
    if (isTransposed) {
        uint16 temp = imageHeight;
        imageHeight = imageWidth;
        imageWidth = temp;
    }

    // If the destination internal buffer is the same size as what we're copying into it,
    // there is no need to free() and re-create
    if (imageWidth != destination.w || imageHeight != destination.h) {
        destination.create(imageWidth, imageHeight, pixelFormat555);
    }

    // If transposed, 'un-transpose' the data while copying it to the destination
    // Otherwise, just do a simple copy
    if (isTransposed) {
        uint16 *dest = (uint16 *)destination.getPixels();

        for (uint32 y = 0; y < imageHeight; y++) {
            uint32 columnIndex = y * imageWidth;

            for (uint32 x = 0; x < imageWidth; x++) {
                dest[columnIndex + x] = buffer[x * imageHeight + y];
            }
        }
    } else {
        memcpy(destination.getPixels(), buffer, imageWidth * imageHeight * _pixelFormat.bytesPerPixel);
    }

    // Cleanup
    if (isTGZ) {
        delete[] buffer;
    } else {
        tga.destroy();
    }

    // Convert in place to RGB 565 from RGB 555
    destination.convertToInPlace(_pixelFormat);
}
</pre>
</li>
<li>Use the ScriptManager to calculate the destination coordinates</li>
<li>Call renderSubRectToScreen(surface, destinationX, destinationY, wrap) (see above)</li>
<ol>
<li>If destinationX/Y is less than 0, the image is off the screen to the left/top. Therefore get the top left corner of the subRect by <i>subtracting </i>destinationX/Y.</li>
<li>If destinationX/Y is greater than the image width/height respectively, the image is off the screen to the right/bottom. Therefore shift destinationX/Y back by one image width/height.</li>
<li>If we're wrapping and destinationX/Y is still positive at this point, offset the image to the left one imageWidth</li>
<li>If we're in PANORAMA or TILT state, call mutateImage() (see above)</li>
<ol>
<li>Iterate over the pixels of the subRect</li>
<li>At each pixel get the coordinate offsets from the RenderTable</li>
<li>Add the offsets to the coordinates of the pixel.</li>
<li>Use these new coordinates to get the location of the pixel color</li>
<li>Store this color at the coordinates of the original pixel</li>
</ol>
<li>Blit the final result to the Screen using OSystem::copyRectToScreen()</li>
</ol>
</ol>
<br />
That's it! Thanks for reading. As always, feel free to ask questions or make comments. Happy coding!
<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com0tag:blogger.com,1999:blog-4016790357096156934.post-27291202861900874952013-08-18T20:15:00.002-05:002014-01-22T20:28:15.728-06:00Moving through timeBefore I start, I know it's been a long time since my last post. Over the next couple days I'm going to write a series of posts about what I've been working on these last two weeks. So without further ado, here is the first one:
<br />
<br />
While I was coding in the last couple of weeks, I noticed that every time I came back to the main game from a debug window, the whole window hung for a good 6 seconds. After looking at my run() loop for a bit, I realized what the problem was. When I returned from the debug window, the next frame would have a massive deltaTime, which in turn caused a huge frame delay. This was partially a problem with how I had structured my frame delay calculation, but in the end, I needed a way to know when the game was paused, and to modify my deltaTime value accordingly.
<br />
<br />
To solve the problem, I came up with a pretty simple Clock class that tracks time, allows pausing, (and if you really wanted scaling/reversing):
<br />
<pre class="brush:cpp">/* Class for handling frame to frame deltaTime while keeping track of time pauses/un-pauses */
class Clock {
public:
    Clock(OSystem *system);

private:
    OSystem *_system;
    uint32 _lastTime;
    int32 _deltaTime;
    uint32 _pausedTime;
    bool _paused;

public:
    /**
     * Updates _deltaTime with the difference between the current time and
     * when the last update() was called.
     */
    void update();
    /**
     * Get the delta time since the last frame. (The time between update() calls)
     *
     * @return Delta time since the last frame (in milliseconds)
     */
    uint32 getDeltaTime() const { return _deltaTime; }
    /**
     * Get the time from the program starting to the last update() call
     *
     * @return Time from program start to last update() call (in milliseconds)
     */
    uint32 getLastMeasuredTime() { return _lastTime; }
    /**
     * Un-pause the clock.
     * Has no effect if the clock is already un-paused.
     */
    void start();
    /**
     * Pause the clock. Any future delta times will take this pause into account.
     * Has no effect if the clock is already paused.
     */
    void stop();
};
</pre>
<br />
I'll cover the guts of the functions in a bit, but first, here is their use in the main run() loop:<br />
<pre class="brush:cpp">Common::Error ZEngine::run() {
    initialize();

    // Main loop
    while (!shouldQuit()) {
        _clock.update();
        uint32 currentTime = _clock.getLastMeasuredTime();
        uint32 deltaTime = _clock.getDeltaTime();

        processEvents();

        _scriptManager->update(deltaTime);
        _renderManager->update(deltaTime);

        // Update the screen
        _system->updateScreen();

        // Calculate the frame delay based off a desired frame time
        int delay = _desiredFrameTime - int32(_system->getMillis() - currentTime);
        // Ensure non-negative
        delay = delay < 0 ? 0 : delay;
        _system->delayMillis(delay);
    }

    return Common::kNoError;
}
</pre>
<br />
And lastly, whenever the engine is paused (by a debug console, by the Global Main Menu, by a phone call, etc.), ScummVM core calls <i>pauseEngineIntern(bool pause)</i>, which can be overridden to implement any engine internal pausing. In my case, I can call Clock::start()/stop()<br />
<pre class="brush:cpp">void ZEngine::pauseEngineIntern(bool pause) {
    _mixer->pauseAll(pause);

    if (pause) {
        _clock.stop();
    } else {
        _clock.start();
    }
}
</pre>
<br />
All the work of the class is done by update(). update() gets the current time using getMillis() and subtracts the last recorded time from it to get _deltaTime. If the clock is currently paused, it subtracts off the amount of time that the clock has been paused. Lastly, it clamps the value to positive values.<br />
<pre class="brush:cpp">void Clock::update() {
    uint32 currentTime = _system->getMillis();

    _deltaTime = (currentTime - _lastTime);
    if (_paused) {
        _deltaTime -= (currentTime - _pausedTime);
    }

    if (_deltaTime < 0) {
        _deltaTime = 0;
    }

    _lastTime = currentTime;
}
</pre>
<br />
If you wanted to slow down or speed up time, it would be a simple matter to scale _deltaTime. You could even make it negative to make time go backwards. The full source code can be found <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/clock.cpp" target="_blank">here</a> and <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/clock.h" target="_blank">here</a>.<br />
<br />
Well that's it for this post. Next up is a post about the rendering system. Until then, happy coding!<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com0tag:blogger.com,1999:blog-4016790357096156934.post-85509367544950406572013-08-03T17:35:00.000-05:002014-01-22T20:28:15.754-06:00The making of psychedelic pictures (AKA, the panorama system)<div style="clear: both; text-align: center;">
<img border="0" height="395" src="http://i.imgur.com/lZAGtq1.png" style="align: center;" width="500" /></div>
<br />
In the game, the backgrounds are very long 'circular' images. By circular, I mean that if you were to put two copies of the same image end-to-end, they would be continuous. So, when the user moves around in the game, we just scroll the image accordingly. However, being that the images are flat, this movement isn't very realistic; it would seem like you are continually moving sideways through an endless room. (Endless staircase memories anyone?)<br />
<br />
<div style="clear: both; text-align: center;">
<img border="0" height="238" src="http://1.bp.blogspot.com/-MMOj-tMI6Og/TdcNWKhErKI/AAAAAAAAAsU/hK8Vdl3xy-g/s320/Mario.JPG" style="align: center;" width="320" /></div>
<br />
To counter this, the makers of ZEngine created 'ZVision': they used trigonometry to warp the images on the screen so, to the user, it looked like you were truly spinning 360 degrees. So let's dive into how exactly they did that.<br />
<br />
The basic premise is mapping an image onto a cylinder and then mapping it back onto a flat plane. The math is all done once and stored into an offset lookup table. Then the table is referenced to warp the images.<br />
<div style="clear: both; text-align: center;">
<img border="0" height="315" src="http://i.imgur.com/nvULP6j.png" width="400" /></div>
<div style="text-align: center;">
Without warping
</div>
<br />
<div style="clear: both; text-align: center;">
<img border="0" height="313" src="http://i.imgur.com/DpRCtrY.png" width="400" /></div>
<div style="text-align: center;">
With warping</div>
<br />
You'll notice that the images are pre-processed as though they were captured with a panorama camera.<br />
<br />
Video example:<br />
<div class="separator" style="clear: both; text-align: center;">
<object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://i1.ytimg.com/vi/aJxDZIqW_f4/0.jpg" height="266" width="320"><param name="movie" value="http://www.youtube.com/v/aJxDZIqW_f4?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" /><param name="bgcolor" value="#FFFFFF" /><param name="allowFullScreen" value="true" /><embed width="320" height="266" src="http://www.youtube.com/v/aJxDZIqW_f4?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" type="application/x-shockwave-flash" allowfullscreen="true"></embed></object></div>
<br />
<br />
Here is the function for creating the panorama lookup table:<br />
<pre class="brush:cpp">void RenderTable::generatePanoramaLookupTable() {
    memset(_internalBuffer, 0, _numRows * _numColumns * sizeof(uint16));

    float halfWidth = (float)_numColumns / 2.0f;
    float halfHeight = (float)_numRows / 2.0f;

    float fovRadians = (_panoramaOptions.fieldOfView * M_PI / 180.0f);
    float halfHeightOverTan = halfHeight / tan(fovRadians);
    float tanOverHalfHeight = tan(fovRadians) / halfHeight;

    for (uint x = 0; x < _numColumns; x++) {
        // Add an offset of 0.01 to overcome zero tan/atan issue (vertical line on half of screen)
        float temp = atan(tanOverHalfHeight * ((float)x - halfWidth + 0.01f));

        int32 newX = int32(floor((halfHeightOverTan * _panoramaOptions.linearScale * temp) + halfWidth));
        float cosX = cos(temp);

        for (uint y = 0; y < _numRows; y++) {
            int32 newY = int32(floor(halfHeight + ((float)y - halfHeight) * cosX));

            uint32 index = y * _numColumns + x;

            // Only store the x,y offsets instead of the absolute positions
            _internalBuffer[index].x = newX - x;
            _internalBuffer[index].y = newY - y;
        }
    }
}
</pre>
<br />
I don't quite understand all the math here, so at the moment it is just a cleaned-up version of what Marisa Chan had. If any of you would like to help me understand/clean up some of the math here I would be extremely grateful!<br />
<br />
Putting aside the math for the time being, the function creates an (dx, dy) offset at each (x,y) coordinate. Or in other words, if we want the pixel located at (x,y), we should instead look at pixel (x + dx, y + dy). So to blit an image to the screen, we do this:<br />
<ol>
<li>Iterate though each pixel</li>
<li>Use the (x,y) coordinates to look up a (dx, dy) offset in the lookup table</li>
<li>Look up that pixel color in the source image at (x + dx, y + dy)</li>
<li>Set that pixel in the destination image at (x,y)</li>
<li>Blit the destination image to the screen using OSystem::copyRectToScreen()</li>
</ol>
<br />
Steps 1 - 4 are done in mutateImage()<br />
<pre class="brush:cpp">void RenderTable::mutateImage(uint16 *sourceBuffer, uint16* destBuffer, uint32 imageWidth, uint32 imageHeight, Common::Rect subRectangle, Common::Rect destRectangle) {
    bool isTransposed = _renderState == RenderTable::PANORAMA;

    for (int y = subRectangle.top; y < subRectangle.bottom; y++) {
        uint normalizedY = y - subRectangle.top;

        for (int x = subRectangle.left; x < subRectangle.right; x++) {
            uint normalizedX = x - subRectangle.left;

            uint32 index = (normalizedY + destRectangle.top) * _numColumns + (normalizedX + destRectangle.left);

            // RenderTable only stores offsets from the original coordinates
            uint32 sourceYIndex = y + _internalBuffer[index].y;
            uint32 sourceXIndex = x + _internalBuffer[index].x;

            // Clamp the yIndex to the size of the image
            sourceYIndex = CLIP<uint32>(sourceYIndex, 0, imageHeight - 1);

            // Clamp the xIndex to the size of the image
            sourceXIndex = CLIP<uint32>(sourceXIndex, 0, imageWidth - 1);

            if (isTransposed) {
                destBuffer[normalizedY * destRectangle.width() + normalizedX] = sourceBuffer[sourceXIndex * imageHeight + sourceYIndex];
            } else {
                destBuffer[normalizedY * destRectangle.width() + normalizedX] = sourceBuffer[sourceYIndex * imageWidth + sourceXIndex];
            }
        }
    }
}
</pre>
<br />
<ul>
<li>Since the whole image can't fit on the screen, we iterate over a subRectangle of the image instead of the whole width/height.</li>
<li>destRectangle refers to where the image will be placed on the screen. It is in screen space, so we use it to offset the image coordinates in the lookup table (line 10).</li>
<li>We clip the coordinates to the height/width of the image to ensure no "index out of range" exceptions.</li>
</ul>
<br />
<br />
You may have noticed the last bit of code hinted at panoramas being transposed. For some reason, the developers chose to store panorama image data transposed. (Perhaps it made their math easier?) By transposed, I mean a pixel (x,y) in the true image would instead be stored at (y, x). Also the image height and width would be swapped. So an image that is truly 1440x320 would instead be 320x1440. If you have any insights into this, I'm all ears. Swapping x and y in code was trivial enough though. I would like to note that prior to calling mutateImage, I check if the image is a panorama, and if so, swap the width and height. So the imageWidth and imageHeight in the function are the width/height of the true image, not of the actual source image. This code that does the swap can be found in the function <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/render_manager.cpp#L66" target="_blank">RenderManager::renderSubRectToScreen</a>.<br />
<br />
Well, that's it for now. My next goal is to get the majority of the events working so I can load a room and the background image, music, etc. load automatically. So until next time, happy coding!<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com7tag:blogger.com,1999:blog-4016790357096156934.post-45432526811373710892013-07-29T11:51:00.002-05:002014-01-22T20:28:15.731-06:00“One person's data is another person's noise.”I know it's been forever since I've done a post and I'm really sorry. I got caught up in the sound issues and panorama issues. I'm going to talk about sound in this post and then make another post about panoramas. So here we go!<br />
<br />
“One person's data is another person's noise.” ― K.C. Cole<br />
<br />
This quote pretty much sums up my experiences with the sound decoding. I was somewhat lucky in that Marisa Chan's source code had an implementation of sound decoding that I could model off of, but at the same time, the whole function was quite cryptic. This is mostly due to the variable "naming". And I say "naming" in the loosest sense of the word, because most were single letters:<br />
<br />
<pre class="brush:cpp">void adpcm8_decode(void *in, void *out, int8_t stereo, int32_t n)
{
uint8_t *m1;
uint16_t *m2;
m1 = (uint8_t *)in;
m2 = (uint16_t *)out;
uint32_t a, x, j = 0;
int32_t b, i, t[4] = {0, 0, 0, 0};
while (n)
{
a = *m1;
i = t[j+2];
x = t2[i];
b = 0;
if(a & 0x40)
b += x;
if(a & 0x20)
b += x >> 1;
if(a & 0x10)
b += x >> 2;
if(a & 8)
b += x >> 3;
if(a & 4)
b += x >> 4;
if(a & 2)
b += x >> 5;
if(a & 1)
b += x >> 6;
if(a & 0x80)
b = -b;
b += t[j];
if(b > 32767)
b = 32767;
else if(b < -32768)
b = -32768;
i += t1[(a >> 4) & 7];
if(i < 0)
i = 0;
else if(i > 88)
i = 88;
t[j] = b;
t[j+2] = i;
j = (j + 1) & stereo;
*m2 = b;
m1++;
m2++;
n--;
}
}
</pre>
<br />
No offense intended towards Marisa Chan, but that makes my eyes hurt, and it made understanding the algorithm that much harder. But after talking to a couple of people at ScummVM and some Wikipedia reading on general sound decoding, I figured out that the sound is encoded with a modified version of Microsoft Adaptive PCM (ADPCM). I'll go ahead and post my implementation and then describe the process:<br />
<br />
<pre class="brush:cpp">const int16 RawZorkStream::_stepAdjustmentTable[8] = {-1, -1, -1, 1, 4, 7, 10, 12};
const int32 RawZorkStream::_amplitudeLookupTable[89] = {0x0007, 0x0008, 0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x000E,
0x0010, 0x0011, 0x0013, 0x0015, 0x0017, 0x0019, 0x001C, 0x001F,
0x0022, 0x0025, 0x0029, 0x002D, 0x0032, 0x0037, 0x003C, 0x0042,
0x0049, 0x0050, 0x0058, 0x0061, 0x006B, 0x0076, 0x0082, 0x008F,
0x009D, 0x00AD, 0x00BE, 0x00D1, 0x00E6, 0x00FD, 0x0117, 0x0133,
0x0151, 0x0173, 0x0198, 0x01C1, 0x01EE, 0x0220, 0x0256, 0x0292,
0x02D4, 0x031C, 0x036C, 0x03C3, 0x0424, 0x048E, 0x0502, 0x0583,
0x0610, 0x06AB, 0x0756, 0x0812, 0x08E0, 0x09C3, 0x0ABD, 0x0BD0,
0x0CFF, 0x0E4C, 0x0FBA, 0x114C, 0x1307, 0x14EE, 0x1706, 0x1954,
0x1BDC, 0x1EA5, 0x21B6, 0x2515, 0x28CA, 0x2CDF, 0x315B, 0x364B,
0x3BB9, 0x41B2, 0x4844, 0x4F7E, 0x5771, 0x602F, 0x69CE, 0x7462, 0x7FFF};
int RawZorkStream::readBuffer(int16 *buffer, const int numSamples) {
uint32 bytesRead = 0;
// 0: Left, 1: Right
byte channel = 0;
while (bytesRead < numSamples) {
byte encodedSample = _stream->readByte();
if (_stream->eos()) {
_endOfData = true;
return bytesRead;
}
bytesRead++;
int16 index = _lastSample[channel].index;
uint32 lookUpSample = _amplitudeLookupTable[index];
int32 sample = 0;
if (encodedSample & 0x40)
sample += lookUpSample;
if (encodedSample & 0x20)
sample += lookUpSample >> 1;
if (encodedSample & 0x10)
sample += lookUpSample >> 2;
if (encodedSample & 8)
sample += lookUpSample >> 3;
if (encodedSample & 4)
sample += lookUpSample >> 4;
if (encodedSample & 2)
sample += lookUpSample >> 5;
if (encodedSample & 1)
sample += lookUpSample >> 6;
if (encodedSample & 0x80)
sample = -sample;
sample += _lastSample[channel].sample;
sample = CLIP(sample, -32768, 32767);
buffer[bytesRead - 1] = (int16)sample;
index += _stepAdjustmentTable[(encodedSample >> 4) & 7];
index = CLIP<int16>(index, 0, 88);
_lastSample[channel].sample = sample;
_lastSample[channel].index = index;
// Increment and wrap the channel
channel = (channel + 1) & _stereo;
}
return bytesRead;
}
</pre>
<br />
Each sample is encoded in 8 bits. The low seven bits of the encoded byte select fractions (1/1 down to 1/64) of an amplitude taken from the lookup table, using the table index carried over from the previous 'frame', and the high bit negates the result. That delta is then added to the sample from the last 'frame'. Finally, bits 4-6 are reused to look up a step adjustment for the table index used by the next 'frame'.<br />
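The bit-twiddling can be distilled into a tiny standalone function. This is a sketch of the same arithmetic in isolation, not the engine code; `decodeDelta` and its parameters are illustrative names:

```cpp
#include <cassert>
#include <cstdint>

// Reconstruct the signed delta encoded in one byte, given the current
// amplitude from the lookup table. Bit 0x40 adds amplitude/1, 0x20 adds
// amplitude/2, ... 0x01 adds amplitude/64; bit 0x80 negates the sum.
int32_t decodeDelta(uint8_t encoded, uint32_t amplitude) {
    int32_t delta = 0;
    for (int bit = 0; bit < 7; ++bit) {
        if (encoded & (0x40 >> bit))
            delta += amplitude >> bit;
    }
    if (encoded & 0x80)
        delta = -delta;
    return delta;
}
```

The decoded delta is what gets added to the previous frame's sample before clipping to the int16 range.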
<br />
The biggest problem I ran into for sound was actually a typo on my part. The template argument for CLIP was accidentally set to a uint16 instead of an int16. This caused distortions at the extremely high and low ranges of the sound. But, this usually only occurred at the beginning and end of a sound clip. I spent days trying to figure out if I had set the initial lastSample correctly, along with other random ideas. After pounding my head into the desk for 3 days, the glorious wjp came along and found my typo. After that, the sound worked perfectly. Shout out to wjp!!!!!!!!!<br />
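If you're curious what that typo actually does to the data, here's a minimal illustration. The `clip` below mimics a template CLIP that clamps a value over the named type; that behavior is my assumption for the sketch, not ScummVM's exact implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp a value after converting it to type T.
// The static_cast is where the bug hides: a negative value converted to an
// unsigned type wraps around to a huge positive one before clamping.
template <typename T>
T clip(int32_t v, T lo, T hi) {
    T t = static_cast<T>(v);
    return std::min(std::max(t, lo), hi);
}
```

With the correct int16 argument, an index of -1 clamps to 0; with uint16, -1 wraps to 65535 and clamps to the top of the range instead, which is exactly the kind of distortion I was hearing.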
<br />
There is one other bug with sound, and that's in videos. The sound has a slight 'ticking'. However, clone2727 identified it as potentially a problem with the AVI decoder. In its current state, the AVI decoder puts each sound 'chunk' into its own AudioStream, and then puts all the streams into a queue to be played. We're thinking the lastSample needs to persist from chunk to chunk. However, solving this problem would take either a gross hack or a redesign of the AVI decoder. clone2727 has taken on the task, so I'm going to leave it to him and get back to the video audio later in the project.<br />
<br />
Well, that's it for this post. Sound was pretty straightforward. I was only bogged down due to some really bad typos on my part. As always, feel free to comment or ask questions.<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com3tag:blogger.com,1999:blog-4016790357096156934.post-52880688887703060232013-07-17T16:21:00.002-05:002014-01-22T20:28:15.739-06:00The Engine Skeleton Gains Some Tendons - Part 2Part 2!! As a recap from last post, I started out last week by implementing image handling, video handling, and a text debug console.<br />
<br />
I started with the console, as it allows me to map typed commands to functions. (For example, 'loadimage zassets/castle/cae4d311.tga' calls loadImageToScreen() on that file.) This is extremely useful in that I can load an image multiple times, or load different images, all without having to re-run the engine or recompile.<br />
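Stripped of the ScummVM GUI::Debugger machinery, the command-to-function mapping boils down to something like this sketch (plain C++ with std containers; `MiniConsole` is an invented name, not the engine class):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Minimal command dispatcher: maps a command word to a handler that
// receives the remaining whitespace-separated arguments.
class MiniConsole {
public:
    using Handler = std::function<bool(const std::vector<std::string> &)>;

    void registerCmd(const std::string &name, Handler h) { _cmds[name] = h; }

    // Split the line into command + args and invoke the matching handler.
    bool run(const std::string &line) {
        std::istringstream in(line);
        std::string cmd;
        if (!(in >> cmd))
            return false;
        std::vector<std::string> args;
        for (std::string a; in >> a;)
            args.push_back(a);
        auto it = _cmds.find(cmd);
        return it != _cmds.end() && it->second(args);
    }

private:
    std::map<std::string, Handler> _cmds;
};
```

GUI::Debugger does essentially this for you, plus the on-screen text UI, which is why inheriting from it was so little work.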
<br />
Creating the text console was actually extremely easy because it was already written. I just had to inherit from the base class:<br />
<pre class="brush:cpp">class Console : public GUI::Debugger {
public:
Console(ZEngine *engine);
virtual ~Console() {}
private:
ZEngine *_engine;
bool cmdLoadImage(int argc, const char **argv);
bool cmdLoadVideo(int argc, const char **argv);
bool cmdLoadSound(int argc, const char **argv);
};
</pre>
<br />
In the constructor, I just registered the various commands:
<br />
<pre class="brush:cpp">Console::Console(ZEngine *engine) : GUI::Debugger(), _engine(engine) {
DCmd_Register("loadimage", WRAP_METHOD(Console, cmdLoadImage));
DCmd_Register("loadvideo", WRAP_METHOD(Console, cmdLoadVideo));
DCmd_Register("loadsound", WRAP_METHOD(Console, cmdLoadSound));
}
</pre>
<br />
And then, in ZEngine::initialize() I created an instance of my custom class:
<br />
<pre class="brush:cpp">void ZEngine::initialize() {
.
.
.
_console = new Console(this);
}
</pre>
<br />
And lastly, I registered a key press combination to bring up the debug console
<br />
<pre class="brush:cpp">void ZEngine::processEvents() {
while (_eventMan->pollEvent(_event)) {
switch (_event.type) {
case Common::EVENT_KEYDOWN:
switch (_event.kbd.keycode) {
case Common::KEYCODE_d:
if (_event.kbd.hasFlags(Common::KBD_CTRL)) {
// Start the debugger
_console->attach();
_console->onFrame();
}
break;
}
break;
}
}
}
</pre>
<br />
With that done, I can press ctrl+d, and this is what pops up:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/TnB5bsI.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/TnB5bsI.png" /></a></div>
<br />
Awesome! With that done, I could move on to images. All the images in ZNem and ZGI are .tga files, but don't be fooled; the vast majority of them aren't actually TGA. They're actually TGZ, a custom image format. The format itself isn't too difficult, and I give all the credit to <a href="http://forum.xentax.com/viewtopic.php?f=18&t=3511&sid=af21b2ecfc2990f4cdec70a1585df31a" target="_blank">Mr. Mouse on Xentax</a>.<br />
<pre class="brush:cpp">Byte[4] "TGZ\0"
uint32 Original size of bitmap data
uint32 Width of image
uint32 Height of image
Byte[n] Bitmap data (LZSS compressed)
</pre>
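Assuming the fields are little-endian (which matches the LE reads the engine uses), a standalone header parser might look like this sketch; `TgzHeader` and `parseTgzHeader` are my own names, not engine code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct TgzHeader {
    uint32_t decompressedSize;
    uint32_t width;
    uint32_t height;
};

// Read a 32-bit little-endian value from raw bytes
static uint32_t readU32LE(const uint8_t *p) {
    return p[0] | (p[1] << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Returns true and fills the header if the buffer starts with "TGZ\0".
bool parseTgzHeader(const uint8_t *data, size_t size, TgzHeader *out) {
    if (size < 16 || memcmp(data, "TGZ\0", 4) != 0)
        return false;
    out->decompressedSize = readU32LE(data + 4);
    out->width = readU32LE(data + 8);
    out->height = readU32LE(data + 12);
    return true;
}
```

After these 16 bytes, everything that remains is the LZSS-compressed bitmap data.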
<br />
I could have created a class for decoding TGZ, but with it being that simple, I just chose to integrate the decoding in the renderImageToScreen method:<br />
<pre class="brush:cpp">void ZEngine::renderImageToScreen(const Common::String &fileName, uint32 x, uint32 y) {
Common::File file;
if (!file.open(fileName)) {
error("Could not open file %s", fileName.c_str());
return;
}
// Read the magic number
// Some files are true TGA, while others are TGZ
char fileType[4];
file.read(fileType, 4);
// Check for TGZ files
if (fileType[0] == 'T' && fileType[1] == 'G' && fileType[2] == 'Z' && fileType[3] == '\0') {
// TGZ files have a header and then Bitmap data that is compressed with LZSS
uint32 decompressedSize = file.readUint32LE();
uint32 width = file.readUint32LE();
uint32 height = file.readUint32LE();
LzssReadStream stream(&file);
byte *buffer = new byte[decompressedSize];
stream.read(buffer, decompressedSize);
_system->copyRectToScreen(buffer, width * 2, x, y, width, height);
delete[] buffer;
} else {
// Reset the cursor
file.seek(0);
// Decode
Graphics::TGADecoder tga;
if (!tga.loadStream(file)) {
error("Error while reading TGA image");
return;
}
const Graphics::Surface *tgaSurface = tga.getSurface();
_system->copyRectToScreen(tgaSurface->pixels, tgaSurface->pitch, x, y, tgaSurface->w, tgaSurface->h);
tga.destroy();
}
_needsScreenUpdate = true;
}
</pre>
<br />
So after using the loadimage command in the console, we get a wonderful picture on the screen:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/p52bibb.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/p52bibb.png" /></a></div>
<br />
Video!! Implementing the image aspect of video was rather trivial, as ZEngine uses a standard AVI format. The only 'wrinkle' was that the videos use a different PixelFormat. Every other part of the engine uses RGB 555, but videos use RGB 565. However, when a video is playing, it's the only thing going on. So, I can reinitialize the graphics to RGB 565 before playing a video, and reset it back to RGB 555 when the video finishes:<br />
<pre class="brush:cpp">void ZEngine::startVideo(Video::VideoDecoder *videoDecoder) {
if (!videoDecoder)
return;
_currentVideo = videoDecoder;
Common::List<Graphics::PixelFormat> formats;
formats.push_back(videoDecoder->getPixelFormat());
initGraphics(_width, _height, true, formats);
.
.
.
}
</pre>
<pre class="brush:cpp">void ZEngine::continueVideo() {
.
.
.
if (!_currentVideo->endOfVideo()) {
// Code to render the current frame
} else {
initGraphics(_width, _height, true, &_pixelFormat);
delete _currentVideo;
_currentVideo = 0;
delete _scaledVideoFrameBuffer;
_scaledVideoFrameBuffer = 0;
}
}
</pre>
Where _pixelFormat is a const PixelFormat member variable of the ZEngine class.<br />
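For reference, here is how the two formats pack their bits, as a hedged sketch of a 555-to-565 conversion. The engine never needs this (initGraphics handles the switch), but it shows why the formats are incompatible: green gains a sixth bit.

```cpp
#include <cassert>
#include <cstdint>

// RGB555: 0RRRRRGG GGGBBBBB (5 bits per channel)
// RGB565: RRRRRGGG GGGBBBBB (green gets 6 bits)
// Widening green from 5 to 6 bits replicates its top bit into the new
// low bit, so full-intensity green stays full intensity.
uint16_t rgb555To565(uint16_t c) {
    uint16_t r = (c >> 10) & 0x1F;
    uint16_t g = (c >> 5) & 0x1F;
    uint16_t b = c & 0x1F;
    uint16_t g6 = (g << 1) | (g >> 4); // 5 -> 6 bits
    return (r << 11) | (g6 << 5) | b;
}
```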
<br />
One other slight wrinkle is that the video is at a resolution of 256 x 160, which is quite small if I do say so myself. To fix that, I used a linear 2x scaler that <a href="https://gist.github.com/RichieSams/6024532" target="_blank">[md5] wrote</a> and scaled every frame. Using the opening cinematic as an example, we get this:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i.imgur.com/ohytU23.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i.imgur.com/ohytU23.png" /></a></div>
<br />
However, the sound in video is messed up, and it's actually been what I've been working on this week, but I'll save that for another post.<br />
<br />
I'm now two steps closer to getting all the parts of the engine implemented and somewhat tied together. As always, if you have any suggestions or comments, feel free to comment below.<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com0tag:blogger.com,1999:blog-4016790357096156934.post-64005105053713568972013-07-11T01:27:00.005-05:002014-01-22T20:28:15.744-06:00The Engine Skeleton Gains Some Tendons - Part 1Being a little tired of the script system, I started last week by adding image handling, video handling and a text debug console to the engine. With that done, I tried piecing together how the script system worked as a whole. After a long talk with Fuzzie, we figured out how the majority of the system worked and I've spent the beginning of this week putting it into code.<br />
<br />
I'll start with the script system since it's fresh in my mind. Rather than try to explain what I learned, I'll just explain my current understanding of the system and its behavior.<br />
<br />
The system is governed by six main containers:<br />
<pre class="brush:cpp">Common::HashMap<uint32, byte> _globalState;
Common::List<ActionNode *> _activeNodes;
Common::HashMap<uint32, Common::Array<Puzzle *>> _referenceTable;
Common::Stack<Puzzle *> _puzzlesToCheck;
Common::List<Puzzle> _activePuzzles;
Common::List<Control> _activeControls;
</pre>
<br />
<span style="color: #c27ba0;">_globalState</span> holds the state of the entire game. Each key is a hash that can represent anything from a timer to whether a certain puzzle has been solved. The value depends on what the key is; however, the vast majority are boolean states (0 or 1).<br />
<br />
<span style="color: #c27ba0;">_activeNodes</span> holds... wait for it... the active ActionNodes. Imagine that! Nodes are anything that needs to be processed over time. For example, a timer, an animation, etc. I'll explain further later in the post.<br />
<br />
<span style="color: #c27ba0;">_referenceTable</span> stores, for each globalState key, references to the Puzzles that depend on it. This can be thought of as the reverse of the <a href="https://gist.github.com/RichieSams/5959662" target="_blank">Puzzle struct</a>: a Puzzle stores a list of globalState keys to be checked, while <span style="color: #c27ba0;">_referenceTable</span> stores which Puzzles reference a certain globalState key. Why would we want to do this? It means that the Puzzles loaded into <span style="color: #c27ba0;">_referenceTable</span> only have to be checked when one of their keys changes, instead of every frame. When a value in _globalState is changed, the referenced Puzzles are added to <span style="color: #c27ba0;">_puzzlesToCheck</span>.<br />
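The reverse-lookup idea, stripped to its essence in plain C++ (std containers instead of Common:: ones, and integer puzzle ids standing in for Puzzle pointers; this is a sketch, not the engine code):

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

// Each state key maps to the puzzles whose criteria mention it. Changing a
// key marks only those puzzles for re-checking, instead of re-checking
// every puzzle every frame.
struct ScriptState {
    std::unordered_map<uint32_t, uint8_t> globalState;
    std::unordered_map<uint32_t, std::vector<int>> referenceTable;
    std::set<int> puzzlesToCheck; // a set keeps duplicates out

    void setStateValue(uint32_t key, uint8_t value) {
        globalState[key] = value;
        auto it = referenceTable.find(key);
        if (it != referenceTable.end())
            puzzlesToCheck.insert(it->second.begin(), it->second.end());
    }
};
```

Each frame the check loop drains `puzzlesToCheck`, so the cost scales with how much state actually changed.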
<br />
<span style="color: #c27ba0;">_puzzlesToCheck</span> holds the Puzzles whose Criteria we want to check against <span style="color: #c27ba0;">_globalState</span>. This stack is exhausted every frame. It is filled either by <span style="color: #c27ba0;">_referenceTable</span> or when we enter a new room.<br />
<br />
<span style="color: #c27ba0;">_activePuzzles</span> is where the room's Puzzles are stored. The Puzzle pointers in <span style="color: #c27ba0;">_referenceTable</span> and <span style="color: #c27ba0;">_puzzlesToCheck</span> point to here.<br />
<br />
I realize that the descriptions are still a bit vague, so I figured I would go through an example of sorts and how the containers behave.<br />
<br />
Every time we change rooms:<br />
<ol>
<li>Clear _referenceTable, _puzzlesToCheck, and _activePuzzles</li>
<li>Open and parse the corresponding .scr file into Puzzle structs and store them in _activePuzzles. (See last three blog posts)</li>
<li>Iterate through all the Puzzles and their Criteria and create references from a globalState key to the Puzzle. (See createReferenceTable below)</li>
<li>Add all Puzzles to _puzzlesToCheck</li>
</ol>
<pre class="brush:cpp">void ScriptManager::createReferenceTable() {
// Iterate through each Puzzle
for (Common::List<Puzzle>::iterator activePuzzleIter = _activePuzzles.begin(); activePuzzleIter != _activePuzzles.end(); activePuzzleIter++) {
Puzzle *puzzlePtr = &(*activePuzzleIter);
// Iterate through each Criteria and add a reference from the criteria key to the Puzzle
for (Common::List<Criteria>::iterator criteriaIter = activePuzzleIter->criteriaList.begin(); criteriaIter != (*activePuzzleIter).criteriaList.end(); criteriaIter++) {
_referenceTable[criteriaIter->key].push_back(puzzlePtr);
// If the argument is a key, add a reference to it as well
if (criteriaIter->argument)
_referenceTable[criteriaIter->argument].push_back(puzzlePtr);
}
}
// Remove duplicate entries
for (Common::HashMap<uint32, Common::Array<Puzzle *>>::iterator referenceTableIter = _referenceTable.begin(); referenceTableIter != _referenceTable.end(); referenceTableIter++) {
removeDuplicateEntries(&(referenceTableIter->_value));
}
}
</pre>
<br />
Every frame:<br />
<ol>
<li>Iterate through each ActionNode in _activeNodes and call process() on them</li>
<li>If process() returns true, remove and delete the ActionNode</li>
</ol>
<pre class="brush:cpp">void ScriptManager::updateNodes(uint32 deltaTimeMillis) {
// If process() returns true, it means the node can be deleted
for (Common::List<ActionNode *>::iterator iter = _activeNodes.begin(); iter != _activeNodes.end();) {
if ((*iter)->process(_engine, deltaTimeMillis)) {
// Remove the node from _activeNodes, then delete it
ActionNode *node = *iter;
iter = _activeNodes.erase(iter);
delete node;
} else {
iter++;
}
}
}
</pre>
<pre class="brush:cpp">bool NodeTimer::process(ZEngine *engine, uint32 deltaTimeInMillis) {
_timeLeft -= deltaTimeInMillis;
if (_timeLeft <= 0) {
engine->getScriptManager()->setStateValue(_key, 0);
return true;
}
return false;
}
</pre>
<br />
<ol>
<li>While _puzzlesToCheck is not empty, pop a Puzzle off the stack and check its Criteria against <span style="color: #c27ba0;">_globalState</span></li>
<li>If any of the Criteria pass, call execute() on the corresponding ResultAction.</li>
<ul>
<li>Some ResultAction's might create ActionNode's and add them to<span style="color: #c27ba0;"> _activeNodes</span>. IE ActionTimer</li>
</ul>
</ol>
<br />
<pre class="brush:cpp">void ScriptManager::checkPuzzleCriteria() {
while (!_puzzlesToCheck.empty()) {
Puzzle *puzzle = _puzzlesToCheck.pop();
// Check each Criteria
for (Common::List<Criteria>::iterator iter = puzzle->criteriaList.begin(); iter != puzzle->criteriaList.end(); iter++) {
bool criteriaMet = false;
// Get the value to compare against
byte argumentValue;
if ((*iter).argument)
argumentValue = getStateValue(iter->argument);
else
argumentValue = iter->argument;
// Do the comparison
switch ((*iter).criteriaOperator) {
case EQUAL_TO:
criteriaMet = getStateValue(iter->key) == argumentValue;
break;
case NOT_EQUAL_TO:
criteriaMet = getStateValue(iter->key) != argumentValue;
break;
case GREATER_THAN:
criteriaMet = getStateValue(iter->key) > argumentValue;
break;
case LESS_THAN:
criteriaMet = getStateValue(iter->key) < argumentValue;
break;
}
// TODO: Add logic for the different Flags (aka, ONCE_PER_INST)
if (criteriaMet) {
for (Common::List<ResultAction *>::iterator resultIter = puzzle->resultActions.begin(); resultIter != puzzle->resultActions.end(); resultIter++) {
(*resultIter)->execute(_engine);
}
}
}
}
}
</pre>
<pre class="brush:cpp">bool ActionTimer::execute(ZEngine *zEngine) {
zEngine->getScriptManager()->addActionNode(new NodeTimer(_key, _time));
return true;
}
</pre>
<br />
So that's the script system. I've tried to explain it in the best way possible, but if you guys have any questions or suggestions for my implementation, as always, feel free to comment.<br />
<br />
Details on the image handling, video handling and the text debug console will be in Part 2, which should be up some time tomorrow. As always, thanks for reading. :)<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com0tag:blogger.com,1999:blog-4016790357096156934.post-64443597069813736702013-07-01T17:03:00.003-05:002014-01-22T20:28:15.746-06:00Improving the 'Object' class (including renaming it) and using classes for ResultActionsLast week, I posted about using an 'Object' class to encapsulate the variable-typed arguments for ResultActions. You guys posted some awesome feedback and I used it to improve the class. First, I renamed the class to 'SingleValueContainer' so users have a better sense of what it is. Second, following <a href="http://richiesams.blogspot.com/2013/06/implementing-generic-single-value.html?showComment=1372315137107#c2490144333599238582" target="_blank">Fuzzie's advice</a>, I put all the values except for String directly in the union. It's the same or less memory cost and results in fewer heap allocations.<br />
<pre class="brush:cpp">union {
bool boolVal;
byte byteVal;
int16 int16Val;
uint16 uint16Val;
int32 int32Val;
uint32 uint32Val;
float floatVal;
double doubleVal;
char *stringVal;
} _value;
</pre>
<br />
You'll notice that the stringVal isn't actually a Common::String object, but rather a pointer to a char array. This saves a bit of memory at the cost of a couple of strlen() and memcpy() calls and a String object assignment.<br />
<pre class="brush:cpp">SingleValueContainer::SingleValueContainer(Common::String value) : _objectType(BYTE) {
_value.stringVal = new char[value.size() + 1];
memcpy(_value.stringVal, value.c_str(), value.size() + 1);
}
</pre>
<pre class="brush:cpp">SingleValueContainer &SingleValueContainer::operator=(const Common::String &rhs) {
if (_objectType != STRING) {
_objectType = STRING;
_value.stringVal = new char[rhs.size() + 1];
memcpy(_value.stringVal, rhs.c_str(), rhs.size() + 1);
return *this;
}
uint32 length = strlen(_value.stringVal);
if (rhs.size() <= length) {
    // The new string fits in the existing buffer
    memcpy(_value.stringVal, rhs.c_str(), rhs.size() + 1);
} else {
    delete[] _value.stringVal;
    _value.stringVal = new char[rhs.size() + 1];
    memcpy(_value.stringVal, rhs.c_str(), rhs.size() + 1);
}
return *this;
}
</pre>
<pre class="brush:cpp">bool SingleValueContainer::getStringValue(Common::String *returnValue) const {
if (_objectType != STRING)
warning("'Object' is not storing a Common::String.");
*returnValue = _value.stringVal;
return true;
}
</pre>
<br />
With those changes the class seems quite solid. (The full source can be found <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/singleValueContainer.h" target="_blank">here</a> and <a href="https://github.com/RichieSams/scummvm/blob/zengine/engines/zengine/singleValueContainer.cpp" target="_blank">here</a>). However, after seeing <a href="http://richiesams.blogspot.com/2013/06/implementing-generic-single-value.html?showComment=1372307055323#c4981240489482910586" target="_blank">Zidane Sama's comment</a>, I realized that there was a better way to tackle the problem than variant objects. Instead of trying to generalize the action types and arguments and storing them in structs, a better approach is to create a class for each action type with a common execute() method that is called by the scriptManager when the Criteria are met for a ResultAction.<br />
<br />
I first created an interface base class that all the different types would inherit from:<br />
<pre class="brush:cpp">class ResultAction {
public:
virtual ~ResultAction() {}
virtual bool execute(ZEngine *zEngine) = 0;
};
</pre>
<br />
Next, I created the individual classes for each type of ResultAction:<br />
<pre class="brush:cpp">class ActionAdd : public ResultAction {
public:
ActionAdd(Common::String line);
bool execute(ZEngine *zEngine);
private:
uint32 _key;
byte _value;
};
</pre>
<br />
The individual classes parse out any arguments in their constructor and store them in member variables. In execute(), they execute the logic pertaining to their action. A pointer to ZEngine is passed in order to give the method access to all the necessary tools (modifying graphics, scriptManager states, sounds, etc.)<br />
<pre class="brush:cpp">ActionAdd::ActionAdd(Common::String line) {
sscanf(line.c_str(), ":add(%u,%hhu)", &_key, &_value);
}
bool ActionAdd::execute(ZEngine *zEngine) {
zEngine->getScriptManager()->addToStateValue(_key, _value);
return true;
}
</pre>
<br />
Thus, in the script file parser I can just look for the action type and then create the corresponding action object, passing its constructor the whole line:<br />
<pre class="brush:cpp">while (!line.contains('}')) {
// Parse for the action type
if (line.matchString("*:add*", true)) {
actionList.push_back(new ActionAdd(line));
} else if (line.matchString("*:animplay*", true)) {
actionList.push_back(new ActionAnimPlay(line));
} else if (.....)
.
.
.
}
</pre>
<br />
While this means I have to create 20+ classes for all the different types of actions, I think this method nicely encapsulates and abstracts both the parsing and the action of the result.<br />
<br />
I'm a bit sad that I'm not going to be using the 'SingleValueContainer' class, but if nothing else, I learned quite a bit while creating it. Plus, I won't be getting rid of it, so it might have a use somewhere else.<br />
<br />
This coming week I need to finish creating all the classes and then try to finish the rest of the engine skeleton. As always, feel free to comment / ask questions.<br />
<br />
-<span style="color: #dd7700;">RichieSams</span>RichieSamshttp://www.blogger.com/profile/11068267631031438940noreply@blogger.com2tag:blogger.com,1999:blog-4016790357096156934.post-8508386582583520702013-06-26T22:19:00.005-05:002014-01-22T20:28:15.751-06:00Implementing a generic single value container in c++In my previous post I explained the format of the script system for ZEngine. Each Puzzle has a Results section which essentially stores function names and their arguments:<br />
<pre class="brush:cpp">results {
action:assign(5985, 0)
background:timer:7336(60)
event:change_location(C,B,C0,1073)
background:music:5252(1 a000h1tc.raw 1)
}
</pre>
I wanted to be able to store each action inside a struct, and then have a linked list of all the structs. However, the problem is that both the number of arguments and the size of the arguments are variable. Marisa Chan's solution was to store all the arguments in a space delimited char array. IE:<br />
<pre class="brush:cpp">char arguments[25] = "1 a00h1tc.raw 1";
</pre>
<br />
Simple, but not without its problems.<br />
<ol>
<li>Since the char array is in a struct, the size is fixed. In order to make sure we never overflow, we have to allocate a fairly large array. That said, in this particular case, each 'large' array in this case would only be ~30 bytes per struct.</li>
<li>By storing everything as strings, we put off parsing till the action function is actually called. At first glance, this doesn't seem too bad, since the data will have to be parsed anyway. However, this method forces it to be parsed at <i>every</i> call to that action function.</li>
</ol>
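The parse-once alternative looks like this sketch: do the sscanf a single time in the constructor and keep typed members around. `ActionAssign` and the exact `assign(key, value)` string form here are illustrative stand-ins, not the real script syntax:

```cpp
#include <cassert>
#include <cstdio>

// Parse-once: the constructor does the sscanf; execute-time code just
// reads the already-typed members instead of re-parsing a string.
class ActionAssign {
public:
    explicit ActionAssign(const char *line) : _key(0), _value(0) {
        // %u skips leading whitespace, so "assign(5985, 0)" parses cleanly
        sscanf(line, "assign(%u, %u)", &_key, &_value);
    }
    unsigned int key() const { return _key; }
    unsigned int value() const { return _value; }

private:
    unsigned int _key;
    unsigned int _value;
};
```

Every later invocation is then just two member reads, versus a fresh tokenize-and-convert pass per call in the string-array scheme.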
<br />
Another option was to have everything stored in a linked list of void pointers. However, I don't think I need to convince anyone that void pointers are just gross and using them would be just asking for problems.<br />
<br />
What I <i>really</i> wanted was a typed way to store a variably typed (and sized) value. Therefore I created what I'm calling the "Object" class. (I'm up for suggestions for a better name)
<br />
<br />
The heart of the class is a union that stores a variety of pointers to different types and an enum that defines what type is being stored:
<br />
<pre class="brush:cpp">class Object {
public:
enum ObjectType : byte {
BOOL,
BYTE,
INT16,
UINT16,
INT32,
UINT32,
FLOAT,
DOUBLE,
STRING,
};
private:
ObjectType _objectType;
union {
bool *boolVal;
byte *byteVal;
int16 *int16Val;
uint16 *uint16Val;
int32 *int32Val;
uint32 *uint32Val;
float *floatVal;
double *doubleVal;
Common::String *stringVal;
} _value;
}
</pre>
<span style="color: #c27ba0;">_objectType</span> keeps track of what type of data the object is storing and <span style="color: #c27ba0;">_value</span> points to the actual data. If <span style="color: #c27ba0;">_value</span> were instead to hold the actual data value, the union's size would be forced to sizeof(Common::String), which is quite large (~34 bytes), due to internal caching. Then we're back to the argument of storing things in containers much larger than they need. By putting the data on the heap and only storing pointers to the data, we save the wasted space, but at the CPU cost of heap allocation.<br />
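The size claims are easy to check directly. This sketch uses std::string as a stand-in for Common::String (their exact sizes differ, but both dwarf a pointer):

```cpp
#include <cassert>
#include <string>

// Union of pointers: its size is just one pointer, regardless of what the
// pointers point at.
union PointerUnion {
    bool *boolVal;
    double *doubleVal;
    std::string *stringVal;
};

// Union of values: forced up to the size of the largest member. A class
// with a non-trivial constructor can't sit directly in a pre-C++11 union
// (which is part of the point), so raw storage of std::string's size
// stands in for it here.
union ValueUnion {
    bool boolVal;
    double doubleVal;
    char stringStorage[sizeof(std::string)];
};
```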
<br />
Now that the data is stored, how do we get it back? My original idea was to have implicit cast operators:<br />
<pre class="brush:cpp">operator bool();
operator byte();
operator int16();
.
.
.
</pre>
However, LordHoto, one of the GSoC mentors and ScummVM developers, brought my attention to the problems that can arise when using implicit casting. For example, a user could try to cast the data to a type that wasn't stored in the Object and the cast would work, but the data would be completely corrupted. Also, from a user point of view, it wasn't intuitive.<br />
<br />
Therefore, I removed the cast operators and created accessor methods:<br />
<pre class="brush:cpp">bool getBoolValue(bool *returnValue) const;
bool getByteValue(byte *returnValue) const;
bool getInt16Value(int16 *returnValue) const;
.
.
.
</pre>
<br />
<pre class="brush:cpp">bool Object::getBoolValue(bool *returnValue) const {
if (_objectType != BOOL) {
warning("'Object' not of type bool.");
return false;
}
*returnValue = *_value.boolVal;
return true;
}
</pre>
This adds a layer of type semi-protection to the class.<br />
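As an aside: modern C++17 offers std::variant, which bakes exactly this kind of checked access into the standard library. It wasn't an option for ScummVM's compiler baseline at the time, so this is purely a comparison sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// std::variant is the same "typed box" idea with checked access built in:
// std::get_if returns nullptr when the stored type doesn't match.
using Value = std::variant<bool, int32_t, double, std::string>;

bool getInt32(const Value &v, int32_t *out) {
    if (const int32_t *p = std::get_if<int32_t>(&v)) {
        *out = *p;
        return true;
    }
    return false;
}
```

The accessor shape (return a bool, write through an out-pointer) matches the getBoolValue/getInt16Value pattern above, so migrating later would be mechanical.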
<br />
Lastly, I added assignment operators to the class, but rather than making this post even longer, I'll just link the full source <a href="https://gist.github.com/RichieSams/5873413" target="_blank">here</a> and <a href="https://gist.github.com/RichieSams/5873397" target="_blank">here</a>.<br />
<br />
<br />
Advantages of 'Object' class<br />
<ul>
<li>Can store relatively 'any' type of data. (Any type not currently supported could be trivially added)</li>
<li>Only uses as much space as needed.</li>
<li>Transforms dynamically typed data into a statically typed 'box' that can be stored in arrays, linked lists, hashmaps, etc. and can be iterated upon</li>
</ul>
Disadvantages of 'Object' class<br />
<ul>
<li>Adds a small memory overhead per object. ( 1 byte + sizeof(Operating System pointer) )</li>
<li>Adds one heap memory allocation per object</li>
</ul>
<br />
<br />
So is it better than Marisa Chan's implementation? It really depends on what you define as better. While it does save memory, only requires data to be parsed once, and, in my opinion, adds a great deal of elegance to handling the Results arguments, it does so at the cost of heap storage. Not only the cost of the initial allocation, but the cost of potential defragmentation runs. But then again, is the cost of heap storage really that big, especially since the data should have a relatively long life? (On average, the time an end user spends in a room in the game) That I don't know, since it all depends on the memory allocator implementation.<br />
<br />
In the end, I believe both methods perform well, and as such I choose the eloquence of using the 'Object' class. I am very much open to your thoughts on both the class as a whole or on your take of the problem. Also, if I misspoke about something please, please, please let me know.<br />
<br />
Thanks for reading and have fun coding,<br />
-<span style="color: #dd7700;">RichieSams</span>
<br />
<br />
<br />
<i><b>Edit: </b></i>Upon further inspection I noticed that by using Common::String I'm not only negating any memory size benefits from using 'Object', but potentially even using more memory, since Common::String has such a huge size.<br />
<pre class="brush:plain">Marisa Chan:
char arguments[25] = "1 a00h1tc.raw 1";
size = 25;
Object:
Object arg1 = 1;
Object arg2 = "a00h1tc.raw";
Object arg3 = 1;
size = (3 *sizeof(Object)) + sizeof(byte) + sizeof(Common::String) + sizeof(byte);
size = 15 + 1 + 34 + 1;
size = 51;
</pre>
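As a sanity check on the arithmetic above (the sizes are the post's assumptions: a 32-bit pointer, so sizeof(Object) is 1 + 4 = 5, and a 34-byte Common::String; both are platform-dependent):<br />

```cpp
// Back-of-the-envelope check of the 51-byte figure above, using the post's
// assumptions (32-bit pointer, 34-byte Common::String); not universal truths.
int objectArgsSize() {
	const int objectSize = 1 + 4; // 1-byte type tag + 4-byte pointer
	return 3 * objectSize         // three Object instances      = 15
	     + 1                      // heap payload for arg1 (byte) =  1
	     + 34                     // heap Common::String for arg2 = 34
	     + 1;                     // heap payload for arg3 (byte) =  1
}
```
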
I could instead store the data in a char array, but it would mean that the Object class would need a stringLength member, which adds another 1-4 bytes to every instance of the class.<br />
Even with this new insight, I think I will continue to use 'Object', again for the elegance it adds. The memory difference is still rather small.<br />
<br />
<br />
<b>Scripting!!!!!!</b> (2013-06-26)<br />
I just realized that I forgot to do a post last week! I was being so productive, time just flew by.<br />
<br />
Last week and the beginning of this week I've been working on the script management system for ZEngine. Well, before I get into that, let me go back a little further. According to my <a href="https://www.google-melange.com/gsoc/project/google/gsoc2013/richiesams/23001" target="_blank">original timeline</a>, the next milestone was creating a skeleton engine that could do basic rendering, sounds, and events. So, last Monday, I started by cleaning up the main game loop and splitting everything into separate methods and classes. With that, the run loop looks like this:<br />
<pre class="brush:cpp;">Common::Error ZEngine::run() {
	initialize();

	// Main loop
	uint32 currentTime = _system->getMillis();
	uint32 lastTime = currentTime;
	const uint32 desiredFrameTime = 33; // ~30 fps

	while (!shouldQuit()) {
		processEvents();

		currentTime = _system->getMillis();
		uint32 deltaTime = currentTime - lastTime;
		lastTime = currentTime;

		updateScripts();
		updateAnimations(deltaTime);

		if (_needsScreenUpdate) {
			_system->updateScreen();
		}

		// Calculate the frame delay based off a desired frame rate
		int delay = desiredFrameTime - (_system->getMillis() - currentTime);
		// Ensure non-negative
		delay = delay < 0 ? 0 : delay;
		_system->delayMillis(delay);
	}

	return Common::kNoError;
}
</pre>
Not bad, if I do say so myself. :)<br />
<br />
That done, I started implementing the various method shells, such as processEvents(). It was about that time that I realized that the structure of the scripting system had a huge impact on the structure of the engine as a whole. For example, should the event system call methods directly, or should it just register key presses, etc. and let the script system handle the calls? I had a basic understanding of how it <i>probably</i> worked, knowing the history of adventure games, but it was clear I needed to understand the script system before I could go any further.<br />
<br />
The .scr files themselves are rather simple; they're text-based if-then statements. Here's an example of a puzzle and a control:<br />
<pre class="brush:plain">puzzle:5251 {
	criteria {
		[4188] = 1
		[4209] ! 5
		[7347] = 1
		[67] = 0
	}
	criteria {
		[4209] > 1
		[7347] = 1
		[67] = 1
		[4188] = [6584]
	}
	results {
		action:assign(5985, 0)
		background:timer:7336(60)
		event:change_location(C,B,C0,1073)
		background:music:5252(1 a000h1tc.raw 1)
	}
	flags {
		ONCE_PER_INST
	}
}

control:8454 push_toggle {
	flat_hotspot(0,265,511,54)
	cursor(backward)
}
</pre>
Puzzles:
<br />
<ul>
<li>Criteria are a set of comparisons. If ANY of the criteria are satisfied, the results are called.</li>
<ul>
<li>The number in square brackets is the key in a 'global' variable hashmap. (The hashmap isn't <i>actually</i> global in my implementation but rather a member variable in the ScriptManager class) </li>
<li>Next is a simplified form of the standard comparison operators ( ==, !=, <, > ). </li>
<li>The last number can either be a constant or a key to another global variable. </li>
</ul>
<li>Results are what happens when one of the criteria is met. The first part defines a function, and the remaining parts are the arguments.</li>
<li>I haven't fully figured out flags, but from what I can see it's a bitwise OR of <i>when</i> results can be called. For example, only once per room.</li>
</ul>
For those of you that understand code better than words:<br />
<pre class="brush:cpp">if (criteriaOne || criteriaTwo) {
	assign(5985, 0);
	timer(7336, 60);
	change_location('C', 'B', "C0", 1073);
	music(5252, 1, "a000h1tc.raw", 1);
}
</pre>
<br />
Controls:
<br />
<ul>
<li>I haven't done much work on controls yet, but from what I have done, they look to be similar to results and are just called whenever interacted with. For example, a lever being toggled.</li>
</ul>
<br />
The majority of the week was spent working on the best way to store this information so all the conditions could be readily tested and actions fired. The best way I've come up with so far, is to have a Criteria struct and a Results struct as follows:<br />
<pre class="brush:cpp">/** Criteria for a Puzzle result to be fired */
struct Criteria {
	/** The id of a global state */
	uint32 id;

	/**
	 * What we're comparing the value of the global state against
	 * This can either be a pure value or it can be the id of another global state
	 */
	uint32 argument;

	/** How to do the comparison */
	CriteriaOperator criteriaOperator;

	/** Is 'argument' the id of a global state or a pure value */
	bool argumentIsAnId;
};
</pre>
<br />
<pre class="brush:cpp">/** What happens when Puzzle criteria are met */
struct Result {
	ResultAction action;
	Common::List<Object> arguments;
};
</pre>
<br />
CriteriaOperator is an enum of the operators and ResultAction is an enum of all the possible actions. The other variables are pretty self explanatory.<br />
<br />
Using the Criteria and Result structs, the Puzzle struct is:<br />
<pre class="brush:cpp">struct Puzzle {
	uint32 id;
	Common::List<Criteria> criteriaList;
	Common::List<Result> resultList;
	byte flags;
};
</pre>
<br />
Thus, the process is: read a script file, parse the puzzles into structs and load the structs into a linked list representing all the currently active puzzles. Elegant and exceedingly fast to iterate for criteria comparison checking. Now, some of you may have noticed the 'Object' class and are probably thinking to yourselves, "I thought this was c++, not c# or <insert terrible coffee-named language here>." It is, but that is a whole post to itself, which I will be writing after this one.<br />
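The criteria comparison check itself can be sketched as follows. This is a rough, self-contained illustration, not the actual engine code: std:: containers stand in for the ScummVM Common:: ones, and all names here are mine:<br />

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Sketch of the criteria check described above (names illustrative).
enum CriteriaOperator { EQUAL_TO, NOT_EQUAL_TO, GREATER_THAN, LESS_THAN };

struct Criteria {
	unsigned id;          // key into the 'global' state map
	unsigned argument;    // a constant, or another global-state key
	CriteriaOperator op;
	bool argumentIsAnId;
};

// Returns true only if EVERY comparison in one criteria block holds;
// a puzzle's results fire if ANY of its blocks returns true.
bool criteriaBlockSatisfied(const std::vector<Criteria> &block,
                            std::map<unsigned, unsigned> &globalState) {
	for (size_t i = 0; i < block.size(); ++i) {
		unsigned lhs = globalState[block[i].id];
		unsigned rhs = block[i].argumentIsAnId ? globalState[block[i].argument]
		                                       : block[i].argument;
		bool holds = false;
		switch (block[i].op) {
		case EQUAL_TO:     holds = (lhs == rhs); break;
		case NOT_EQUAL_TO: holds = (lhs != rhs); break;
		case GREATER_THAN: holds = (lhs > rhs);  break;
		case LESS_THAN:    holds = (lhs < rhs);  break;
		}
		if (!holds)
			return false;
	}
	return true;
}
```

The update loop would then iterate the active puzzle list, call something like this per criteria block, and fire the results on the first block that passes.<br />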
<br />
So, a couple hundred words in, what have I said? Well, over this past week I discovered how the script system determines what events to fire. This has helped me not only to design the script system code, but also has given me insight into how to design the other systems in the engine. For example, I now know that mouse and keyboard events will just translate to setting global state variables.<br />
<br />
What I have left to do in the ScriptManager:<br />
<ul>
<li>Figure out what CriteriaFlags are used for</li>
<li>Create shell methods for all the Result 'actions'</li>
<li>Write the parser and storage for controls and figure out how they are called</li>
</ul>
<div>
<br /></div>
Well that's about it for this post, so until next time,<br />
-<span style="color: #dd7700;">RichieSams</span><br />
<br />
<br />
<b>ZFS File format</b> (2013-06-12)<br />
Over the years I've reverse engineered quite a few file formats, but I've never really sat down and picked apart why a format was designed the way it was. With that said, I wanted to show the ZFS archive file format and highlight some of the peculiarities I saw; perhaps you guys can answer some of my questions.<br />
<div>
<br />
For some context, Z-engine was created around 1995 and was used on Macintosh, MS-DOS, and Windows 95.<br />
<br /></div>
<div>
<b>Format</b></div>
<div>
The main file header is defined as:</div>
<pre class="brush:cpp">
struct ZfsHeader {
	uint32 magic;
	uint32 unknown1;
	uint32 maxNameLength;
	uint32 filesPerBlock;
	uint32 fileCount;
	byte xorKey[4];
	uint32 fileSectionOffset;
};
</pre>
<div>
<ul>
<li><span style="color: #a64d79;">magic</span> and <span style="color: #a64d79;">unknown1</span> are self-explanatory</li>
<li><span style="color: #a64d79;">maxNameLength</span> refers to the length of the block that stores a file's name. Any extra spaces are null.</li>
<li>The archive is split into 'pages' or 'blocks'. Each 'page' contains, at max, <span style="color: #a64d79;">filesPerBlock </span>files</li>
<li><span style="color: #a64d79;">fileCount </span>is the total number of files the archive contains</li>
<li><span style="color: #a64d79;">xorKey</span> is the XOR cipher used for encryption of the files</li>
<li><span style="color: #a64d79;">fileSectionOffset</span> is the offset of the main data section, aka fileLength - mainHeaderLength</li>
</ul>
<div>
<br /></div>
</div>
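The xorKey above is a simple 4-byte XOR cipher over the file data, so decryption is a single pass. A minimal sketch (the function name is mine, not from the engine):<br />

```cpp
// Apply the archive's 4-byte XOR cipher to a buffer of file data.
typedef unsigned char byte;
typedef unsigned int uint32;

void unXor(byte *buffer, uint32 length, const byte *xorKey) {
	for (uint32 i = 0; i < length; ++i)
		buffer[i] ^= xorKey[i % 4];
}
```

Since XOR is its own inverse, the same routine both encrypts and decrypts.<br />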
<div>
The file entry header is defined as:</div>
<pre class="brush:cpp">
struct ZfsEntryHeader {
	char name[16];
	uint32 offset;
	uint32 id;
	uint32 size;
	uint32 time;
	uint32 unknown;
};
</pre>
<div>
<ul>
<li><span style="color: #a64d79;">name</span> is the file name right-padded with null characters</li>
<li><span style="color: #a64d79;">offset</span> is the offset to the actual file data</li>
<li><span style="color: #a64d79;">id </span>is the numeric id of the file. The ids increment from 0 to <span style="color: #a64d79;">fileCount</span></li>
<li><span style="color: #a64d79;">size </span>is the length of the file</li>
<li><span style="color: #a64d79;">unknown </span>is self-explanatory</li>
</ul>
<div>
<br /></div>
<div>
Therefore, the entire file structure is as follows:</div>
</div>
<pre class="brush:plain">[Main Header]
[uint32 offsetToPage2]
[Page 1 File Entry Headers]
[Page 1 File Data]
[uint32 offsetToPage3]
[Page 2 File Entry Headers]
[Page 2 File Data]
etc.
</pre>
<div>
<br /></div>
<div>
<br /></div>
<div>
<b>Questions and Observations</b><br />
<b><br /></b>
maxNameLength<br />
Why have a fixed-size name block vs. null-terminated or [size][string]? Was that just the popular thing to do back then so the entire header could be cast directly to a struct?<br />
<br />
filesPerBlock<br />
What is the benefit of pagination? The only explanation I can see at the moment is that it was some artifact of their asset compiler's maximum memory. Maybe I'm missing something, since I've never programmed for that type of hardware.<br />
<br />
fileSectionOffset<br />
I've seen things like this a lot in my reverse engineering; they give the offset to a section that's literally just after the header. Even if they were doing straight casting instead of incremental reading, a simple sizeof(mainHeader) would give them the offset to the next section. Again, if I'm missing something, please let me know.<br />
<br />
<br />
Well that's it for now,<br />
-<span style="color: #dd7700;">RichieSams</span></div><br />
<br />
<b>Git is hard, but ScummVM Common is awesome</b> (2013-06-12)<br />
This week I started working on Z-engine proper... and immediately ran face-first into the complexity of git. Well, let me restate that. Git isn't hard, per se, but it has so many features and facets that it can easily go over your head. Anybody with a brain can mindlessly commit and push things to a git repo. However, if you really want a structured and concise commit flow, it takes not only knowing the tools, but actually sitting back and thinking about which changes belong in which commits and branches.<br />
<br />
So that said, I'll go over the things I really like about git or just distributed source control in general.<br />
<br />
Branchy development is absolutely a must. It's really really helpful to separate different parts of a project or even different parts of the same section of a project. It makes identifying and diff-ing changes really easy. Also, I found it's really helpful to have a local "work-in-progress" version of the branch I'm working on. That allows me to commit really often and not really have to worry about commit message formatting or general structure. Then when I'm ready to do a push to the repo, I rebase my commits in my WIP branch to fit all my needs, then rebase them to the main branch before pushing.<br />
<br />
On that note, rebase is AMAZING!!! It's like the "Jesus" answer in Sunday school, or "Hydrogen bonding" in chemistry class. However, "With great power comes great responsibility". So I try my hardest to only use rebase on my local repo.<br />
<br />
<br />
On to details about Z-engine work!!<br />
<br />
My first milestone for Z-engine was to get a file manager fully working, seeing how pretty much every other part of the engine relies on files. When I was writing my proposal for GSoC, I thought I was going to have to write my own file manager, but Common::SearchManager to the rescue!<br />
<br />
By default, the SearchManager will register every file within the game's directory. So any calls to<br />
<pre class="brush:cpp">Common::File.open(Common::String filePath);
</pre>
will search the game's directory for the filePath and open that file if found.<br />
Well that was easy. Done before lunch.... Well, not quite. Z-engine games store their script files in archive files. The format is really really simple, but I'll save that for its own post. Ideally, I wanted to be able to do:<br />
<pre class="brush:cpp">Common::File.open("fileInsideArchive.scr");
</pre>
After some searching and asking around on IRC, I found that I can do exactly that by implementing Common::Archive:
<br />
<pre class="brush:cpp">class ZfsArchive : public Common::Archive {
public:
	ZfsArchive(const Common::String &fileName);
	ZfsArchive(const Common::String &fileName, Common::SeekableReadStream *stream);
	~ZfsArchive();

	/**
	 * Check if a member with the given name is present in the Archive.
	 * Patterns are not allowed, as this is meant to be a quick File::exists()
	 * replacement.
	 */
	bool hasFile(const Common::String &fileName) const;

	/**
	 * Add all members of the Archive to list.
	 * Must only append to list, and not remove elements from it.
	 *
	 * @return the number of names added to list
	 */
	int listMembers(Common::ArchiveMemberList &list) const;

	/**
	 * Returns an ArchiveMember representation of the given file.
	 */
	const Common::ArchiveMemberPtr getMember(const Common::String &name) const;

	/**
	 * Create a stream bound to a member with the specified name in the
	 * archive. If no member with this name exists, 0 is returned.
	 * @return the newly created input stream
	 */
	Common::SeekableReadStream *createReadStreamForMember(const Common::String &name) const;
};
</pre>
and then registering each archive with the SearchManager like so:
<br />
<pre class="brush:cpp">// Search for .zfs archive files
Common::ArchiveMemberList list;
SearchMan.listMatchingMembers(list, "*.zfs");

// Register the files within the zfs archive files with the SearchMan
for (Common::ArchiveMemberList::iterator iter = list.begin(); iter != list.end(); ++iter) {
	Common::String name = (*iter)->getName();
	ZfsArchive *archive = new ZfsArchive(name, (*iter)->createReadStream());

	SearchMan.add(name, archive);
}
</pre>
<br />
In summary, git can be complicated, but it has a wealth of potential and is extremely powerful. Also, the ScummVM Common classes are absolutely fantastic and make the lives of engine developers sooooo much easier. A toast to the wonderful people who developed them.<br />
Well, that's all for now.<br />
<br />
So until next time, happy coding. :)<br />
-<span style="color: #dd7700;">RichieSams</span><br />
<br />
<br />
<b>Obligatory "Hello world!"</b> (2013-05-31)<br />
Hello world!<br />
<br />
Welcome to my new blog: 'RichieSam's Adventures in Code-ville'. This will be the place where I share the coding experiences I have and the things I learn while working on my various projects. With that said, who am I and what am I working on?<br />
<br />
I'm a 21-year-old, fourth-year student at The University of Texas at Austin. I'm majoring in Mechanical Engineering with a minor in Computer Science. I thoroughly enjoy programming, both the thrill of getting something to work and the science/math of algorithms and data structures. The majority of my programming projects have revolved around games. The first major project I did was creating an application that tracked guild currency for my guild. My latest project is a suite of tools that lets users install, modify, and create asset modifications for the game League of Legends. It required reverse engineering quite a few file formats and learning how to hook the game process in order to allow run-time asset swapping.<br />
<br />
The two big projects I'm working on right now are The Dargon Project and Z-engine for ScummVM. The Dargon Project is the aforementioned suite of applications. Z-engine is my project for Google Summer of Code.<br />
<br />
Z-engine:<br />
<div>
The Z-Engine is used in the games Zork Nemesis and Zork Grand Inquisitor. Marisa Chan created a C implementation of the engine, but it is only for desktop and requires configuration files. The project aims to create a ScummVM engine using Marisa Chan’s implementation code as a guide into the Zork file structure and engine design. That is, it will not simply adapt the current implementation to the ScummVM engine structure. Rather, it will create a new engine, using the file structures and event implementations in Marisa Chan’s code as a reference. ScummVM will allow these games to be played on a variety of platforms and a redesign will remove the need for configuration files. Lastly, it will mean that ScummVM will support all of the Zork point'n'click adventure games.</div>
<div>
<br /></div>
<div>
I'm absolutely thrilled to be one of the lucky people taking part in Google Summer of Code. ScummVM has an amazing group of developers, and I'm really looking forward to being a part of that.</div>
<div>
<br /></div>
<div>
Well, I guess that's it for now. My next post will most likely be about the start of GSoC.</div>
<div>
<br /></div>
<div>
Until then,</div>
<div>
-<span style="color: #dd7700;">RichieSams</span></div>