Thursday, April 9, 2015

Making Our First Pretty Picture

This is the fourth post in a series documenting my adventures in creating a GPU path tracer from scratch
1. Tracing Light in a Virtual World
2. Creating Randomness and Accumulating Change
3. Shooting Objects from Across the Way


So we're finally here! We've learned everything we need to actually implement a path tracer! First off, let me apologize for taking so long to get this post up. Things started getting busy at work again, and I was in a car accident last week, so I've been recovering from that. Also, the implementation I showed a picture of in the last post turned out to be mathematically incorrect. (The current simulation is still far from physically correct, but I'll get to that later in the post.) But that's in the past now; let's get started.


A Quick Review

Before we launch into the code, let's first go over what we're trying to do. For each pixel, we want to simulate the light that bounces around the scene in order to calculate a color. This calculation can be described by the Rendering Equation:
\[L_{\text{o}}(p, \omega_{\text{o}}) =  L_{e}(p, \omega_{\text{o}}) + \int_{\Omega} f(p, \omega_{\text{o}}, \omega_{\text{i}}) L_{\text{i}}(p, \omega_{\text{i}}) \left | \cos \theta_{\text{i}} \right | d\omega_{\text{i}} \]
Wow, that looks intimidating. But once we break it down, it's actually a pretty simple concept. In English, the equation is:
The outgoing light from a point equals the emissive light from the point itself, plus all the incoming light attenuated by a function of the material (the BSDF) and the direction of the incoming light.
The integral is just a mathematical way of adding up all the incoming light coming from the hemisphere around the point.

That said, once you start taking into account light being reflected off of things, absorbed by things, refracted by things, etc., things become very complex. For almost all scenes, this integral cannot be solved analytically.

A very simple approximation of this integral is to only take into account the light coming directly from light sources. With this, the integral collapses into a simple sum over all the lights in the scene. That is:
\[L_{\text{o}}(p, \omega_{\text{o}}) =  L_{e}(p, \omega_{\text{o}}) + \sum_{k = 0}^{n} f(p, \omega_{\text{o}}, \omega_{\text{i}}) L_{\text{i, k}}(p, \omega_{\text{i}}) \left | \cos \theta_{\text{i}} \right |\]
This is what Direct Lighting Ray Tracers do. They shoot rays from the point to each light, and test for obscurance between them.
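To make that concrete, here's a rough sketch of what a direct lighting loop could look like. (The Light struct, TestVisibility(), and EvaluateBSDF() are hypothetical helpers I'm using purely for illustration; they aren't part of this project's code.)
__device__ float3 DirectLighting(float3 point, float3 normal, float3 outDir,
                                 Light *lights, uint numLights) {
    float3 total = make_float3(0.0f, 0.0f, 0.0f);
    for (uint k = 0; k < numLights; ++k) {
        float3 toLight = normalize(lights[k].Position - point);
 
        // Obscurance check: shoot a shadow ray from the point to the light
        if (!TestVisibility(point, lights[k].Position)) {
            continue;
        }
 
        float cosTheta = max(dot(normal, toLight), 0.0f);
        total += EvaluateBSDF(point, outDir, toLight) * lights[k].Color * cosTheta;
    }
    return total;
}

Drop the TestVisibility() call and you get the local lighting model described next.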

However, you can simplify it further by also ignoring the obscurance checks. That is, you light the pixel with the assumption that it is the only thing in the scene. This is exactly what most real-time renderers do. It is known as a local lighting model. It's extremely fast because each pixel can be lit independently; the incoming light at the pixel is determined only by the lights in the scene, not by the scene itself. However, the speed comes at a cost. By disregarding the scene, you lose all global illumination effects: shadows, light bleeding, indirect light, specular reflections, etc. These have to be approximated using tricks and/or hacks (shadow mapping, light baking, screen space reflections, etc.).


Another way to approximate the integral is by using Monte Carlo Integration.
\[\int f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{pdf(x_{i})}\]
It says that you can approximate an integral by averaging successive random samples from the function. As $N$ gets large, the approximation gets closer and closer to the solution. $pdf(x_{i})$ represents the probability density function of each random sample.

Path tracing uses Monte Carlo Integration to solve the rendering equation. We sample by tracing randomly selected light paths around the scene.
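To make the estimator concrete, here's a tiny CPU-side sketch (not part of the renderer) that estimates $\int_{0}^{\pi} \sin(x) \, dx$ with uniform random samples. The exact answer is 2, and the estimate wanders toward it as the sample count grows:
#include <cmath>
#include <cstdio>
#include <cstdlib>
 
int main() {
    const int N = 1000000;
    const double pi = 3.14159265358979323846;
 
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        // Uniform sample in [0, pi], so pdf(x) = 1 / pi
        double x = pi * (rand() / (double)RAND_MAX);
        sum += sin(x) / (1.0 / pi);
    }
 
    printf("Estimate: %f\n", sum / N); // Approaches 2.0 as N grows
    return 0;
}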


The Path Tracing Kernel

I figure the best way to understand the path tracing algorithm is just to post the whole kernel, and then go over each line / function, explaining as we go. So here's the whole thing:
__global__ void PathTraceKernel(unsigned char *textureData, uint width, uint height, size_t pitch, DeviceCamera *g_camera, Scene::SceneObjects *g_sceneObjects, uint hashedFrameNumber) {
    // Create a local copy of the arguments
    DeviceCamera camera = *g_camera;
    Scene::SceneObjects sceneObjects = *g_sceneObjects;
 
    // Global threadId
    int threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
 
    // Create random number generator
    curandState randState;
    curand_init(hashedFrameNumber + threadId, 0, 0, &randState);
 
 
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
 
    // Calculate the first ray for this pixel
    Scene::Ray ray = {camera.Origin, CalculateRayDirectionFromPixel(x, y, width, height, camera, &randState)};
 
 
    float3 pixelColor = make_float3(0.0f, 0.0f, 0.0f);
    float3 accumulatedMaterialColor = make_float3(1.0f, 1.0f, 1.0f);
 
    // Bounce the ray around the scene
    for (uint bounces = 0; bounces < 10; ++bounces) {
        // Initialize the intersection variables
        float closestIntersection = FLT_MAX;
        float3 normal;
        Scene::LambertMaterial material;
 
        TestSceneIntersection(ray, sceneObjects, &closestIntersection, &normal, &material);
 
        // Find out if we hit anything
        if (closestIntersection < FLT_MAX) {
            // We hit an object
 
            // Add the emissive light
            pixelColor += accumulatedMaterialColor * material.EmmisiveColor;
 
 
            // Shoot a new ray
 
            // Set the origin at the intersection point
            ray.Origin = ray.Origin + ray.Direction * closestIntersection;
            // Offset the origin to prevent self intersection
            ray.Origin += normal * 0.001f;
 
            // Choose the direction based on the material
            if (material.MaterialType == Scene::MATERIAL_TYPE_DIFFUSE) {
                ray.Direction = CreateUniformDirectionInHemisphere(normal, &randState);
 
                // Accumulate the diffuse color
                accumulatedMaterialColor *= material.MainColor /* * (1 / PI)  <- this cancels with the PI in the pdf */ * dot(ray.Direction, normal);
 
                // Divide by the pdf
                accumulatedMaterialColor *= 2.0f; // pdf == 1 / (2 * PI)
            } else if (material.MaterialType == Scene::MATERIAL_TYPE_SPECULAR) {
                ray.Direction = reflect(ray.Direction, normal);
 
                // Accumulate the specular color
                accumulatedMaterialColor *= material.MainColor;
            }
 

            // Russian Roulette
            if (bounces > 3) {
                float p = max(accumulatedMaterialColor.x, max(accumulatedMaterialColor.y, accumulatedMaterialColor.z));
                if (curand_uniform(&randState) > p) {
                    break;
                }
                accumulatedMaterialColor *= 1 / p;
            }
        } else {
            // We didn't hit anything, return the sky color
            pixelColor += accumulatedMaterialColor * make_float3(0.846f, 0.933f, 0.949f);
 
            break;
        }
    }
 
 
    if (x < width && y < height) {
        // Get a pointer to the pixel at (x,y)
        float *pixel = (float *)(textureData + y * pitch) + 4 /*RGBA*/ * x;
 
        // Write pixel data
        pixel[0] += pixelColor.x;
        pixel[1] += pixelColor.y;
        pixel[2] += pixelColor.z;
        // Ignore alpha, since it's hardcoded to 1.0f in the display
        // We have to use a RGBA format since CUDA-DirectX interop doesn't support R32G32B32_FLOAT
    }
}


Initializing all the Variables

// Create a local copy of the arguments
DeviceCamera camera = *g_camera;
Scene::SceneObjects sceneObjects = *g_sceneObjects;
 
// Global threadId
int threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
 
// Create random number generator
curandState randState;
curand_init(hashedFrameNumber + threadId, 0, 0, &randState);

The first thing we do is create a local copy of some of the pointer arguments that are passed into the kernel. This saves one level of indirection on every fetch from global memory: we read multiple fields from the camera, and we access the scene objects many times.

Next we calculate the global id for the thread and use it to create a random state so we can generate random numbers. We covered this in the second post of this series.


Shooting the First Ray

int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
 
// Calculate the first ray for this pixel
Scene::Ray ray = {camera.Origin, CalculateRayDirectionFromPixel(x, y, width, height, camera, &randState)};

Then we shoot the initial ray from the camera origin through the pixel for this thread. We covered this in the first post. However, I added one thing to CalculateRayDirectionFromPixel().
__device__ float3 CalculateRayDirectionFromPixel(uint x, uint y, uint width, uint height, DeviceCamera &camera, curandState *randState) {
    float3 viewVector = make_float3((((x + curand_uniform(randState)) / width) * 2.0f - 1.0f) * camera.TanFovXDiv2,
                                    -(((y + curand_uniform(randState)) / height) * 2.0f - 1.0f) * camera.TanFovYDiv2,
                                    1.0f);
 
    // Matrix multiply
    return normalize(make_float3(dot(viewVector, camera.ViewToWorldMatrixR0),
                                 dot(viewVector, camera.ViewToWorldMatrixR1),
                                 dot(viewVector, camera.ViewToWorldMatrixR2)));
}

In the first post, we always shot the ray through the center of the pixel, ie: (x + 0.5, y + 0.5). Now we use a random number to jitter the ray within the pixel. Why would we do this? By jittering the camera ray, we effectively supersample the scene. Since we're going to shoot the same number of rays either way, we get supersampled anti-aliasing for free.

Let's look at one of the renders from the third post with and without the jitter:

No jitter: Ewwwwwww. Look at all those jaggies...... 


With jitter: Silky smooth :)


Start Bouncing!

float3 pixelColor = make_float3(0.0f, 0.0f, 0.0f);
float3 accumulatedMaterialColor = make_float3(1.0f, 1.0f, 1.0f);
 
// Bounce the ray around the scene
for (uint bounces = 0; bounces < 10; ++bounces) {

Here we initialize two float3's used to accumulate color as we bounce. I'll cover them in detail below. Then we start a for loop that determines the maximum number of times the ray can bounce around the scene.


Shooting the Scene

// Initialize the intersection variables
float closestIntersection = FLT_MAX;
float3 normal;
Scene::LambertMaterial material;
 
TestSceneIntersection(ray, sceneObjects, &closestIntersection, &normal, &material);

We initialize the intersection variables and then shoot the ray through the scene. TestSceneIntersection() loops through each object in the scene and tries to intersect it. If it hits, it checks whether the new intersection is closer than the current closest. If so, it replaces the current intersection with the new one (and updates the normal / material).

__device__ void TestSceneIntersection(Scene::Ray &ray, Scene::SceneObjects &sceneObjects, float *closestIntersection, float3 *normal, Scene::LambertMaterial *material) {
    // Try to intersect with the planes
    for (uint j = 0; j < sceneObjects.NumPlanes; ++j) {
        // Make a local copy
        Scene::Plane plane = sceneObjects.Planes[j];
 
        float3 newNormal;
        float intersection = TestRayPlaneIntersection(ray, plane, newNormal);
        if (intersection > 0.0f && intersection < *closestIntersection) {
            *closestIntersection = intersection;
            *normal = newNormal;
            *material = sceneObjects.Materials[plane.MaterialId];
        }
    }
 
    // Try to intersect with the rectangles;
    for (uint j = 0; j < sceneObjects.NumRectangles; ++j) {
        // Make a local copy
        Scene::Rectangle rectangle = sceneObjects.Rectangles[j];
 
        float3 newNormal;
        float intersection = TestRayRectangleIntersection(ray, rectangle, newNormal);
        if (intersection > 0.0f && intersection < *closestIntersection) {
            *closestIntersection = intersection;
            *normal = newNormal;
            *material = sceneObjects.Materials[rectangle.MaterialId];
        }
    }
 
    // Try to intersect with the circles;
    for (uint j = 0; j < sceneObjects.NumCircles; ++j) {
        // Make a local copy
        Scene::Circle circle = sceneObjects.Circles[j];
 
        float3 newNormal;
        float intersection = TestRayCircleIntersection(ray, circle, newNormal);
        if (intersection > 0.0f && intersection < *closestIntersection) {
            *closestIntersection = intersection;
            *normal = newNormal;
            *material = sceneObjects.Materials[circle.MaterialId];
        }
    }
 
    // Try to intersect with the spheres;
    for (uint j = 0; j < sceneObjects.NumSpheres; ++j) {
        // Make a local copy
        Scene::Sphere sphere = sceneObjects.Spheres[j];
 
        float3 newNormal;
        float intersection = TestRaySphereIntersection(ray, sphere, newNormal);
        if (intersection > 0.0f && intersection < *closestIntersection) {
            *closestIntersection = intersection;
            *normal = newNormal;
            *material = sceneObjects.Materials[sphere.MaterialId];
        }
    }
}


I added support for two more intersections since the last post: Ray-Circle and Ray-Rectangle. (The latter actually supports any parallelogram, not just rectangles.) The idea for both is to first test intersection with the plane, then test if the intersection point is inside the respective shape. The code/comments should be self-explanatory, but feel free to comment if you have a question. A sketch of the rectangle case is below.
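Here's roughly what that looks like, assuming the Rectangle struct stores a corner point, two edge vectors, and a normal. (The actual struct in the repo may differ, and this version assumes the two edges are orthogonal; a general parallelogram needs a small 2x2 solve instead of the two projections.)
__device__ float TestRayRectangleIntersection(Scene::Ray &ray, Scene::Rectangle &rectangle, float3 &normal_out) {
    // First, intersect the ray with the plane containing the rectangle
    float cos_theta = dot(rectangle.Normal, ray.Direction);
    if (fabsf(cos_theta) < 1.0e-6f) {
        // Ray is parallel to the plane
        return -1.0f;
    }
 
    float t = dot(rectangle.Normal, rectangle.Point - ray.Origin) / cos_theta;
    if (t < 0.0f) {
        // The plane is behind the ray
        return -1.0f;
    }
 
    // Then, check that the hit point lies within the two edge vectors
    float3 local = (ray.Origin + ray.Direction * t) - rectangle.Point;
    float u = dot(local, rectangle.EdgeU) / dot(rectangle.EdgeU, rectangle.EdgeU);
    float v = dot(local, rectangle.EdgeV) / dot(rectangle.EdgeV, rectangle.EdgeV);
    if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) {
        return -1.0f;
    }
 
    normal_out = rectangle.Normal;
    return t;
}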


Direct Hit!!!!           Or Perhaps Not....

This is where the true path tracing starts to come in. Once we've traced the ray through the scene, there are two outcomes: either we hit something, or we miss.

For real-life materials, light doesn't just bounce off the surface of the material, but rather, it also enters the material itself, bouncing and reflecting off the molecules.
Image from Naty Hoffman's Siggraph 2013 Presentation on Physically Based Shading

As you can imagine, it would be prohibitively expensive to try to calculate the exact scattering of light within the material. Therefore, a common simplification is to calculate the interaction of light with the surface in two parts: diffuse and specular.
Image from Naty Hoffman's Siggraph 2013 Presentation on Physically Based Shading

The diffuse term represents the refraction, absorption, and scattering of light within the material. It is convenient to ignore the entry-to-exit distance for the diffuse term; doing so allows us to compute the diffuse lighting at a single point.

The specular term represents the reflected light off the surface, aka, mirror-like reflections.

For now, we're going to assume a material is either a perfectly diffuse Lambertian material or a perfect specular mirror, in order to keep things simple.


First, let's look at the case where we hit a perfectly diffuse surface lit by a single light source, $L_{\text{i}}$. If we plug that into the Monte Carlo Estimate of the Rendering Equation for a single sample, $\text{k}$, we get:

\[L_{\text{o, k}} = L_e + \frac{L_{\text{i, k}} \: \frac{materialColor_{\text{diffuse}}}{\pi} \: (n \cdot v)}{pdf}\]
where $n$ is the surface normal where we hit, and $v$ is the unit vector pointing from the surface to the light. We use the fact that $\left | \cos \theta \right | \equiv (n \cdot v)$ since both $n$ and $v$ are unit length.

For path tracing, $L_{\text{i}}$ represents the light coming from the next path. ie:
\[L_{\text{i,  k}} = L_{\text{o,  k + 1}}\]
It's a classic recursive algorithm. ie:
float3 CalculateColorForRay(Scene::Ray &ray, Scene::SceneObjects &sceneObjects) {
    // Initialize the intersection variables
    float closestIntersection = FLT_MAX;
    float3 normal;
    Scene::LambertMaterial material;
 
    TestSceneIntersection(ray, sceneObjects, &closestIntersection, &normal, &material);
 
    // Find out if we hit anything
    if (closestIntersection < FLT_MAX) {
        // We hit an object
        
        // Shoot a new ray
        ray.Origin = ray.Origin + ray.Direction * closestIntersection;
        
        if (material.MaterialType == Scene::MATERIAL_TYPE_DIFFUSE) {
            // We hit a diffuse surface
            ray.Direction = CreateNewRayDirection(); // ie: a random direction in the hemisphere around the normal
        } else if (material.MaterialType == Scene::MATERIAL_TYPE_SPECULAR) {
            // We hit a specular surface
            ray.Direction = reflect(ray.Direction, normal);
        }
 
        return material.EmmisiveColor + CalculateColorForRay(ray, sceneObjects) * material.MainColor * dot(ray.Direction, normal) * 2.0f /* pdf */;
    } else {
        // We didn't hit anything, return the sky color
        return make_float3(0.846f, 0.933f, 0.949f);
    }
}

Unfortunately for us, the GPU doesn't handle recursion well. In fact, it's forbidden in most, if not all, GPU programming languages. That said, since multiplication is associative and distributes over addition, we can factor out all the $\frac{\frac{materialColor_{\text{diffuse}}}{\pi} \: (n \cdot v)}{pdf}$ terms and turn our path tracing algorithm into an iterative solution:
float3 pixelColor = make_float3(0.0f, 0.0f, 0.0f);
float3 accumulatedMaterialColor = make_float3(1.0f, 1.0f, 1.0f);
 
// Bounce the ray around the scene
for (uint bounces = 0; bounces < 10; ++bounces) {
    // Initialize the intersection variables
    float closestIntersection = FLT_MAX;
    float3 normal;
    Scene::LambertMaterial material;
 
    TestSceneIntersection(ray, sceneObjects, &closestIntersection, &normal, &material);
 
    // Find out if we hit anything
    if (closestIntersection < FLT_MAX) {
        // We hit an object
 
        // Add the emissive light
        pixelColor += accumulatedMaterialColor * material.EmmisiveColor;
 
 
        // Shoot a new ray
 
        // Set the origin at the intersection point
        ray.Origin = ray.Origin + ray.Direction * closestIntersection;
        // Offset the origin to prevent self intersection
        ray.Origin += normal * 0.001f;
 
        // Choose the direction based on the material
        if (material.MaterialType == Scene::MATERIAL_TYPE_DIFFUSE) {
            ray.Direction = CreateUniformDirectionInHemisphere(normal, &randState);
 
            // Accumulate the diffuse color
            accumulatedMaterialColor *= material.MainColor /* * (1 / PI)  <- this cancels with the PI in the pdf */ * dot(ray.Direction, normal);
 
            // Divide by the pdf
            accumulatedMaterialColor *= 2.0f; // pdf == 1 / (2 * PI)
        } else if (material.MaterialType == Scene::MATERIAL_TYPE_SPECULAR) {
            ray.Direction = reflect(ray.Direction, normal);
 
            // Accumulate the specular color
            accumulatedMaterialColor *= material.MainColor;
        }
    } else {
        // We didn't hit anything, return the sky color
        pixelColor += accumulatedMaterialColor * make_float3(0.846f, 0.933f, 0.949f);
 
        break;
    }
}


Bouncing is What Tiggers Rays Do Best

The last thing to do is to choose the direction for the next path. For mirror specular reflections, this is as simple as using the reflect function:
ray.Direction = reflect(ray.Direction, normal);

Since it is a perfect reflection, the probability of the reflection in the chosen direction is 1. ie, $pdf_{\text{specular}} = 1$
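If you haven't seen it before, reflect() is just the standard mirror-reflection formula, something along these lines (CUDA's helper math headers ship a version like it):
// Mirror reflection: r = d - 2 * (d . n) * n
// Assumes d points toward the surface and n is the unit surface normal
__device__ float3 reflect(float3 d, float3 n) {
    return d - 2.0f * dot(d, n) * n;
}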


However, for diffuse reflections, it's a different story. By definition, the diffuse term can represent any internal reflection. Therefore, the light can come out in any direction within the unit hemisphere defined by the surface normal:

So, we need to randomly pick a direction in the unit hemisphere. For now, we'll use a uniform distribution of directions. In a later post, we'll explore other distributions.
__device__ float3 CreateUniformDirectionInHemisphere(float3 normal, curandState *randState) {    
    // Create a random coordinate in spherical space
    // Then calculate the cartesian equivalent
    float z = curand_uniform(randState);
    float r = sqrt(1.0f - z * z);
    float phi = TWO_PI * curand_uniform(randState);
    float x = cos(phi) * r;
    float y = sin(phi) * r;
 
    // Find an axis that is not parallel to normal
    float3 majorAxis;
    if (abs(normal.x) < INV_SQRT_THREE) { 
        majorAxis = make_float3(1, 0, 0);
    } else if (abs(normal.y) < INV_SQRT_THREE) { 
        majorAxis = make_float3(0, 1, 0);
    } else {
        majorAxis = make_float3(0, 0, 1);
    }
 
    // Use majorAxis to create a coordinate system relative to world space
    float3 u = normalize(cross(majorAxis, normal));
    float3 v = cross(normal, u);
    float3 w = normal;
 
    // Transform from spherical coordinates to the cartesian coordinates space
    // we just defined above, then use the definition to transform to world space
    return normalize(u * x +
                     v * y +
                     w * z);
}

We use two uniformly distributed random numbers to create a random point in the spherical coordinate system. Then we use some trig to convert to Cartesian coordinates. Finally, we transform the random point from local coordinates (ie, relative to the normal) to world coordinates.

To create the transformation coordinate system, we first find the world axis that is most perpendicular to the normal by comparing each component of the normal against $\frac{1}{\sqrt{3}}$. $\frac{1}{\sqrt{3}}$ is significant because it's the value at which all three components of a unit vector are equal. So if any component is less than $\frac{1}{\sqrt{3}}$, there is a good chance that axis is the smallest.

Next, we use the cross product to find an axis that is perpendicular to the normal. And finally, we use the cross product again to find an axis that is perpendicular to both the new axis and the normal.


The $pdf$ for a uniformly distributed direction in a hemisphere is:
\[pdf_{\text{hemi}} = \frac{1}{2 \pi}\]
This corresponds to the solid angle that the random directions can come from. A hemisphere has a solid angle of $2\pi$ steradians.
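Plugging the Lambertian BRDF and this pdf into the Monte Carlo estimator shows where the mysterious 2.0f in the kernel comes from; the $\frac{1}{\pi}$ of the BRDF cancels against the $\pi$ in the pdf:
\[\frac{\frac{materialColor_{\text{diffuse}}}{\pi} \: (n \cdot \omega_{\text{i}})}{\frac{1}{2 \pi}} = 2 \: materialColor_{\text{diffuse}} \: (n \cdot \omega_{\text{i}})\]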


Shooting Yourself in the Head

// Russian Roulette
if (bounces > 3) {
    float p = max(accumulatedMaterialColor.x, max(accumulatedMaterialColor.y, accumulatedMaterialColor.z));
    if (curand_uniform(&randState) > p) {
        break;
    }
    accumulatedMaterialColor /= p;
}

During path tracing, we keep bouncing until we miss the scene or exceed the maximum number of bounces. But if the scene is closed and maxBounces is large, we could end up bouncing for a very long time. Remember, though, that at each bounce we accumulate the light attenuation of the material we hit; accumulatedMaterialColor gets closer and closer to zero the more bounces we do. Since accumulatedMaterialColor is what gets multiplied by any emitted light we find, it makes very little sense to keep bouncing after a while; the result won't add very much to the final pixel color.

However, we can't arbitrarily stop bouncing after accumulatedMaterialColor goes below a certain threshold, since this would create bias in the Monte Carlo Integration. What we can do, though, is use a random probability to determine the stopping point, as long as we divide out the $pdf$ of the random sample. This is known as Russian Roulette.

In this implementation, we create a uniform random sample and then compare it against the max component of accumulatedMaterialColor. If the random sample is greater than the max component, we terminate the path. If not, we keep the integration unbiased by dividing by the $pdf$ of continuing. Since we used a uniform random variable, the $pdf$ is just:
\[pdf =  \text{p}\]
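As a quick sanity check that this keeps the estimate unbiased: with probability $p$ we continue the path and scale its contribution by $\frac{1}{p}$, and with probability $1 - p$ we contribute nothing, so the expected value is unchanged:
\[E\left[\text{contribution}\right] = p \cdot \frac{L}{p} + \left(1 - p\right) \cdot 0 = L\]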


Success!

If we whip up a simple scene with a plane and 9 balls we get:

Whooo!! Look at those nice soft shadows! And the color bleeding! Houston, this is our first small step towards a battle-ready photo-realistic GPU renderer.

Let's do another! Obligatory Cornell Box:


One thing to note: since the Cornell box scene has a light with a very small surface area in comparison to the surface area of the rest of the scene, the probability that a ray will hit the light is relatively low. Therefore, it takes a long time to converge. The render above took around 30 minutes. In comparison, the 9-ball scene took about 60 seconds to resolve to an acceptable level.


Closing Words

There we go. Our very own path tracer. I shall call him Squishy and he shall be mine, and he shall be my Squishy.    (I really need to stop watching videos right before writing this blog)

The code for everything in this post is on GitHub. It's open source under the Apache license, so feel free to use it in your own projects.

As always, feel free to ask questions, make comments, and if you find an error, please let me know.

Happy coding!
-RichieSams

Friday, March 13, 2015

Shooting Objects from Across the Way

This is the third post in a series documenting my adventures in creating a GPU path tracer from scratch
1. Tracing Light in a Virtual World
2. Creating Randomness and Accumulating Change


If you recall, the very high level algorithm for path tracing is:
  1. From each pixel, fire a ray out into the scene from the eye.
  2. If the ray hits an object
    1. Use the material properties to accumulate attenuation
    2. Bounce the ray in a new direction
    3. GOTO 2
  3. If the ray hits a light
    1. Multiply the light and the attenuation
    2. Add the result to the accumulation buffer
    3. GOTO 5
  4. If the ray doesn't hit anything
    1. GOTO 5
  5. GOTO 1 until sufficiently converged

The first post in the series covered step 1 and the second post covered step 3b. This post is going to cover how to test if a ray hits an object in the scene (aka, steps 2 and 3).


Ray - Sphere Intersection

Foreword: Scratchapixel has an excellent lesson covering ray-object intersections. Much of what I say below is based on their lessons. Much props to the authors.


A sphere can be represented mathematically with a vector representing the location of the sphere's center, and its radius. (You can save a multiplication in a later calculation if you store the radius squared instead.)
struct Sphere {
    float3 Center;
    float RadiusSquared;
};

A ray can be represented as a vector representing its origin and a unit vector representing its direction:
struct Ray {
    float3 Origin;
    float3 Direction;
};

Let's look at a picture of the most common type of ray - sphere intersection:
\[\begin{align}
\overrightarrow{R_{o}} &= ray \: origin \\
\overrightarrow{R_{d}} &= ray \: direction \\
\overrightarrow{S_{c}} &= sphere \: center \\
S_{r} &= sphere \: radius \\
\overrightarrow{P_{0}} &= first \: intersection \: point \\
\overrightarrow{P_{1}} &= second \: intersection \: point \\
t_{0} &= distance \: from \: \overrightarrow{R_{o}} \: to \: \overrightarrow{P_{0}} \\
t_{1} &= distance \: from \: \overrightarrow{R_{o}} \: to \: \overrightarrow{P_{1}} \\
\end{align}\]

We would like to find $\overrightarrow{P_{0}}$ and $\overrightarrow{P_{1}}$. Mathematically, they are defined as:
 \[\begin{align}
\overrightarrow{P_{0}} &= \overrightarrow{R_{o}} + t_{0} \overrightarrow{R_{d}} \\
\overrightarrow{P_{1}} &= \overrightarrow{R_{o}} + t_{1} \overrightarrow{R_{d}} \\
\end{align}\]

We already know $\overrightarrow{R_{o}}$ and $\overrightarrow{R_{d}}$, so we just need to find $t_{0}$ and $t_{1}$. In order to do so, let's define a few new variables:
\[\begin{align}
\overrightarrow{L} &= \overrightarrow{S_{c}} - \overrightarrow{R_{o}} \\
t_{ca} &= distance \: along \: the \: ray \: from \: \overrightarrow{R_{o}} \: to \: the \: point \: closest \: to \: \overrightarrow{S_{c}} \\
d &= distance \: from \: \overrightarrow{S_{c}} \: to \: that \: closest \: point \\
t_{hc} &= distance \: from \: that \: closest \: point \: to \: each \: intersection \: point
\end{align}\]

And then define $t_{0}$ and $t_{1}$ in terms of $t_{ca}$ and $t_{hc}$:
 \[\begin{align}
t_{0} &= t_{ca} - t_{hc} \\
t_{1} &= t_{ca} + t_{hc} \\
\end{align}\]


Now we can begin solving for our unknown variables. To start, let's look at the right triangle formed by $t_{ca}$, $\overrightarrow{L}$, and $d$.


We can solve for $t_{ca}$ by using the definition of cosine and the dot product:
\[\cos\left ( \theta \right ) = \frac{adjacent}{hypotenuse} = \frac{t_{ca}}{\left \| \overrightarrow{L} \right \|}\]
\[\overrightarrow{m} \cdot \overrightarrow{n} = \left \| m \right \| \left \| n \right \| \cos \left ( \theta \right )\]

\[\begin{split} ie: \end{split} \qquad \qquad
\begin{split}
\overrightarrow{R_{d}} \cdot \overrightarrow{L} &= \left \| \overrightarrow{R_{d}} \right \| \left \| \overrightarrow{L} \right \| \cos \left ( \theta \right ) \\
                                 &= \frac{\left \| \overrightarrow{L} \right \| t_{ca}}{\left \| \overrightarrow{L} \right \|} \\
                                 &= t_{ca}
\end{split}\]

$\overrightarrow{R_{d}}$ is a unit vector. Therefore, $\left \| \overrightarrow{R_{d}} \right \| = 1$ and cancels out. Then, if we replace $\cos \left ( \theta \right )$ with its definition, we can cancel $\left \| \overrightarrow{L} \right \|$ from top and bottom. Thus, we're left with just $t_{ca}$.



Using the Pythagorean Theorem and a trick with the dot product, we can solve for d:
\[\overrightarrow{m} \cdot \overrightarrow{m} \equiv \left \| \overrightarrow{m} \right \| ^{2}\]

\[\begin{align} t_{ca} \: ^{2} + d^{2} &=  \left \| \overrightarrow{L} \right \| ^{2}\\
t_{ca} \: ^{2} + d^{2} &= \overrightarrow{L} \cdot \overrightarrow{L} \\
                      d^{2} &= \overrightarrow{L} \cdot \overrightarrow{L} - t_{ca} \: ^{2}
\end{align}\]


To solve for $t_{hc}$, let's look at the triangle formed by $S_{r}$, $t_{hc}$, and $d$.


Using the Pythagorean Theorem again, we can solve for $t_{hc}$:
\[\begin{align}
t_{hc} \: ^{2} + d^{2} &= S_{r} \: ^{2} \\
                      t_{hc} &= \sqrt{S_{r} \: ^{2} - d^{2}}
\end{align}\]


Whew! We made it! Using the above equations, we can find the two intersection points. Let's look at some code that implements the above and walk through some special cases / early outs.
/**
 * Test for the intersection of a ray with a sphere
 *
 * @param ray           The ray to test
 * @param sphere        The sphere to test
 * @param normal_out    Filled with normal of the surface at the intersection point. Not changed if no intersection.
 * @return              The distance from the ray origin to the nearest intersection. -1.0f if no intersection
 */
__device__ float TestRaySphereIntersection(Scene::Ray &ray, Scene::Sphere &sphere, float3 &normal_out) {
    float3 L = sphere.Center - ray.Origin;
    float t_ca = dot(L, ray.Direction);
 
    // Ray points away from the sphere
    if (t_ca < 0) {
        return -1.0f;
    }
 
    float d_squared = dot(L, L) - t_ca * t_ca;
 
    // Ray misses the sphere
    if (d_squared > sphere.RadiusSquared) {
        return -1.0f;
    }
 
    float t_hc = sqrt(sphere.RadiusSquared - d_squared);
 
    float t_0 = t_ca - t_hc;
    float t_1 = t_ca + t_hc;
 
    float nearestIntersection;
    float normalDirection;
    if (t_0 > 0 && t_1 > 0) {
        // Two intersections
        // Return the nearest of the two
        nearestIntersection = min(t_0, t_1);
 
        normalDirection = 1.0f;
    } else {
        // Ray starts inside the sphere
 
        // Return the far side of the sphere
        nearestIntersection = max(t_0, t_1);
 
        // We reverse the direction of the normal, since we are inside the sphere
        normalDirection = -1.0f;
    }
 
    normal_out = normalize(((ray.Origin + (ray.Direction * nearestIntersection)) - sphere.Center) * normalDirection);
 
    return nearestIntersection;
}


The algorithm can be summarized as follows:
  1. Calculate $t_{ca}$
  2. If $t_{ca}$ is negative, $\overrightarrow{R_{d}}$ is pointing away from the sphere. Thus, there cannot be an intersection
  3. Calculate $d^{2}$
  4. If $d^{2}$ is greater than $S_{r} \: ^{2}$, the ray misses the sphere.
  5. Calculate $t_{hc}$
  6. Calculate $t_{0}$ and $t_{1}$.
  7. If $t_{0}$ and $t_{1}$ are both positive, the ray starts outside the sphere and intersects it twice. Choose the closest of the two intersections
  8. If either $t_{0}$ or $t_{1}$ is negative, the ray starts inside the sphere and intersects it on the way out. Choose the positive intersection.
    • They can't both be negative, since that would mean the ray is pointing away from the sphere, and we already checked for that.
    • See following picture.




Ray - Plane Intersection

A plane can be represented using any point on the plane and a unit normal vector of the plane.
struct Plane {
    float3 Point;
    float3 Normal;
};

\[\begin{align}
\overrightarrow{P_{p}} &= plane \: point \\
\overrightarrow{P_{n}} &= plane \: normal
\end{align}\]

Let's look at the intersection from the top.


We would like to find $d$. We can start by taking the dot product between $\overrightarrow{R_{d}}$ and  $\overrightarrow{P_{n}}$. We get $\cos \left ( \theta_{1} \right )$, since both  $\overrightarrow{R_{d}}$ and  $\overrightarrow{P_{n}}$ are unit vectors.
\[\begin{align}
\overrightarrow{R_{d}} \cdot \overrightarrow{P_{n}} &= \left \| \overrightarrow{R_{d}} \right \| \left \| \overrightarrow{P_{n}} \right \| \cos \left ( \theta_{1} \right ) \\
          &= \cos \left ( \theta_{1} \right )
\end{align}\]

Let's look at the triangle formed by $\overrightarrow{P_{n}}$ and $\overrightarrow{L}$, where $\overrightarrow{L}$ is the vector between $\overrightarrow{P_{p}}$ and $\overrightarrow{R_{o}}$.


If we take the dot product between $\overrightarrow{P_{n}}$ and  $\overrightarrow{L}$, we get $a$.

\[\begin{align}
\overrightarrow{P_{n}} \cdot \overrightarrow{L} &= \left \| \overrightarrow{P_{n}} \right \| \left \| \overrightarrow{L} \right \| \cos \left ( \theta_{2} \right ) \\
                                 &= \frac{\left \| \overrightarrow{L} \right \| a}{\left \| \overrightarrow{L} \right \|} \\
                                 &= a
\end{align}\]


Finally, let's look at the two triangles formed between $\overrightarrow{P_{n}}$, $\overrightarrow{R_{d}}$, and the plane itself.


They are similar triangles, so we can solve for $d$ using the Law of Similar Triangles and the definition of cosine:
\[\cos \left ( \theta_{1} \right ) = \frac{b}{c}\]

\[\begin{align}
\frac{a + b}{c + d} &= \frac{b}{c} \\
c \left ( a + b \right ) &= b \left ( c + d \right ) \\
ac + bc &= bc + bd \\
ac &= bd \\
d &= \frac{ac}{b} \\
d &= \frac{a}{\cos \left ( \theta_{1} \right )}
\end{align}\]


Let's look at the code implementation of the above and walk through it.
/**
 * Test for the intersection of a ray with a plane   
 *
 * @param ray           The ray to test
 * @param plane         The plane to test
 * @param normal_out    Filled with normal of the surface at the intersection point. Not changed if no intersection.
 * @return              The distance from the ray origin to the nearest intersection. -1.0f if no intersection
 */
__device__ float TestRayPlaneIntersection(Scene::Ray &ray, Scene::Plane &plane, float3 &normal_out) {
    float cos_theta1 = dot(plane.Normal, ray.Direction);
 
    // If cos_theta1 is greater than -epsilon,
    // the ray is parallel to the plane or points away from it
    if (cos_theta1 > -1.0e-6f) {
        return -1.0f;
    }
 
    normal_out = plane.Normal;
    float a = dot(plane.Normal, plane.Point - ray.Origin);
    return a / cos_theta1;
}


The algorithm can be summarized as follows:
  1. Calculate $\cos \left ( \theta_{1} \right )$
  2. If $\cos \left ( \theta_{1} \right )$ is greater than or equal to zero, the ray is parallel to the plane, or faces away from it
    • In the code, we use a small epsilon to account for floating point inaccuracies
  3. Calculate a
  4. Return $\frac{a}{\cos \left ( \theta_{1} \right )}$


Testing Them Out

In order to test the code out, I added on to the eye ray code we created in the first post. After creating the ray, I tried to intersect it with an object in the scene. If it hit, I did some basic Lambertian shading with a hard-coded directional light. If it didn't, I colored the pixel black. Here's the results:

Very nice! If I do say so myself. I don't have a plane in the scene because I didn't implement the plane intersection code until I had skipped ahead to the path tracing itself. (I got too excited. Ha ha!) So you'll have to take my word for it that the code does work. Below is the kernel code used to create the image above:
__global__ void PathTraceKernel(unsigned char *textureData, uint width, uint height, size_t pitch, DeviceCamera *g_camera, Scene::Sphere *g_spheres, uint numSpheres, uint hashedFrameNumber) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
 
    // Create a local copy of the camera
    DeviceCamera camera = *g_camera;
 
    // Global threadId
    int threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
 
    // Create random number generator
    curandState randState;
    curand_init(hashedFrameNumber + threadId, 0, 0, &randState);
 
    // Calculate the first ray for this pixel
    Scene::Ray ray = {camera.Origin, CalculateRayDirectionFromPixel(x, y, width, height, camera, &randState)};
 
    // Generate a uniform random number
    //float randNum = curand_uniform(&randState);
 
    // Try to intersect with the spheres;
    float closestIntersection = FLT_MAX;
    float3 normal;
    for (uint i = 0; i < numSpheres; ++i) {
        float3 newNormal;
        float intersection = TestRaySphereIntersection(ray, g_spheres[i], newNormal);
        if (intersection > 0.0f && intersection < closestIntersection) {
            closestIntersection = intersection;
            normal = newNormal;
        }
    }
 
    float3 pixelColor;
    if (closestIntersection < FLT_MAX) {
        float attenuation = max(dot(normal, make_float3(0.70710678118f, 0.70710678118f, -0.70710678118f)), 0.0f);
        pixelColor = make_float3(0.846f, 0.933f, 0.949f) * attenuation + make_float3(0.15f, 0.15f, 0.15f);
    } else {
        pixelColor = make_float3(0.0f, 0.0f, 0.0f);
    }
 
    if (x < width && y < height) {
        // Get a pointer to the pixel at (x,y)
        float *pixel = (float *)(textureData + y * pitch) + 4 /*RGBA*/ * x;
 
        // Write out pixel data
        pixel[0] += pixelColor.x;
        pixel[1] += pixelColor.y;
        pixel[2] += pixelColor.z;
        // Ignore alpha, since it's hardcoded to 1.0f in the display
        // We have to use a RGBA format since CUDA-DirectX interop doesn't support R32G32B32_FLOAT
    }
}


Conclusion

There we go! We now have everything we need to do path tracing. The next post will be just that: creating a basic path tracing kernel. I've already implemented it in code, so it's just a matter of how quickly I can write up the post. (teaser picture). Keep an eye out!

The code for everything in this post is on GitHub. It's open source under the Apache license, so feel free to use it in your own projects. The path tracing code is already pushed to the repo, if you're feeling antsy, you can give that a look.

As always, feel free to ask questions, make comments, and if you find an error, please let me know.


Happy coding!
-RichieSams

Friday, March 6, 2015

Creating Randomness and Accumulating Change

This is the second post in a series documenting my adventures in creating a GPU path tracer from scratch. If you missed it, the first post is here.


Path tracing uses Monte Carlo Integration to estimate the Global Illumination in the scene. In our case, Monte Carlo Integration boils down to taking a large number of random samples of the scene, and averaging them together. Random is the key word here. If we don't randomly sample, the resulting image will have artifacts in the form of patterns, banding, etc.


Creating Randomness from the Non-Random

So how do we create random numbers? This is a really old topic that has been extensively researched, so rather than reiterate it here, I'll just point you to Google. The point we do care about, though, is where we create the random numbers. As far as I can see, our options are as follows:
  1. Generate a large number of random numbers on the CPU using classic pseudo-random number generators (PRNGs), and transfer them to the GPU to be consumed as needed.
  2. Create a random number generator on the GPU, and access it from each thread
  3. Create a random number generator per thread on the GPU

While option 1 looks simple and straightforward, path tracing will use a large number of random numbers, and each number consumed has to be transferred across the bus from the CPU to the GPU. This is going to be SLOW.

Ok, with the CPU out of the picture, we need to find a PRNG for the GPU. Luckily, CUDA comes with a library that does just that: curand. But, how many PRNGs should we have, and where should they live?

To better understand the problem, let's briefly go over how PRNGs work. PRNGs use math to simulate random sequences. In order to get different numbers, they need to access and store internal state, aka data. The size of this state depends on the PRNG algorithm. 
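As an illustration (this is not what curand uses, just a toy), a linear congruential generator keeps a single integer as its entire state:
// A toy linear congruential generator: the entire state is one 32-bit integer.
// Each call advances the state and returns a new pseudo-random number.
// The constants are the common Numerical Recipes values.
struct ToyLCG {
    unsigned int state;
};
 
unsigned int NextRandom(ToyLCG *rng) {
    rng->state = 1664525u * rng->state + 1013904223u;
    return rng->state;
}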

If we choose option 2, every single thread will want access to a single generator. There will be massive contention, and random number generation will degrade to a serial operation. Again, since path tracing requires a large number of random numbers, this option will be slow.

So option 3 it is! It turns out that the size of the state for the default curand generator isn't that big, so storing state per thread isn't too bad. 


Implementing curand in the Kernel

In order to create a generator in the kernel, you have to create a curandState object and then call curand_init() on it, passing in a seed, a sequence number, and an offset.

curandState randState;
curand_init(seed, sequenceNum, offset, &randState);

Then, you can generate random numbers using:
unsigned int randInteger = curand(&randState);
float randNormalizedFloat = curand_uniform(&randState);

Two states with different seeds will create a different sequence of random numbers. Two states with the same seed will create the same sequence of random numbers.

Two states with the same seed, but different sequenceNum, will use the same sequence, but be offset to different blocks of the sequence. (Specifically, in increments of $2^{67}$.) Why would you want this? According to the documentation, "Sequences generated with the same seed and different sequence numbers will not have statistically correlated values."

Offset just manually skips ahead n in the sequence.

In the curand documentation, the creators mention that curand_init() can be relatively slow, so if you're launching the same kernel multiple times, it's usually better to keep one curandState per thread, but to store it in global memory between kernel launches. ie:
__global__ void setupRandStates(curandState *state) {
    int id = threadIdx.x + blockIdx.x * blockDim.x;
 
    // Each thread gets same seed, a different sequence number, no offset
    curand_init(1234, id, 0, &state[id]);
}
 
__global__ void useRandStates(curandState *state) {
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    
    // Copy state to local memory for efficiency 
    curandState localState = state[id];
 
    // Use localState to generate numbers....
 
 
    // Copy state back to global memory 
    state[id] = localState;
}
 
int main() {
    int numThreads = 64;
    int numBlocks = 64;
 
    // Allocate space on the device to store the random states
    curandState *d_randStates;
    cudaMalloc(&d_randStates, numBlocks * numThreads * sizeof(curandState));
 
    // Setup and use the randStates
    setupRandStates<<<numBlocks, numThreads>>>(d_randStates);
 
    for (uint i = 0; i < NUM_ITERATIONS; ++i) {
        useRandStates<<<numBlocks, numThreads>>>(d_randStates);
    }
 
    return 0;
}

This would be really nice, but there is one problem: storing all the states:
cudaMalloc(&d_randStates, numBlocks * numThreads * sizeof(curandState));

In our case, we'll be launching a thread for every pixel on the screen, aka millions of threads. While curandState isn't that large, storing millions of them is not feasible. So what can we do instead? It turns out that curand_init() is only slow if you use a non-zero sequenceNum or offset. This is quite intuitive, since using those requires the generator to skip ahead a large amount. So if we keep both sequenceNum and offset equal to zero, curand_init() is quite fast.

In order to give each thread different random numbers, we give them unique seeds. A simple method I came up with is to hash the frame number and then add the id of the thread.
uint32 WangHash(uint32 a) {
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
 
__global__ void generateRandNumbers(uint hashedFrameNumber) {
    // Global threadId
    int threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
 
    // Create random number generator
    curandState randState;
    curand_init(hashedFrameNumber + threadId, 0, 0, &randState);
 
    // Use randState to generate numbers...
}
 
int main() {
    uint frameNumber = 0;
    uint width = 1024;
    uint height = 256;
 
    dim3 Db = dim3(16, 16);   // block dimensions are fixed to be 256 threads
    dim3 Dg = dim3((width + Db.x - 1) / Db.x, (height + Db.y - 1) / Db.y);
 
    for (uint i = 0; i < NUM_ITERATIONS; ++i) {
        generateRandNumbers<<<Dg, Db >>>(WangHash(frameNumber++));
    }
 
    return 0;
}


Yay! Now we can generate lots of random numbers! If we generate a random number on each thread and output it as a greyscale color, we can make some nice white noise.


Accumulating and Averaging Colors

The last part of Monte Carlo Integration is the averaging of all the samples taken. The simplest solution is to just accumulate the colors from each frame, adding one frame to the next. Then, at the end, we divide the color at each pixel by the number of frames. I integrated this into my code by passing the inverse of the number of passes into the pixel shader that draws the texture to the screen.
cbuffer constants {
    float gInverseNumPasses;
};
 
 
Texture2D<float3> gHDRInput : register(t0);
 
float4 CopyCudaOutputToBackbufferPS(CalculatedTrianglePixelIn input) : SV_TARGET {
    return float4(gHDRInput[input.positionClip.xy] * gInverseNumPasses, 1.0f);
}
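On the host side, the constant just needs to be refreshed before each pass. A minimal sketch (m_frameNumber and UpdateShaderConstants() are hypothetical names, not the project's actual API):
// Called once per rendered pass, before drawing the accumulation texture
++m_frameNumber;
float inverseNumPasses = 1.0f / (float)m_frameNumber;
UpdateShaderConstants(inverseNumPasses); // Upload to the cbuffer above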

This way, I can see the accumulated output as it's being generated. If we create a simple kernel that outputs either pure red, green, or blue, depending on a random number, we can test if the accumulation buffer is working.
__global__ void AccumulationBufferTest(unsigned char *textureData, uint width, uint height, size_t pitch, DeviceCamera camera, uint hashedFrameNumber) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
 
    if (x >= width || y >= height) {
        return;
    }
 
    // Global threadId
    int threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
 
    // Create random number generator
    curandState randState;
    curand_init(hashedFrameNumber + threadId, 0, 0, &randState);
 
    // Generate a uniform random number
    float randNum = curand_uniform(&randState);
 
    // Get a pointer to the pixel at (x,y)
    float *pixel = (float *)(textureData + y * pitch) + 4 /*RGBA*/ * x;
 
    if (x < width && y < height) {
        // Write out pixel data
        if (randNum < 0.33f) {
            pixel[0] += 1.0f;
            pixel[1] += 0.0f;
            pixel[2] += 0.0f;
            pixel[3] = 1.0f;
        } else if (randNum < 0.66f) {
            pixel[0] += 0.0f;
            pixel[1] += 1.0f;
            pixel[2] += 0.0f;
            pixel[3] = 1.0f;
        } else {
            pixel[0] += 0.0f;
            pixel[1] += 0.0f;
            pixel[2] += 1.0f;
            pixel[3] = 1.0f;
        }
    }
}

After the first 60 frames, the image is quite noisy:

However, if we let it sit for a bit, the image converges to a gray (0.33, 0.33, 0.33) as expected:


Conclusion

Well, there we go! We can generate random numbers and average the results from several frames. The next post will cover ray-object intersections and maybe start in on path tracing itself. Stay tuned!

The code for everything in this post is on GitHub. It's open source under the Apache license, so feel free to use it in your own projects.

As always, feel free to ask questions, make comments, and if you find an error, please let me know.


Happy coding!
-RichieSams