## Monday, March 14, 2016

### Woes of the number 1.0

So, here I am, programming away on my path tracer. All is well in the world. When all of a sudden, I notice that after ~3 seconds of rendering (it's a progressive path tracer), black pixels start popping up on the screen. (One on the green sphere, a whole chain to the left of the bottom blue sphere, etc.)

Uhhhh, WHAT?!? WHY?!?!?

This is really odd, since a progressive path tracer is pretty much a running average, so how the hell can I get a perfectly black pixel pop up out of nowhere? So, I slapped a conditional breakpoint on the framebuffer color splat, using the exact pixel coordinates of the black pixels, and sure enough, NaN color.

Ok, that's all fine and dandy, but how am I getting a NaN color from the integrator? "Okay, look for all the things that can make a NaN......... AHA!!"

// Get the new ray direction
// Choose the direction based on the material
float pdf;
float3a normal = normalize(ray.Ng);
float3a wi = material->Sample(ray.dir, normal, sampler, &pdf);

// Accumulate the diffuse/specular weight
weights = weights * material->Eval(wi, normalize(ray.dir), normal) / pdf;

That divide by pdf looks suspicious. If pdf is zero, then weights would be NaN. Ok, so let's dive into Sample()

/**
* Creates a random direction in the hemisphere defined by the normal, weighted by a cosine lobe
*
* Based on http://www.rorydriscoll.com/2009/01/07/better-sampling/
*
* @param wi         The direction of the incoming light
* @param normal     The normal that defines the hemisphere
*
* @param sampler    The sampler to use for internal random number generation
* @return           A cosine weighted random direction in the hemisphere
*/
float3a Sample(float3a wi, float3a normal, UniformSampler *sampler, float *pdf) override {
// Create random coordinates in the local coordinate system
float rand = sampler->NextFloat();
float r = std::sqrtf(rand);
float theta = sampler->NextFloat() * 6.28318530718f /* 2 PI */;

float x = r * std::cosf(theta);
float y = r * std::sinf(theta);
float z = std::sqrtf(1.0f - x * x - y * y);

// Find an axis that is not parallel to normal
float3 majorAxis;
if (abs(normal.x) < 0.57735026919f /* 1 / sqrt(3) */) {
majorAxis = float3(1, 0, 0);
} else if (abs(normal.y) < 0.57735026919f /* 1 / sqrt(3) */) {
majorAxis = float3(0, 1, 0);
} else {
majorAxis = float3(0, 0, 1);
}

// Use majorAxis to create a coordinate system relative to world space
float3 u = normalize(cross(normal, majorAxis));
float3 v = cross(normal, u);
float3 w = normal;

// Transform from local coordinates to world coordinates
float3 direction =  normalize(u * x +
v * y +
w * z);

*pdf = dot(direction, normal) * M_1_PI;
return direction;
}

It creates a cosine-weighted random direction in the hemisphere. Hmmm, the only way for pdf to be zero is if dot(direction, normal) is zero. Aka, the new direction is completely perpendicular to the normal.

Ok, so if direction is perpendicular to normal, then z == 0.0 (since x, y, z are locale coordinates relative to the normal, where the normal == z). How can we get z == 0.0?

Like this:
float rand = sampler->NextFloat();
NextFloat() will return a float in the range [0.0, 1.0]. So, let's suppose it returns 1.0
float r = std::sqrtf(rand);

r = 1.0, since sqrt(1.0) == 1.0

float theta = sampler->NextFloat() * 6.28318530718f /* 2 PI */;

float x = r * std::cosf(theta);
float y = r * std::sinf(theta);
float z = std::sqrtf(1.0f - x * x - y * y);
Since r = 1.0, x * x + y * y will always equal 1.0, so z == 0.0

This is really annoying! It makes perfect sense for 1.0 to be a valid random number, but it completely destroys this particular algorithm....

So, we can either check for 1.0 and reject it, or redefine our random number generator to only generate on [0.0, 1.0). Sample() is going to get called ALOT, so adding a branch is no fun. Granted, branch prediction is going to give us a big help, but still seems kind of gross.

So I went for the latter approach, and fixed the random number generator to [0.0, 1.0). This seems to be the standard for other mathematical things, so perhaps there are other algorithms that don't play well with 1.0?

So in the end, random black pixels on the screen were caused by:
1. Random number generator procs a 1.0
2. The cosine-weighted sampler can't handle 1.0, causing the pdf to be 0.0
3. The pdf is later divided through the sample (as per Monte Carlo integration)
4. Which causes a NaN
5. And since anything added to NaN is NaN, the pixel is now permanently NaN.
6. When the frame buffer is passed to OpenGL to render, it interprets NaN as (0.0, 0.0, 0.0, 1.0)

-RichieSams

## Thursday, April 9, 2015

### Making Our First Pretty Picture

This is the fourth post in a series documenting my adventures in creating a GPU path tracer from scratch
1. Tracing Light in a Virtual World
2. Creating Randomness and Accumulating Change
3. Shooting Object from Across the Way

So we're finally here! We've learned everything we need to actually implement a path tracer! First off, let me apologize for taking so long to get this post up. Things started getting busy at work again and I was in a car accident last week. So I've been recovering from that. Also, the implementation I showed a picture of last post turned out to be mathematically incorrect. (The current simulation is still far from physically correct, but I'll get to that later in the post) But, that's in the past now; let's get started.

### A Quick Review

Before we launch into the code. let's first go over what we're trying to do. For each pixel we want to simulate the light that bouces around the scene in order to calculate a color. This calculation can be described by the Rendering Equation:
$L_{\text{o}}(p, \omega_{\text{o}}) = L_{e}(p, \omega_{\text{o}}) + \int_{\Omega} f(p, \omega_{\text{o}}, \omega_{\text{i}}) L_{\text{i}}(p, \omega_{\text{i}}) \left | \cos \theta_{\text{i}} \right | d\omega_{\text{i}}$
Wow, that's looks intimidating. But once we break it down it's actually a pretty simple concept. In English, the equation is:
The outgoing light from a point equals the emmisive light from the point itself, plus all the incoming light attenuated by a function of the material (the BSDF) and the direction of the incoming light.
The integral is just a mathematical way of adding up all the incoming light coming from the hemisphere around the point.

That said, once you start taking into account light reflected off of things, being absorbed by things, being refracted by things, etc, things start to become very complex. For almost all scenes, this integral can not be analytically solved.

A very simple approximation of this integral is to only take into account the light that is directly coming from light sources. With this, the integral collapses into a simple sum over all the lights in the scene. Aka:
$L_{\text{o}}(p, \omega_{\text{o}}) = L_{e}(p, \omega_{\text{o}}) + \sum_{k = 0}^{n} f(p, \omega_{\text{o}}, \omega_{\text{i}}) L_{\text{i, k}}(p, \omega_{\text{i}}) \left | \cos \theta_{\text{i}} \right |$
This is what Direct Lighting Ray Tracers do. They shoot rays from the point to each light, and test for obscurance between them.

However, you can simplify it further by also ignoring the obscurance checks. That is, you light the pixel with the assumption that it is the only thing in the scene. This is exactly what most real-time renderers do. It is known as a local lighting model. It's extremely fast because each pixel can be lit independently; the incoming light at the pixel is determined only by the lights in the scene, not on the scene itself. However, the speed comes at a cost. By disregarding the scene, you lose all global illumination effects. Aka, shadows, light bleeding, indirect light, specular reflections, etc. These have to be approximated using tricks and/or hacks. (Shadow mapping, light baking, screen space reflections, etc.)

Another way to approximate the integral is by using Monte Carlo Integration.
$\int f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{pdf(x_{i})}$
It says that you can approximate an integral by averaging successive random samples from the function. As $N$ gets large, the approximation gets closer and closer to the solution. $pdf(x_{i})$ represents the probability density function of each random sample.

Path tracing uses Monte Carlo Integration to solve the rendering equation. We sample by tracing randomly selected light paths around the scene.

### The Path Tracing Kernel

I figure the best way to understand the path tracing algorithm is just to post the whole kernel, and then go over each line / function, explaining as we go. So here's the whole thing:
__global__ void PathTraceKernel(unsigned char *textureData, uint width, uint height, size_t pitch, DeviceCamera *g_camera, Scene::SceneObjects *g_sceneObjects, uint hashedFrameNumber) {
// Create a local copy of the arguments
DeviceCamera camera = *g_camera;
Scene::SceneObjects sceneObjects = *g_sceneObjects;

int threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;

// Create random number generator
curandState randState;

int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;

// Calculate the first ray for this pixel
Scene::Ray ray = {camera.Origin, CalculateRayDirectionFromPixel(x, y, width, height, camera, &randState)};

float3 pixelColor = make_float3(0.0f, 0.0f, 0.0f);
float3 accumulatedMaterialColor = make_float3(1.0f, 1.0f, 1.0f);

// Bounce the ray around the scene
for (uint bounces = 0; bounces < 10; ++bounces) {
// Initialize the intersection variables
float closestIntersection = FLT_MAX;
float3 normal;
Scene::LambertMaterial material;

TestSceneIntersection(ray, sceneObjects, &closestIntersection, &normal, &material);

// Find out if we hit anything
if (closestIntersection < FLT_MAX) {
// We hit an object

pixelColor += accumulatedMaterialColor * material.EmmisiveColor;

// Shoot a new ray

// Set the origin at the intersection point
ray.Origin = ray.Origin + ray.Direction * closestIntersection;
// Offset the origin to prevent self intersection
ray.Origin += normal * 0.001f;

// Choose the direction based on the material
if (material.MaterialType == Scene::MATERIAL_TYPE_DIFFUSE) {
ray.Direction = CreateUniformDirectionInHemisphere(normal, &randState);

// Accumulate the diffuse color
accumulatedMaterialColor *= material.MainColor /* * (1 / PI)  <- this cancels with the PI in the pdf */ * dot(ray.Direction, normal);

// Divide by the pdf
accumulatedMaterialColor *= 2.0f; // pdf == 1 / (2 * PI)
} else if (material.MaterialType == Scene::MATERIAL_TYPE_SPECULAR) {
ray.Direction = reflect(ray.Direction, normal);

// Accumulate the specular color
accumulatedMaterialColor *= material.MainColor;
}

// Russian Roulette
if (bounces > 3) {
float p = max(accumulatedMaterialColor.x, max(accumulatedMaterialColor.y, accumulatedMaterialColor.z));
if (curand_uniform(&randState) > p) {
break;
}
accumulatedMaterialColor *= 1 / p;
}
} else {
// We didn't hit anything, return the sky color
pixelColor += accumulatedMaterialColor * make_float3(0.846f, 0.933f, 0.949f);

break;
}
}

if (x < width && y < height) {
// Get a pointer to the pixel at (x,y)
float *pixel = (float *)(textureData + y * pitch) + 4 /*RGBA*/ * x;

// Write pixel data
pixel += pixelColor.x;
pixel += pixelColor.y;
pixel += pixelColor.z;
// Ignore alpha, since it's hardcoded to 1.0f in the display
// We have to use a RGBA format since CUDA-DirectX interop doesn't support R32G32B32_FLOAT
}
}

### Initializing all the Variables

// Create a local copy of the arguments
DeviceCamera camera = *g_camera;
Scene::SceneObjects sceneObjects = *g_sceneObjects;

int threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;

// Create random number generator
curandState randState;
curand_init(hashedFrameNumber + threadId, 0, 0, &randState);

The first thing we do is create a local copy of some of the pointer arguments that are passed into the kernel. It means one less indirection when fetching from global memory. We fetch multiple variables from the camera and we fetch the scene objects many times.

Next we calculate the global id for the thread and use it to create a random state so we can generate random numbers. We covered this in the second post of this series.

### Shooting the First Ray

int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;

// Calculate the first ray for this pixel
Scene::Ray ray = {camera.Origin, CalculateRayDirectionFromPixel(x, y, width, height, camera, &randState)};

Then we shoot the initial ray from the camera origin through the pixel for this thread. We covered this in the first post. However, I added one thing to CalculateRayDirectionFromPixel().
__device__ float3 CalculateRayDirectionFromPixel(uint x, uint y, uint width, uint height, DeviceCamera &camera, curandState *randState) {
float3 viewVector = make_float3((((x + curand_uniform(randState)) / width) * 2.0f - 1.0f) * camera.TanFovXDiv2,
-(((y + curand_uniform(randState)) / height) * 2.0f - 1.0f) * camera.TanFovYDiv2,
1.0f);

// Matrix multiply
return normalize(make_float3(dot(viewVector, camera.ViewToWorldMatrixR0),
dot(viewVector, camera.ViewToWorldMatrixR1),
dot(viewVector, camera.ViewToWorldMatrixR2)));
}

In the first post, we always shot the ray through the center of the pixel. ie: (x + 0.5, y + 0.5). Now we use a random number to jitter the ray within the pixel. Why would we do this? By jittering the camera ray, we effectively supersample the scene. Since we're going to shoot the same number of rays either way, we effectively get supersampled anti-aliasing for free.

Let's look at one of the renders from the third post with and without the jitter: No jitter: Ewwwwwww. Look at all those jaggies...... With jitter: Silky smooth :)

### Start Bouncing!

float3 pixelColor = make_float3(0.0f, 0.0f, 0.0f);
float3 accumulatedMaterialColor = make_float3(1.0f, 1.0f, 1.0f);

// Bounce the ray around the scene
for (uint bounces = 0; bounces < 10; ++bounces) {

Here we initialize two float3's used to accumulate color as we bounce. I'll cover them in detail below. Then we start a for loop the determines how many times we can bounce around the scene.

### Shooting the Scene

// Initialize the intersection variables
float closestIntersection = FLT_MAX;
float3 normal;
Scene::LambertMaterial material;

TestSceneIntersection(ray, sceneObjects, &closestIntersection, &normal, &material);

We initialize the intersection variables and then shoot the ray through the scene. TestSceneIntersection() loops through each object in the scene and tries to intersect it. If it hits, it tests if the new intersection is closer than the current. If so, it replaces the current intersection with the new one. (And updates the normal / material).

__device__ void TestSceneIntersection(Scene::Ray &ray, Scene::SceneObjects &sceneObjects, float *closestIntersection, float3 *normal, Scene::LambertMaterial *material) {
// Try to intersect with the planes
for (uint j = 0; j < sceneObjects.NumPlanes; ++j) {
// Make a local copy
Scene::Plane plane = sceneObjects.Planes[j];

float3 newNormal;
float intersection = TestRayPlaneIntersection(ray, plane, newNormal);
if (intersection > 0.0f && intersection < *closestIntersection) {
*closestIntersection = intersection;
*normal = newNormal;
*material = sceneObjects.Materials[plane.MaterialId];
}
}

// Try to intersect with the rectangles;
for (uint j = 0; j < sceneObjects.NumRectangles; ++j) {
// Make a local copy
Scene::Rectangle rectangle = sceneObjects.Rectangles[j];

float3 newNormal;
float intersection = TestRayRectangleIntersection(ray, rectangle, newNormal);
if (intersection > 0.0f && intersection < *closestIntersection) {
*closestIntersection = intersection;
*normal = newNormal;
*material = sceneObjects.Materials[rectangle.MaterialId];
}
}

// Try to intersect with the circles;
for (uint j = 0; j < sceneObjects.NumCircles; ++j) {
// Make a local copy
Scene::Circle circle = sceneObjects.Circles[j];

float3 newNormal;
float intersection = TestRayCircleIntersection(ray, circle, newNormal);
if (intersection > 0.0f && intersection < *closestIntersection) {
*closestIntersection = intersection;
*normal = newNormal;
*material = sceneObjects.Materials[circle.MaterialId];
}
}

// Try to intersect with the spheres;
for (uint j = 0; j < sceneObjects.NumSpheres; ++j) {
// Make a local copy
Scene::Sphere sphere = sceneObjects.Spheres[j];

float3 newNormal;
float intersection = TestRaySphereIntersection(ray, sphere, newNormal);
if (intersection > 0.0f && intersection < *closestIntersection) {
*closestIntersection = intersection;
*normal = newNormal;
*material = sceneObjects.Materials[sphere.MaterialId];
}
}
}

I added support for two more intersections since the last post. Ray-Circle and Ray-Rectangle. (It actually supports any parallelogram, not just rectangles). The idea for both is to first test intersection with the plane, then test if the intersection point is inside the respective shapes. The code/comments should be self explanatory, but feel free to comment if you have a question.

### Direct Hit!!!!           Or Perhaps Not....

This is where the true path tracing starts to come in. Once we've traced the ray through the scene, there are 2 outcomes: either we hit something, or we missed.

For real-life materials, light doesn't just bounce off the surface of the material, but rather, it also enters the material itself, bouncing and reflecting off the molecules.
Image from Naty Hoffman's Siggraph 2013 Presentation on Physically Based Shading

As you can image, it would be prohibitively expensive to try to calculate the exact scattering of light within the material. Therefore, a common simplification is to calculate the interaction of light with the surface in two parts: diffuse and specular.
Image from Naty Hoffman's Siggraph 2013 Presentation on Physically Based Shading

The diffuse term represents the refraction, absorption, and scattering of light within the material. It is convenient to ignore entry to exit distance for the diffuse term. Doing so allows us to compute the diffuse lighting at a single point.

The specular term represents the reflected light off the surface, aka, mirror-like reflections.

For now, we're going to assume a material is either a perfect diffuse Lambertian diffuse material, or a perfect specular mirror in order to keep things simple.

First, let's look at the case where we hit a perfectly diffuse surface lit by a single light source, $L_{\text{i}}$. If we plug that into the Monte Carlo Estimate of the Rendering Equation for a single sample, $\text{k}$, we get:

$L_{\text{o, k}} = L_e + \frac{L_{\text{i, k}} \: \frac{materialColor_{\text{diffuse}}}{\pi} \: (n \cdot v)}{pdf}$
where $n$ is the surface normal where we hit, and $v$ is the unit vector pointing from the surface to the light. We use the fact that $\left | \cos \theta \right | \equiv (n \cdot v)$ since both $n$ and $v$ are unit length.

For path tracing, $L_{\text{i}}$ represents the light coming from the next path. ie:
$L_{\text{i, k}} = L_{\text{o, k + 1}}$
It's a classic recursive algorithm. ie:
float3 CalculateColorForRay(Scene::Ray &ray, Scene::SceneObjects &sceneObjects) {
// Initialize the intersection variables
float closestIntersection = FLT_MAX;
float3 normal;
Scene::LambertMaterial material;

TestSceneIntersection(ray, sceneObjects, &closestIntersection, &normal, &material);

// Find out if we hit anything
if (closestIntersection < FLT_MAX) {
// We hit an object

// Shoot a new ray
ray.Origin = ray.Origin + ray.Direction * closestIntersection;

if (material.MaterialType == Scene::MATERIAL_TYPE_DIFFUSE) {
// We hit a diffuse surface
ray.Direction = CreateNewRayDirection();
} else if (material.MaterialType == Scene::MATERIAL_TYPE_SPECULAR) {
// We hit a specular surface
ray.Direction = reflect(ray.Direction, normal);
}

return material.EmmisiveColor + CalculateColorForRay(ray, sceneObjects) * material.MainColor * dot(ray.Direction, normal) * 2.0f /* pdf */;
} else {
// We didn't hit anything, return the sky color
return make_float3(0.846f, 0.933f, 0.949f);
}
}

Unfortunately for us, the GPU doesn't handle recursion well. In fact, it's forbidden in most, if not all, of the GPU programming languages. That said, since multiplication is both commutative and distributive, we can factor out all the $\frac{\frac{materialColor_{\text{diffuse}}}{\pi} \: (n \cdot v)}{pdf}$ terms and turn our path tracing algorithm into a iterative solution:
float3 pixelColor = make_float3(0.0f, 0.0f, 0.0f);
float3 accumulatedMaterialColor = make_float3(1.0f, 1.0f, 1.0f);

// Bounce the ray around the scene
for (uint bounces = 0; bounces < 10; ++bounces) {
// Initialize the intersection variables
float closestIntersection = FLT_MAX;
float3 normal;
Scene::LambertMaterial material;

TestSceneIntersection(ray, sceneObjects, &closestIntersection, &normal, &material);

// Find out if we hit anything
if (closestIntersection < FLT_MAX) {
// We hit an object

pixelColor += accumulatedMaterialColor * material.EmmisiveColor;

// Shoot a new ray

// Set the origin at the intersection point
ray.Origin = ray.Origin + ray.Direction * closestIntersection;
// Offset the origin to prevent self intersection
ray.Origin += normal * 0.001f;

// Choose the direction based on the material
if (material.MaterialType == Scene::MATERIAL_TYPE_DIFFUSE) {
ray.Direction = CreateUniformDirectionInHemisphere(normal, &randState);

// Accumulate the diffuse color
accumulatedMaterialColor *= material.MainColor /* * (1 / PI)  <- this cancels with the PI in the pdf */ * dot(ray.Direction, normal);

// Divide by the pdf
accumulatedMaterialColor *= 2.0f; // pdf == 1 / (2 * PI)
} else if (material.MaterialType == Scene::MATERIAL_TYPE_SPECULAR) {
ray.Direction = reflect(ray.Direction, normal);

// Accumulate the specular color
accumulatedMaterialColor *= material.MainColor;
}
} else {
// We didn't hit anything, return the sky color
pixelColor += accumulatedMaterialColor * make_float3(0.846f, 0.933f, 0.949f);

break;
}
}

### Bouncing is What Tiggers Rays Do Best

The last thing to do is to choose the direction for the next path. For mirror specular reflections, this is as simple as using the reflect function:
ray.Direction = reflect(ray.Direction, normal);


Since it is a perfect reflection, the probability of the reflection in the chosen direction is 1. ie, $pdf_{\text{specular}} = 1$

However, for diffuse reflections, it's a different story. By definition, the diffuse term can represent any internal reflection. Therefore, the light can come out in any direction within the unit hemisphere defined by the surface normal:

So, we need to randomly pick a direction in the unit hemisphere. For now, we'll use use a uniform distribution of directions. In a later post, we'll explore other distributions.
__device__ float3 CreateUniformDirectionInHemisphere(float3 normal, curandState *randState) {
// Create a random coordinate in spherical space
// Then calculate the cartesian equivalent
float z = curand_uniform(randState);
float r = sqrt(1.0f - z * z);
float phi = TWO_PI * curand_uniform(randState);
float x = cos(phi) * r;
float y = sin(phi) * r;

// Find an axis that is not parallel to normal
float3 majorAxis;
if (abs(normal.x) < INV_SQRT_THREE) {
majorAxis = make_float3(1, 0, 0);
} else if (abs(normal.y) < INV_SQRT_THREE) {
majorAxis = make_float3(0, 1, 0);
} else {
majorAxis = make_float3(0, 0, 1);
}

// Use majorAxis to create a coordinate system relative to world space
float3 u = normalize(cross(majorAxis, normal));
float3 v = cross(normal, u);
float3 w = normal;

// Transform from spherical coordinates to the cartesian coordinates space
// we just defined above, then use the definition to transform to world space
return normalize(u * x +
v * y +
w * z);
}

We use two uniformly distributed random numbers to create a random point in the spherical coordinate system. Then we use some trig to convert to Cartesian coordinates. Finally, we transform the random point from local coordinates (ie, relative to the normal) to world coordinates.

To create the transformation coordinate system, we first find the world axis that is most perpendicular to the normal by comparing each axis of the normal to $\frac{1}{\sqrt{3}}$. $\frac{1}{\sqrt{3}}$ is significant because it's the length at which all 3 axes are equal (since we know the normal is a unit vector). So if any are less than $\frac{1}{\sqrt{3}}$, there is a good chance that axis is the smallest.

Next, we use the cross product to find an axis that is perpendicular to the normal. And finally, we use the cross product again to find an axis that is perpendicular to both the new axis and the normal.

The $pdf$ for a uniformly distributed direction in a hemisphere is:
$pdf_{\text{hemi}} = \frac{1}{2 \pi}$
This corresponds to the solid angle that the random directions can come from. A hemisphere has a solid angle of $2\pi$ steradians.

### Shooting Yourself in the Head

// Russian Roulette
if (bounces > 3) {
float p = max(accumulatedMaterialColor.x, max(accumulatedMaterialColor.y, accumulatedMaterialColor.z));
if (curand_uniform(&randState) > p) {
break;
}
accumulatedMaterialColor /= p;
}

During path tracing, we keep bouncing until we miss the scene or exceed the maximum number of bounces. But if the scene is closed and maxBounces is large, we could end up bouncing for a very long time. Remember though, that at each bounce, we accumulate the light attenuation of the material we hit ie: accumulatedMaterialColor will end up getting closer and closer to zero the more bounces we do. Since  accumulatedMaterialColor ends up being multiplied by the emitted light color, it makes very little sense to keep bouncing after a while, since the result won't add very much to the final pixel color.

However, we can't arbitrarily stop bouncing after accumulatedMaterialColor goes below a certain threshold, since this would create bias in the Monte Carlo Integration. What we can do, though, is use a random probability to determine the stopping point, as long as we divide out the $pdf$ of the random sample. This is known as Russian Roulette.

In this implementation, we create a uniform random sample and then compare it against the max component of accumulatedMaterialColor. If the random sample is greater than the max component then we terminate the tracing. If not, we keep the integration unbiased by dividing by the $pdf$ of our calculation. Since we used a uniform random variable, the $pdf$ is just:
$pdf = \text{p}$

### Success!

If we whip up a simple scene with a plane and 9 balls we get:

Whooo!! Look at those nice soft shadows! And the color bleeding! Houston, this is our first small step towards a battle-ready photo-realistic GPU renderer.

Let's do another! Obligatory Cornell Box:

One thing to note: since the Cornell box scene has a light with very small surface area in comparison to the surface area of the rest of the scene, the probability that a ray will hit the light is relatively low. Therefore, it takes a long time to converge. The render above was around 30 minutes. In comparison, the 9 balls scene took about 60 seconds to resolve to an acceptable level.

### Closing Words

There we go. Our very own path tracer. I shall call him Squishy and he shall be mine, and he shall be my Squishy.    (I really need to stop watching videos right before writing this blog)

The code for everything in this post is on GitHub. It's open source under the Apache license, so feel free to use it in your own projects.

As always, feel free to ask questions, make comments, and if you find an error, please let me know.

Happy coding!
-RichieSams

## Friday, March 13, 2015

### Shooting Objects from Across the Way

This is the third post in a series documenting my adventures in creating a GPU path tracer from scratch
1. Tracing Light in a Virtual World
2. Creating Randomness and Accumulating Change

If you recall, the very high level algorithm for path tracing is:
1. From each pixel, fire a ray out into the scene from the eye.
2. If the ray hits an object
1. Use the material properties to accumulate attenuation
2. Bounce the ray in a new direction
3. GOTO 2
3. If the ray hits a light
1. Multiply the light and the attenuation
2. Add the result to the accumulation buffer
3. GOTO 5
4. If the ray doesn't hit anything
1. GOTO 5
5. GOTO 1 until sufficiently converged

The first post in the series covered step 1 and the second post covered step 3b. This post is going to cover how to test if a ray hits an object in the scene (aka, 2 and 3).

### Ray - Sphere Intersection

Foreword: Scratchapixel has an excellent lesson covering ray-object intersections. Much of what I say below is based on their lessons. Much props to the authors.

A sphere can be represented mathematically with vector representing the location of the sphere's center, and its radius. (You can save a multiplication in a later calculation if you store the radius squared instead)
struct Sphere {
float3 Center;
};

A ray can be represented as a vector representing its origin and a unit vector representing its direction:
struct Ray {
float3 Origin;
float3 Direction;
};

Let's look at a picture of the most common type of ray - sphere intersection:
\begin{align} \overrightarrow{R_{o}} &= ray \: origin \\ \overrightarrow{R_{d}} &= ray \: direction \\ \overrightarrow{S_{c}} &= sphere \: center \\ S_{r} &= sphere \: radius \\ \overrightarrow{P_{0}} &= first \: intersection \: point \\ \overrightarrow{P_{1}} &= second \: intersection \: point \\ t_{0} &= distance \: from \: \overrightarrow{R_{o}} \: to \: \overrightarrow{P_{0}} \\ t_{1} &= distance \: from \: \overrightarrow{R_{o}} \: to \: \overrightarrow{P_{1}} \\ \end{align}

We would like to find $\overrightarrow{P_{0}}$ and $\overrightarrow{P_{1}}$. Mathematically, they are defined as:
\begin{align} \overrightarrow{P_{0}} &= \overrightarrow{R_{o}} + t_{0} \overrightarrow{R_{d}} \\ \overrightarrow{P_{1}} &= \overrightarrow{R_{o}} + t_{1} \overrightarrow{R_{d}} \\ \end{align}

We already know $\overrightarrow{R_{o}}$ and $\overrightarrow{R_{d}}$, so we just need to find $t_{0}$ and $t_{1}$. In order to do so, let's define a few new variables.

And then define $t_{0}$ and $t_{1}$ in terms of $t_{ca}$ and $t_{hc}$:
\begin{align} t_{0} &= t_{ca} - t_{hc} \\ t_{1} &= t_{ca} + t_{hc} \\ \end{align}

Now we can begin solving for our unknown variables. To start, let's look at the right triangle formed by $t_{ca}$, $\overrightarrow{L}$, and $d$.

We can solve for $t_{ca}$ by using the definition of cosine and the dot product:
$\cos\left ( \theta \right ) = \frac{adjacent}{hypotenuse} = \frac{t_{ca}}{\left \| \overrightarrow{L} \right \|}$
$\overrightarrow{m} \cdot \overrightarrow{n} = \left \| m \right \| \left \| n \right \| \cos \left ( \theta \right )$

$\begin{split} ie: \end{split} \qquad \qquad \begin{split} \overrightarrow{R_{d}} \cdot \overrightarrow{L} &= \left \| \overrightarrow{R_{d}} \right \| \left \| \overrightarrow{L} \right \| \cos \left ( \theta \right ) \\ &= \frac{\left \| \overrightarrow{L} \right \| t_{ca}}{\left \| \overrightarrow{L} \right \|} \\ &= t_{ca} \end{split}$

$\overrightarrow{R_{d}}$ is a unit vector. Therefore, $\left \| \overrightarrow{R_{d}} \right \| = 1$ and cancels out. Then, if we replace $\cos \left ( \theta \right )$ with its definition, we can cancel $\left \| \overrightarrow{L} \right \|$ from top and bottom. Thus, we're left with just $t_{ca}$.

Using the Pythagorean Theorem and a trick with the dot product, we can solve for d:
$\overrightarrow{m} \cdot \overrightarrow{m} \equiv \left \| \overrightarrow{m} \right \| ^{2}$

\begin{align} t_{ca} \: ^{2} + d^{2} &= \left \| \overrightarrow{L} \right \| ^{2}\\ t_{ca} \: ^{2} + d^{2} &= \overrightarrow{L} \cdot \overrightarrow{L} \\ d^{2} &= \overrightarrow{L} \cdot \overrightarrow{L} - t_{ca} \: ^{2} \end{align}

To solve for $t_{hc}$, let's look at the triangle formed by $S_{r}$, $t_{hc}$, and $d$.

Using the Pythagorean Theorem again, we can solve for $t_{hc}$:
\begin{align} t_{hc} \: ^{2} + d^{2} &= S_{r} \: ^{2} \\ t_{hc} &= \sqrt{S_{r} \: ^{2} - d^{2}} \end{align}

Whew! We made it! Using the above equations, we can find the two intersection points. Let's look at some code that implements the above and walk through some special cases / early outs.
/**
* Test for the intersection of a ray with a sphere
*
* @param ray           The ray to test
* @param sphere        The sphere to test
* @param normal_out    Filled with normal of the surface at the intersection point. Not changed if no intersection.
* @return              The distance from the ray origin to the nearest intersection. -1.0f if no intersection
*/
__device__ float TestRaySphereIntersection(Scene::Ray &ray, Scene::Sphere &sphere, float3 &normal_out) {
float3 L = sphere.Center - ray.Origin;
float t_ca = dot(L, ray.Direction);

// Ray points away from the sphere
if (t_ca < 0) {
return -1.0f;
}

float d_squared = dot(L, L) - t_ca * t_ca;

// Ray misses the sphere
return -1.0f;
}

float t_hc = sqrt(sphere.RadiusSquared - d_squared);

float t_0 = t_ca - t_hc;
float t_1 = t_ca + t_hc;

float nearestIntersection;
float normalDirection;
if (t_0 > 0 && t_1 > 0) {
// Two intersections
// Return the nearest of the two
nearestIntersection = min(t_0, t_1);

normalDirection = 1.0f;
} else {
// Ray starts inside the sphere

// Return the far side of the sphere
nearestIntersection = max(firstIntersection, secondIntersection);

// We reverse the direction of the normal, since we are inside the sphere
normalDirection = -1.0f;
}

normal_out = normalize(((ray.Origin + (ray.Direction * nearestIntersection)) - sphere.Center) * normalDirection);

return nearestIntersection;
}

The algorithm can be summarized as follows:
1. Calculate $t_{ca}$
2. If $t_{ca}$ is negative, $\overrightarrow{R_{d}}$ is pointing away from the sphere. Thus, there can not be an intersection
3. Calculate $d^{2}$
4. If $d^{2}$ is greater than $S_{r} \: ^{2}$, the ray misses the sphere.
5. Calculate $t_{hc}$
6. Calculate $t_{0}$ and $t_{1}$.
7. If $t_{0}$ and $t_{1}$ are both positive, the ray starts outside the sphere and intersects it twice. Choose the closest of the two intersections
8. If either $t_{0}$ or $t_{1}$ is negative, the ray starts inside the sphere and intersects it on the way out. Choose the positive intersection.
• They both can't both be negative, since that would mean the ray is pointing away from the sphere, and we already checked for that.
• See following picture.

### Ray - Plane Intersection

A plane can be represented using any point on the plane, and a unit normal vector from the plane.
struct Plane {
float3 Point;
float3 Normal;
};

\begin{align} \overrightarrow{P_{p}} &= plane \: point \\ \overrightarrow{P_{n}} &= plane \: normal \end{align}

Let's look at the intersection from the top.

We would like to find $d$. We can start by taking the dot product between $\overrightarrow{R_{d}}$ and  $\overrightarrow{P_{n}}$. We get $\cos \left ( \theta_{1} \right )$, since both  $\overrightarrow{R_{d}}$ and  $\overrightarrow{P_{n}}$ are unit vectors.
\begin{align} \overrightarrow{R_{d}} \cdot \overrightarrow{P_{n}} &= \left \| \overrightarrow{R_{d}} \right \| \left \| \overrightarrow{P_{n}} \right \| \cos \left ( \theta_{1} \right ) \\ &= \cos \left ( \theta_{1} \right ) \end{align}

Let's look at the triangle formed by $\overrightarrow{P_{n}}$ and $\overrightarrow{L}$, where $\overrightarrow{L}$ is the vector between $\overrightarrow{P_{p}}$ and $\overrightarrow{R_{o}}$.

If we take the dot product between $\overrightarrow{P_{n}}$ and  $\overrightarrow{L}$, we get $a$.

\begin{align} \overrightarrow{P_{n}} \cdot \overrightarrow{L} &= \left \| \overrightarrow{P_{n}} \right \| \left \| \overrightarrow{L} \right \| \cos \left ( \theta_{2} \right ) \\ &= \frac{\left \| \overrightarrow{L} \right \| a}{\left \| \overrightarrow{L} \right \|} \\ &= a \end{align}

Finally, let's look at the two triangles formed between $\overrightarrow{P_{n}}$, $\overrightarrow{R_{d}}$, and the plane itself.

They are similar triangles, so we can solve for $d$ using the Law of Similar Triangles and the definition of cosine:
$\cos \left ( \theta_{1} \right ) = \frac{b}{c}$

\begin{align} \frac{a + b}{c + d} &= \frac{b}{c} \\ c \left ( a + b \right ) &= b \left ( c + d \right ) \\ ac + bc &= bc + bd \\ ac &= bd \\ d &= \frac{ac}{b} \\ d &= \frac{a}{\cos \left ( \theta_{1} \right )} \end{align}

Let's look at the code implementation of the above and walk through it.
/**
* Test for the intersection of a ray with a plane
*
* @param ray           The ray to test
* @param plane         The plane to test
* @param normal_out    Filled with normal of the surface at the intersection point. Not changed if no intersection.
* @return              The distance from the ray origin to the nearest intersection. -1.0f if no intersection
*/
__device__ float TestRayPlaneIntersection(Scene::Ray &ray, Scene::Plane &plane, float3 &normal_out) {
float cos_theta1 = dot(plane.Normal, ray.Direction);

// If cos_theta1 is greater than -epison,
// the ray is perpendicular to the plane or points away from the plane normal
if (cos_theta1 > -1.0e-6f) {
return -1.0f;
}

normal_out = plane.Normal;
float a = dot(plane.Normal, plane.Point - ray.Origin);
return a / cos_theta1;
}

The algorithm can be summarized as follows:
1. Calculate $\cos \left ( \theta_{1} \right )$
2. If $\cos \left ( \theta_{1} \right )$ is greater than or equal to zero, the ray is perpendicular to the plane, or faces away
• In the code, we use a small episilon to account for floating point innacuracies
3. Calculate a
4. Return $\frac{a}{\cos \left ( \theta_{1} \right )}$

### Testing Them Out

In order to test the code out, I added on to the eye ray code we created in the first post. After creating the ray, I tried to intersect it with an object in the scene. If it hit, I did some basic Lambertian shading with a hard-coded directional light. If it didn't, I colored the pixel black. Here's the results:

Very nice! If I do say so myself. I don't have a plane in the scene because I didn't implement the plane intersection code until I had skipped ahead to the path tracing itself. (I got too excited. Ha ha!) So you'll have to take my word for it that the code does work. Below is the kernel code used to create the image above:
__global__ void PathTraceKernel(unsigned char *textureData, uint width, uint height, size_t pitch, DeviceCamera *g_camera, Scene::Sphere *g_spheres, uint numSpheres, uint hashedFrameNumber) {
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;

// Create a local copy of the camera
DeviceCamera camera = *g_camera;

int threadId = (blockIdx.x + blockIdx.y * gridDim.x) * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;

// Create random number generator
curandState randState;

// Calculate the first ray for this pixel
Scene::Ray ray = {camera.Origin, CalculateRayDirectionFromPixel(x, y, width, height, camera, &randState)};

// Generate a uniform random number
//float randNum = curand_uniform(&randState);

// Try to intersect with the spheres;
float closestIntersection = FLT_MAX;
float3 normal;
for (uint i = 0; i < numSpheres; ++i) {
float3 newNormal;
float intersection = TestRaySphereIntersection(ray, g_spheres[i], newNormal);
if (intersection > 0.0f && intersection < closestIntersection) {
closestIntersection = intersection;
normal = newNormal;
}
}

float3 pixelColor;
if (closestIntersection < FLT_MAX) {
float attentuation = max(dot(normal, make_float3(0.70710678118f, 0.70710678118f, -0.70710678118f)), 0.0f);
pixelColor = make_float3(0.846, 0.933, 0.949) * attentuation + make_float3(0.15f, 0.15f, 0.15f);
} else {
pixelColor = make_float3(0.0f, 0.0f, 0.0f);
}

if (x < width && y < height) {
// Get a pointer to the pixel at (x,y)
float *pixel = (float *)(textureData + y * pitch) + 4 /*RGBA*/ * x;

// Write out pixel data
pixel += pixelColor.x;
pixel += pixelColor.y;
pixel += pixelColor.z;
// Ignore alpha, since it's hardcoded to 1.0f in the display
// We have to use a RGBA format since CUDA-DirectX interop doesn't support R32G32B32_FLOAT
}


### Conclusion

There we go! We now have everything we need to do path tracing. The next post will be just that: creating a basic path tracing kernel. I've already implemented it in code, so it's just a matter of how quickly I can write up the post. (teaser picture). Keep an eye out!

The code for everything in this post is on GitHub. It's open source under the Apache license, so feel free to use it in your own projects. The path tracing code is already pushed to the repo, if you're feeling antsy, you can give that a look.

As always, feel free to ask questions, make comments, and if you find an error, please let me know.

Happy coding!
-RichieSams