• ### Blog Entries

• entries
5
30
• views
28578

We build a fully fledged ray tracer from scratch.

## Meshes and Scenes

In the last entry some simple rendering techniques aimed at improving the speed and quality of the renders were presented. In this short entry the ray tracer will be extended in two rather major ways:

• it will be able to render arbitrary triangle meshes rather than just spheres ("analytical" sphere rendering will be dropped in favor of triangle-only rendering for simplicity, but this is not obligatory, both can work together)
• a simple file format will be defined to be able to load a bunch of models and materials as a single scene, instead of hardcoding everything inside the program
This entry won't be at all math-heavy like the previous entry, so it should be easy reading for the most part. Let's jump into it!

________________

# Acceleration Structures

Recall from the previous entries that the ray tracer uses exactly two functions to interact with geometry:

• Intersection query: given a ray, return the closest intersection of the ray with the geometry
• Occlusion query: given a ray, return whether the ray intersects the geometry at all
It was also found to be useful to be able to perform these queries on a subinterval [min-distance, max-distance] of the ray rather than the entire ray [0, infinity), particularly for occlusion checks for the point light sources, where we only want to check for occluding objects along the segment between some point and said point light source.

These functions are pretty generic. Clearly they are applicable to any kind of geometry, not just spheres, and it would be nice to be able to render arbitrary geometry like, you know, actual 3D models. But how to implement these queries efficiently when you have complex geometry with lots of polygons? The naive implementation used for the spheres (checking all the spheres and retaining the closest intersection) just isn't going to scale beyond much more than a few dozen spheres, yet most interesting meshes are made up of hundreds if not thousands of triangles.

Fortunately, there is a better way. As is often the case in computer science, divide-and-conquer algorithms save the day. The key realization is the intuitively obvious fact that
[quote]
if a ray does not intersect some volume, then it does not intersect any object contained within said volume[/quote]along with the more sophisticated statement that
[quote]
if a ray intersects two volumes V1 and V2, with V1 intersected first and V1 and V2 not overlapping, then any object in V1 intersecting the ray will be a closer intersection than any object in V2 intersecting the ray[/quote]This is sufficient to come up with an efficient ray-geometry intersection algorithm, that can handle hundreds of millions of triangles.

Here a volume is assumed to be a region of 3D space contained by a closed surface,

To demonstrate the basic idea, in 2D for simplicity, suppose that your geometry fits into a square of dimensions 1x1 (the scale is not important). First subdivide your square into four smaller squares of dimension 0.5x0.5 each. For each triangle (or sphere, or whatever) that makes up your geometry, determine in which of the 4 squares it falls inside of, and make a note of which of the 4 squares are empty, if any. For each non-empty 0.5x0.5 square, repeat the process with another four 0.25x0.25 squares inside it, and so on. Stop recursing either at some arbitrary depth limit, or, more adaptively, when there's only a handful of triangles left in a square. An illustration follows:

This dolphin mesh (courtesy of Wikipedia) is made of a couple hundred triangles or so. Now, once this datastructure has been generated, how do we use it to perform intersection queries? We do what is called a traversal on the datastructure (as in, traversing the tree), as follows:

You begin with the 1x1 square, and check whether the ray intersects it. If it doesn't, then it obviously doesn't intersect the dolphin at all, so we can return right off the bat. If it does, then check which of the four smaller squares it intersects (it's easy to see that it may intersect up to 3 of them). Only consider non-empty squares, of course. Find out which one the ray intersects first, and recurse on it. If the ray intersects anything inside that square, we're finished and there is no need to check the remaining squares. If not, then go to the next closest square and continue. The base case is when the ray is found to intersect a "leaf" square that contains actual triangles, in which case the intersection check proceeds as in the naive algorithm, by iterating over each surface and finding the closest intersection (the difference is you have maybe at most 4 or 5 triangles in each leaf square, not thousands).

An animation for the same dolphin above is given below, with a segment of a ray (blue) being intersected against the dolphin:

As you can see, this is a lot less work: instead of naively checking the hundreds of triangles making up the dolphin, all we needed to do was to use the datastructure we built previously, along with maybe a couple dozen or so ray-square intersections, which are pretty cheap (as well as a few ray-triangle intersections in the final leaf square). There are plenty of ways to make this even more efficient. In particular, each square need not be divided into exactly four squares of equal area: the best datastructures select where to subdivide the square (or cube, in 3D) adaptively based on the geometry. In general, these spatial subdivision techniques all have the same asymptotic traversal complexity of O(log n) where n is the total number of triangles, but non-adaptive data structures (like the basic one in the animations) can perform very poorly on some geometries, e.g. if the triangles happen to align with the subdivisions in such a way that many leaf squares need to be examined.

Now, fun question: what do we do if a triangle straddles two squares (i.e. falls right on top of a subdivision). Should we duplicate it in both squares, split the triangle into two smaller ones on each side, change the position of the subdivision or simply stop subdividing? Can you decide off the top of your head which solution is correct, and which solution will be most performant in general, in traversal time and/or memory usage? Yeah, I can't either.

The data structure shown above is a variant of a quad-tree adapted for ray-triangle intersections. The natural 3D equivalent is called an octree. Other data structures of note include kd-trees and bounding volume hierarchies (BVH).

# Embree

Due to the numerous gotchas and edge cases in constructing such data structures, not mentioning floating-point and performance concerns, writing acceleration structures is an entire field in its own right. I recommend you try to implement your own octree or BVH if it interests you, but since we're writing a ray tracer here, let's push forward. There are a lot of open source implementations of varying quality that can be used. One of the better implementations is Intel's Embree library, which does exactly what it says on the tin:
[quote]
Embree is a collection of high-performance ray tracing kernels, developed at Intel. The target user of Embree are graphics application engineers that want to improve the performance of their application by leveraging the optimized ray tracing kernels of Embree.[/quote]
The library is free and open-source, with the code and additional tutorials available at https://github.com/embree/embree.

A ray tracing kernel is the technical term for an intersection/occlusion query algorithm, that can intersect rays against geometry.

Using this library is straightforward: you load your geometry in using its API, and it then lets you perform intersection and occlusion queries against said geometry, with all the performance details taken care of for you (it even dynamically selects an appropriate acceleration structure on the fly for optimal performance, which is pretty neat). I won't give a crash course on Embree here, the basic API is pretty simple and is well-documented with code snippets even if you're new to these kinds of libraries, I recommend you refer to its documentation while reading this entry.

Installing the library is easy, for Linux/Mac simply build it from its github repository, or for Windows get the prebuilt DLL's. If you are using C or C++, include the necessary headers and you are good to go. For other languages, you will need bindings. If there aren't any bindings for your language, they are actually pretty easy to write; read up on your language's foreign function interface (FFI) and the journal's C# raytracer has my C# bindings included.

First, the definitions of some of the words used in this entry:

• a surface is just an idealized 2D surface, with no notion of triangles or anything like that (it doesn't need to be closed or even connected... for now)
• a mesh or triangular mesh is an approximation of a surface as a collection of triangles
• a geometry is any (finitely describable) approximation of a surface; so a triangular mesh is a geometry, just as analytic surfaces like spheres or NURBS are, but not all geometries are meshes
The basic Embree workflow is as follows:

1. you create a generic "scene" object, specifying various flags such as whether you'd prefer Embree to use low-memory-footprint algorithms, or whether to avoid optimizations that might reduce the quality of the ray-geometry intersection algorithms at the expense of performance, etc...
2. you add geometries into this scene; you don't specify the geometry data here, just the main details, such as (in the case of a triangular mesh) how many vertices and triangles are part of the mesh
3. you upload your triangle and vertex data to the geometry buffers for each of your geometries (this is done by writing into a generic memory buffer, so you have lots of freedom in how to carry that out)
4. finally, you "commit" the scene to tell Embree that you're done modifying geometry, it will then build all the acceleration structures
5. you are now ready to do ray-scene intersection/occlusion queries, which is carried out through the intermediary of a "ray structure", which contains both input fields (like the ray's origin, direction, min-distance/max-distance) that you fill in, and output fields (such as which geometry was intersected, the intersection distance) which Embree fills in when you call an intersection or occlusion function
That's pretty much it for the simplest use case. There are a lot more features offered by the library, such as built-in hair geometries (specifically designed for thin, hair-like meshes) but those can be covered later on. The one feature we will want to use though, is called instancing. The idea of instancing is to able to render a single geometry in multiple places at once (each with its own transform relative to the origin) while still only having a single copy of the geometry in memory. That means you can have that 2000-triangle rock rendered in a million different places in varying sizes and rotated various ways, with little memory usage. The key to making this work is to realize that you don't have to move the geometry at all!

Rather, you give each instance a bounding box (for example) and transform that according to the instance's transform, and when the ray intersects this bounding box, you transform the ray itself with the inverse of the instance transform, and can then carry out the intersection with the original, non-instanced geometry. Same principle as view-projection matrices in computer graphics, but applied to instancing. As you can imagine, this can be really useful, as each instance can have its own individual material and texture and whatnot. Conveniently, Embree natively supports single-level instancing, by allowing you to instance an existing scene inside another, with a given transformation matrix attached to it. The instance intersected, if any, is helpfully reported through the ray structure's "instID" field.

And that's it for how to use Embree. Now, of course, the next question is how to integrate Embree with the rendering code. After all, Embree only handles the geometry part of the ray tracer, and identifies geometries and instances with plain numerical ID's. How does the renderer convert that to, for instance, an appropriate material for a given piece of geometry? You first have to answer an important question: what does the renderer actually render?

# Barycentric Coordinates

"What does the renderer actually render"? This is actually not a stupid question. It does not make much sense, for instance, to assign to each triangle its own material. Not only do most neighbouring triangles share a common material, it's unclear whether that is even meaningful, since the triangles that make up a mesh are just an arbitrary subdivision of its surface, that has little to nothing to do with the surface's material.

To really answer the question correctly, we have to take a look at what exactly Embree gives us back when we do an intersection query. Taking a look at the ray structure, it returns quite a bit of data:

• the "geometric" normal: normals are very important for lighting calculations, but more on this later on
• the intersection distance: this is obviously important for the renderer, but we already know how to use it
• the geomID/primID/instID indices which tell us what exactly the ray intersected
• the "u" and "v" fields, which are the (local) barycentric coordinates of the ray intersection point: huh?
What's up with these barycentric coordinates? We certainly haven't needed anything like that thus far, so what are they, and why are they deemed so important by the Embree library that it returns them to us? Barycentric coordinates are essentially a coordinate system over a triangle where each point inside the triangle is weighed by three coordinates indicating how close the point is to each of the triangle's three vertices. For instance, the diagram below gives the (u, v, w) barycentric coordinates for a few points on a generic triangle:

codeproject.com

Here the u-coordinate (the first one) is based on the point's distance from the K vertex, the v-coordinate from the L vertex, and the w-coordinate from the M vertex. This particular coordinate system has three major useful properties:

1. given barycentric coordinates (u, v, w) and the triangle vertices (K, L, M) in the previous diagram, the position of the point corresponding to these coordinates can be very efficiently calculated as uK + vL + wM
2. a point is inside the triangle if and only if all of its barycentric coordinates are between 0 and 1
3. since the barycentric coordinates (u, v, w) must add up to 1, you actually only need two of them - which makes sense since a triangle is a 2D surface, so you should only need two variables to address every point on it - so in most cases the (u, v) coordinates are kept and the last coordinate is implicitly calculated as w = 1 - u - v
The first property is most important to us, because (as you might guess) it means that barycentric coordinates are ideal to interpolate vertex attributes. For instance, if the mesh you want to render comes with vertex normals, you can calculate a smooth normal for every point on each triangle by interpolating the vertex normals using the point's barycentric coordinates. This gives your render a smooth appearance, rather than the faceted appearance you would get if each point on every triangle had the same normal. The same goes for texture mapping, and so on.

Yes, this does mean that the geometric normal returned by Embree is actually a face normal (the surface normal for each triangle) and not a nice smooth normal. So it's not incredibly useful to the renderer compared to smooth normals, but can come in useful in other scenarios.

This immediately explains why Embree returns the (u, v) barycentric coordinates of the ray intersection: without them, it would be difficult to do things like smooth normals or texture mapping, since you'd have to reverse-engineer the barycentric coordinates yourself from the ray intersection point and the triangle vertices anyway. But with those coordinates, together with the primID field, you can now know exactly and unambiguously where the ray intersected your geometry's surface, and in a format easy for the renderer to use.

An inquisitive reader might pause here and wonder why Embree is taking the time to calculate these barycentric coordinates for us, since we can in principle do it ourselves. The main reason is that the usual ray-triangle intersection algorithm involves calculating barycentric coordinates, so Embree might as well save and return them since it has to calculate them anyway. Another reason is that after uploading some geometry to Embree, the user may not want to keep the vertex positions of his meshes around (after all, the renderer itself doesn't need them, though it might need other attributes like vertex normals) without which it is impossible to recalculate the barycentric coordinates. A third reason is that Embree supports more geometry types than just plain triangle meshes, so the algorithm to calculate barycentric coordinates may depend on the type of geometry involved!

All of this has not been a concern to us previously because (a) we were using spheres and (b) the renderer only needed surface normals, which for a sphere do not require any kind of barycentric coordinates, and are readily calculated using only the intersection point and the sphere's center/radius. In other words, the sphere is already a smooth surface to begin with, but with (faceted) triangle meshes coming into play we'll need at least a mechanism to calculate smooth surface normals to get nice renders.

With barycentric coordinates this mechanism is easy: the smooth normal is simply given by:
smoothNormal = normalize(vertexNormals[triangle[ray.primID].V1] * ray.u + vertexNormals[triangle[ray.primID].V2] * ray.v + vertexNormals[triangle[ray.primID].V3] * (1 - ray.u - ray.v));Where the vertex normals are either produced by a CAD tool like Blender, or computed by the renderer while loading the mesh using some heuristic, e.g. averaging the face normals of adjacent triangles. The pseudocode above assumes you have a triangle array that contains three indices into a vertex array, one for each of the triangle's vertices, and an array containing vertex normals at the corresponding vertex indices.

In practice you may prefer to store all vertex attributes directly into the triangle structures to limit pointer indirection costs, even if it means duplicating some data.

Finally, there are some cases where you might not want a smooth normal. The typical case is when rendering a cube using eight vertices: each vertex can only have one normal, but you need every point on each face to have the same face normal, because you want to render a literal cube. The usual solution involves duplicating vertices, so that you now have 4 vertices for each face, each with the same correct normal. An alternative solution is to just turn off smooth normals for this particular mesh, and directly use the face normal provided by Embree.

As an aside, Embree supports adding custom geometry types. For instance, if you want to bring analytical sphere rendering back, you can create a custom geometry type that implements the ray-sphere intersection algorithm (see the Embree docs for how to do that in detail). It's important to note that and "u" and "v" fields don't have to be barycentric coordinates, they just happen to be that for triangle meshes. For a sphere you might return something else, but such that "u" and "v" together still uniquely describe the intersection point on the sphere's surface (for instance, two angles). Again, you should only ever need 2 variables for this, since you're dealing with 2D surfaces!

# Scene Graph

Let's get back to the original problem, which basically is the question of how to separate 2D surfaces (and their material/texture/etc attributes) from their underlying geometric representation, so that we can integrate Embree into the ray tracer without making the renderer completely and hopelessly dependent on Embree's own notion of geometries and meshes, but at the same time efficiently use the ray intersection information returned by Embree.

We've seen before that Embree geometries already logically represent 2D surfaces, with the primID/u/v fields of the ray structure being used to address every point on said surface. But whereas Embree only cares about the geometry (e.g. the triangle mesh), the renderer's idea of a surface is a bit more general, as it needs to know about the normal/material/texture at any point on the surface, and so on. So it is very natural to create our own "surface" type that includes a reference to the underlying geometry (managed by Embree) while also containing the various attributes of interest to the renderer (this is an example of composition).

That said, while we might want to reuse geometries multiple times with different materials, or textures, there are some attributes that are intrinsically part of the geometry (even if Embree happens not to use them for its ray intersection algorithms) and it doesn't really make sense to change them. An example of this is vertex normals/tangents/bitangents, and we don't want to duplicate them for each instance of the geometry.

Now that you have a "surface" type that logically maps to a (or at least one) Embree geometry, the renderer's job becomes simpler: all you need to do is keep a mapping from geometry ID to "surface" objects that you keep up to date when creating new Embree geometry or otherwise modifying the Embree scene, and all the renderer has to do is look up the geometry ID it got back from Embree. Of course, you really want to leverage Embree's support for instanced scenes, so it can make sense to group related "surfaces" inside a single "instance" class that maps to an Embree instance (for example, an "instance" would be a car model, and it would contain a few "surfaces" of different materials such as the windshield, the painted body, the wheels, etc).

I personally did not opt to do that in the C# ray tracer and went with a "1 surface == 1 Embree instance containing 1 Embree geometry" design (i.e. hardcoding geomID to zero) the main reason being that having a two-level hierarchy (surface + instance) is a pain to work with compared with a single collection of objects, not to mention that two levels aren't going to be sufficient for things like animation anyway (where you'd need a proper object hierarchy) and that single-level instancing in Embree is very efficient anyway so it's basically free unless you have billions of instances. The two instID/geomID fields are there functionally if you need them in any case.

And that's pretty much it. At load time you simply load all your meshes separately from e.g. .OBJ files, and you then reference them inside your surfaces. At render time the renderer simply gets back an instID value at each ray-scene intersection, looks it up to retrieve the corresponding surface, and then uses the primID/u/v fields to ask that surface about the normal/material (and possibly texture later on) at the intersection point. Since a "surface" is defined by a "geometry" (i.e. a standalone Embree scene containing some geometry and geometry-related stuff like vertex normals), a "material" and a "location", when the renderer asks for surface information, each of these three components return the data that they know about. For instance, consider the following C# structure which is what the renderer needs to know about some point on a surface:
public struct Attributes{ public Basis Basis; // normal/tangent/bitangent at intersection point public IMaterial Material; // material to use at intersection point}And this is the information that the renderer gives to the surface for it to work it out:
public struct Intersection{ public Int32 SurfaceID; // this is "instID" in the C# ray tracer's design public Int32 PrimitiveID; // primID public float LocalU; // u public float LocalV; // v public float Distance; // intersection distance}The renderer wants to convert one to the other, and it needs the Surface class to do it:
public Attributes At(Intersection intersection){ Attributes attributes = default(Attributes); Geometry.At(intersection, ref attributes); // fills in attributes.Basis Material.At(intersection, ref attributes); // fills in attributes.Material return attributes;}Logically the geometry doesn't know or care about the material, while the material doesn't know or care about the underlying geometry. The "location" object doesn't play a role in this, as Embree already handles the instancing itself elsewhere. To compute the basis the geometry object has to do some barycentric interpolation using whatever vertex normals it has loaded or generated, e.g.:
public void At(Surface.Intersection intersection, ref Surface.Attributes attributes){ Triangle tri = triangles[intersection.PrimitiveID]; if (smoothNormals) { // barycentric interpolation Vector smoothed = tri.V1.Normal * intersection.LocalU + tri.V2.Normal * intersection.LocalV + tri.V0.Normal * (1 - intersection.LocalU - intersection.LocalV); attributes.Basis = new Basis(smoothed.Normalize()); } else { attributes.Basis = new Basis(tri.FaceNormal); }}Notice that the Surface class is pretty general, it doesn't even handle any of the data given to it, it just delegates it to its components and they fill in whatever details they can. The renderer then takes it all and does its thing, completely oblivious to how all this rendering information is being retrieved. This is a pretty lightweight solution, it works pretty well and is the one I went for this time but it's far from the only way to abstract raw asset data from the renderer.

Compact Assignment

Consider the mapping from, say, instID (instance IDs) to your own instance type, whatever that may end up being in your own design. The natural datastructure for the job is a map or dictionary (std::unordered_map, Dictionary, HashTable, etc... depending on what they are called in your language). But a nice feature of the Embree library is that it guarantees compact ID assignment, which means it will assign ID's sequentially starting from zero and immediately reuse ID's of deleted geometries and instances. This in turn means you are able to implement the mapping using a simple array datastructure, like a list or an std::vector or whatnot, making the lookup very fast. This is important, since you'll be doing that lookup very, very often during rendering.

# "Sky" Lighting

When a ray "escapes" from the scene (i.e. does not intersect any geometry) the ray tracing algorithm simply discards that light path and moves on. When rendering large open scenes, it is often beneficial to assume that the scene is surrounded by some kind of light emitter (for instance, the sky). This is easy to implement, all you need to do is modify the algorithm to take that assumption into account:
if ray has escaped: radiance += weights * skyRadiance breakFor simplicity you can let the sky radiance be constant, or you can make it depend on the direction the ray has escaped in (for instance, you might have a sky background texture that you look up based on the escaped ray's direction). This is, of course, more or less equivalent to surrounding your scene with a huge emissive sphere with the corresponding emissivity.

A bonus with open scenes like these is that they are typically easier to render, as basically every ray you trace ends up escaping eventually, and contributes a large amount of radiance whenever it does. So it helps to work with open scenes rather than closed ones for your experiments if possible, at least until we get to advanced light sampling methods in future entries.

# Scene File Format

At this point you might feel that setting up a scene is a lot of work, especially in placing the different objects in the correct place. That is quite a hassle, and because all that configuration stuff is hardcoded in, it pollutes the code (not to mention requiring a recompile whenever you want to tweak something). A way to alleviate these issues is to move all the scene information into a separate file, that your ray tracer can load at runtime before beginning the rendering. Ideally this "scene file" should be easy to write. I would strongly recommend against binary at this point, the time you'll waste writing a generator and a parser will much outweigh the minuscule time saved during loading. Probably your best bet is a textual, hierarchical file format such as JSON, YAML, or (if you must) XML. The geometry files though can be in any format you like, .OBJ is an easy one to parse but you could use, say, Assimp to load meshes from a variety of file types.

If you decided to follow the design I outlined earlier, the actual format of your scene file is quite straightforward: you first have a list of geometries, consisting of (for example) a path to an OBJ file, and perhaps a boolean indicating whether to generate smooth vertex normals for it, then you have a list of materials, e.g. lambertian with so-and-so albedo and emittance, and finally a list of locations, for instance using translation/scaling/rotation information. Each of these items is named, and referenced inside a final list of surfaces, each consisting of one geometry, one material, and one location.

For instance, this is a small extract of the (YAML) scene file for the render at the bottom of the entry used by the C# ray tracer. Notice that the bunny geometry is used twice, by referring to it (by name) in two different surfaces:
geometries: floorMesh: path: Meshes/floor.obj smoothNormals: false bunnyMesh: path: Meshes/bunny.obj smoothNormals: truematerials: grayMaterial: !!lambertian albedo: r: 0.8 g: 0.8 b: 0.8 yellowMatte: !!lambertian albedo: r: 0.75 g: 0.75 b: 0.25 blueMatte: !!lambertian albedo: r: 0.25 g: 0.25 b: 0.75locations: origin: scale: 1 visible: true blueBunnyLoc: scale: 5 visible: true yellowBunnyLoc: scale: 3.5 translation: x: -0.5 y: 0 z: -0.5 visible: truesurfaces: floor: geometry: floorMesh material: grayMaterial location: origin bigBlueBunny: geometry: bunnyMesh material: blueMatte location: blueBunnyLoc yellowBunny: geometry: bunnyMesh material: yellowMatte location: yellowBunnyLocWhile it makes sense to have geometries and materials referenced indirectly by the surfaces (instead of each surface owning its geometry and material), it doesn't appear to make much sense for the locations so far. After all, every object has its own unique location, right? It's not too useful if you are only interested in absolute locations. It starts to become interesting when you want to position objects in relation to others, for instance for animation purposes, making sure an object is always "inside" another, or whatnot. Then you can have locations refer to one another and there's no problem.

Whether you want to put the camera information in your scene file is completely up to you. Also note that I haven't talked about how to store point light sources so far; they are a bit of an annoyance, since they can't really be treated like surfaces. When I'll go over area lights we'll get rid of point lights for good and the problem will disappear, as we can then replace them with normal surfaces that just happen to emit lots of light.

# Conclusion

With all this design work you can now render interesting scenery not just composed of spheres, and can render "open" scenes with an illuminated background that isn't just solid black. For instance:

Notice the soft shadow beneath the teapot. Also note that the mirror sphere on the right side is slightly tinted green. It looks quite unnatural, the main reason is that in real life mirrors don't actually behave like that, the amount of light they reflect does in fact depend on the reflection angle; more on that in later entries.

After this entry the ray tracer is now capable of rendering arbitrarily complex 3D meshes, and can read scene files (taken on the command-line along with some other parameters) to avoid constantly having to recompile it. In the next entry we'll go over anti-aliasing, and then perhaps improve the camera, which hasn't been looked at since basically the first entry.

The snapshot of the C# ray tracer as it was after publishing this entry can be found here (commit a4a675a2).

## Importance Sampling

Beware: this entry is HUGE, lots of stuff covered and it will likely take time for the reader to digest it all.

A version of the rendering equation was introduced in the last entry. As we saw, the integral inside the rendering equation generally has no direct, analytical solution, so there are only two approaches to solving it: evaluating it numerically using statistical methods, or "giving up" and simplifying it - for instance by considering only a small number of relevant directions instead of the entire hemisphere - and producing approximate, non-exact results, but generally faster. Let's go with the former approach, since it's more interesting for now.

Note: I will assume some knowledge of probability theory, including probability density functions and cumulative distribution functions. They should only need an understanding of calculus to learn, so make sure you are up to it. A good reference (among others) is the Khan Academy, specifically the first two sections of this page along with some exercises. Also, it is not required, or expected, to actually compute the various integrals below; a computer can do that. All that matters is that you understand what they represent, and how they are being manipulated.

________________

# Everything is a Light Source

You may have noticed that in the previous entry, while we introduced a general "emission function" that fully determines how any given object emits light, the pseudocode given only considered the point light sources as the only "objects" to emit light. This is easily rectified by simply evaluating said emission function (as a function of the outgoing direction, i.e. ?o or "cosO" in the pseudocode) at each recursion, as per the rendering equation. But what about objects that don't emit light? Instead of making a special case for them, simply give them a constant emission function of zero in every outgoing direction.

From now on let's refer to the values of the emission function for a given outgoing direction ?o as the "emittance" of the material in that direction. Treating all objects as light sources, even if they don't emit any light, will be useful in this entry and later ones.

# Light Paths

The main problem to overcome is that there are a lot of directions over the entire hemisphere (infinitely many, actually) so there's just not enough time to give equal consideration to all of them, unless you want to wait days for each render to finish. Fortunately, there are shortcuts you can use, that actually still give the correct result, but much faster on average, by cleverly using statistics to their advantage. In the last entry I mentioned that the idea of simply shooting off N rays in random directions each time your ray intersects with an object, and recursively calling the radiance function to numerically evaluate the hemisphere integral (up to some maximum recursion depth) is unsustainable, partly because its complexity explodes exponentially. Is there a better way?

Note that if we shoot off a single ray (N = 1) then there is no exponential growth, and in fact our recursion ends up tracing a "path" of successive bounces of light within the world. If we do this several times with different, randomly selected directions, we should expect to eventually cover all possible paths that light can take (that ends at the camera) and their respective radiance contributions, and therefore averaging the results of each trial should in principle converge to the correct answer. This means that as the number of trials goes to infinity, the error between your calculated average radiance and the "true" radiance tends to zero.

Of course, we are interested in making that error tend to zero as fast as possible with respect to the number of trials done. Observe that not all "light paths" are of equal significance: a path traced by light that starts at some bright light source and bounces once on some reflective surface straight into the camera is obviously much more significant than one traced by light starting at some dim, distant light source that bounces several times on dark surfaces, losing energy to absorption at every bounce, before reluctantly entering the camera. But how can we determine in advance which light paths are significant? This is where the theory of probabilities can help us. A lot.

Let's have a look at what our radiance function (in pseudocode) would look like with N = 1 as described above. We'd have (ignore the maximum depth condition for simplicity):

function Radiance(ray): total = (0, 0, 0) // (in rgb) if ray intersects with some object: pos = intersection pos of ray with object normal = surface normal at point pos cosO = -dot(ray direction, surface normal) total += Emittance(cosO) // based on object's material for each point light source: if the light source is visible: cosI = dot(direction from pos to light pos, normal) total += BRDF(cosI, cosO) * cosI * (light intensity / (distance to light)^2) newDir = random direction in hemisphere about normal vector cosI = dot(newDir, normal) total += BRDF(cosI, cosO) * cosI * Radiance(ray starting at pos in direction newDir) return total
If this time we think in terms of light paths instead of directions, we see that our algorithm above picks up a lot of different light paths at each recursion. This is easier to grasp graphically; consider the 2D animation below, where the black lines are the path traced by the recursion above, and the different paths in red are the light paths found at each step of the algorithm.

Note that even if the two objects here don't emit any light, we still form a light path using their emittance (which would then be zero). If they don't emit any light then those light paths contribute zero radiance, which is okay. Note the last frame: the algorithm tries to form a light path from the point light source to C to B to A to the camera, but fails because the green sphere is in the way, so that's not a light path.

Let's take one of the light paths in the example above to find out what is going on with the radiance along the path:

Here the rA, rB coefficients are "reflectance coefficients" that, in some sense, describe how much of the incoming radiance along the path gets reflected along the next segment of the path (the rest being reflected in other directions, or simply absorbed by the object) as per the rendering equation. Notice that in the diagram above we've not only included the radiance emittance from the point light source (eP) but also the radiance emitted by points A and B (eA and eB) respectively. If we expand the final expression for the radiance, we find:

So we see that we've actually accounted for the first three light paths in the animation with this expression: the light path from A to the camera, the one from B to A to the camera, and the one from P to B to A to the camera. And we also see that the radiance contribution for each light path can be expressed as a light path weight (corresponding to the product of the various reflectance coefficients along the light path) multiplied with the emittance of the start point of the light path, and these radiance contributions are, apparently, additive and independent of each other.

The equivalence of this algorithm with the rendering equation can be plainly seen at this point by noting that apart from the "first path" from A to the camera (which corresponds to the emittance of A into the camera, and always has weight 1) all the other light paths correspond to radiance contributions along various directions in the hemisphere integral at A (and that after adding up every possible light path, you do in fact get back precisely the hemisphere integral).

The general algorithm described above is uncreatively called path tracing. It, together with the notion of light paths, forms the basis for many more advanced rendering algorithms, which is why it is covered first, and it is by no means the only "exact" algorithm.

This is nice and all, but where does that get us? The key idea is that by doing all this, we've managed to assign an easily computable "weight" to each light path, which is the product of the reflectance coefficients along said path, and using those weights we can now decide how significant a given light path is. Enter importance sampling to precisely define how to do this.

# Importance Sampling

The idea of importance sampling, as its name suggests, is to sample a probability distribution in such a way that higher probability events are considered (sampled) more often than lower probability events. Intuitively, if a student has an exam with three questions, one worth 5% of the points, another worth 20%, and another worth 75%, then ideally, and in the absence of further information about the specific questions, to get a good grade the student should allocate 5% of his time working on the first question, 20% of it working on the second one, and finally 75% (most) of his time working on the third question, worth most of the marks. It makes no sense to spend, say, half of your time working on the first question - that time is better spent elsewhere. And that is really the fundamental idea behind importance sampling.

Example

Let's take a simple example to visualize importance sampling better. Suppose we have a (discrete) probability distribution where the letter 'A' has probability 10%, the letter 'B' has probability 20%, the letter 'C' has probability 70%, and the letter 'D' has probability 0%. To importance sample this distribution, we want a computer program where, say, each time you run it, the letter 'A' is printed 10% of the time, the letter 'B' is printed 20% of the time, etc...

The simplest program that does this is probably as follows:
p = (uniform) random number between 0 and 1if (0 ? p < 0.1): // probability 0.1 = 10% print('A')else if (0.1 ? p < 0.3): // probability 0.2 = 20% print('B')else: // probability 0.7 = 70% (0.3 ? p ? 1, "the rest") print('C')// D is never printed, having probability zero
This program importance-samples the given probability distribution of letters: each time you ask for a letter, it gives you each letter based on its corresponding probability of occurring. This is ultimately the same as choosing 'A', 'B', 'C' and 'D' at random with 25% chance each, and then weighing them based on their probabilities, but done much more efficiently, as each letter is already drawn according to how much it weighs in the probability distribution.

Now for a more challenging (and relevant) example, with a continuous probability distribution, which is what we will be working with most of the time. Consider the probability density function which to each angle ? between 0 and 90 degrees (?/2 radians) assigns a probability density given by:

You can verify the probability densities add up to 1, as should always be the case in statistics. So now we want to write a computer program similar to the previous one, but this time it will randomly select an angle ? based on its probability density cos? of occurring. We can't use the same algorithm as the previous program, since there are infinitely many possible angles between 0 and ?/2 radians.

One way of approaching the problem is to use the following reasoning. We know that by definition of the cumulative distribution function and from basic calculus that we can compute the probability of drawing an angle less than x (for any angle x between 0 and ?/2) as:

from which it follows that:

So now suppose that we partitioned the range of possible angles [0, ?/2] into four equal-size intervals [0, ?/8), [?/8, 2?/8), [2?/8, 3?/8), [3?/8, ?/2], we can assign probabilities to each:
[0, ?/8) = sin(?/8) - sin(0) = 0.383[?/8, 2?/8) = sin(2?/8) - sin(?/8) = 0.324[2?/8, 3?/8) = sin(3?/8) - sin(2?/8) = 0.217[3?/8, ?/2] = sin(?/2) - sin(3?/8) = 0.076
These probabilities, of course, have to add up to 1, since the returned angle must ultimately fall into one of those intervals. Therefore based on those probabilities we can assign each of those intervals to an interval of the unit interval [0, 1] as:
[0, ?/8) = sin(?/8) - sin(0) => [0, 0.383)[?/8, 2?/8) = sin(2?/8) - sin(?/8) => [0.383, 0.383 + 0.324)[2?/8, 3?/8) = sin(3?/8) - sin(2?/8) => [0.383 + 0.324, 0.383 + 0.324 + 0.217)[3?/8, ?/2] = sin(?/2) - sin(3?/8) => [0.383 + 0.324 + 0.217, 0.383 + 0.324 + 0.217 + 0.076]
That is, simplifying the intervals:
[0, ?/8) = sin(?/8) - sin(0) => [0, 0.383)[?/8, 2?/8) = sin(2?/8) - sin(?/8) => [0.383, 0.707)[2?/8, 3?/8) = sin(3?/8) - sin(2?/8) => [0.707, 0.924)[3?/8, ?/2] = sin(?/2) - sin(3?/8) => [0.924, 1]
So that in principle, we can implement our computer program as:
p = (uniform) random number between 0 and 1if (0 ? p < 0.383): print some angle in [0, ?/8)else if (0.383 ? p < 0.707): print some angle in [?/8, 2?/8)else if (0.707 ? p < 0.924): print some angle in [2?/8, 3?/8)else: // if (0.924 ? p ? 1), "the rest" print some angle in [3?/8, ?/2]
Now this is starting to look really similar to the program in the first example. This is only approximate, since we don't know yet how to select which angle to return from each subinterval in each case. But nothing prevents us from partitioning [0, ?/2] into smaller and smaller intervals, and if you make the size of these intervals tend to zero, you eventually discover that each angle ? is associated with a unique real between 0 and 1 according to the cumulative distribution function above, and that if you select that real between 0 and 1 uniformly at random, and then find the corresponding angle ?, then that angle follows the probability density function we started with.

In our case, this means that we need to select some random number p between 0 and 1, and then solve sin(?) = p, which gives ? = arcsin(p), and so the sought-after computer program can be easily written as:
p = (uniform) random number between 0 and 1? = arcsin(p)print(?)
And that's it. This simple algorithm will generate angles according to the cosine probability density function defined earlier, and if you run it you will find it will return smaller angles more often than larger ones since those have a larger probability density. We've importance-sampled the cosine distribution.

This process is called inverse transform sampling. In general, if you can invert the cumulative distribution analytically (like in this case) then it's pretty efficient. If not, then you may need to use binary search to solve P[? ? x] = p for x (note the CDF is always an increasing function by definition, so binary search will always work).

Application

So the goal that we'd like to achieve here is to, instead of simply picking a random direction for the next ray in the light path and calculating the BRDF afterwards, rather find a way to use the BRDF to make it so that we automatically select ("sample") a direction based on how much it will contribute to the radiance along the rest of the path, so that highly contributing directions will be selected "often" and less significant directions will be selected "less often", and directions that don't contribute at all will be selected with probability zero, aka never.

To this end, look at the hemisphere integral in the rendering equation once again:

If you think about it, the term is pretty close to being a probability density function, as a function of incoming direction (the probability density function is different for each outgoing direction ). Let's write it as:

Can we actually treat it as a probability density function? Not quite. Recall that any probability density function on some domain X must satisfy at least two conditions, among others, namely positivity:

and the second being that the integral of the probability density function over its domain must equal 1 (in essence, "all probabilities must add up to one"):

The "domain" here is, loosely stated, the space of all possible events that the probability density function covers (the "sample space"); for basic PDF's, this is the real numbers, and the PDF assigns a probability density to every real, in our case it will be the set of incoming directions in the hemisphere centered on the surface normal. You can make this idea rigorous by using measure theory, and I encourage you to learn more about it, but you should be okay for most of the PDF's considered in this series.

The term already achieves positivity, since both the BRDF and the cosine term are non-negative for every incoming direction . It is also known that the BRDF conserves energy, which is to say that:

We covered this last time by looking at the Lambertian BRDF: recall that the above inequality must hold because otherwise the surface would reflect more energy than it receives (excluding the emittance term). So this is looking pretty good, it's almost a probability density function. The problem is that in our would-be probability density function can integrate to less than 1, rather than exactly 1 as required. This can happen if the surface absorbs some of the incident light, so that not all incident light is reflected.

Can we still salvage the situation? It turns out that we usually can, by simply scaling the entire pseudo-pdf up just enough so that it does integrate to 1. Let's define "refl" to be a function (with the outgoing direction as a parameter) which returns the reflectance of the integral over the hemisphere of this would-be probability density function:

Which returns values between 0 and 1 which, in some sense, describe how "reflective" the surface is in a particular outgoing direction , the fraction of all light (as energy, i.e. power per surface area) incident to the surface that gets reflected towards . Whenever the reflectance is not zero, it makes sense to define the normalized or "fixed up" probability density function as:

Such that, by construction:

And we now have a true probability density function to work with, which assigns a probability density to each incoming direction based on how much that direction contributes to the radiance along the outgoing direction. So we can now apply importance sampling, by randomly selecting the incoming direction (which will be the "next" ray along the light path, refer to the animation earlier above) based on its corresponding probability density, so that higher-contributing directions will be chosen proportionally more often than lower-contributing directions.

If the reflectance is exactly zero for some outgoing direction then we have a problem. But if it is zero, that means that there is literally no way for the surface to reflect any light whatsoever in that direction, no matter how much light you shine anywhere on the surface. That would be a very odd material indeed! In fact, I cannot think of any situation where this occurs with reflective surfaces. Though even if it did, we could handle it easily as a special case.

A way to interpret this physically is that if you fired a stream of photons at some surface back along the outgoing direction , and had them reflect along some direction at random according to this probability density function, then the resulting distribution of reflected light would be equal to the original BRDF (up to the constant reflectance factor, which we will soon account for).

This might be a bit confusing because in the ray tracing algorithm we are tracing the light path in one direction (the "wrong" direction, camera to light source) but calculating the radiance along the light path in the other direction, so it can become confusing what is meant by "outgoing" and "incoming" directions. Remember though that the two are basically interchangeable because of the reciprocity principle introduced in the first entry.

The implementation of this "weighted" random selection of the incoming ray often depends on the specific BRDF involved. I'll show how to do it for the Lambertian BRDF here, later on when more materials are introduced I'll both give its BRDF and derive the importance sampling technique for it. With the Lambertian material recall that the BRDF is just a constant albedo coefficient R between 0 and 1 divided by ?, and as derived in the previous entry we found that the reflectance of the Lambertian material for any outgoing direction is actually equal to the albedo coefficient, so that the corresponding probability density function is just:

So that only the cosine term is left over. At this point it may seem that we could just reuse the cosine distribution that we importance-sampled in the example earlier, since all we have left is a dot product which is basically just a cosine, but watch out! If we take a look at the cumulative distribution function of this PDF we find that it is equal to:

And if you recall from the previous entry that we define d? = sin(?) d? d? in spherical coordinates, a sine term that was previously "hidden" by the d? notation appears:

And therefore if you try to use the inverse transform sampling method on the ? variable, you end up with:

So that in reality, our inverse transform sampling algorithm is in this case not simply ? = arcsin(p), but actually ? = arcsin(?p), where p is the uniform random number between 0 and 1. Subtle, but important difference, without which your importance sampling is completely wrong (the moral of this is, do not carry out symbolic computations using d?, you will make mistakes! always expand it out using d? = sin(?) d? d?).

This particular section may be difficult to follow; I have provided at the end of the entry an alternative geometric explanation for why a square root is appearing out of seemingly nowhere. It's the same thing, but it may be easier to grasp for some than the purely algebraic derivation above.

To recap, this gives us a way to generate ?, the angle between the importance-sampled direction and the surface normal, so all that is left is to generate the orientation. Since the Lambertian material is independent of the orientation, it is clear that we can just select it at random between 0 and 2?. But just to make sure, we can also arrive at that conclusion by integrating the PDF again, but this time as a function of orientation ?, which gives you:

Which tells you that if you select a uniform random number p between 0 and 1, then you can importance-sample the orientation angle by solving the equation p = ? / (2?) for ?, giving ? = 2?p, and since p is uniform in [0, 1] the math is just telling you to pick a random uniform angle ? in [0, 2?]. (pretty neat, huh?)

From these two angles we generate the importance-sampled direction using the spherical coordinates formula. This gives:
Vector RandomDirectionLambertian: theta = arcsin(sqrt(uniform random number between 0 and 1)) orientation = uniform random number between 0 and 2pi // convert spherical coordinates (orientation, theta) to a unit vector return Vector(sin(theta) * cos(orientation), cos(theta), sin(theta) * sin(orientation))
Which can be optimized a little bit by eliminating redundant sin(arcsin(...)) calculations as follows:
Vector RandomDirectionLambertian: sin2_theta = uniform random number between 0 and 1 // the uniform random number is equal to sin^2(theta) cos2_theta = 1 - sin2_theta // cos^2(x) + sin^2(x) = 1 sin_theta = sqrt(sin2_theta) cos_theta = sqrt(cos2_theta) orientation = uniform random number between 0 and 2pi return Vector(sin_theta * cos(orientation), cos_theta, sin_theta * sin(orientation))
And that's how you importance-sample the Lambertian material. This kind of distribution is called a cosine-weighted (hemisphere) distribution.

Note the above is valid only in "normal space", i.e. when the normal points upwards (0, 1, 0). In general it is easier to work in normal space, and then transform the obtained direction into world space afterwards. I'll talk about this later in this entry, there is still some theory left to cover.

Now that we know how to generate the next direction in the light path using importance sampling rather than just selecting a direction completely at random, the ray tracing algorithm can be modified to use importance sampling. It will look something like this:
function Radiance(ray): total = (0, 0, 0) if ray intersects with some object: pos = intersection pos of ray with object normal = surface normal at point pos cosO = -dot(ray direction, surface normal) total += Emittance(cosO) for each point light source: if the light source is visible: cosI = dot(direction from pos to light pos, normal) total += BRDF(cosI, cosO) * cosI * (light intensity / (distance to light)^2) newDir = IMPORTANCE-SAMPLED direction in unit hemisphere about normal total += reflectance * Radiance(ray starting at pos in direction newDir) return total
Notice the difference: we generate a new direction in newDir using the importance sampling algorithm (such as the one for Lambertian materials just above) and we no longer need to multiply the radiance along that ray by the BRDF nor the cosine term, since those are already included in the importance-sampled distribution! The only thing we need to account for was our (1/refl) scaling factor, which we account for by multiplying the incoming radiance by the reflectance (for the outgoing direction, which is the previous ray direction) once again.

Intuitively, if the reflectance was, say, 0.5 in some outgoing direction, then it means that the surface reflects only 50% of all incoming light in that direction and absorbs the rest, so that whenever we select an importance-sampled direction from the material's probability density function, we have to weigh all radiance along that path by 0.5 (because we never explicitly take into account absorption). Note that is more efficient than just completely discarding half of the light paths.

And that's it. Seems like we covered quite a lot in this section, but there is one last major importance-sampling technique to learn about.

Russian Roulette

With the importance sampling improvement from the previous section, the algorithm has changed a little. The weights rA, rB, etc.. have been "absorbed" into the material probability density functions used to probabilistically select the next ray direction, and have been replaced by the pseudo-pdf reflectances reflA, reflB. Also note that whereas rA, rB and so on may have been greater than 1 (being probability densities), the reflectances on the other hand are genuinely between 0 and 1, so they are truly weights in every sense of the word.

What this means is we can also importance-sample light paths based on these weights, like so basically:
function Radiance(ray): total = (0, 0, 0) if ray intersects with some object: pos = intersection pos of ray with object normal = surface normal at point pos cosO = -dot(ray direction, surface normal) total += Emittance(cosO) for each point light source: if the light source is visible: cosI = dot(direction from pos to light pos, normal) total += BRDF(cosI, cosO) * cosI * (light intensity / (distance to light)^2) newDir = IMPORTANCE-SAMPLED direction in unit hemisphere about normal if (random number between 0 and 1) <= reflectance: total += Radiance(ray starting at pos in direction newDir) else: break return total
Exactly the same idea as importance-sampling a BRF's probability density function, just with a different (simpler) probability distribution. What this does is terminate light paths early based on their potential radiance contributions. Since the weights are always between 0 and 1, the product of the weights can never increase as more bounces are added to the light path. The end result is that the algorithm spends more time overall on light paths which are likely to contribute more radiance, and spends less time on light paths which aren't.

This technique is commonly referred to as russian roulette, because it terminates ("kills") light paths at random probabilistically. Take note that if the reflectance is zero then the break will always be taken, so that's not even a problem anymore; in general probabilistic methods have a tendency of correctly handling edge cases, which is always a plus.
Now, there is a little implementation detail that is too important not to mention here. So far all the theory above has been pretty oblivious to the fact that we are not dealing with single numbers as radiances but instead with RGB radiance vectors. This is not a problem most of the time, as the math can be applied to each RGB channel in the vector component-wise. However, in this case the logic does need to be tweaked, because it makes no sense to compare a vector with a real.

The problem can be worked around by taking the largest component in the RGB reflectance vector, carrying out russian roulette on this number, and (if we are not terminating the light path) dividing the reflectance vector by that component, to account for the termination probability:
probability = max(reflectance.x, reflectance.y, reflectance.z)if (random number between 0 and 1) <= probability: total += (reflectance / probability) * Radiance(...)else: break
Since "probability" is the largest component, the new vector "reflectance / probability" still has all its components between 0 and 1, and whichever component was the largest becomes 1. In the case where all the components are equal, then reflectance / probability = (1, 1, 1) so the weight disappears (as in the single component case).

# Mirror Material

Just to show what the importance sampling technique is capable of, let's add the "perfect mirror" material to our list of materials so far next to the Lambertian material. This particular material is pretty easy to conceptualize: it reflects 100% of the incoming energy along some direction into the reflected direction according to the so-called law of (specular) reflection:

wikipedia.org

It's possible to give the material a reflection coefficient so that it only reflects, say, 90%, or 70%, of the incident light. Let's call that coefficient R, between 0 and 1, as an RGB vector, so that you can have a tinted mirror, that, for example, reflects 100% of red light, 50% of green light, and 20% of blue light, corresponding to R = (1, 0.5, 0.2).

The BRDF for this mirror material is a bit tricky, because it's easy to see it will be zero "almost everywhere". Specifically, for any outgoing direction , it will be zero everywhere except for that one incoming direction where and are reflections of each other about the surface normal. This is problematic because if we assigned any finite value to the BRDF at that particular point, its integral over the unit hemisphere would still be zero, because it would only be nonzero at a single infinitesimal point. The solution (kind of) is the Dirac delta function, which is a special function ? over the reals such that:

What the function is equal to at x = 0 is left unspecified, because we won't actually need to know that. As explained before, it can't be any finite value, so some people like to think of it as equally infinity at x = 0, but that's not very rigorous. Formally, the delta function is not really a function, but it's not the end of the world if you treat it as such (just don't do that in the presence of theoretical mathematicians!). It can be quite mind-bending for a function to be zero almost everywhere and still manage to integrate to 1, but don't get too hung up on that: the function is only used to properly define the BRDF. It will not be used in the code.

Anyway, that gives us a tool for representing the mirror material mathematically. The condition that and are reflections of each other translate to "they must form the same angle with the normal and must be in the same plane (and not equal)". In short, if we denote the angles of and as (?i, ?i) and (?o, ?o) respectively, the two directions are reflections of each other if and only if ?i = ?o (law of reflection) and ?i = ?o + ? (modulo 2?, "in the same plane" condition). So the BRDF can be written:

You can multiply delta functions, in which case the product is zero whenever either of the two arguments is nonzero, and nonzero only if both arguments are zero, as one would expect. The cos?o factor in the denominator is there to correctly reflect all of the incoming energy, remember that the BRDF has units of 1/sr, so it should return a quantity per steradian, which is what this cosine term accounts for.

We then find that:

Where the cosine terms cancel out because ?i = ?o whenever the integrand is nonzero, and the result follows from the properties of the delta function. This shows R is the reflectance of the mirror material's pseudo-probability density function (as you would expect, since that is the mirror's reflection coefficient).

Finally, there will be no need for any fancy mathematics to importance-sample this material: there is only one possible outgoing direction for each incoming direction and vice versa, so all you need is some vector math to work out the reflected direction from the incident direction and the normal according to the law of reflection. How to do this is explained all over the internet so I won't go into detail, read here for a derivation, and here is the formula:
reflect(vector, normal) = 2 * normal * dot(vector, normal) - vector(assuming dot(vector, normal) >= 0 and vector is unit length)
So the importance-sampling code is super simple:
Vector ImportanceSampleMirror(outgoing, normal): return reflect(outgoing, normal)
Though if you really wanted to, you could integrate the probability density function and solve for ? and ? as was done for the Lambertian material, which would give you the (constant) solutions ? = ?o and ? = ?o + ? strictly equivalent to the reflect() function above; it's not interesting, but it works.

Note that if you just picked directions at random you would never be able to pick a direction for which the BRDF evaluates to something other than zero, because you would have to randomly pick the exact direction corresponding to the reflection of the outgoing direction about the surface normal, which is a zero probability event. So the original ray tracing algorithm without importance sampling would not have worked without making the mirror material a special case, whereas importance sampling handles the problem elegantly (and optimally).

This is not an anisotropic material! Though the BRDF does depend on the orientation of the incoming and outgoing directions about the surface normal, the BRDF is unchanged under a rotation of directions about the surface normal (in other words, if you rotate a mirror about its surface normal it doesn't change its appearance). This can be seen in that for any rotation r about the surface normal, we have ?((?i + r) - (?o + r) - ?) = ?(?i - ?o - ?) so there is no difference.

As you can see, importance sampling provides you with, among other things, a solid framework for handling different BRDF's in an efficient, mathematically robust, and systematic way; even the rather degenerate BRDF for the perfect mirror above is not a problem for it. We will see later how to extend it to support things like refraction and volumetric rendering. The math will be more or less the same, just extended to support more generalized types of BRDF's (the definition that we're currently working with only supports a small subset of all possible materials, though you can go a very long way with them).

# Implementation Concerns

All right, if you've made it here, good job, you survived the theory! There are a couple things I want to mention regarding the implementation of what has been covered above. In particular, you may have noticed that the pseudocode for the ray tracing algorithm is at this point littered with cosI, cosO variables. This is a bit dumb, because those cosine variables only make sense when attached to some direction vector (e.g. incoming/outgoing directions), which in turn are only meaningful relative to the surface normal.

Furthermore, I mentioned earlier that the importance sampling technique for the Lambertian material operates in so-called normal space, where the normal is assumed to point upwards as (0, 1, 0), so that to be useful the resulting importance-sampled direction needs to be transformed back to world space where it can be used as the next direction in the algorithm.

We did not actually need to do this normal space -> world space conversion until now, because the only thing we've been doing so far is generating random directions in the unit hemisphere centered on some surface normal, which can be done directly in world space (thanks to spherical symmetry).

Therefore, it is sensible to abstract this stuff away into two objects, both fairly easy to write:

• a basis object, which will represent the transform from normal space to world space (defined by a normal, a tangent, and a bitangent, for instance as a TBN matrix). Currently we only have a surface normal with no meaningful tangent/bitangent information, so any basis we use at the moment will be unoriented; that is, the orientation it defines about the surface normal will be arbitrary (so anisotropic materials cannot be used until we are able to render polygonal geometry with tangent/bitangent data).
• a direction object, defined in relation to a basis, which represents some direction relative to the normal along with its cosine term (dot product of the direction vector with the normal), and, later on, when we have access to an oriented basis, the orientation angle ?.
We also need to add some more code to the material implementations, in light of the importance sampling algorithms introduced above.

The Basis Object

To construct an unoriented basis out of a single normal vector, one needs to find two vectors such that these two vectors together with the normal are linearly independent in 3D space. Then it's just a matter of orthogonalizing this set of vectors while preserving the normal vector's direction. Remember, the basis is unoriented, so it does not matter where the tangent and bitangent are pointing in our case as long as they form an orthonormal basis.

Finding those two vectors is a bit of a pain. Technically, you only need to find one vector orthogonal to the vector, since the other orthogonal vector needed to complete the basis can be obtained via a cross product of said vector with the normal. Just taking an arbitrary vector does not work reliably, since if a normal happens to be collinear with it you'll get a zero vector in your basis and things will screw up (you might be able to get by using this method and relying on luck, but it still smells). Fortunately, it can be done pretty efficiently.

The trick is as follows. You have your nonzero normal vector (x, y, z). Since it is nonzero, at least one of the components will be nonzero. So, take a look at the x-component. If it is zero, then your normal vector is of the form (0, y, z) with y and z not both zero, and so it is orthogonal to, say, (1, 0, 0) and we are done (note that is already a unit length orthogonal vector).

If, on the other hand, x is not zero, then it can't be collinear with e.g. the vector (0, 1, 0), and so cross((0, 1, 0), normal) is orthogonal to the normal vector, as required (in fact, any vector with a zero x-component will do). It only needs to be normalized to unit length, since that's not guaranteed by the cross product as (0, 1, 0) need not be orthogonal to the normal.

This algorithm is straightforward to implement:
find_orthogonal_unit_vector(normal): if normal.x == 0: return (1, 0, 0) else: return normalize(cross((0, 1, 0), normal))
Since (0, 1, 0) consists of zero and one components, you can expand out and simplify if you feel like it, which gives this explicit version:
find_orthogonal_unit_vector(normal): if normal.x == 0: return (1, 0, 0) else: return (normal.z, 0, -normal.x) / sqrt(normal.x * normal.x + normal.z * normal.z)
Once you have a unit length vector orthogonal to the normal, from the algorithm above (or otherwise), call it U, the last vector in the basis can be taken to be V = cross(U, normal). There is no need to normalize this vector, since U and the normal are orthogonal and unit length. This process therefore gets you a complete orthonormal basis including the normal vector:
tangent = find_orthogonal_unit_vector(normal)bitangent = cross(tangent, normal)normal is left unchangedbasis = {tangent, normal, bitangent}
And you now have an unoriented orthonormal basis for your normal. From basic linear algebra, converting a normal space vector (x, y, z) to world space (or, specifically, whatever coordinate space the normal is defined relative to) is as simple as doing:
(x, y, z) to world space = x * tangent + y * normal + z * bitangent
Which is the same as multiplying the (x, y, z) vector by the TBN matrix formed from the tangent, normal and bitangent vectors, of course. This can be used to cheaply transform normal space vectors, such as directions generated by importance-sampling functions, into world space directions.

The Direction Object

There's not much to say about this one, it's just intended to be a logical container for a (direction vector, cos?) pair relative to some basis, since the cosine term is used all the time, so it may as well be bundled together with the direction vector it belongs to.

The Material Object

With the new importance sampling algorithm it's clear we need to extend the material object to provide more information about itself, so that it can do importance sampling for us and return the reflectance of its pseudo-probability density function in addition to simply evaluating its BRDF. We also need to add an emittance function. A suggested, but by no means mandatory, interface to your materials is as follows (copied from the C# code, should be easily translatable to other languages, "Random" is the standard pseudorandom number generator class):
public interface IMaterial{ /// /// Returns the weight to account for when importance-sampling this material. /// (this is the same as the material's reflectance in a given direction) /// /// /// Formally, this is the (per-component) weight associated with /// an importance-sampled direction obtained using SamplePDF(). /// Vector WeightPDF(Direction outgoing, Basis basis); /// /// Importance-samples this material, returning a (possibly randomly selected) /// incoming direction according to this material's probability density /// function, for the outgoing direction provided. /// /// /// The method signature may be improved later on by replacing the /// "random" parameter with something more appropriate. /// Vector SamplePDF(Direction outgoing, Basis basis, Random random); /// /// Evaluates this material's BRDF for an incoming and outgoing direction. /// Vector BRDF(Direction incoming, Direction outgoing, Basis basis); /// /// Evaluates this material's emittance along some outgoing direction. /// Vector Emittance(Direction outgoing, Basis basis);}
This suggested interface gives you all the functions you need to implement the importance-sampled ray tracing algorithm presented in this entry.

Implementing the Lambertian material with this interface is simple: you should now know what to plug into each, as a function of the material's albedo (and emittance). Implementing the mirror material is also pretty simple, except for the BRDF: what to do with those delta functions? In general, you can assume that the outgoing and incoming directions will not be reflections of each other, and so you can just return zero. The reason is that in general, with the importance sampling algorithm, you only call the BRDF when the outgoing and incoming directions don't really depend on each other (in the ray tracing algorithm, to sample point light sources) so that the probability that both directions are reflections of one another is zero. Otherwise, you will just sample the material's probability density function using the importance sampling algorithm, and there is no need for the BRDF at all. Pretty neat.

The Algorithm

With all these improvements, the ray tracing algorithm now looks a bit like this:
function Radiance(ray): total = (0, 0, 0) if ray intersects with some object: pos = intersection point of ray with object basis = (create basis from surface normal at point pos) outgoing = (create direction from negated ray direction) // include this object's emittance as a light path total += Emittance(outgoing, basis) // include the point light source light paths for each point light source: if the light source is visible: incoming = (create direction from pos to light pos) total += BRDF(incoming, outgoing, basis) * incoming.cos * (light intensity / (distance to light)^2) // select direction of next bounce using importance sampling newDir = SamplePDF(outgoing, basis) // continue or terminate according to russian roulette weight = WeightPDF(outgoing, basis) // as a vector of weights for R, G, B channels maxWeight = max(weight.x, weight.y, weight.z) if (random number between 0 and 1) <= maxWeight: total += (weight / maxWeight) * Radiance(ray starting at pos in direction newDir) else: break return total
The iterative version of the ray tracing algorithm is also of interest. To "de-recursify" the algorithm, you just need to look at how the light path weights are accumulated multiplicatively, as in the light path diagrams at the beginning of the entry. Russian roulette also "stacks" multiplicatively, since if the light path has, say, probability 0.7 (70% chance) of continuing at the first bounce, and probability 0.4 (40% chance) of continuing at the second bounce, then in total it has probability 0.7 * 0.4 = 0.28 (28% chance) of surviving both bounces, according to simple rules of probability (each russian roulette trial is performed with statistically independent random numbers).

So you need only keep a running product of weights, like so:
function Radiance(ray): total = (0, 0, 0) weights = (1, 1, 1) repeat forever: // or, put some large bounce limit here, like 10 or 20 if ray does not intersect with any object: break else: pos = intersection point of ray with object basis = (create basis from surface normal at point pos) outgoing = (create direction from negated ray direction) // include this object's emittance as a light path total += weights * Emittance(outgoing, basis) // include the point light source light paths for each point light source: if the light source is visible: incoming = (create direction from pos to light pos) total += weights * BRDF(incoming, outgoing, basis) * incoming.cos * (light intensity / (distance to light)^2) // apply russian roulette weight = WeightPDF(outgoing, basis) // as a vector of weights for R, G, B channels maxWeight = max(weight.x, weight.y, weight.z) if (random number between 0 and 1) <= maxWeight: weight /= maxWeight // correct probability else: break // terminate light path // update light path weights weights *= weight // select direction of next bounce using importance sampling // and proceed to the next bounce (by just changing "ray") newDir = SamplePDF(outgoing, basis) ray = (new ray starting at pos in direction newDir) return total
The structure of the ray tracing algorithm becomes very explicit at this point: at each bounce, you do some intersection checking/setup work, then you accumulate all the light paths you can form at the intersection point (using the running weight at this bounce), and finally you perform russian roulette to determine whether to continue to the next bounce, importance-sampling the next ray direction.

It is not hard to see that the infinite loop will mathematically eventually terminate if sufficiently many objects have materials with reflectances (WeightPDF) that evaluate to strictly less than 1 - in other words, if your world isn't made exclusively out of perfect mirrors - so that russian roulette has a chance of breaking out of the loop. However, this is an extremely weak guarantee in practice, and you'll find that if you don't put some maximum number of iterations your code will often get stuck in an infinite loop. Forcing a break after a certain number of iterations is technically not "correct" physically speaking, but the radiance contribution of 10- or 20-bounce light paths is generally so tiny that the bias is not noticeable.

There is also a subtle but important improvement you can do to the algorithm: only do russian roulette for light paths longer than, say, 2 or 3 bounces. The reason is that if you start doing russian roulette right at the first bounce, the probabilistic nature of the algorithm hasn't really had time to kick in at that point, so there are very few light paths available to it and they are all important (they are all "root" light paths that all the longer light paths depend on) and pre-emptively terminating them introduces too much variance, making your renders noisier than they should be for equal amounts of computation. A good rule of thumb is to start doing russian roulette only after two bounces have occurred, but you can experiment with different geometry and lighting layouts to see how they affect the effectiveness of russian roulette.

The Result

With all this understood and implemented, you should be able to produce renders such as the one below:

Or, with sufficiently many light paths per pixel, like this one with a sphere as a light source (and no point light source):

Area lights (that is, objects/spheres with a nonzero emittance, like the bright sphere in the second picture) seem to be harder to render, as you will probably notice. Indeed, the light paths can only reach area lights through random chance, unlike the point light sources which get automatically sampled at each bounce, which means it's harder to find light paths that include them, and so you need lots of light paths to smooth it out (meaning lots of noise before converging to a crisp picture). There are solutions, of course. For instance, all the light sources (area or point) could be importance-sampled based on the radiance we receive from them at each bounce, so that we can get light paths that end at an area light as easily as we currently do for point light sources, that could work. We will see what we can do about it later, this is quite enough for this entry. (for now I recommend sticking to point light sources for your tests)

# Parallel Rendering

Finally, and to conclude this entry, let's give ourselves a nice bonus to our rendering speed by distributing the rendering over multiple cores! Ray tracing is an embarrassingly parallel problem, since each pixel is independent of the others and the data structures accessed (geometry, materials, and so on) are read-only. So rendering in parallel is literally as simple as spawning N threads and having them work on separate pixels.

The only thing you may have to watch for is your pseudorandom number generator. These are rarely stateless, so you ideally want to initialize one generator for each worker thread so that they don't step on each other's toes. Possible ways to implement this include an array of generators for the workers to use, or a thread-local object (you shouldn't just use a single generator and synchronize access to it as this will likely cause the threads to wait on one another).

When rendering in parallel it's often best to exploit cache coherency if at all possible (the idea being that two nearby pixels often have similar colors due to the flow of light for both pixels - and by extension the rendering calculations - being very similar) so typically you want each thread to render nearby pixels, instead of having each thread render random pixels all over the place. The most common strategy is to divide the image to be rendered into square or rectangle tiles and assigning one thread to each. But all of this becomes much more important when your ray tracer is doing advanced things like complex geometry data structures, packet tracing, and so on. Basic ray tracers will find it easiest to simply assign lines of pixels of the image to each thread, which still gives pretty much linear scaling for each CPU core.

Technically, cache coherency is destroyed pretty much the instant you select a new direction randomly for the light path, though ray tracers tend to optimize the first bounce (from the camera to the first object intersected) very heavily, because it is always the same for a fixed camera viewpoint so you can precompute a bunch of stuff there. Some efficient ray tracers actually use GPU rasterization hardware for the first bounce to get all the camera ray intersection points "for free"!

With C# the parallelization can be done very easily using the Threading.Tasks.Parallel utilities introduced in .NET 4.0, and the original double loop:
for (int y = 0; y < img.Height; ++y) { for (int x = 0; x < img.Width; ++x) { // render pixel (x, y) }}
simply becomes:
Parallel.For(0, img.Height, (y) => { for (int x = 0; x < img.Width; ++x) { // render pixel (x, y) }});
And that's it, a free 2x-8x performance boost, depending on how many cores your CPU has. Of course, you have to make sure that when you go to write the pixel to wherever it is stored before being saved to a file, that this can be done safely from multiple threads (for different pixels). The simplest approach is to simply store the pixel colors into an in-memory buffer, and only at the end of the rendering, copy it into an image to display or save to a file.

# Conclusion

More advanced rendering techniques aimed both at simplifying the ray tracing algorithm and speeding it up (i.e. less noise for an equal amount of computation) were shown in this journal entry, as well as implementation tips. I am not yet decided on what the next entry will be about; perhaps area light handling, or rendering arbitrary geometry, or maybe a short entry presenting more types of materials. I still need to think about it.

The snapshot of the C# ray tracer as it was after publishing this entry can be found here (commit eab43542).

# Appendix A: Cosine-weighted distribution, geometric derivation

The geometric equivalent of generating a cosine-weighted distribution is to select a point on the unit disk uniformly at random, and then "projecting" that point upwards onto the unit hemisphere. But uniformly picking points on the unit disk can be tricky! Your first thought might be to pick a random angle ? between 0 and 2pi, a random distance D between 0 and 1, and then letting your chosen point be (Dcos?, Dsin?) using trigonometry. That seems reasonable, right? Nope, you'd be wrong. This won't give you points that are equally distributed over the unit disk, it will favor points near the center of the disk. A simulation with 5000 random points selected with the above method produces the following distribution:

Not good. The reason? Consider the following diagram of the unit disk:

The unit disk is subdivided into an inner disk of radius 0.5 and an outer ring also of radius 0.5. As one can see, if you use the method previously mentioned, half of the time you will pick a point inside the green circle (of radius 0.5) and half the time you will pick a point inside the blue ring. But the green circle has area ?0.52 ? 0.785 while the blue ring has area ?(1 - 0.52) ? 2.356, much larger! In general, if you pick a point uniformly on the unit disk, then the probability of it falling inside any given ring, such as the blue ring above, should be proportional to the ring's area, not the ring's radius (which is what this flawed method does).

So in short, with the current method the probability that a point has a distance to the origin less than or equal to D is exactly D, since we're choosing the distance uniformly between 0 and 1, but the correct probability (by looking at areas) should be D2. To compensate for that, we need to choose the distance and then square-root it afterwards, so that sqrt(D)2 = D and the probabilities are now what they should be. It is worth noting this is basically inverse transform sampling in disguise: working it out for yourself rigorously is a good exercise!

A simulation with the new version clearly shows the difference:

Another possible method, that is also correct, is to select a point uniformly inside the unit square - which on the other hand is easy and completely straightforward, choose two numbers x and y at random between 0 and 1 and your point is (x, y), done - until it falls inside the unit disk (appropriately called "rejection sampling"). But that procedure can take a variable amount of time, making it undesirable in this case since we now know of a better solution that works in constant time anyway.

And this is where the mysterious square root comes from. Now, once you have that point (x, z) on the unit disk, projecting it up to the unit hemisphere is easy, since you just need to solve x2 + y2 + z2 = 1 (the equation of the unit [hemi]sphere) for the height y so that y = sqrt(1 - x2 - z2), and if you implement this version you will find that it is completely equivalent to the algebraic one, as desired.

## The Rendering Equation

Please forgive the delay between the previous entry and this one. I didn't mention this before, but I'll try to post these at intervals of two weeks to a month or so; I had already prewritten the first two so they were posted in relatively quick succession, this was exceptional so don't expect weekly entries!

In the last entry the ray tracer was extended to support basic lighting with point light sources and resulting shadows. Today we'll be looking closer at the lighting code, try to understand what the different quantities we calculated represent physically, and start implementing some global illumination features. In particular we will be going over the basic ideas of physically-based ray tracing on which most if not all modern algorithms are based upon on some level. This entry will have some physics in it, whenever appropriate units will be given in bold inside square brackets next to the relevant quantity, e.g. 5 [m/s2].

I had to use images for some formulas in this entry because the journal section does not yet support LaTeX. If any formula images are down please let me know via the comments and I will reupload them. I'll also update them to actual LaTeX if/when the journal section begins supporting it. Also please excuse the somewhat broken formatting in places, the site software doesn't want to cooperate right now.

________________

# Some definitions

Light is energy, specifically electromagnetic radiation. However, light transport (that is, the mechanisms of light reflection, absorption and so on) is usually described in terms of power, that is, energy per second, which is useful in order to be able to factor time into ray tracing calculations. Of course, most ray tracers make the assumption that light travels instantaneously anyway - after all, the speed of light is close enough to infinite speed for most purposes - but this is by no means a requirement. Recall that energy is measured in joules and power is measured in watts, defined as joules per second. Watts are denoted [W].

Technically "watts" don't have any concept of color, so how exactly do these units translate to pretty images? We'll go into that into more detail later, for now it is good enough to just assume that we are working with red, green and blue channels separately.

# What does it mean to "receive" light?

Probably the most confusing aspect of the last entry was that in order to calculate the amount of light received by a point (in our case the intersection point of a camera ray with some sphere) we had to imagine that the point was actually a small piece of the sphere's surface. The reason is pretty simple: points have zero area, and things that have zero area can't "receive" anything.

Of course, this is perfectly fine and gives the correct results, if we assume that that little piece of the sphere's surface has infinitesimal area. If you recall how to calculate the length of a curve in calculus, you divide the curve into smaller and smaller linear segments, add up the total length of all the linear segments, and then make the length of those segments tend to zero (by making the number of segments the curve is divided into tend to infinity). And that gives you the exact length of the curve.

(lamar.edu)

The same principle applies in the 2D case, and we simply make the area of our tiny surfaces tend to zero, so that they are effectively "points" but retain an infinitesimal yet nonzero area, which will be essential for our lighting calculations. And since the area of the surface tends to zero, the surface can be considered to be effectively planar (assuming the surface is continuous, which it really should be anyway) which means it has a unique, well-defined surface normal. You can think of it as dividing the surface of our spheres (or any model really) into infinitesimally small planar surfaces, just like a curve can be divided into infinitesimally small linear segments.

To disambiguate between "real" surfaces and infinitesimal surfaces and avoid confusion, we'll call the infinitesimal surface a "differential surface" (both because its area is a differential/infinitesimal quantity, and because it stands out). For now we can denote the area of such a differential surface as dA and the (unit length) normal vector to that surface can be denoted .

From this idea of surfaces alone we can already try and redefine the notion of "light ray" as an infinitesimally thin beam between two differential surfaces, the key being that it has a nonzero cross-section, as opposed to a plain line. This is useful, but not quite as general as we'd like, because when we talk about a differential surface receiving light, we don't really care from what it's coming from (light is light, after all) but from where. That is, from which direction. So we need one more concept, called the solid angle.

A solid angle is not really an angle in the usual sense, and can be better understood as a "projected area". To illustrate with an example, suppose P is any point in 3D space, and let S be some arbitrary surface. Now take every differential surface on S, and project it (in the geometrical sense) onto the sphere of unit radius centered on P. The total area of the unit sphere covered by the projection of S is called the solid angle of S with respect to P, and has unit steradians [sr] (not metres squared). Since the unit sphere has area 4?, the solid angle of a surface can range from zero to 4? steradians; a surface completely surrounding P would have solid angle 4?.

The steradian is a dimensionless unit, because it is in fact a ratio of area to distance squared (or sometimes a ratio of area to area) as seen above.

In the same way that we can subdivide a surface into lots of small bits of surface, we can subdivide the unit sphere into a bunch of small solid angles, which again in the limit tend to points, and can be parameterized as two angles, uniquely determining a point on the sphere. Consider the diagram below:

(howyoulightit.com)

Here r is 1 since we are on the unit sphere. These small solid angles are called differential solid angles, and are written d?. Now since the differential solid angle in the diagram spans a vertical angle d?, and spans a horizontal angle d?, one might think that its area is just d? d?, but that's not quite right. The height of the dark surface is of course d?, however its width decreases as ? increases, as the sphere's horizontal cross-section decreases with height, and is given by sin(?) d?. So the solid angle d? is equal to sin(?) d? d? (this can be derived easily using some trigonometry).

Importantly, differential solid angles can be used to define a direction, since the two angles uniquely describe a point on the sphere, which uniquely describes a unit vector in 3D space.

So, armed with these definitions, let's look at our Lambertian material that we tried to implement last time, and see what is really happening. First we start with a differential surface that's going to be receiving light (it's located on the sphere at the intersection point of some camera ray). That is, in two dimensions:

Where the differential surface is in bold black, its surface normal the blue arrow, and the hemisphere centered on it as the dotted line. We don't use a full sphere here because for now we assume that light will always either be reflected or absorbed by the surface. Now suppose some light arrives onto our surface from some solid angle:

The incident light is of course given in units of power per surface area per solid angle, that is, watts per metre squared per steradian [W/m2/sr]. Now, in order to be able to measure how much energy to reflect (or absorb) in any particular direction, we need to convert this incident light (which is per solid angle) into some direction-independent quantity, namely power per surface area [W/m2]. We can do this by looking at the cross-section of the incident light beam:

Indeed, the cross-section of the beam is the main thing that matters when it comes to transporting power: only so much energy can pass through the cross-section of the beam in a given unit of time, and that energy is "smeared" over the area of the differential surface. That means that the actual power per surface area that falls onto our differential surface is equal to the power per surface area per solid angle [W/m2/sr] multiplied by the ratio of the cross-section area to the differential surface's area. This ratio of surface areas has unit steradians [sr] (if you think about it we are really back to the notion of projecting the light beam onto the differential surface) and so the resulting quantity indeed has units of power per surface area [W/m2]. As we've seen before, that magic ratio is actually equal to cos(?) where ? is the angle between the surface normal and the light beam, but you can convince yourself of that by using trigonometry on the 2D diagram above.

As an illustration of how important it is to take this cross-section into account, consider a light beam that just grazes the differential surface:

In this case the cross-section area is very small compared to the differential surface's area, and so the actual amount of power per surface area [W/m2] received by the differential surface is rather small, despite the fact that the power per surface area per solid angle [W/m2/sr] of this light beam and of the previous one may well have been equal.

To conclude, if we measure incident light in power per surface area per solid angle (and we will) we need to be aware of the above geometric conversion factor between the incident light and the power per surface area that the differential surface actually receives. Note that all we've discussed above applies to any surface, not just Lambertian materials: it's a purely geometric fact.

We should also give some names to the physical quantities above. First of all, the intensity of the light beam or ray measured in power per surface area per solid angle [W/m2/sr] is called the radiance along that ray. Note it is independent of both surface area and direction, so it is an adequate quantity to represent the amount of power carried by a generic beam of light. The amount of light received by the differential surface, measured in power per surface area [W/m2] from every direction is called the irradiance incident on (or falling onto) the differential surface.

These quantities belong to the field of physics known as radiometry, which is very widely used in computer graphics. Another useful field is photometry which deals more with the perception of color by an observer.

# What does it mean to "reflect" light?

Suppose our differential surface just received some amount of light, say X [W/m2] (in units of power per surface area, as we saw previously). What does the surface do with this energy? For now we will consider two out of a large number of possible interactions: it will either absorb it for good, or will reflect it. Now all that energy probably won't be reflected in a single direction, unless your surface is a mirror; for now, since we are looking at last time's Lambertian material, we'll make the same assumption and assume that energy is reflected equally in all directions.

So for the sake of argument suppose we say that the surface will reflect X [W/m2/sr] in every direction. The problem is that if you actually work through the math, you find that doing this means the surface ends up reflecting a total of ?X [W/m2] over all directions (i.e. over all differential solid angles in the hemisphere). This blatant violation of conservation of energy is a result of the geometry of the hemisphere, and can be fixed by simply multiplying the reflected energy by 1 / ?. For those interested in the math, see (*) at the end of the entry (preferably after reading the rest of it).

Note that we can also make the surface reflect less light (by absorbing some) by throwing in a constant reflection coefficient R between 0 and 1. This coefficient is generally known as albedo. Now suppose we had a function that took two solid angles, namely an incident solid angle ?i and an outgoing solid angle ?o. This function would look at its arguments and asks the question: "this surface received light in [W/m2] from ?i, how much light in [W/m2/sr] should be reflected into ?o?". In the case of the Lambertian material we've just found that this function is constant:

f(?i, ?o) = R / ?

This function is called the bidirectional reflectance function, or BRDF for short, and all it does is mediate the distribution of reflected light from received light, by assigning a reflection ratio (called reflectance) to every possible input and output light direction. That's all it is. Mathematically the BRDF has units [1/sr], because the ratio is of outgoing watts per metre squared per steradian [W/m2/sr] to incident watts per metre squared [W/m2].

# Emitting light

The entirety of this journal entry so far has been dedicated to describing the process of receiving and reflecting light. There's one last detail which is kind of important: how does light actually get emitted from light sources to begin with? A "light source" is really just a surface that happens to emit light (in addition to possibly reflecting light, but generally the amount of light it emits overwhelms the amount it reflects). That means we just need to define an "emission function" that determines how much light is emitted outwards by that surface in any given direction, as a radiance quantity [W/m2/sr].

This is straightforward to implement for area lights, where you might just emit a constant radiance in every direction based on the area light's intensity, at every differential surface on the area light's surface. Spot lights could work by restricting the range of directions in which light is emitted. For point light sources this is somewhat tricky, because point light sources are unphysical and have no surface area. However, if we can define the intensity of the point light source as a quantity I in units of power per surface area [W/m2], and keeping in mind that the point light source emits light in all outward directions, i.e. into 4? steradians, we see that the point light source emits a radiance of I / (4?) [W/m2/sr] in every direction, and by varying the radius of the sphere centered on the point light source we can see that it emits a radiance of I / (4?r^2) [W/m2/sr] at a distance r.

If we redefine point light source intensity to take into account this 1 / (4?) factor then we are back to the familiar I / r^2 expression.

# Putting it all together

With all the theory above we are now ready to give a definition which relates outgoing radiance to incident radiance and emitted radiance at a differential surface: The outgoing radiance in a direction ?o (leaving the differential surface) is equal to the radiance emitted by the differential surface in that direction plus the sum of all incident radiance from every incident direction ?i reflected in that direction according to the BRDF for this differential surface. Mathematically, this gives the following integral formula:

Here the dot product between ?i and is equal to the cosine of the angle between the surface normal and the incident direction (since both vectors are unit length) which as seen previously is the geometric conversion factor between the incident radiance along that direction and the power per surface area received by the differential surface. The symbol ? denotes the set of directions in the hemisphere, as in "integrate over every possible direction d? in the hemisphere".

Breaking this formula down shows how the different notions covered in this journal entry fit together into the picture:

This is a simplified version of the rendering equation, which as can be seen above captures the interaction of light with a given surface through its BRDF by collecting all light contributions from every direction (as well as the emitted light from the surface itself) and weighing how much of it gets reflected in any given outgoing direction. All ray tracers aiming to render realistic objects try to evaluate this expression, which amounts to calculating the radiance along each camera ray, which directly maps to how bright each pixel should be (in both red, green and blue components). There is no simple way to solve it exactly, so numerical techniques must be used.

Note the rendering equation is recursive, for an incident ray to a given surface is just an outgoing ray from another surface, i.e. you can substitute another instance of the rendering equation into each Li(?i) term recursively (this is why it also captures global illumination).

Clearly the most complicated aspect of the rendering equation is the integral, since in theory every direction must be considered to get the right result, and there are infinitely many possible directions. There are three main approaches to evaluating this integral efficiently:

• by trying to estimate which incident directions contribute the most to the integral (those which carry the most light towards the outgoing direction of interest, for instance bright light sources) and spending more time on those compared to the other less relevant directions: this procedure converges towards the exact result as quickly as possible, but is generally quite slow, and being statistical in nature it tends to produce somewhat grainy images by default (examples: most path tracing variants)
• by simply focusing on some high-contribution incident directions and/or making assumptions on the materials involved and ignoring all the rest: this family of methods doesn't technically produce "correct" solutions to the rendering equation but rather approximations or simplifications, but are usually faster and can be used to great effect in realtime applications (examples: radiosity, irradiance caching)
• hybrid techniques based on both ideas above (example: photon mapping)
The ray tracer developed in the previous entry would fall into the second category, as our solution to the rendering equation worked by replacing the integral over all incident directions by a sum over all point light sources in the world. These are obviously the directions that contribute the most to the surface lighting, and indeed we obtained sensible results, but we completely ignored all other directions, which means the ray tracer did not account for light reflecting off one surface and contributing to the lighting of another (indirect lighting).

Enough theory. Let's start using all this to implement global illumination into the ray tracer!

What we should do is straightforward: we need a "radiance" function that takes a ray and returns the radiance along that ray (as an RGB color, i.e. radiance for each color channel). For each pixel we give it the corresponding camera ray. The function may call itself recursively; in practice this is implemented without directly using recursion for performance, but let's stick as closely as possible to the math above for now.

To evaluate the integral in the rendering equation we can use a simple, naive approach: first evaluate the radiance from the direction of the point light source(s), since those will likely be the highest light contributions, and then evaluate the light contribution over the hemisphere by selecting N directions at random and recursively calling the radiance function on each. In pseudocode:
function Radiance(ray): total = 0, 0, 0 // (in rgb) if ray intersects with some object: pos = intersection pos of ray with object normal = surface normal at point pos cosO = -dot(ray direction, surface normal) // negated since the ray points into the surface for each point light source: if the light source is visible: cosI = dot(direction from intersection pos to light pos, normal) total += BRDF(cosI, cosO) * cosI * (light intensity / (distance to light)^2) // note the above is just BRDF * cosI * (radiance from direction of light) // now the global illumination bit repeat N times: dir = random direction in hemisphere about normal vector cosI = dot(dir, normal) total += BRDF(cosI, cosO) * cosI * Radiance(ray starting at intersection point in direction dir) / N // the division by N is because we are approximating an integral through a sum return total
In reality, you want to stop recursing after a while. For the purposes of this entry I coded the ray tracer to only do the "global illumination bit" once per pixel, so that only one bounce of light is simulated, though fixing the number of bounces is not the only way to proceed. In fact, the pseudocode above is generally not how global illumination is implemented, because it's difficult to configure and has complexity exponential in the number of bounces; it is provided to give a more hands-on representation of the rendering equation, and in the next few entries we'll be phasing it out in favour of more efficient code.

You might have noticed that the piece of code above passes the cosines of the angles made by the incident/outgoing directions with the surface normal, while the theoretical definition of the BRDF just takes in the two directions. There are a few reasons for this:

• Many materials are isotropic, meaning they only depend on the angles made by the incident and outgoing directions with the surface normal; the relative orientation of those directions about the surface normal is not relevant; this means there is no "preferred" orientation of the surface about its surface normal that affects how the material reflects light
• Almost all of the time, the cosine of these angles, or some related trigonometric quantity like their sine or tangent, is involved, not the angles themselves. Since we can efficiently compute those cosines through a dot product, it's easier to work with that from the start rather than keep the angle around (in many ways the cosine is actually more fundamental than the angle itself in this context)
The BRDF function needs to implemented for the Lambertian material here. Let's not go crazy with the material class hierarchy just yet, so for now we can stick it into a bare struct. Recall the only parameter of the Lambertian material is an albedo between 0 and 1, for red, green, blue channels; by having different albedos for different channels you can give a color to your object.
public struct Material{ private Vector albedo; public Material(Vector albedo) { this.albedo = albedo; // 0 <= albedo <= 1 (for each channel) } public Vector BRDF(float cosI, float cosO) { return this.albedo / (float)Math.PI; }}We'll also need to arrange our spheres in a configuration that actually shows some global illumination effects in action. The easiest way to do that is to take two spheres, a red one and a white one, put them next to each other, and put a point light source in between them. The white surface should take on a lightly red hue due to the red sphere reflecting light onto the white sphere back into the camera. It's a pretty subtle effect, though with only point light sources and Lambertian materials there's not much else you can see at this point. As we add more and more features into the ray tracer more interesting effects will become available to us.

The result is, as before, quite oversaturated, so the parts of the spheres closest to the light source are washed out white, although the effect is quite visible in the less illuminated areas.

# Conclusion

This particular entry was quite heavy on theory and wasn't very sexy render-wise, and really just lays the groundwork for the remainder of the series. It's important to get an intuitive understanding of these concepts, since they are used quite often in the field. As a bonus this will likely help you read some computer graphics papers if you want to implement other algorithms and so on. Only the most used concepts are defined and explained here, there are more which we may cover in future entries as needed.

I was originally planning to have the next entry introduce more materials and show different methods to evaluating the rendering integral more efficiently, but I think before that I'll write a small entry on tone mapping, since it's an easy way to get more out of your renders and will certainly help with oversaturated images.

The snapshot of the C# ray tracer as it was after publishing this entry can be found here (commit 2d672f53).

________________

(*) The derivation for the Lambertian material "problem" is as follows: suppose we just received X [W/m2] in our surface (from any particular direction; it's already in W/m2, i.e. we've already accounted for the light beam's cross-section). Now the surface reflects X [W/m2/sr] in every direction. Therefore the total energy (in [W/m2]) reflected will be:

Where the integral is over every possible directions of reflected light d?, ? is the angle between that direction and the surface normal, and the cos(?) factor is again to actually measure the reflected energy in [W/m2], not just in [W/m2/sr] (those two units cannot be compared!). As you can see the total energy reflected is X? [W/m2], a factor of ? more than we started with. The solution is to enforce a reflectance coefficient of at most 1 / ? in the Lambertian BRDF, for which energy conservation checks out.

Note that this process of integrating the BRDF as above and then compensating for any geometric constant factor is called "normalization".

In the last entry we set up a simple ray tracer capable of rendering two spheres into an image with solid colors, without any kind of lighting. For this entry we will extend our ray tracer a little bit:

• handle arbitrarily many spheres instead of just hardcoding two of them
• better color representation
• introduce point lights and a simple lighting model
• go over the aspect ratio problem we had in the previous entry

So let's jump into it!

________________

# Towards a better ray-world intersection algorithm

In the last entry we went over how to perform ray-sphere intersection tests. But we didn't really go over how to intersect a ray against two or more spheres, instead just checking every intersection and finding the closest. We're not going to do better than that yet in terms of complexity, but we're going to be improving the code a bit.

The basic idea is to write a function which takes the list of all spheres as well as a ray, and finds the closest one (i.e. the sphere which the ray intersects first) via a for loop that keeps track of the closest sphere so far. The function would then return two things: the distance to the intersection, which we'll need to find out where the intersection occurred for lighting, and the sphere which was intersected, which we'll also need for lighting. In pseudocode, it would look something like this:function intersect(sphere_list, ray): closest_dist = INFINITY closest_sphere = null for sphere in sphere_list: if (ray intersects sphere) and (distance_to_sphere < closest_dist): closest_dist = distance_to_sphere closest_sphere = sphere if closest_sphere == null: // (no intersection) return false else: return true, closest_dist, closest_sphere
Of course, the exact way you return the distance and intersected sphere is arbitrary as long as you are consistent with your choices. For instance, in the C# ray tracer, spheres are structs, so the ray-world intersection function sends back an index into the sphere list if an intersection occurs.

Now for reasons which will become clear when we get to the lighting, we also want this intersection function to be able to return intersections only within a certain range of distances, say minDistance and maxDistance. It's easy to modify the function to take this into account:function intersect(sphere_list, ray, minDistance (= 0), maxDistance (= INFINITY)): closest_dist = maxDistance closest_sphere = null for sphere in sphere_list: if (ray intersects sphere) and (minDistance < distance_to_sphere < closest_dist): closest_dist = distance_to_sphere closest_sphere = sphere if closest_sphere == null: return false else: return true, closest_dist, closest_sphere
With this function ready we can now simply create a list of spheres we'd like to have in our world (or, say, load that list from some kind of file) and the function will always find the correct intersection, if one exists. On that note, I will now share a neat trick: you can approximate a plane by using very large spheres (think of our planet, which is a sphere, though locally it looks pretty flat). This means you can create primitive floors, walls, and ceilings out of spheres, which will be helpful when we get to shadows, because we'll need something for the shadow to fall upon in order to see it.

# Colors

So far we've been directly outputting constant solid colors to the rendered image, so we never manipulated colors directly inside the ray tracer, but with lighting we are going to need to add colors together (for instance, to calculate the contribution of two light sources to the color of a single pixel). Ideally we would have an appropriate structure to store colors in (and we might need to once we get to more advanced types of ray tracing), but for now we can just save ourselves some time and store colors in 3D vectors as R, G, B components, so that floating-point zero or less corresponds to R/G/B 0 and floating-point 1 or greater corresponds to R/G/B 255. It's a bit of a type hack, but it will get the job done until we need something better.

As always, if your language, or the tools you are using, somehow happen to come with a scientific high-precision color type, then it's a good idea to use it.

# Lighting

A lighting model is, essentially, an algorithm used to approximate the flow of light in the world, or at least the flow of light that eventually makes its way to the camera. We can select these particular light rays which do make it back into the camera by ray-tracing starting from the camera, as we discussed in the previous entry, which can save time as we don't need to consider light rays which the camera never registers anyway (and hence never contribute to the rendered image).

The simplest lighting model is called direct lighting. It works as follows: for each camera ray (pixel in the image), find out if it intersects something. If it does, look at that intersection point and calculate the amount of light received from each light source at that particular point, and from that derive how much of that light would get reflected along the camera ray back inside the camera (taking into account the color of whatever object was hit by the camera ray, of course). That's it.

Actually, direct lighting is something a bit more general than that, however this will be covered in a future entry.

Important: when we say "point" here we really mean a tiny patch of the object's surface around the intersection point, this will be important later on when we get to calculating reflection using integrals.

So we have two questions to answer here, which we will go over one at a time.

How to calculate the amount of light received at a point from a particular light source?

This depends in particular on three things: the light source's brightness (intensity), its distance from the point, and its shape. For now we will consider only point light sources, which have no "shape": they are just an infinitesimal point in space that emits light in all directions. Needless to say, they don't exist in the real world, but they are useful because there is only one path the light can take between the point light source and the point considered, which is just the ray between these two points.

Now consider a point light source, and some small surface ("point") at which we want to calculate the amount of received light. We can draw a diagram:

Now suppose we put the surface a lot closer to the light:

Much more light hits the surface. This is the famous "inverse square law", and in fact the light received by the surface is inversely proportional to its distance to the point light source, squared (here it is inversely proportional to the distance, but only because we plotted a 2D cut of the situation: when you consider all light rays in 3D space, you get one more dimension which is where the "squared" comes from).

But now consider one last case, with the surface oriented a bit differently:

No light reaches the surface! This is because the amount of light that reaches the surface is also dependent on the angle the (planar) surface makes with the vector from the point light source to the surface. It can be shown that it is in fact proportional to cos(theta) where theta is the angle between the "surface -> light source" vector and the normal vector to the surface (the vector "perpendicular" to the surface). In the diagram above the two are essentially perpendicular, hence theta is 90 degrees and cos(90 degrees) = 0, so the surface receives zero light.

In other words, if I is the light source's intensity, r is the distance between the light source and the surface, N is the normal vector to the surface, L is the vector from the surface to the point light source, and theta (?) is the angle between N and L, then the amount of light received by the surface from that light source is given by:

Or at least proportional to that (whether it is correct really depends on what units you use for I, but we haven't yet covered the physics to go into that in more detail, so for now as long as it looks good we are okay).

You may be confused by the fact that I only drew a small number of rays originating from the point light source in the diagrams. This has no impact on the results, and is just for convenience: you can draw as many rays as you want (as long as they are equally distributed) and the ratio of rays which hit the surface to total rays will converge to a fixed number.

How to calculate how much of that light is reflected along the camera ray towards the camera?

This very much depends on the material the object is made of. As you probably know, not all objects reflect light in the same way, for instance a matte wall, a mirror, glass, water, etc... for the purposes of this entry we will assume the sphere is made of an "ideal" material which reflects any light it receives equally in all directions. Such a material is called Lambertian.

Ultimately this all boils down to the Lambertian surface reflecting a constant amount of light in every direction for each unit of light it receives, with hidden constant coefficients required to ensure energy conservation and so on. Note that we haven't really defined what "light" is, that's because we'll need a bit of math and physics to properly define all the terms we use and make sense of all quantities and units involved. So for now we'll again go with the "as long as it looks okay" mindset and just assume all light received is reflected back to the camera, which isn't really physically correct but will be enough to demonstrate the lighting algorithm without getting bogged down with details.

Finally, we need to take into account the sphere's color. In the simplest case all you do is multiply the sphere's color by the reflected light and use that, since after all the "color" of an object is whatever wavelengths of light it happens to reflect to our eyes (so a green object reflects mostly green light). In reality whether the sphere's color should be taken into account for a given material depends on the exact physical process that causes the reflection. We will see a few of those physical processes. including diffuse and specular reflection, in the next few entries.

In any case, the code to implement lighting then becomes quite simple. There are a few steps, though. If the camera ray intersects an object, then we need to calculate lighting (otherwise, just make the pixel black, as usual). At this point we want to calculate the intersection point, which is given by:intersectionPoint = cameraRay.origin + cameraRay.direction * intersectionDistance;
Now is a good time to calculate the surface normal at the intersection point, since it doesn't depend on the light sources. For spheres, this amounts to finding the vector between the sphere's center and the intersection point, and normalizing. In the C# ray tracer:public Vector Normal(Point pos){ return Vector.Normalize(pos - center); // "center" is the sphere's center}
At this stage all our objects are one-sided in that light rays cannot penetrate the spheres (they are opaque) so by convention we make normals point outwards of the object (so sphere normals point away from the sphere center). More on that when we get to implementing transparent objects.

Also, normals are always unit length!

Then comes the time to iterate over each point light source to calculate its light contribution to the pixel. We need to calculate the various N, L, theta, r variables to plug into the expression we found for the amount of light received. We already calculated N, which is the surface normal at the intersection point. L can be calculated by taking the vector between the intersection point and the light source's position, saving its length (to find the distance to the light source) and then normalizing it. If you know your linear algebra, you will recall that since N and L are both of unit length, the cosine of the angle between them (aka cos(theta)) is equal to their dot product. The light contribution is then:contribution = sphereColor * dot(N, L) * lightIntensity / distanceToLight^2;
It then suffices to sum up all those light contributions for each light to produce the final color for that pixel, and we're done! Note that here dot(N, L) is assumed to be >= 0, since otherwise that would imply L is pointing back inside the surface, which would mean the light from that light source cannot reach the intersection point anyway...

There is one thing we forgot in the previous section: we assumed that each light source was always visible from the intersection point. This is not always the case, there could be an obstacle between the two. This is a complicated way of saying that we haven't implemented shadows. Fortunately, it is extremely easy to add that in, since all that needs to be done is to use our ray-world intersection function with a ray starting at the intersection point, pointing towards the light source, and checking if we hit anything up to the light source. This is where the minDistance/maxDistance feature is useful: we don't care if there are obstacles beyond the light source, we only care about obstacles between the light source and the point, so we should restrict the search for intersections to that range.

In fact, we don't even need to find the closest obstacle, if there is even one obstacle we know there will be a shadow. So we can further generalize the ray-world intersection function to optionally find any intersection, not just the closest one (and discard the resulting distance). Of course, there is not much gain yet with our tiny 2-sphere worlds, but later on we will develop a framework for this kind of functionality.

After all this work, we can now render nice Lambertian spheres with shadows. In the picture below, there is a (large) green sphere used as a ground plane, and there are two point light sources of different intensity:

It's a start! Note how bright the blue sphere is, this is because our final image's colors have a maximum brightness, so it gets saturated if the pixels are too bright. There is a solution to this, which is called tone mapping, and we will implement that too later on. We won't need it just yet though.

You may encounter a problem while implementing lighting, where it seems to only work for certain pixels. This is almost certainly due to floating-point inaccuracy, where the intersection point actually ends up slightly *inside* the sphere and so is always considered to be shadowed. Possible solutions are to set a small minimum distance to all your intersection tests, or to "nudge" your intersection points so they always end up outside the sphere, though both may (rarely) produce small but noticeable artifacts. This is unfortunately something we all have to deal with, as far as I know the only general purpose solution is to ditch floating-point numbers and switch to fixed-point arithmetic...

# Camera aspect ratio fix

Recall the aspect ratio problem we had in the previous entry. This is because we tried to map a rectangular (non-square) image to a square view plane of dimensions (-1, -1) to (1, 1), making it look stretched. The solution is simple: make the view plane used in the camera projection have the same aspect ratio as the output image! There are two ways to go about this, though. Assume you want to render a widescreen image, where width > height. Then you can either make the view plane wider to match the aspect ratio... or you can reduce the view plane's height instead. Conversely, if you are rendering a portrait image, you can either increase the view plane's height, or reduce its width. Both options make sense, but I recommend always increasing the view plane's dimensions to match the desired aspect ratio, so that you never "see less" just by changing your aspect ratio.

This fix can be implemented as follows:float uScale = 1;float vScale = 1;if (width > height) { // widescreen, want to increase width uScale = (float)width / height;} else if (height > width) { // portrait, want to increase height vScale = (float)height / width;}
And when calculating the (u, v) coordinates for each pixel, multiply by those scales to correct the viewplane's aspect ratio:u *= uScale;v *= vScale;
Before (600x400):

After (600x400, 400x600 respectively):

And there we go, the problem is fixed!

# Conclusion

We finally rendered our first "real" ray-traced image for this series. In the next few entries we will go over all the lighting stuff from a more theoretical point of view, introducing materials and the rendering equation and looking at how the ray tracing algorithm we have worked with so far can be used recursively to implement global illumination. Beware that the next entries may have quite a bit of math (and.. may take a while to write, unless I break them up quite a bit).

The snapshot of the C# ray tracer as it was after publishing this entry can be found here (commit 3b5e377d).

## First Steps

So about a year ago I posted about writing a ray tracing article series. Unfortunately I did not actually have the time to begin working on it last year, and since then it has been gnawing at me, because it's something that I wanted to do and (apparently) people were also looking forward to it. But it's never too late, and so I created this journal. I've decided to focus more on the devlog aspect of the series rather than a series of formal articles, simply because journal entries are easier to write. So this is my suggestion: I'll post these entries as I go, and every now and then after completing major milestones I may write up some kind of article that summarizes the interesting bits (or focuses on one specific part) with a link to the relevant entries in this journal. I've also setup a github repository to track the ray tracer's progress, using git's tagging feature to tag the state of the repository as it is at each journal entry, and I'll link that here at the bottom. That way the source code linked should always be consistent with the contents of the devlog. Anyway, here we go.

To new readers: I am quite familiar with the implementation of ray tracers, therefore the journal entries to follow may be somewhat rhetorical in nature as I start off with naive implementations and then discuss the various issues run into and propose common solutions that have worked for myself and others, rather than immediately presenting optimized code. However I do not know everything, and so will eventually run into problems I do not know the answer to. That is okay, as we are all here to learn, and knowledgeable members are encouraged to suggest better approaches or to correct me in the comments section, while other members are welcome to ask questions.

________________

Expected background for this entry:

• basic linear algebra (vectors, matrices, linear transformations)
• basic knowledge of C#

# Introduction

Let's write a ray tracer! As the name suggests, ray tracing works by tracing rays to follow the path of light as it bounces around in the world in order to calculate lighting. But how does it work exactly? There are fundamentally two parts to ray tracing. One is calculating intersections between light rays and the various objects in the world (i.e. when and how the light rays "hit" the objects), and the other is calculating the interaction of those light rays with the objects they hit, to make them bounce correctly to accurately simulate light. In reality, it's not quite as clear-cut, but this separation will be good enough for quite a while.

# We're going to need some 3D stuff

This isn't a linear algebra course, so I'll just provide the necessary vector/matrix classes without going into too much detail about them. There are no standard classes in C# for them, so I just wrote reference implementations for the purpose of this series. They are simple to use and have no gotchas, but are not particularly optimized - they may be replaced by more performant implementations as the need arises. If you are following this series using a different language which has built-in or standard 3D types, I recommend you use them. We won't need quaternions yet (we may need them later on when we get to more advanced model representations, but we certainly will not need to use them for the ray tracing itself, which is happy with just vectors and orthonormal bases).

Also note that we define a ray (as in a light ray) as a 3D point indicating the ray's origin and a 3D vector of unit length (length 1) indicating the ray's direction. This is quite important, and we'll find out why this is a good convention later on. For now, just keep this in mind and make sure your ray directions have unit length or things will break.

I will be using a left-handed coordinate system with +X going right, +Y going up, and +Z going into the screen throughout this series, unless stated otherwise. You can use whatever coordinate system you'd like, of course, but just keep that in mind when reading the entries or the associated C# ray tracer code.

# The basic constituents of a ray tracer

Most ray tracers start their life as five distinct components:

1. A 3D math library
2. An image raster component
3. A geometry component
4. A camera component
5. The ray tracing logic

We've covered the first one already. The image raster is used simply to hold the ray-traced image once rendered, and contains functionality to display it to the screen or save it to a file. The geometry component will store the set of objects in the world and have functionality to calculate intersections between rays and these objects. The camera component is used to describe the observer, i.e. from which perspective the objects should be rendered from. And finally, the ray tracing logic uses the previous components to calculate the color of each pixel.

The image raster

We're going to keep this component pretty simple to start with. For instance, a simple 2D array of colors would do the trick. In our case, C# has the System.Drawing.Bitmap class which already gets the job done with its SetPixel() method, but overall this is reasonably straightforward to implement. The displaying and saving part depends somewhat on the language you are using: if you are using a low-level language, it's often simpler to save the image to a file and view it manually afterwards; if you don't have any built-in facilities to save images in various formats, the absolute simplest format is the PPM format, which is simply a file containing the following lines:P3 ...
Where each r/g/b value ranges from 0 to maxval (usually, maxval is 255). You can read more at http://paulbourke.net/dataformats/ppm/. Please note that these are extremely old and barebones image formats, their main redeeming feature being that it's stupidly easy to write parsers for them. If you wish, though, you can use something better. In the case of the C# ray tracer, we will simply again leverage the Bitmap class which happens to come with a Save() method featuring a large number of possible image formats, and we'll go with the PNG format.

Most Linux distributions can view PPM images without issues. For Windows, Irfanview (among others) can display them. No idea about OS X, but the GIMP can read them.

You can of course choose to display the rendered image in some kind of window if you wish. It's up to you!

The geometry

Ideally a ray tracer should be able to support a variety of different types of geometry, including triangles, quads, cubes, spheres, procedural objects, and so on... or should just go 100% triangles. However it turns out the simplest object to work with in the beginning is actually the sphere. Why? Because it's a closed object that happens to have a relatively simple ray intersection algorithm, making it an excellent choice for budding ray tracers.

The line-sphere intersection algorithm is derived on Wikipedia in appreciable detail. Because we are considering rays and not lines, however, we need to exclude "intersections" where the line the ray lies on intersects the sphere "behind" the ray's origin, e.g. (in 2D):

It can be seen that a point on the line described by (origin, direction) is on the ray described by (origin, direction) if and only if the distance along the line from the origin is greater than or equal to zero. In other words, in the intersection algorithm given by the Wikipedia page, we should discard all intersections with negative distance. Furthermore there is one more special case to take into account, where the ray's origin is actually inside the sphere. In this case it is easy to see we will get two intersections, one with a positive distance and one with a negative distance.

We can condense this into a very simple intersection test which goes as follows:[code=nocode:0]compute the two intersections as distances along the line from the origin(using the formula from the Wikipedia page, for example)if the first intersection is negative: keep the second oneelse if the second intersection is negative: keep the first oneelse: keep whichever intersection has the smallest distanceif the resulting distance is negative: return no intersectionelse: return intersection (with that distance)
This is how it is currently implemented in the C# ray tracer:public bool Intersect(Ray ray, out float distance){ Vector s = center - ray.Origin; // "center" is the sphere's center float sd = Vector.Dot(s, ray.Direction); float ss = Vector.Dot(s, s); float disc = sd * sd - ss + radius * radius; // "radius" is the sphere's radius if (disc < 0) { distance = 0; return false; } float q = (float)Math.Sqrt(disc); float p1 = sd - q; // distance of intersection 1 float p2 = sd + q; // distance of intersection 2 if (p1 < 0) { distance = p2; } else if (p2 < 0) { distance = p1; } else { distance = Math.Min(p1, p2); } return distance >= 0;}
Here the "sphere" is just a struct containing the sphere's center and radius, of course.

The camera

The camera component, at its core, merely takes in each pixel coordinate (x, y) of the image, and converts it to a "camera ray", which is the light ray which is going to contribute to that particular pixel's color. Observant readers will have noticed that this means we are tracing the light rays "backwards", from the camera to the world, whereas in the real world "light" is emitted from light sources and eventually finds its way into our eyes and cameras. As it turns out, light has a very special property in that it obeys the Helmholtz reciprocity principle, which basically states that the two versions are in general equivalent.

To this end, we probably need our camera to hold at least the observer's position in the world, and the direction he is looking in. Then, in order to calculate those camera rays, we'll need to define some kind of projection for the camera to use to "project" its view of the world onto a two-dimensional image. If you have done some graphics programming, you will be familiar with the orthographic projection, the perspective projection, and maybe the fisheye projection. All those are just different ways to map each pixel to its corresponding camera ray. Also note that in general projections are done by considering normalized pixel coordinates that range from -1 to 1, so that the exact width and height does not matter. I'll call these uv-coordinates in the context of camera projections, and we have the mapping (u, v) = (2 (x / (width - 1)) - 1, 1 - 2 (y / (height - 1))) where 0 <= x <= width - 1 and 0 <= y <= height - 1. Note that the v-coordinate has the form 1 - 2y instead of 2y - 1 because in general the pixel y-coordinate goes from top to bottom whereas our coordinate system goes upwards (if we didn't account for this the image would appear vertically flipped).

The perspective projection is the best-known of all, and can be visualized as follows:

Here each point on the image corresponds to a point on the view plane, which itself is mapped to the unique camera ray originating at the center of projection (the camera position) which goes through it. The dimensions of the view plane (assuming its distance to the camera position remain constant), or equivalently the distance of the view plane from the camera position (assuming its dimensions remain constant) are related to the camera's field of view parameter - more on that later.

An orthographic projection is different, in that there is no center of projection, but all camera rays originate from their corresponding point on the view plane, all parallel to the view direction (hence there is no perspective). A fisheye projection is yet different, and the camera rays are simply projected in a hemisphere (or sphere) around the camera's position, and are mapped to image (u, v) coordinates through e.g. horizontal and vertical angles.

Focusing on the perspective projection for now, it can be implemented in a variety of ways. The generic way is to first assume that the camera position is at the origin (0, 0, 0), that the camera direction points along some reference axis (usually the +z axis), and then calculate the point on the view plane corresponding to each (u, v) point on the image, which will be given by (u, v, view-plane-distance), where view-plane-distance depends on the field of view (looking at the diagram above, the farther away the view plane is with the same size, the closer all the camera rays will be to one another, corresponding to a smaller field of view). The camera ray can then be derived from this, and that ray is then translated/rotated according to the camera's position and view direction. Specifically, if your field of view is equal to FOV (in radians) then the distance of the viewplane from the origin should be equal to 1 / tan(FOV / 2). This can be derived through some trigonometry.

It is possible to view some types of projections as special types of (affine or projective) matrix transformations. This can be helpful, however I have never found much use for those interpretations in the context of ray tracing and find it more flexible to reason in terms of camera rays, for example to implement effects such as depth of field which often have easy descriptions in terms of camera rays.

In the Camera class in the C# ray tracer, the view direction is controlled via a pitch and a yaw angle, which uniquely define the direction vector. This makes it easy to implement things like mouse control (where moving the mouse horizontally corresponds to increasing the yaw angle, and vertically the pitch angle) but isn't great when you already have a specific direction in mind. This is one aspect of the camera class that will likely be improved in the next entry.

Ray tracing logic

Whew, almost there! This component is in reality quite complex and made up of many smaller subcomponents. In our first version, however, it is going to be extremely simple. Indeed, we're going to have it iterate on each pixel in the image, calculate the corresponding camera ray, and intersect that ray against the geometry in the world. If there is no intersection, we'll make that pixel black, if not we'll give it a color based on whichever object it intersected first. So there's no actual lighting going on yet, we're focusing on the geometry part first.

The C# ray tracer here spawns two spheres called A and B at specific positions in the world:Sphere A = new Sphere(new Point(-1, 1, 10), 2);Sphere B = new Sphere(new Point(1, -1, 4), 1);
And a camera located at the origin and pointing towards the +z axis (which correspond to view angles yaw = 0, pitch = 0) with field of view 75 degrees. Note that since the two spheres above are located along the +z axis, this means the camera is looking towards the spheres:var camera = new Camera(new Point(0, 0, 0), 0, 0, (float)(75 * Math.PI / 180));
For each pixel, we calculate its (u, v) coordinates and the corresponding camera ray:float u = 2 * ((float)x / (img.Width - 1)) - 1;float v = 1 - 2 * ((float)y / (img.Height -1));var ray = camera.TraceRay(u, v);
And finally, we intersect that ray against the two spheres A and B. If it intersects A first, make the pixel red. If it intersects B first, make the pixel blue. Otherwise, make it black. As you can see, we are not calculating any lighting yet, which will be for the next entry. This is also the reason why we are not already rendering huge models with thousands of polygons: we need to do an intersection check against every polygon to find the closest one! There are solutions to this, but they are too advanced to go into right now, which is why spheres are helpful for us here.

# Results

Now, after saving the resulting image to a file, this is what we get:

Which is what we wanted. Note that the red sphere (A) appears smaller than the blue sphere (B) despite having a larger radius, that is because it's further away from the camera: perspective at work. Let's check everything works properly by moving the camera a little bit to the left, by giving its position a negative x-coordinate (say -2) without changing its direction:

Good. Note that the visible distance between the two spheres has increased, this is because the blue sphere is closer than the red one, an example of parallax, due to our perspective projection. And what about making the camera point upwards a little bit (by increasing its pitch angle):

As expected (you can play with the parameters to see if the resulting render makes sense). Finally, let's try rendering a non-square image:

Ouch. We will see in the next entry why non-square images end up stretched, and did you notice the visible distortion in some of the (non-stretched) renders above, especially near the edges? We'll discuss that next time as well, and what we should do about it to be able to render nice widescreen images.

# Conclusion

And there we have the skeleton of a ray tracer! It looks like crap (actually, it doesn't look like anything) but the basic ingredients are there; if you've done DirectX/OpenGL work, this is pretty much the "got a white triangle on the screen" equivalent. In the next entry we will be getting some much nicer results, including some simple shading with shadows if we're lucky!

The snapshot of the C# ray tracer as it was after publishing this entry can be found here (commit ef2b92dc).