Off-line renderer - Be sure to watch in 720p!
[media]
[/media]
I've implemented a system in which you supply a set of (point, direction) pairs that the camera will pass through. You also specify a list of times indicating how long the camera takes to travel from one pair to the next. Given a number of frames per second, the system then interpolates the camera position and direction for every frame.
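A minimal sketch of such a keyframe interpolator, assuming simple linear interpolation between consecutive (point, direction) pairs (the `Vec3`, `Keyframe`, and `interpolate_path` names are all illustrative, not from the actual code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal 3D vector for positions and view directions.
struct Vec3 {
    double x, y, z;
};

static Vec3 lerp(const Vec3& a, const Vec3& b, double t) {
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}

// One keyframe: where the camera is and where it looks.
struct Keyframe {
    Vec3 position;
    Vec3 direction;
};

// Given keyframes, the travel time (seconds) between consecutive
// keyframes, and a frame rate, produce the camera state for every frame.
std::vector<Keyframe> interpolate_path(const std::vector<Keyframe>& keys,
                                       const std::vector<double>& segment_seconds,
                                       double fps) {
    std::vector<Keyframe> frames;
    for (std::size_t i = 0; i + 1 < keys.size(); ++i) {
        int n = static_cast<int>(std::round(segment_seconds[i] * fps));
        for (int f = 0; f < n; ++f) {
            double t = static_cast<double>(f) / n;
            frames.push_back({ lerp(keys[i].position, keys[i + 1].position, t),
                               lerp(keys[i].direction, keys[i + 1].direction, t) });
        }
    }
    frames.push_back(keys.back());  // land exactly on the final keyframe
    return frames;
}
```

A real implementation might use spline interpolation for smoother motion, but the linear version already gives one camera state per frame to feed the renderer.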
Using this system, I set the camera position and direction, let the renderer run a given number of samples per pixel, save the resulting image as a .jpeg file, and then move on to the next position and direction. Finally, I use another program (MonkeyJam) to stitch all these images together into a movie file.
The YouTube video above was rendered at 1280x720, with 200 samples per pixel, at 30 frames per second.
Block editor
[media]
[/media]
You can move around much like a ghost cam in an FPS: WASD to move, Spacebar and CTRL to ascend and descend, and the mouse to look around. A left mouse click adds an object, while a right mouse click removes one.
Using other keys, you can choose the color the next object will have, its material type, its albedo (or brightness, for a light), and its shape. You can also raise and lower the brightness of the "skylight".
Octree
My friend did all the work on the octree. Since CUDA doesn't support recursion well, we had to make all operations on this tree stackless, i.e. purely iterative.
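Full stackless ray traversal is more involved, but the basic idea can be illustrated with an iterative point lookup in a flat-array octree: at each level you pick the child octant containing the query point and descend, with no recursion or explicit stack. The `OctNode` layout and `lookup` function below are a sketch under assumed conventions (children stored as pool indices, `-1` meaning empty, coordinates in `[0, 2^depth)`), not the actual data structure from the project:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flat-array octree node: children[i] is an index into the
// node pool, or -1 for an empty octant. A node with value >= 0 is a solid
// leaf; internal nodes carry value == -1.
struct OctNode {
    int32_t children[8];
    int32_t value;
};

// Iterative (stackless) point lookup, the kind of loop that maps well onto
// a CUDA kernel. Returns the leaf value at (x, y, z), or -1 for empty space.
int lookup(const std::vector<OctNode>& nodes, int root,
           int x, int y, int z, int depth) {
    int node = root;
    for (int level = depth - 1; level >= 0; --level) {
        if (nodes[node].value >= 0)      // reached a solid leaf early
            return nodes[node].value;
        // Select the child octant from one bit of each coordinate.
        int octant = (((x >> level) & 1) << 0)
                   | (((y >> level) & 1) << 1)
                   | (((z >> level) & 1) << 2);
        int child = nodes[node].children[octant];
        if (child < 0)                   // empty octant
            return -1;
        node = child;
    }
    return nodes[node].value;
}
```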
Speed
We are happy with the results, though we expected a bigger speed gain from the switch to CUDA. Since we are not using textures or meshes, all we keep in GPU memory is our octree, which is rather small. The time spent on memory access was insignificant compared to the time spent on computation, which made many of the memory optimizations we learned about in class irrelevant.
We attribute the modest gains to the highly branching nature of our kernels. Running one instruction on a large amount of data is fast, but when the threads computing different pixels are at different instructions, that benefit disappears. So whenever one ray hits a different material type than another, a different piece of code runs to sample the new direction. We think the path tracer would be considerably faster if we could find ways to reduce this branching.
Nice! Stackless is definitely the way to go. I had issues with materials as well; eventually I concluded that the best approach may simply be a very generic, "one size fits all" material, with the material parameters for each object encoded in textures. That way the nasty branching is converted into texture fetches, something GPUs can deal with (sort of). The catch is that you can't apply material-specific optimizations, which can be a big problem when doing importance sampling, but for a limited set of materials it would work pretty well I think, so it might be worth looking into.