I spent some time adding classes to the code to handle multi-threaded tasks. This allows performance-hungry code to scale to any number of cores / CPUs. Of course, this optimization only applies to code that is easily parallelizable, for example heavy processing on sequential or independent data.
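The post doesn't show the threading classes themselves, so here is a minimal sketch of the general idea under stated assumptions: a hypothetical `parallel_for` helper that splits a range of independent iterations into one contiguous chunk per hardware thread. It only uses standard C++11 facilities (`std::thread`, `std::thread::hardware_concurrency`); it is not the engine's actual API.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for every i in [begin, end), splitting the
// range into one contiguous chunk per worker thread. Only valid when the
// iterations are independent (no iteration reads what another one writes).
void parallel_for(std::size_t begin, std::size_t end,
                  const std::function<void(std::size_t)>& body,
                  unsigned num_threads = std::thread::hardware_concurrency()) {
    if (end <= begin) return;
    if (num_threads == 0) num_threads = 1;  // hardware_concurrency() may return 0
    const std::size_t count = end - begin;
    const std::size_t n = std::min<std::size_t>(num_threads, count);
    const std::size_t chunk = (count + n - 1) / n;  // ceiling division

    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t t = 0; t < n; ++t) {
        const std::size_t lo = begin + t * chunk;
        const std::size_t hi = std::min(end, lo + chunk);
        if (lo >= hi) break;  // range already fully covered
        workers.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    for (auto& w : workers) w.join();  // wait for every chunk to finish
}
```

For example, `parallel_for(0, heights.size(), [&](std::size_t i) { heights[i] = compute(i); });` would fill a buffer on all available cores, where `heights` and `compute` stand in for whatever per-element work is being distributed.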
One area that benefits a lot from multi-threading is the terrain patch generation. A terrain patch is made of NxN vertices, with N = 2^i + 1 ( i is an integer ). The default is 33x33 vertices.
When the quadtree gets subdivided ( you zoom in on the terrain ), 4 child terrain patches are created: that's a total of 33x33x4 vertices ( 4356 ) to generate. Each vertex requires several processing steps, listed here in decreasing order of cost:
- generating the height value by evaluating many procedural noise functions ( by far the slowest ).
- generating two displacement values for the cliff/overhang effect.
- generating a normal at this vertex.
- generating the vertex attributes such as position, tangent space and texture coordinates.
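As a quick check of the arithmetic, here is a small sketch of the patch-size formula N = 2^i + 1 and the number of vertices generated when one node splits into 4 children. The function names are hypothetical; only the formula and the 33x33 default come from the text.

```cpp
#include <cstdint>

// Side length of a terrain patch in vertices: N = 2^i + 1.
constexpr std::uint32_t patch_side(std::uint32_t i) { return (1u << i) + 1u; }

// Vertices generated when one quadtree node splits into its 4 children.
constexpr std::uint32_t vertices_per_split(std::uint32_t i) {
    return 4u * patch_side(i) * patch_side(i);
}

static_assert(patch_side(5) == 33, "default patch is 33x33 (i = 5)");
static_assert(vertices_per_split(5) == 4356, "33 * 33 * 4");
static_assert(vertices_per_split(6) == 16900, "65 * 65 * 4");
static_assert(vertices_per_split(7) == 66564, "129 * 129 * 4");
```

Each step up in i roughly quadruples the per-split work, which is why the per-vertex cost of the first two tasks dominates.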
The first two tasks are the important ones as far as performance goes. Height generation requires evaluating 3 or 4 procedural functions per vertex ( typically a ridged fBm, a few fractals, a terrace effect and some maths operations ). Because you need enough detail when you descend to ground level, I'm forced to use a high number of octaves in those procedurals ( the main one, the ridged fBm, has 16 octaves ). In total, I'm probably calling the 3D Perlin noise basis 30-40 times per vertex, and that still excludes all the intermediate maths operations, which add some serious overhead.
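To illustrate why octave count drives the cost, here is a sketch of a simplified ridged fBm that counts its basis calls. Two assumptions to flag loudly: the basis is a cheap hashed 3D value noise standing in for the engine's 3D Perlin noise, and the ridged loop omits the weight feedback of Musgrave's full ridged multifractal. All function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Stand-in basis (NOT Perlin): hash a lattice point to a value in [-1, 1].
static float hash3(std::int32_t x, std::int32_t y, std::int32_t z) {
    std::uint32_t h = static_cast<std::uint32_t>(x) * 73856093u
                    ^ static_cast<std::uint32_t>(y) * 19349663u
                    ^ static_cast<std::uint32_t>(z) * 83492791u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return static_cast<float>(h & 0xFFFFFFu) / 8388607.5f - 1.0f;
}

static float smooth(float t) { return t * t * (3.0f - 2.0f * t); }
static float mix(float a, float b, float t) { return a + (b - a) * t; }

// Trilinearly interpolated value noise with smoothstep fade, range [-1, 1].
float value_noise(float x, float y, float z) {
    const float fx = std::floor(x), fy = std::floor(y), fz = std::floor(z);
    const auto ix = static_cast<std::int32_t>(fx);
    const auto iy = static_cast<std::int32_t>(fy);
    const auto iz = static_cast<std::int32_t>(fz);
    const float tx = smooth(x - fx), ty = smooth(y - fy), tz = smooth(z - fz);
    auto c = [&](int dx, int dy, int dz) { return hash3(ix + dx, iy + dy, iz + dz); };
    const float x00 = mix(c(0, 0, 0), c(1, 0, 0), tx);
    const float x10 = mix(c(0, 1, 0), c(1, 1, 0), tx);
    const float x01 = mix(c(0, 0, 1), c(1, 0, 1), tx);
    const float x11 = mix(c(0, 1, 1), c(1, 1, 1), tx);
    return mix(mix(x00, x10, ty), mix(x01, x11, ty), tz);
}

// Simplified ridged fBm: each octave folds the basis with 1 - |n|, squares it
// to sharpen the ridges, and adds it at doubled frequency and halved amplitude.
// One basis call per octave: 16 octaves means 16 noise calls per vertex.
float ridged_fbm(float x, float y, float z, int octaves, int* basis_calls = nullptr,
                 float lacunarity = 2.0f, float gain = 0.5f) {
    float sum = 0.0f, amp = 0.5f, freq = 1.0f;
    for (int o = 0; o < octaves; ++o) {
        const float r = 1.0f - std::fabs(value_noise(x * freq, y * freq, z * freq));
        sum += r * r * amp;
        if (basis_calls) ++*basis_calls;
        freq *= lacunarity;
        amp *= gain;
    }
    return sum;  // in [0, 1) for gain = 0.5
}
```

Stacking 3 or 4 such functions per vertex quickly reaches the 30-40 basis calls mentioned above.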
The displacement for cliffs/overhangs requires evaluating two fractals of 7-8 octaves each, so it's also pretty performance-hungry.
The net result is that on complex planets, when you are moving around and a node gets subdivided into its 4 children, you see a small slowdown. It's very short ( around 20 milliseconds ), but long enough to make the camera motion feel jerky. This problem cannot be fixed due to the nature of procedurals ( otherwise I'd go back to diamond-square and have boring planets but okay performance ), but it can be lessened with various tricks, and multi-threading is one of them.
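One natural way to apply this at split time is to generate the 4 children as 4 concurrent tasks. The sketch below assumes a hypothetical `Patch` struct and uses a trivial `height_at` placeholder for the expensive height and displacement work; it relies on the standard `std::async` with `std::launch::async`. This granularity caps parallelism at four tasks per split, so it is a sketch of the idea rather than the engine's scheduler.

```cpp
#include <array>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical child-patch container: N x N heights for one quadtree child.
struct Patch {
    int n = 0;
    std::vector<float> heights;
};

// Placeholder for the expensive per-vertex work (procedural height + displacement).
float height_at(int child, int row, int col) {
    return static_cast<float>(child * 100000 + row * 1000 + col);
}

Patch build_child(int child, int n) {
    Patch p;
    p.n = n;
    p.heights.resize(static_cast<std::size_t>(n) * n);
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            p.heights[static_cast<std::size_t>(r) * n + c] = height_at(child, r, c);
    return p;
}

// Splitting a node: one task per child, so the four patches can be generated
// on up to four cores instead of serially on one.
std::array<Patch, 4> split_node(int n) {
    std::array<std::future<Patch>, 4> tasks;
    for (int child = 0; child < 4; ++child)
        tasks[child] = std::async(std::launch::async, build_child, child, n);
    std::array<Patch, 4> children;
    for (int child = 0; child < 4; ++child)
        children[child] = tasks[child].get();  // blocks until that child is done
    return children;
}
```

On a quad-core CPU this can ideally bring the per-split stall down to roughly the time of a single child, assuming the children take similar time to generate.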
I have a quad-core Q6600 at home; it's good to know that it'll eventually be put to use :)
Another idea to improve performance is to write SSE-optimized noise generation code, but it's a pretty low-level optimization, so I'd rather keep it as a "last resort".
About procedural details
The good thing about procedural generation is that it makes the engine more future-proof than other engines/games.
What I mean is this: just by changing a detail level in the config dialogs, you will be able to increase the quality and detail to extreme levels. No computer that exists today has enough memory, or a good enough video card and CPU, to run it at the highest detail levels, simply because there's no hard limit to how far the details can go.
Case in point: at the moment the terrain patches are 33x33 vertices; that's the default for medium to high-end gaming computers, for example an AMD FX / Intel Core 2 Duo with an NV7800, NV8800 or ATI X1800.
If your computer is good enough, you can try increasing the detail by a factor of 4: 65x65 terrain patches. I tried it on my quad-core machine ( Q6600 with an 8800 GTX ), and as expected it slowed down quite a lot, but it was still real-time ( 40-50 fps ). Compare the screenshots below, and notice how the silhouettes of the mountains and the lighting / shadow details are better in the bottom version.
The 65x65 version almost brings my Q6600+8800 to its knees. But it doesn't stop there. Although I haven't been able to see it myself, I'm guessing the best computers out there, with 4 GB of RAM, extreme quad-core CPUs and an 8800 Ultra, would be able to run a 129x129 version at low framerates. Of course that wouldn't be good enough to play comfortably, but...
Imagine a few years from now: the graphical details will scale with your new computers and video cards.
33x33 terrain patches:
65x65 terrain patches:
Planetary surface types
I've cleaned up the code a bit to support multiple planet types ( previously I was experimenting with everything in the same class, commenting / uncommenting pieces of code ). Now I have factories for the different planet types, each with its own procedural noise functions.
The first one, which I've already presented in depth in previous updates, is what I described as a "Gaian" planet.
I added a "Desert" planet type and a "Selenian" planet type. The desert one is based on ridged Perlin noise to generate sand dunes. It's extremely boring, so I need to work on it a bit before presenting it ( plus texturing will be important ).
The "Selenian" type is basically a planetary surface based on craters. Lots of them. To implement it, I created a new type of noise, which I called "spherical", based on the idea of Voronoi diagrams. A set of spheres is pre-generated and randomly placed in a cube. When I evaluate the noise at run-time, I calculate the distance between the position and each sphere, and based on that distance I generate a value that follows the profile of a crater.
It works pretty well. It's quite slow, but that's no big deal, as there is lots of room to optimize this "spherical" noise.
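The description above translates into a brute-force loop fairly directly. This is a minimal sketch under assumptions not stated in the post: spheres are scattered uniformly in the cube [-1, 1]^3 with random radii, the crater shape is a parabolic bowl with a raised rim that fades out at 1.5 radii, and overlapping craters are combined by summing. `Sphere`, `make_spheres`, `crater_profile` and `spherical_noise` are hypothetical names.

```cpp
#include <cmath>
#include <random>
#include <vector>

struct Sphere { float x, y, z, radius; };

// Pre-generate `count` spheres at random positions in the cube [-1, 1]^3.
std::vector<Sphere> make_spheres(int count, unsigned seed, float min_r, float max_r) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> pos(-1.0f, 1.0f), rad(min_r, max_r);
    std::vector<Sphere> spheres(count);
    for (auto& s : spheres) {
        s.x = pos(rng);
        s.y = pos(rng);
        s.z = pos(rng);
        s.radius = rad(rng);
    }
    return spheres;
}

// Assumed crater shape, d = distance / radius: a bowl rising to a rim of height
// `rim` at d = 1, then falling smoothly back to 0 at d = 1.5 (continuous at d = 1).
float crater_profile(float d, float rim = 0.2f) {
    if (d < 1.0f) return d * d - 1.0f + rim;
    if (d < 1.5f) {
        const float t = 1.0f - (d - 1.0f) / 0.5f;
        return rim * t * t;
    }
    return 0.0f;
}

// Evaluate at a 3D point (e.g. a vertex on the unit planet sphere). Loops over
// every sphere, which is why a naive version is slow.
float spherical_noise(const std::vector<Sphere>& spheres, float x, float y, float z) {
    float h = 0.0f;
    for (const Sphere& s : spheres) {
        const float dx = x - s.x, dy = y - s.y, dz = z - s.z;
        const float d = std::sqrt(dx * dx + dy * dy + dz * dz) / s.radius;
        h += crater_profile(d);
    }
    return h;
}
```

The per-sample cost is linear in the number of spheres; bucketing the spheres into a grid so each sample only tests nearby ones is one obvious optimization.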
Why "spherical", by the way? Craters are 2D, but the noise basis must be 3D so that you don't get strange seams on the planet ( since all the noise functions are based on the position of the vertex on the planet's sphere ).
While playing with craters, I found an interesting place that I didn't even know could exist. It's basically a group of small craters that got displaced by some extreme cliffs/overhangs. It's hard to see in still pictures, but when you actually explore it in 3D, it looks surreal, like alien rock formations.
I love getting surprised by my own creations :)
Motion blur
Last but not least, I spent some time adding a render pipe for motion blur. The basic motion blur technique renders the scene to a texture and alpha-blends it with the previous frame. It's tricky and doesn't look very good, but it's extremely fast, so I might fall back on that method on computers that have performance problems.
What I implemented is "true" motion blur: each object is transformed by both the current frame's and the previous frame's transformation matrices, and a velocity is computed in eye space and stored in a velocity texture. The scene itself is also rendered into a texture. In post-processing, each pixel reads its velocity from the velocity texture and samples the scene texture N times ( in my current shader, N = 32 ) along the velocity vector, averaging the samples.
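For comparison, the cheap alpha-blending method boils down to a per-pixel exponential moving average. This is a CPU sketch of what the blend computes, assuming a single-channel image stored as a flat array and a hypothetical `accumulate_blur` function; on the GPU the same formula would be evaluated by the blending hardware.

```cpp
#include <cstddef>
#include <vector>

// Accumulation blur: history = alpha * current + (1 - alpha) * history.
// `history` holds the previous blended output and is updated in place;
// both buffers are assumed to have the same size.
void accumulate_blur(const std::vector<float>& current, std::vector<float>& history,
                     float alpha) {
    for (std::size_t i = 0; i < current.size(); ++i)
        history[i] = alpha * current[i] + (1.0f - alpha) * history[i];
}
```

An old frame's contribution decays by a factor of (1 - alpha) each frame, so the trail length depends on the frame rate and ignores how fast things actually move, which is one reason this method looks less convincing than velocity-based blur.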
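The post-processing pass can be modelled on the CPU to show what each pixel does. In this sketch the velocity buffer is assumed to already hold screen-space velocities in pixels per frame (the post computes velocity in eye space and doesn't say how it is projected), sampling is nearest-neighbour with clamp-to-edge instead of filtered texture fetches, and `Image` and `velocity_blur` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Single-channel image standing in for the scene texture, clamp-to-edge sampling.
struct Image {
    int w, h;
    std::vector<float> px;
    float at(int x, int y) const {
        x = std::clamp(x, 0, w - 1);
        y = std::clamp(y, 0, h - 1);
        return px[static_cast<std::size_t>(y) * w + x];
    }
};

// For each pixel: read its velocity, take `samples` taps (samples >= 1) spread
// along that vector and centred on the pixel, and average them.
Image velocity_blur(const Image& scene, const std::vector<Vec2>& velocity, int samples = 32) {
    Image out{scene.w, scene.h, std::vector<float>(scene.px.size())};
    for (int y = 0; y < scene.h; ++y) {
        for (int x = 0; x < scene.w; ++x) {
            const Vec2 v = velocity[static_cast<std::size_t>(y) * scene.w + x];
            float sum = 0.0f;
            for (int i = 0; i < samples; ++i) {
                // t runs from -0.5 to +0.5 across the taps
                const float t = samples > 1 ? static_cast<float>(i) / (samples - 1) - 0.5f : 0.0f;
                sum += scene.at(static_cast<int>(std::lround(x + v.x * t)),
                                static_cast<int>(std::lround(y + v.y * t)));
            }
            out.px[static_cast<std::size_t>(y) * scene.w + x] = sum / samples;
        }
    }
    return out;
}
```

Because each pixel only blurs along its own velocity, a static background pixel next to a moving object gets zero velocity and stays sharp, which is exactly the "occlusion" problem described next.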
The results look pretty good in motion, but show lots of artifacts in screenshots. One of the problems is that no blur happens outside the silhouette of a moving object: the so-called "occlusion" problem.
In the "OpenGL Shader Tricks" presentation at GDC 2003, NVIDIA describes a "solution" to that problem: the shader computes the dot product between the normal and the motion vector, and uses the current frame's vertex position when it is positive or the previous frame's position when it is negative. The idea is to extrude the object in the direction opposite to its motion.
I have experimented with that idea, but unfortunately it creates more artifacts than it solves.
Project Offset apparently uses a similar technique but has found a way to reduce or remove all those artifacts. If anybody has an idea of how they solved it, I'm interested.
Because motion blur requires rendering the scene a second time into a separate velocity texture, and because there are already 200-300K triangles per frame, enabling true motion blur causes a serious slowdown ( a 50% performance hit ).
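The per-vertex choice in that trick is small enough to write out. It would normally live in a vertex shader; this C++ sketch just expresses the selection logic, assuming both positions and the normal are in the same space, with the motion vector taken as current minus previous position. `Vec3` and `stretch_vertex` are hypothetical names.

```cpp
struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Vertices whose normal faces along the motion keep the current position;
// vertices facing away stay at the previous frame's position, which stretches
// the object back along its path so the blur can extend past the silhouette.
Vec3 stretch_vertex(const Vec3& normal, const Vec3& pos_now, const Vec3& pos_prev) {
    const Vec3 motion{pos_now.x - pos_prev.x, pos_now.y - pos_prev.y, pos_now.z - pos_prev.z};
    return dot(normal, motion) > 0.0f ? pos_now : pos_prev;
}
```

The hard switch at dot = 0 splits the mesh along the silhouette relative to the motion, which gives one plausible source of the extra artifacts when the triangles straddling that line get stretched.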
I would recommend doing this rather than leaving it as a last resort. I wrote a fractal SSE simplex noise function. In the worst case ( 1 iteration ) it was slower by a couple of percent, but in every other case ( more than 1 iteration ) it was faster. Over 4 iterations, it was easily twice as fast.