If you usually don't understand the technicalities of my journals, move on! This one will be particularly tough.
State of the art
Preprocessed textures with detail maps
Textures are generated from real photos, retouched by artists, or pre-processed by other software ( like Terragen ). They are stored in files on disk, usually split into areas, and each area is applied to a part of the terrain. For example, Google Earth, Flight Simulator, etc.. are based on real photos, while other games use artist-made textures.
The size of the world is essentially limited by disk space. Because the resolution isn't good enough for close-up views, most games apply detail textures on top. Old games only had one detail texture ( grayscale ), but more recent ones use a set of colored detail textures.
If I'm not mistaken, this is also the technique used in Far Cry / Crysis.
Texture tiles / packs
In this approach, artists create small textures, called tiles, that contain all the possible transitions between various terrain types. Many of them are packed together in a bigger texture, a set / pack, for efficient usage. This is quite fast and requires less artistic work, but it's an old technique.
Its main flaws are the difficulty of applying it to a terrain with level of detail, and the lack of variety: repetition patterns appear, although they can be hidden with a high number of tiles.
Think of Warcraft 3.
Texture splatting
With this technique, a set of layers ( grass, rock, snow, etc.. ) is blended together depending on local terrain parameters such as slope / altitude.
Most of the time, the algorithm runs at the vertex level: the CPU computes the parameters, and blending weights are passed to the vertex shader. In the pixel shader, the layers are all sampled and combined based on the weights. For example, with 3 layers (GLSL):
// one sampler per layer; 'weight' is interpolated from the vertex shader
vec4 tex0 = texture2D(grassTex, uv);
vec4 tex1 = texture2D(rockTex, uv);
vec4 tex2 = texture2D(snowTex, uv);
vec4 color = tex0 * weight.x + tex1 * weight.y + tex2 * weight.z;
One huge problem with this technique is its cost: it's not too bad for 3 layers, but the higher the number of layers, the slower it becomes. Imagine 10 layers. Now, imagine that you also need to sample the normal maps; you're now sampling 20 textures.
I will come back to this technique, since it is at the heart of what I've finally chosen, with important changes.
Disk space usage is minimal. Since the pixel shading cost depends on the number of layers that must be combined, a good way to optimize this technique is to determine the N most important layers per terrain patch, and only sample those.
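As a hypothetical sketch of that optimization ( the names and the choice of N=3 are mine, not taken from any particular engine ): the CPU sorts the layer weights per patch and renormalizes the top 3, so the shader always samples exactly 3 textures no matter how many layers the terrain defines:

```glsl
// Sketch: only the 3 most important layers of this patch are bound.
uniform sampler2D layerTex0; // most important layer for this patch
uniform sampler2D layerTex1;
uniform sampler2D layerTex2;
varying vec3 topWeights;     // top-3 weights, renormalized to sum to 1

void main()
{
    vec2 uv = gl_TexCoord[0].st;
    vec4 color = texture2D(layerTex0, uv) * topWeights.x
               + texture2D(layerTex1, uv) * topWeights.y
               + texture2D(layerTex2, uv) * topWeights.z;
    gl_FragColor = color;
}
```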
The old approach
Back in 2005, my first terrain texturing prototype used a derivative of the splatting algorithm. This is, to date, the one still used in the screenshots of the terrain on the website.
The algorithm allocated a unique texture per terrain patch ( let's say 128x128 ), and used the GPU to render to this texture to generate the texturing per slope and altitude, combining N layers.
At the time, I was using 8 layers, and no bump/normal maps. So all it required to work was 8 texture units and pixel shader 2.0.
There were some real drawbacks though:
- video memory usage: for 1000 patches, a 256x256x4 texture per patch consumes up to 256 MB of video memory. At 128x128, only 64 MB, but it started to look more blurry.
- small stalls that depend on how fast the pixel shader is at rendering one texture patch.
- no texture compression is possible, as the textures get computed on the GPU, and the compression is implemented in the drivers on the CPU. Enabling texture compression caused each texture to be downloaded from the GPU to the CPU, compressed, then re-uploaded to the GPU, which caused insane freezes.
- because there's a unique texture per patch, when the terrain LOD stops at the maximum depth level ( closest to the ground ), textures of course cannot get more refined, and you get a blurry mess.
- screen resolutions always increase. At that time, 128x128 might have been okay in 800x600, but it is unacceptable at 1680x1050. 256x256 is better, but still blurry. The next step would be 512x512, but this would cost 1 GB of video memory.
- of course, some video memory must remain for ship textures, backgrounds, effects, GUI, effects buffers, etc.. So it isn't realistic to use 100% of the video memory just for terrain texturing. 50% would be a better number.
- no bump/normal mapping. This would require another renderable texture per patch, multiplying the video memory cost by 2 again.
If I simply reused this technique today, the results would be:
- let's assume a video card with 512 MB of video memory.
- the budget is 256 MB for terrain texturing
- divided by 2 to have bump mapping, so the real budget is 128 MB
- for 1000 patches, this means the highest resolution the patches could be is 128x128.
- clearly this wouldn't look too good in high resolutions (1280x1024, 1680x1050, etc.. )
The new approach
In January 2008, when I reimplemented the terrain texturing, I experimented with many ideas.
The basic one is texture splatting, pretty much as everybody implements it.
But, unlike everybody, I don't have a 10 Km x 10 Km zone to texture. I have a whole planet.
In my early experiments, I found that 10 layers is the absolute minimum per planet to recreate believable variety. The quota is quickly reached: for an Earth-like planet, for example: 2 grass, 2 rock, 2 snow, 2 sand, 1 mud, 1 forest. 16 layers would be more comfortable, but let's stick to 10 layers for now.
The first problem is that most video cards below the latest generation ( GF8 ) only have 16 texture units. Working in multiple passes is a sure framerate killer. So how to do texturing with 10 layers in 1 pass ?
If you want at least some bump mapping, you need 20 texture units. Then you have additional effects that require TMUs too. For example, at the highest quality, shadow maps use 4 TMUs.
So what could be done ? Make a choice:
- goodbye bump mapping
- goodbye special effects (and particularly shadows)
- goodbye variety (reducing the number of layers to 6, which would just fit the 6*2+4 = 16 TMUs).
- goodbye framerate ( going multi-pass and re-rendering 300K triangles per frame; keep in mind that the per-object overhead is higher in I-Novae than in other engines, due to it working in camera space for high precision ).
Clearly, none of this sounds too good either...
Texture sampling explosion
Another problem with the technique described above is that it quickly leads to an explosion of texture sampling instructions. Let's see:
- 10 layers, requires 10 sampling operations.
- with bump mapping, multiply this by 2 -> 20
- UVs must be adjusted to work at any level, from close ground views up to space orbits. For this, you need to sample once for a given UV, adjust the frequency, sample again, and then interpolate. The global cost is x2, so we're now at -> 40
- finally, to avoid popping of textures, blending must be done between 2 textures too. The global cost is again x2, so we're now at -> 80.
You read this right: for 10 layers, a proper shader would need to sample our 10 textures 80 times !
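To see where those x2 factors come from, here is a hypothetical sketch for a single layer ( the frequency and morph values are illustrative, not my actual code ): two samples interpolated between UV frequencies so the texture works at any altitude, done twice to morph between LOD levels, i.e. 4 samples per layer for the diffuse alone:

```glsl
// Illustration of the cost explosion for ONE layer's diffuse.
uniform sampler2D layerTex;
uniform float freq;   // base UV frequency for the current viewing distance
uniform float morph;  // 0..1 blend factor toward the next LOD level

vec4 sampleLayer(vec2 uv)
{
    // two frequencies interpolated, so UVs work at any altitude
    vec4 near0 = texture2D(layerTex, uv * freq);
    vec4 far0  = texture2D(layerTex, uv * freq * 0.5);
    vec4 lod0  = mix(far0, near0, 0.5);

    // same thing at the next LOD's frequency, to avoid popping
    vec4 near1 = texture2D(layerTex, uv * freq * 2.0);
    vec4 far1  = texture2D(layerTex, uv * freq);
    vec4 lod1  = mix(far1, near1, 0.5);

    return mix(lod0, lod1, morph); // 4 samples, and bump doubles it again
}
```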
My first problem was how to keep all those features in a single pass within 16 TMUs: The Quest For Reducing The Number Of Texture Units (tm), and, if possible, to reduce the number of sampling instructions too.
I quickly thought it would be possible to trick it by packing all the texture layers together as a "stack" into a 3D ( volumetric ) texture.
It worked.. kinda. The TMU count problem disappeared, but a new one appeared: mipmapping. I thought that by playing with filters, it would be possible to mipmap only in 2D but not in the stack / Z direction, but the hardware doesn't work that way..
Disabling mipmaps ? Say bye-bye to your framerate, especially in high-altitude views, as all your pixels are high frequency and need to access the volume texture in a random order.
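For reference, a minimal sketch of the stack idea ( sampler and uniform names are mine ): a single sampler3D holds all the layers as slices, and the Z coordinate selects the layer, so one TMU serves every layer:

```glsl
// Sketch: all layers packed as slices of a 3D texture.
uniform sampler3D layerStack;
uniform float numLayers;

vec4 sampleStack(vec2 uv, float layerID)
{
    // Z addresses the slice; +0.5 centers the coordinate inside it.
    float z = (layerID + 0.5) / numLayers;
    return texture3D(layerStack, vec3(uv, z));
    // Problem: hardware mipmapping also filters along Z, so lower mip
    // levels bleed adjacent layers into each other.
}
```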
Later I realized that this could be properly implemented ( with good mipmapping and performance ) with texture arrays. But this is a GF8+ feature, so it would mean requiring pixel shader 4.0. Maybe in a few years..
I was starting to run out of ideas when I realized the layers could be packed together into a huge texture. For example, if all texture layers are 512x512, you can pack 16 of them (4x4) in a single 2048x2048 pack.
Mipmapping wasn't obvious anymore, as mipmaps had to be recreated manually by taking care of the packing and adjacent pixels.
Tiling wasn't obvious anymore, as what you actually need is sub-tiling: tiling UVs within a region of a bigger texture. But this could be faked in the shader..
Mipmapping also required special care, as you need to compute the mipmap level yourself in the shader ( rather than letting the hardware do it ), else you get seams when mipmapping between tiles within the pack.
Fortunately, all of those, while tricky, are relatively inexpensive, a few instructions each. And they are not per-layer, but per pack, so finally, you only need to do it once or twice in the complete shader..
The procedural texturing had to be changed, too. Instead of getting blending weights, I had to sample for a texture ID given a slope/altitude. This ID is then used to compute the UVs for the tile within the pack. This is also a fast operation.
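Putting the pieces together, here is a hypothetical sketch of the pack sampling ( assuming a 4x4 pack of 512x512 tiles; all names and constants are illustrative, not my actual shader code ): the LUT returns a layerID, which is mapped to a tile origin inside the pack, UVs are sub-tiled with fract(), and the mipmap level is computed manually and clamped so mips never cross tile borders:

```glsl
// Sketch of the final approach: LUT -> tile in atlas -> sub-tiled UV -> manual LOD.
#extension GL_ARB_shader_texture_lod : enable

uniform sampler2D lookupTable; // assumption: red channel stores layerID / 15
uniform sampler2D diffusePack; // 4x4 atlas of 512x512 tiles (2048x2048)
uniform float maxMipLevel;     // clamp so mips never bleed across tiles

vec4 sampleTerrain(vec2 uv, float slope, float altitude)
{
    // 1. which layer covers this pixel?
    float layerID = floor(texture2D(lookupTable, vec2(slope, altitude)).r * 15.0 + 0.5);

    // 2. tile origin inside the 4x4 pack
    vec2 tileOrigin = vec2(mod(layerID, 4.0), floor(layerID / 4.0)) * 0.25;

    // 3. sub-tiling: repeat UVs inside the tile's quarter of the atlas
    vec2 tiledUV = tileOrigin + fract(uv) * 0.25;

    // 4. manual mip level from the untiled UV derivatives, clamped;
    //    fract() makes hardware derivatives unusable at tile seams
    vec2 duv = fwidth(uv * 512.0);
    float lod = clamp(log2(max(duv.x, duv.y)), 0.0, maxMipLevel);

    return texture2DLod(diffusePack, tiledUV, lod);
}
```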
In total, to have mipmapping + bump mapping + 10 or more layers + morphing + UVs at all altitudes, the shader only needs 12 texture samples. Much better than the 80 of the previous algorithm.
The downside is that it's no longer a pure texturing bottleneck, as all the tricks require arithmetic operations. While each one is cheap, it quickly sums up and becomes expensive.
The final shader is around 300 instructions, and the texturing part only consumes 3 TMUs:
- one TMU for the diffuse texture pack
- one TMU for the bump texture pack
- one TMU for the lookup table (slope/altitude -> layerID).
If you have followed so far, you will have noticed something else: the LUT gives a single layerID, so only one layer is used per pixel. This leads to the "sharp edges" in terrain features that many people have noticed and criticized in the latest terrain screenshots.
The LUT can of course be used to store a second layerID, but this multiplies the number of samples by 2. Since the shader is already super slow.. I'm living with this "limitation".
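For completeness, a sketch of what the two-layer variant could look like ( again hypothetical; sampleTile() stands for the sub-tiling + manual mipmap logic described above, defined elsewhere in the shader ):

```glsl
// Sketch: the LUT stores two layerIDs plus a blend weight, doubling the
// sample count but smoothing out the sharp transitions.
uniform sampler2D lookupTable; // r: layerID0, g: layerID1, b: blend weight
uniform sampler2D diffusePack;

vec4 sampleTile(sampler2D pack, vec2 uv, float layerID); // defined elsewhere

vec4 sampleBlended(vec2 uv, float slope, float altitude)
{
    vec3 lut = texture2D(lookupTable, vec2(slope, altitude)).rgb;
    vec4 a = sampleTile(diffusePack, uv, floor(lut.r * 15.0 + 0.5));
    vec4 b = sampleTile(diffusePack, uv, floor(lut.g * 15.0 + 0.5));
    return mix(a, b, lut.b); // twice the samples, no more sharp edges
}
```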
The future ?
The terrain shader is by far the most complex and tricky shader I've ever written, but there's still a lot of room for optimization. I already have ideas to save 20-30 instructions in the shader without too much effort. By optimizing even more, I'm confident I can get it down to 250 instructions. That's still a lot, so in high resolutions ( and even not-so-high ones ), the game tends to be heavily pixel-shader limited.
This is of course assuming max quality settings. The shaders all have pre-processing directives, so complete features / effects can be disabled to save performance on slower video cards.
I have no plans to continue working on the terrain shader in the short term. Maybe in 1 or 2 years I will come back to it; especially as more and more video cards become pixel shader 4.0 capable, the texture arrays approach would allow this shader to be reimplemented in a much more efficient way. Combine that with the rise in video card power, and it is likely that in 2 years, a texture-arrays shader on a next-gen card would be 2 or 3 times faster than what I currently have on my GF8.
If people are interested in all those tricks for the shader, I can post some snippets.
In a future dev journal, I will also come back to noise, and how I need to combine 10 octaves per pixel ( so 10 additional texture samples ) in order to make the features look more natural. I'll also say a few words about geometry ( procedural heightmap generation ), clouds and water.