
Hyunkel

Member Since 01 Jan 2009
Last Active Apr 20 2013 06:16 PM
-----

Topics I've Started

GPU normal vector generation for high precision planetary terrain

27 September 2012 - 01:37 PM

I generate procedural planets almost entirely with compute shaders (the CPU manages a modified quadtree for LOD calculations).
The compute shader outputs vertex data per terrain patch, which is stored in buffers.
Normal vectors are calculated during this stage by applying a Sobel operator to the generated position data:

[source lang="cpp"]
// Only operate on non-padded threads
if ((GroupThreadID.x > 0) && (GroupThreadID.x < PaddedX - 1) &&
    (GroupThreadID.y > 0) && (GroupThreadID.y < PaddedY - 1))
{
    // Generate normal vectors from the 8 neighbouring positions
    float3 C  = VertexPosition;
    float3 T  = GetSharedPosition(GroupThreadID.x,     GroupThreadID.y + 1);
    float3 TR = GetSharedPosition(GroupThreadID.x + 1, GroupThreadID.y + 1);
    float3 R  = GetSharedPosition(GroupThreadID.x + 1, GroupThreadID.y);
    float3 BR = GetSharedPosition(GroupThreadID.x + 1, GroupThreadID.y - 1);
    float3 B  = GetSharedPosition(GroupThreadID.x,     GroupThreadID.y - 1);
    float3 BL = GetSharedPosition(GroupThreadID.x - 1, GroupThreadID.y - 1);
    float3 L  = GetSharedPosition(GroupThreadID.x - 1, GroupThreadID.y);
    float3 TL = GetSharedPosition(GroupThreadID.x - 1, GroupThreadID.y + 1);

    // 1-2-1 weighted edge averages (weights sum to 4), relative to the centre
    float3 v1 = normalize((TR + 2.0 * R + BR) * 0.25 - C); // towards right edge
    float3 v2 = normalize((TL + 2.0 * T + TR) * 0.25 - C); // towards top edge
    float3 v3 = normalize((TL + 2.0 * L + BL) * 0.25 - C); // towards left edge
    float3 v4 = normalize((BL + 2.0 * B + BR) * 0.25 - C); // towards bottom edge

    float3 N1 = cross(v1, v2);
    float3 N2 = cross(v3, v4);
    Normal = (N1 + N2) * 0.5;

    // Write Normal to shared memory
    SharedMemory[GroupIndex].Normal = Normal;
}
[/source]
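For experimenting off the GPU, here is a minimal C++ port of the same 1-2-1 weighted Sobel-style normal for a single vertex. The `float3` struct and the `cross`/`normalize` helpers are stand-ins for the HLSL intrinsics, and `SobelNormal` is a hypothetical name; the neighbour layout (T, TR, R, ...) follows the shader above.

```cpp
#include <cmath>

// Minimal stand-in for HLSL's float3 and the intrinsics the shader uses.
struct float3 { float x, y, z; };

float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
float3 operator-(float3 a, float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float3 operator*(float3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

float3 cross(float3 a, float3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

float3 normalize(float3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return v * (1.0f / len);
}

// Same weighting as the shader: each edge vector is a 1-2-1 weighted average
// of three neighbours, measured from the centre vertex C.
float3 SobelNormal(float3 C, float3 T, float3 TR, float3 R, float3 BR,
                   float3 B, float3 BL, float3 L, float3 TL) {
    float3 v1 = normalize((TR + R * 2.0f + BR) * 0.25f - C); // towards right edge
    float3 v2 = normalize((TL + T * 2.0f + TR) * 0.25f - C); // towards top edge
    float3 v3 = normalize((TL + L * 2.0f + BL) * 0.25f - C); // towards left edge
    float3 v4 = normalize((BL + B * 2.0f + BR) * 0.25f - C); // towards bottom edge
    // Average of the two cross products; like the shader, not renormalized.
    return (cross(v1, v2) + cross(v3, v4)) * 0.5f;
}
```

On a flat grid in the z = 0 plane, v1/v2 point along +x/+y and v3/v4 along -x/-y, so both cross products are (0, 0, 1) and so is their average.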

This works very well in most situations.
Unfortunately, once I get to very high LOD levels, floating point precision causes quite a few issues.

To illustrate the problem, I have the compute shader generate a sphere of radius 1.
I then use the following code to visualize the error of the generated normal vectors (scaled by 10 so that small errors become visible):
[source lang="cpp"]float3 NormalError = abs(Normal - normalize(PositionWS)) * 10.0;[/source]
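The same debug metric as a C++ sketch. It assumes the sphere is centred at the origin, which is what makes `normalize(PositionWS)` the exact surface normal; `float3` and `NormalError` are stand-in names.

```cpp
#include <cmath>

struct float3 { float x, y, z; };

// Per-channel |N - normalize(P)|, scaled by 10 as in the shader.
// Only meaningful when P lies on a sphere centred at the origin.
float3 NormalError(float3 normal, float3 positionWS) {
    float len = std::sqrt(positionWS.x * positionWS.x +
                          positionWS.y * positionWS.y +
                          positionWS.z * positionWS.z);
    return {std::fabs(normal.x - positionWS.x / len) * 10.0f,
            std::fabs(normal.y - positionWS.y / len) * 10.0f,
            std::fabs(normal.z - positionWS.z / len) * 10.0f};
}
```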

LOD 16 - First signs of errors, no visual artifacts
[Image: normal error visualization at LOD 16]


LOD 20 - First visual artifacts. These can be masked with normal mapping or some Perlin noise.
[Image: normal error visualization at LOD 20]


LOD 24 (highest LOD) - Visual artifacts are visible all over the terrain.

[Image: normal error visualization at LOD 24]


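At this LOD, adjacent vertices are only 0.0000000596 (about 2^-24) units apart. That is exactly the gap between neighbouring 32-bit floats in the range [0.5, 1), and half the gap in [1, 2), so the neighbour differences my normal generation relies on are dominated by rounding error.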
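A quick C++ check of that float32 spacing, using `std::nextafter` from `<cmath>`; `FloatSpacing` is a hypothetical helper name.

```cpp
#include <cmath>

// Gap between x and the next representable float above it (one ULP for positive x).
float FloatSpacing(float x) {
    return std::nextafter(x, INFINITY) - x;
}
```

For example, `FloatSpacing(0.75f)` is 2^-24 and `FloatSpacing(1.0f)` is 2^-23, so on a unit sphere a step of 2^-24 added to a coordinate of 1.0f rounds straight back to 1.0f (round-to-nearest-even).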

I understand that I'm pushing the limits of floating-point precision here, and not reaching that terrain resolution isn't a big issue, but does anyone have ideas on how to squeeze out a little more detail?

Cheers,
Hyu

Constant Buffer usage

16 September 2012 - 07:35 AM

I went over some of my code with a colleague yesterday and he was quite surprised by how I manage my constant buffers.
Except for a few rare and very specific situations, I only use 2 "global" constant buffers.

A per-frame buffer, containing data that only needs to be updated once per frame:
[source lang="cpp"]
cbuffer PerFrameCB : register(b0)
{
    float4x4 CameraView       : packoffset(c0.x);
    float4x4 CameraProjection : packoffset(c4.x);
    float4   CameraPosition   : packoffset(c8.x);
    float4   SunDirection     : packoffset(c9.x);
    float2   ViewportSize     : packoffset(c10.x);
}
[/source]

And a second buffer used for everything else.
This buffer is 1 KB in size and is updated whenever new data is needed, which happens multiple times per frame.
Both of these buffers are always bound to the registers b0 and b1 for all shader stages.
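Since each `packoffset` register cN is 16 bytes, the per-frame layout maps onto a C++ struct at byte offsets 16 * N. This is a sketch assuming the application fills the buffer from a plain struct; the `Padding` member is an addition to reach the register boundary, because D3D11 requires a constant buffer's `ByteWidth` to be a multiple of 16.

```cpp
#include <cstddef>

// CPU-side mirror of PerFrameCB: register cN starts at byte 16 * N.
struct alignas(16) PerFrameCB {
    float CameraView[16];       // c0..c3
    float CameraProjection[16]; // c4..c7
    float CameraPosition[4];    // c8
    float SunDirection[4];      // c9
    float ViewportSize[2];      // c10.xy
    float Padding[2];           // c10.zw, unused; keeps the size a multiple of 16
};

static_assert(offsetof(PerFrameCB, CameraProjection) == 4 * 16, "c4");
static_assert(offsetof(PerFrameCB, CameraPosition) == 8 * 16, "c8");
static_assert(offsetof(PerFrameCB, SunDirection) == 9 * 16, "c9");
static_assert(offsetof(PerFrameCB, ViewportSize) == 10 * 16, "c10");
static_assert(sizeof(PerFrameCB) == 11 * 16, "11 registers");
```

By the same arithmetic, the 1 KB b1 buffer corresponds to 1024 / 16 = 64 registers.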

I've been told that this is the "wrong" way to do it. I'm supposed to split this up into individual constant buffers.

However I don't understand why that is the case.
If I split my current b1 constant buffer into X buffers, not only do I still need to update these buffers, but I'll also need to bind a new constant buffer whenever new data is needed.

I don't see how my method is wrong, but I'm a little paranoid when I hear such claims because I am self-taught.
So I figured it's better to ask than to risk doing something wrong.

Cheers,
Hyu

Structured buffer float compression

05 June 2012 - 04:12 AM

I have a compute shader that generates procedural planetary terrain and stores vertices in a structured buffer with the following layout:
[source lang="cpp"]
struct PlanetVertex
{
  float3 Position;
  float3 Normal;
  float Temperature;
  float Humidity;
};
[/source]

That's 10 floats at 4 bytes per float, so 40 bytes per vertex.
A terrain node (patch) contains 33x33 = 1089 vertices, which is 43,560 bytes.
At the highest quality setting, the compute shader outputs up to 5000 nodes,
so the buffer needs 5000 * 43,560 bytes, which is
217,800,000 bytes or ~208 MiB.

Due to the way I handle load balancing between rendering and terrain generation, I need to keep this buffer in memory twice, so I use ~415 MiB for vertex data alone.
This is okay, I guess, since a planet is sort of the primary object, but I want to reduce this buffer size if possible.

For example, the normal vector doesn't need 32-bit precision per channel; 16 bits would be more than enough.
Temperature and humidity could even fit in an 8-bit UNORM, but I doubt that's available here.
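Structured buffer elements are plain struct members rather than typed formats, so there is no automatic UNORM conversion, but an 8-bit UNORM can still be packed by hand. A sketch assuming temperature and humidity are normalized to [0, 1]; `ToUnorm8`, `PackUnorm8x2` and `UnpackUnorm8x2` are hypothetical names, and the same shifts and masks work on `uint` in HLSL.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Converts a value in [0, 1] to an 8-bit UNORM (0..255), clamping out-of-range input.
std::uint32_t ToUnorm8(float v) {
    float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(std::lround(clamped * 255.0f));
}

// Packs two [0, 1] values into the low 16 bits of a uint: a in bits 0-7, b in bits 8-15.
std::uint32_t PackUnorm8x2(float a, float b) {
    return ToUnorm8(a) | (ToUnorm8(b) << 8);
}

// Reverses PackUnorm8x2; each value comes back within half a quantization step.
void UnpackUnorm8x2(std::uint32_t packed, float& a, float& b) {
    a = static_cast<float>(packed & 0xFFu) / 255.0f;
    b = static_cast<float>((packed >> 8) & 0xFFu) / 255.0f;
}
```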

I found the f32tof16 and f16tof32 functions in HLSL, which I assume are what I need, but I can't quite figure out how they're supposed to work:
http://msdn.microsoft.com/en-us/library/windows/desktop/ff471399(v=vs.85).aspx

The documentation says that f32tof16 returns a uint, but isn't that 32 bits as well?
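Per the linked documentation, f32tof16 returns a uint whose low 16 bits hold the half-precision value, and f16tof32 converts the half stored in the low 16 bits of a uint. The upper 16 bits are therefore free, and two values can share one uint: `f32tof16(a) | (f32tof16(b) << 16)` in HLSL, unpacked with `f16tof32(p)` and `f16tof32(p >> 16)`. The C++ sketch below mirrors that bit layout. Its `FloatToHalf` is a simplified converter (it truncates the mantissa, flushes values below the smallest normal half to zero, and maps overflow and NaN to infinity), so it is not guaranteed to match f32tof16 bit-for-bit; all function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Simplified float -> half (see caveats above).
std::uint16_t FloatToHalf(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    std::uint32_t sign = (bits >> 16) & 0x8000u;
    std::int32_t exponent = static_cast<std::int32_t>((bits >> 23) & 0xFFu) - 127 + 15;
    std::uint32_t mantissa = bits & 0x7FFFFFu;
    if (exponent <= 0) return static_cast<std::uint16_t>(sign);            // too small: signed zero
    if (exponent >= 31) return static_cast<std::uint16_t>(sign | 0x7C00u); // too large: infinity
    return static_cast<std::uint16_t>(sign | (static_cast<std::uint32_t>(exponent) << 10) |
                                      (mantissa >> 13));
}

// Half -> float for the values FloatToHalf produces (zero, normals, infinity).
float HalfToFloat(std::uint16_t h) {
    std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    std::uint32_t exponent = (h >> 10) & 0x1Fu;
    std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t bits;
    if (exponent == 0) bits = sign;                                         // zero (denormals flushed)
    else if (exponent == 31) bits = sign | 0x7F800000u | (mantissa << 13);  // Inf/NaN
    else bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);        // rebias 15 -> 127
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Same layout as f32tof16(a) | (f32tof16(b) << 16): a in the low 16 bits.
std::uint32_t PackHalf2(float a, float b) {
    return static_cast<std::uint32_t>(FloatToHalf(a)) |
           (static_cast<std::uint32_t>(FloatToHalf(b)) << 16);
}

void UnpackHalf2(std::uint32_t packed, float& a, float& b) {
    a = HalfToFloat(static_cast<std::uint16_t>(packed & 0xFFFFu));
    b = HalfToFloat(static_cast<std::uint16_t>(packed >> 16));
}
```

With this layout, the three normal components fit in two uints (one half slot left over), and temperature plus humidity fit in one.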

Cheers,
Hyu

SM5.0 Dynamic Branching Performance

24 April 2012 - 05:57 PM

I'm currently trying to implement some form of triplanar multitexturing for my procedural planet renderer.
I'm storing all of my terrain diffuse textures in a DXT1-compressed texture array.
Somehow I was under the impression that it would be horribly inefficient to do conditional sampling on each texture array slice due to dynamic branching.
I couldn't really figure out a good way to calculate the 2 texture array indices I need to sample from ahead of time though, so I decided to use conditional sampling as a temporary solution until I figure out something more efficient.

To my surprise, it worked brilliantly, which made me realize that dynamic branching doesn't have the performance issues I thought it would, at least not on modern cards.

Here's what I'm currently doing:
[source lang="cpp"]
float3 CDiffuse = 0.0;
[unroll]
for(int i = 0; i < 5; i++)
{
    if(TIntensity[i] > 0.0)
        CDiffuse += TIntensity[i] * GetTriplanarSample(BlendWeights, coord1, coord2, coord3, i, TADiffuse, TSampler);
}
[/source]

TIntensity[] are the texture intensities. They are calculated within the same shader.
GetTriplanarSample() takes 3 texture samples from array slice i and blends them together using triplanar weights.
Only 2 TIntensity values are ever greater than 0.0 at any given time.

Surprisingly, this performs the same as hard-coding the two samples:
[source lang="cpp"]
CDiffuse += 0.5 * GetTriplanarSample(BlendWeights, coord1, coord2, coord3, 0, TADiffuse, TSampler);
CDiffuse += 0.5 * GetTriplanarSample(BlendWeights, coord1, coord2, coord3, 1, TADiffuse, TSampler);
[/source]

On the other hand, if I remove the conditional branch:
[source lang="cpp"]
float3 CDiffuse = 0.0;
[unroll]
for(int i = 0; i < 5; i++)
{
    //if(TIntensity[i] > 0.0)
    CDiffuse += TIntensity[i] * GetTriplanarSample(BlendWeights, coord1, coord2, coord3, i, TADiffuse, TSampler);
}
[/source]

Sampling takes about 2 ms longer on my GTX 580 in this case, which suggests the card really is skipping the untaken branches without a noticeable penalty.
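A CPU sketch of the two loop variants, with a hypothetical `Sampler` that counts calls in place of GetTriplanarSample() and a scalar `float` in place of the float3 colour. With two non-zero intensities, both variants return the same value, but the branched one samples twice instead of five times. What the sketch cannot model is the GPU's SIMD execution: pixels are shaded in groups (warps/wavefronts), and a branch only skips work when every pixel in a group takes the same path, which is plausibly the case here because the active texture pair is likely coherent across neighbouring pixels.

```cpp
#include <array>

// Hypothetical stand-in for GetTriplanarSample(): returns a dummy value per
// slice and counts how often it is called.
struct Sampler {
    int calls = 0;
    float Sample(int slice) { ++calls; return 1.0f + slice; }
};

// With the branch: slices with zero intensity are never sampled.
float BlendBranched(const std::array<float, 5>& intensity, Sampler& s) {
    float diffuse = 0.0f;
    for (int i = 0; i < 5; ++i)
        if (intensity[i] > 0.0f)
            diffuse += intensity[i] * s.Sample(i);
    return diffuse;
}

// Without the branch: every slice is sampled, zero weights included.
float BlendUnbranched(const std::array<float, 5>& intensity, Sampler& s) {
    float diffuse = 0.0f;
    for (int i = 0; i < 5; ++i)
        diffuse += intensity[i] * s.Sample(i);
    return diffuse;
}
```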

Is dynamic branching in shaders really this powerful nowadays, or is this a special case where it performs well?

Cheers,
Hyu

3d noise turbulence functions for terrain generation

10 April 2012 - 08:58 AM

For my master's thesis I am investigating methods to generate entire planets procedurally on the GPU.
I'm currently generating 33x33 patches in compute shaders and storing height and normal data in a structured buffer.
I then render a single 33x33 grid using hardware instancing, projecting each instance to its correct location and sampling the generated data from the structured buffer.
So far this is working very well, and it is easily fast enough to regenerate the entire geometry every frame.
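One way to picture the instanced lookup, as a sketch: assuming each patch's 33x33 vertices are stored contiguously and row by row in the structured buffer (the post doesn't specify the layout), the element for a given instance and grid vertex can be found as below. `VertexIndex`, `GridX` and `GridY` are hypothetical names; in HLSL the inputs would typically come from SV_InstanceID and SV_VertexID.

```cpp
#include <cstdint>

// Assumed layout: patch after patch, each 33x33 patch stored row by row.
constexpr std::uint32_t kPatchSize = 33;
constexpr std::uint32_t kVerticesPerPatch = kPatchSize * kPatchSize;

// Element index of grid vertex (x, y) of the given patch instance.
constexpr std::uint32_t VertexIndex(std::uint32_t instanceId, std::uint32_t x, std::uint32_t y) {
    return instanceId * kVerticesPerPatch + y * kPatchSize + x;
}

// Splits a vertex ID of the shared 33x33 grid back into grid coordinates.
constexpr std::uint32_t GridX(std::uint32_t vertexId) { return vertexId % kPatchSize; }
constexpr std::uint32_t GridY(std::uint32_t vertexId) { return vertexId / kPatchSize; }
```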

Whilst this is sufficient for the thesis, terrain built from fBm turbulence alone looks very boring and not at all like real terrain.
Unfortunately I'm having a very hard time coming up with decent turbulence functions to produce good results.

My current approach is to first generate a continent/ocean map, using 5 octaves of fBm turbulence, which produces something like this:
[Image: continent/ocean map from 5 octaves of fBm turbulence]

This is not bad; I can work with it as a basis and add climate zones on top (more sand-like textures near the equator, snow at lower altitudes near the poles, and so on).
I should also be able to generate a mountain map using the same technique with only a few octaves of noise.
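For reference, a generic fBm in the same shape as the rmf() function later in this post: a sum of noise octaves whose frequency grows by `lacunarity` and whose amplitude shrinks by `gain` at each step. This is a sketch, not the shader used for the continent map; the noise function is passed in as a callable because inoise() isn't shown.

```cpp
// Generic fBm: sum of amp * noise(p * frequency) over several octaves.
// Noise is any callable float(float x, float y, float z).
template <typename Noise>
float fbm(Noise noise, float x, float y, float z, int octaves,
          float frequency, float lacunarity, float gain) {
    float sum = 0.0f;
    float amp = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum += amp * noise(x * frequency, y * frequency, z * frequency);
        frequency *= lacunarity;  // each octave adds finer detail...
        amp *= gain;              // ...with a smaller contribution
    }
    return sum;
}
```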

But I'm at a loss as to how to generate the close up geometry.
I can't figure out how to generate any decent looking mountains at all, which are probably the most important feature.
I've experimented with ridged multifractals a lot, but this is the best I can come up with:
[Images: ridged multifractal terrain results]


For these I used the following turbulence function:
[source lang="cpp"]
float rnoise(float3 p)
{
    float n = 1.0f - abs(inoise(p) * 2.0);
    return n*n - 0.5;
}

float rmf(float3 p, int octaves, float frequency, float lacunarity, float gain)
{
    float sum = 0;
    float amp = 1.0;
    for(int i = 0; i < octaves; i++)
    {
        float n = rnoise(p * frequency);
        sum += n * amp;
        frequency *= lacunarity;
        amp *= gain;
    }
    return sum;
}
[/source]

called with these parameters:
[source lang="cpp"]float Height = pow(2.0, rmf(p, 18, 300.0, 1.75, 0.6));[/source]
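A C++ port of rnoise()/rmf() plus a bound on the output, which can help when tuning these parameters off the GPU. Assuming |inoise| <= 1, n lies in [-1, 1], so n*n is in [0, 1] and each rnoise() value lies in [-0.5, 0.5]; hence |sum| <= 0.5 * (1 + gain + ... + gain^(octaves-1)). With 18 octaves and gain 0.6 that amplitude sum is about 2.5, so Height = pow(2, sum) stays within roughly [0.42, 2.38], while the last octave's frequency is 300 * 1.75^17, about 4.06e6. The noise is a template parameter because inoise() isn't shown; `RmfBound` and `LastOctaveFrequency` are hypothetical helpers. Note that this rmf() sums ridges with fixed amplitudes, whereas Musgrave's original ridged multifractal additionally weights each octave by the previous octave's signal.

```cpp
#include <cmath>

// rnoise() from the post: fold the noise into a ridge, square it, recentre around 0.
template <typename Noise>
float rnoise(Noise inoise, float x, float y, float z) {
    float n = 1.0f - std::fabs(inoise(x, y, z) * 2.0f);
    return n * n - 0.5f;
}

// rmf() from the post: fixed-amplitude sum of ridged octaves.
template <typename Noise>
float rmf(Noise inoise, float x, float y, float z, int octaves,
          float frequency, float lacunarity, float gain) {
    float sum = 0.0f;
    float amp = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum += rnoise(inoise, x * frequency, y * frequency, z * frequency) * amp;
        frequency *= lacunarity;
        amp *= gain;
    }
    return sum;
}

// Upper bound on |rmf(...)| when |inoise| <= 1: 0.5 * sum of octave amplitudes.
float RmfBound(int octaves, float gain) {
    float amps = 0.0f, amp = 1.0f;
    for (int i = 0; i < octaves; ++i) { amps += amp; amp *= gain; }
    return 0.5f * amps;
}

// Frequency of the last octave: frequency * lacunarity^(octaves - 1).
float LastOctaveFrequency(int octaves, float frequency, float lacunarity) {
    return frequency * std::pow(lacunarity, static_cast<float>(octaves - 1));
}
```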

And this looks pretty bad.
The placeholder textures aren't helping either, but that's another issue.
I don't really know where to go from here.
The only turbulence functions I can find are fBm and ridged multifractals.

The one exception is Giliam de Carpentier, who has a few really cool examples of using noise derivatives on his blog:
http://www.decarpent...ural-extensions

Unfortunately I get very different results with the same techniques, which I assume is because he's using 2D noise and I'm using 3D noise.


Clearly I need different turbulence functions, but I don't know where to find them or how to come up with new ones.
I appreciate any suggestions on the matter.

Cheers,
Hyu
