Emulating an HLSL Sample Operation on the CPU

Hi

In my game I'm generating a terrain in realtime on the GPU using an fBm noise function. This works fine.

Now what I need is to be able to work out the height of a given position on the landscape on the CPU side, so that I can do things such as position objects on the surface of the terrain. This means I need to port the GPU code over to the CPU.

So far I've come across a couple of interesting things when running the HLSL code in the Visual Studio 2012 debugger and then doing the same for the CPU....

Firstly, I ran the code through the GPU, and for a given pixel the debugger showed:
PosW = x = 1214.034000000, z = -1214.034000000

When I tried putting these same values in the CPU side I found that on the CPU the values were actually:
PosW = x 1214.03406, z = -1214.03406

i.e. the CPU couldn't represent the GPU float values exactly. Is this to be expected? Do they not both conform to the exact IEEE standard for a float? I noticed this in several places.
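
A quick check in C++ (just a minimal sketch, using the value from the debugger output above) suggests the two debuggers might simply be printing the same float to different precision; 1214.034 isn't exactly representable in 32-bit IEEE-754, and the nearest float is 1214.0340576171875:

#include <cstdio>

int main()
{
    float posWx = 1214.034f;       // nearest representable float is 1214.0340576171875

    printf("%.10f\n", posWx);      // 1214.0340576172
    printf("%.5f\n",  posWx);      // 1214.03406  (what the CPU debugger showed)
    printf("%.3f\n",  posWx);      // 1214.034    (what the GPU debugger showed)
    return 0;
}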

The second problem, and the one I'm really struggling with, is how to emulate an HLSL Sample function on the CPU.

Here is what I have in HLSL:

const float mipLevel = 0;
float4 n;
n.x = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, i, mipLevel).x;
n.y = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, i, mipLevel, int2(1,0)).x;
n.z = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, i, mipLevel, int2(0,1)).x;
n.w = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, i, mipLevel, int2(1,1)).x;

(Where g_NoiseTexture is a 256x256 grayscale texture. I think the sampler name is self-explanatory.)

And I've tried to emulate this on the CPU like this:

float nx, ny, nz, nw;
nx = m_NoiseData[(int)(iy) % 256][(int)(ix) % 256] / 256.0f;
ny = m_NoiseData[(int)(iy) % 256][(int)(ix + 1.0f) % 256] / 256.0f;
nz = m_NoiseData[(int)(iy + 1.0f) % 256][(int)(ix) % 256] / 256.0f;
nw = m_NoiseData[(int)(iy + 1.0f) % 256][(int)(ix + 1.0f) % 256] / 256.0f;

(Where m_NoiseData is defined as "unsigned char m_NoiseData[256][256]" and contains the same data as the g_NoiseTexture)

The problem is that I'm getting completely different results for n.x vs nx, n.y vs ny, etc.

I've even tried to compensate for pixel centres by adding 0.5f to each pixel like this:


float nx, ny, nz, nw;
nx = m_NoiseData[(int)(iy + 0.5f) % 256][(int)(ix + 0.5f) % 256] / 256.0f;
ny = m_NoiseData[(int)(iy + 0.5f) % 256][(int)(ix + 1.5f) % 256] / 256.0f;
nz = m_NoiseData[(int)(iy + 1.5f) % 256][(int)(ix + 0.5f) % 256] / 256.0f;
nw = m_NoiseData[(int)(iy + 1.5f) % 256][(int)(ix + 1.5f) % 256] / 256.0f;


Any ideas?

Any help much appreciated.

Kind Regards
Ben
The "i" coordinate is in texture space, e.g. (0, 0) to (1, 1). You need to multiply your ix and iy by the texture's width and height respectively to reach the correct texel in your 2D array. Otherwise it is correct and the +0.5 is correct as well (for rounding). Be aware that you will need to simulate bilinear or trilinear sampling on the CPU if you use those kinds of samplers which is a pain (here it is a nearest-neighbor point sampler, so there's no need).

nx = m_NoiseData[(int)(iy * 256 + 0.5f) % 256][(int)(ix * 256 + 0.5f) % 256] / 256.0f;

If you are using D3D10 or better, you could use a compute shader to get this kind of information from the GPU without a rewrite (and you could also use it to get height information for other entities, like enemies). It's not necessarily more efficient, but it spares you the need to port GPU stuff like Perlin noise to the CPU (and it's not like one height query is the bottleneck... right?)
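
The CPU-side readback for that would look roughly like this (just a rough sketch, not from your code: it assumes a compute shader that writes the queried height into a one-element buffer, every name below is a placeholder, and buffer/shader creation is omitted):

#include <d3d11.h>

// Rough sketch: read one float back from the GPU, assuming a compute shader
// that writes the queried height into a one-element buffer. The query position
// would normally be passed in a constant buffer (omitted here).
bool QueryHeightOnGPU(ID3D11DeviceContext* ctx,
                      ID3D11ComputeShader* heightCS,        // placeholder compute shader
                      ID3D11UnorderedAccessView* resultUAV, // UAV on resultBuffer
                      ID3D11Buffer* resultBuffer,           // DEFAULT usage, one float
                      ID3D11Buffer* stagingBuffer,          // STAGING usage, CPU-readable
                      float* heightOut)
{
    ctx->CSSetShader(heightCS, nullptr, 0);
    ctx->CSSetUnorderedAccessViews(0, 1, &resultUAV, nullptr);
    ctx->Dispatch(1, 1, 1);                         // one thread group for one query

    // Copy into a staging resource so the CPU can map it.
    ctx->CopyResource(stagingBuffer, resultBuffer);

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (FAILED(ctx->Map(stagingBuffer, 0, D3D11_MAP_READ, 0, &mapped)))
        return false;

    *heightOut = *static_cast<const float*>(mapped.pData);
    ctx->Unmap(stagingBuffer, 0);
    return true;    // note: Map stalls until the GPU has finished the dispatch and copy
}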


Thanks Bacterius,

Of course... a silly mistake on my part, thanks for spotting that for me.

I updated the code and ran through it but I'm still getting very different results. So I worked it out manually and sure enough the CPU side seems to be correct.... now on to debug what's going on with the GPU code.

Regarding using a compute shader: I could (I'm using DX11), but the overhead of transferring data to and from the GPU seems like overkill just for finding the height of a single position on the terrain (e.g. so that a vehicle drives on top of the terrain and not through it).

Thanks again for your help
Ben
Ah ha, I found the problem. It was actually on the CPU side... it appears that simply using a modulus to emulate wrapping isn't enough, as it doesn't work with negative values: in C++, -6 % 256 gives -6, not 250. A simple utility function fixes it.
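
Something along these lines (just a minimal sketch; Wrap256 is an illustrative name, and it assumes the 256-texel repeat of the point sampler):

// In C++ the % operator keeps the sign of the dividend, so negative
// texel indices have to be shifted back into the 0..255 range by hand.
inline int Wrap256(int v)
{
    int m = v % 256;              // e.g. -6 % 256 == -6
    return (m < 0) ? m + 256 : m; // -6 -> 250
}

// usage: m_NoiseData[Wrap256(texelY)][Wrap256(texelX)]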

Now my results are close to the GPU results. The problem is that by the time I scale the results the small error is amplified to about 80 meters in world space. Obviously I don't want my vehicles floating 80m above the terrain, so I'll keep investigating.

Thanks
Ben
I think I've found the problem...

The texture was originally a JPG file. I then converted that to a RAW file for loading on the CPU side and a DDS for loading as a texture for the GPU. It appears that the conversion to DDS changed some of the values slightly.

I guess I should be using one file for both anyway.

Thanks again
Ben
Hmmm, I'm still a bit stuck.

I've confirmed that my CPU side data and the GPU side texture are identical, yet I'm getting slightly different results.

In fact the GPU side results don't really make sense.

The texture is 256 x 256 pixels and is L8 format (so a single 8-bit channel, normalised to the 0..1 range).

If I read position 0, 0 on the CPU I get value 105. I've checked the texture and sure enough the pixel at that position has a value of 105.

If I try the same on the GPU:

n.x = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, float2(0.0f, 0.0f), mipLevel).x;

I get value x = 0.411764700, which when multiplied by 256 = 105.4117632, instead of the expected 105.0.

Now, given that it's an 8-bit texture and I'm using point sampling, the value I get back when multiplied by 256 should always give me an integer (or very close, allowing for small floating point errors). So why am I getting .4117632 over the expected value?

I can't see how this can be possible.

Any ideas?

Thanks
Ben
Ah ha, got it!

I should be dividing by 255 on the CPU side when normalizing to the 0..1 range (Or multiplying by 255 on the GPU side when denormalizing back to 8 bits)... not by 256 as I had been doing!
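
In other words, the corrected CPU-side fetch looks something like this (a minimal sketch; SampleL8 and the parameter names are just illustrative):

// An 8-bit UNORM (L8) texel v maps to v / 255.0f, so 255 -> exactly 1.0.
float SampleL8(const unsigned char noise[256][256], int texelX, int texelY)
{
    return noise[texelY][texelX] / 255.0f;  // 105 / 255 = 0.4117647..., matching the GPU
}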

One of those stupid time consuming silly mistakes!

Thanks for your help
Ben

