# Emulating an HLSL Sample Operation on the CPU


6 replies to this topic

### #1BenS1  Members   -  Reputation: 321


Posted 16 December 2012 - 03:01 PM

Hi

In my game I'm generating a terrain in realtime on the GPU using an fBm noise function. This works fine.

Now what I need is to be able to work out the height of a given position on the landscape on the CPU side, so that I can do things such as position objects on the surface of the terrain. This means I need to port the GPU code over to the CPU.

So far I've come across a couple of interesting things when running the HLSL code in the Visual Studio 2012 debugger and then doing the same on the CPU...

Firstly, I ran the code on the GPU, and for a given pixel the debugger showed:
PosW = x = 1214.034000000, z = -1214.034000000

When I tried putting these same values in the CPU side I found that on the CPU the values were actually:
PosW = x = 1214.03406, z = -1214.03406

i.e. the CPU couldn't represent the GPU float values exactly. Is this to be expected? Do they not both conform to the exact IEEE standard for a float? I noticed this in several places.

The second problem, the one I'm really struggling with, is how to emulate an HLSL Sample operation on the CPU.

Here is what I have in HLSL:
const float mipLevel = 0;
float4 n;
n.x = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, i, mipLevel).x;
n.y = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, i, mipLevel, int2(1,0)).x;
n.z = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, i, mipLevel, int2(0,1)).x;
n.w = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, i, mipLevel, int2(1,1)).x;

(Where g_NoiseTexture is a 256x256 grayscale texture. I think the sampler name is self-explanatory.)

And I've tried to emulate this on the CPU like this:
float nx, ny, nz, nw;
nx = m_NoiseData[(int)(iy)  % 256][(int)(ix) % 256] / 256.0f;
ny = m_NoiseData[(int)(iy)  % 256][(int)(ix + 1.0f) % 256] / 256.0f;
nz = m_NoiseData[(int)(iy + 1.0f)  % 256][(int)(ix) % 256] / 256.0f;
nw = m_NoiseData[(int)(iy + 1.0f)  % 256][(int)(ix + 1.0f) % 256] / 256.0f;

(Where m_NoiseData is defined as "unsigned char m_NoiseData[256][256]" and contains the same data as the g_NoiseTexture)

The problem is that I'm getting completely different values for n.x vs nx, n.y vs ny, etc.

I've even tried to compensate for pixel centres by adding 0.5f to each pixel like this:

float nx, ny, nz, nw;
nx = m_NoiseData[(int)(iy + 0.5f)  % 256][(int)(ix + 0.5f) % 256] / 256.0f;
ny = m_NoiseData[(int)(iy + 0.5f)  % 256][(int)(ix + 1.5f) % 256] / 256.0f;
nz = m_NoiseData[(int)(iy + 1.5f)  % 256][(int)(ix + 0.5f) % 256] / 256.0f;
nw = m_NoiseData[(int)(iy + 1.5f)  % 256][(int)(ix + 1.5f) % 256] / 256.0f;


Any ideas?

Any help much appreciated.

Kind Regards
Ben

### #2Bacterius  Crossbones+   -  Reputation: 8506


Posted 16 December 2012 - 04:25 PM

The "i" coordinate is in texture space, i.e. (0, 0) to (1, 1). You need to multiply your ix and iy by the texture's width and height respectively to reach the correct texel in your 2D array. Otherwise it is correct, and the +0.5 is correct as well (for rounding). Be aware that you will need to simulate bilinear or trilinear sampling on the CPU if you use those kinds of samplers, which is a pain (here it is a nearest-neighbor point sampler, so there's no need).

nx = m_NoiseData[(int)(iy * 256 + 0.5f)  % 256][(int)(ix * 256 + 0.5f) % 256] / 256.0f;

If you are using D3D10 or better, you could use a compute shader to get this kind of information from the GPU without a rewrite (and you can also use it to get height information for other entities like enemies). Not necessarily more efficient, but it spares you the need to port GPU stuff like Perlin noise to the CPU (and it's not like one height query is the bottleneck... right?)
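(For completeness, the bilinear emulation mentioned above might be sketched roughly like this. It isn't needed for the POINT sampler in this thread, and the names `noise` and `SampleBilinearRepeat` are stand-ins; texel centres are assumed at (i + 0.5)/256:)

```cpp
#include <cmath>

// Stand-in for the thread's 256x256 single-channel noise table
// ("unsigned char m_NoiseData[256][256]").
static unsigned char noise[256][256];

// Sketch of emulating a LINEAR (bilinear) sampler with REPEAT
// addressing on the CPU. u, v are texture-space coordinates.
float SampleBilinearRepeat(float u, float v) {
    float tx = u * 256.0f - 0.5f;  // to texel space; centres land on integers
    float ty = v * 256.0f - 0.5f;
    int x0 = (int)std::floor(tx);
    int y0 = (int)std::floor(ty);
    float fx = tx - (float)x0;     // fractional blend weights
    float fy = ty - (float)y0;
    // REPEAT wrap that also handles negative indices.
    auto wrap = [](int i) { return ((i % 256) + 256) % 256; };
    float t00 = noise[wrap(y0)][wrap(x0)] / 255.0f;  // 8-bit UNORM: divide by 255
    float t10 = noise[wrap(y0)][wrap(x0 + 1)] / 255.0f;
    float t01 = noise[wrap(y0 + 1)][wrap(x0)] / 255.0f;
    float t11 = noise[wrap(y0 + 1)][wrap(x0 + 1)] / 255.0f;
    // Blend horizontally, then vertically.
    return (t00 * (1.0f - fx) + t10 * fx) * (1.0f - fy)
         + (t01 * (1.0f - fx) + t11 * fx) * fy;
}
```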

Edited by Bacterius, 17 December 2012 - 02:17 AM.

The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

- Pessimal Algorithms and Simplexity Analysis

### #3BenS1  Members   -  Reputation: 321


Posted 17 December 2012 - 02:22 AM

Thanks Bacterius,

Of course... a silly mistake on my part, thanks for spotting that for me.

I updated the code and ran through it, but I'm still getting very different results. So I worked it out manually, and sure enough the CPU side seems to be correct... now on to debugging what's going on with the GPU code.

Regarding using a compute shader, I could (I'm using DX11), but the overhead of transferring info to and from the GPU seems like overkill just for finding the height of a single position on the terrain (e.g. so that a vehicle drives on top of the terrain and not through it).

Ben

### #4BenS1  Members   -  Reputation: 321


Posted 17 December 2012 - 02:53 AM

Ah ha, I found the problem. It was actually on the CPU side... it appears that simply using a modulus to emulate wrapping is not enough, as it doesn't work with negative values, i.e. -6 % 256 gives -6 and not 250. A simple utility function fixes that.
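(The utility function in question might look like this; the name `WrapRepeat` is made up for illustration:)

```cpp
// In C/C++, "%" keeps the sign of the dividend, so -6 % 256 is -6,
// not the 250 that REPEAT addressing needs. Adding one full period
// and taking the remainder again fixes negative inputs.
int WrapRepeat(int i, int n) {
    return ((i % n) + n) % n;
}
```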

Now my results are close to the GPU results. The problem is that by the time I scale the results, the small error is amplified to about 80 meters in world space. Obviously I don't want my vehicles floating 80m above the terrain, so I'll keep investigating.

Thanks
Ben

### #5BenS1  Members   -  Reputation: 321


Posted 17 December 2012 - 03:51 AM

I think I've found the problem...

The texture was originally a JPG file. I then converted that to a RAW file for loading on the CPU side and a DDS for loading as a texture for the GPU. It appears that the conversion to DDS changed some of the values slightly.

I guess I should be using one file for both anyway.

Thanks again
Ben

### #6BenS1  Members   -  Reputation: 321


Posted 17 December 2012 - 11:29 AM

Hmmm, I'm still a bit stuck.

I've confirmed that my CPU side data and the GPU side texture are identical, yet I'm getting slightly different results.

In fact the GPU side results don't really make sense.

The texture is 256 x 256 pixels and is L8 format (So a single 8bit channel, normalised to the 0..1 range).

If I read position 0, 0 on the CPU I get value 105. I've checked the texture and sure enough the pixel at that position has a value of 105.

If I try the same on the GPU:
n.x = g_NoiseTexture.SampleLevel(SamplerRepeatPoint, float2(0.0f, 0.0f), mipLevel).x;

I get the value x = 0.411764700, which when multiplied by 256 gives 105.4117632 instead of the expected 105.0.

Now given that it's an 8-bit texture and I'm using point sampling, the value I get back when multiplied by 256 should always give me an integer (or very close, excepting any small floating point errors). So why am I getting .4117632 over the expected value?

I can't see how this can be possible.

Any ideas?

Thanks
Ben

### #7BenS1  Members   -  Reputation: 321


Posted 17 December 2012 - 11:51 AM

Ah ha, got it!

I should be dividing by 255 on the CPU side when normalizing to the 0..1 range (or multiplying by 255 on the GPU side when denormalizing back to 8 bits)... not by 256 as I had been doing!
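(Folding the thread's three fixes together — scale texture coordinates by the texture size, wrap negatives correctly, and normalize 8-bit texels by 255 — a CPU point sampler might be sketched like this. `noise`, `WrapRepeat`, and `SamplePointRepeat` are stand-in names:)

```cpp
#include <cmath>

static unsigned char noise[256][256];  // stand-in for m_NoiseData

// Wrap that also handles negative indices (plain % would not).
int WrapRepeat(int i, int n) {
    return ((i % n) + n) % n;
}

// CPU emulation of SampleLevel with a POINT/REPEAT sampler on a 256x256
// L8 texture. The +0.5 rounds to the nearest texel, guarding against
// u * 256 landing just below an integer due to float error. Dividing by
// 255 matches how the GPU converts 8-bit UNORM data to [0, 1], so
// noise[0][0] == 105 samples as 105/255 = 0.4117647, as seen on the GPU.
float SamplePointRepeat(float u, float v) {
    int x = WrapRepeat((int)std::floor(u * 256.0f + 0.5f), 256);
    int y = WrapRepeat((int)std::floor(v * 256.0f + 0.5f), 256);
    return noise[y][x] / 255.0f;
}
```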

One of those stupid, time-consuming mistakes!