sogetsu

Get value from shader?


I have a 100*100 texture with format D3DFMT_R32F, and I want to calculate the maximum of the values stored in it. How can I do this in a GPU shader and return the maximum to the CPU for some follow-up work? Thank you very much! The following code is what I have tried, but it does not seem to work:

float4 CalMaxValue() : POSITION
{
    half a1;
    int i, j;
    float2 coord;

    for(i = 1; i < W - 1; i++)
    {
        for(j = 1; j < H - 1; j++)
        {
            coord = float2(1.0f * i / W, 1.0f * j / H);
            a1 = abs(tex2D(RTextureSampler, coord).x);
            maxValue = max(a1, maxValue);
        }
    }

    return float4(0, 0, 0, 1);
}

technique MainLoop_CalMaxValue
{
    pass P0
    {
        VertexShader = compile vs_2_0 CalMaxValue();
        PixelShader = NULL;
    }
}

RTextureSampler is a sampler and maxValue is a global float (W and H are constants). In my D3D9 program I render a single vertex to run this vertex shader, and afterwards I use ID3DXEffect::GetFloat() to read back the max value, but I always get the same value, 0.0f (the initial value).

Your code to calculate the max value looks okay to me. One note on the abs() call (in the line "a1 = abs(tex2D(RTextureSampler, coord).x);"): values fetched from an unsigned normalized format are always between 0 and 1, so abs() is unnecessary there. A floating-point format such as D3DFMT_R32F can store negative values, though, so keep abs() if you want the maximum magnitude.

Regarding your problem, the only way to get a value out of a shader is to write it to a texture and then read it back. So in your case I would move the calculations to the pixel shader (outputting the max value as the color) and render to a 1x1 texture. Reading this pixel back will be fast, since it is 128 bits at most, which is trivial for the bus.

1) Sampling a texture from inside a vertex shader is not supported on VS2.0. You need at least VS3.0 to be able to do that.


2) Values you write to global variables from a vertex shader are only valid for the current vertex; they do not persist. Think of the constants in a vertex shader as read-only: a write to a constant is like making a temporary copy of the register for the duration of that vertex, so anything you do to it doesn't affect other vertices.

The main reason for this is maintaining parallelism. Modern GPUs process multiple (e.g. 64) vertices in parallel - being able to write to a global constant would require synchronisation and serialisation which would result in lower performance.

So when you read back the value of your global variable, all you get is the original value that it contained.


3) What might work would be to write the maximum value out as part of the vertex, then implement a pixel shader which writes that value out as a colour. You could then set a small (say 1x1) render target texture to receive this value. When you need to read the value with the CPU, you lock the texture and use it as necessary (in D3D9 a render target texture is typically copied to a system-memory surface with GetRenderTargetData() first, and LockRect() is called on that copy).


4) Expecting to read back the result of some GPU operation with the CPU is likely to cause serialisation/stalls and so lose the benefit of having a GPU unless you're very careful. If the output is a render target texture, then multi-buffering the output would help some.
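To make the multi-buffering idea in point 4 concrete, here is a minimal CPU-side sketch. It is only a model: `ReadbackRing`, `submit` and `readOldest` are hypothetical names, not D3D9 API, and each slot stands in for one small render target. The point is that the CPU reads a result that is a few frames old instead of waiting for the frame the GPU is still working on.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of N-way buffered readback. Each slot stands in for
// one 1x1 render target; the CPU only ever reads the oldest slot, so it
// never has to wait for the result the GPU is still producing.
template <std::size_t N>
class ReadbackRing {
public:
    // Called once per frame with the value the GPU produced for that frame.
    void submit(float gpuResult) {
        slots_[frame_ % N] = gpuResult;
        ++frame_;
    }

    // Fetches the oldest buffered result (N - 1 frames behind the newest).
    // Returns false until N frames have been submitted.
    bool readOldest(float& out) const {
        if (frame_ < N) return false;
        out = slots_[frame_ % N];
        return true;
    }

private:
    std::array<float, N> slots_{};
    std::size_t frame_ = 0;
};
```

With `ReadbackRing<3>`, the value read after submitting frame k is the one from frame k - 2. Whether that latency is acceptable depends on what the maximum is used for.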

Your way of calculating the maximum value is extremely inefficient. You should downsample multiple times calculating the maximum of each 4x4 or 2x2 block till you generate a 1x1 texture, containing the maximum. Reading back a 1x1 fp32 texture is simple.
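As a rough illustration of the repeated downsampling, here is a CPU reference in C++ (not shader code). `Image`, `reduceMax2x2` and `maxByReduction` are names made up for this sketch. Each call to `reduceMax2x2` models one render pass, and reads past the edge are clamped on the assumption that the sampler uses CLAMP addressing, so odd sizes such as 25x25 still work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major single-channel image, standing in for an R32F texture.
struct Image {
    int w;
    int h;
    std::vector<float> texels;  // w * h values

    // Clamped read: coordinates past the edge return the edge texel.
    float at(int x, int y) const {
        x = std::min(x, w - 1);
        y = std::min(y, h - 1);
        return texels[static_cast<std::size_t>(y) * w + x];
    }
};

// One reduction pass: every output texel is the max of a 2x2 input block.
Image reduceMax2x2(const Image& src) {
    assert(src.w > 0 && src.h > 0);
    Image dst{(src.w + 1) / 2, (src.h + 1) / 2, {}};
    dst.texels.resize(static_cast<std::size_t>(dst.w) * dst.h);
    for (int y = 0; y < dst.h; ++y) {
        for (int x = 0; x < dst.w; ++x) {
            const float top = std::max(src.at(2 * x, 2 * y), src.at(2 * x + 1, 2 * y));
            const float bottom = std::max(src.at(2 * x, 2 * y + 1), src.at(2 * x + 1, 2 * y + 1));
            dst.texels[static_cast<std::size_t>(y) * dst.w + x] = std::max(top, bottom);
        }
    }
    return dst;
}

// Repeats the pass until one texel is left; on the GPU each iteration
// would be a separate render-to-texture pass.
float maxByReduction(Image img) {
    while (img.w > 1 || img.h > 1) {
        img = reduceMax2x2(img);
    }
    return img.texels[0];
}
```

Because the output size is rounded up and reads are clamped, the source does not have to be a power of two in this model. On real hardware the texture coordinates for each pass would need the same care.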

Quote:
Original post by vNistelrooy
Your way of calculating the maximum value is extremely inefficient. You should downsample multiple times calculating the maximum of each 4x4 or 2x2 block till you generate a 1x1 texture, containing the maximum. Reading back a 1x1 fp32 texture is simple.


Seconded, that's actually a really good point.

Quote:
Original post by vNistelrooy
Quote:
Original post by S1CA
Modern GPUs process multiple (e.g. 64) vertices in parallel


Correct me if I'm wrong, but modern GPUs aren't capable of processing even that many pixels at once.


64 sounds right to me, since multiple vertices can be processed within a single pipe (for a GPU with 8 vertex pipes, that would be 8 vertices per pipe), provided that the vertex shaders are really simple (i.e. they do not use a large number of registers, dynamic branching and so on).

Quote:
Original post by MePHyst0
64 sounds right to me, since multiple vertices can be processed within a single pipe (for a GPU with 8 vertex pipes, that would be 8 vertices per pipe), provided that the vertex shaders are really simple (i.e. they do not use a large number of registers, dynamic branching and so on).


We ain't talking about the same clock cycle, are we?

Quote:
Original post by vNistelrooy
Your way of calculating the maximum value is extremely inefficient. You should downsample multiple times calculating the maximum of each 4x4 or 2x2 block till you generate a 1x1 texture, containing the maximum. Reading back a 1x1 fp32 texture is simple.


But my constants W and H can take any value, not just powers of two, so it seems hard to reduce the texture your way.

I have changed the code to the following:

float4 IfEnd() : COLOR
{
    half a1;
    int i, j;
    float2 coord;
    float maxValue = 0;

    for(i = 1; i < W - 1; i++)
    {
        for(j = 1; j < H - 1; j++)
        {
            coord = float2(1.0f * i / W, 1.0f * j / H);
            a1 = abs(tex2D(RTextureSampler, coord).x);
            maxValue = max(a1, maxValue);
        }
    }

    return float4(maxValue, 0, 0, 0);
}

I use it as a pixel shader to render to a 1*1 texture, and I finally get the correct value.
But now I have another problem: compiling the .fx file takes a long time and about 400 MB of system memory.
Does the compiler try to unroll all the for loops while compiling?

Quote:
Original post by sogetsu
But my constants W and H can take any value, not just powers of two, so it seems hard to reduce the texture your way.

I have changed the code to the following:

float4 IfEnd() : COLOR
{
    half a1;
    int i, j;
    float2 coord;
    float maxValue = 0;

    for(i = 1; i < W - 1; i++)
    {
        for(j = 1; j < H - 1; j++)
        {
            coord = float2(1.0f * i / W, 1.0f * j / H);
            a1 = abs(tex2D(RTextureSampler, coord).x);
            maxValue = max(a1, maxValue);
        }
    }

    return float4(maxValue, 0, 0, 0);
}

I use it as a pixel shader to render to a 1*1 texture, and I finally get the correct value.
But now I have another problem: compiling the .fx file takes a long time and about 400 MB of system memory.
Does the compiler try to unroll all the for loops while compiling?


If your ps profile doesn't support dynamic branching, then yes, it does.
I still think my way is better: just create the smallest power-of-two texture that is at least as big as your source texture and put 0s in the unused space.
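For reference, the padding step could look roughly like this on the CPU side, before the data is uploaded. `nextPowerOfTwo`, `PaddedImage` and `padToPowerOfTwo` are hypothetical helpers written for this sketch. Filling with 0 is only safe because the shader takes abs() of every value, so 0 can never exceed the true maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two that is >= n.
int nextPowerOfTwo(int n) {
    assert(n > 0 && n <= (1 << 30));
    int p = 1;
    while (p < n) p *= 2;
    return p;
}

// Row-major image whose width and height are powers of two.
struct PaddedImage {
    int w;
    int h;
    std::vector<float> texels;
};

// Copies a w x h row-major image into the top-left corner of a
// power-of-two image and fills the unused space with `fill`.
PaddedImage padToPowerOfTwo(const std::vector<float>& src, int w, int h,
                            float fill = 0.0f) {
    assert(w > 0 && h > 0);
    assert(src.size() == static_cast<std::size_t>(w) * h);
    PaddedImage out{nextPowerOfTwo(w), nextPowerOfTwo(h), {}};
    out.texels.assign(static_cast<std::size_t>(out.w) * out.h, fill);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            out.texels[static_cast<std::size_t>(y) * out.w + x] =
                src[static_cast<std::size_t>(y) * w + x];
        }
    }
    return out;
}
```

For the 100x100 texture from the original post this gives a 128x128 image, which halves cleanly down to 1x1.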

You can try compiling your .fx file using the command-line fxc.exe and outputting the results to HTML. Off the top of my head, specifying /Fc and /Cc will do the job, but you'd best check the documentation.

Doing this should let you see the actual assembly that the compiler generates, so you can get an idea of whether it is looping or has been unrolled.

I'm also curious as to how you got a valid SM2 compile on a 100x100 texture - you've a maximum of 32 texture instructions [smile]

Anyway, I'd highly recommend you use some sort of down-sampling method. You can achieve the same results efficiently (both in code and performance); it's what most people use for retrieving the maximum luminance of an HDR image. Check out the "HDRPipeline" demo in the SDK, which has code that does this.

hth
Jack
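To put numbers on the instruction-count concern above, here is a small back-of-the-envelope sketch. The helper names are made up for illustration, and the limit of 32 is simply the SM2 texture-instruction figure quoted in this thread. It compares the fetches the fully unrolled nested loop would need with the per-pixel fetches and pass counts of a block reduction.

```cpp
#include <cassert>

// Texture-instruction limit for SM2 pixel shaders, as quoted in the thread.
const int kSm2TextureInstructionLimit = 32;

// Fetches needed by the single-pass nested loop, which skips the border
// texels (i and j run from 1 to W - 2 and H - 2).
int fetchesSinglePass(int w, int h) {
    assert(w > 2 && h > 2);
    return (w - 2) * (h - 2);
}

// Fetches per output pixel when each pass reads a block x block area.
int fetchesPerPixel(int block) {
    assert(block > 1);
    return block * block;
}

// Passes needed to shrink a size x size texture to 1x1 when every pass
// divides each dimension by `block`, rounding up.
int passesToOneTexel(int size, int block) {
    assert(size > 0 && block > 1);
    int passes = 0;
    while (size > 1) {
        size = (size + block - 1) / block;
        ++passes;
    }
    return passes;
}
```

For a 100x100 texture the unrolled loop needs 98 * 98 = 9604 fetches, far over the limit. A 4x4 reduction needs 16 fetches per pixel and 4 passes (100, 25, 7, 2, 1), and a 2x2 reduction needs 4 fetches per pixel and 7 passes.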
