Accumulate color value along x-axis


Hey,

in order to compute the light intensity for a pixel, I need to sum per-pixel attenuation along a ray into a buffer. The ray is parallel to the x-axis, so I have to sum values along the x-axis:

e.g. 0112011 would become 0124456 (each number is the sum of all previous numbers plus itself).

This is a rather expensive operation whose cost depends on the length of the ray (the width of the texture) and the number of rays (the height).
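Written out as a plain loop per row, what I need is just an inclusive running sum, something like this (illustration only; the names and WIDTH are placeholders, not my actual setup):

// Per-row inclusive prefix sum, e.g. 0,1,1,2,0,1,1 -> 0,1,2,4,4,5,6
static const int WIDTH = 7;

void PrefixSumRow(float attenuation[WIDTH], out float result[WIDTH])
{
    float running = 0;
    for (int x = 0; x < WIDTH; x++)
    {
        running += attenuation[x];  // everything up to and including x
        result[x] = running;
    }
}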

I thought a solution would be to draw the texture additively multiple times, shifting it one pixel to the right each time:


 0112011
+ 011201
+  01120
+   0112
+    011
+     01
+      0
_________
 0124456

This works, but it is computationally expensive as well. Also, an 8-bit channel only gives 256 distinct values, which yields chunky results. I can't pack the numbers into three-digit base-256 colors, because additive blending would break that (it just adds per channel, obviously).

Please ask if I have not described the problem well. I hope someone can help me!

Thanks, Phil


Some thoughts:

First off, use more bits to overcome the 255-value limit, e.g. a 16-bit float format.

Then think in terms of a single output pixel, that is:


new_pixel = pixel + sum(pixels_to_the_left)

So, is there a limit to how many pixels you need to go to the left?

The simplest way would be to just sum up in a loop:


// assuming x,y are integer texel coordinates; add 0.5 to sample texel centers
float value = tex(x + 0.5, y);
for (int i = 0; i < MAX_RAY_LENGTH; i++) {
  value += tex(x - (i + 0.5), y);  // one texel further to the left each iteration
}
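In actual HLSL (D3D9-style syntax) this brute-force loop might look roughly like the sketch below; the sampler name, ray length and texel size are assumptions, not your actual setup:

// Brute-force sum of the pixel and everything to its left (sketch).
sampler2D AttenuationSampler;                      // source texture, point filtering
static const int   MAX_RAY_LENGTH = 256;
static const float TEXEL_SIZE     = 1.0 / 1024.0; // 1 / texture width

float4 SumLeftPS(float2 uv : TEXCOORD0) : COLOR0
{
    float sum = tex2D(AttenuationSampler, uv).r;   // the pixel itself
    for (int i = 1; i <= MAX_RAY_LENGTH; i++)
    {
        // in real code, clamp or skip samples that fall off the left edge
        sum += tex2D(AttenuationSampler, uv - float2(i * TEXEL_SIZE, 0)).r;
    }
    return float4(sum, sum, sum, 1);
}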

More advanced is to use the linear filtering ability of the GPU: if you sample a texture exactly between two texels, you get the linear interpolation (the average) of both values:


// same convention: x,y are integer texel coordinates
float value = tex(x + 0.5, y);
for (int i = 0; i < MAX_RAY_LENGTH / 2; i++) {
  // x - (i*2+1) is the edge shared by two texels, so with linear filtering
  // you get pixel_i*0.5 + pixel_(i+1)*0.5, therefore double the value
  value += tex(x - (i * 2 + 1), y) * 2.0;
}
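As a concrete HLSL sketch of the same trick (again, names and constants are assumptions, and the sampler state must be set to linear filtering):

// Half as many fetches: each linearly filtered sample sits on the edge shared
// by two texels and therefore returns their average; doubling it gives their sum.
sampler2D AttenuationSampler;                      // linear filtering required
static const int   MAX_RAY_LENGTH = 256;
static const float TEXEL_SIZE     = 1.0 / 1024.0;

float4 SumLeftLinearPS(float2 uv : TEXCOORD0) : COLOR0
{
    float sum = tex2D(AttenuationSampler, uv).r;   // center pixel
    for (int i = 0; i < MAX_RAY_LENGTH / 2; i++)
    {
        // uv is at the center of the current texel, so the edge shared by the
        // texels 2i+1 and 2i+2 positions to the left is (2i + 1.5) texels away
        float offsetX = (2.0 * i + 1.5) * TEXEL_SIZE;
        sum += tex2D(AttenuationSampler, uv - float2(offsetX, 0)).r * 2.0;
    }
    return float4(sum, sum, sum, 1);
}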

Even more advanced is to calculate mipmaps first, which contain the sums of blocks (4 texels, 8 texels, 16 texels, etc.), and use them in a smart way to speed up sampling of larger ranges.

Thanks for your reply! I saw a similar approach utilizing bilinear lerps for a performant Gaussian blur.

Regarding mipmaps: is there a way to use them for non-square blocks? I would like to read blocks along the x-axis only, without values from adjacent lines; mixing lines in would decrease the effective number of rays and result in artifacts.


Regarding mipmaps: is there a way to use them for non-square blocks?

You shouldn't use real mipmaps here, just several render targets to downsample along the x-axis. In fact you can get by with only two render targets, e.g.

source texture 1024x1024 and two target buffers of size 1k x 1k

downsample source to A (1024->512)

downsample A to B (512->256)

downsample B to A (256->128, use offset 512)

downsample A to B (128->64 use offset 256)

...

In the final pass you only need to bind source, A and B, plus some clever access strategy and linear filtering to minimize the number of texture fetches.
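Ignoring the offset packing into the shared buffers, a single x-axis downsample pass can be a tiny pixel shader like this sketch (names are assumptions; the target is half as wide as the source and the full-screen quad's texture coordinates span the whole source):

// Each output texel covers two source texels. Its center maps exactly onto the
// edge between them, so one linearly filtered fetch returns their average,
// and doubling that gives their sum.
sampler2D SourceSampler;   // the level being downsampled, linear filtering

float4 DownsampleXPS(float2 uv : TEXCOORD0) : COLOR0
{
    float pairSum = tex2D(SourceSampler, uv).r * 2.0;
    return float4(pairSum, pairSum, pairSum, 1);
}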


I would like to read blocks along the x-axis only, without values from adjacent lines.

Center the sample directly on the texel line to minimize influence of other lines.

You can do it like this (I am assuming the number of things you are adding, N, is a power of 2, for simplicity):
for (int block_size = 1; block_size < N; block_size *= 2) {
  for (int i = 0; i < N; i += 2 * block_size) {
    // add the last element of the left half-block to every element of the right half-block
    for (int j = 0; j < block_size; ++j)
      x[i + block_size + j] += x[i + block_size - 1];
  }
}

That code makes log2(N) passes through the data and each one of them touches N/2 objects, so it runs in time O(N*log(N)). This is worse than the naive O(N) algorithm, but it parallelizes much better: Everything inside the outer loop is completely parallelizable, so if you had N/2 processors you could do it in log2(N) steps.

So if you are implementing this in a shader, I imagine my method could be much faster than the original.

Yeah, this is an incredibly common operation in many GPU-accelerated solutions to ... just about everything. Search for "prefix sum", which is the problem, and also leads to the standard solutions (like Alvaro's).
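If you run it as a pixel shader, the usual way to phrase this idea is the ping-pong "recursive doubling" formulation: render log2(N) full-screen passes, alternating between two render targets and doubling the step each pass (1, 2, 4, ...). A sketch of one pass, with assumed names:

// One pass of a ping-pong prefix sum. After log2(N) passes every pixel holds
// the sum of itself and everything to its left.
sampler2D InputSampler;                   // result of the previous pass
float  StepTexels;                        // 1, 2, 4, ... set from the CPU per pass
static const float TEXEL_SIZE = 1.0 / 1024.0;

float4 PrefixSumPassPS(float2 uv : TEXCOORD0) : COLOR0
{
    float sum = tex2D(InputSampler, uv).r;
    float leftX = uv.x - StepTexels * TEXEL_SIZE;
    if (leftX >= 0)                       // nothing to add beyond the left edge
        sum += tex2D(InputSampler, float2(leftX, uv.y)).r;
    return float4(sum, sum, sum, 1);
}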

I'm going to try "Fast Summed-Area Table Generation and its Applications" (http://developer.amd.com/wordpress/media/2012/10/Hensley-SAT(EG05).pdf). I hope the cost of swapping render targets is not very high; I will test different sample widths and see what works for me. Thanks for all the replies, I'm still very new to HLSL and sometimes don't know how to solve problems with it.

