PhilObyte

Accumulate color value along x-axis


Hey,

In order to compute the light intensity for a pixel, I need to sum the per-pixel attenuation along a ray into a buffer. The ray is parallel to the x-axis, so I have to sum values along the x-axis:

E.g. 0112011 would become 0124456 (each number is the sum of all previous numbers plus itself).

This is a rather expensive operation, and its cost depends on the length of the ray (width of the texture) and the number of rays (height).

 

I thought a solution would be to draw the texture additive multiple times, shifting it to the right one pixel each time:

 0112011
+ 011201
+  01120
+   0112
+    011
+     01
+      0
_________
 0124456

This works, but it is computationally expensive as well. Also, an 8-bit channel only has 256 values, which yields chunky results. I can't pack larger numbers into three base-256 digits (one per color channel), because additive blending would break this (it obviously just adds per channel, without carrying over).
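
To make the cost of the shifted-draw idea concrete, here is a minimal CPU sketch of it in C++ for a single row. The function name is made up for illustration, and plain integers stand in for the blended color channel, so the 8-bit limit is not modeled.

```cpp
#include <cstddef>
#include <vector>

// Emulates "draw the row again, shifted right by `shift` texels, with additive
// blending" for one row. Hypothetical helper, not part of any graphics API.
std::vector<int> prefix_sum_by_shifted_adds(const std::vector<int>& row) {
    std::vector<int> out(row.size(), 0);
    // One pass per shift amount: pass `shift` adds the source row moved
    // `shift` texels to the right; texels pushed past the right edge are lost.
    for (std::size_t shift = 0; shift < row.size(); ++shift) {
        for (std::size_t x = shift; x < row.size(); ++x) {
            out[x] += row[x - shift];
        }
    }
    return out;
}
```

For a row of N texels this needs N passes and N*(N+1)/2 additions, which is why it gets expensive for wide textures.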

Please ask if I have not described the problem well. I hope someone can help me!

Thanks, Phil


Some thoughts:

First off, use more bits per channel to overcome the 8-bit limit, e.g. a 16-bit float.

Then always try to think in terms of a single output pixel, that is:

new_pixel += sum(pixels_to_the_left)

So, is there a limit to how many pixels you have to go to the left?

The simplest way would be to just sum up in a loop:

// assumption: x is the integer pixel index, so texel centers sit at +0.5
float value = tex(x+0.5,y);
for(int i=0;i<MAX_RAY_LENGTH;i++) {
  // remember to sample the center of each texel!
  value += tex(x-(i+0.5),y);
}

More advanced is to use the linear filtering ability of the GPU: if you sample a texture between two texels, you get the linear interpolation of both values:

// assumption: x is the integer pixel index, so texel centers sit at +0.5
float value = tex(x+0.5,y);
for(int i=0;i<MAX_RAY_LENGTH/2;i++) {
  // on the border between two texels you get pixel_a*0.5 + pixel_b*0.5, therefore double the value
  value += tex(x-(i*2+1),y) * 2.0;
}
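
To check the coordinate math of the filtering trick, here is a small CPU sketch in C++ that emulates a 1D linearly filtered fetch. All names are hypothetical. It assumes texel i covers [i, i+1) with its center at i + 0.5, and that reads outside the row return 0 (like a black border color), which a real sampler only does if configured that way.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Texel fetch that returns 0 outside the row (assumed border behavior).
float texel_or_zero(const std::vector<float>& row, long i) {
    const bool inside = i >= 0 && i < static_cast<long>(row.size());
    return inside ? row[static_cast<std::size_t>(i)] : 0.0f;
}

// Hypothetical stand-in for a 1D texture fetch with linear filtering.
float sample_linear(const std::vector<float>& row, float u) {
    const float t = u - 0.5f;       // position relative to texel centers
    const float f = std::floor(t);
    const float w = t - f;          // weight of the right-hand texel
    const long i0 = static_cast<long>(f);
    return texel_or_zero(row, i0) * (1.0f - w) + texel_or_zero(row, i0 + 1) * w;
}

// Sum of texel x and everything to its left, one filtered fetch per texel pair.
float sum_left_pairs(const std::vector<float>& row, int x) {
    float value = sample_linear(row, x + 0.5f);  // own texel, at its center
    for (int i = 0; 2 * i < x; ++i) {
        // u = x - (2i + 1) is the border between texels x-2i-2 and x-2i-1,
        // so the fetch returns their average; doubling it gives their sum.
        value += sample_linear(row, static_cast<float>(x - (2 * i + 1))) * 2.0f;
    }
    return value;
}
```

This roughly halves the number of fetches compared to the plain loop, but it is still linear in the ray length.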

Even more advanced is to calculate mipmaps first, which contain the averages of blocks (4-blocks, 8-blocks, 16-blocks, etc.), i.e. the block sums up to a known scale factor, and to use them in a smart way to speed up sampling of larger ranges.


Thanks for your reply! I saw a similar approach that uses bilinear lookups for a fast Gaussian blur.

Regarding mipmaps: is there a way to use them for non-square blocks? I would like to read blocks along the x-axis only, without values from adjacent lines. Mixing in adjacent lines would reduce the effective number of rays and result in artifacts.



Regarding mipmaps: is there a way to use them for non-square blocks?

You shouldn't use real mipmaps here, just several render targets to downsample the x-axis. In fact you can use only two render targets, e.g.

source texture 1024x1024 and two target buffers of size 1k x 1k

downsample source to A (1024->512)

downsample A to B (512->256)

downsample B to A (256->128, use offset 512)

downsample A to B (128->64 use offset 256)

...

In the final pass you only need to bind source, A and B, use some clever access strategy, and rely on linear filtering to minimize the number of texture fetches.
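
One possible access strategy, sketched on the CPU in C++ for a single row. This is only an illustration under assumptions, not necessarily the scheme meant above: the levels are kept as separate arrays instead of being packed into two ping-pong targets with offsets, each level stores sums rather than the averages a filtered downsample would give (on the GPU you would scale by 2 per level), and the row length is assumed to be a power of two. All names are made up.

```cpp
#include <cstddef>
#include <vector>

using Row = std::vector<float>;

// Builds an x-only pyramid for one row: levels[k][j] is the sum of the 2^k
// source texels starting at j * 2^k. Assumes row.size() is a power of two.
std::vector<Row> build_sum_pyramid(const Row& row) {
    std::vector<Row> levels{row};
    while (levels.back().size() > 1) {
        const Row& prev = levels.back();
        Row next(prev.size() / 2);
        for (std::size_t j = 0; j < next.size(); ++j) {
            next[j] = prev[2 * j] + prev[2 * j + 1];
        }
        levels.push_back(next);
    }
    return levels;
}

// Inclusive prefix sum at texel x with at most one fetch per pyramid level.
float prefix_sum_at(const std::vector<Row>& levels, std::size_t x) {
    const std::size_t count = x + 1;  // texels 0..x have to be added
    std::size_t pos = 0;              // first source texel not yet covered
    float sum = 0.0f;
    // Walk from the coarsest level down; each set bit of `count` is covered
    // by exactly one aligned block of that size.
    for (std::size_t k = levels.size(); k-- > 0;) {
        const std::size_t block = std::size_t{1} << k;
        if (count & block) {
            sum += levels[k][pos >> k];
            pos += block;
        }
    }
    return sum;
}
```

With this layout a pixel needs roughly log2(width) fetches instead of one per texel to its left.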


I would like to read blocks of the x-axis only, without values from adjacent lines.

Center the sample directly on the texel line to minimize influence of other lines.

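You can do it like this (I am assuming the number of things you are adding, N, is a power of 2, for simplicity):
for (int block_size = 1; block_size < N; block_size *= 2) {
  // Handle the data in pairs of adjacent blocks, hence the step of 2 * block_size.
  for (int i = 0; i < N; i += 2 * block_size) {
    // Add the running total of the left block to every element of the right block.
    for (int j = 0; j < block_size; ++j)
      x[i + block_size + j] += x[i + block_size - 1];
  }
}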

That code makes log2(N) passes through the data and each one of them touches N/2 objects, so it runs in time O(N*log(N)). This is worse than the naive O(N) algorithm, but it parallelizes much better: Everything inside the outer loop is completely parallelizable, so if you had N/2 processors you could do it in log2(N) steps.

So if you are implementing this in a shader, I imagine my method could be much faster than the original.
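
As a rough idea of how such a block-wise scan could map to a shader, here is a CPU sketch in C++ written from the point of view of a single output pixel, with two buffers standing in for ping-pong render targets. The function names are hypothetical and the pixel loop would be the parallel part on a GPU.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One pass as seen by output pixel p, the way a fragment shader would see it.
// `block` is the block size of the current pass.
int scan_pass_pixel(const std::vector<int>& src, std::size_t p, std::size_t block) {
    const std::size_t block_index = p / block;
    if (block_index % 2 == 0) {
        return src[p];  // left block of a pair: value is passed through
    }
    // Right block of a pair: add the last element of the left block.
    return src[p] + src[block_index * block - 1];
}

// Runs all passes, swapping source and destination after each one.
std::vector<int> prefix_sum_ping_pong(std::vector<int> src) {
    std::vector<int> dst(src.size());
    for (std::size_t block = 1; block < src.size(); block *= 2) {
        for (std::size_t p = 0; p < src.size(); ++p) {  // independent per pixel
            dst[p] = scan_pass_pixel(src, p, block);
        }
        std::swap(src, dst);
    }
    return src;
}
```

Each pass reads only from the source buffer and writes only to the destination, so no pixel depends on another pixel's result within the same pass.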


Yeah, this is an incredibly common operation in many GPU-accelerated solutions to ... just about everything. Search for "prefix sum", which is the problem, and also leads to the standard solutions (like Alvaro's).
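
If you want a CPU reference to compare a GPU result against, the C++ standard library already has an inclusive prefix sum in `std::partial_sum`. A minimal sketch, with a made-up wrapper name:

```cpp
#include <numeric>
#include <vector>

// Inclusive prefix sum of one row on the CPU, useful as a reference result.
std::vector<int> prefix_sum_reference(const std::vector<int>& row) {
    std::vector<int> out(row.size());
    std::partial_sum(row.begin(), row.end(), out.begin());
    return out;
}
```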
