
# Accumulate color value along x-axis

## Recommended Posts

Hey,

To compute the light intensity for a pixel, I need to sum the per-pixel attenuation along a ray into a buffer. The rays are parallel to the x-axis, so I have to sum values along the x-axis:

E.g. 0112011 would become 0124456 (each number is the sum of all previous numbers plus itself).

This is a rather expensive operation, with a cost that depends on the length of each ray (width of the texture) and the number of rays (height).
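For reference, the operation described above can be written sequentially on the CPU. This is only a minimal sketch for checking results, assuming a row-major image of integers; `scan_rows` and its parameters are illustrative names, not something from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place inclusive prefix sum along x for every row of a row-major image.
// Each row is one "ray"; rows are independent of each other.
void scan_rows(std::vector<int>& image, std::size_t width) {
    assert(width > 0 && image.size() % width == 0);
    for (std::size_t row = 0; row < image.size(); row += width) {
        for (std::size_t x = 1; x < width; ++x) {
            // Texel x becomes itself plus everything to its left.
            image[row + x] += image[row + x - 1];
        }
    }
}
```

Calling `scan_rows` on a one-row image holding 0 1 1 2 0 1 1 with `width = 7` leaves 0 1 2 4 4 5 6 in place, matching the example in the post. The cost is width x height additions, but each row is a serial chain, which is why it does not map directly onto a pixel shader.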

I thought a solution would be to draw the texture additively multiple times, shifting it one pixel to the right each time:

 0112011
+ 011201
+  01120
+   0112
+    011
+     01
+      0
_________
 0124456


This works, but it is computationally expensive as well. Also, there are only 256 values per 8-bit channel, which yields chunky results. I can't encode the numbers as 3-digit base-256 colors because additive blending would break this (it obviously just adds per channel, without carrying).
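To make the cost and the 8-bit limit concrete, here is a rough CPU sketch of the shifted additive draws for a single row. It assumes the target behaves like an 8-bit fixed-point render target that saturates at 255; `shifted_add_scan` is a hypothetical name used only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One "draw" per shift: add the row, moved right by `shift` texels, into the
// target. Sums are clamped to 255, as an 8-bit blend target would clamp them.
std::vector<std::uint8_t> shifted_add_scan(const std::vector<std::uint8_t>& row) {
    const std::size_t n = row.size();
    std::vector<std::uint8_t> target(n, 0);
    for (std::size_t shift = 0; shift < n; ++shift) {   // n passes in total
        for (std::size_t x = shift; x < n; ++x) {       // n - shift texels per pass
            const int sum = target[x] + row[x - shift];
            target[x] = static_cast<std::uint8_t>(std::min(sum, 255));
        }
    }
    return target;
}
```

For a row of width n this performs n passes and about n*n/2 additions, and any running sum above 255 is lost to clamping, which is the precision problem mentioned above.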

Please ask if I have not described the problem well. I hope someone can help me!

Thanks, Phil

Edited by PhilObyte

##### Share on other sites

Some thoughts:

First off, use more bits to overcome the 8-bit limit, e.g. a 16-bit float format.

Then always try to think in terms of a single output pixel, that is:

new_pixel += sum(pixel_to_the_left)


So, is there a limit to how many pixels you have to go to the left?

The simplest way would be to just sum up in a loop:

float value = tex(x,y); // this pixel's own value
for (int i = 0; i < MAX_RAY_LENGTH; i++) {
    // remember to sample at the pixel center!
    // accumulate (+=) the i-th pixel to the left
    value += tex(x-(i+0.5),y);
}


More advanced is to use the linear filtering ability of the GPU, that is, if you sample a texture between two pixels, you get the linear interpolation of both values:

float value = tex(x,y); // this pixel's own value
for (int i = 0; i < MAX_RAY_LENGTH/2; i++) {
    // sampling between two pixels returns 0.5*pixel[i] + 0.5*pixel[i+1],
    // therefore double the value to get their sum
    value += tex(x-(i*2+1),y) * 2.0;
}
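The linear-filtering trick can be checked on the CPU with a small stand-in for a filtered fetch. This is a sketch under assumptions: `sample_linear` is a hypothetical 1D sampler with texel centers at i + 0.5 and clamp-to-edge addressing, not a real graphics API call.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1D linearly filtered fetch. Texel i has its center at i + 0.5.
float sample_linear(const std::vector<float>& row, float u) {
    const float t = u - 0.5f;
    const float base = std::floor(t);
    const float w = t - base;                           // weight of the right texel
    const long last = static_cast<long>(row.size()) - 1;
    const long i0 = std::clamp(static_cast<long>(base), 0L, last);
    const long i1 = std::clamp(static_cast<long>(base) + 1, 0L, last);
    return row[i0] * (1.0f - w) + row[i1] * w;
}

// Sum of texels [first, first + 2 * pairs) using one filtered fetch per pair.
float sum_pairs(const std::vector<float>& row, std::size_t first, std::size_t pairs) {
    assert(first + 2 * pairs <= row.size());
    float sum = 0.0f;
    for (std::size_t p = 0; p < pairs; ++p) {
        // The border between texels first+2p and first+2p+1 lies at u = first+2p+1.
        // A fetch there returns their average, so doubling it gives their sum.
        sum += 2.0f * sample_linear(row, static_cast<float>(first + 2 * p + 1));
    }
    return sum;
}
```

Summing eight texels this way takes four fetches instead of eight. On real hardware the filtered result is limited by the texture format and filtering precision, so the halving of fetches may cost some accuracy.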


Even more advanced is to calculate mipmaps first. Each mip level holds the average of a block of pixels (4-blocks, 8-blocks, 16-blocks, etc.), so multiplying by the block size gives the block sum, and you can use this in a smart way to speed up sampling of larger ranges.

##### Share on other sites

Thanks for your reply! I saw a similar approach that uses bilinear fetches for a fast Gaussian blur.

Regarding mipmaps: is there a way to use them for non-square blocks? I would like to read blocks along the x-axis only, without values from adjacent lines. Mixing in adjacent lines would effectively decrease the number of rays, resulting in artifacts.

##### Share on other sites

> Regarding mipmaps: is there a way to use them for non-square blocks?

You shouldn't use real mipmaps here, just several render targets to downsample along the x-axis only. In fact, two render targets are enough, e.g.

source texture 1024x1024 and two target buffers of size 1k x 1k

downsample source to A (1024->512)

downsample A to B (512->256)

downsample B to A (256->128, use offset 512)

downsample A to B (128->64 use offset 256)

...

In the final pass you only need to bind source, A and B, and combine a clever access strategy with linear filtering to minimize the number of texture fetches.
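One possible reading of the "clever access strategy" is sketched below on the CPU for a single row. It assumes each reduced level stores pair sums rather than averages, keeps every level in its own array instead of packing them into A and B with offsets, and requires a power-of-two width. `build_sum_pyramid` and `prefix_sum_at` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Pyramid = std::vector<std::vector<float>>;

// Level 0 is the source row; level k holds sums of 2^k consecutive texels.
Pyramid build_sum_pyramid(const std::vector<float>& row) {
    assert(!row.empty() && (row.size() & (row.size() - 1)) == 0);
    Pyramid levels{row};
    while (levels.back().size() > 1) {
        const std::vector<float>& prev = levels.back();
        std::vector<float> next(prev.size() / 2);
        for (std::size_t i = 0; i < next.size(); ++i) {
            next[i] = prev[2 * i] + prev[2 * i + 1];   // x-only downsample
        }
        levels.push_back(std::move(next));
    }
    return levels;
}

// Inclusive prefix sum at texel x, reading at most one entry per level.
float prefix_sum_at(const Pyramid& levels, std::size_t x) {
    assert(x < levels[0].size());
    const std::size_t count = x + 1;   // number of texels to sum
    std::size_t pos = 0;               // first texel not yet covered
    float sum = 0.0f;
    for (std::size_t k = levels.size(); k-- > 0;) {
        if (count & (std::size_t{1} << k)) {
            sum += levels[k][pos >> k];   // one aligned block of 2^k texels
            pos += std::size_t{1} << k;
        }
    }
    return sum;
}
```

With a 1024-wide row this reads at most 11 values per output pixel instead of up to 1024. If the downsample passes use linear filtering, they produce averages, so each level would need scaling by its block size to recover sums.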

> I would like to read blocks along the x-axis only, without values from adjacent lines.

Center the sample directly on the texel line to minimize influence of other lines.

Edited by Ashaman73

##### Share on other sites
You can do it like this (I am assuming the number of things you are adding, N, is a power of 2, for simplicity):
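for (int block_size = 1; block_size < N; block_size *= 2) {
    // step over pairs of adjacent blocks: [i, i+block_size) and [i+block_size, i+2*block_size)
    for (int i = 0; i < N; i += 2 * block_size) {
        // add the total of the left block to every element of the right block
        for (int j = 0; j < block_size; ++j)
            x[i + block_size + j] += x[i + block_size - 1];
    }
}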


That code makes log2(N) passes through the data and each one of them touches N/2 objects, so it runs in time O(N*log(N)). This is worse than the naive O(N) algorithm, but it parallelizes much better: Everything inside the outer loop is completely parallelizable, so if you had N/2 processors you could do it in log2(N) steps.
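As a sanity check of that analysis, here is a self-contained sketch of the block-doubling loop that also counts the passes. It steps `i` by two blocks at a time, so that each pass updates exactly N/2 elements as described; `block_scan` is an illustrative name and the power-of-two restriction is kept from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place inclusive prefix sum; returns the number of passes (log2 of the size).
std::size_t block_scan(std::vector<int>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);   // power of two, for simplicity
    std::size_t passes = 0;
    for (std::size_t block = 1; block < n; block *= 2) {
        // Every (i, j) update within one pass is independent of the others,
        // so on a GPU each pass could run fully in parallel.
        for (std::size_t i = 0; i < n; i += 2 * block) {
            for (std::size_t j = 0; j < block; ++j) {
                // The last element of the left block already holds that
                // block's total; add it to each element of the right block.
                x[i + block + j] += x[i + block - 1];
            }
        }
        ++passes;
    }
    return passes;
}
```

For the eight values 0 1 1 2 0 1 1 3 this takes three passes and leaves 0 1 2 4 4 5 6 9 in the array.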

So if you are implementing this in a shader, I imagine my method could be much faster than the original.

##### Share on other sites

Yeah, this is an incredibly common operation in many GPU-accelerated solutions to ... just about everything. Search for "prefix sum", which is the problem, and also leads to the standard solutions (like Alvaro's).
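One formulation that such a search commonly turns up is the Hillis-Steele scan, which is a different variant from the block method above. It is sketched here because it maps naturally onto ping-ponging between two render targets: each pass reads one buffer and writes the other. This is a CPU illustration under that assumption, not a shader; `scan_ping_pong` is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hillis-Steele inclusive scan using two buffers, one pass per power of two.
std::vector<float> scan_ping_pong(std::vector<float> src) {
    std::vector<float> dst(src.size());
    for (std::size_t offset = 1; offset < src.size(); offset *= 2) {
        // One "full-screen pass": every output texel reads two input texels.
        for (std::size_t i = 0; i < src.size(); ++i) {
            dst[i] = src[i] + (i >= offset ? src[i - offset] : 0.0f);
        }
        std::swap(src, dst);   // the written buffer becomes the next input
    }
    return src;
}
```

This variant works for any row width and needs about log2(N) passes with two fetches per pixel each, at the price of N additions per pass instead of N/2.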