# Accumulate color value along x-axis


## Recommended Posts

Hey,

in order to compute light intensity for a pixel, I need to sum per-pixel attenuation along a ray to a buffer. The ray is parallel to the x-axis, so I have to sum values along the x-axis:

e.g. 0112011 would become 0124456 (each output value is the sum of all previous values plus the current one)

This is a rather expensive operation, with cost depending on the length of the ray (the width of the texture) and the number of rays (the height).

I thought a solution would be to draw the texture additive multiple times, shifting it to the right one pixel each time:

 0112011
+ 011201
+  01120
+   0112
+    011
+     01
+      0
_________
0124456


This works, but is computationally expensive as well. Also, an 8-bit channel only holds 256 values, which yields chunky results. I can't pack larger numbers into three base-256 digits spread across the color channels, because additive blending would break that (it just adds per channel, obviously).
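For reference, the operation described here is an inclusive prefix sum along each row. A minimal CPU-side sketch in Python (illustrative only; the real computation would run in a shader):

```python
def prefix_sum_rows(rows):
    """Inclusive prefix sum along the x-axis of each row."""
    out = []
    for row in rows:
        acc = 0
        new_row = []
        for v in row:
            acc += v  # running total up to and including this pixel
            new_row.append(acc)
        out.append(new_row)
    return out

# The example from above: 0112011 -> 0124456
print(prefix_sum_rows([[0, 1, 1, 2, 0, 1, 1]]))  # [[0, 1, 2, 4, 4, 5, 6]]
```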

Please ask if I have not described the problem well. I hope someone can help me!

Thanks, Phil

Edited by PhilObyte

##### Share on other sites

Some thoughts:

First off, use more bits to overcome the 255-value limit, e.g. a 16-bit float.

Then always try to think in terms of a single pixel, that is:

new_pixel += sum(pixels_to_the_left)


So, is there a limit to how many pixels you should go to the left ?

The simplest way would be to just sum up in a loop:

float value = tex(x, y);
for (int i = 0; i < MAX_RAY_LENGTH; i++) {
    // remember to sample at the texel center!
    value += tex(x - (i + 0.5), y);
}


More advanced is to use the linear filtering ability of the GPU, that is, if you sample a texture between two pixels, you get the linear interpolation of both values:

float value = tex(x, y);
for (int i = 0; i < MAX_RAY_LENGTH / 2; i++) {
    // one fetch between two texels returns pixel_i * 0.5 + pixel_(i+1) * 0.5,
    // therefore double the value to get their sum
    value += tex(x - (i * 2 + 1), y) * 2.0;
}
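To see why the halving trick works, here is a small Python simulation of a linear-filtered fetch on a 1D texture row (the helper names are made up for illustration, and texel centers are placed at integer coordinates for simplicity): sampling exactly between two texel centers returns their average, so doubling a single fetch recovers the sum of both texels.

```python
import math

def fetch_linear(row, x):
    """Simulate a GPU linear-filtered fetch on a 1D texture row
    (texel centers at integer coordinates, for simplicity)."""
    x0 = math.floor(x)
    t = x - x0
    x1 = min(x0 + 1, len(row) - 1)
    return row[x0] * (1.0 - t) + row[x1] * t

def pair_sum(row, i):
    # One fetch halfway between texels i and i+1 returns their average;
    # doubling it yields the exact sum of the two texels.
    return fetch_linear(row, i + 0.5) * 2.0

row = [0.0, 1.0, 1.0, 2.0, 0.0, 1.0]
assert pair_sum(row, 2) == row[2] + row[3]
```

This is why the loop above needs only half as many texture fetches as the naive version.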


Even more advanced is to calculate mipmaps first, which contain the sums of blocks (4-texel, 8-texel, 16-texel blocks, etc.) and use them in a smart way to speed up sampling of larger ranges.
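The mip idea can be sketched on the CPU like this (a hypothetical helper, assuming a power-of-two row length): each level stores the sums of pairs from the level below, so one fetch at a coarse level covers a whole block of base texels.

```python
def build_sum_levels(row):
    """Build mip-like levels where each texel holds the sum of two
    texels from the level below (so level L covers blocks of 2**L).
    Assumes len(row) is a power of two."""
    levels = [list(row)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

row = [0, 1, 1, 2, 0, 1, 1, 3]
levels = build_sum_levels(row)
# levels[2][0] covers texels 0..3 of the base row in a single fetch
assert levels[2][0] == sum(row[0:4])
```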

##### Share on other sites

Thanks for your reply! I saw a similar approach utilizing bi-lerps for a performant Gaussian blur.

Regarding mipmaps: is there a way to use them for non-square blocks? I would like to read blocks along the x-axis only, without values from adjacent lines. Square blocks would mix in neighboring lines and effectively decrease the number of rays, resulting in artifacts.

##### Share on other sites

Regarding mipmaps: is there a way to use them for non-square blocks?

You shouldn't use real mipmaps here, just several render targets to downsample along the x-axis. In fact you can use only two render targets, e.g.

source texture 1024x1024 and two target buffers of size 1k x 1k

downsample source to A (1024->512)

downsample A to B (512->256)

downsample B to A (256->128, use offset 512)

downsample A to B (128->64 use offset 256)

...

In the final pass you only need to bind source, A, and B, plus some clever access strategy and linear filtering to minimize the number of texture fetches.

I would like to read blocks of the x-axis only, without values from adjacent lines.

Center the sample directly on the texel line to minimize influence of other lines.

Edited by Ashaman73

##### Share on other sites
You can do it like this (I am assuming the number of things you are adding, N, is a power of 2, for simplicity):

for (int block_size = 1; block_size < N; block_size *= 2) {
    for (int i = 0; i < N; i += 2 * block_size) {
        for (int j = 0; j < block_size; ++j)
            x[i + block_size + j] += x[i + block_size - 1];
    }
}


That code makes log2(N) passes through the data and each one of them touches N/2 objects, so it runs in time O(N*log(N)). This is worse than the naive O(N) algorithm, but it parallelizes much better: Everything inside the outer loop is completely parallelizable, so if you had N/2 processors you could do it in log2(N) steps.
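A Python rendering of this doubling scheme, checked against the expected prefix sums (the updates within each pass touch disjoint elements, which is exactly what a GPU would exploit):

```python
def doubling_scan(x):
    """Inclusive prefix sum via pass doubling: in each pass, add the
    last element of every left block to each element of the right
    block. Assumes len(x) is a power of 2."""
    x = list(x)
    n = len(x)
    block_size = 1
    while block_size < n:
        for i in range(0, n, 2 * block_size):
            carry = x[i + block_size - 1]  # last element of the left block
            for j in range(block_size):
                x[i + block_size + j] += carry  # independent across j and i
        block_size *= 2
    return x

assert doubling_scan([0, 1, 1, 2, 0, 1, 1, 3]) == [0, 1, 2, 4, 4, 5, 6, 9]
```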

So if you are implementing this in a shader, I imagine my method could be much faster than the original.

##### Share on other sites

Yeah, this is an incredibly common operation in many GPU-accelerated solutions to ... just about everything. Search for "prefix sum", which is the problem, and also leads to the standard solutions (like Alvaro's).

##### Share on other sites

Gonna try "Fast Summed-Area Table Generation and its Applications" (http://developer.amd.com/wordpress/media/2012/10/Hensley-SAT(EG05).pdf). I hope the cost of swapping render targets is not very high; I will test different sample widths and see what works for me. Thanks for all the replies, I'm still very new to HLSL and sometimes don't know how to solve problems with it.
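For context, a summed-area table is the 2D generalization of this prefix sum: every entry holds the sum of all values above and to the left of it, inclusive. A minimal CPU-side Python sketch of the build step (the cited paper is about doing this efficiently on the GPU):

```python
def build_sat(img):
    """Summed-area table: sat[y][x] = sum of img over the rectangle
    from (0, 0) through (x, y), inclusive."""
    h, w = len(img), len(img[0])
    sat = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]          # prefix sum along this row
            sat[y][x] = row_sum + (sat[y - 1][x] if y > 0 else 0)
    return sat

img = [[1, 2], [3, 4]]
assert build_sat(img) == [[1, 3], [4, 10]]
```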
