woof...after thinking for about 3 hours I finally got it.
The whole problem is in "Step 2: Grid Generation".
The bad thing about that method is that it uses 4 draw calls and has to clear the whole depth-stencil texture after each of those 4 draws.
A faster solution that also uses less memory:
Create the grid texture with the same format as in the NVIDIA document.
Don't create the depth-stencil texture.
This method uses just one draw().
The blending of the grid texture is configured like this:
indexOfBall*ValueAlreadyOnThePixel + indexOfBall
Set the default (clear) value of the pixels to (0.0f, 0.0f, 0.0f, 0.0f).
Each pixel is a float4, and I use all four components: "indexOfBall" is not written to the pixel as a single float, it is first transformed like this:
float indexOfBall is transformed to
float4 indexOfBall = (indexOfBall, indexOfBall*2, indexOfBall*3, indexOfBall*4)
In what follows I name the values rendered to one pixel like this:
x = first indexOfBall rendered to the pixel (example: x is written as (x, x*2, x*3, x*4))
y = second indexOfBall rendered to the pixel
z = third indexOfBall rendered to the pixel
w = fourth indexOfBall rendered to the pixel
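To see what the blend rule leaves in a pixel, here is a small CPU-side sketch in Python. It is only a simulation of the arithmetic, not GPU code; the function names (encode, blend, render) are made up for illustration. If I read the rule correctly, in Direct3D-style blend states it would correspond to SrcBlend = ONE, DestBlend = SRC_COLOR, BlendOp = ADD, but that mapping is my assumption and is not stated above.

```python
def encode(index_of_ball):
    """Expand one ball index into the float4 that gets written: (i, 2i, 3i, 4i)."""
    return tuple(index_of_ball * k for k in (1, 2, 3, 4))


def blend(dst, src):
    """Blend rule from the post, applied per component: src * dst + src."""
    return tuple(s * d + s for s, d in zip(src, dst))


def render(indices):
    """Write the given ball indices, in order, to one pixel cleared to zero."""
    pixel = (0.0, 0.0, 0.0, 0.0)
    for index_of_ball in indices:
        pixel = blend(pixel, encode(index_of_ball))
    return pixel


print(render([3.0]))       # (3.0, 6.0, 9.0, 12.0)
print(render([3.0, 5.0]))  # (20.0, 70.0, 150.0, 260.0)
```

With x = 3 and y = 5 the second result matches the pattern (x*y+y, 2*x*2*y+2*y, 3*x*3*y+3*y, 4*x*4*y+4*y).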
Now I create another grid texture, the same size as the original grid, but with an 8-bit format per pixel.
Then I set up blending for this texture and configure it like this:
ValueAlreadyOnThePixel + newValue
I set the default value of this texture to 0.0f.
When I render to this texture I always write the value 0.0039f, which is one step of the format: the pixel holds a value from 0 to 1 that can take only 256 different levels (strictly, one step is 1/255, which is about 0.0039).
Now I have two grid textures. I set both of them as render targets at the same time and render to them as described.
After rendering:
Each pixel of the second grid texture holds the number of indexOfBall values that were rendered to the matching pixel of the first grid texture.
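The counting texture can be sketched on the CPU like this. It is a simplified model that assumes an 8-bit normalized channel and assumes the blend result is rounded to the nearest storable level; real hardware may round differently. The names quantize_unorm8 and count_writes are hypothetical.

```python
def quantize_unorm8(value):
    """Clamp to 0..1 and round to the nearest of the 256 storable levels."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * 255.0) / 255.0


def count_writes(n_writes, written_value=0.0039):
    """Additively blend written_value n_writes times, then read back the count."""
    pixel = 0.0  # default value of the counting texture
    for _ in range(n_writes):
        pixel = quantize_unorm8(pixel + written_value)
    # Convert the stored 0..1 value back to a number of steps.
    return round(pixel * 255.0)


print([count_writes(n) for n in range(5)])  # [0, 1, 2, 3, 4]
```

In a shader the same read-back would be the sampled value multiplied by 255 and rounded.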
Each pixel of the first grid texture falls into one of 5 cases:
1* no values were rendered to the pixel
2* one value was rendered to the pixel
In this case the value of the pixel of the second grid texture will be 1 and
the value of the pixel of the first grid texture will be float4(x, x*2, x*3, x*4)
to get x we just take the first component of the float4
3* two values were rendered to the pixel
In this case the value of the pixel of the second grid texture will be 2 and
the value of the pixel of the first grid texture will be (x*y+y, 2*x*2*y+2*y, 3*x*3*y+3*y, 4*x*4*y+4*y)
to find x and y we set the first two components, x*y+y and 2*x*2*y+2*y, equal to the values read from the pixel and solve the two equations
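The two-value case has a short closed form. Calling the first two components read from the pixel c1 = x*y + y and c2 = 4*x*y + 2*y, we get 4*c1 - c2 = 2*y, so y = (4*c1 - c2)/2 and x = c1/y - 1. Here is a minimal sketch (solve_two is a made-up name). It assumes ball indices start at 1, since an index of 0 would reset the pixel to 0 under this blend rule and would make y a zero divisor.

```python
def solve_two(c1, c2):
    """Recover (x, y) from c1 = x*y + y and c2 = 4*x*y + 2*y."""
    y = (4.0 * c1 - c2) / 2.0  # the x*y terms cancel, leaving 2*y
    x = c1 / y - 1.0           # c1 / y = x + 1
    return x, y


print(solve_two(20.0, 70.0))  # (3.0, 5.0)
```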
4* three values were rendered to the pixel
In this case the value of the pixel of the second grid texture will be 3 and
the value of the pixel of the first grid texture will be ((x*y+y)*z+z, (2*x*2*y+2*y)*2*z+2*z, (3*x*3*y+3*y)*3*z+3*z, (4*x*4*y+4*y)*4*z+4*z)
to find x, y and z we set the first three components, (x*y+y)*z+z, (2*x*2*y+2*y)*2*z+2*z and (3*x*3*y+3*y)*3*z+3*z, equal to the values read from the pixel and solve the three equations
Case 5* (four values were rendered to the pixel) works the same way, using all four components.
The equations can be solved with WolframAlpha.
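As an alternative to a symbolic solver, note that component k of the pixel expands to k*p1 + k^2*p2 + ... + k^n*pn, where pj is the product of the last j values written (for three values: p1 = z, p2 = y*z, p3 = x*y*z). That is a linear system in the products, so every case can be decoded the same way. The sketch below is my own reformulation, not something taken from the method above; solve_linear and decode are hypothetical names, and exact fractions are used so the demo is free of rounding.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination on exact fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def decode(components, count):
    """Recover the `count` indices written to a pixel, in write order."""
    if count == 0:
        return []
    # Row k holds k^1 .. k^count, the coefficients of the products p1..pn.
    rows = [[k ** j for j in range(1, count + 1)] for k in range(1, count + 1)]
    products = solve_linear(rows, components[:count])
    values = [products[0]]  # p1 is the last value written
    for j in range(1, count):
        values.append(products[j] / products[j - 1])
    values.reverse()  # put the first value written first
    return [int(v) for v in values]


print(decode((147.0, 994.0, 3171.0, 7308.0), 3))      # [3, 5, 7]
print(decode((296.0, 3980.0, 19032.0, 58472.0), 4))   # [3, 5, 7, 2]
```

The count argument is the value read from the second grid texture. Like the closed forms, this assumes no index is 0.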
and that's it
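Edit: I am combining 4 numbers and storing them in a single 32-bit float. That's not good: a 32-bit float does not have enough precision to hold the product of several indices exactly.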
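A quick way to see the problem is to repeat the blend arithmetic while rounding every result to a 32-bit float. This is only a CPU sketch under the assumption that the render target stores 32-bit floats per component; to_float32 and render_float32 are made-up helpers and the ball indices are arbitrary examples.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def render_float32(indices):
    """Apply src * dst + src per component, rounding to float32 after each write."""
    pixel = [0.0, 0.0, 0.0, 0.0]
    for index_of_ball in indices:
        src = [to_float32(index_of_ball * k) for k in (1, 2, 3, 4)]
        pixel = [to_float32(s * d + s) for s, d in zip(src, pixel)]
    return pixel


small = render_float32([3, 5, 7])
large = render_float32([1001, 2003, 3007])
exact_first_component = (1001 * 2003 + 2003) * 3007 + 3007

print(small)                              # small indices survive exactly
print(large[0] == exact_first_component)  # False: the stored value was rounded
```

Once the products pass 2^24 (about 16.7 million), a 32-bit float can no longer represent every integer, so the decoded indices come out wrong for larger ball counts.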