
Member Since 16 Jun 2001
Offline Last Active Jul 24 2014 06:16 PM

Topics I've Started

Matrix to Quaternion calculation instable

13 July 2014 - 06:55 AM

For animation purposes I convert between matrices and quaternions in different places, and for this I use the trace method found on the internet:

const double trace = a11 + a22 + a33 + 1.0;
if( trace > 0.0001 ){
    const double s = 0.5 / sqrt( trace );
    return decQuaternion( ( a32 - a23 ) * s, ( a13 - a31 ) * s, ( a21 - a12 ) * s, 0.25 / s );
}else if( a11 > a22 && a11 > a33 ){
    const double s = 2.0 * sqrt( 1.0 + a11 - a22 - a33 );
    return decQuaternion( 0.25 * s, ( a12 + a21 ) / s, ( a13 + a31 ) / s, ( a23 - a32 ) / s );
}else if( a22 > a33 ){
    const double s = 2.0 * sqrt( 1.0 + a22 - a11 - a33 );
    return decQuaternion( ( a12 + a21 ) / s, 0.25 * s, ( a23 + a32 ) / s, ( a13 - a31 ) / s );
}else{
    const double s = 2.0 * sqrt( 1.0 + a33 - a11 - a22 );
    return decQuaternion( ( a13 + a31 ) / s, ( a23 + a32 ) / s, 0.25 * s, ( a12 - a21 ) / s );
}

The matrix is in row-major order and quaternions are in the (x,y,z,w) format.
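A minimal sketch of one commonly used variant of this conversion, not the poster's code. `Quat` and `Mat3` are hypothetical stand-ins for `decQuaternion` and the matrix class, which the post does not show. It assumes the convention implied by the posted first branch (a32 - a23 = 4wx) and an exact rotation matrix. Two things differ from the posted code: the first branch is taken when a11 + a22 + a33 > 0 rather than when the trace plus one exceeds a small epsilon, so for an exact rotation matrix the divisor s never drops below 2; and the w terms of the a11 and a33 branches use (a32 - a23) and (a21 - a12), which keeps their sign consistent with the first branch.

```cpp
#include <cmath>

// Hypothetical stand-ins for the post's types (not shown in the post).
struct Quat { double x, y, z, w; };
struct Mat3 { double a11, a12, a13, a21, a22, a23, a31, a32, a33; };

// Assumed convention (taken from the posted first branch):
// a32 - a23 = 4wx, a13 - a31 = 4wy, a21 - a12 = 4wz.
Quat quatFromMatrix(const Mat3 &m) {
    const double trace = m.a11 + m.a22 + m.a33;
    if (trace > 0.0) {
        const double s = 2.0 * std::sqrt(trace + 1.0);                   // s = 4w
        return { (m.a32 - m.a23) / s, (m.a13 - m.a31) / s, (m.a21 - m.a12) / s, 0.25 * s };
    } else if (m.a11 > m.a22 && m.a11 > m.a33) {
        const double s = 2.0 * std::sqrt(1.0 + m.a11 - m.a22 - m.a33);   // s = 4x
        return { 0.25 * s, (m.a12 + m.a21) / s, (m.a13 + m.a31) / s, (m.a32 - m.a23) / s };
    } else if (m.a22 > m.a33) {
        const double s = 2.0 * std::sqrt(1.0 + m.a22 - m.a11 - m.a33);   // s = 4y
        return { (m.a12 + m.a21) / s, 0.25 * s, (m.a23 + m.a32) / s, (m.a13 - m.a31) / s };
    } else {
        const double s = 2.0 * std::sqrt(1.0 + m.a33 - m.a11 - m.a22);   // s = 4z
        return { (m.a13 + m.a31) / s, (m.a23 + m.a32) / s, 0.25 * s, (m.a21 - m.a12) / s };
    }
}
```

Each branch divides by four times the largest-magnitude component, which is the reason for choosing the branch by comparing the diagonal entries rather than by testing against an epsilon.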


If I do, for example, a small sweep (from [0,175°,0] to [0,185°,0]) across the XZ plane (hence with the Y axis fixed to [0,1,0], as I'm using the DX coordinate system) around the backwards pole (0,180°,0), I end up with a slight twitching of the camera near the [0,180°,0] point. I tracked it down to the calculated quaternion being slightly off near the point where any branch other than the first if-case is used. Raising the threshold value to 0.0001 did help in some cases but not in this one. I even went all the way up to 0.01, in which case the slight twitching just moved a bit further away from the problem point.


I also do not think the quaternion-to-matrix conversion is the culprit, since that code does not use any if-cases and thus should be stable. Furthermore, tweaking the above-mentioned value does modify the error behavior, so it has to be this code causing trouble. For the time being I can cheat around the problem, but I'm looking for a proper solution.


So my question is: what other way is there to calculate a quaternion from a matrix that is stable? Is there a known problem with the trace-based method used here that makes it fail around the backwards point? I'm more concerned about an error-free solution than the fastest solution on earth.
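One possible contributor, which the post does not confirm: q and -q encode the same rotation, and the posted first branch always returns w > 0 while the other branches return a positive x, y or z instead. For the Y sweep described above this means the result can jump from q to -q when the branch changes just past 180°. The rotation is identical, but any code that blends or compares quaternion components directly will see a jump. A minimal sketch of the usual guard follows; `Quat`, `dot` and `alignTo` are hypothetical names, not part of the poster's library.

```cpp
// Hypothetical stand-in for the post's quaternion type, (x,y,z,w) order.
struct Quat { double x, y, z, w; };

// Four-component dot product; negative when the two quaternions
// lie in opposite hemispheres.
double dot(const Quat &a, const Quat &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Returns q or -q, whichever lies in the same hemisphere as reference.
// Both encode the same rotation, so only the representation changes.
Quat alignTo(const Quat &reference, const Quat &q) {
    if (dot(reference, q) < 0.0) {
        return { -q.x, -q.y, -q.z, -q.w };
    }
    return q;
}
```

Applying `alignTo` with the previous frame's quaternion as the reference keeps a sequence of converted quaternions continuous before they are interpolated.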

Speed up glClearBuffer*

08 August 2013 - 03:45 PM

Today I made a strange observation while doing some timing tests with glFinish to check how long it takes the hardware to clear a depth+stencil buffer and a color buffer. From various places we know that depth+stencil has things like HiZ, Z-compression and whatnot going on, which is supposed to speed up rendering. So depth+stencil should be clearable by simply setting the flags of tiles to "cleared", which should be faster than clearing all pixels as in the color buffer case. But the numbers look very different. This is what I got:


Clear depth+stencil buffer (32-bit): 800 µs

Clear 1 color buffer (64-bit, RGBA16F): 150 µs


As you can see clearing a floating point color buffer is more than 4 times faster than clearing a depth+stencil. So I'm wondering how depth+stencil clearing can be sped up. Any ideas? Here is what I use for the test case:

glDepthMask( GL_TRUE );

glClearBufferfi( GL_DEPTH_STENCIL, 0, 1.0f, 0 );
glClearBufferfv( GL_COLOR, 0, &clearColor[ 0 ] );

Timing is done over the individual glClearBuffer* calls, each accompanied by a glFinish, to get the total time required for a full clear. How can clearing the depth+stencil buffer be more than 4 times slower than clearing a color buffer?
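The measurement pattern described here can be sketched roughly as below. This is an assumption about how such a harness might look, not the poster's actual test code: `elapsedMicroseconds` is a hypothetical helper, `sync` stands in for a wrapper around glFinish, and `work` for a single glClearBuffer* call. Both are passed in as callables so the sketch needs no GL context.

```cpp
#include <chrono>

// Times `work` in microseconds. `sync` is called once before the clock starts
// (to drain earlier commands) and once before it stops (to wait for `work`).
template <typename Work, typename Sync>
double elapsedMicroseconds(Work &&work, Sync &&sync) {
    sync();
    const auto start = std::chrono::steady_clock::now();
    work();
    sync();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Note that a number obtained this way includes the cost of the synchronisation itself, so it is an upper bound on the clear rather than the clear alone.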

low sample count screen space reflections

30 July 2013 - 06:31 AM

The basic idea behind SSR is clear to me. Although there is little useful information around, the basic idea is to simply march along a ray in either view or screen space. Personally I do it in screen space as I think this is better, but I can't say for sure given the lack of information around.


Whatever the case, the common approach seems to be to step linearly along the ray and then do a bisecting search to refine the result. The bisecting search is clear and, depending on the step size, takes around 5-6 steps for a large screen and a ray running across a large part of the screen. The problematic part is the step size.


I made tests with a step count of 20 (not counting the refinement). For a large screen (1680x1050 as an example) and a moderately long ray bouncing from one side of the screen to the other, let's say 1000 pixels long, this gives a step size of 1000/20 = 50 pixels. This is quite large and steps right across thinner geometry, for example the edges of the boxes in the test-bed I put together, attached below (smaller than 1680x1050 as it's from the editor). Furthermore it leads to incorrect sampling, as seen on the right side.

[Attached file: test1b.jpg]


Now I've seen other people claiming they use only 16 samples (on the same large screen or larger), even for long rays running across the screen. 16 samples is even fewer than the 20 I used in the test, which already misses a great deal of geometry. Nobody ever stated, though, how these under-sampling issues work out with such a low sample count. In my tests I required 80-100 samples to keep these under-sampling issues somewhat at bay (speed is gruesome).


So the question is:


1) How can 16 samples for the linear search possibly work without these under-sampling issues?




Another issue is something like a chair or table resting on the ground. All rays passing underneath would work with an exhaustive search across the entire ray. With the linear test, though, the search goes into the bisecting phase at the first step where the ray crosses geometry like the table or chair. The bisecting search then finds no solution and thus leaks the env-map through. Some others seem not to be affected by this problem, but what happens there? Do they continue stepping along the ray if the bisecting fails? This would increase the sample count beyond 20+6, though, and hurts the worst case. So another question is:


2) When a ray passes underneath geometry at the first linear search hit and bisecting fails to return a result, what do you do? Continue along the ray with a worse worst-case sample count, or fail out?


3) How do you detect these cases properly in order to fade out? Blurring, or something more intelligent?

how to do "bvec4 & bvec4" in GLSL?

18 July 2013 - 07:36 AM

A simple problem: calculate two limits for 4 points at the same time. Using GLSL this would look like this:

vec4 testPoint, limitA, limitB = ...

bvec4 result = lessThan( testPoint, limitA ) & greaterThan( testPoint, limitB)


The problem is that this does not seem to be possible in GLSL. The compiler says there is no operator for "bvec4 & bvec4". I also tried && with the same result: no operator for "bvec4 && bvec4". I even tried multiplication (since with 0 and 1 as false and true it would work: 0*0=0, 0*1=0, 1*0=0 and 1*1=1), but again there is no operator for "bvec4 * bvec4".


How is one supposed to do a component-wise AND operation on two bvec4 values in GLSL? Or has GLSL overlooked this very important operation?
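The multiplication idea does work once the booleans are converted first: GLSL's constructors turn a bvec4 into a vec4 of 0.0/1.0 values and a vec4 back into a bvec4 (non-zero becomes true), so an expression of the form bvec4( vec4( a ) * vec4( b ) ) yields a component-wise AND. The sketch below models that logic in C++ rather than GLSL; `Vec4`, `BVec4` and `andViaFloat` are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using BVec4 = std::array<bool, 4>;

// Component-wise a < b, mirroring GLSL's lessThan.
BVec4 lessThan(const Vec4 &a, const Vec4 &b) {
    BVec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] < b[i];
    return r;
}

// Component-wise a > b, mirroring GLSL's greaterThan.
BVec4 greaterThan(const Vec4 &a, const Vec4 &b) {
    BVec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] > b[i];
    return r;
}

// Component-wise AND built from conversions only:
// bool -> 0/1 float, multiply, non-zero float -> bool.
BVec4 andViaFloat(const BVec4 &a, const BVec4 &b) {
    BVec4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        const float fa = a[i] ? 1.0f : 0.0f;
        const float fb = b[i] ? 1.0f : 0.0f;
        r[i] = (fa * fb) != 0.0f;
    }
    return r;
}
```

For the two-limit test in the post this corresponds to andViaFloat( lessThan( testPoint, limitA ), greaterThan( testPoint, limitB ) ).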

glTextureBarrier not working according to specs

16 April 2013 - 02:32 PM

I recently read about glTextureBarrier, or rather the extension behind it: NV_texture_barrier. The idea behind it is simple. It allows rendering to the same texture you read from (bound to a texture unit and used as an FBO attachment), under the condition that you read from and render to different locations in that texture. Once you want to read from an area that has been written, you need to call glTextureBarrier. So much for the specs, but in reality it doesn't work like this.


Let's say I have a 1024x1024 texture. I use a 512x512 region on the left side (A) and a 256x256 region on the right side (B). There is plenty of room between them, so they don't get in each other's way. I then do the following:


Step 1: Render from some other source texture to A

Step 2: Bind A to texture unit 1 and leave it as attachment.

Step 3: glTextureBarrier (as we want to read from A)

Step 4: Render from A to B (down-sampling)

Step 5: glTextureBarrier (so B can be read again)

Step 6: Render from B to A (down-sampling to 128x128)

Rinse and repeat a couple of times.
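The non-overlap condition behind these steps can be checked with a small helper like the one below. This is only a sketch: `Rect` and `overlaps` are hypothetical names, and the coordinates in the usage note are assumptions, since the post gives only the sizes of A and B and says they sit on the left and right sides.

```cpp
// Axis-aligned pixel rectangle inside the shared texture.
struct Rect { int x, y, w, h; };

// True when the two rectangles share at least one pixel.
bool overlaps(const Rect &a, const Rect &b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}
```

Assuming A is at (0,0) with size 512x512 and B at (768,0) with size 256x256, the read and write regions of step 4 (A to B) and of step 6 (B to a 128x128 area inside A) do not overlap, so neither pass reads and writes the same texels.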


The problem is the following. Steps 1-4 work correctly. Step 6, though, fails. The image written is all black, as if the pixels written by step 4 were not read. The problem happens on ATI as well as on nVidia, so it is not a vendor-specific or computer-specific problem. It looks like glTextureBarrier works in step 3 but not in step 5.


But what is wrong? Why does glTextureBarrier work as advertised in step 3 but not in step 5? Did I miss something special I have to do to get glTextureBarrier working? Or is glTextureBarrier not working as advertised at all, and the binding in step 2 only made it look like it works?


Does anybody have experience with glTextureBarrier and got it working? It would save me GPU memory if I could get ping-pong rendering working with a single texture this way.