
Member Since 16 Jun 2001
Offline Last Active Jun 21 2015 12:49 PM

Topics I've Started

Pixel precise point rendering with GLSL has errors

04 January 2015 - 06:46 PM

This one seems interesting. I need to render XYZ points into a 2D texture where each pixel holds one result. The data comes in as GL_POINTS and is multiplied by a geometry shader across 6 pixels. Blending is GL_ONE/GL_ONE, so results hitting the same pixel are summed up. Now the problem is that the result is incorrect. Basically the following happens:


layout( points ) in;
layout( points, max_vertices=6 ) out;

ivec3 tc1U = inPoint % ivec3( pOutputWidth ); // inPoint is an ivec3 of linear pixel indices
ivec3 tc1V = inPoint / ivec3( pOutputWidth );

vTC1 = vec2( tc1U.x, tc1V.x ) * pTCTransform.xy + pTCTransform.zw;

// and so forth, 6 times

pTCTransform is (2/outputWidth, 2/outputHeight, -1, -1), mapping pixel indices in the range (0,0)-(outputWidth,outputHeight) to (-1,-1)-(1,1). In one particular case I have outputSize=(256,37). Some rows have the correct result (compared to the same calculation done on the CPU) while other rows are incorrect (like 1 row correct, 2 rows incorrect, 2 rows correct, and so forth). With some other outputHeight values it works correctly, with others again it does not.


Is OpenGL point rendering not pixel-precise? If so, how can you do pixel-precise rendering (that is, render one primitive to exactly one pixel at a predefined location (x,y))?

Which is better: Transform Feedback or an OpenCL kernel?

28 December 2014 - 12:38 PM

Let's say you have a mesh on whose vertices you want to do skinning calculations. Basically you have position data, weight matrices, and indices telling you which matrix to use. You now have two possible paths:


1) Use Transform Feedback: a TBO for the weight matrices, an input VBO with the positions, and an output VBO where the transformed positions go.


2) Use an OpenCL kernel with two input arrays, one for the weight matrices and one for the positions, and an output array for the transformed positions. If possible, feed the result directly into a shared OpenGL buffer; otherwise do some data transfer somehow.


As I see it, (1) has the advantage of delivering the data directly to the VBO where you want it, but it requires uploading the weight matrices to the TBO and setting up and running transform feedback.


For (2) there is the advantage of not having to mess with OpenGL state to do transform feedback in a safe way, and the copy of the weight matrices might be faster (not sure about that one, I'm not proficient with OpenCL right now). The disadvantage is that you need to get the result data back to OpenGL. I've seen extensions that allow writing directly into an OpenGL buffer object, so this disadvantage might be nullified?


What do you think: which would be faster in general?

Matrix to quaternion calculation is unstable

13 July 2014 - 06:55 AM

For animation purposes I convert between matrices and quaternions in various places, and for this I use the trace-based method found on the internet:

const double trace = a11 + a22 + a33 + 1.0;
if( trace > 0.0001 ){
    const double s = 0.5 / sqrt( trace );
    return decQuaternion( ( a32 - a23 ) * s, ( a13 - a31 ) * s, ( a21 - a12 ) * s, 0.25 / s );
}else if( a11 > a22 && a11 > a33 ){
    const double s = 2.0 * sqrt( 1.0 + a11 - a22 - a33 );
    // the w numerator must match the trace branch: ( a32 - a23 ), not ( a23 - a32 )
    return decQuaternion( 0.25 * s, ( a12 + a21 ) / s, ( a13 + a31 ) / s, ( a32 - a23 ) / s );
}else if( a22 > a33 ){
    const double s = 2.0 * sqrt( 1.0 + a22 - a11 - a33 );
    return decQuaternion( ( a12 + a21 ) / s, 0.25 * s, ( a23 + a32 ) / s, ( a13 - a31 ) / s );
}else{ // a33 is the largest diagonal element
    const double s = 2.0 * sqrt( 1.0 + a33 - a11 - a22 );
    // same here: ( a21 - a12 ) to stay consistent with the trace branch
    return decQuaternion( ( a13 + a31 ) / s, ( a23 + a32 ) / s, 0.25 * s, ( a21 - a12 ) / s );
}

The matrix is in row-major order and the quaternions are in (x,y,z,w) format.


If I do for example a small sweep (from [0,175°,0] to [0,185°,0]) across the XZ plane (hence with the Y axis fixed to [0,1,0]; I'm using the DX coordinate system) around the backwards pole (0,180°,0), I end up with a slight twitching of the camera near the [0,180°,0] point. I tracked it down to the calculated quaternion being slightly off near the point where any case other than the first if-case is used. Raising the threshold to 0.0001 did help in some cases but not in this one. I even went all the way up to 0.01, in which case the slight twitching just moved a bit further away from the problem point.


I also do not think the quaternion-to-matrix conversion is the culprit, since that code does not use any if-cases and thus should be stable. Furthermore, tweaking the above-mentioned threshold modifies the error behavior, so it has to be this code causing the trouble. I can cheat around the problem for the time being, but I'm looking for a proper solution.


So my question is: what other, stable way is there to calculate a quaternion from a matrix? Is there a known problem with the trace-based method used here that makes it fail around the backwards point? I'm more concerned with an error-free solution than with the fastest solution on earth.

Speed up glClearBuffer*

08 August 2013 - 03:45 PM

Today I made a strange observation while doing timing tests with glFinish to check how long the hardware takes to clear a depth+stencil buffer and a color buffer. From various places we know that depth+stencil has features like HiZ, Z-compression and whatnot going on, which are supposed to speed up rendering. So depth+stencil should be clearable by simply flagging tiles as "cleared", which should be faster than clearing every pixel as in the color buffer case. But the numbers look very different. This is what I got:


Clear depth+stencil buffer (32-bit): 800 µs

Clear 1 color buffer (64-bit, RGBA16F): 150 µs


As you can see, clearing a floating-point color buffer is more than five times faster than clearing depth+stencil (150 µs vs 800 µs). So I'm wondering how depth+stencil clearing can be sped up. Any ideas? Here is what I use for the test case:

glDepthMask( GL_TRUE );
glStencilMask( ~0 ); // write masks also apply to glClearBuffer*

glClearBufferfi( GL_DEPTH_STENCIL, 0, 1.0f, 0 );
glClearBufferfv( GL_COLOR, 0, &clearColor[ 0 ] );

Timing covers the individual glClearBuffer* calls, each followed by a glFinish to get the total time required for a full clear. How can clearing the depth+stencil buffer be over five times slower than clearing a color buffer?

Low sample count screen space reflections

30 July 2013 - 06:31 AM

The basic idea behind SSR is clear to me. Although there is little useful information around, the basic idea is to simply march along a ray in either view or screen space. Personally I do it in screen space, as I think this is better, but I can't say that for sure given the lack of information around.


Whatever the case, the common approach seems to be linear stepping along the ray followed by a bisecting search to refine the result. The bisecting search is clear and, depending on the step size, takes around 5-6 steps for a large screen and a ray running across a large part of it. The problematic part is the step size.


I made tests with a step count of 20 (not counting the refinement). For a large screen (1680x1050 as an example) this gives, for a moderately long ray bouncing from one side of the screen to the other of let's say 1000 pixels, a step size of 1000/20 = 50 pixels. This is quite large and steps right across thinner geometry, for example the edges of the boxes in the test bed I put together, attached below (smaller than 1680x1050 as it's from the editor). Furthermore it leads to incorrect sampling, as seen on the right side.

[Attached image: test1b.jpg]


Now I've seen other people claiming they use (on the same large screen or larger) only 16 samples, even for long rays running across the screen. 16 samples is even less than the 20 I used in the test, which already misses geometry a great deal. Nobody ever stated, though, how these under-sampling issues work out with such a low sample count. In my tests I required 80-100 samples to keep the undersampling issues somewhat at bay (and the speed is gruesome).


So the question is:


1) how can 16 samples for the linear search possibly work without these undersampling issues?




Another issue is objects like a chair or table resting on the ground. Rays passing underneath would work with an exhaustive search across the entire ray. With the linear stepping, though, the trace enters the bisecting phase at the first step where the ray crosses geometry like the table or chair. The bisecting search then finds no solution and leaks the env-map through. Others seem not to be affected by this problem, but what happens there? Do they continue stepping along the ray if the bisection fails? That would increase the sample count beyond 20+6, though, and kill the worst case. So another question is:


2) when a ray passes underneath geometry at the first linear hit and the bisection fails to return a result, what do you do? Continue along the ray with a worse worst-case sample count, or fail out?


3) how do you detect these cases properly in order to fade out? Blurring or something more intelligent?