RPTD

Members
  • Content count: 741

Community Reputation

  340 Neutral

About RPTD

  • Rank: Advanced Member
  1. It does not seem to be a threading issue; it happens even with synchronous rendering. Something looks wrong inside Mesa:

       ==00:00:22:15.628 4423== 544,154,936 bytes in 13,576,238 blocks are still reachable in loss record 68,955 of 68,955
       ==00:00:22:15.628 4423==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
       ==00:00:22:15.628 4423==    by 0x8A2714B: ??? (in /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0)
       ==00:00:22:15.628 4423==    by 0x8A24ED0: ??? (in /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0)
       ==00:00:22:15.628 4423==    by 0x8A26616: ??? (in /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0)
       ==00:00:22:15.628 4423==    by 0x8A26720: xcb_wait_for_reply (in /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0)
       ==00:00:22:15.628 4423==    by 0x84EE8C2: ??? (in /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0)
       ==00:00:22:15.628 4423==    by 0x84E994E: ??? (in /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0)
       ==00:00:22:15.628 4423==    by 0x84E3917: ??? (in /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0)
       ==00:00:22:15.628 4423==    by 0x84E9E4B: ??? (in /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0)
       ==00:00:22:15.628 4423==    by 0x84BD3B4: glXMakeContextCurrent (in /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0)

     xcb_wait_for_reply shows up in all the large-scale leak reports. A Mesa bug?
  2. No, one system is a Radeon HD 7970 with the Crimson driver while the other is some Radeon 5xxx (not sure right now) with the AMDGPU driver. The runaway memory happens on the AMDGPU one.
  3. glXMakeCurrent for the same window has no effect, so that one is not the problem. I went ahead and did a couple of tests.

     Running valgrind on the editor: no leaks detected, but memory shoots up in the process monitor. And by shooting up I mean like this:

       Test System 1: around 10 KB per second increase
       Test System 2: around 100 MB(!) per second increase

     So it can't be me leaking memory in my program; valgrind would spot this. But why does the process memory consumption run away like this?

     I also tested these situations:

       for each frame
         for each window
           glXMakeCurrent()
           // no render
         for each window
           glXSwapBuffers()

     This leaks as mentioned above.

       for each frame
         for each window
           glXMakeCurrent()
           // no render
         // no swapping

     No leaking in this case.

       for each frame
         // no glXMakeCurrent()
         // no render
         for each window
           glXSwapBuffers()

     No leaking in this case either.

     So the leaking happens as soon as glXMakeCurrent is used together with glXSwapBuffers(). And interestingly, valgrind does not pick up this nearly 1 GB of lost memory.

     EDIT: I did some more testing and it seems that on Test System 1, with the slowly rising memory consumption, I had been a bit quick. Letting the editor sit fully loaded with all rendering going on kept memory consumption in the process monitor at the same level across a couple of seconds. System 2, though, does run away by hundreds of MB.
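     For reference, a minimal sketch of the leaking combination (names like dpy, win1, ctx1 are placeholders; window and context creation are assumed to have happened elsewhere):

       // Sketch: glXMakeCurrent paired with glXSwapBuffers each frame.
       // The leak only shows up when both calls are combined, as tested above.
       #include <GL/glx.h>

       void renderLoopSketch( Display *dpy, Window win1, GLXContext ctx1,
       Window win2, GLXContext ctx2 ){
           for( ;; ){
               glXMakeCurrent( dpy, win1, ctx1 ); // no rendering required
               glXMakeCurrent( dpy, win2, ctx2 );
               glXSwapBuffers( dpy, win1 );
               glXSwapBuffers( dpy, win2 );
           }
       }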
  4. One way I test is using the process monitor to see if the overall application memory stays stable or steadily increases. That showed the little runaway of memory due to glXMakeCurrent. Otherwise, engine internal objects are ref-counted and stored in a global list if leak-checking is enabled. Upon exiting it is checked that no such list still contains live objects. So I can tell precisely that no leaking goes on inside the internal workings. It really happens only if glXMakeCurrent is enabled, even if nothing else is going on. (A rough sketch of such a registry follows below.)
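     A rough sketch of what such a leak-checking registry can look like (hypothetical names; the engine's actual classes are not shown here):

       // Hypothetical sketch: ref-counted objects register themselves in a
       // global list; anything still listed at shutdown is a leak.
       #include <cstdio>
       #include <set>

       class RefCounted{
       public:
           RefCounted() : refCount( 1 ){ aliveObjects().insert( this ); }
           void AddRef(){ refCount++; }
           void Release(){
               if( --refCount == 0 ){
                   aliveObjects().erase( this );
                   delete this;
               }
           }
           static std::set<RefCounted*> &aliveObjects(){
               static std::set<RefCounted*> list; // global leak-check list
               return list;
           }
       protected:
           virtual ~RefCounted(){}
       private:
           int refCount;
       };

       // Called on exit: report any objects still alive.
       void ReportLeaks(){
           std::printf( "%zu objects still alive\n", RefCounted::aliveObjects().size() );
       }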
  5. This is a very strange problem I've recently stumbled upon and for which I have no explanation nor remedy. The situation is the following:

     The main thread is UI and logic (the toolkit has its own Display connection). The render thread is purely OpenGL (own context, own Display connection).

     If I have one window rendering, everything is fine and no memory leaks happen.

     If I have two windows, then during each render call each window is made current using glXMakeCurrent, rendered, and later on swapped. So basically this happens each frame:

       glXMakeCurrent(display, window1Context...)
       render window1
       glXMakeCurrent(display, window2Context...)
       render window2
       wait for data for next frame render

     The interesting thing is that while everything runs fine, the application slowly starts leaking memory. Without glXMakeCurrent the leaking goes away; with glXMakeCurrent the leaking starts again. It is faster or slower on different computers.

     Any idea what could be wrong there? OpenGL runs entirely in the render thread. The main thread has no connection to OpenGL at all; it only has the UI toolkit, so to speak. Also, the render thread has its own Display connection, which is thread-safe.

     Ideas welcome, since I'm out of ideas right now.

     EDIT: Note, I tried enabling only glXMakeCurrent without rendering between the calls and the leaking is the same. So rendering is not the culprit.
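     For context, a sketch of how the render thread's own Display connection is meant here (an assumption of the setup with placeholder names, not the engine's actual code; error handling omitted):

       // Sketch: the render thread opens its own X connection so it never
       // shares Xlib state with the UI thread's connection.
       #include <GL/glx.h>

       GLXContext CreateRenderThreadContext( Display **outDpy, XVisualInfo **outVis ){
           static int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, None };
           Display *dpy = XOpenDisplay( NULL ); // own connection, render thread only
           XVisualInfo *vis = glXChooseVisual( dpy, DefaultScreen( dpy ), attribs );
           GLXContext ctx = glXCreateContext( dpy, vis, NULL, True );
           *outDpy = dpy;
           *outVis = vis;
           return ctx;
       }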
  6. That seems to work. At least for the error cases I had, it didn't score incorrectly anymore.
  7. This one seems interesting. I need to render XYZ points into a 2D texture where each pixel represents a result. The data comes in as GL_POINTS and is multiplied by a geometry shader across 6 pixels. Blending is GL_ONE/GL_ONE, so results hitting the same pixels are summed up. Now the problem is that the result is incorrect. Basically the following happens:

       layout( points ) in;
       layout( points, max_vertices=6 ) out;

       ivec3 tc1U = inPoint % ivec3( pOutputWidth ); // ivec3 inPoint
       ivec3 tc1V = inPoint / ivec3( pOutputWidth );

       vTC1 = vec2( tc1U.x, tc1V.x ) * pTCTransform.xy + pTCTransform.zw;
       // and so forth, 6 times

     pTCTransform is (2/outputWidth, 2/outputHeight, -1, -1), so it maps pixel indices in the range (0,0)-(outputWidth,outputHeight) to (-1,-1)-(1,1). In one particular case I have outputSize=(256,37). Some rows have the correct result (compared to doing the calculation on the CPU) while other rows are incorrect (like 1 row correct, 2 rows incorrect, 2 rows correct, and so forth). With some other outputHeight values it works correctly, with others again not.

     Is OpenGL point rendering not pixel-precise? If so, how can you do pixel-precise rendering (hence render with one primitive to exactly one point at a predefined location (x,y) in pixels)?
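     One plausible cause (an assumption, not confirmed in the thread): the transform above places points on pixel corners rather than pixel centers, so rasterization can round either way depending on the row. Folding a half-pixel offset into pTCTransform targets pixel centers exactly; a sketch with hypothetical variable names:

       // Sketch: build pTCTransform so a point lands on the pixel CENTER,
       // i.e. ndc = 2*(x + 0.5)/width - 1 instead of ndc = 2*x/width - 1.
       // Folding the +0.5 into the offset gives zw = 1/size - 1.
       void SetTCTransform( GLint location, float outputWidth, float outputHeight ){
           const float tcTransform[ 4 ] = {
               2.0f / outputWidth, // xy: same scale as before
               2.0f / outputHeight,
               1.0f / outputWidth - 1.0f, // zw: -1 plus half-pixel offset
               1.0f / outputHeight - 1.0f
           };
           glUniform4fv( location, 1, &tcTransform[ 0 ] );
       }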
  8. Interesting points. Just to clarify: the update has to be done "intra-frame", so frame-spanning hacks are not possible at all. So the question focuses on the direct calculate-then-render scenario. Everything else is too rigid and breaks in anything except lab conditions.
  9. Let's say you have some mesh on whose vertices you want to do skinning calculations. Basically you have position data, weight matrices and indices telling you which matrix to use. You now have two possible paths to work with:

     1) Use transform feedback. You use a TBO for the weight matrices, an input VBO with the positions and an output VBO where the transformed positions go.

     2) Use an OpenCL kernel with two input data arrays, one for the weight matrices and the other for the positions, and an output data array for the transformed positions. If possible, feed the data directly into a connected OpenGL buffer; otherwise do some data transfer somehow.

     As I see it, (1) has the advantage of sending the data directly to the VBO where you want it to be, but it requires uploading the weight matrices to the TBO and setting up and running transform feedback (a sketch of this path follows below).

     For (2) there is the advantage of not having to mess with OpenGL state to do transform feedback in a safe way, and the data copy of the weight matrices might be faster (not sure about that one; I'm not proficient with OpenCL right now). The disadvantage would be that you need to get the result data back to OpenGL. I've seen extensions that allow feeding the data directly into an OpenGL buffer object, so this disadvantage might be nullified?

     What do you think, which would be faster in general?
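     A minimal sketch of path (1), assuming the program was linked with glTransformFeedbackVaryings set up to capture the transformed position, and that the weight matrices sit in a buffer texture (all names are placeholders):

       // Sketch: transform-feedback skinning pass, one point per vertex.
       glUseProgram( skinProgram );
       glActiveTexture( GL_TEXTURE0 );
       glBindTexture( GL_TEXTURE_BUFFER, weightMatrixTexture ); // TBO with matrices
       glBindBufferBase( GL_TRANSFORM_FEEDBACK_BUFFER, 0, outputVBO );
       glEnable( GL_RASTERIZER_DISCARD ); // no fragments needed
       glBeginTransformFeedback( GL_POINTS );
       glDrawArrays( GL_POINTS, 0, vertexCount ); // input positions bound via VAO
       glEndTransformFeedback();
       glDisable( GL_RASTERIZER_DISCARD );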
  10. Sorry for not replying sooner, but the forum failed to send me a notification about your post, so I assumed nobody had answered until I decided to check back in case the forum software is broken (as it is).

      So to your post... I could be wrong, but is this not the same as my version, just written in a non-unrolled form? Maybe I'm missing something, but it looks similar to me.
  11. For animation purposes I move between matrices and quaternions in different places, and for this I use the trace method found on the internet:

        const double trace = a11 + a22 + a33 + 1.0;
        if( trace > 0.0001 ){
            const double s = 0.5 / sqrt( trace );
            return decQuaternion( ( a32 - a23 ) * s, ( a13 - a31 ) * s, ( a21 - a12 ) * s, 0.25 / s );
        }else if( a11 > a22 && a11 > a33 ){
            const double s = 2.0 * sqrt( 1.0 + a11 - a22 - a33 );
            return decQuaternion( 0.25 * s, ( a12 + a21 ) / s, ( a13 + a31 ) / s, ( a23 - a32 ) / s );
        }else if( a22 > a33 ){
            const double s = 2.0 * sqrt( 1.0 + a22 - a11 - a33 );
            return decQuaternion( ( a12 + a21 ) / s, 0.25 * s, ( a23 + a32 ) / s, ( a13 - a31 ) / s );
        }else{
            const double s = 2.0 * sqrt( 1.0 + a33 - a11 - a22 );
            return decQuaternion( ( a13 + a31 ) / s, ( a23 + a32 ) / s, 0.25 * s, ( a12 - a21 ) / s );
        }

      The matrix is in row-major order and quaternions are in the (x,y,z,w) format.

      If I do, for example, a small sweep (from [0,175°,0] to [0,185°,0]) across the XZ plane (hence with the Y axis fixed to [0,1,0], where I'm using the DX coordinate system) around the backwards pole (0,180°,0), I end up with a slight twitching of the camera near the [0,180°,0] point. I tracked it down to the calculated quaternion being slightly off near the point where any case other than the first if-case is used. Raising the threshold value to 0.0001 did help in some cases, but not in this one. I even went all the way up to 0.01, in which case the slight twitching just moved a bit further away from the problem point.

      I also do not think the quaternion-to-matrix conversion is the culprit, since that code does not use any if-cases and thus should be stable. Furthermore, tweaking the above-mentioned threshold does modify the error behavior, so it has to be this code causing trouble. I can cheat around the problem for the time being, but I'm looking for a proper solution.

      So my question is: what other possibility is there to calculate a quaternion from a matrix that is stable? Is there a known problem with the trace-based method used here that makes it fail around the backwards point? I'm concerned more about an error-free solution than the fastest solution on earth.
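      One commonly suggested alternative (a sketch under the same row-major, (x,y,z,w) conventions as above, reusing decQuaternion; not tested here): compute all four component magnitudes directly from the diagonal and pick the signs with copysign, so there is no branch boundary to cross during a sweep:

        // Sketch: branch-free matrix-to-quaternion conversion. Magnitudes come
        // from the diagonal (for a rotation matrix, 1 + a11 - a22 - a33 equals
        // 4*x*x, and so on); signs are copied from the same off-diagonal
        // differences the trace method uses. w is kept non-negative.
        #include <algorithm>
        #include <cmath>

        decQuaternion quaternionFromRotationMatrix(
        double a11, double a12, double a13,
        double a21, double a22, double a23,
        double a31, double a32, double a33 ){
            const double x = 0.5 * sqrt( std::max( 0.0, 1.0 + a11 - a22 - a33 ) );
            const double y = 0.5 * sqrt( std::max( 0.0, 1.0 - a11 + a22 - a33 ) );
            const double z = 0.5 * sqrt( std::max( 0.0, 1.0 - a11 - a22 + a33 ) );
            const double w = 0.5 * sqrt( std::max( 0.0, 1.0 + a11 + a22 + a33 ) );
            return decQuaternion(
                std::copysign( x, a32 - a23 ),
                std::copysign( y, a13 - a31 ),
                std::copysign( z, a21 - a12 ),
                w );
        }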
  12. Today I made a strange observation doing some timing tests with glFinish, checking how long it takes the hardware to clear a depth+stencil buffer and a color buffer. From various places we know that depth+stencil has things like Hi-Z and Z-compression going on, which are supposed to speed up rendering. So depth+stencil should be clearable by simply setting flags of tiles to "cleared", which should be faster than clearing all pixels as in the color buffer case. But the numbers look way different. This is what I got:

        Clear depth+stencil buffer (32-bit): 800 µs
        Clear 1 color buffer (64-bit, RGBA16F): 150 µs

      As you can see, clearing a floating-point color buffer is more than 4 times faster than clearing depth+stencil. So I'm wondering how depth+stencil clearing can be sped up. Any ideas? Here is what I use for the test case:

        glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );
        glDepthMask( GL_TRUE );
        glClearBufferfi( GL_DEPTH_STENCIL, 0, 1.0f, 0 );
        glClearBufferfv( GL_COLOR, 0, &clearColor[ 0 ] );

      Timing is done over the individual glClearBuffer* calls, each accompanied by a glFinish, to get the total time required for a full clear. How can clearing the depth+stencil buffer be over 4 times slower than clearing a color buffer?
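      For completeness, the timing method as a sketch (the helper name is made up; CLOCK_MONOTONIC via clock_gettime is one way to get microsecond timestamps):

        #include <time.h>

        // Microsecond timestamp helper (hypothetical name).
        static double nowUs(){
            struct timespec ts;
            clock_gettime( CLOCK_MONOTONIC, &ts );
            return ts.tv_sec * 1e6 + ts.tv_nsec * 1e-3;
        }

        double timeDepthStencilClearUs(){
            glFinish(); // drain all pending work first
            const double start = nowUs();
            glClearBufferfi( GL_DEPTH_STENCIL, 0, 1.0f, 0 );
            glFinish(); // block until the clear actually completed
            return nowUs() - start;
        }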
  13. I used VS position reconstruction too. It is supposed to be the same quality as a full-blown position buffer; this is why it is called reconstruction.

      FYI: I've just tried Call of Juarez: Gunslinger, and they use SSR, and it is much worse than your or my version...

      The problem is the lack of precision. Depth is calculated using a perspective division, and most pixels on screen are not close to the camera, so their depth value is somewhere above 0.9 and quickly approaching 1. The range of z-values mapping to the same depth value gets large quickly. With your reconstruction you obtain a sort of middle z-value inside that range. Comparing this with the test ray doesn't do precision any good. So for most pixels on screen the depth difference is small although the z-difference is huge. Combine this with stepping (reflectionDir / stepCount), pair it up with 32-bit floats in shaders and their 6-7 digits of precision, and you are up against a precision problem.
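      To put a rough number on that (a sketch with arbitrarily chosen near/far planes, assuming standard perspective depth):

        // Sketch: eye-space z recovered from window-space depth d in [0,1]
        // is z = n*f / (f - d*(f - n)). With n = 0.1 and f = 1000:
        #include <cstdio>

        int main(){
            const double n = 0.1, f = 1000.0;
            const double samples[] = { 0.999, 0.9991 };
            for( const double d : samples ){
                std::printf( "d = %.4f  ->  z = %.1f\n", d, n * f / ( f - d * ( f - n ) ) );
            }
            // prints roughly z = 90.9 and z = 100.0: a depth change of just
            // 0.0001 spans about 9 meters of eye-space distance.
            return 0;
        }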
  14. I did some experimenting now, combining the broad-phase stepping from my screen-space version with the narrow-phase stepping from the view-space version. The result is better in the narrow-phase but still not as clean as in the screen-space version. I think, though, that this problem is due to me currently using depth reconstruction, as I haven't yet switched back to having a full RGB16F position texture in the G-buffer. Far away, the differences in the depth value are so small that stepping fails to be accurate in the narrow-phase, while in the view-space version the z-difference is precise enough. I'm going to change this once I have the position texture back in the G-buffer.

      So right now I would say my screen-space version still wins in terms of overall quality, once I have fixed the narrow-phase to use z instead of depth.
  15. I gave implementing it a try, but the view-space version is even worse than the screen-space version as far as the broad-phase goes. This I did expect, since I looked into screen-space to counter exactly this problem. The narrow-phase, though, I had to adjust, and that one works better. The coverage calculation, though, is totally horrible and results in punctured geometry worse than before. The marked areas show this problem well. [attachment=17150:test2.jpg]

      So I guess the best solution is screen-space but with a modified narrow-phase calculation. Let's see if this works out.

      Astonishingly, with the modified narrow-phase in the view-space version, coverage actually fades out if samples end up outside the image. If just the punctured pattern would go away, it would be near optimal given the unstable nature of the SSR algorithm to begin with.