

#5184972 What do with compute shaders?!

Posted by Aks9 on 04 October 2014 - 10:34 AM


Aks9 - Geometry shaders are useful too! No need to skip them. Just to know where to use them. 



I know. ;)

I'm sorry if my previous post caused confusion. I skipped them in the previous count because of general performance, not because of functionality.

#5184965 Knowing my REAL OpenGL version - RESOLVED

Posted by Aks9 on 04 October 2014 - 09:48 AM

"Father, forgive them, for they do not know what they are doing."


That is bad advice, aks9. There's nothing to learn when it comes to context creation, other than what a nightmare it can be if you have older (or buggy) drivers. 



This is a typical agnostic claim. Everything is a source of knowledge. Rendering context creation is the first thing one should learn when starting with computer graphics.

But OK, I don't have time or will to argue about that.


Could you post a link, an example, or whatever to illustrate "the nightmare"? I've been creating GL contexts myself for about 18 years and never had a problem. Problems can arise if you create a GL 3.0+ context and expect everyone to support it. Well, that is not a problem of the drivers; older drivers cannot anticipate what might happen in the future.


If the drivers are buggy, there is no workaround for the problem! 


I just look at the whole thing as risky, since if you take that code with you to other projects, one day one of the people trying the game/program out will simply not be able to run it because their driver requires a workaround.


I really don't understand this. What kind of workaround? The way a GL context is created is defined by the specification. Why is it risky?




You can do the same exact things with SDL or any other library, except you will have less grief during the process. Most of the time, anyways.



I want to have control in my hands, so no intermediary layers are welcome. It is harder at the start, but the feeling of freedom is priceless.




This link is totally out of context. The guy is frustrated by something, but gives no arguments for his claims.

Regarding platform-specific APIs for OpenGL, there was an initiative to unify them: Khronos started development of EGL, but it has not been adopted for desktop OpenGL yet.




EDIT: I downvoted you aks9, but I can't undo it. :( Your post is helpful, so I'm sorry.


Don't be sorry. That was your opinion, and you have the right to express it through (down)voting. Points really mean nothing to me.

Forums should be a way to share knowledge and opinions. Some of them are true, some not. I hope the right advice still prevails, for the benefit of the users.

#5184926 Knowing my REAL OpenGL version - RESOLVED

Posted by Aks9 on 04 October 2014 - 05:40 AM


I'm going to give SDL a shot and see if it fixed my problems! 



I'm horrified by suggestions to use any library/wrapper for OpenGL. :(

It is not easy, but it is always better to understand what's happening under the hood than to be helplessly dependent on others.

Despite its imperfections, OpenGL is still the best 3D graphics API for me, since I can do whatever I want with nothing installed but the latest drivers.

Of course, I also need a development environment (read: Visual Studio). Nobody wants to code in Notepad and compile at the command prompt.


Let's get back to your problem. Before going any further, revise your pixel format. It is not correct. The consequence is that HW acceleration is turned off and you fall back to OpenGL 1.1.

The following code snippet shows how to create a valid GL context:

    PIXELFORMATDESCRIPTOR pfd;
    memset(&pfd, 0, sizeof(PIXELFORMATDESCRIPTOR));
    pfd.nSize      = sizeof(PIXELFORMATDESCRIPTOR);
    pfd.nVersion   = 1;
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cDepthBits = 24;
    pfd.iLayerType = PFD_MAIN_PLANE;

    int nPixelFormat = ChoosePixelFormat(hDC, &pfd);
    if (nPixelFormat == 0)
    {
        strcat_s(m_sErrorLog, LOGSIZE, "ChoosePixelFormat failed.\n");
        return false;
    }

    if (!SetPixelFormat(hDC, nPixelFormat, &pfd))
    {
        strcat_s(m_sErrorLog, LOGSIZE, "SetPixelFormat failed.\n");
        return false;
    }

    // A legacy context must exist and be current before wglGetProcAddress can be used
    HGLRC tempContext = wglCreateContext(hDC);
    wglMakeCurrent(hDC, tempContext);

    int attribs[] =
    {
        WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
        WGL_CONTEXT_MINOR_VERSION_ARB, 3,
        WGL_CONTEXT_FLAGS_ARB, WGL_CONTEXT_DEBUG_BIT_ARB, // I suggest using a debug context in order to know what's really happening and easily catch bugs
        0
    };

    PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB =
        (PFNWGLCREATECONTEXTATTRIBSARBPROC)wglGetProcAddress("wglCreateContextAttribsARB");
    if (wglCreateContextAttribsARB != NULL)
    {
        HGLRC context = wglCreateContextAttribsARB(hDC, 0, attribs);
        wglMakeCurrent(hDC, context);
        wglDeleteContext(tempContext);
    }

#5161898 why the gl_ClipDistance[] doesn't work?

Posted by Aks9 on 21 June 2014 - 06:41 AM

Clipping works perfectly, but you have to enable it. ;)


By setting glEnable(GL_CLIP_DISTANCE0+2), you have enabled gl_ClipDistance[2], not 0 and 1.

Also, I'm not sure whether your math is correct or not. It depends on your algorithm.


Just so you know: if gl_ClipDistance[x] > 0.0, the vertex is visible; if it is <= 0.0, it is clipped.
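
The sign convention can be sanity-checked on the CPU: gl_ClipDistance is just a signed plane distance, typically dot(plane, position) computed in the vertex shader. A minimal C sketch of that arithmetic (the plane and points are made-up values, and `clip_distance` is a hypothetical helper, not a GL call):

```c
#include <assert.h>

/* Signed distance of a point (x, y, z, 1) from a plane (a, b, c, d):
 * the value you would write into gl_ClipDistance[i] in a vertex shader. */
static float clip_distance(const float plane[4], const float pos[3])
{
    return plane[0] * pos[0] + plane[1] * pos[1]
         + plane[2] * pos[2] + plane[3];
}
```

A positive result means the vertex survives; zero or negative means it is clipped away.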


#5155016 Are two OpenGL contexts still necessary for concurrent copy and render?

Posted by Aks9 on 21 May 2014 - 03:32 AM


OK! My apologies!

But you are wrong about my ego. As you can see, it is not so large. :)

I didn't notice the time of the post in this forum. I regularly check the OpenGL forums, and I noticed this thread only after several of my answers on another one, which made me think you were not satisfied with them. That was my mistake.

Btw, you didn't acknowledge my posts, which also made me think you were not satisfied and were searching for other opinions. That's why I overreacted. Mea culpa! (I also attended gymnasium and had Latin, so I hope we can understand each other perfectly. ;) )

#5154244 Are two OpenGL contexts still necessary for concurrent copy and render?

Posted by Aks9 on 17 May 2014 - 06:01 AM

I can't quote a source right now and might be wrong on the exact hardware generation (though I believe it was in Cozzi and Riccio's book?). Basically, the thing is that pre-Kepler (or was it Fermi? I think it was Kepler) hardware has one dedicated DMA unit that runs independently of the stream processors, so it can do one DMA operation while it is rendering, without you doing anything special. However, only Quadro drivers actually use this feature, consumer-level drivers stall rendering while doing DMA. Kepler and later have 2 DMA units and could do DMA uploads and downloads in parallel while rendering, but again, only Quadro drivers use the hardware's full potential.

This drives me crazy, because I've already answered Prune's question on another forum.

Pre-Fermi GPUs do not allow overlapping rendering with data download/upload. Fermi was the first NV architecture where it is enabled. High-end Quadro cards have two separate DMA channels that can overlap, while on GeForce cards just one is present (or at least enabled). It is not clear whether two channels can transfer data in the same direction simultaneously (I guess not, but it would be quite reasonable). This is known as the (Dual) Copy Engine. Kepler has the same capability as Fermi regarding the way the copy engine works. Activating the copy engine is not free, so by default it is turned off. NV drivers use heuristics (there is no special command to turn it on) to activate the copy engine, and the trigger is a separate context doing only data transfer. That's why the second context is necessary.


Please correct me if I'm wrong.


This is probably my last post about the (Dual) Copy Engine, since I'm really tired of repeating the same thing.

#5154225 Are two OpenGL contexts still necessary for concurrent copy and render?

Posted by Aks9 on 17 May 2014 - 02:32 AM

Looking at http://on-demand.gputechconf.com/gtc/2012/presentations/S0356-Optimized-Texture-Transfers.pdf

Are the two contexts required? Will rendering not occur while the DMA transfer is proceeding, unless I do the upload in another thread, even with last-gen NVIDIA cards? If so, how does that make sense? It seems an artificial limitation, as the hardware obviously can handle it (even in the single copy engine consumer-level cards) if you have another thread.


(If it matters, I'm using persistently mapped PBOs.)

What's wrong with the answer I've already given to you?

Or you want to hear another formulation of the same answer?

"No, it is not necessary, but that is how NV drivers are designed so far and there is no other way to turn on copy engine".

#5148512 Custom view matrices reduces FPS phenomenally

Posted by Aks9 on 21 April 2014 - 04:42 AM

If you modify a model matrix, don't update your projection matrix just for shits and giggles, that's inefficient and a huge performance drop. Imagine how many times per second you are doing that!


Speaking of optimization: NV does not transfer uniform values if they have not changed. That is probably not the case for buffers. Of course, buffers are transferred to graphics card memory only right before they are actually used, so frequent changes before drawing should not affect performance significantly, especially because a uniform block is a small amount of data and the calls communicate only with the driver's memory space in main RAM.
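
The redundant-update filtering attributed to NV drivers above can be modeled as simple dirty tracking. This sketch is purely illustrative (the struct and function names are made up, not a driver API):

```c
#include <assert.h>

/* Hypothetical driver-side cache for one uniform: the upload is skipped
 * whenever the new value matches the cached one. */
typedef struct {
    float cached;   /* last value sent on */
    int   uploads;  /* how many real transfers happened */
} UniformSlot;

static void set_uniform(UniformSlot *u, float value)
{
    if (u->cached == value)
        return;           /* unchanged: no transfer at all */
    u->cached = value;
    u->uploads++;         /* changed: one cheap transfer to driver memory */
}
```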



Also, instead of working out the model view proj matrix on the CPU, do the multiplication in your shader. GPUs are far better at matrix multiplication in almost any situation.



I strongly disagree with this statement. Model/view/projection matrix calculation is far better done on the CPU side. For scientific visualization, where precision is important, a CPU (when I say this I mean Intel, because I'm not familiar with the AMD architecture) can generate matrices about 10 orders of magnitude more precise than a GPU (double precision carries roughly 16 significant decimal digits versus 7 for single). Transformations accumulate errors, and if double precision is not used the result cannot be accurate enough. Furthermore, transcendental functions are calculated only in single precision on the GPU. CUDA and similar APIs emulate double precision for such functions, but in OpenGL there is no emulation of transcendental functions. I agree that the hardware-implemented transcendental functions are enormously fast; no CPU can compete with GPUs in that field. Just a single clock cycle for a function call! Although the number of SFUs (as they are called) is lower than the number of SP units, the pipeline usually hides the latency of waiting for an SFU. But, as I already said, high accuracy cannot be achieved.

#5142044 Opengl design

Posted by Aks9 on 25 March 2014 - 12:17 PM

 If you want to do anything serious, forget about software right now.

This is very true, and I'm sorry I was not able to take part in this discussion earlier, since I think it went in the wrong direction.


Rendering time does matter! It matters a lot, so I have to disagree with most of the "facts" frob used to illustrate his opinion.


But if your 3D viewpoint includes offline processing, such as rendering that takes place in movies, print, other physical media, or scientific rendering, software rendering is a pretty good thing.


It is maybe good, but HW-accelerated is better. With legacy OpenGL it really was necessary to implement algorithms on the CPU side in order to have ray tracing and similar stuff. But now it is not. And if we can get several orders of magnitude of acceleration through GPU usage, I simply don't understand why anybody would defend slower solutions.


There are some cases where a CPU can beat a GPU at rendering: when cache coherence is very weak, or when different techniques compete for resources and communicate through a large number of small buffers that have to be synchronized. In most cases, though, beating a GPU like the GK110 (with 2880 cores, six 64-bit memory controllers, and GDDR5 memory) at graphics work (where parallelization can be massive) is almost impossible. And we are talking about orders of magnitude!



Think about the resolution we get out of modern graphics cards.


Monitors with DVI can get up to about 1920x1200 resolution. That's about 2 megapixels.  Most 4k screens get up to 8 megapixels. Compare it with photographers who complain about not being able to blow up their 24 megapixel images. In the physical film world, both 70mm and wide 110 are still popular when you are blowing things up to wall-size, either in print or in movies. The first is about 58 megapixel equivalent, the second about 72 megapixel equivalent. 


When you see an IMAX 3D movie, I can guarantee you they were not worried about how quickly their little video cards could max out on fill rate. They use an offline process that generates large high quality images very slowly.

What does the resolution matter? This is a very inappropriate example.

If a GPU can render a 2-megapixel scene in 16 ms, a 72-megapixel scene can be rendered in 576 ms. That's only about 0.6 s.

Using a CPU implementation (what we call "software"), it would take almost a minute.

Of course, it depends on the underlying hardware.
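
The arithmetic behind that estimate is plain linear fill-rate scaling. A throwaway sketch (assuming frame time scales linearly with pixel count, which only holds for fill-bound rendering):

```c
/* Linear fill-rate scaling: if `base_mpix` megapixels take `base_ms`
 * milliseconds, estimate the time for `target_mpix` megapixels. */
static double scaled_time_ms(double base_ms, double base_mpix, double target_mpix)
{
    return base_ms * (target_mpix / base_mpix);
}
```

With the numbers from the post, scaled_time_ms(16.0, 2.0, 72.0) gives 576 ms.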



In the film industry, I bet it is not irrelevant whether some post-production lasts several days or several months.

There are a lot of GPU-accelerated renderers for professional 3D applications, although they use CUDA (probably because it was easier to port to CUDA than to OpenGL, and because OpenGL lacks precision control and only relatively recently gained tessellation and compute support).


 There are companies today that sell even faster software rendering middleware for running modern games with real-time speeds for systems with under-featured graphics cards that cannot run modern shaders .

Can you give a useful link? How can a CPU be even near the speed of a GPU while also doing other tasks (AI, game logic, resource handling, etc.)? This is science fiction, or far slower than it would have to be to be useful. Honestly, who would buy 3D video cards if games could be played smoothly on the CPU alone?


You can definitely play games like Quake 1 - 3 on a software only renderer. With things like AVX2, DDR4 and cpus with 8+ cores, the performance disparity between software rendering and "hardware" rendering decreases significantly.

I really doubt it. Any useful link to support this claim?

#5116448 OpenGl as a blitter

Posted by Aks9 on 12 December 2013 - 02:52 AM

Yes, glTexImage2D copies a pixel array into the video RAM or other memory controlled by the video driver.

Actually, that is only partially true.

Drivers really do copy the data to their own memory, but that is still system RAM. Only when the data is actually needed is it copied into GPU memory.

In short, glTexImage2D prepares the data to be sent to the GPU and tells GL that it should be sent, but the transfer occurs immediately before first use.

It is a kind of driver optimization. Neither glFlush nor glFinish will force the transfer if the data from the texture is not used.

#5116446 Drawing vertices without actual vertices?

Posted by Aks9 on 12 December 2013 - 02:42 AM

you will need at the very least 1 vertex and an index buffer with a length == to the number of vertices you want to generate. I could be wrong, but I'm almost positive that the vertex shader only runs once per vertex in the VBO.

No, you don't. It is perfectly valid to draw without a single attribute.




Fancy, though what is the use?


Rendering without attributes (not without vertices, since that is impossible) is an awfully powerful rendering method. In the last 3 or 4 years I have drawn almost everything without a single attribute, from screen-aligned quads and 2D charts to some very complex geometries. For simple shapes no index buffer is required, but complex ones need a "scheme" for "vertex distribution". For my purposes, attributeless rendering is more powerful than tessellation shaders, since there is no limit on patch size.
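
A typical example is the screen-aligned quad: call glDrawArrays(GL_TRIANGLE_STRIP, 0, 4) with no attributes bound and derive each corner from gl_VertexID in the vertex shader, e.g. `vec2(gl_VertexID & 1, gl_VertexID >> 1) * 2.0 - 1.0`. The same mapping, modeled on the CPU so the index-to-corner logic can be checked (this mirrors one common formulation, not necessarily the author's own):

```c
/* CPU model of an attributeless vertex shader for a screen-aligned quad:
 *   pos = vec2(gl_VertexID & 1, gl_VertexID >> 1) * 2.0 - 1.0;
 * drawn as a 4-vertex triangle strip with no VBO bound. */
static void quad_corner(int vertex_id, float out_pos[2])
{
    out_pos[0] = (float)(vertex_id & 1)  * 2.0f - 1.0f;
    out_pos[1] = (float)(vertex_id >> 1) * 2.0f - 1.0f;
}
```

Vertex IDs 0..3 land on the four corners of the [-1, 1] clip-space square, which is exactly the triangle-strip order.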

#5111240 Getting order of attributes, as specified in shader file.

Posted by Aks9 on 22 November 2013 - 02:07 AM

Yes. By using layout location qualifier.
For example: layout (location = 0) in vec4 vPosition;

#5090585 Where is the Nvidia Opengl sdk?

Posted by Aks9 on 31 August 2013 - 08:04 AM

I have been waiting since January for the latest OpenGL sdk from Nvidia, do they ever plan to release it?

The NVIDIA OpenGL SDK was announced in April 2011. So, it's only two and a half years late. If I were you, I wouldn't wait for it. ;)

Probably the NV guys think OpenGL programmers are more skillful than D3D ones, and don't need (or deserve) programming examples.

I don't blame them, as long as the drivers are kept in shape.

On the other hand, things are happening more rapidly than in the D3D world, hence making up-to-date examples is not simple.

#5082700 GLSL double precision

Posted by Aks9 on 03 August 2013 - 02:42 AM

First, I have to pay tribute to Brother Bob, and to his endless patience in answering some trivial questions. Although I'm also involved in teaching, I'm far less tolerant of things that ought to be basic knowledge in computer science (like binary number representation, which is secondary-school material). My respect!

Let's get back to the topic... The performance drop caused by double precision is overestimated in this thread. Generally speaking, it is always better to use SP instead of DP when possible, but the impact of DP calculation can be well masked by compiler optimization. On the other hand, when we talk about GPUs, we are talking about very massive parallelism on an architecture with a very deep pipeline. If there are no stalls, no dependencies, and the compiler has done its job perfectly, throughput is one instruction per clock. Also, different GPUs have different architectures. For example, Fermi uses two SP units to perform a DP operation, while Kepler has separate DP units (their number varies from series to series). So it is very hard to predict performance.

The major problems with DP calculation are that it is not widely supported and that transcendental functions are always calculated in SP. The second problem is very serious, since even on mighty graphics cards GLSL cannot calculate trigonometry accurately. CUDA and OpenCL overcome the problem with software emulation, and the price is longer execution time. The HW implementation executes a transcendental function in a single clock (but only in single precision).

It is worth mentioning (since somebody asked whether SP is guaranteed) that all new graphics cards should be fully IEEE 754 compliant. I can firmly claim that for NV cards since the G80 (which means the last 7 years), but older cards are not, so be careful if you have to deal with them.

SP is not adequate for many purposes (a precision of just 6-7 decimal digits), but it is almost certainly enough for visualization. My recommendation is to perform all precision-critical calculation on the CPU in DP, but downcast the values to SP before sending them to the GPU. After tons of tricks and optimizations, I have succeeded in visualizing the Earth's surface with a precision of 1e-6 m using just SP calculation on the GPU. It is a very precise scientific visualization using the WGS84 ellipsoid. So, if it works for me for terrain at least three orders of magnitude larger than yours, there is no doubt it can work for you too.

#5064985 Programming in OpenGL 4 with a netbook, it is posible?

Posted by Aks9 on 26 May 2013 - 09:01 AM

At least under linux I can use GLSL 1.0 on my Atom N550 netbook (which has said GM3150). Performance is obviously questionable at best.

Wow! That's a miraculous achievement for the GM3150. :)

Let's remember: GL 2.0 requires GLSL 1.1 (or, to be more precise, 1.10.xx).

GLSL 1.0 is a predecessor of GLSL, exposed in the time of GL 1.4 to announce the arrival of a new era. Its interface to shaders is not the same as in GLSL 1.1 and its successors. That's what I meant when I said "ancient" interface.