> On Optimus systems, the Intel card is always the one hooked to the monitor.
> If you call get() query family for getting timestamp information, you're forcing the NV driver to ask Intel drivers to get information (i.e. have we presented yet?).
Even if it askes the Intel driver, it doesnt seem to slow it down, considering that I submit hundreds of queries per frame. And why should it delay the rendering ?
Because the NV driver needs to wait on the Intel driver. In a perfect world the NV driver would be asynchronous if you're asynchronous as well. But in the real world, where Optimus devices can't get VSync straight, I wouldn't count on that. At all. There's several layers of interface communication (NV stack, DirectX, WDDM, and Intel stack) and a roundtrip is going to be a deep hell where likely a synchronous path is the only one being implemented.
> Edit: Also if it's Windows 7, try disabling Aero. Also check with GPUView (also check this out).
It is not a matter of performance, but a matter of delay and synchronisation. All rendring on the GPU falls more or less exactly in the SwapBuffers call, as if the driver waits too long before submitting the command queue and get eventually forced by calling SwapBuffers.
Aero adds an extra layer of latency. Anyway, not the problem since you're in Windows 10.
I have Windows 10 btw. I looked into GPUView, but it seems to track only DirectX events.
GPUView tracks GPU events such as DMA transfers, page commits, memory eviction, screen presentation. All of which is API agnostic and thus works with DirectX and OpenGL (I've successfully used GPUView with OpenGL apps). IIRC it also supports some DX-only events, but that's not really relevant for an OGL app.