[Solved] NV Optimus notebook spending too much time in the hardware copy queue?

Started by
5 comments, last by alek314 7 years, 10 months ago

Our game has a huge frame rate drop on an Optimus notebook equipped with an NVIDIA GeForce GTX 960M.

It turns out the game spends a large amount of time in the copy queue, and there are many "signal command packet" and "wait command packet" entries.

It looks like they block the render thread until the copy finishes.

Any idea why this happens? Thank you.

By the way, I also checked League of Legends on this machine; it shows "signal command packet" entries as well, but no sign of "wait".

[Update]

After twiddling with the NVIDIA Control Panel, the frame rate doubled. There are still present packets in the hardware copy queue, but they take significantly less time than in previous runs.

I am still not sure which option did the magic. Anyway, it works for now. Thanks everyone for your suggestions.


You could be using more GPU RAM than the GPU actually has available, which will cause D3D to constantly move textures/etc in and out of GPU RAM for you, possibly multiple times a frame. Try reducing your texture resolution drastically and see if the problem goes away.
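To get a feel for whether you're overcommitting VRAM, you can budget it by summing the mip chains of your largest textures. Here's a minimal sketch (a hypothetical helper, ignoring the padding and alignment real drivers add) that also shows why halving texture resolution cuts memory use by roughly 4x:

```cpp
#include <cstddef>
#include <algorithm>

// Rough VRAM footprint of one mip-mapped texture: sum the bytes of
// every mip level down to 1x1. Budgeting estimate only -- ignores
// driver-side alignment and padding.
std::size_t mipChainBytes(std::size_t width, std::size_t height,
                          std::size_t bytesPerTexel)
{
    std::size_t total = 0;
    for (;;) {
        total += width * height * bytesPerTexel;
        if (width == 1 && height == 1)
            break;
        width  = std::max<std::size_t>(1, width  / 2);
        height = std::max<std::size_t>(1, height / 2);
    }
    return total;
}
```

For example, a 1024x1024 RGBA8 texture with full mips comes to about 5.3 MB, while the 512x512 version is about 1.3 MB, so a few hundred high-resolution textures can easily exceed a laptop GPU's memory and trigger exactly this kind of constant paging.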

This is basically how Optimus works (simplified version): all rendering commands are executed on the NVIDIA GPU, and once the scene is complete the framebuffer is transferred to the Intel GPU for display. If you look at the Optimus white paper, section headed "Optimus Copy Engine" (page 18), you'll get further info.

The theory is that although transferring the framebuffer takes extra time, the performance gained by running the rendering commands on the NVIDIA GPU outweighs that overhead, and the net result is higher overall performance.
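For a sense of the scale involved, here's a back-of-envelope sketch of the per-frame copy cost. The bus bandwidth figure is an illustrative assumption, not a measurement of any particular laptop:

```cpp
// Back-of-envelope cost of copying one framebuffer across the bus.
// Returns milliseconds per frame for a given (assumed) effective
// bandwidth in bytes per second.
double copyMillisecondsPerFrame(int width, int height, int bytesPerPixel,
                                double busBytesPerSecond)
{
    double frameBytes =
        static_cast<double>(width) * height * bytesPerPixel;
    return frameBytes / busBytesPerSecond * 1000.0;
}
```

At an assumed ~8 GB/s of effective bandwidth, copying a 1920x1080 32-bit framebuffer costs roughly 1 ms per frame, which is normally small next to what's gained by rendering on the faster GPU; if the copy queue is dominating your frame, something more than the plain framebuffer transfer is going on.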

In practice, however, Intel GPUs aren't actually useless any more, and for certain workloads just using the Intel GPU on its own can easily outperform the full NVIDIA/Intel combination. Since you're measuring the copy engine as such a dominant factor, I'd guess your game has such a workload, and I'd suggest benchmarking on the Intel GPU alone to verify whether this is indeed the case.

Even if this is the case, you may also get improved performance simply by updating or reinstalling the NVIDIA driver. Optimus seems fussy about this; anecdotally, I got a significant performance increase on an Optimus laptop by doing exactly that. I haven't analysed it in enough detail to tell whether it's related to the order in which the NVIDIA and Intel drivers were installed, to other system software installed after the original NVIDIA driver causing trouble, or to something else.

Direct3D would want instancing, but we don't use it. We do have plenty of glVertexAttrib calls, though.

By the way, if you're reading from the framebuffer, that would totally explain it (e.g. for postprocessing, or worse, reading it back on the CPU).
Treat the backbuffer as write-only.
I just realised: are you clearing the colour, depth and stencil buffers every frame (at least the ones linked to the swap chain)?
If you're not, you're creating inter-frame dependencies that could also explain this behaviour.

To the OP: have you figured it out yet? If so, can you post it here? I'm curious.

-potential energy is easily made kinetic-

Still investigating this issue.

We clear colour, depth and stencil every frame, but in two calls: one clears colour, the other clears depth and stencil.

What is strange is that other games' present packets are in the 3D queue, but ours are in the copy queue.

And our game runs faster on a notebook with an NV 635M than on the brand-new NV 960M, using the same configuration. :blink:

This topic is closed to new replies.
