Weird Performance Bottleneck

In my video processing library, I assumed my performance bottleneck was the memory transfer from the GPU back to the CPU. However, even if I comment out GetRenderTargetData, which copies the result from the GPU to the CPU after the chain of HLSL shaders has run, I still get low performance.
Without that readback, my design shouldn't be any different from rendering the same chain for display.
I'm running a simple script with no readback. With 8 threads I get 35.5 fps; with 1 thread I get 5.2 fps. The CPU is barely utilized.
Basically, I call my processing function for every shader in the chain, feeding the output of one shader in as the input of the next. There isn't much overhead besides that.
A few other odd things about my design:
- Very HIGH memory usage under x64 (depending on the setup, I once saw almost as much under x86 too)
- Sometimes performance drops sharply for no apparent reason (a thread synchronization issue?)
- It runs *faster* on the Intel HD 4000 than on the Radeon 7670M!?
Perhaps all of these issues stem from one fundamental design flaw.
If I don't transfer the data back to the CPU, there's no reason I should get lower performance than media players that refine images the same way for display.
Thanks for any insight you can offer!
Edit: OK, I commented out all the code that does memory transfers and all the CPU-intensive code, leaving only the bare minimum. No PixelShader is assigned, no textures are assigned to the scene, and I don't read back the result. I still get 6.2 fps with 1 thread at 2% CPU usage!??
The only line that makes a difference is m_pDevice->Present: when I comment it out, performance goes up to... 3058 fps!
I'm missing something very fundamental here...
Any ideas?
Now I get 411 fps on a small image, and it's limited by CPU usage. One thread gives me 135 fps!
Edited by MysteryX

What you did was disable vsync. If you aim for a stable 60 fps, you should be able to render a frame within 16.67 ms (1/60 s), with or without vsync. Can you measure your frame time (with your current setting, vsync disabled)? Then you'll know for sure what you're dealing with.
