MysteryX

Weird Performance Bottleneck



In my video processing library, I assumed the performance bottleneck was the memory transfer from the GPU back to the CPU. But if I comment out GetRenderTargetData, which copies the result back to the CPU after the chain of HLSL shaders has run, I still get low performance.
 
If I don't transfer the data back, the design should be no different from rendering for display.
 
I'm running a simple script with no data transfer back. With 8 threads I get 35.5 fps; with 1 thread, 5.2 fps. The CPU is barely utilized at all.
 
Basically I call this function for every shader in the processing chain, passing the output of one pass as the input of the next (roughly what the sketch below does). There isn't much overhead besides that.
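Since the original function isn't shown here, a minimal sketch of what one such D3D9 pass might look like (RunShaderPass, the XYZRHW quad, and the parameter names are illustrative, not the post's actual code):

#include <d3d9.h>

struct QuadVertex { float x, y, z, rhw, u, v; };

// Runs one pixel-shader pass: samples the previous pass's output texture
// and renders a full-screen quad into this pass's render target.
static HRESULT RunShaderPass(IDirect3DDevice9* device,
                             IDirect3DPixelShader9* shader,
                             IDirect3DTexture9* input,        // output of the previous pass
                             IDirect3DSurface9* outputTarget, // render target of this pass
                             UINT width, UINT height)
{
    // Pre-transformed (XYZRHW) full-screen quad; the -0.5 offset aligns
    // texels to pixels under D3D9's sampling rules.
    const QuadVertex quad[4] = {
        { -0.5f,         -0.5f,          0.0f, 1.0f, 0.0f, 0.0f },
        { width - 0.5f,  -0.5f,          0.0f, 1.0f, 1.0f, 0.0f },
        { -0.5f,         height - 0.5f,  0.0f, 1.0f, 0.0f, 1.0f },
        { width - 0.5f,  height - 0.5f,  0.0f, 1.0f, 1.0f, 1.0f },
    };

    HRESULT hr = device->SetRenderTarget(0, outputTarget);
    if (FAILED(hr)) return hr;
    device->SetTexture(0, input);
    device->SetPixelShader(shader);
    device->SetFVF(D3DFVF_XYZRHW | D3DFVF_TEX1);
    device->BeginScene();
    hr = device->DrawPrimitiveUP(D3DPT_TRIANGLESTRIP, 2, quad, sizeof(QuadVertex));
    device->EndScene();
    return hr;
}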
 
A few other things that are weird with my design:
- Very HIGH memory usage under x64 (in one case I saw memory usage almost as high in x86 too, depending on the setup)
- Sometimes the performance gets very low for no reason (thread synchronization issue?)
- It runs *faster* on the Intel HD 4000 than on the Radeon 7670M!?
Perhaps these issues are all related to one fundamental design flaw.
 
If I don't transfer the data back to the CPU, there's no reason I should get lower performance than media players that do the same kind of image refinement for display.
 
Thanks if you can bring any insight!
 
 
Edit: OK. I commented out all the code that does memory transfers, plus all CPU-intensive code. All that is left is the bare minimum: no pixel shader is assigned, no textures are bound to the scene, and I don't read back the result. I still get 6.2 fps with 1 thread at 2% CPU usage!??
 
The only line that appears to make a difference is m_pDevice->Present; commenting it out takes performance up to... 3058 fps!
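For context, after all the stripping the per-frame work is essentially just this (a sketch; m_pDevice follows the naming in my code):

// The stripped-down frame: no pixel shader, no textures, no readback.
// Commenting out Present is what takes it from ~6 fps to 3058 fps.
m_pDevice->Clear(0, NULL, D3DCLEAR_TARGET, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);
m_pDevice->BeginScene();
m_pDevice->EndScene();
m_pDevice->Present(NULL, NULL, NULL, NULL); // the only line that matters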
 
I'm missing something very fundamental here...
 
Any idea?
 
AAAAAHHHHH!!!!! 
 
One flag. I have to replace D3DPRESENT_INTERVAL_DEFAULT with D3DPRESENT_INTERVAL_IMMEDIATE!!!
 
Now I get 411 fps on a small image and it's limited by CPU usage. One thread gives me 135 fps!
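For anyone hitting the same wall, the change is in the present parameters at device creation (a sketch of hypothetical setup code, not my exact init; only PresentationInterval is the point):

#include <d3d9.h>

// Hypothetical device setup; pD3D and hWnd come from the usual D3D9/Win32 init.
HRESULT CreateDeviceNoVsync(IDirect3D9* pD3D, HWND hWnd, IDirect3DDevice9** ppDevice)
{
    D3DPRESENT_PARAMETERS pp = {};
    pp.Windowed = TRUE;
    pp.SwapEffect = D3DSWAPEFFECT_DISCARD;
    pp.BackBufferFormat = D3DFMT_UNKNOWN; // use the current display format
    // D3DPRESENT_INTERVAL_DEFAULT waits for the vertical retrace, so Present()
    // blocks and caps the frame rate at the monitor refresh rate.
    // D3DPRESENT_INTERVAL_IMMEDIATE presents without waiting.
    pp.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;

    return pD3D->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
                              D3DCREATE_HARDWARE_VERTEXPROCESSING,
                              &pp, ppDevice);
}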
Edited by MysteryX

What you did was disable vsync. If you aim for a stable 60 fps, you should be able to render a frame within 16.67 ms (1/60 of a second), both with and without vsync. Can you measure your frame time (with your current setting, vsync disabled)? Then you'll know for sure what you're dealing with.
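For example, a minimal way to measure it with QueryPerformanceCounter (a sketch, assuming a standard Win32 render loop; FrameTimeMs is just an illustrative name):

#include <windows.h>

// Call once per frame, e.g. right after Present(); returns the time since
// the previous call in milliseconds.
double FrameTimeMs()
{
    static LARGE_INTEGER freq = {};
    static LARGE_INTEGER prev = {};
    LARGE_INTEGER now;
    if (freq.QuadPart == 0) {          // first call: initialize
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&prev);
    }
    QueryPerformanceCounter(&now);
    double ms = 1000.0 * (double)(now.QuadPart - prev.QuadPart) / (double)freq.QuadPart;
    prev = now;
    return ms;
}

At a steady 60 fps with vsync you should see roughly 16.67 ms per frame; with vsync off, the number tells you how much headroom you actually have.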
