
PIX optimizations?

Hello, my GPU1 usage is about 50%, GPU2 usage is 10%. Total CPU usage is 15%, no core goes over 30% usage.

 

However, if I draw more stuff my FPS goes down, even though usage does not go up!

 

Here is an image captured from PIX.

[attachment=17331:performance.png]

 

As you can see, in the beginning I copy data to one texture, then the CPU does work, then I copy a bunch of stuff to dynamic vertex buffers (first big chunk), then the CPU does some more work, and finally I draw everything (second big chunk).

 

Is it possible to increase performance? I'm thinking of packing the work closer together on the timeline.

 

Cheers!


You probably want to start with a CPU profiler like Very Sleepy to see where the slowness on the CPU side is. Make sure you profile using an optimized build.

 

You should also try to avoid unnecessary CPU<->GPU synchronization, such as reading back data on the CPU that the GPU has just written.
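
For what it's worth, the usual way to feed dynamic vertex buffers on D3D9 without stalling is to lock with D3DLOCK_DISCARD (or D3DLOCK_NOOVERWRITE when appending). A minimal sketch, assuming the buffer was created with D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY in D3DPOOL_DEFAULT; the function name is just for illustration, the point is the lock flag:

    #include <d3d9.h>
    #include <string.h>

    // Sketch: refill a dynamic vertex buffer without waiting for the GPU.
    void FillDynamicVB(IDirect3DVertexBuffer9* vb, const void* vertices, UINT bytes)
    {
        void* dst = NULL;

        // D3DLOCK_DISCARD asks the driver for a fresh region of memory,
        // so the CPU never blocks on the GPU still reading the old contents.
        if (SUCCEEDED(vb->Lock(0, bytes, &dst, D3DLOCK_DISCARD)))
        {
            memcpy(dst, vertices, bytes);
            vb->Unlock();
        }
    }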

Make sure you profile using an optimized build.

Is setting the build to Release enough to make it optimized? What optimizations can you recommend in Visual Studio 2010?

Yes, setting it to release mode is enough. The extra settings you can adjust on top of that usually don't have that much impact on performance.

 

I'd also recommend starting the program without debugging, so you don't get the overhead of the debug heap.
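
If you do have to launch under the debugger, you can also (as far as I know) turn the Windows debug heap off by setting an environment variable for the debuggee, e.g. under Project Properties > Debugging > Environment:

    _NO_DEBUG_HEAP=1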

Yes, setting it to release mode is enough. The extra settings you can adjust on top of that usually don't have that much impact on performance.

 

I'd also recommend starting the program without debugging, so you don't get the overhead of the debug heap.

Actually, there is one setting that will make a bit of a difference, and that is turning SSE2 on. Most (95%) of CPUs out there nowadays support SSE2, so turning it on should be fine.
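
In VS2010 that's under Project Properties > C/C++ > Code Generation > Enable Enhanced Instruction Set, or /arch:SSE2 on the command line. A sketch of the equivalent command line (file name is just an example; x64 builds always have SSE2 available, so the switch only matters for 32-bit builds):

    cl /O2 /arch:SSE2 main.cpp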

Do you know if standard D3DXMATRIX operations will take advantage of SSE2? Is it off by default?

Do you know if standard D3DXMATRIX operations will take advantage of SSE2? Is it off by default?

It doesn't; D3DXMATRIX is implemented as plain float arrays, not intrinsics. DirectXMath, or XNAMath on older SDKs (the June 2010 SDK is still XNAMath, I believe), checks whether the CPU target you are compiling for actually supports SSE instructions and, if it does, switches to intrinsics; if it doesn't, it falls back to non-intrinsic code.
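
So if you want SIMD-backed matrix math you'd go through the newer library rather than D3DX. A minimal DirectXMath sketch (XNAMath in the June 2010 SDK uses the same function names via xnamath.h, if I remember right; the function itself is hypothetical, just for illustration):

    #include <DirectXMath.h>
    using namespace DirectX;

    // Sketch: SIMD-backed matrix multiply via DirectXMath.
    void ConcatMatrices(const XMFLOAT4X4& worldStorage,
                        const XMFLOAT4X4& viewStorage,
                        XMFLOAT4X4& result)
    {
        // Load unaligned storage into SIMD registers (SSE on targets that support it),
        // do the math there, then store the result back out.
        XMMATRIX world = XMLoadFloat4x4(&worldStorage);
        XMMATRIX view  = XMLoadFloat4x4(&viewStorage);
        XMStoreFloat4x4(&result, XMMatrixMultiply(world, view));
    }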

Edited by NightCreature83

I thought D3DX did, given the existence of stuff like D3DXMATRIXA16 (and 16-byte alignment being an SSE thing)? I don't see why that would be a bottleneck though, unless that last block has a GPU -> CPU data transfer? Can't the CPU just carry on preparing the next frame while the GPU does those Draws?

Edited by SyncViews

I thought D3DX did, given the existence of stuff like D3DXMATRIXA16 (and 16-byte alignment being an SSE thing)? I don't see why that would be a bottleneck though, unless that last block has a GPU -> CPU data transfer? Can't the CPU just carry on preparing the next frame while the GPU does those Draws?

16-byte alignment is not specifically an SSE thing; it just happens that 16-byte aligned data fetches faster from memory, as on most systems it is a full cache line.

I thought D3DX did, given the existence of stuff like D3DXMATRIXA16 (and 16-byte alignment being an SSE thing)?

I'm not sure if D3DX is SSE accelerated, but newer versions of Visual Studio can automatically compile float-based code using the SSE instruction set (especially if you align your data correctly).

I don't see why that would be a bottleneck though, unless that last block has a GPU -> CPU data transfer? Can't the CPU just carry on preparing the next frame while the GPU does those Draws?

Yes. The GPU will usually be behind the CPU by at least one entire frame. The two of them will be completely out of sync, which is good.
When optimising, you need to find out which processor is taking the most time per frame, and optimize its workload first.
e.g. if the GPU completes a frame's worth of work in 4ms, but the CPU takes 33ms, then your game will be running at 30Hz, even though the GPU could possibly be running at 250Hz.

16-byte alignment is not specifically an SSE thing; it just happens that 16-byte aligned data fetches faster from memory, as on most systems it is a full cache line.

AFAIK, 64 bytes is a common cache line size for Intel/x86 CPUs, or 128 bytes in the current-gen consoles ;)
But yes, a 16-byte aligned float4 will never straddle the boundary between two cache lines, whereas an unaligned one might.
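
In case it helps, the aligned D3DX type is basically just the normal matrix declared with a 16-byte alignment attribute; you can do the same to your own types (a sketch using the MSVC-specific specifier):

    // 16 bytes total, so an aligned instance never crosses a cache-line boundary.
    __declspec(align(16)) struct Float4
    {
        float x, y, z, w;
    };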

Yeah, I was just thinking that I'm sure I read 64 bytes somewhere regarding caches.

When optimising, you need to find out which processor is taking the most time per frame, and optimize its workload first.
e.g. if the GPU completes a frame's worth of work in 4ms, but the CPU takes 33ms, then your game will be running at 30Hz, even though the GPU could possibly be running at 250Hz.

 

Well, here he is saying that neither the CPU core nor the GPU core is fully utilised, and that PIX screenshot looks like 450ms, while I guess a measure of CPU time might be, say, 150ms (which still seems wrong; am I a factor of 10 out with PIX?). So I guess he must be stalling the CPU then, either waiting for the GPU to provide something, or somewhere else (multithreading, filesystem, networking, etc.)? I certainly don't see how it could be computationally limited in a way that SSE or alignment would help.

Edited by SyncViews

Hello, my GPU1 usage is about 50%, GPU2 usage is 10%. Total CPU usage is 15%, no core goes over 30% usage.

How are you making these measurements? "Processor usage" statistics usually aren't very useful values when profiling. You need to get milliseconds-per-frame measurements, and milliseconds-per-function measurements for the major parts of your code (e.g. the update and render functions, etc.).
 
When you "draw more stuff", have a look at which functions consume more time -- which of your measured timers increases? Is there a correlation between your measurements and the frame-rate? If you're measuring ~20ms per frame on the CPU, is the frame rate ~50Hz?
 
You also need to determine if the CPU or the GPU is the bottleneck -- e.g. if your CPU is stalling in the Present function, then it's probably waiting for the GPU to catch up.
 
Like Adam said above, you can use an external profiler to measure how much CPU time is being consumed by each part of your code, and/or you can manually add timing code to different parts of your game to measure how long different operations take each frame.
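
For example (just a sketch of hand-rolled timing, not anything from your code), a scope timer around your major functions is usually enough to see where the frame goes:

    #include <windows.h>
    #include <stdio.h>

    // Hypothetical helper: measures how long a scope takes, in milliseconds.
    struct ScopeTimer
    {
        LARGE_INTEGER start;
        const char* name;

        ScopeTimer(const char* n) : name(n) { QueryPerformanceCounter(&start); }

        ~ScopeTimer()
        {
            LARGE_INTEGER end, freq;
            QueryPerformanceCounter(&end);
            QueryPerformanceFrequency(&freq);
            double ms = (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
            printf("%s: %.3f ms\n", name, ms); // better to accumulate and print once per second
        }
    };

    // Usage:
    // { ScopeTimer t("Update");  Update();  }
    // { ScopeTimer t("Render");  Render();  }
    // { ScopeTimer t("Present"); d3dDevice->Present(NULL, NULL, NULL, NULL); }

If most of the frame is spent inside Present, the CPU is waiting on the GPU; if it is spent in your own functions, the CPU is the bottleneck.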

that PIX screenshot looks like 450ms

Unless your game is actually running at 2 frames per second, I would completely disregard these measurements for profiling purposes.

Edited by Hodgman
