quaikohc2

Help with interpreting PIX results

Hi, these two PIX screenshots are taken from Crysis 2 and from my game. Can you help me interpret them? Why is the ratio between CPU and GPU frames 1:1 in Crysis, while in my game it's 1:2 or sometimes 1:3? What could cause such a big difference between 'CPU duration' and 'GPU duration' in my game (in Crysis they are almost the same)?

(I'm using DirectX 9 and Windows 7.)

TIA

[url="http://imageshack.us/photo/my-images/806/pixtimeline.png/"]PIX timelines[/url]
[url="http://imageshack.us/photo/my-images/806/pixstats.png/"]PIX stats[/url]

Your second capture tells more: you make a lot of draw calls, which is nasty because it burdens the driver queue and each call carries overhead (enqueuing in the driver, entering the functions, etc.), so you should batch more. Moreover, you make a huge number of locks on vertex buffers in one frame. I suppose you upload procedural geometry, but you seem to do that on plenty of VBs...
Locking and uploading into a VB has a cost, and I don't know the size of each upload (think of it as a transfer from RAM to VRAM)...

Nevertheless, the CPU is the more heavily used side in a frame, so I guess you are performing calculations inside the locks?
If the CPU is slow, reconsider your algorithms / maths and use SIMD if possible, but first of all use a profiler to see what's happening.
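
To illustrate what "batch more" could mean, here is a minimal Python sketch (not D3D code) that groups draw requests sharing the same render state, so that each group could be submitted as one draw call. `DrawRequest`, its fields, and `batch_draws` are hypothetical names invented for this illustration; a real engine would also have to merge the geometry into shared vertex/index buffers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawRequest:
    """One object the renderer wants to draw (hypothetical structure)."""
    material: str        # stands in for shader + textures + render states
    triangle_count: int


def batch_draws(requests):
    """Merge requests that share a material into one batch per material.

    Returns a dict mapping material -> total triangle count, i.e. one
    draw call per key instead of one per request.
    """
    batches = defaultdict(int)
    for req in requests:
        batches[req.material] += req.triangle_count
    return dict(batches)


if __name__ == "__main__":
    scene = [
        DrawRequest("rock", 120),
        DrawRequest("grass", 40),
        DrawRequest("rock", 300),
        DrawRequest("grass", 60),
        DrawRequest("metal", 500),
    ]
    batches = batch_draws(scene)
    print(f"{len(scene)} draw calls before batching, {len(batches)} after")
    for material, tris in batches.items():
        print(f"  {material}: {tris} triangles")
```

The total triangle count is unchanged; only the number of submissions (and with it the per-call CPU overhead) goes down.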

[quote name='boubi' timestamp='1305666493' post='4812097']
Your second capture tells more: you make a lot of draw calls, which is nasty[/quote]

Is 4,5k draw calls really that many? I think every commercial FPS has more.

[quote name='boubi' timestamp='1305666493' post='4812097']Moreover, you make a huge number of locks on vertex buffers in one frame. I suppose you upload procedural geometry, but you seem to do that on plenty of VBs...
[/quote]

Sorry, I didn't label the tables in the second screenshot: the first one is from Crysis 2 and the second is from my game. They have ~700 VB locks and I have only 200; we also have an almost equal number of draw calls (although they use deferred lighting, so half of theirs are cheaper).

The main question is how to interpret such a big difference between CPU and GPU time (in Crysis they're almost the same). Does it just mean that I'm CPU bound, or is there something wrong with my render queue?

[quote name='boubi' timestamp='1305666493' post='4812097']but first of all use a profiler to see what's happening.[/quote]

I do, but the problem is that it shows nothing useful at the moment: there are no bottlenecks, and the CPU time is evenly distributed over the code. I guess there is something wrong with CPU-GPU synchronization, but I don't know what, or maybe I just put too much work on the CPU.

with regards

[quote name='quaikohc2' timestamp='1305622567' post='4811840']
Why is the ratio between CPU and GPU frames 1:1 in Crysis, while in my game it's 1:2 or sometimes 1:3?[/quote]
You're doing more work on the CPU than you are on the GPU. In other words, you're CPU bottlenecked. You could give the GPU more work (e.g. write more expensive shaders) and your FPS would remain the same.

Crysis was written for consoles by professionals, and I'd wager that during development they kept a keen eye on how many milliseconds of work they were giving to both the GPU and CPU and tried their hardest to keep them balanced.

[quote]What could cause such a big difference between 'CPU duration' and 'GPU duration' in my game (in Crysis they are almost the same)?[/quote]
You're submitting lots of small draw-calls. The CPU has a fixed overhead for setting up a draw call, whereas the GPU cost depends on how much work it contains. Your GPU is blasting through the work you're giving it and has enough time for a cigarette break afterwards. Give it more work (larger batches).

[quote name='quaikohc2' timestamp='1305704441' post='4812316']Is 4,5k draw calls really that many? I think every commercial FPS has more.[/quote]
I haven't measured the draw-call count from other games like Crysis, but 4 to 5 thousand seems like a lot to me.
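
As a rough illustration of why extra GPU work would not change the frame rate in a CPU-bottlenecked game: assuming the CPU and GPU run fully in parallel (a simplification), the frame time is set by whichever side takes longer. The function names and millisecond figures below are made up for the example and are not taken from the PIX captures.

```python
def frame_time_ms(cpu_ms, gpu_ms):
    """Frame time when CPU and GPU overlap perfectly (idealized assumption)."""
    return max(cpu_ms, gpu_ms)


def gpu_idle_fraction(cpu_ms, gpu_ms):
    """Share of each frame the GPU spends waiting for the CPU."""
    total = frame_time_ms(cpu_ms, gpu_ms)
    return (total - gpu_ms) / total


if __name__ == "__main__":
    # Hypothetical CPU-bound frame: CPU needs 30 ms, GPU only 10 ms.
    print(frame_time_ms(30.0, 10.0))
    # Doubling the GPU work leaves the frame time unchanged.
    print(frame_time_ms(30.0, 20.0))
    # With 15 ms of GPU work the GPU idles for half of each frame.
    print(gpu_idle_fraction(30.0, 15.0))
```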

With so little information about your situation, we're talking blind and can only guess what's going on.

Just quickly: tell me you're profiling an optimized build (Release)?
What's your game engine architecture, is it multithreaded or not? Your GPU is waiting, so the problem is on the CPU side: you're doing too much work there. But generally, when we hunt for performance, we run a profiler, identify the most time-consuming functions, and then analyze the code...
There are plenty of ways to optimize something, so my question is: are you sure you're doing these things as fast as possible? Have you looked deeper at, for example, LHS (load-hit-store) hazards on your cache lines inside big loops? Have you multithreaded? If you do maths, did you use SIMD?

A correct and precise answer is only possible with the application source, profilers running, and the architecture in mind... sorry, I don't think anyone can answer this easily on a forum...

Try to find hotspots in your app with your profiler and analyze the code.

Hope this helps.

[quote name='Hodgman' timestamp='1305705478' post='4812326']
You're submitting lots of small draw-calls. The CPU has a fixed overhead for setting up a draw call, whereas the GPU cost depends on how much work it contains. Your GPU is blasting through the work you're giving it and has enough time for a cigarette break afterwards. Give it more work (larger batches)
[/quote]

This is a PIX screenshot from another location (with only 800 draw calls); the ratio between CPU and GPU duration is still the same.

[url="http://imageshack.us/f/651/pixnewlocation.jpg/"]another pix screen[/url]

[quote name='quaikohc2' timestamp='1305708549' post='4812370']
[quote name='Hodgman' timestamp='1305705478' post='4812326']
You're submitting lots of small draw-calls. The CPU has a fixed overhead for setting up a draw call, whereas the GPU cost depends on how much work it contains. Your GPU is blasting through the work you're giving it and has enough time for a cigarette break afterwards. Give it more work (larger batches)[/quote]

This is a PIX screenshot from another location (with only 800 draw calls); the ratio between CPU and GPU duration is still the same.

[url="http://imageshack.us/f/651/pixnewlocation.jpg/"]another pix screen[/url][/quote]
You've misunderstood what I was trying to get across.

Let's say you've got some number of draw-calls "D", and they each draw "N" triangles.

The CPU-side cost is D * CPU_Overhead.
The GPU-side cost is D * N * GPU_Overhead.

Now let's say that CPU_Overhead is 50, while GPU_Overhead is 2 ([i]totally made up numbers, but doesn't matter![/i])

In the case where D is 1, and N is 10:
CPU Cost == 1 * 50 == 50
GPU Cost == 1 * 10 * 2 == 20

However, if we increase the number of triangles per-draw call to 500 (i.e. N is now 500)
CPU Cost == 1 * 50 == 50
GPU Cost == 1 * 500 * 2 == 1000

So you can see with many small ([i]small meaning: "small amount of GPU work"[/i]) draw calls, the CPU has to do more work than the GPU does.
However, with larger draw calls ([i]e.g. more triangles for GPU to process[/i]), the CPU cost stays the same, but now the GPU does more work than the CPU.



You want each of your draw calls to do as much GPU work as possible. A few large draw-calls is better than many small draw-calls.
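
The arithmetic above can be reproduced with a few lines of Python. This uses the same made-up overhead values as the example (50 per draw call on the CPU, 2 per triangle on the GPU); the function names are only for illustration.

```python
CPU_OVERHEAD = 50  # made-up cost per draw call, as in the example above
GPU_OVERHEAD = 2   # made-up cost per triangle, as in the example above


def cpu_cost(draw_calls):
    """CPU cost depends only on the number of draw calls D."""
    return draw_calls * CPU_OVERHEAD


def gpu_cost(draw_calls, tris_per_call):
    """GPU cost scales with the total triangle count D * N."""
    return draw_calls * tris_per_call * GPU_OVERHEAD


if __name__ == "__main__":
    for d, n in [(1, 10), (1, 500), (500, 1)]:
        print(f"D={d:3d} N={n:3d} CPU={cpu_cost(d):5d} GPU={gpu_cost(d, n):5d}")
```

The last case draws the same 500 triangles as the second one, but as 500 separate calls: the GPU cost is unchanged while the CPU cost is 500 times higher.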

[quote name='boubi' timestamp='1305705617' post='4812331']
Just quickly: tell me you're profiling an optimized build (Release)?[/quote]

Of course.

[quote name='boubi' timestamp='1305705617' post='4812331']
What's your game engine architecture, is it multithreaded or not?[/quote]

Yes.

[quote name='boubi' timestamp='1305705617' post='4812331']when we hunt for performance, we run a profiler, identify the most time-consuming functions, and then analyze the code...[/quote]

But as I said, I've done this, and I don't have any 'most time-consuming functions' or even a group of them now. I optimized them earlier and I can't do much more, because I have 500 functions that all take approximately the same amount of CPU time. Anyway, I don't think this is the real problem.

[quote name='boubi' timestamp='1305705617' post='4812331']Try to find hotspots in your app with your profiler and analyze the code.[/quote]

Again, based on profiling I don't have hotspots. I'm almost sure the problem is the GPU and CPU waiting for each other, but I don't know how to locate and fix it. Also, according to EVGA Precision, usage of the main CPU core in game is 80% max, and GPU usage is 60%.

OK, I asked around here, and you can assume that the locks cause synchronization between the GPU and CPU... that is, your CPU can spend some time waiting until the driver no longer needs the resource, especially when the command buffer is a few frames behind (e.g. a presentation interval other than immediate).
We had this problem on a big game like Crysis here, and the lock time was driver dependent...

Try disabling the locks if you can, capture again, and you'll see whether that's the problem...
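
If the locks do turn out to be the stall, one commonly used remedy for dynamic vertex buffers in Direct3D 9 is to append with `D3DLOCK_NOOVERWRITE` and switch to `D3DLOCK_DISCARD` when the buffer is full, so the CPU does not wait for memory the GPU is still reading. The thread does not say how the buffers are locked, so treating this as the cause is only an assumption. The Python sketch below models just the bookkeeping; `DynamicBufferCursor` and `next_lock` are hypothetical names, and the flags are returned as plain strings rather than real D3D constants.

```python
class DynamicBufferCursor:
    """Tracks the write position in a dynamic vertex buffer (illustrative)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        # Start "full" so the very first write uses DISCARD.
        self.offset = capacity_bytes

    def next_lock(self, size_bytes):
        """Return (offset, flag_name) to use for the next write of size_bytes."""
        if size_bytes > self.capacity:
            raise ValueError("write is larger than the whole buffer")
        if self.offset + size_bytes > self.capacity:
            # No room left: request fresh memory instead of waiting for the
            # GPU to finish with the old contents.
            self.offset = 0
            flag = "D3DLOCK_DISCARD"
        else:
            # Appending to a region no pending draw call reads: no wait.
            flag = "D3DLOCK_NOOVERWRITE"
        start = self.offset
        self.offset += size_bytes
        return start, flag


if __name__ == "__main__":
    cursor = DynamicBufferCursor(capacity_bytes=100)
    for size in (40, 40, 40):
        print(cursor.next_lock(size))
```

In real code the returned offset and flag would be passed to the vertex buffer's `Lock` call, and the buffer would need to have been created as a dynamic buffer for these flags to be valid.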

[quote name='boubi' timestamp='1305722594' post='4812492']
OK, I asked around here, and you can assume that the locks cause synchronization between the GPU and CPU... that is, your CPU can spend some time waiting until the driver no longer needs the resource, especially when the command buffer is a few frames behind (e.g. a presentation interval other than immediate).
We had this problem on a big game like Crysis here, and the lock time was driver dependent...

Try disabling the locks if you can, capture again, and you'll see whether that's the problem...
[/quote]

Looking at your PIX screenshots, you have 16x to 50x more texture locks than the Crysis traces show. If these are texture read-backs, avoid them: they are really expensive, especially if you upload the data to the GPU again after reading it back.

Also try to avoid the user-primitive draw calls (DrawPrimitiveUP / DrawIndexedPrimitiveUP), as they might upload their data to the GPU on the fly, whereas VBs are uploaded as soon as you unlock them. The UP calls need to upload the data and don't return until they are done doing so; look at the documentation for these functions.
