hi, these two PIX screens are taken from Crysis 2 and from my game. can you help me interpret them? why is the ratio between CPU and GPU frame times 1:1 in Crysis while in my game it's 1:2 or sometimes 1:3? what can possibly cause such a big difference between 'CPU Duration' and 'GPU Duration' in my game (in Crysis they are almost the same)?
(i'm using DX9 and Windows 7)
TIA
PIX timelines
PIX stats
Help with interpreting PIX results
Your 2nd capture tells more: you make a lot of draw calls, which is nasty because it burdens the driver queue and has call overhead on each call (enqueuing in the driver, entering the function, etc.), so you should batch more. Moreover, you perform a huge number of locks on the VBs in one frame; I suppose you upload procedural geometry, but you seem to do that across plenty of VBs...
The lock and upload into a VB has a cost, and I don't know the size of each upload (consider it like transferring RAM to VRAM)...
Nevertheless, the CPU is the busier one within a frame, so I guess you are performing calculations inside the locks?
If the CPU is slow, reconsider your algorithms/maths and use SIMD if possible, but first of all use a profiler to see what's happening.
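To illustrate the batching advice above (a sketch only, with a hypothetical scene representation, not the poster's actual code): instead of submitting one draw call per object, group objects that share a material and issue one call per group.

```python
from collections import defaultdict

# Hypothetical scene: (material, mesh) pairs, one entry per object.
scene = [("stone", "rock_a"), ("stone", "rock_b"), ("grass", "tuft"),
         ("stone", "rock_c"), ("grass", "tuft")]

# Naive submission: one draw call per object.
naive_draw_calls = len(scene)

# Batched submission: merge objects that share a material into one
# vertex-buffer region and issue a single draw call per material.
batches = defaultdict(list)
for material, mesh in scene:
    batches[material].append(mesh)
batched_draw_calls = len(batches)

print(naive_draw_calls, batched_draw_calls)  # 5 draw calls collapse to 2
```

The per-call driver overhead the reply mentions is paid per submission, so collapsing five calls into two cuts that fixed cost while the GPU still processes the same geometry.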
Your 2nd capture tells more: you make a lot of draw calls, which is nasty
is 4.5k draw calls really that big? i think every commercial FPS has more
moreover, you perform a huge number of locks on the VBs in one frame; I suppose you upload procedural geometry, but you seem to do that across plenty of VBs...
sorry, i didn't label the tables on the second screen, but the first one is from Crysis 2 (the second is from my game) - they have ~700 locks on VBs, i have only 200, and we have an almost equal number of draw calls (although they use deferred lighting, so half of theirs are cheaper)
the main question is how to interpret such a big difference between CPU and GPU time (in Crysis it's almost the same) - does it just mean that i'm CPU bound, or is there something wrong with my render queue?
but first of all use a profiler to see what's happening
i do, but the problem is that it shows nothing useful at the moment: there are no bottlenecks and the CPU time is evenly distributed over the code. there is something wrong with CPU-GPU synchronization, i guess, but i don't know what - or maybe i just put too much work on the CPU
With regards
You're doing more work on the CPU than you are on the GPU. In other words, you're CPU bottlenecked. You could give the GPU more work (e.g. write more expensive shaders) and your FPS would remain the same.
why is the ratio between CPU and GPU frames 1:1 in Crysis while in my game it's 1:2 or sometimes 1:3?
Crysis was written for consoles by professionals, and I'd wager that during development they kept a keen eye on how many milliseconds of work they were giving to both the GPU and CPU and tried their hardest to keep them balanced.
what can possibly cause such a big difference between 'CPU Duration' and 'GPU Duration' in my game (in Crysis they are almost the same)?
You're submitting lots of small draw calls. The CPU has a fixed overhead for setting up a draw call, whereas the GPU cost depends on how much work it contains. Your GPU is blasting through the work you're giving it and has enough time for a cigarette break afterwards. Give it more work (larger batches).
is 4.5k draw calls really that big? i think every commercial FPS has more
I haven't measured the draw-call count from other games like Crysis, but 4 to 5 thousand seems like a lot to me.
With very little information about your situation, it's kind of like talking blind and only guessing at what's going on.
Quickly: tell me, do you profile your app in an optimized build target (Release)?
What's your game engine architecture: is it multithreaded or not? Your GPU waits, so the problem is on the CPU side; you do too much work there, OK, but generally, when we search for performance, we run a profiler, identify the largest time-consuming functions, and then analyze the code...
There are plenty of ways to optimize something, so my question is: are you sure you are doing these things as fast as possible? Have you analyzed LHS hazards on your cache lines inside big loops, for example? Have you multithreaded? If you do maths, did you use SIMD?
A correct and precise answer is possible only with the application source, profilers running, and the architecture in mind... sorry, I don't think someone can reply easily over a forum.
Try to search for hotspots in your app with your profiler and analyze the code.
Hope this helps.
You're submitting lots of small draw-calls. The CPU has a fixed overhead for setting up a draw call, whereas the GPU cost depends on how much work it contains. Your GPU is blasting through the work you're giving it and has enough time for a cigarette break afterwards. Give it more work (larger batches)
this is a PIX screen from another location (with only 800 draw calls); the ratio between CPU and GPU duration is still the same...
another PIX screen
Hodgman wrote:
You're submitting lots of small draw-calls. The CPU has a fixed overhead for setting up a draw call, whereas the GPU cost depends on how much work it contains. Your GPU is blasting through the work you're giving it and has enough time for a cigarette break afterwards. Give it more work (larger batches)
this is a PIX screen from another location (with only 800 draw calls); the ratio between CPU and GPU duration is still the same...
another PIX screen
You've misunderstood what I was trying to get across.
Let's say you've got some number of draw-calls "D", and they each draw "N" triangles.
The CPU-side cost is D * CPU_Overhead.
The GPU-side cost is D * N * GPU_Overhead.
Now let's say that CPU_Overhead is 50, while GPU_Overhead is 2 (totally made-up numbers, but it doesn't matter!)
In the case where D is 1, and N is 10:
CPU Cost == 1 * 50 == 50
GPU Cost == 1 * 10 * 2 == 20
However, if we increase the number of triangles per draw call to 500 (i.e. N is now 500):
CPU Cost == 1 * 50 == 50
GPU Cost == 1 * 500 * 2 == 1000
So you can see that with many small (small meaning "a small amount of GPU work") draw calls, the CPU has to do more work than the GPU does.
However, with larger draw calls (e.g. more triangles for GPU to process), the CPU cost stays the same, but now the GPU does more work than the CPU.
You want each of your draw calls to do as much GPU work as possible. A few large draw-calls is better than many small draw-calls.
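The back-of-the-envelope model above can be sketched in a few lines of Python (the overhead numbers are the made-up ones from the post, not measurements):

```python
# Toy cost model from the post: per-frame CPU cost scales with draw-call
# count D, GPU cost with total triangles D * N. Numbers are made up.
CPU_OVERHEAD = 50   # cost per draw call on the CPU (fixed)
GPU_OVERHEAD = 2    # cost per triangle on the GPU

def frame_costs(draw_calls, tris_per_call):
    cpu = draw_calls * CPU_OVERHEAD
    gpu = draw_calls * tris_per_call * GPU_OVERHEAD
    return cpu, gpu

# Many small draw calls: CPU-bound (like the poster's capture).
print(frame_costs(4500, 10))   # CPU 225000 vs GPU 90000 -> CPU bottleneck

# Same 45,000 triangles packed into fewer, larger batches: GPU-bound.
print(frame_costs(90, 500))    # CPU 4500 vs GPU 90000 -> GPU bottleneck
```

Note that both cases draw the same 45,000 triangles per frame; only the batching changes which processor is the bottleneck.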
Quickly: tell me, do you profile your app in an optimized build target (Release)?
ofc
What's your game engine architecture: is it multithreaded or not?
yes,
when we search for performance, we run a profiler, identify the largest time-consuming functions, and then analyze the code...
but as i said, i've done this, and i don't have 'largest time-consuming functions' or even groups of them now. i optimized them before and can't do much more, because i have 500 functions and they all take approximately the same amount of CPU time. anyway, i don't think this is the real problem
Try to search for hotspots in your app with your profiler and analyze the code.
again, based on profiling i don't have hotspots. i'm almost sure it's a problem with the GPU and CPU waiting for each other, but i don't know how to locate and fix it. also, based on 'EVGA Precision', main-core CPU usage in game is 80% max and GPU usage is 60%
OK, I asked around here, and you can assume that the locks cause synchronization between GPU and CPU. That is to say, your CPU can wait a certain amount of time until the driver no longer needs the resource, especially when there are some frames of delay in the command buffer (e.g. a presentation interval other than immediate).
We have had this problem on a big game like Crysis here, and the lock time was driver-dependent...
Try to deactivate the lock if you can, capture again, and you will see if that's the problem...
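On DX9 specifically, the usual way to avoid that lock stall is the dynamic-VB append pattern: fill the buffer with D3DLOCK_NOOVERWRITE and only pass D3DLOCK_DISCARD when you wrap around, so the driver can hand out a fresh buffer instead of blocking until the GPU is done with the old one. Here is a sketch of just the flag-selection logic (in Python for brevity; the real calls go through IDirect3DVertexBuffer9::Lock, and the buffer here is simulated):

```python
# Sketch of the classic DX9 dynamic-VB append pattern: NOOVERWRITE while
# appending, DISCARD only on wrap-around. Flag names mirror D3D9's
# D3DLOCK_* constants; the buffer itself is simulated, not real VRAM.
class DynamicVB:
    def __init__(self, size):
        self.size = size    # capacity of the hypothetical dynamic VB, in bytes
        self.cursor = 0     # next free byte

    def lock_for(self, nbytes):
        """Return (offset, flag) to use for an upload of nbytes."""
        if self.cursor + nbytes > self.size:
            # Wrap: DISCARD lets the driver rename the buffer and give us
            # fresh memory instead of stalling until the GPU is finished.
            self.cursor = 0
            flag = "D3DLOCK_DISCARD"
        else:
            # Appending into untouched space: the GPU can keep reading the
            # earlier regions, so no CPU-GPU synchronization is needed.
            flag = "D3DLOCK_NOOVERWRITE"
        offset = self.cursor
        self.cursor += nbytes
        return offset, flag

vb = DynamicVB(1024)
print(vb.lock_for(600))  # (0, 'D3DLOCK_NOOVERWRITE')
print(vb.lock_for(600))  # wraps -> (0, 'D3DLOCK_DISCARD')
```

With this pattern the 200 locks per frame in the capture would mostly be NOOVERWRITE appends, which the driver can satisfy without waiting on the GPU.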
This topic is closed to new replies.