How to count the time spent on the GPU?

It seems that D3D can send instructions to the GPU and return without waiting for completion, so how can I find out how much time the GPU takes to finish the work?

I use VMR9 to read MPEG files and apply a pixel shader effect to every video frame it provides. For every frame my program does: BeginScene, SetRenderTarget, SetPixelShader, DrawPrimitive, EndScene. I measured the time spent on this with DXUtil_Timer(), and the result is 1/580 second. Then I added another pass, so each frame became: BeginScene, ..., EndScene, BeginScene, ..., EndScene. The video started to stutter, so I suspect the load is too heavy, yet the time measured for the BeginScene...EndScene blocks is still only 1/560 second. That makes me think the calls return before the GPU has finished.

I know there is a program called FRAPS that can count the FPS of games, but for video it's fixed at 30 fps.
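To show what that CPU-side timing actually captures, here is a minimal sketch (Direct3D 9 / C++, requires <windows.h> and <d3d9.h>; the device and resource names such as g_pDevice are placeholders, and QueryPerformanceCounter is used in place of DXUtil_Timer()). The measured interval is only the cost of queuing the commands, because the calls return before the GPU executes them:

// Assumed to exist elsewhere: g_pDevice, g_pRenderTarget, g_pPixelShader.
LARGE_INTEGER freq, t0, t1;
QueryPerformanceFrequency(&freq);

QueryPerformanceCounter(&t0);
g_pDevice->BeginScene();
g_pDevice->SetRenderTarget(0, g_pRenderTarget);
g_pDevice->SetPixelShader(g_pPixelShader);
g_pDevice->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, 2);   // full-screen quad
g_pDevice->EndScene();
QueryPerformanceCounter(&t1);

// CPU-side cost of queuing the commands, NOT the GPU execution time.
double cpuMs = 1000.0 * (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;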
Unfortunately, I doubt you'll get anywhere with this. NVIDIA/ATI don't tend to tell you how the finer workings of their GPUs operate - and for good reason, really.

With all the parallel-processing and batched-execution tricks they play to gain speed, it's probably no simple task to work out how long something will take.

As for the async nature of D3D not waiting for the render to complete before returning: you can sometimes get an idea of what the GPU will do by looking at the drivers. I know one of my previous drivers had a "render no more than __ frames ahead" setting, and I'm guessing that means it will only queue __ frames before forcing D3D to wait until it's caught up (this is noticeable with some profiling tools in specific scenarios).

The only other option is probably to ask devrel@ati.com - ATI have always helped me in the past; NVIDIA weren't as helpful, but I haven't spoken to them in a while.

hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

quote: Original post by jollyjeffers
I know one of my previous drivers had a "render no more than __ frames ahead" setting, and I'm guessing that means it will only queue __ frames before forcing D3D to wait until it's caught up (this is noticeable with some profiling tools in specific scenarios).

All WHQL drivers can buffer up to a maximum of 3 frames, most probably so that - as Rich Thomson says - "[...] the GPU can be displaying frame N and rendering frame N+1 while the CPU queues frame N+2. Any more queueing than that and you introduce delays. Any less queueing than that and you lose possibilities for parallelism."
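As an illustration of how an application can force D3D to drain that queue (and so measure when the GPU has actually caught up), here is a minimal sketch using an event query. This is an assumption on my part, not something from the posts above: g_pDevice and the draw calls are placeholders, and D3DQUERYTYPE_EVENT needs DirectX 9 driver support:

IDirect3DQuery9* pQuery = NULL;
if (SUCCEEDED(g_pDevice->CreateQuery(D3DQUERYTYPE_EVENT, &pQuery)))
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    // ... BeginScene / draw calls / EndScene for one frame go here ...

    pQuery->Issue(D3DISSUE_END);
    // S_FALSE means the GPU has not reached the event yet; spin until it has.
    while (pQuery->GetData(NULL, 0, D3DGETDATA_FLUSH) == S_FALSE)
        ;
    QueryPerformanceCounter(&t1);

    // Approximate CPU-submit + GPU-execute time for everything queued so far.
    double frameMs = 1000.0 * (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    pQuery->Release();
}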

Muhammad Haggag

I CreateRenderTarget() with Lockable = TRUE, then call LockRect() and UnlockRect() after EndScene().
The result is almost the same as what the tool FRAPS reports.
But somehow the CPU usage rises from 6% to 40% when locking.
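A minimal sketch of that lockable-render-target approach (sizes, formats, and names are placeholders; error handling omitted):

IDirect3DSurface9* pTarget = NULL;
g_pDevice->CreateRenderTarget(512, 256, D3DFMT_A8R8G8B8,
                              D3DMULTISAMPLE_NONE, 0,
                              TRUE,          // Lockable
                              &pTarget, NULL);

g_pDevice->SetRenderTarget(0, pTarget);
g_pDevice->BeginScene();
// ... SetPixelShader / DrawPrimitive for the frame ...
g_pDevice->EndScene();

// LockRect() cannot return until the GPU has finished writing the surface,
// so the wall-clock time around it includes the real GPU work. The wait
// inside the driver is presumably also why CPU usage jumps while locking.
D3DLOCKED_RECT lr;
if (SUCCEEDED(pTarget->LockRect(&lr, NULL, D3DLOCK_READONLY)))
    pTarget->UnlockRect();
pTarget->Release();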

I found that on a GeForce FX 5600, to keep a 512x256 video running at 30 fps,
I can use at most 68 "dp4" instructions in the pixel shader, compared to 47 on an FX 5200.

Is that common?
How can games achieve 100 fps with complicated scenes?
Or are pixel shaders simply not fast enough yet?

[edited by - softimage on March 3, 2004 8:56:59 PM]

