antber

Profiling CPU state changes


antber    129
I'm trying to profile the CPU cost of state changes. Here is my first attempt (based on the DirectX article "Accurately Profiling Direct3D API Calls"):

```cpp
// Drain the command buffer so the video card is idle before timing starts
IDirect3DQuery9* pEvent = NULL;
g_pd3dDevice->CreateQuery(D3DQUERYTYPE_EVENT, &pEvent);
pEvent->Issue(D3DISSUE_END);
while (S_FALSE == pEvent->GetData(NULL, 0, D3DGETDATA_FLUSH))
    ;

LARGE_INTEGER freqCounter, start, stop;
QueryPerformanceFrequency(&freqCounter);
DWORD batches = 0;

// Now start counting because the video card is ready
QueryPerformanceCounter(&start);
for (DWORD ii = 0; ii < 10000; ii++)
{
    g_pd3dDevice->SetStreamSource(0, g_pVB[ii % 2], 0, sizeof(VERTEX));
    //g_pd3dDevice->SetIndices(g_pIB[ii % 2]);
    //g_pd3dDevice->SetTexture(0, g_pMeshTextures[ii % 2]);
    //g_pd3dDevice->SetTransform(D3DTS_WORLD, &g_pWorld[ii % 2]);

    // Draw only one triangle, outside the view frustum, to minimize GPU work.
    // The vertex count comes from the mesh whose buffer is bound above.
    g_pd3dDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                                       g_pMesh[ii % 2]->GetNumVertices(), 0, 1);
    batches++;
}

// Wait until the GPU has consumed every queued command, then stop the clock
pEvent->Issue(D3DISSUE_END);
while (S_FALSE == pEvent->GetData(NULL, 0, D3DGETDATA_FLUSH))
    ;
QueryPerformanceCounter(&stop);
pEvent->Release();

float numTicksPerCall = (float)(stop.QuadPart - start.QuadPart) / batches;
float nummcs = 1000000.0f * numTicksPerCall / freqCounter.QuadPart; // microseconds per batch
```

For example, I got 0.7 microseconds per batch for SetStreamSource + DrawIndexedPrimitive on a 2 GHz Core 2 Duo.

But when I use more streams (i.e. more distinct vertex buffers), the time goes up! For example, with 3000 buffers (g_pVB[ii%3000]) I got 1.7 microseconds per batch. I don't understand why this happens. The same occurs when switching index buffers and textures (for SetTransform it stays at about 1.3 microseconds regardless). More buffers, more time per switch!

At first I thought it depended on the buffer contents, but when I loaded the same data into all the buffers, nothing changed. Now I'm a bit frustrated. Did I do something wrong? Did I miss something? Why does this happen?

jollyjeffers    1570
Multi-stream rendering is known to be a bit slower on some architectures, but that's not what you're using here from what I see.

My thoughts would be whether there is any caching or redundant state change evaluation taking place. It might well be that your GPU driver has some sort of LRU cache or something that can handle a number of calls and above that threshold the cache is useless.

Do any tools like PIX or NVPerfHUD give you insight into what the underlying OS/Driver/Hardware are doing?

hth
Jack

antber    129
Yes, I don't use multi-stream rendering.
I used PerfHUD, but what I found is that GPU frame time remains unchanged while CPU frame time grows (every batch takes more time). I don't know how to find more detailed information.

jollyjeffers    1570
Quote:
Original post by antber
I used PerfHUD, but what I found is that GPU frame time remains unchanged while CPU frame time grows (every batch takes more time).
Interesting. Are you able to determine which process/component uses more CPU time? I suspect not, but on the CPU side there is your code, Microsoft's code (the Direct3D runtime) and Nvidia's code (the driver), so it'd be useful to know where to look...

Quote:
Original post by antber
I don't know how to find more detailed information.
Unfortunately the drivers are like a black box unless you have friends at Nvidia. Even if you are on their partner program it can be very tough to get an answer out of them.

Bear in mind that your code snippet doesn't represent a realistic application, so you may well be exercising a part of the driver that isn't optimized or isn't an expected use. Writing more "normal" code will probably make the driver happier [smile]

hth
Jack

antber    129
Thank you for your answer.

I'm trying to work out my CPU budget: I have about 8000 batches in some scenes and this is a bottleneck. The geometry is simple and the GPU is idle about half the time. This rendering loop is typical for my application. I thought the time for one state switch would stay the same, but it depends on the number of unique data buffers (VB, IB, texture). Switches to already used "states" are faster, and I don't understand why. I thought the driver would just overwrite the previous pointer with the new value and that's all, because everything it needs is already in video memory.

jollyjeffers    1570
Quote:
Original post by antber
I have about 8000 batches in some scenes and this is a bottleneck. The geometry is simple and the GPU is idle about half the time. This rendering loop is typical for my application.
Doesn't surprise me this is a bottleneck! The general rule-of-thumb is to try and keep to 500-1000 batches per frame as a maximum...

Quote:
Original post by antber
Switches to already used "states" are faster, and I don't understand why. I thought the driver would just overwrite the previous pointer with the new value and that's all, because everything it needs is already in video memory.
From your analysis it would seem to me that you're witnessing some sort of caching algorithm. The exact details of the GPU and driver are pretty guarded secrets so it is next to impossible to make guesses or assumptions about their behaviour. The behaviour you're seeing might well be due to some work-around or might reflect some unknown small cache/buffer somewhere and your code is causing "cache thrash"...


Ultimately you have a good set of tests to know good code and bad code, so I would direct your energy at reducing those 8000 draw calls and/or the number of unique buffers.

hth
Jack

antber    129
Yes, I'm working on grouping objects and packing textures into atlases.
The main problem is that I'm a PhD student and I need to explain as many things as I can, but these black boxes don't give me enough freedom.
Anyway, thanks for your help.

jollyjeffers    1570
If you're a PhD student, then talk with your university and see if anyone has connections with Nvidia. Companies and universities are big enough to get Nvidia's attention, so you may get lucky and find that they'll explain the black box to you [smile]

Jack
