Profiling CPU state changes

Started by
6 comments, last by jollyjeffers 15 years, 5 months ago
I'm trying to profile the CPU time spent on state changes. Here is my first attempt (based on the DirectX article "Accurately Profiling Direct3D API Calls"):

    g_pd3dDevice->CreateQuery(D3DQUERYTYPE_EVENT, &pEvent);
    pEvent->Issue(D3DISSUE_END);
    while (S_FALSE == pEvent->GetData(NULL, 0, D3DGETDATA_FLUSH))
        ;

    // Now start counting because the video card is ready
    QueryPerformanceCounter(&start);
    for (DWORD ii = 0; ii < 10000; ii++)
    {
        g_pd3dDevice->SetStreamSource(0, g_pVB[ii%2], 0, sizeof(VERTEX));
        //g_pd3dDevice->SetIndices(g_pIB[ii%2]);
        //g_pd3dDevice->SetTexture(0, g_pMeshTextures[ii%2]);
        //g_pd3dDevice->SetTransform(D3DTS_WORLD, &g_pWorld[ii%2]);

        // draw only one triangle, outside the view frustum, to minimize GPU work
        g_pd3dDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, g_pMesh[ii]->GetNumVertices(), 0, 1);
        batches++;
    }
    pEvent->Issue(D3DISSUE_END);
    while (S_FALSE == pEvent->GetData(NULL, 0, D3DGETDATA_FLUSH))
        ;
    QueryPerformanceCounter(&stop);

    float numTicksPerCall = (float)(stop.QuadPart - start.QuadPart) / batches;
    nummcs = 1000000.0f * numTicksPerCall / freqCounter.QuadPart;

For example, I got 0.7 microseconds per batch for SetStreamSource + DrawIndexedPrimitive on a 2 GHz Core 2 Duo. But when I use more streams, it takes more time! For example, with 3000 streams (g_pVB[ii%3000]) I got 1.7 microseconds per batch. I don't understand why this happens. The same occurs when switching index buffers and textures (for SetTransform it remains unchanged at ~1.3 microseconds). More streams, more time per switch!

At first I thought it depended on the buffer contents, but when I loaded the same data into all the buffers, nothing changed. Now I'm a bit frustrated. Did I do something wrong? Did I miss something? Why does this happen?
Multi-stream rendering is known to be a bit slower on some architectures, but that's not what you're using here from what I see.

My guess is that some caching or redundant state change evaluation is taking place. It may well be that your GPU driver has some sort of LRU cache that can handle a certain number of calls, and above that threshold the cache is useless.
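To make the LRU hypothesis concrete, here is a minimal sketch of how such a cache would behave under the access pattern from the test loop (buffer index = ii % N). Everything in it is an assumption for illustration: the class LruSet, the function hitRate and the capacity of 64 used in the note below are hypothetical and say nothing about how any real driver works.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical model only: a fixed-capacity LRU set of resource ids.
// It does not describe the internals of any real driver.
class LruSet {
public:
    explicit LruSet(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns true on a hit. On a miss, inserts the id and evicts the
    // least recently used entry if the set is already full.
    bool touch(int id) {
        auto it = pos_.find(id);
        if (it != pos_.end()) {
            // Move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        if (order_.size() == capacity_) {
            pos_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(id);
        pos_[id] = order_.begin();
        return false;
    }

private:
    std::size_t capacity_;
    std::list<int> order_;  // most recently used at the front
    std::unordered_map<int, std::list<int>::iterator> pos_;
};

// Replays the loop's access pattern (index = ii % distinct) and returns
// the fraction of accesses that were hits.
double hitRate(std::size_t capacity, int distinct, int calls) {
    assert(distinct > 0 && calls > 0);
    LruSet cache(capacity);
    int hits = 0;
    for (int ii = 0; ii < calls; ++ii) {
        if (cache.touch(ii % distinct)) ++hits;
    }
    return static_cast<double>(hits) / calls;
}
```

With an assumed capacity of 64, hitRate(64, 2, 10000) is close to 1 because only the first two accesses miss, while hitRate(64, 3000, 10000) is 0: cycling through more ids than the cache holds evicts every entry before it is reused. If something like this sits in the driver, it would be consistent with the per-batch cost rising once the number of distinct buffers passes some threshold.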

Do any tools like PIX or NVPerfHUD give you insight into what the underlying OS/Driver/Hardware are doing?

hth
Jack

Jack Hoxley [Forum FAQ | Revised FAQ | MVP Profile | Developer Journal]

Yes, I don't use multi-stream rendering.
I used PerfHUD, but all I found is that GPU frame time remains unchanged while CPU frame time grows (every batch takes more time). I don't know how to get more detailed information.
Quote:Original post by antber
I used PerfHUD, but all I found is that GPU frame time remains unchanged while CPU frame time grows (every batch takes more time).
Interesting - are you able to determine which process/component uses more CPU time? I suspect not, but on the CPU side there is your code, Microsoft's code and Nvidia's code, so it would be useful to know where to look...

Quote:Original post by antber
I don't know how to get more detailed information.
Unfortunately the drivers are like a black box unless you have friends at Nvidia. Even if you are on their partner program it can be very tough to get an answer out of them.

Bear in mind that your code snippet doesn't represent a realistic application, so you may well be exercising a part of the driver that isn't optimized or isn't an expected use. Writing more "normal" code will probably make the driver happier [smile]

hth
Jack


Thank you for your answer.

I'm trying to work out my CPU budget - I have about 8000 batches in some scenes and this is a bottleneck. The geometry is simple and the GPU is idle about half the time. This rendering cycle is typical for my application. I thought the time for one state switch should stay the same, but it depends on the number of unique data buffers (VB, IB, texture). Switches to already used "states" are faster, and I don't understand why. I thought the driver should just overwrite the previous pointer with the new value and that's all, because everything we need is already in video memory.
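As a rough worked example of the budget in question, the per-batch costs measured earlier in the thread (0.7 and 1.7 microseconds, for SetStreamSource + DrawIndexedPrimitive only) can be scaled up to 8000 batches. The function name frameCostMs and the 60 Hz target mentioned below are assumptions for illustration, not something stated in the thread.

```cpp
#include <cassert>

// Estimated CPU time per frame, in milliseconds, spent submitting batches.
// microsecondsPerBatch is a measured average such as the 0.7 or 1.7
// microseconds reported earlier in the thread.
double frameCostMs(int batches, double microsecondsPerBatch) {
    assert(batches >= 0 && microsecondsPerBatch >= 0.0);
    return batches * microsecondsPerBatch / 1000.0;
}
```

With these figures, frameCostMs(8000, 0.7) is about 5.6 ms and frameCostMs(8000, 1.7) is about 13.6 ms, against roughly 16.7 ms per frame at an assumed 60 Hz target. Real frames also set indices, textures and transforms, so the true cost is likely higher than this estimate.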
Quote:Original post by antber
I have about 8000 batches in some scenes and this is a bottleneck. The geometry is simple and the GPU is idle about half the time. This rendering cycle is typical for my application.
Doesn't surprise me this is a bottleneck! The general rule-of-thumb is to try and keep to 500-1000 batches per frame as a maximum...

Quote:Original post by antber
Switches to already used "states" are faster, and I don't understand why. I thought the driver should just overwrite the previous pointer with the new value and that's all, because everything we need is already in video memory.
From your analysis it would seem to me that you're witnessing some sort of caching algorithm. The exact details of the GPU and driver are pretty guarded secrets so it is next to impossible to make guesses or assumptions about their behaviour. The behaviour you're seeing might well be due to some work-around or might reflect some unknown small cache/buffer somewhere and your code is causing "cache thrash"...


Ultimately you have a good set of tests for telling good code from bad code, so I would direct your energy at reducing those 8000 draw calls and/or the number of unique buffers.
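One common first step in that direction is to sort the draw list so that items sharing the same resources end up next to each other, and then skip Set* calls that would re-bind what is already bound. The sketch below only counts the resulting state changes. The DrawItem struct and both function names are hypothetical, and plain non-negative integer ids stand in for the texture and buffer pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw record: non-negative integer ids stand in for the
// texture, vertex buffer and index buffer pointers.
struct DrawItem {
    int texture;
    int vertexBuffer;
    int indexBuffer;
};

// Counts the Set* calls a renderer would issue if it skipped every call
// that would re-bind the resource that is already bound.
std::size_t countStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    int tex = -1, vb = -1, ib = -1;  // -1 means "nothing bound yet"
    for (const DrawItem& d : items) {
        assert(d.texture >= 0 && d.vertexBuffer >= 0 && d.indexBuffer >= 0);
        if (d.texture != tex)     { tex = d.texture;     ++changes; }
        if (d.vertexBuffer != vb) { vb = d.vertexBuffer; ++changes; }
        if (d.indexBuffer != ib)  { ib = d.indexBuffer;  ++changes; }
    }
    return changes;
}

// Orders items by texture, then vertex buffer, then index buffer, so that
// items sharing resources become adjacent.
void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.texture, a.vertexBuffer, a.indexBuffer) <
                         std::tie(b.texture, b.vertexBuffer, b.indexBuffer);
              });
}
```

For four items that alternate between two complete sets of resources, countStateChanges reports 12 changes in the original order and 6 after sortByState. Sorting does not reduce the number of draw calls by itself, but adjacent items with identical state are the natural candidates for merging into a single batch.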

hth
Jack


Yes, I'm working on grouping objects and packing textures into atlases.
The main problem is that I'm a PhD student and I need to explain as many things as I can, but those black boxes don't give me enough freedom.
Anyway, thanks for your help.
If you're a PhD student then talk with your university and see if anyone has connections with Nvidia. Companies and Universities are sufficiently big to get Nvidia's attention, so you may get lucky and find that they'll explain the black box to you [smile]

Jack

