Ripiz

[C++, DX9] Strange performance


Device creation:
D3DPRESENT_PARAMETERS InitializeParameters(){
    D3DPRESENT_PARAMETERS dx_PresParams;
    ZeroMemory( &dx_PresParams, sizeof(dx_PresParams) );
    dx_PresParams.Windowed = windowed;
    dx_PresParams.BackBufferCount = 1;
    dx_PresParams.SwapEffect = D3DSWAPEFFECT_DISCARD;
    dx_PresParams.BackBufferFormat = display.Format;
    dx_PresParams.EnableAutoDepthStencil = TRUE;
    if(SUCCEEDED(object->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, display.Format, D3DUSAGE_DEPTHSTENCIL, D3DRTYPE_SURFACE, D3DFMT_D24X8)))
        dx_PresParams.AutoDepthStencilFormat = D3DFMT_D24X8;
    else if(SUCCEEDED(object->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, display.Format, D3DUSAGE_DEPTHSTENCIL, D3DRTYPE_SURFACE, D3DFMT_D24S8)))
        dx_PresParams.AutoDepthStencilFormat = D3DFMT_D24S8;
    else if(SUCCEEDED(object->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, display.Format, D3DUSAGE_DEPTHSTENCIL, D3DRTYPE_SURFACE, D3DFMT_D16)))
        dx_PresParams.AutoDepthStencilFormat = D3DFMT_D16;
    dx_PresParams.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;
    dx_PresParams.Flags = D3DCREATE_MULTITHREADED | D3DCREATE_PUREDEVICE | D3DCREATE_HARDWARE_VERTEXPROCESSING;
    if(!windowed){
        dx_PresParams.BackBufferWidth=(int)WIDTH;
        dx_PresParams.BackBufferHeight=(int)HEIGHT;
    }
    return dx_PresParams;
}

void InitializeDevice() {
    object->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, window, D3DCREATE_MULTITHREADED | D3DCREATE_PUREDEVICE | D3DCREATE_HARDWARE_VERTEXPROCESSING, &InitializeParameters(), &device);
    Lights();
    device->SetRenderState(D3DRS_CULLMODE, D3DCULL_CCW);
    device->SetRenderState(D3DRS_LIGHTING, true);
    device->SetRenderState(D3DRS_ZENABLE, true);
    D3DXCreateFont(device,20,0,FW_NORMAL,0,FALSE,DEFAULT_CHARSET,OUT_DEFAULT_PRECIS,DEFAULT_QUALITY,DEFAULT_PITCH | FF_DONTCARE,TEXT("Arial"),&font);
    D3DXCreateFont(device,13,0,FW_NORMAL,0,FALSE,DEFAULT_CHARSET,OUT_DEFAULT_PRECIS,DEFAULT_QUALITY,DEFAULT_PITCH | FF_DONTCARE,TEXT("Arial"),&chatfont);
    device->SetSamplerState(0, D3DSAMP_MIPMAPLODBIAS, (DWORD)0.0f);
    device->SetSamplerState(0, D3DSAMP_MAXANISOTROPY, 1);

    device->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    device->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATEREQUAL);
    device->SetRenderState(D3DRS_ALPHAREF, (DWORD)1);
    device->SetRenderState(D3DRS_ALPHATESTENABLE, TRUE);
    device->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_SRCALPHA);
    device->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA);

    ID3DXBuffer *pBuffer = NULL;
    D3DXCreateEffectFromFile( device, "data/shaders/shader.fx", 0, 0, D3DXFX_NOT_CLONEABLE, 0, &shader, &pBuffer );
    if(pBuffer != NULL){
        MessageBox(window, (char*)pBuffer->GetBufferPointer(), NULL, NULL);
        exit(0);
    }
    shader->SetValue("DiffuseLightColor", Vector4(0.7f,0.7f,0.7f,1.0f), sizeof(Vector4));
    shader->SetFloat("LayerOneSharp", 0.6f);
    shader->SetFloat("LayerOneRough", 0.01f);
    shader->SetFloat("LayerOneContrib", 0.08f);
    shader->SetFloat("LayerTwoSharp", 0.55f);
    shader->SetFloat("LayerTwoRough", 2.0f);
    shader->SetFloat("LayerTwoContrib", 0.12f);
    shader->SetValue("LightPosition", Vector3(-50.0f,50.0f,100.0f), sizeof(Vector3));

    LPDIRECT3DSURFACE9 surf;
    D3DSURFACE_DESC surf_desc;
    device->GetBackBuffer(0,0,D3DBACKBUFFER_TYPE_MONO,&surf);
    surf->GetDesc(&surf_desc);
    WIDTH = (float)surf_desc.Width;
    HEIGHT = (float)surf_desc.Height;
    surf->Release();
    D3DXCreateSprite(device, &sprite);

    Matrix m_Projection;
    D3DXMatrixPerspectiveFovLH(&m_Projection, D3DX_PI/4, (float)800/600, 1, 10000);
    device->SetTransform(D3DTS_PROJECTION, &m_Projection);
}






Performance Analysis:


Why does d3dx9_42.dll take so much CPU? I see 50% all the time in Task Manager (dual-core CPU), while professional games (Assassin's Creed, etc.) keep it below 50%.

[Edited by - Ripiz on July 20, 2010 8:58:31 AM]

You only posted your init code, but the time is more likely spent in your render loop.

I see you use D3DXSprite. This is part of d3dx9_42.dll, so if you make many calls through this class the result is no surprise.

Running at 50% on a dual-core machine just means that you are CPU limited, while the games running at less than 50% are limited by their GPU.

To work out which code to optimize, the first thing to find out is which functions in d3dx9_42.dll are taking most of the processor time. You can then track down where they are being called from and whether there's any way to call them less often or otherwise make the game run quicker.

Having the task manager say that the CPU load is at 50% all the time is good - it means your game is fully utilizing one CPU core, which it generally should do. If it's taking less than 100% of one CPU core then it's not running as fast as it could be - it's possibly GPU limited and not CPU limited (although in my experience the driver will usually sit in a busy wait loop eating CPU time when it's waiting for the GPU to catch up).

Under some circumstances it might be worth giving Windows a small portion of that CPU time back so that it's more responsive on single core systems, especially when running in windowed mode, but don't do that when optimizing for performance.

Some comments on that device creation code:

- Have you tested performance without D3DCREATE_PUREDEVICE? It can be quicker.
- D3DCREATE_MULTITHREADED adds some performance overhead. Only specify that if you're calling D3D functions from multiple threads.

- I expect you're getting a nasty warning from the compiler about taking the address of a temporary for "&InitializeParameters()". I'd recommend copying the function result to a local variable.
- Check out the documentation for what should be in the flags member of the D3DPRESENT_PARAMETERS structure. You probably want to set it to 0.
- You've set D3DSAMP_MIPMAPLODBIAS incorrectly. Your code will work for 0 but fail for any other value. You need to pass *(DWORD*)&floatVariable to get it right (see the sketch after this list).
- If you're using shaders and not the fixed function pipeline you don't need to change D3DRS_LIGHTING. I don't think it will cause any problems though.
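
For the D3DSAMP_MIPMAPLODBIAS point, a minimal sketch of that cast (the helper name is made up, not an existing D3DX function):

// SetSamplerState() takes a DWORD, but D3DSAMP_MIPMAPLODBIAS expects the raw bit
// pattern of a float, so (DWORD)bias would truncate any fractional bias to an integer.
inline void SetMipMapLodBias(IDirect3DDevice9 *device, DWORD sampler, float bias){
    device->SetSamplerState(sampler, D3DSAMP_MIPMAPLODBIAS, *(DWORD*)&bias);
}
// e.g. SetMipMapLodBias(device, 0, 0.5f); with (DWORD)0.5f the driver would have seen 0 instead.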

New device code:
D3DPRESENT_PARAMETERS InitializeParameters(){
    D3DPRESENT_PARAMETERS dx_PresParams;
    ZeroMemory( &dx_PresParams, sizeof(dx_PresParams) );
    dx_PresParams.Windowed = windowed;
    dx_PresParams.BackBufferCount = 1;
    dx_PresParams.SwapEffect = D3DSWAPEFFECT_DISCARD;
    dx_PresParams.BackBufferFormat = display.Format;
    dx_PresParams.EnableAutoDepthStencil = TRUE;
    dx_PresParams.AutoDepthStencilFormat = D3DFMT_D16; // temporarily
    dx_PresParams.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;
    dx_PresParams.Flags = 0;
    if(!windowed){
        dx_PresParams.BackBufferWidth=(int)WIDTH;
        dx_PresParams.BackBufferHeight=(int)HEIGHT;
    }
    return dx_PresParams;
}

void InitializeDevice(){
    D3DPRESENT_PARAMETERS params = InitializeParameters();
    object->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, window, D3DCREATE_MULTITHREADED | D3DCREATE_PUREDEVICE | D3DCREATE_HARDWARE_VERTEXPROCESSING, &params, &device);
    Lights();
    device->SetRenderState(D3DRS_CULLMODE, D3DCULL_CCW);
    device->SetRenderState(D3DRS_ZENABLE, true);
    D3DXCreateFont(device,20,0,FW_NORMAL,0,FALSE,DEFAULT_CHARSET,OUT_DEFAULT_PRECIS,DEFAULT_QUALITY,DEFAULT_PITCH | FF_DONTCARE,TEXT("Arial"),&font);
    D3DXCreateFont(device,13,0,FW_NORMAL,0,FALSE,DEFAULT_CHARSET,OUT_DEFAULT_PRECIS,DEFAULT_QUALITY,DEFAULT_PITCH | FF_DONTCARE,TEXT("Arial"),&chatfont);
    device->SetSamplerState(0, D3DSAMP_MAXANISOTROPY, 1);

    device->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    device->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATEREQUAL);
    device->SetRenderState(D3DRS_ALPHAREF, (DWORD)1);
    device->SetRenderState(D3DRS_ALPHATESTENABLE, TRUE);
    device->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_SRCALPHA);
    device->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA);

    ID3DXBuffer *pBuffer = NULL;
    D3DXCreateEffectFromFile( device, "data/shaders/shader.fx", 0, 0, D3DXFX_NOT_CLONEABLE, 0, &shader, &pBuffer );
    if(pBuffer != NULL){
        MessageBox(window, (char*)pBuffer->GetBufferPointer(), NULL, NULL);
        exit(0);
    }
    shader->SetValue("DiffuseLightColor", Vector4(0.7f,0.7f,0.7f,1.0f), sizeof(Vector4));
    shader->SetFloat("LayerOneSharp", 0.6f);
    shader->SetFloat("LayerOneRough", 0.01f);
    shader->SetFloat("LayerOneContrib", 0.08f);
    shader->SetFloat("LayerTwoSharp", 0.55f);
    shader->SetFloat("LayerTwoRough", 2.0f);
    shader->SetFloat("LayerTwoContrib", 0.12f);
    shader->SetValue("LightPosition", Vector3(-50.0f,50.0f,100.0f), sizeof(Vector3));

    LPDIRECT3DSURFACE9 surf;
    D3DSURFACE_DESC surf_desc;
    device->GetBackBuffer(0,0,D3DBACKBUFFER_TYPE_MONO,&surf);
    surf->GetDesc(&surf_desc);
    WIDTH = (float)surf_desc.Width;
    HEIGHT = (float)surf_desc.Height;
    surf->Release();
    D3DXCreateSprite(device, &sprite);

    Matrix m_Projection;
    D3DXMatrixPerspectiveFovLH(&m_Projection, D3DX_PI/4, (float)800/600, 1, 10000);
    device->SetTransform(D3DTS_PROJECTION, &m_Projection);
}



I draw only two lines of text (FPS and average frame time) using D3DXSprite; this results in 375 FPS on average (2.666 ms).

D3DCREATE_PUREDEVICE doesn't seem to make any difference, or it's very small.

I do use a separate thread to load and unload textures. However, if you think this way of loading resources doesn't reduce runtime loading lag, I can remove it; it's the only place where I use multi-threading.

Skinned mesh drawing code:

void Animation::DrawMeshFrame(D3DXFRAME* pFrame){
    Container* pMC = ( Container* )pFrame->pMeshContainer;
    D3DXMATRIX mx;
    if( pMC->pSkinInfo == NULL )
        return;
    LPD3DXBONECOMBINATION pBC = ( LPD3DXBONECOMBINATION )( pMC->m_pBufBoneCombos->GetBufferPointer() );
    DWORD dwAttrib, dwPalEntry;
    shader->SetTechnique("Skinning");
    for( dwAttrib = 0; dwAttrib < pMC->m_dwNumAttrGroups; ++ dwAttrib ){
        for( dwPalEntry = 0; dwPalEntry < pMC->m_dwNumPaletteEntries; ++ dwPalEntry ){
            DWORD dwMatrixIndex = pBC[ dwAttrib ].BoneId[ dwPalEntry ];
            if( dwMatrixIndex != UINT_MAX )
                D3DXMatrixMultiply( &pMC->m_amxWorkingPalette[ dwPalEntry ], &( pMC->m_amxBoneOffsets[ dwMatrixIndex ] ), pMC->m_apmxBonePointers[ dwMatrixIndex ] ); // 0.5% CPU time
        }
        shader->SetMatrixArray( "amPalette", pMC->m_amxWorkingPalette, pMC->m_dwNumPaletteEntries ); // <0.1% CPU to set 4*16*70 size value (float size * number of floats in matrix * number of bones)
        device->SetTexture(0, pMC->materials[pBC[ dwAttrib ].AttribId].texture->GetTexture());
        shader->SetInt( "CurNumBones", pMC->m_dwMaxNumFaceInfls - 1 ); // 0.2% wtf 4 bytes slower than 4kb?
        unsigned int uiPasses;
        shader->Begin(&uiPasses, 0); // 1% wth
        shader->BeginPass( 0 );
        pMC->m_pWorkingMesh->DrawSubset( dwAttrib ); // 0.9% yay?
        shader->EndPass(); // 1.1% another wth
        shader->End();
    }
}



A little more code:

void Animation::UpdateFrame(LPD3DXFRAME frame, Matrix *mx){
    D3DXMatrixMultiply( &frame->TransformationMatrix, &frame->TransformationMatrix, mx ); // 3%, but in drawing code exactly same multiplication is 0.5%, wtf?
    // transform siblings by the same matrix
    if( frame->pFrameSibling )
        UpdateFrame( frame->pFrameSibling, mx );

    // transform children by the transformed matrix - hierarchical transformation
    if( frame->pFrameFirstChild )
        UpdateFrame( frame->pFrameFirstChild, &frame->TransformationMatrix );
}



Drawing two lines of text took 6.3%
device->Present(NULL, NULL, NULL, NULL); takes 4% (why so long?)



Thanks for all the help.

Quote:
Original post by Ripiz
Drawing two lines of text took 6.3%
Is that 6.3% overall, 6.3% for the first frame, or 6.3% for all but the first frame? The first frame will be slow because ID3DXFont has to render all of the characters to its internal texture. After the characters are rendered to the texture, it'll be very quick to draw text (just as fast as rendering sprites, which results in one dynamic vertex buffer lock and probably one DrawPrimitive() call, depending on the size of your text).
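
If that first-frame cost turns out to matter, the glyphs can be pushed into the font's internal texture at startup rather than on first draw; a minimal sketch, assuming the font and chatfont objects from the posted init code:

// Warm ID3DXFont's glyph cache during initialization so the first frame
// doesn't pay for rasterizing the characters into the internal texture.
font->PreloadCharacters(32, 126);      // printable ASCII range
chatfont->PreloadCharacters(32, 126);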

Quote:
Original post by Ripiz
device->Present(NULL, NULL, NULL, NULL); takes 4% (why so long?)
Because calling Present() tells the GPU to finish up all draw calls and present the backbuffer, so it'll probably be waiting for the GPU to finish drawing to the backbuffer before it can swap.

Quote:
Original post by Evil Steve
Is that 6.3% overall, 6.3% for the first frame, or 6.3% for all but the first frame?

That's the overall average. I have a few non-power-of-2 sprites on my login screen plus extra text (as well as the character selection screen), but I go through those in less than 5 seconds. Then I wait 30-60 seconds while the game is rendered: two lines of text, one terrain model, four house models, and 15 skinned models. All models use a toon shader; the skinned models also use hardware skinning.
I assume 30-60 seconds is enough to negate the loading time (the profile doesn't show that loading the terrain or houses took any CPU time, so I guess it gets negated).
Basically, my single core is 100% used up, and almost 7% of it is taken by two lines of text. I'll profile for longer to get more accurate results.

IIRC the way you used threads to load textures didn't make much sense...

You are in the main thread rendering ... you need to set a texture ... it's not loaded.

You go to the texture resource manager and it starts a thread to load the texture ... you wait for the thread in the main thread.

The thread starts and loads the texture using calls against the device. When it completes it exits, and the main thread resumes.

So this doesn't really win you anything. It's not like your main thread simply skips rendering the object for which the texture isn't loaded, it just blocked until the texture was loaded ... you might as well do that in the same thread instead of creating a new one to do it.

If instead you were background loading the textures for areas that were not currently in the viewing frustum ... and not blocking the main thread on their successful load ... then it would make more sense.

Another small tip: during initialization, get handles to all the uniform externs in your FX files, then use those handles for calls like SetInt() instead of the string name.
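
For example, a minimal sketch using the parameter names from the code posted above ("Skinning", "CurNumBones", "amPalette"); the handle variables themselves are illustrative:

D3DXHANDLE hSkinningTech, hCurNumBones, hAmPalette;

void CacheEffectHandles(){
    // Do the string lookups once at initialization...
    hSkinningTech = shader->GetTechniqueByName("Skinning");
    hCurNumBones  = shader->GetParameterByName(NULL, "CurNumBones");
    hAmPalette    = shader->GetParameterByName(NULL, "amPalette");
}

// ...then reuse the handles every frame in DrawMeshFrame(), e.g.:
//     shader->SetTechnique(hSkinningTech);
//     shader->SetMatrixArray(hAmPalette, pMC->m_amxWorkingPalette, pMC->m_dwNumPaletteEntries);
//     shader->SetInt(hCurNumBones, pMC->m_dwMaxNumFaceInfls - 1);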

Also, I'm not sure specifically what you are asking. How many milliseconds does it take your game to render an average frame? Is it too slow? What are you trying to optimize?

I think the problem may be here:

dx_PresParams.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;

It is very cycle intensive. Try using D3DPRESENT_INTERVAL_ONE instead; I saw my CPU usage drop from 50% to 2% when I made this switch in a project of mine.

Right now I have it this way:
The main thread requests a texture.
If the texture is loaded, the manager returns it; if not, the texture manager starts a new thread and returns 0 (as if no texture were present).
The frame gets rendered without the texture.
On the next frame, the main thread requests the texture again.
If the texture is loaded by then, it returns the texture; if it's still loading, it returns 0.
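
A minimal sketch of what such a non-blocking lookup could look like (the class, member names, and locking scheme are illustrative, not the actual code):

#include <d3d9.h>
#include <d3dx9.h>
#include <process.h>
#include <map>
#include <set>
#include <string>

class TextureManager {
public:
    explicit TextureManager(IDirect3DDevice9 *device) : m_device(device){
        InitializeCriticalSection(&m_lock);
    }

    // Returns the texture if it is resident; otherwise starts a background load
    // and returns NULL so the caller renders untextured this frame.
    IDirect3DTexture9* GetTexture(const std::string &path){
        EnterCriticalSection(&m_lock);
        std::map<std::string, IDirect3DTexture9*>::iterator it = m_loaded.find(path);
        if(it != m_loaded.end()){
            IDirect3DTexture9 *tex = it->second;
            LeaveCriticalSection(&m_lock);
            return tex;
        }
        if(m_pending.insert(path).second){
            // First request for this texture: spawn a loader thread.
            _beginthread(&TextureManager::LoadThreadProc, 0, new Request(this, path));
        }
        LeaveCriticalSection(&m_lock);
        return NULL;
    }

private:
    struct Request {
        TextureManager *mgr;
        std::string path;
        Request(TextureManager *m, const std::string &p) : mgr(m), path(p){}
    };

    static void __cdecl LoadThreadProc(void *arg){
        Request *req = static_cast<Request*>(arg);
        IDirect3DTexture9 *tex = NULL;
        // Creating resources from another thread requires the device to have
        // been created with D3DCREATE_MULTITHREADED.
        D3DXCreateTextureFromFileA(req->mgr->m_device, req->path.c_str(), &tex);
        EnterCriticalSection(&req->mgr->m_lock);
        req->mgr->m_loaded[req->path] = tex;      // NULL stays in the map if the load failed
        req->mgr->m_pending.erase(req->path);
        LeaveCriticalSection(&req->mgr->m_lock);
        delete req;
    }

    IDirect3DDevice9 *m_device;
    CRITICAL_SECTION m_lock;
    std::map<std::string, IDirect3DTexture9*> m_loaded;
    std::set<std::string> m_pending;
};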

Quote:
Original post by Steve_Segreto
Another small tip - during initialization time get handles to all the uniform externs in your FX files, then use those handles for stuff like SetInt() etc, rather than the string name.

Thanks! I'll try this; hopefully it speeds things up.

Quote:
Original post by Steve_Segreto
Also I'm not sure specifically what you are asking? How many milliseconds does it take for your game to render an average frame? Is it too slow? What are you trying to optimize?

I just feel like my code is far too CPU dependent; I fully use one core and my framerate is limited by the CPU. I'm just trying to find out why I have such high CPU usage.

First of all, why are you using sprites to draw text? This seems like a horrible way of doing it. Compare with a font texture and see how things are.

Secondly, you're using both ID3DXSprite and ID3DXEffect, and both will be making calls into d3dx9_42.dll every frame, hence the fact that d3dx9_42.dll is where all your CPU is going. You can use PIX to examine the D3D calls that these make behind the scenes and get more info on what's going on, but I bet that there's also a lot of CPU work.

Thirdly, if you're drawing each sprite with its own draw call, you can be sure that's the source of your heavy CPU usage. Again, put all your characters into a font texture, batch up your calls, and use a single draw call for everything.
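
A minimal sketch of that batching approach, assuming a pre-built glyph atlas (GlyphRect, the atlas texture, and the dynamic vertex buffer are hypothetical, not from the posted code):

#include <d3d9.h>
#include <string.h>

struct FontVertex {
    float x, y, z, rhw;   // pre-transformed screen-space position
    DWORD color;
    float u, v;           // atlas texture coordinates
};
#define FONT_FVF (D3DFVF_XYZRHW | D3DFVF_DIFFUSE | D3DFVF_TEX1)

struct GlyphRect { float u0, v0, u1, v1, width, height; };   // one entry per character

// vb must be created with D3DUSAGE_DYNAMIC and be large enough for the string.
void DrawString(IDirect3DDevice9 *device, IDirect3DVertexBuffer9 *vb,
                IDirect3DTexture9 *atlas, const GlyphRect *glyphs,
                const char *text, float x, float y, DWORD color){
    size_t len = strlen(text);
    FontVertex *v = NULL;
    vb->Lock(0, (UINT)(len * 6 * sizeof(FontVertex)), (void**)&v, D3DLOCK_DISCARD);
    float penX = x;
    for(size_t i = 0; i < len; ++i){
        const GlyphRect &g = glyphs[(unsigned char)text[i]];
        float x0 = penX, x1 = penX + g.width, y0 = y, y1 = y + g.height;
        FontVertex quad[6] = {
            { x0, y0, 0.0f, 1.0f, color, g.u0, g.v0 }, { x1, y0, 0.0f, 1.0f, color, g.u1, g.v0 },
            { x0, y1, 0.0f, 1.0f, color, g.u0, g.v1 }, { x0, y1, 0.0f, 1.0f, color, g.u0, g.v1 },
            { x1, y0, 0.0f, 1.0f, color, g.u1, g.v0 }, { x1, y1, 0.0f, 1.0f, color, g.u1, g.v1 },
        };
        memcpy(v + i * 6, quad, sizeof(quad));
        penX += g.width;
    }
    vb->Unlock();
    device->SetTexture(0, atlas);
    device->SetFVF(FONT_FVF);
    device->SetStreamSource(0, vb, 0, sizeof(FontVertex));
    device->DrawPrimitive(D3DPT_TRIANGLELIST, 0, (UINT)(len * 2));   // one draw call for the whole string
}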

Fourthly, as has been said, high CPU usage overall is what you want. It means that your CPU is working as fast as it can for you, and this is a desirable situation. If a professional game is using less than 50% CPU (i.e. less than 100% of one core on a dual-core), it means the CPU is going idle when it could be processing game logic instead. Maybe you're just being paranoid about it?
