supmagc

[DX9] IDirect3DDevice9::Present() takes way too long


Hey, as a school project I'm currently developing a rendering framework with some features to enable gameplay programming. Everything was going great until today. A few hours ago I started testing the rendering performance on large scenes, and as it turns out, with only 200 planes to render (shared vertex/index buffer) my framerate barely reaches 4 FPS. I profiled the render cycle, and it looks as if calling Present(0, 0, 0, 0) on my device pointer takes about 200 ms. It might be useful to know that the framework renders the scene on a separate thread from the main thread and that the device is created with the D3DCREATE_MULTITHREADED flag (which doesn't seem to make any difference).
// Behaviour flags
devBehaviourFlags = D3DCREATE_PUREDEVICE | D3DCREATE_MULTITHREADED | D3DCREATE_HARDWARE_VERTEXPROCESSING;

// presentParams
m_ePP.BackBufferWidth = 1280;
m_ePP.BackBufferHeight = 768;
m_ePP.BackBufferFormat = bFullscreen ? D3DFMT_X8R8G8B8 : m_eDisplayMode.Format;
m_ePP.BackBufferCount = 1;
m_ePP.MultiSampleType = D3DMULTISAMPLE_NONE;
m_ePP.MultiSampleQuality = 0;
m_ePP.SwapEffect = D3DSWAPEFFECT_DISCARD;
m_ePP.hDeviceWindow = KiCodil::GetRenderWindow()->GetWindow();
m_ePP.Windowed = !bFullscreen;
m_ePP.EnableAutoDepthStencil = true;
m_ePP.AutoDepthStencilFormat = D3DFMT_D24X8;
m_ePP.Flags = 0;
m_ePP.FullScreen_RefreshRateInHz = D3DPRESENT_RATE_DEFAULT;
m_ePP.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;

// before rendering
HR(m_pDevice->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER, D3DCOLOR_RGBA(255, 255, 255, 0), 1.0f, 0));
HR(m_pDevice->BeginScene());

// after rendering
HR(m_pDevice->EndScene());
HR(m_pDevice->Present(NULL, NULL, NULL, NULL)); // <= this devil takes about 200 ms each frame
So, does anybody have any ideas on how to speed up Present(0, 0, 0, 0)?

1: is this debug or release?

2: How are you profiling? With your own code? If so, can you describe it? If another program, which one?

3: I'm assuming this is in windowed mode? If so, try fullscreen. Do you get the same results?

4: Have you tried cranking up your poly count to see if it actually increases the time it takes?

Quote:
Original post by freeworld
1: is this debug or release?

This happens in debug mode as well as in release mode.

Quote:
Original post by freeworld
2: How are you profiling? With your own code? If so, can you describe it? If another program, which one?

I'm using a small piece of code that I call before and after the Present(...) call:

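// Record the counter value at the start of the timed section.
void Start() {
    QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&m_nMarker));
}

// Accumulate the seconds elapsed since Start() and count the sample.
void Stop() {
    __int64 nTmp;
    QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&nTmp));
    m_nSecondsElapsed += static_cast<double>(nTmp - m_nMarker) * m_nSecondsPerTick;
    ++m_nLoopCount;
}

// Publish the average seconds per sample, then reset the accumulators.
void Update() {
    if (m_nLoopCount > 0) // avoid dividing by zero when nothing was timed
        m_nTiming = static_cast<float>(m_nSecondsElapsed / m_nLoopCount);
    m_nLoopCount = 0;
    m_nSecondsElapsed = 0;
}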
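
For readers who want to try the same start/stop/average pattern outside of Windows, here is a minimal sketch using std::chrono instead of QueryPerformanceCounter. The class name SectionTimer and its members are hypothetical and not part of the framework discussed in this thread. Note that a CPU-side timer around Present measures how long the CPU waits in that call, which may include waiting for the GPU to finish queued work.

```cpp
#include <chrono>

// Hypothetical helper (not from the original framework): accumulates the time
// spent in a code section and reports the average per Start()/Stop() pair.
class SectionTimer {
    using Clock = std::chrono::steady_clock;

public:
    // Mark the beginning of the timed section.
    void Start() { m_start = Clock::now(); }

    // Add the time since Start() to the running total and count the sample.
    void Stop() {
        m_elapsed += Clock::now() - m_start;
        ++m_count;
    }

    // Return the average duration in milliseconds since the last Update(),
    // then reset the accumulators. Returns 0 if nothing was timed.
    double Update() {
        double avgMs = 0.0;
        if (m_count > 0) {
            avgMs = std::chrono::duration<double, std::milli>(m_elapsed).count() / m_count;
        }
        m_elapsed = Clock::duration::zero();
        m_count = 0;
        return avgMs;
    }

    // Number of samples collected since the last Update().
    unsigned Count() const { return m_count; }

private:
    Clock::time_point m_start{};
    Clock::duration m_elapsed{Clock::duration::zero()};
    unsigned m_count = 0;
};
```

Typical use would be to call Start() right before the section of interest, Stop() right after it, and Update() once per second or so to read the average.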


Quote:
Original post by freeworld
3: I'm assuming this is in windowed mode? If so, try fullscreen. Do you get the same results?

Switching from windowed to fullscreen seems to help a little, but not nearly enough.

Quote:
Original post by freeworld
4: Have you tried cranking up your poly count to see if it actually increases the time it takes?

Adding more meshes increases the time, removing meshes decreases it

How long is a total frame taking? What's your FPS? I'll reiterate: you should dramatically increase the amount you are drawing. You'll find it hard to see differences in speed if you're not doing much in the first place.

I use a different timer for the total FPS than I use for the profiler-timing.
(the FPS-counter I use comes from the book: 3D Game-Programming, a shader approach)

When I have a scene with 1 geometric object (44 vertices) and 200 planes (each consisting of 4 vertices, 2 triangles), my framerate is 4 FPS.
Just to be sure, I had a look at the FPS monitoring in Fraps, which said about the same thing.

I'm not exactly sure that I'm using CodeAnalyst the right way, but here is my shot:

95.8% of the samples were taken in the symbol 'QueryOglResource', which is part of nvd3dum.dll (some DLL from NVIDIA).


I also tested the debug-build on the desktop of a friend, ... same result

1: open codeanalyst and choose express profile from the wizard menu that pops up.

2: In the launch option browse and find your exe you want to profile.

3: make sure the working directory option is the same directory your exe is in.

4: When your app starts up wait till the green progress bar in the lower right finishes, then close your app.

5: in the window that pops up, find your app and double click on it.

6: This should be a huge list of all the functions that get called from your app, and a percentage of the total time they take up.

7: Look at the ones that take up the most time, looking first at your own functions and not third-party functions like std::_vector... and so forth.

8: Find the chunks of code that take up the most time, and feel free to post one or two.

[Edited by - freeworld on December 3, 2009 7:12:34 PM]

It's setting the world transform that's causing your problem. I ran into this a few months back in my game engine. Unfortunately, the only way around the problem is to reduce the number of calls to SetTransform.

Options...
Switch to instancing; this will go a long way towards reducing the need to call SetTransform.
If you are doing particles, switch to DX sprites. They are batched by DX and are much faster than any particle system you are likely to write, unless you are doing GPU sprite handling...

Using instancing (ex. asteroids) I can render approx 300k meshes per second on a dual core AMD 4200+ and an ATI 1650.
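
As a rough illustration of cutting down per-object transform calls, here is a sketch of CPU-side static batching. This is a different technique from the hardware instancing named above: it assumes the planes do not move and share one material, and it bakes each plane's world matrix into its vertices so the whole set could be drawn with a single world transform and a single indexed draw call. All names (Vec3, Mat4, Batch, BuildPlaneBatch) are hypothetical, and whether this addresses the slowdown in this thread is not established.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// 4x4 matrix in the row-vector convention (v' = v * M) with the translation
// in the fourth row, matching the layout Direct3D 9 uses.
struct Mat4 { float m[4][4]; };

Mat4 Translation(float tx, float ty, float tz) {
    return Mat4{{{1.0f, 0.0f, 0.0f, 0.0f},
                 {0.0f, 1.0f, 0.0f, 0.0f},
                 {0.0f, 0.0f, 1.0f, 0.0f},
                 {tx,   ty,   tz,   1.0f}}};
}

// Transform a point (w = 1) by an affine matrix.
Vec3 TransformPoint(const Vec3& v, const Mat4& M) {
    return Vec3{
        v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0] + M.m[3][0],
        v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1] + M.m[3][1],
        v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2] + M.m[3][2]};
}

struct Batch {
    std::vector<Vec3> vertices;          // already in world space
    std::vector<std::uint16_t> indices;  // 16-bit, so at most 65536 vertices
};

// Append one unit quad (4 vertices, 2 triangles) per world matrix.
Batch BuildPlaneBatch(const std::vector<Mat4>& worlds) {
    static const Vec3 kQuad[4] = {
        {-0.5f, -0.5f, 0.0f}, {-0.5f, 0.5f, 0.0f},
        { 0.5f,  0.5f, 0.0f}, { 0.5f, -0.5f, 0.0f}};
    static const std::uint16_t kQuadIndices[6] = {0, 1, 2, 0, 2, 3};

    Batch batch;
    batch.vertices.reserve(worlds.size() * 4);
    batch.indices.reserve(worlds.size() * 6);
    for (const Mat4& world : worlds) {
        // Indices of this quad are offset by the vertices already stored.
        const std::size_t base = batch.vertices.size();
        for (const Vec3& v : kQuad) {
            batch.vertices.push_back(TransformPoint(v, world));
        }
        for (std::uint16_t i : kQuadIndices) {
            batch.indices.push_back(static_cast<std::uint16_t>(base + i));
        }
    }
    return batch;
}
```

For 200 planes this yields 800 vertices and 1200 indices, which would be uploaded once into one vertex buffer and one index buffer. The trade-off is that moving a plane means rewriting its vertices, so the approach only suits geometry that rarely changes.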

Spending 200 ms in Present strongly suggests to me that you're GPU bound, that is, you're giving the video card too much work to do. You're not using the REF device, are you?

Given that you're clearly not geometry bound, I'd suspect either fillrate or the pixel shader. How much overdraw is there?

Anyway, the first thing I'd suggest is getting hold of PerfHud. The documentation for that will guide you through tracking down your bottleneck.
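
To put the overdraw question in numbers, here is a back-of-the-envelope sketch. It assumes the worst case, in which each of the 200 planes covers the entire 1280x768 back buffer; the thread does not say how large the planes actually are, so treat the result as an upper bound and not a measurement. The function names are hypothetical.

```cpp
#include <cstdint>

// Pixel writes per frame if `layers` surfaces each cover the whole back buffer.
std::uint64_t PixelWritesPerFrame(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t layers) {
    return std::uint64_t{width} * height * layers;
}

// Duration of one frame in milliseconds at a given frame rate.
double FrameMilliseconds(double fps) {
    return 1000.0 / fps;
}
```

Under that assumption, PixelWritesPerFrame(1280, 768, 200) gives 196,608,000 pixel writes per frame, and FrameMilliseconds(4.0) gives 250 ms, so the reported 200 ms in Present would be roughly 80% of each frame. If the planes are small on screen, the real figure is far lower and fill rate becomes a less likely culprit.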
