Is D3D10 slower than D3D9???

Started by undead; 8 comments, last by Demirug.
Hi, I've spent the last two months writing a small framework for fun at home in my spare time. Nothing special, classic C++/DX9. Since the framework is supposed to be a testing lab for things I have to implement at work, I decided yesterday to make the big step and add a DX10 rendering subsystem to my framework. I had never seen any DX10 code snippet until yesterday morning, so I'm probably missing something.

The test application loop is simple:
- set a render target (one RGBA8888 RT with a depthstencil buffer)
- clear the render target and the depthstencil buffer
- render a textured rectangle from my camera POV using an indexed draw call
- set the original render target/depthstencil buffer
- clear the render target and the depthstencil buffer
- set the previous render target texture as texture0 input
- render the same textured rectangle from the same camera using an indexed draw call
- unset the texture0 input (this prevents the driver from raising warnings when setting the render target at the beginning of the next frame)
- present on screen

In DX9 the application runs at 520fps; in DX10 it runs at 280! The situation is the same in fullscreen mode: 800fps vs 400fps.

Things I tried to speed up the DX10 version:
- remove dynamic IBs and VBs and try immutable and default ones
- use DXGI_FORMAT_D24_UNORM_S8_UINT instead of DXGI_FORMAT_D32_FLOAT
- change the refresh rate
- force the device singlethreaded
- remove SetRasterizerState from my .fx file
- play with the VS input, changing position from float4 to float3 and letting the VS add the w component

Whatever I change, speed is always 280fps. There are no driver errors/warnings/infos displayed in the MSVC output tab, and the only difference I can see is that I still use SetDepthStencilState in my .fx file, while the DX9 rendering subsystem sets the correct ztest parameters just once at startup. The application isn't multisampled, the backbuffer format is the same, and the texture filtering in my .fx files is the same.

A few notes about the classes:
- the shaders internally save the technique and the technique desc, so that I don't have to ask for them every time I set a shader parameter
- IBs and VBs should be as fast as possible, being immutable/default
- the input layout is correctly created only once, and is just made up of a position and a texcoord (float3 and float2)

I'm testing the application on a notebook with an 8600M, so it's a first-gen DX10 device, but since I'm not using Geometry Shaders I expect the same application to run at the same speed regardless of the DX version! I don't know where to look. I wonder if somebody has done simple speed tests comparing DX10 and DX9, or could give me a hint about common speed issues newbies like me encounter when writing DX10 apps. Thank you!
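Edit: for reference, here is roughly what my frame loop looks like (a trimmed sketch; device/view/effect setup is omitted and all names are changed for clarity):

```cpp
#include <d3d10.h>

// Created at startup (not shown); all names here are illustrative.
extern ID3D10Device*             g_device;
extern IDXGISwapChain*           g_swapChain;
extern ID3D10RenderTargetView*   g_offscreenRTV;   // the RGBA8888 RT
extern ID3D10DepthStencilView*   g_offscreenDSV;
extern ID3D10RenderTargetView*   g_backbufferRTV;  // the original RT
extern ID3D10DepthStencilView*   g_backbufferDSV;
extern ID3D10ShaderResourceView* g_offscreenSRV;   // same texture, as input
void DrawTexturedRectangle();                      // the indexed draw call

void RenderFrame()
{
    const FLOAT clearColor[4] = { 0.0f, 0.0f, 0.0f, 1.0f };

    // Pass 1: draw the textured rectangle into the offscreen target.
    g_device->OMSetRenderTargets(1, &g_offscreenRTV, g_offscreenDSV);
    g_device->ClearRenderTargetView(g_offscreenRTV, clearColor);
    g_device->ClearDepthStencilView(g_offscreenDSV,
        D3D10_CLEAR_DEPTH | D3D10_CLEAR_STENCIL, 1.0f, 0);
    DrawTexturedRectangle();

    // Pass 2: draw the same rectangle using the offscreen texture as texture0.
    g_device->OMSetRenderTargets(1, &g_backbufferRTV, g_backbufferDSV);
    g_device->ClearRenderTargetView(g_backbufferRTV, clearColor);
    g_device->ClearDepthStencilView(g_backbufferDSV,
        D3D10_CLEAR_DEPTH | D3D10_CLEAR_STENCIL, 1.0f, 0);
    g_device->PSSetShaderResources(0, 1, &g_offscreenSRV);
    DrawTexturedRectangle();

    // Unbind texture0 so the driver doesn't warn when the texture is bound
    // as a render target again at the start of the next frame.
    ID3D10ShaderResourceView* nullSRV = NULL;
    g_device->PSSetShaderResources(0, 1, &nullSRV);

    g_swapChain->Present(0, 0);
}
```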
Quote: Original post by undead
In DX9 the application runs at 520fps; in DX10 it runs at 280!
The situation is the same in fullscreen mode: 800fps vs 400fps.


FPS is not a meaningful measure of performance.

Using the numbers you posted: there is a difference of 0.0016s, i.e. 1.6ms, in the time for each frame in windowed mode, and a difference of 0.00125s, i.e. 1.25ms, in fullscreen mode.
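Spelled out (the conversion is just 1/fps):

```cpp
// Convert frame rates to per-frame times before comparing; rates hide scale.
double d3d9_ms  = 1000.0 / 520.0;      // ~1.92 ms per frame
double d3d10_ms = 1000.0 / 280.0;      // ~3.57 ms per frame
double delta_ms = d3d10_ms - d3d9_ms;  // ~1.65 ms of extra work per frame
```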

These numbers are so small they might as well not exist.

As for what is causing this difference, I can't really comment. I wouldn't worry about it unless it's actually making a tangible difference, though.

Simple answers:

1) Is D3D10 inherently slower than D3D9? No. In fact, it has the potential to be faster.

2) Is your driver's implementation of D3D10 slower than its D3D9 one? Quite possibly. Several years after the fact, the drivers are still improving. Unfortunately there's just not much D3D10 software out there (let's not talk about whose fault that is), so there's not much push to focus on its performance.

3) Is the way you are using D3D10 optimal? Almost certainly not. Especially if you're simply wrapping it in a framework that was built around D3D9 concepts. D3D10 must be used differently if you want to get peak performance out of it, so renderers built to support both will rarely do so optimally for both cases.
// The user formerly known as Tojiro67445, formerly known as Toji [smile]
Thank you for your replies.

I've found the problem: my render target format was 128-bit RGBA, while the reference DX9 application uses a standard R8G8B8A8 RT.
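For anyone searching later, the fix boils down to creating the offscreen target with the 8-bit-per-channel format (a trimmed sketch; names are placeholders):

```cpp
#include <d3d10.h>

// Assumed available: the device and the window size.
ID3D10Texture2D* CreateOffscreenTarget(ID3D10Device* device, UINT width, UINT height)
{
    D3D10_TEXTURE2D_DESC desc = {};
    desc.Width            = width;
    desc.Height           = height;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM; // 32 bits/pixel, not 128
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D10_USAGE_DEFAULT;
    desc.BindFlags        = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE;

    ID3D10Texture2D* texture = NULL;
    if (FAILED(device->CreateTexture2D(&desc, NULL, &texture)))
        return NULL;
    // A 128-bit format such as DXGI_FORMAT_R32G32B32A32_FLOAT quadruples the
    // bandwidth cost of every pixel written to (or sampled from) this target.
    return texture;
}
```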

Now the DX10 application is running at full speed, and it's as fast as the DX9 one: 570fps and 1000+fps.

Toji:

I'm a total newbie at DX10, so you're right about my rendering subsystem: it is closer to the way DX9 works. I use it to test ideas, and rewriting part of the subsystem to make it more DX10-friendly is on my todo list. :)

Just one word of warning, as I am seeing this mistake in a lot of beginners' Direct3D 10 shader code.

While Direct3D 9 removes unused shader constants, Direct3D 10 will only remove a complete constant buffer when not a single element of it is used. As all constants that are not assigned to a buffer are stored in a global one, this global buffer can become quite large.

The problem with this is that Direct3D 10 requires updating a constant buffer as a whole. Therefore, even if you change only a single value between two draw calls, the whole buffer needs to be sent again. This can very easily become a large bottleneck.

The effect framework hides this detail from you, and it can become very complicated to find this performance killer if you don't know about it.
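To make it concrete, this is roughly what happens under the hood of the effect framework when you change one value (names invented for illustration):

```cpp
#include <d3d10.h>

// Hypothetical mirror of the "global" buffer the compiler builds from all
// loose constants in an .fx file.
struct GlobalConstants
{
    float time;
    float padding[3];   // constant buffers are laid out in 16-byte registers
    float world[16];    // a 4x4 matrix -- imagine many more values following
};

void SetTime(ID3D10Device* device, ID3D10Buffer* globalCB,
             GlobalConstants& cpuCopy, float newTime)
{
    cpuCopy.time = newTime; // change a single float...

    // ...but D3D10 replaces a constant buffer as a whole, so the entire
    // struct is re-sent to the GPU even though only 4 bytes changed.
    device->UpdateSubresource(globalCB, 0, NULL, &cpuCopy, 0, 0);
}
```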
You might as well add your DX11 renderer too, since it can run on DX9 and DX10 hardware, as well as the DX11 stuff when it comes out. The API is similar, but significantly more flexible in what you can do with shaders. I've had more fun working with it in my engine than I have had since I first started out with DX9 (which has been quite a while now!).
Quote: Original post by Demirug
Just one word of warning, as I am seeing this mistake in a lot of beginners' Direct3D 10 shader code.

While Direct3D 9 removes unused shader constants, Direct3D 10 will only remove a complete constant buffer when not a single element of it is used. As all constants that are not assigned to a buffer are stored in a global one, this global buffer can become quite large.

The problem with this is that Direct3D 10 requires updating a constant buffer as a whole. Therefore, even if you change only a single value between two draw calls, the whole buffer needs to be sent again. This can very easily become a large bottleneck.

The effect framework hides this detail from you, and it can become very complicated to find this performance killer if you don't know about it.

Thank you, I completely missed the constant buffer bottleneck problem (I only set a couple of global values, so this isn't a problem in my test app yet).

Do you think using multiple constant buffers so that Direct3D 10 can remove sets of data not needed for a specific technique is the way to go? What about alternative solutions?

Quote: Original post by Jason Z
You might as well add your DX11 renderer too, since it can run on DX9 and DX10 hardware, as well as the DX11 stuff when it comes out. The API is similar, but significantly more flexible in what you can do with shaders. I've had more fun working with it in my engine than I have had since I first started out with DX9 (which has been quite a while now!).

Currently I'm using the August 2008 SDK, which IIRC is the last one before the DX11 CTP.
I've read a lot of amazing things in your journal (in particular about shader reflection), so I'm going to download a newer SDK and give DX11 a try. I suppose you're going to get more comments from me in your journal soon. :)

As for the DX11 features, I'm interested in compute shaders and multithreaded rendering. AFAIK DX10-class HW should already be compatible with them. Is that correct?
Quote: Original post by undead
Thank you, I completely missed the constant buffer bottleneck problem (I only set a couple of global values, so this isn't a problem in my test app yet).

Do you think using multiple constant buffers so that Direct3D 10 can remove sets of data not needed for a specific technique is the way to go? What about alternative solutions?


Well, using multiple constant buffers helps, but you should take care with update intervals. A common rule is that you should not have more than one buffer that needs to be updated for every object. It is recommended to have another buffer containing values that never change, or change only during engine resets; a typical value there is the screen resolution. A third buffer stores values that change at frame level. You might add additional buffers for every render target, containing information about them. The tricky part here is making sure that the effect framework shares these global buffers and does not create one for every effect.
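As a sketch, such a split could look like this in the .fx file (names invented):

```hlsl
// Constants grouped by update frequency, so a per-draw change only
// re-sends the small per-object buffer.
cbuffer PerApplication      // written once, e.g. after an engine reset
{
    float2 g_ScreenSize;
};

cbuffer PerFrame            // written once per frame
{
    float4x4 g_ViewProjection;
    float3   g_CameraPosition;
};

cbuffer PerObject           // the only buffer touched for every draw call
{
    float4x4 g_World;
};
```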

If you want to remove unused variables based on the technique, you may be forced to build your own effect framework (many engines do this). That way you can use the preprocessor to exclude unused constants. There is another way, which removes the unused constants from the bytecode of the compiled shader, but it requires black magic and is absolutely not recommended.
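For example (illustrative only):

```hlsl
// Compile the same source with different defines so each technique's shader
// only declares the constants it actually uses.
cbuffer PerObject
{
    float4x4 g_World;
#ifdef USE_SKINNING
    float4x4 g_BoneMatrices[64];   // only present in the skinned variant
#endif
};
```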

Quote: Original post by undead
Currently I'm using the August 2008 SDK, which IIRC is the last one before the DX11 CTP.
I've read a lot of amazing things in your journal (in particular about shader reflection), so I'm going to download a newer SDK and give DX11 a try. I suppose you're going to get more comments from me in your journal soon. :)

As for the DX11 features, I'm interested in compute shaders and multithreaded rendering. AFAIK DX10-class HW should already be compatible with them. Is that correct?


Some level 10 hardware may support limited compute shaders (cs_4_0), but you need a new driver for this.

The multithreading stuff is more interesting. The runtime contains an emulation path for drivers that don't support it natively; the problem with this is that you could end up slower if you use it the wrong way. Another problem is that good multithreaded rendering code will run slower on a single-core system than a plain single-threaded implementation. I am currently playing around with a usage pattern that should avoid these problems, but without drivers that support multithreaded rendering the way it should be, I am not sure if the pattern really works.

Quote: Original post by Demirug
Well, using multiple constant buffers helps, but you should take care with update intervals. A common rule is that you should not have more than one buffer that needs to be updated for every object. It is recommended to have another buffer containing values that never change, or change only during engine resets; a typical value there is the screen resolution. A third buffer stores values that change at frame level. You might add additional buffers for every render target, containing information about them. The tricky part here is making sure that the effect framework shares these global buffers and does not create one for every effect.

At work I have just a few shaders which are compiled as they appear. Many of them are generated on the fly, and others are sent multiple times through a preprocess pass to generate N versions of the same base shader. It's not going to be easy to use only a handful of constant buffers in that scenario.

I'm reading the documentation, and the solution for sharing buffers seems to be using an effect pool for the main .fx file and compiling the others with D3D10_EFFECT_COMPILE_CHILD_EFFECT. Am I right?

Edit: is the "shared" keyword of any use in this case?
Yes, you need to mark the constant buffers that should be reused across multiple effects as shared. There is an effect pool sample in the SDK that demonstrates this.
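In the pool (parent) effect that could look like this (just a sketch):

```hlsl
// "shared" makes the buffer visible to child effects compiled with
// D3D10_EFFECT_COMPILE_CHILD_EFFECT, so they reuse this one copy
// instead of each creating their own.
shared cbuffer PerFrame
{
    float4x4 g_ViewProjection;
};
```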

This topic is closed to new replies.
