D3DX Performance: Static Link vs. Dynamic Link

8 comments, last by XeonXT 13 years, 7 months ago
In my attempts to reduce the executable size of my program, I made a startling discovery: dynamic linking of the d3dx library makes my program way, way slower than static linking.

Although my test program is simple, the framerate difference was astounding: ~1400 fps with statically-linked d3dx9.lib, ~400 fps with dynamically-linked d3dx9.lib. Through my (limited) profiling tools, I have determined that all draw calls are taking longer under the dynamically-linked version, and that appears to be what is causing the discrepancy.

Needless to say, this raises lots of questions for me:
1) Is static linkage faster in general? I was under the impression that there was no speed difference.
2) Why would older versions of the d3d libraries (December 2004/the last static-d3dx release) be outperforming newer ones (June 2010)?
3) Am I missing any real features by using the old, static libraries? I haven't noticed anything missing yet (I am only using DX 9).

Insight on the issue would be appreciated!
How much batching are you doing? The fewer draw calls the better; if you minimise the draw calls, it shouldn't make much of a difference. Are you compiling or running in some kind of debug mode?

Beware artificial benchmarks. What is your simple program doing? Measuring performance in FPS is often meaningless.

1400 FPS = 0.714285714 ms / frame
400 FPS = 2.5 ms / frame

Either way you are still miles away from the "real" deadline of 16.6 milliseconds per frame to achieve 60 FPS, which is the only thing that matters in reality.

Quote:
1) Is static linkage faster in general? I was under the impression that there was no speed difference.

Yes. Static linking means that the function call is direct. With dynamic linking, the function call has an extra level of indirection. The cost of this should hopefully be amortised by the size of the operation you perform.
It depends on how you're doing the dynamic linking. But aside from that, why are you worried about executable size? Have you been taken in by the anti-bloatware weenies? D3DX even linked statically will typically only add about 1 MB or so to the size, which is absolutely nothing in terms of today's storage capacities (0.00019% of a 512 GB disk). Memory won't be paged in until it's needed, and will be paged out when something else needs it, so the whole concept of "a large exe == bloatware" is just a myth and nothing more.

The only answer I can give to why you have a performance drop is that you may be doing something wrong. I've linked to D3DX both statically and dynamically in the past, and have never noticed any speed difference, but I manage my own dynamic linking (by doing a bunch of GetProcAddress calls to the functions I need during startup).
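
For reference, the "bunch of GetProcAddress calls" approach stays manageable if the names and function-pointer slots live in one table that is walked once at startup. The following is only a sketch under assumptions, not the code described above: `AnyProc`, `ImportEntry` and `resolve_all` are hypothetical names, and the resolver is left abstract so the example compiles on any platform. On Windows the resolver would wrap `GetProcAddress` on a module handle obtained from `LoadLibrary`.

```cpp
#include <cassert>
#include <cstddef>

// Generic function-pointer type used to store every import.
// Each slot is cast back to its real signature before it is called.
using AnyProc = void (*)();

// One row per imported function: the exported name and where to store it.
// (Hypothetical helper type, not part of any SDK.)
struct ImportEntry {
    const char* name;
    AnyProc*    slot;
};

// Resolves every entry once at startup. `resolve` is any callable taking a
// const char* and returning AnyProc; on Windows it would be a thin wrapper
// around GetProcAddress (assumption, not shown here).
// Returns the number of entries that could not be resolved, so the caller
// can fail early instead of crashing later on a null function pointer.
template <typename ResolveFn>
std::size_t resolve_all(const ImportEntry* entries, std::size_t count,
                        ResolveFn resolve) {
    assert(entries != nullptr || count == 0);
    std::size_t missing = 0;
    for (std::size_t i = 0; i < count; ++i) {
        AnyProc proc = resolve(entries[i].name);
        *entries[i].slot = proc;
        if (proc == nullptr) {
            ++missing;
        }
    }
    return missing;
}
```

With a table like this, adding another imported function is one extra row plus one typed pointer, and a non-zero return value tells you at startup that the DLL on the machine does not export everything you expected.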

As for features of the new dynamic libraries, firstly they use the D3D10 shader compiler by default (yes, even with D3D9, and yes, even on XP) and secondly they will contain bugfixes and other code improvements. More recent versions also support 64-bit so in time you're going to have no choice but to use them. The downside is that your users will need to update their DirectX to use them, of course.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Quote:Original post by rip-off
How much batching are you doing? The fewer draw calls the better; if you minimise the draw calls, it shouldn't make much of a difference. Are you compiling or running in some kind of debug mode?

I'm drawing a single, high-poly model and some text for profiling. Very few calls. Aha! Yes, I was running in debug mode. Switching it off made the dynamic link much, much faster. Good call. However, there's still a serious differential - now only about 300 fps.

Quote:Original post by rip-off
Beware artificial benchmarks. What is your simple program doing? Measuring performance in FPS is often meaningless.

1400 FPS = 0.714285714 ms / frame
400 FPS = 2.5 ms / frame

Either way you are still miles away from the "real" deadline of 16.6 milliseconds per frame to achieve 60 FPS, which is the only thing that matters in reality.

I disagree that such performance benchmarking is meaningless. Yes, I also have a ms/frame display. This is a very, very simple program drawing a single model. It's startling that I'm already 2.5ms down in the dynamic link version with just this simple thing. That's why I'm worried. By the time real stuff starts happening in the program, that 16ms is going to be gone in no time if the dynamic link hogs my resources like that. 1.8ms is a pretty huge differential if you ask me.

Quote:Original post by rip-off
Yes. Static linking means that the function call is direct. With dynamic linking, the function call has an extra level of indirection. The cost of this should hopefully be amortised by the size of the operation you perform.

Ah, that's interesting. Thanks.

Quote:Original post by mhagain
It depends on how you're doing the dynamic linking. But aside from that, why are you worried about executable size? Have you been taken in by the anti-bloatware weenies? D3DX even linked statically will typically only add about 1 MB or so to the size, which is absolutely nothing in terms of today's storage capacities (0.00019% of a 512 GB disk). Memory won't be paged in until it's needed, and will be paged out when something else needs it, so the whole concept of "a large exe == bloatware" is just a myth and nothing more.

That's really pretty beside the point, isn't it? I never said anything about being scared of bloatware and I'm more than aware of modern drive capacities. You've heard of 64kb demos? That's where I'm going with this. Size still matters in some fields. Unfortunately, the static link completely blows that, so I've really got no choice but to dynamic link if I want to hit the size target. I'm still in discomfort about the performance, though.

Quote:Original post by mhagain
The only answer I can give to why you have a performance drop is that you may be doing something wrong. I've linked to D3DX both statically and dynamically in the past, and have never noticed any speed difference, but I manage my own dynamic linking (by doing a bunch of GetProcAddress calls to the functions I need during startup).

Right, so any ideas as to what might have gone wrong? On a side note - that's interesting, I've been wanting to switch to explicit linking for a while now, but it seems like a royal pain to have to call GetProcAddress for every single function you want to access... am I wrong? Is it manageable/worthwhile?

Quote:Original post by mhagain
As for features of the new dynamic libraries, firstly they use the D3D10 shader compiler by default (yes, even with D3D9, and yes, even on XP) and secondly they will contain bugfixes and other code improvements. More recent versions also support 64-bit so in time you're going to have no choice but to use them. The downside is that your users will need to update their DirectX to use them, of course.

Indeed, I know that in time I'll have to drop the old December 2004 SDK. Still, for now, it's a lot more convenient. And I'm precompiling shaders with fxc and including them as resources so I should still be getting the D3D10 compiler. Also, there's a 64-bit version of the libraries in the Dec 2004 SDK...does this not mean that it supports x64?

Thanks to both for the responses!
Quote:
Good call. However, there's still a serious differential - now only about 300 fps.

FPS is a nonlinear measure and shouldn't be used for any serious performance metrics. Additionally using FPS differentials is even more misleading.

If your static version is 1400 FPS you're getting 0.0007 seconds per frame. Reducing that by 300 FPS (to 1100 FPS) means 0.0009 seconds per frame. 0.0002 seconds is not a "serious differential." It is, in fact, edging right up towards the threshold of reliability for timing anyhow.
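
To make the nonlinearity concrete, here is a minimal sketch of the conversion (the function names are made up for illustration). It shows that the same FPS difference costs very different amounts of frame time depending on the baseline.

```cpp
#include <cassert>

// Converts a frame rate to the time spent on one frame, in milliseconds.
inline double ms_per_frame(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame incurred by dropping from fps_before
// to fps_after.
inline double frame_time_cost_ms(double fps_before, double fps_after) {
    return ms_per_frame(fps_after) - ms_per_frame(fps_before);
}
```

Dropping 300 FPS from 1400 to 1100 costs roughly 0.19 ms per frame, while dropping 300 FPS from 400 to 100 costs 7.5 ms per frame, which is why differences are better quoted in milliseconds.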

I'm curious what you're actually doing here anyway. You should not be statically linking to D3DX (you can't). I think you might just be statically linking to the import library, which is really just doing the GetProcAddress() stuff for you automatically. You should provide some examples of how you are "statically" and "dynamically" linking D3DX (and resolving function pointers, et cetera).

Quote:
Unfortunately, the static link completely blows that, so I've really got no choice but to dynamic link if I want to hit the size target. I'm still in discomfort about the performance, though.

Regardless, you realize you almost certainly will need to create an installer and bundle the redist for the D3DX libraries, correct? It is illegal to redistribute the libraries yourself, so you either have to assume the correct version exists on the end user's machine or ensure it does -- the net size cost is going to be the same, it's just a matter of how you pass that cost on to the user.

Quote:Original post by jpetrie
FPS is a nonlinear measure and shouldn't be used for any serious performance metrics. Additionally using FPS differentials is even more misleading.

If your static version is 1400 FPS you're getting 0.0007 seconds per frame. Reducing that by 300 FPS (to 1100 FPS) means 0.0009 seconds per frame. 0.0002 seconds is not a "serious differential." It is, in fact, edging right up towards the threshold of reliability for timing anyhow.

I understand how FPS works; as I said, I am also measuring in ms per frame. Considering the contents of the demo, it is a serious differential. And the timing is plenty reliable because I am accumulating over many frames and averaging, not taking single-frame snapshots. So the precision of the measurement is quite good.
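
For anyone wanting to measure the same way, accumulating and averaging can be sketched roughly as below. This is an illustrative assumption about how such a counter might look, not the code from my program, and `FrameTimeAverager` is a made-up name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Accumulates frame durations and reports the mean, so the granularity of
// the timer on any single frame has little effect on the result.
class FrameTimeAverager {
public:
    // Milliseconds stored as double; integer chrono durations (for example
    // steady_clock differences) convert to this type implicitly.
    using Duration = std::chrono::duration<double, std::milli>;

    // Records the duration of one completed frame.
    void add_frame(Duration frame_time) {
        total_ += frame_time;
        ++frames_;
    }

    // Mean milliseconds per frame over everything recorded so far.
    double mean_ms() const {
        assert(frames_ > 0);
        return total_.count() / static_cast<double>(frames_);
    }

    std::size_t frame_count() const { return frames_; }

    // Starts a new measurement window.
    void reset() {
        total_ = Duration::zero();
        frames_ = 0;
    }

private:
    Duration    total_ = Duration::zero();
    std::size_t frames_ = 0;
};
```

In a render loop, one would take a `std::chrono::steady_clock::now()` reading at the start and end of each frame, pass the difference to `add_frame`, and read `mean_ms()` every few hundred frames before calling `reset()`.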

Quote:Original post by jpetrie
I'm curious what you're actually doing here anyway

Comparing the performance of an application built with static linkage to D3DX versus that of one built with dynamic linkage. If you mean what kind of application, I'm working on building a "demo" type program in the style of 64kb demos (the demoscene). Though I will not hit the target 64kb with a static link, the idea is still to push as much content as possible in a small executable.

Quote:Original post by jpetrie
You should not be statically linking to D3DX (you can't). I think you might just be statically linking to the import library, which is really just doing the GetProcAddress() stuff for you automatically. You should provide some examples of how you are "statically" and "dynamically" linking D3DX (and resolving function pointers, et cetera).

Yes, you can, and there's no reason why you shouldn't unless you absolutely need the functionality of newer libraries. The D3DX library was a static library before 2005. As I mentioned earlier, I am linking against the December 2004 library for the static-link version. Try it for yourself - download the December 2004 SDK and link against it with one of your DX 9.0c programs (the only thing that you might need to change is any references to "dxerr.lib", which is called "dxerr9.lib" in the older SDKs).

As for resolution of function pointers and such, I have no such code because I am implicitly linking using #pragma comment, not explicitly linking. So all the functions are resolved automatically.

Quote:Original post by jpetrie
Regardless, you realize you almost certainly will need to create an installer and bundle the redist for the D3DX libraries, correct? It is illegal to redistribute the libraries yourself, so you either have to assume the correct version exists on the end user's machine or ensure it does -- the net size cost is going to be the same, it's just a matter of how you pass that cost on to the user.

No, this is meant to be a stand-alone, works-right-out-of-the-box executable. I needn't distribute any libraries and I needn't make any assumptions given that I am statically linking, so D3DX is contained.

As for the net size cost, I'm fairly certain that I'm saving a lot. I've managed to get the executable down to about 400kb (including the static link to d3dx). IIRC, isn't the redistributable a good deal more than this? I can't remember but I thought it was pretty large. Which is bad for a stand-alone demo. Size is of the essence here.

I hope I've cleared up what I'm doing a little bit...if I haven't, please just ask.
For any app where you actually care about performance, I'd imagine you'd have to really try very hard to construct a situation where you're bottlenecked by DLL function call overhead. I mean really hard. The cost of even a single D3D draw call will positively dwarf that small overhead.

Besides it's not even like we're talking about the core API here...we're talking about D3DX, the completely-optional utility library. If you're really worried about its footprint or speed, there's nothing in there that you couldn't replace with your own code.
Quote:Original post by MJP
For any app where you actually care about performance, I'd imagine you'd have to really try very hard to construct a situation where you're bottlenecked by DLL function call overhead. I mean really hard. The cost of even a single D3D draw call will positively dwarf that small overhead.

That's what I figured, so I'm really still stumped by this performance differential. I mean, a draw call doesn't even involve the D3DX library, so I'm still not sure what's going on here.

Quote:Original post by MJP
If you're really worried about its footprint or speed, there's nothing in there that you couldn't replace with your own code.

I'm working towards that, actually. I'd really like to replace it, especially since I want my library to be OpenGL-compatible as well. Do you know of any resources on emulating D3DX functionality? There are certain things that I just wouldn't know how to do without it...for example creating textures, compiling effects, etc. Are there any resources that can aid me in this area? I certainly don't need to rewrite the D3DX library, but if there are any tutorials on how to wean oneself off D3DX or anything of that nature, that'd be great.
For loading shaders your option is to use fxc.exe, and either pre-compile them or invoke the process from your app. But it looks like you've figured that out already, based on your other thread. :P

Textures aren't too difficult to create and fill with information, the only tricky part is dealing with various image formats. If you can limit your textures to a single easy-to-parse format, then you'll probably make your life a lot easier. DDS is a pretty simple format that pretty much mirrors the runtime layout of a texture, which makes it pretty handy. There's even a command line utility in the SDK (texconv.exe) that can convert image files and generate mipmaps, and there's also a DDSWithoutD3DX sample that shows you how to load the files without D3DX (it's in the D3D10 section, but it has a D3D9 loader).
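
To give an idea of how simple the format is, here is a rough sketch of reading the basics out of a DDS header. It is not the code from the DDSWithoutD3DX sample; `DdsInfo`, `read_u32_le` and `parse_dds_header` are made-up names, and the byte offsets assume the standard layout of a 4-byte "DDS " magic followed by a 124-byte header of little-endian 32-bit fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fields of interest from a DDS file header (hypothetical helper type).
struct DdsInfo {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t mip_count;  // may be 0 when the file stores no mip chain
    std::uint32_t four_cc;    // e.g. "DXT1" packed little-endian, or 0
};

// Reads a little-endian 32-bit value at the given byte offset.
inline std::uint32_t read_u32_le(const unsigned char* p, std::size_t offset) {
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[offset])
         | (static_cast<std::uint32_t>(p[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(p[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(p[offset + 3]) << 24);
}

// Parses the magic and header from an in-memory copy of a DDS file.
// Returns false if the buffer is too small or the magic/sizes do not match.
inline bool parse_dds_header(const unsigned char* data, std::size_t size,
                             DdsInfo& out) {
    const std::size_t kMagicSize = 4;
    const std::size_t kHeaderSize = 124;
    if (data == nullptr || size < kMagicSize + kHeaderSize) return false;
    if (std::memcmp(data, "DDS ", kMagicSize) != 0) return false;

    const unsigned char* h = data + kMagicSize;      // start of the header
    if (read_u32_le(h, 0) != kHeaderSize) return false;  // header dwSize
    if (read_u32_le(h, 72) != 32) return false;          // pixel format dwSize

    out.height    = read_u32_le(h, 8);
    out.width     = read_u32_le(h, 12);
    out.mip_count = read_u32_le(h, 24);
    out.four_cc   = read_u32_le(h, 80);
    return true;
}
```

The pixel data follows immediately after the header, so once the format and dimensions are known, the remaining bytes can be copied into the locked levels of a texture created through the core API.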
Quote:Original post by MJP
For loading shaders your option is to use fxc.exe, and either pre-compile them or invoke the process from your app. But it looks like you've figured that out already, based on your other thread. :P

Textures aren't too difficult to create and fill with information, the only tricky part is dealing with various image formats. If you can limit your textures to a single easy-to-parse format, then you'll probably make your life a lot easier. DDS is a pretty simple format that pretty much mirrors the runtime layout of a texture, which makes it pretty handy. There's even a command line utility in the SDK (texconv.exe) that can convert image files and generate mipmaps, and there's also a DDSWithoutD3DX sample that shows you how to load the files without D3DX (it's in the D3D10 section, but it has a D3D9 loader).

Excellent, thanks very much. Yes, I figured out the shaders, and that DDSWithoutD3DX is helping enormously for understanding texture loading. Hadn't seen that little nugget of a sample earlier.

Thanks!
