DLL performance

Started by
20 comments, last by chollida1 18 years, 4 months ago
> I had heard that there is a performance penalty
> when i'm using dynamic linked libraries(DLL)....

There is no difference in performance whatsoever between EXE and DLL when it comes to running the actual code, that is, once the code is loaded in memory. The performance difference comes at DLL load time. Here are a few tips & tricks about using DLLs that can enhance performance.

a) You can delay-load a DLL until it is needed by the application. For example, by putting the victory dance code in a separate DLL, you can put off loading that code until later (and only if the gamer wins). Check the delay-load linker switches for this feature.
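With the Microsoft linker, delay-loading is set up per-DLL at link time. A minimal sketch (the names game.exe, main.obj and victory.dll are made up for illustration; delayimp.lib is the delay-load helper library):

```shell
rem victory.dll is not mapped at process startup; it is loaded on the
rem first call into it. You still link against its import library.
link /OUT:game.exe main.obj victory.lib delayimp.lib /DELAYLOAD:victory.dll
```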

b) Use the /BASE switch of the linker to specify where the DLL will land in memory at load time. Why? With the default linker settings, all your compiled DLLs will want to load at the same base address. The Win32 loader will then do an address fixup (rebasing) pass to move DLLs around and avoid collisions, which creates a new copy of the code pages instead of relying solely on the memory-map mechanism. So you end up memory-mapping your DLL, modifying each and every 4K block, and swapping it back out to disk... not an efficient use of swapping.

c) Use explicit linkage (LoadLibrary()/GetProcAddress()) when possible. This allows your application to make a decision about specific code sections. Your user has turned off the music completely: why load the music DLL at all? The gamer plays solo: don't load the network code.
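A minimal sketch of that pattern (Windows-only; "music.dll" and "PlayMusic" are made-up names for illustration):

```cpp
#include <windows.h>

typedef void (*PlayMusicFn)(const char*);

void StartMusicIfEnabled(bool musicEnabled)
{
    if (!musicEnabled)
        return; // user turned music off: the DLL is never even loaded

    HMODULE music = LoadLibrary(TEXT("music.dll"));
    if (!music)
        return; // degrade gracefully if the DLL is missing

    PlayMusicFn play = (PlayMusicFn)GetProcAddress(music, "PlayMusic");
    if (play)
        play("title_theme");

    // Call FreeLibrary(music) later, once music is no longer needed.
}
```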

d) Coalesce related code segments into 4K blocks. Why? The memory-map unit has a 4K granularity (8K on some architectures, such as MIPS and IA-64). Putting all your setup UI dialog box code in a single segment allows the OS to page that segment out when it's no longer needed, and the segment's LRU entry will eventually get marked as recyclable by the OS. Check the 'code_seg' pragmas of the compiler. If you scatter related code segments all over the place, then you stand a chance of *maximizing* your app's memory footprint (not a smart thing) and thus increasing its sensitivity to swapping. By coalescing related code segments you are *minimizing* your memory footprint.
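With MSVC, the grouping looks something like this (a sketch; the section name "SETUPUI" and the functions are hypothetical):

```cpp
// Group all setup-dialog code into one named section so the OS can
// page the whole thing out together once setup is done.
#pragma code_seg(push, "SETUPUI")

void ShowSetupDialog()    { /* ... */ }
void ApplySetupChanges()  { /* ... */ }

#pragma code_seg(pop)  // back to the default .text section
```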

You can get more info about how EXEs and DLLs are loaded into memory here

Hope this helps.

-cb
Quote:Original post by cbenoi1
There is no difference in performance whatsoever between EXE and DLL when it comes to running the actual code, that is once the code is loaded in memory.

Not really true. The DLL loading mechanism means that every call to a function in a DLL is effectively through a function pointer, which is slightly slower than a standard function call, and also tends to have poorer cache locality than standard function calls.
> Not really true.

It's a question of granularity. If you put a small piece of code in a DLL and call it within a tight loop, then yes, you will notice a small difference. Otherwise, with larger-granularity calls, the small overhead is absorbed and becomes unnoticeable.

-cb
It doesn't even need to be a tight loop to cause a performance decrease, as the indirections create a definite cache impact, and the way that segments are allocated in modules tends to create a disproportionate number of reads from the same set of cache lines. This can be very significant if you have a two- or four-way associative L1 cache. It actually becomes more significant if the DLL calls are not in a tight loop, as that increases the size of the effective memory working set.

Either way it's still different from your claim of "no difference in performance whatsoever".
Quote:Original post by SiCrane
It doesn't even need to be a tight loop to cause a performance decrease, as the indirections create a definite cache impact, and the way that segments are allocated in modules tends to create a disproportionate number of reads from the same set of cache lines. This can be very significant if you have a two- or four-way associative L1 cache. It actually becomes more significant if the DLL calls are not in a tight loop, as that increases the size of the effective memory working set.

Either way it's still different from your claim of "no difference in performance whatsoever".
As much as I think there is probably some merit in what you're saying, I really hope it won't needlessly push the OP into shoving his DLL code into a lib file or even the exe. I'm sure you don't want that either.

I very much doubt that the rest of his code is so efficient that a DLL boundary is the bottleneck. The OP has not profiled; he has merely heard rumours, and some of you might be confirming his worst fears about those rumours. As always: get it working first, then if it's too slow, PROFILE, then optimise.[smile]

I think everyone posting in this thread should read hplus0603's post too.[cool]
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
I just get annoyed when people post things like "no difference in performance whatsoever" when there is a difference. As I said, it's slightly slower, not OMG IT'LL RUIN EVERYTHING slower.
The people who say that having DLLs does not impact runtime performance are WRONG.

Some disadvantages using a DLL (apart from startup costs)

1) Compiler optimizations might not be as good as with static linking (e.g. no inlining or whole-program optimization across the DLL boundary).
2) Page faults. These can increase and can be a big performance hit.
3) Multithreading. If your app is multithreaded, you can be affected by having many DLLs in your app. By default, Windows calls the startup/shutdown routine of each DLL for every thread that is created or destroyed. Windows provides an API, DisableThreadLibraryCalls, to avoid calling these routines (each DLL must call it for itself).

That said, if used correctly (which is not hard), the advantage of using DLLs can outweigh the disadvantages.
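Point 3 above looks something like this in practice (Windows-only sketch of a DllMain that opts out of per-thread notifications):

```cpp
#include <windows.h>

BOOL WINAPI DllMain(HINSTANCE hinst, DWORD reason, LPVOID /*reserved*/)
{
    if (reason == DLL_PROCESS_ATTACH) {
        // This DLL doesn't care about thread creation/destruction, so
        // skip the DLL_THREAD_ATTACH/DLL_THREAD_DETACH calls entirely.
        DisableThreadLibraryCalls(hinst);
    }
    return TRUE;
}
```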
Ok, a lot of answers... thank you all...
Anyway, maybe I should specify my question a little bit...
How about this:

class API CSomeClass
{
public:
// some public functions

protected:
// some protected functions/data members

};

where API is the classic

#ifdef PART_OF_DLL
#define API __declspec(dllexport)
#else
#define API __declspec(dllimport)
#endif


...as you can see I didn't export any interface, which means I won't be loading this "dynamically" with LoadLibrary...
Instead the compiler creates the LIB files, which are included in the projects that use the code, and the DLLs are shipped with the program so it can run with no errors.

Now...is this costly?
I could load them with LoadLibrary, but these parts are used often and widely, so I don't see why I should load them later. There are also many functions and classes in these DLLs which are called often. An example could be the render system.

Is it better to load these as static libraries?

When working with DLLs, should one use alloc functions passed from the main program? Or are the DLL's and the program's address spaces merged per DLL instance?
http://www.8ung.at/basiror/theironcross.html
Quote:Original post by Basiror
When working with DLLs, should one use alloc functions passed from the main program? Or are the DLL's and the program's address spaces merged per DLL instance?


1) When you read a page from the DLL, the address is the location of the DLL image (though the address will be valid for your process). When you write to a page in the DLL (whether from inside the DLL or from the process that loaded it), that page is copied and mapped into the currently active process (copy-on-write). This means that when you *write* to a global/static variable in a DLL, the value you write is seen only by the current process; it won't be shared across processes using that DLL (unless you specifically mark the pages as shared).


2) It's generally a safer practice (and cleaner from a design standpoint) to have destruction of a resource in the same place as the creation of that resource; so if some resource was created inside the DLL, ideally it should be destroyed inside the DLL.
Why is it safer? In the case of memory allocation (malloc/free, new/delete), it's *essential* that the version of the C runtime library used to allocate the memory is *identical* to the version used to free that memory.
If, for example, the application using your DLL uses the multithreaded debug version of the CRT and your DLL uses the single-threaded release version, you can expect very bad things when you pass a new'ed pointer from one to the delete in the other.

Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing | LinkedIn | Personal site

This topic is closed to new replies.
