SSE Optimization and DLL

4 comments, last by AnirudhSShastry 15 years, 10 months ago
Hi folks,

I've implemented an SSE-optimized math library as a separate DLL, and I've found something quite strange. If I use the math library as a DLL instead of including the code directly and compiling it into my application, every function call seems to carry a large overhead. For example, 1,000,000 4D vector dot products take 5 milliseconds when the code is compiled directly into my app, whereas calling the function from the DLL takes 40 milliseconds for the same work. That overhead seems ridiculous.

The thing is, if I link to the D3DX libraries and do the same thing with D3DXVECTOR4, the 1,000,000 dot products still take just 5 milliseconds. I can't see why my DLL has such a huge overhead. My math library doesn't link to any external libraries, and the only headers I've used are from the standard library. I'm using Visual Studio 2008, BTW.

Any ideas? Thanks.
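The original library code wasn't posted, so here is a minimal sketch of what an SSE 4D dot product of this kind might look like, assuming plain SSE1 intrinsics from <xmmintrin.h> (no SSE3 `_mm_hadd_ps` or SSE4.1 `_mm_dp_ps`) and unaligned loads. The name `Vec4Dot` is hypothetical; in the DLL build it would additionally be marked `__declspec(dllexport)`.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics: _mm_loadu_ps, _mm_mul_ps, _mm_shuffle_ps, ...

// Hypothetical SSE 4D dot product: multiplies lane by lane, then does the
// horizontal sum using only SSE1 shuffles and adds.
float Vec4Dot(const float* a, const float* b) {
    __m128 va = _mm_loadu_ps(a);        // (a0, a1, a2, a3), unaligned load
    __m128 vb = _mm_loadu_ps(b);        // (b0, b1, b2, b3)
    __m128 prod = _mm_mul_ps(va, vb);   // (p0, p1, p2, p3) where pi = ai * bi

    // Swap neighbouring lanes: (p1, p0, p3, p2).
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    // Pairwise sums: (p0+p1, p0+p1, p2+p3, p2+p3).
    __m128 sums = _mm_add_ps(prod, shuf);
    // Move the high pair into the low lanes: lane 0 becomes p2+p3.
    shuf = _mm_movehl_ps(shuf, sums);
    // Lane 0 now holds p0+p1+p2+p3.
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);         // extract lane 0 as a float
}
```

Called on (1, 2, 3, 4) and (5, 6, 7, 8), this returns 70. Sticking to SSE1 keeps the sketch usable on any x86 CPU with SSE, at the cost of a few extra shuffles.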
Maybe they are using __fastcall, or perhaps __stdcall or __cdecl.
Sig: http://glhlib.sourceforge.net
An open-source GLU replacement library, much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0f, 0.0f, 5.0f);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0f, 1.0f, -1.0f);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, GL_FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, GL_FALSE, inverse_matrix);
There are probably a few factors at work, but calling a DLL function will always carry more overhead than calling a statically linked one. Every call into a DLL goes through an extra lookup in the import address table to fetch the function pointer. That is not the case for statically linked functions, because their addresses are known at link time.
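The difference described above can be approximated in portable C++ by comparing a direct call with a call through a table of function pointers. This is only an analogy, assuming the usual shape of a `__declspec(dllimport)` call on Windows (an indirect call through a pointer slot that the loader fills in); `ImportTable`, `pDot4`, and `g_imports` are hypothetical names, not the real PE structures.

```cpp
// Stand-in for the function the DLL would export.
float Dot4(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Hypothetical analogue of an import address table: one pointer slot per
// imported function, filled in once (by the loader, in the real case).
struct ImportTable {
    float (*pDot4)(const float*, const float*);
};

ImportTable g_imports = { &Dot4 };

// Direct call: the target is fixed at link time, and because the body is
// visible in this translation unit the optimizer is free to inline it.
float DirectCall(const float* a, const float* b) {
    return Dot4(a, b);
}

// Indirect call: load the pointer from the table, then call through it.
// Across a real DLL boundary the body is not visible, so it cannot be inlined.
float ImportedCall(const float* a, const float* b) {
    return g_imports.pDot4(a, b);
}
```

The extra load by itself is cheap; what matters more for a function as small as a dot product is that a call the compiler cannot see through can be neither inlined nor removed.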
Quote: Original post by V-man
Maybe they are using __fastcall or perhaps __stdcall or __cdecl


I've tried __fastcall; it doesn't make much of a difference. __stdcall and __cdecl are actually worse: the time jumps to 90 milliseconds instead of 40! Thanks for your input though, V-man :)

Quote: Original post by strtok
There are probably a few factors at work, but you'll always have more overhead calling DLL functions than statically linked functions.


Yes, that's true, but such a large overhead seems unnatural. The D3DX libraries are dynamically linked as well, yet they don't seem to have any significant overhead. And surely, over 1 million operations, the DLL call overhead should be negligible compared with the work itself, right? Thanks for your input, strtok :)
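Taking the posted numbers at face value, and assuming both loops really performed all 1,000,000 calls, the extra cost is paid on every call, so it grows with the call count instead of being amortized:

```latex
\frac{40\ \text{ms} - 5\ \text{ms}}{10^{6}\ \text{calls}} = \frac{35 \times 10^{-3}\ \text{s}}{10^{6}\ \text{calls}} = 35\ \text{ns per call}
```

At around 3 GHz that is roughly 100 cycles per call, which is a lot more than one extra indirect call would normally cost, so the 5 ms baseline itself is worth double-checking.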
Have you tried timing a plain C++ (non-SSE) implementation in a DLL? The overhead could be related to SSE.
Actually, given that your statically linked timing was identical to DX9's dynamically linked one, it seems more likely to me that DX simply implements the dot product, being a rather simple operation, inline in a header file... though I don't use DX, so I could be way off here.

[Edited by - AndyPandyV2 on June 25, 2008 12:15:38 AM]
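The header-file guess can be illustrated with a header-only helper. This is a hypothetical sketch (`Vec4`, `Vec4DotInline`, and `DiscardResults` are made-up names, not the actual D3DX header): because the body is visible wherever the header is included, the compiler may inline it at each call site, and when the result is unused it may drop the call altogether.

```cpp
// Would live in a header such as a hypothetical mathlib_inline.h.
struct Vec4 { float x, y, z, w; };

// Defined in the header, so every caller sees the body and the optimizer
// may inline it, fold constants, or remove it when the result is unused.
inline float Vec4DotInline(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// A loop shaped like a naive benchmark: each result is discarded, so with
// optimizations enabled the compiler may treat the whole body as dead code.
void DiscardResults(const Vec4& a, const Vec4& b, int n) {
    for (int i = 0; i < n; ++i) {
        Vec4DotInline(a, b);  // value ignored: candidate for elimination
    }
}
```

A function exported from a DLL gets no such treatment, because the caller's compiler only sees its declaration.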
The compiler was optimizing the function calls out of the loop! The result was never used, which is why the directly compiled version seemed so fast. After adding a line that actually uses the result of each call, it gives the same timings as the dynamically linked library. Thank you all for your input.
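A minimal sketch of the fix described above: run the dot product over runtime data and fold every result into a value that is returned and then used, so the optimizer has to keep the calls. It assumes C++11 `<chrono>` for timing (not available in VS2008, where `QueryPerformanceCounter` would be the usual choice); `Vec4`, `Vec4Dot`, `SumOfDots`, and `TimeSumOfDots` are hypothetical names.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

struct Vec4 { float x, y, z, w; };

inline float Vec4Dot(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Accumulates every dot product, so no call's result is dead code.
float SumOfDots(const std::vector<Vec4>& as, const Vec4& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < as.size(); ++i) {
        sum += Vec4Dot(as[i], b);
    }
    return sum;
}

// Times one pass over the inputs in milliseconds. The sum is handed back to
// the caller, who should use it (print or check it) so the work stays live.
double TimeSumOfDots(const std::vector<Vec4>& as, const Vec4& b, float* outSum) {
    const auto start = std::chrono::steady_clock::now();
    *outSum = SumOfDots(as, b);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Summing the results is one simple way to keep them alive; writing each result to a `volatile` variable is another, at the cost of an extra store per call.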

