Inline assembly



I need a shift function for 64-bit integers, and I found that Visual C++ 6.0 makes a call to the __allshl or __aullshr function. I don't want a function call because I want this to be as fast as possible, so I tried to write an inline function:
__inline __int64 __fastcall int64shr(__int64 x, const int c)
{
__asm mov		ecx, c
__asm mov		eax, dword ptr [x]
__asm mov		edx, dword ptr [x + 4]
__asm shrd		eax, edx, cl
__asm shr		edx, cl
}


but the compiler may not have the 64-bit value in EAX and EDX when it has to call the shift function, and instead generates something like this:
mov	ecx, DWORD PTR [esi+48]
mov	edx, DWORD PTR [esi+52]
mov	DWORD PTR $T62616[ebp], ecx
mov	DWORD PTR $T62616[ebp+4], edx
mov	ecx, 8
mov	eax, DWORD PTR $T62616[ebp]
mov	edx, DWORD PTR $T62616[ebp+4]
shld	edx, eax, cl
shl	eax, cl


What is the fastest way to write these functions? Is it possible to force the compiler to use only registers? Thanks.
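For reference, here is a sketch in portable C++ of what the SHRD/SHR pair in the question computes, written out on two 32-bit halves. The helper name shr64 is my own illustration, not from the thread, and it is only valid for shift counts strictly between 0 and 31 (the range a single SHRD covers):

```cpp
#include <cstdint>

// Hypothetical illustration: a 64-bit right shift decomposed into
// two 32-bit halves, mirroring the SHRD/SHR instruction pair.
// Assumes 0 < c < 32.
uint64_t shr64(uint64_t x, unsigned c)
{
    uint32_t lo = static_cast<uint32_t>(x);
    uint32_t hi = static_cast<uint32_t>(x >> 32);
    uint32_t new_lo = (lo >> c) | (hi << (32u - c)); // shrd eax, edx, cl
    uint32_t new_hi = hi >> c;                       // shr  edx, cl
    return (static_cast<uint64_t>(new_hi) << 32) | new_lo;
}
```

This is exactly the decomposition the compiler's helper function performs, just without the generality of handling counts of 32 or more.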

1) With the __fastcall calling convention, only arguments that are DWORD-sized or smaller get passed in the ECX and EDX registers; all others get passed on the stack.

2) Take a look at __declspec(naked). That tells the compiler to not generate any prologue/epilogue code and to use your asm as-is.
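As an untested sketch of that suggestion (my own illustration, MSVC x86 only): assuming a 32-bit target where, under __fastcall, the 64-bit x does not fit a register and so lands on the stack while c arrives in ECX, a naked right-shift helper might look like this. Note the hardware masks the count in CL to 5 bits, so counts of 32 or more are not handled:

```cpp
// Hypothetical sketch, not from the thread; 32-bit MSVC only.
__declspec(naked) __int64 __fastcall int64shr(__int64 x, int c)
{
    __asm {
        mov  eax, [esp + 4]   ; low dword of x (return address sits at [esp])
        mov  edx, [esp + 8]   ; high dword of x
        shrd eax, edx, cl     ; c already arrived in ECX under __fastcall
        shr  edx, cl          ; result is returned in EDX:EAX
        ret  8                ; __fastcall: callee pops the stack arguments
    }
}
```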

3) For AMD64 and Itanium build targets, __fastcall is ignored.

4) Suggestions:
a. pass the int64 by reference from a point where x exists in memory.

b. rewrite part of the caller as inline asm so that you can guarantee x is always passed the way you want, probably as a higher-level function that performs a larger job (inline asm for tiny leaf functions rarely gives as much of a performance gain as it does for functions that do a sizable chunk of work).

inline __int64 shiftleft64(__int64 n, unsigned N) // Note use of 'inline'!
{
    return n << N; // By putting this all on one line, we remove an unnecessary temporary variable
} // See how short it is! This'll be inlined for sure

Is your shift really a bottleneck? If not, then move on. Algorithmic optimizations are sufficient in most cases, and unless you're doing nothing but shifting 64-bit integers around all day, this will be one of them.

CM

__asm mov ecx, c
__asm mov eax, dword ptr [x]
__asm mov edx, dword ptr [x + 4]
__asm shrd eax, edx, cl
__asm shr edx, cl

Instead of writing out all of those "__asm" statements, can't you just use curly braces? Imo it helps readability.

__asm
{
mov ecx, c
mov eax, dword ptr [x]
mov edx, dword ptr [x + 4]
shrd eax, edx, cl
shr edx, cl
}


Using (x << n) in an inline function does not solve the problem,
because the compiler still calls the __allshl function.

I don't think the shift is really a bottleneck. I'm working on a chess program using bitboards, and sometimes need to shift 64-bit integers.
I'm sure there are other things to optimize first, but I'm surprised that the compiler makes a call just to shift a 64-bit integer.

Quote:
 Original post by YogSothoth: Using (x << n) in an inline function does not solve the problem, because the compiler still calls the __allshl function.

The compiler is smarter than you. Do you honestly think the engineers at MS don't know how to shift a 64-bit integer without calling some complicated function?

Until you have reason to believe otherwise, just accept that the optimizer is extremely good at what it does. Your shift function can be faster than greased lightning, but that won't make a broken program work. And chess programs have plenty of legitimate ways to break without reinventing the wheel like this.

CM