# Native calling convention support for mingw x64

_Vicious_    330
Hello!

We're using mingw to cross compile our binary files for 2 different architectures: lin/win X x86/x64. Now that we've discovered that native calling convention is only supported for x64 when compiled by msvc, what is required to get it working under mingw?

##### Share on other sites
WitchLord    4677
It is probably mostly about tweaking the as_config.h a bit to enable the native calling conventions with MinGW on Win64. You can start by making the configuration the same as for Linux, then run the test_features to check if it works.

If it fails, please let me know which tests fail and I'll help you figure out what needs to be modified to get it working.

##### Share on other sites
_Vicious_    330
Ok, thanks!

So I played around with it a little. Here's the first problem: mingw-w64 doesn't seem to define __LP64__, and indeed sizeof(long) is 4. as_config.h seems to enable NCC for gcc only when __LP64__ is defined. So where should I go from here?

##### Share on other sites
WitchLord    4677
Can you obtain the list of predefines for MinGW? I think this is obtained by running the compiler for an empty file with a command line argument like -mD, or something similar. This will then print the defines.

It is necessary to determine which predefine can be used to identify a 64bit target. There must be something other than __LP64__ that can be used.

##### Share on other sites
_Vicious_    330
Ok, here's the list of defines with 64 in the name:

[code]#define _WIN64 1
#define __DBL_DENORM_MIN__ ((double)4.94065645841246544177e-324L)
#define __x86_64 1
#define __UINT_FAST64_MAX__ 18446744073709551615ULL
#define __DEC64_MAX_EXP__ 385
#define __UINT_LEAST64_MAX__ 18446744073709551615ULL
#define __INT64_C(c) c ## LL
#define __WIN64 1
#define __INT32_MAX__ 2147483647
#define __INT_FAST32_MAX__ 2147483647
#define __INT64_MAX__ 9223372036854775807LL
#define __INT_LEAST32_MAX__ 2147483647
#define __amd64 1
#define __INT_FAST64_TYPE__ long long int
#define __UINT64_C(c) c ## ULL
#define __DEC64_EPSILON__ 1E-15DD
#define __UINT64_MAX__ 18446744073709551615ULL
#define __SIG_ATOMIC_MAX__ 2147483647
#define __x86_64__ 1
#define __UINT_LEAST64_TYPE__ long long unsigned int
#define __LONG_MAX__ 2147483647L
#define __DEC64_MAX__ 9.999999999999999E384DD
#define __DEC64_MANT_DIG__ 16
#define __INT_LEAST64_TYPE__ long long int
#define __LDBL_MANT_DIG__ 64
#define __DEC64_MIN_EXP__ (-382)
#define __INT_FAST64_MAX__ 9223372036854775807LL
#define __UINT_FAST64_TYPE__ long long unsigned int
#define __INT_MAX__ 2147483647
#define __amd64__ 1
#define __INT64_TYPE__ long long int
#define WIN64 1
#define __INT_LEAST64_MAX__ 9223372036854775807LL
#define __DEC64_MIN__ 1E-383DD
#define __UINT64_TYPE__ long long unsigned int
#define __MINGW64__ 1
#define __FLT_DENORM_MIN__ 1.40129846432481707092e-45F
#define _INTEGRAL_MAX_BITS 64
#define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L
#define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD
#define __WIN64__ 1
[/code]

The relevant options are -E -dM

##### Share on other sites
WitchLord    4677
The flag __MINGW64__ looks appropriate for detecting the 64bit target. __amd64 should be used to determine the CPU type.
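As a rough illustration of the detection described above, the predefines from the list can be tested separately for the target and for the CPU. This is only a sketch: the `SKETCH_*` macros and `sketch_*` functions are hypothetical names made up for the example and are not the ones used in as_config.h.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names for illustration only; as_config.h defines its own AS_* macros.

// Target detection: MinGW-w64 defines __MINGW64__ but not __LP64__.
#if defined(__MINGW64__)
    #define SKETCH_TARGET_NAME "mingw-w64"
#elif defined(__LP64__)
    #define SKETCH_TARGET_NAME "lp64"
#else
    #define SKETCH_TARGET_NAME "other"
#endif

// CPU detection: both spellings appear in the predefine list posted above.
#if defined(__amd64__) || defined(__x86_64__)
    #define SKETCH_CPU_IS_X64 1
#else
    #define SKETCH_CPU_IS_X64 0
#endif

// Returns the name of the branch that the preprocessor selected.
const char *sketch_target_name() { return SKETCH_TARGET_NAME; }

// Returns true when the compiler reports an x86-64 CPU.
bool sketch_cpu_is_x64() { return SKETCH_CPU_IS_X64 != 0; }
```

Keeping the target check and the CPU check apart mirrors the suggestion in the post: one macro answers "is this 64bit MinGW?", the other answers "is this an x86-64 CPU?".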

It may be necessary to check in angelscript.h how the INT64 and QWORD types are typedefed. It should be with 'long long'.

##### Share on other sites
_Vicious_    330
Yes, the QWORD typedef poses a problem in case __LP64__ is not defined, even if it's a 64-bit arch:
[code]#ifdef __LP64__
typedef unsigned int asDWORD;
typedef unsigned long asQWORD;
typedef long asINT64;
#else
typedef unsigned long asDWORD;
#if defined(__GNUC__) || defined(__MWERKS__)
typedef unsigned long long asQWORD;
typedef long long asINT64;
#else
typedef unsigned __int64 asQWORD;
typedef __int64 asINT64;
#endif
#endif[/code]

##### Share on other sites
WitchLord    4677
It seems that it would actually be correct. Since __LP64__ isn't defined, the asQWORD type will be typedefed as 'unsigned long long' which is correct from the MinGW64 predefines. (I'm assuming __GNUC__ is predefined by MinGW64).
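A small compile-time check can confirm this reasoning on any given compiler. The sketch below uses hypothetical `sketch*` typedefs rather than the real `asDWORD`/`asQWORD`/`asINT64`, and it assumes a C++11 compiler for `static_assert`.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-ins for the typedefs chosen when __LP64__ is NOT defined.
typedef unsigned int       sketchDWORD;  // 'unsigned long' is also 32 bits on Win64
typedef unsigned long long sketchQWORD;
typedef long long          sketchINT64;

// The build fails here if the chosen types do not have the expected widths.
static_assert(sizeof(sketchDWORD) * CHAR_BIT == 32, "DWORD must be 32 bits");
static_assert(sizeof(sketchQWORD) * CHAR_BIT == 64, "QWORD must be 64 bits");
static_assert(sizeof(sketchINT64) * CHAR_BIT == 64, "INT64 must be 64 bits");

// Reports whether 'long' is 64 bits wide: true on LP64 targets such as
// 64bit Linux, false on MinGW-w64 where sizeof(long) is 4.
bool sketch_long_is_64bit() { return sizeof(long) * CHAR_BIT == 64; }
```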

##### Share on other sites
_Vicious_    330
Ok, so I copied all those defines to the 'else' section of mingw set of defines, got it to compile.. The testbin suite crashes right after printing [code]AngelScript version: 2.21.0
AngelScript options: AS_64BIT_PTR AS_WIN AS_X64_GCC[/code]to console. Looks like it's going to be a bumpy ride

##### Share on other sites
WitchLord    4677
I suggest you start with the tests that have been designed specifically for testing the native calling conventions. As these tests target very specific function calls, they are usually easier to understand and debug.

These tests start at line 303 in the test_feature/source/main.cpp file. Comment out the tests above those until all the tests for the native calling conventions work.

If you post the errors you get I can help figure out the changes that need to be made.

##### Share on other sites
_Vicious_    330
Okay, here's the output this time:
[code]AngelScript options: AS_64BIT_PTR AS_WIN AS_X64_GCC
-- TestExecute passed
-- TestCDeclReturn passed

TestExecute1Arg: testVal is not of expected value. Got 0, expected 5

Failed on line 50 in ../../source/testexecute1arg.cpp

TestExecute1Arg: testVal is not of expected value. Got 0, expected 5

Failed on line 83 in ../../source/testexecute1arg.cpp[/code]

got a couple of warnings while compiling the source code too (although I think they're irrelevant to my problem):
[quote]x86_64-w64-mingw32-g++ -I/home/porkbot/build/mingw-w64/include -ggdb -I../../../../angelscript/include -Wno-missing-field-initializers -o obj/test_getset.o -c ../../source/test_getset.cpp
../../source/test_getset.cpp: In function 'bool TestGetSet::Test()':
../../source/test_getset.cpp:1218:60: warning: invalid access to non-static data member 'TestGetSet::CNode::vector' of NULL object
../../source/test_getset.cpp:1218:60: warning: (perhaps the 'offsetof' macro was used incorrectly)
../../source/test_getset.cpp:1219:53: warning: invalid access to non-static data member 'TestGetSet::CNode::vector' of NULL object
../../source/test_getset.cpp:1219:53: warning: (perhaps the 'offsetof' macro was used incorrectly)

x86_64-w64-mingw32-g++ -I/home/porkbot/build/mingw-w64/include -ggdb -I../../../../angelscript/include -Wno-missing-field-initializers -o obj/scriptfile.o -c ../../../../add_on/scriptfile/scriptfile.cpp
../../../../add_on/scriptfile/scriptfile.cpp:201:78: warning: (perhaps the 'offsetof' macro was used incorrectly)
../../../../add_on/scriptfile/scriptfile.cpp:234:78: warning: (perhaps the 'offsetof' macro was used incorrectly)
[/quote]

##### Share on other sites
WitchLord    4677
The good thing is that at least the functions are called, so the code is not completely wrong for MinGW64.

Now it's necessary to figure out why the argument wasn't received properly in the test that failed. The code in as_callfunc_x64_gcc.cpp will put the integer argument in the RDI CPU register, but it seems that MinGW64 is following a different convention. It is quite probable that MinGW64 is using the Microsoft calling convention in order to be better compatible with Windows dlls. In this case the first argument should be passed in the RCX register instead. (You can see this in the as_callfunc_x64_msvc_asm.asm file).

Can you show me the disassembly of the cfunction? It would help in determining how the argument should be passed.

If we see that MinGW64 follows the MSVC Win64 convention, then it's necessary to change the code in AngelScript to use that instead. The good thing is that the MSVC convention is really simple compared to what GNUC uses on Linux, so it should be quite easy to do. You can probably change the code to use as_callfunc_x64_msvc.cpp without many changes, and then just adapt as_callfunc_x64_msvc_asm.asm to assembly code that MinGW understands. I suggest inlining it in the as_callfunc_x64_msvc.cpp itself (similar to as_callfunc_x64_gcc.cpp) so it isn't necessary to have a separate assembler file for MinGW64. Observe that the GetReturnedFloat() and GetReturnedDouble() from as_callfunc_x64_gcc.cpp can probably be reused, so you don't have to convert these from Microsoft assembler.

You may ignore the compiler warnings you got. GNUC doesn't like the use of the macro 'offsetof', but it works as it should so there is no problem with it.

##### Share on other sites
_Vicious_    330
[quote]Can you show me the disassembly of the cfunction? It would help in determining how the argument should be passed.
[/quote] What "cfunction" exactly do you mean?

##### Share on other sites
WitchLord    4677
Sorry I wasn't clear enough.

I meant the 'static void cfunction(int f1)' that you find in the file test_feature/source/testexecute1arg.cpp on line 14. This is the function that is being called but that doesn't receive the correct value in the argument. If the disassembly of this function shows that it reads the argument value from the RCX register, then we can pretty much assume that MinGW64 is following the MSVC calling convention.
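For reference, the difference between the two conventions for integer arguments can be written down as a small lookup. This is an illustrative sketch only: `SketchAbi` and `sketch_int_arg_register` are hypothetical names and not part of AngelScript. The register orders are those of the System V AMD64 ABI (used by GCC on Linux) and the Microsoft x64 convention.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names for illustration only.
enum class SketchAbi { SystemV, Microsoft };

// Returns the register that carries the integer argument at position 'index'
// (0-based), or nullptr when that argument is passed on the stack.
const char *sketch_int_arg_register(SketchAbi abi, int index)
{
    static const char *const sysv[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const ms[]   = { "rcx", "rdx", "r8", "r9" };

    if (index < 0)
        return nullptr;
    if (abi == SketchAbi::SystemV)
        return index < 6 ? sysv[index] : nullptr;
    return index < 4 ? ms[index] : nullptr;
}
```

With this table, a test such as TestExecute1Arg failing with "Got 0, expected 5" is what one would expect if the caller stores the value in `rdi` while the callee reads `rcx`.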

##### Share on other sites
_Vicious_    330
If only it were that simple.. I don't have a 64bit debugger here, the machine the binary is compiled on is a 64bit linux system.

##### Share on other sites
WitchLord    4677
If I'm not mistaken, I believe there is a command line option for the gcc compiler, and therefore also mingw to output the assembler instead of the object file.

[EDIT] I found a post on google on this. Maybe you can give it a try? [url="http://stackoverflow.com/questions/137038/how-do-you-get-assembler-output-from-c-c-source-in-gcc"]How to output assembler with gcc[/url].

##### Share on other sites
_Vicious_    330
Thanks, here's the asm code:
[code].LFE821:
.lcomm _ZL7testVal,4,4
.lcomm _ZL6called,1,1
.text
.def _ZL9cfunctioni; .scl 3; .type 32; .endef
_ZL9cfunctioni:
.LFB842:
.file 2 "../../source/testexecute1arg.cpp"
.loc 2 15 0
pushq %rbp
.LCFI10:
movq %rsp, %rbp
.LCFI11:
movl %ecx, 16(%rbp)
.loc 2 16 0
movb $1, _ZL6called(%rip)
.loc 2 17 0
movl 16(%rbp), %eax
movl %eax, _ZL7testVal(%rip)
.loc 2 18 0
leave
.LCFI12:
ret
[/code]

##### Share on other sites
WitchLord    4677
Looks like I'm correct. It is the ECX (lower half of RCX) that holds the value that is eventually put in the testVal variable.

The hard part will now be to convert the code from the CallX64 MSVC assembler code in as_callfunc_x64_msvc_asm.asm into something that MinGW64 can compile. The syntax for inline assembler in GNUC is a bit weird and I do not have much experience with it, but I believe the code in as_callfunc_x86.cpp will serve as a good comparison for how to write it. Remember that the order of the arguments to the assembler instructions is the inverse compared to the MSVC assembler style.

##### Share on other sites
_Vicious_    330
Ok, here's my first stab at it: [url="http://e4m5.net/as_callfunc_x64_mingw.cpp"]http://e4m5.net/as_c...c_x64_mingw.cpp[/url]

[quote]AngelScript version: 2.21.0
AngelScript options: AS_64BIT_PTR AS_WIN
-- TestExecute passed

--- Assert failed ---
func: void ExecuteString()
mdle: ExecuteString
sect: ExecuteString
line: 1
---------------------

TestReturn: cfunction didn't return properly

Failed on line 82 in ../../source/test_cdecl_return.cpp

Failed on line 244 in ../../source/test_cdecl_return.cpp[/quote]

##### Share on other sites
WitchLord    4677
It's a good start. You were able to successfully call a global function without corrupting the callstack. And the return of floats and doubles seems to work. There is trouble with returning boolean and integer types though. So this needs to be studied.

In the inline assembler there is no need to manually implement the prolog and epilogue, i.e. the pushing and popping of all the registers. The compiler will do that for you, if you just tell it which registers are used. You do that after the third :.
The following article might be of use: [url="http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#ss5.3"]GCC-Inline-Assembly-HOWTO[/url]

In fact I think you got the operands for the inline assembly wrong. I'll admit that I don't fully understand them myself, but I believe that after the first : you list the output values, after the second : you list the input, and after the third you list all the registers that are changed (but are not part of the input/output). My guess is that you want something like this:

[code]
static asQWORD CallX64(const asQWORD *args, const asQWORD *floatArgs, int paramSize, asQWORD func)
{
asQWORD ret = 0;

__asm__ __volatile__ (
"# Move function param to non-scratch register\n"
" mov %4,%%r14 # r14 = function\n" // Copy func into r14
"# Allocate space on the stack for the arguments\n"
"# Make room for at least 4 arguments even if there are less. When\n"
"# the compiler does optimizations for speed it may use these for \n"
"# temporary storage.\n"
" mov %3,%%rdi\n" // Copy paramSize into rdi
" add $32,%%rdi\n"
"# Make sure the stack pointer is 16byte aligned so the\n"
"# whole program optimizations will work properly\n"
"# TODO: optimize: Can this be optimized with fewer instructions?\n"
" mov %%rsp,%%rsi\n"
" sub %%rdi,%%rsi\n"
" and $0x8,%%rsi\n"
" add %%rsi,%%rdi\n"
" sub %%rdi,%%rsp\n"
"# Jump straight to calling the function if no parameters\n"
" cmp $0,%3 # Compare paramSize with 0\n"
" je callfunc # Jump to call function if (paramSize == 0)\n"
"# Move params to non-scratch registers\n"
" mov %1,%%rsi # rsi = pArgs\n" // Copy args into rsi
" mov %2,%%r11 # r11 = pFloatArgs (can be NULL)\n" // Copy floatArgs into r11
" mov %3,%%r12d # r12 = paramSize\n" // Copy paramSize into r12
"# Copy arguments from script stack to application stack\n"
"# Order is (first to last):\n"
"# rcx, rdx, r8, r9 & everything else goes on stack\n"
" movq (%%rsi),%%rcx\n"
" movq (%%rsi) +8,%%rdx\n"
" movq (%%rsi) +16,%%r8\n"
" movq (%%rsi) +24,%%r9\n"
"# Negate the 4 params from the size to be copied\n"
" sub $32,%%r12d\n"
" js copyfloat # Jump if negative result\n"
" jz copyfloat # Jump if zero result\n"
"# Now copy all remaining params onto stack allowing space for first four\n"
"# params to be flushed back to the stack if required by the callee.\n"
" add $32,%%rsi # Position input pointer 4 args ahead\n"
" mov %%rsp,%%r13 # Put the stack pointer into r13\n"
" add $32,%%r13 # Leave space for first 4 args on stack\n"
"copyoverflow:\n"
" movq (%%rsi),%%r15 # Read param from source stack into r15\n"
" movq %%r15,(%%r13) # Copy param to real stack\n"
" add $8,%%r13 # Move virtual stack pointer\n"
" add $8,%%rsi # Move source stack pointer\n"
" sub $8,%%r12d # Decrement remaining count\n"
" jnz copyoverflow # Continue if more params\n"
"copyfloat:\n"
"# Any floating point params?\n"
" cmp $0,%%r11\n"
" je callfunc\n"
" movlpd (%%r11),%%xmm0\n"
" movlpd (%%r11) +8,%%xmm1\n"
" movlpd (%%r11) +16,%%xmm2\n"
" movlpd (%%r11) +24,%%xmm3\n"
"callfunc:\n"
"# Call function\n"
" call *%%r14\n"
" movq %%rax,%0\n" // Copy the returned value into the ret variable
: "=r" (ret)
: "r" (args), "r" (floatArgs), "r" (paramSize), "r" (func)
: "r14", "rdi", "rsi", "rsp", "r11", "r12", "rcx", "rdx", "r8", "r9", "r13", "r15"
);

return ret;
}
[/code]
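The stack reservation done at the top of the assembly above can be restated in plain C++ to make the arithmetic easier to follow. This is only a sketch of those few instructions: `sketch_stack_reservation` is a hypothetical helper, and it assumes that both the incoming stack pointer and `paramSize` are multiples of 8.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the first instructions of the inline assembly.
// Returns the number of bytes that would be subtracted from the stack pointer.
std::uint64_t sketch_stack_reservation(std::uint64_t rsp, std::uint64_t paramSize)
{
    // mov %3,%rdi ; add $32,%rdi
    // Room for the arguments plus 32 bytes for the first four register arguments.
    std::uint64_t size = paramSize + 32;

    // mov %rsp,%rsi ; sub %rdi,%rsi ; and $0x8,%rsi
    // 8 if the new stack pointer would be off by 8, otherwise 0.
    std::uint64_t pad = (rsp - size) & 0x8;

    // add %rsi,%rdi (the following 'sub %rdi,%rsp' applies the result)
    return size + pad;
}
```

For example, with a stack pointer of 0x1000 and 8 bytes of arguments the sketch reserves 48 bytes rather than 40, so that the stack pointer stays 16-byte aligned at the call.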

The change I made was to use the operands and clobber list. This way I don't need to worry about what the MinGW optimizer will do outside the assembler code. The optimizer may very well decide to move the arguments to a different place.

Now, I'm not sure what the "=r" and "r" strings mean. This is something I still need to study in the gcc manual, but I think it means something like 'regular value'.

[EDIT] Chapter 6 in the article I linked to explains the "r" string. It just means the value will be stored in one of the general purpose registers.
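To make the three sections of the operand list concrete, here is a much smaller example than CallX64. It is a sketch with a hypothetical function name, `sketch_add_via_asm`, and it only uses the inline assembly when compiled with a GNU-compatible compiler for x86-64; elsewhere it falls back to plain C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example function: adds two 64bit values through inline assembly.
std::uint64_t sketch_add_via_asm(std::uint64_t a, std::uint64_t b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    std::uint64_t result;
    __asm__ __volatile__ (
        " mov %1,%%rax\n"   // rax = a
        " add %2,%%rax\n"   // rax += b
        " mov %%rax,%0\n"   // result = rax
        : "=r" (result)     // after the first colon: outputs ("=r" = written, any general register)
        : "r" (a), "r" (b)  // after the second colon: inputs ("r" = any general register)
        : "rax", "cc"       // after the third colon: registers and flags changed by the code
    );
    return result;
#else
    return a + b;           // portable fallback for other compilers and CPUs
#endif
}
```

Because `rax` is named in the clobber list, the compiler will not place any of the operands in it, and it takes care of preserving whatever it had stored there.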

##### Share on other sites
_Vicious_    330
Unfortunately, your version produces the following error:
[quote]../../source/as_callfunc_x64_mingw.cpp:111:3: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
../../source/as_callfunc_x64_mingw.cpp:111:3: error: 'asm' operand has impossible constraints
[/quote]
I'll try to fix it..

EDIT:
Ok, using -fomit-frame-pointer fixed that, however the proposed template produces offending code:
[quote]as_callfunc_x64_mingw.s: Assembler messages:
as_callfunc_x64_mingw.s:130: Error: unsupported for `mov'
[/quote]

[code]...    movl    112(%rsp), %ebp
...
# Make room for at least 4 arguments even if there are less. When
# the compiler does optimizations for speed it may use these for
# temporary storage.
mov %ebp,%rdi[/code]

Obviously, one can't mov ebp to rdi.. I wonder what msvc does in this case..

##### Share on other sites
_Vicious_    330
Ok, using g constraint for 'paramSize' somewhat worked but the produced code crashes without passing even a single test and looks quite suboptimal: [url="http://e4m5.net/as_callfunc_x64_mingw.s"]http://e4m5.net/as_c...unc_x64_mingw.s[/url]

On a side note, just wanted to mention that I'm using the MSVC version of CallSystemFunctionNative

##### Share on other sites
WitchLord    4677
This is likely going to require a few trial-and-error iterations before we get it right.

Moving ebp to rdi should be possible, it's just a copy of the value in the register. I guess the compiler is just complaining that it needs to know the size qualifier. Try changing the instruction to 'movq' instead, or 'movd' if that doesn't work.

Yes, you should be using the MSVC version of the CallSystemFunctionNative.

Instead of omitting the frame pointer, you could try changing the operands to the following:

[code]__asm__ __volatile__ (
...
same as before
...
"# Call function\n"
" call *%%r14\n"
" lea  %0, %%rbx\n"     // Load the address of the ret variable into rbx
" movq %%rax,(%%rbx)\n" // Copy the returned value into the ret variable
: // no output
: "m" (ret), "a" (args), "b" (floatArgs), "c" (paramSize), "d" (func)
: "%r14", "%rdi", "%rsi", "%rsp", "%r11", "%r12", "%r8", "%r9", "r13", "r15"
[/code]
With this, you'll use one fewer register for the operands, as ret is accessed directly on the stack, and you'll also specify explicitly that the other 4 arguments will be passed in the rax, rbx, rcx, and rdx registers.

[EDIT] After re-reading the article I noticed I had forgotten the % on the registers in the clobber list. I'm not sure if it actually makes a difference, but it is probably best to add it, just in case.

##### Share on other sites
_Vicious_    330
Well, how would it know how exactly the value should be copied? it's a 32bit (since paramSize is declared as int) to 64bit copying.. so it's kinda logical that it fails on that instruction. and btw, neither of the movq variations work..
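One possible way around the size mismatch, which is not something proposed in this thread, is to widen the value to 64 bits in C++ before handing it to the assembler, so that the operand is printed with a 64bit register name. The sketch below uses a hypothetical function, `sketch_copy_param_size`, and falls back to plain C++ when not built with a GNU-compatible compiler for x86-64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: copies a 32bit int into a 64bit register-sized value.
std::int64_t sketch_copy_param_size(int paramSize)
{
#if defined(__GNUC__) && defined(__x86_64__)
    // Sign-extend in C++ first. Passing the int itself would give the template
    // a 32bit register name such as %ebp, and a mov between a 32bit and a
    // 64bit register is rejected by the assembler.
    std::int64_t wide = paramSize;
    std::int64_t out;
    __asm__ (
        " mov %1,%0\n"      // both operands are 64bit registers now
        : "=r" (out)
        : "r" (wide)
    );
    return out;
#else
    return paramSize;       // portable fallback
#endif
}
```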

##### Share on other sites
WitchLord    4677
Any success with the other suggestion, i.e. changing to explicitly name the registers to use? That would make the copy be 'mov %rcx, %rdi', which should hopefully work.