Hi!

There was a topic about a problem similar to the one we are experiencing: http://www.gamedev.net/topic/607167-ogre-vector3-and-amd64/ .
What is different in our case is that our vector type is pure C, so writing explicit constructors or copy constructors is not an option for us.
And yes, the problem is returning a simple struct type by value on Linux x64 (GCC 4.6.1): the vector values come back scrambled.
For example, returning a vector with values (1,2,3) gives (3,0,1).
The vector struct looks like this:
 typedef float vec_t;
 typedef vec_t vec3_t[3];
 typedef struct asvec3_s { vec3_t v; } asvec3_t;

We register the type with flags (asOBJ_VALUE | asOBJ_POD | asOBJ_APP_CLASS).
What options do we have to fix this?

Linux 64bit will probably return this type in the registers RAX:RDX. However, it looks like the values might be switched, so you're getting the higher elements in the lower elements. The 0 is probably just the default value as the type only has 3 floats, and not 4.

This might be a bug in as_callfunc.cpp (lines 513-514) or it may be a specific situation for this type that is not handled yet. I'll try to investigate this and see if I can figure it out.

In the meantime, you should be able to use the autowrappers to solve this problem.

Small update, I found something that temporarily fixes our problem. Any of the following methods produce the vector (1,2,3) correctly:
 struct vector {
     union {
         float xyz[3];
         double _as_64bit_hack[3];
     };
 };

 struct vector {
     float xyz[3];
     float pad[2]; // <- has to be at least 2 elements
 };

 struct vector {
     double xyz[3]; // works as double, see first method
 };

I tried different alignments with the GCC aligned attribute, but none worked like these, so I'm not sure whether this is an alignment issue, a frame size issue, or something else.
Naturally this isn't a 100% satisfactory workaround, and I hope that you find a "correct" solution for this.

All of these make the structure larger than 128 bits, which makes gcc return the type in memory rather than in registers.
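To make the size argument concrete, here is a minimal sketch in C. It assumes the System V x86-64 rule that aggregates larger than 16 bytes (two eightbytes) are returned in memory; the type names and the helper returned_in_memory() are made up for illustration and are not part of AngelScript. The union variant uses a named member so it also compiles as C99.

```c
#include <stddef.h>

/* The original 12-byte vector and the three workaround layouts. */
typedef struct { float v[3]; } asvec3_t;
typedef struct { union { float xyz[3]; double hack[3]; } u; } vec_union_t;
typedef struct { float xyz[3]; float pad[2]; } vec_padded_t;
typedef struct { double xyz[3]; } vec_double_t;

/* Assumption: aggregates above 16 bytes are classified MEMORY. */
#define MAX_REGISTER_AGGREGATE 16

/* Hypothetical helper: nonzero if a struct of this size goes via memory. */
int returned_in_memory(size_t size)
{
    return size > MAX_REGISTER_AGGREGATE;
}
```

This would also explain why the padding has to be at least 2 elements: with float pad[1] the struct is exactly 16 bytes and still fits in registers.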

What stumps me is that I already have a test for validating that a class similar to this works properly on linux 64bit, and it is working as it should.

Unfortunately I haven't had the time to investigate this problem in detail yet, but hopefully before the end of the week I will at least be able to understand the cause.

I made some tests with asvec3_t, and I get the same result as you do, i.e. {3,0,1} where {1,2,3} is expected.

For some reason that I have yet to determine, gcc is treating the following class:

 class Class3 { asDWORD a; asDWORD b; asDWORD c; }; 

differently than the asvec3_t type, even though both are of the same size and both contain only primitives. Both seem to be returned in the RAX and RDX registers, however the order of the registers is swapped for asvec3_t versus Class3.

Apparently I need to have another flag than asOBJ_APP_CLASS to identify that the asvec3_t type should be returned in a swapped order, however I do not know how the application developer should know when to use one or the other.

If you have any idea on what the rule might be as to when gcc does it one way or the other I would really like to hear it.

 class Class3 { asDWORD a; asDWORD b; asDWORD c; }; 

At least this allows us to keep the 12-byte size of the vector: union { float v[3]; int i[3]; };

Apparently I need to have another flag than asOBJ_APP_CLASS to identify that the asvec3_t type should be returned in a swapped order, however I do not know how the application developer should know when to use one or the other.
Yes, this is a bad idea. Even Vicious doesn't approve.

If you have any idea on what the rule might be as to when gcc does it one way or the other I would really like to hear it.
I'm afraid I'm not that well equipped to investigate the bug on gcc/asm level.

I wonder whether the issues in this thread are related: http://www.gamedev.n...urning-doubles/
Especially those "bigger changes to the code that implements the native calling conventions in version 2.20.2"?
Our version is 2.21.0 btw.

No, this problem is different from the one described in the other thread. There, floats and doubles are returned in the XMM register, and for some reason the value gets lost when compiling with optimizations. Probably the gcc compiler doesn't see that the XMM register is used and ends up removing the instructions that set it.

I'm trying to find some documentation that explains why gcc behaves this way for this type so I can add proper support for it.

After reading the documentation at http://www.x86-64.or...ntation/abi.pdf I think the case might be that this structure is actually returned in XMM0 and not RAX:RDX, because it only contains float values. The fact that we get {3,0,1} might be a coincidence.
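As a rough illustration of the classification rule in that ABI document, here is a simplified sketch in C. It only handles flat structs of naturally aligned int and float fields up to 16 bytes, and all names (arg_class, field_desc, classify_struct) are hypothetical; this is not the code gcc or AngelScript uses. Each 8-byte chunk ("eightbyte") gets its own class, and INTEGER wins over SSE when both kinds of field share a chunk.

```c
#include <stddef.h>

/* Ordered so that a "stronger" class has a larger value. */
typedef enum { CLASS_NONE, CLASS_SSE, CLASS_INTEGER, CLASS_MEMORY } arg_class;
typedef enum { FIELD_INT, FIELD_FLOAT } field_kind;

/* One scalar field: its kind and its byte offset inside the struct. */
typedef struct {
    field_kind kind;
    size_t offset;
} field_desc;

/* Classify the two eightbytes of a flat struct into out[0] and out[1]. */
void classify_struct(const field_desc *fields, size_t count,
                     size_t total_size, arg_class out[2])
{
    size_t i;

    out[0] = out[1] = CLASS_NONE;
    if (total_size > 16) {              /* too big: returned in memory */
        out[0] = out[1] = CLASS_MEMORY;
        return;
    }
    for (i = 0; i < count; ++i) {
        size_t eb = fields[i].offset / 8;   /* which eightbyte */
        arg_class c = (fields[i].kind == FIELD_INT) ? CLASS_INTEGER
                                                    : CLASS_SSE;
        if (c > out[eb])                    /* INTEGER overrides SSE */
            out[eb] = c;
    }
}
```

Under these assumptions a struct of three floats classifies as SSE, SSE (XMM registers), three 32-bit ints as INTEGER, INTEGER (RAX:RDX), and a union of floats and ints, described as overlapping fields at the same offsets, also as INTEGER, INTEGER.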

Would it be possible for you to compile the following function:

 asvec3_t vec3_123() { asvec3_t v = {1,2,3}; return v; } 

into assembler, so I can see how the return value is loaded into the registers?

You should be able to do this by compiling with 'gcc -S test.c'. It will generate the file test.s instead of an object file.

asvec3.c
 #define _TEST_ONLY_

 #ifndef _TEST_ONLY_
 #include <stdio.h>
 #endif

 typedef float vec3_t[3];
 typedef struct { vec3_t v; } asvec3_t;

 asvec3_t asvec3_123()
 {
     asvec3_t v = { {1,2,3} };
     return v;
 }

 #ifndef _TEST_ONLY_
 int main()
 {
     asvec3_t v = asvec3_123();
     printf( "%f, %f, %f\n", v.v[0], v.v[1], v.v[2] );
     return 0;
 }
 #endif

asvec3.s
     .file "asvec3.c"
     .text
     .globl asvec3_123
     .type asvec3_123, @function
 asvec3_123:
 .LFB0:
     .cfi_startproc
     pushq %rbp
     .cfi_def_cfa_offset 16
     .cfi_offset 6, -16
     movq %rsp, %rbp
     .cfi_def_cfa_register 6
     movl $0x3f800000, %eax
     movl %eax, -32(%rbp)
     movl $0x40000000, %eax
     movl %eax, -28(%rbp)
     movl $0x40400000, %eax
     movl %eax, -24(%rbp)
     movq -32(%rbp), %rax
     movq %rax, -16(%rbp)
     movl -24(%rbp), %eax
     movl %eax, -8(%rbp)
     movq -16(%rbp), %rdx
     mov -8(%rbp), %eax
     movq %rdx, -56(%rbp)
     movq -56(%rbp), %xmm0
     movq %rax, -56(%rbp)
     movq -56(%rbp), %xmm1
     popq %rbp
     .cfi_def_cfa 7, 8
     ret
     .cfi_endproc
 .LFE0:
     .size asvec3_123, .-asvec3_123
     .ident "GCC: (GNU) 4.6.1 20110819 (prerelease)"
     .section .note.GNU-stack,"",@progbits

Yes it seems to use the SSE registers for return value. I found a function attribute sseregparm for GCC, but unfortunately I can't find a complementary attribute to it.

Here is a nice document that lists the ABI for various compilers (including MSVC and GCC) on various platforms (including 16-bit, 32-bit and 64-bit PC): http://www.agner.org/optimize/calling_conventions.pdf (page 16)

Thanks a lot for this article. This clearly explains how the classes are handled.

Yes it seems to use the SSE registers for return value. I found a function attribute sseregparm for GCC, but unfortunately I can't find a complementary attribute to it.

Indeed. As the ABI said, the structure is returned in the XMM0:XMM1 registers rather than in RAX:RDX. I'm not sure why the structure is not returned only in XMM0, as it would fit nicely in there, but the other article that Martins posted describes that only the lower 64 bits of the XMM registers are used.
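For what it's worth, the assembly listing posted in this thread also seems to explain the {3,0,1} result. Before filling XMM0 and XMM1, the function uses RDX as scratch for the first two floats (1,2) and RAX for the third float (3, zero-extended), and those values are still there on return. The sketch below simulates the registers as plain 64-bit integers; pack2() and vec3_from_regs() are hypothetical helpers written for this illustration, not AngelScript functions.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float v[3]; } asvec3_t;

/* Pack two floats into one 64-bit register image, as they sit in memory. */
uint64_t pack2(float a, float b)
{
    float pair[2] = { a, b };
    uint64_t r;
    memcpy(&r, pair, sizeof r);
    return r;
}

/* Rebuild the 12-byte struct from two register images, first one lowest. */
asvec3_t vec3_from_regs(uint64_t first, uint64_t second)
{
    unsigned char buf[16];
    asvec3_t out;
    memcpy(buf, &first, 8);
    memcpy(buf + 8, &second, 8);
    memcpy(&out, buf, sizeof out);   /* only the first 12 bytes are used */
    return out;
}
```

Reading the low halves of XMM0 then XMM1, i.e. vec3_from_regs(pack2(1, 2), pack2(3, 0)), gives back {1,2,3}. Reading RAX then RDX with the leftover scratch values, i.e. vec3_from_regs(pack2(3, 0), pack2(1, 2)), gives {3,0,1}, which matches what was reported.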

Anyway, I may be able to add a special case for this kind of structure, say by adding the flag asOBJ_APP_CLASS_ALLFLOATS, or something like that. It would then be possible to have the code in as_callfunc_x64_gcc.cpp use that flag to handle the type correctly. However, this will have to wait for a future version, unless you feel up to implementing it yourself.

For now I'll just make sure the library properly checks for functions that return simple types by value and gives an error, since there is no way the code can predict how the type should be handled.

The unfortunate final verdict is that on Linux 64bit you will only be able to return this type by value in registered functions if you use the asCALL_GENERIC calling convention via function wrappers. However, don't feel too bad about this, because I believe the asCALL_GENERIC calling convention is actually faster than the native calling convention on Linux 64bit, as there are fewer dynamic decisions that the library has to make.

Thanks for the help in clearing out the doubts.

Regards,

Andreas

asOBJ_APP_CLASS_ALLFLOATS doesn't sound like a particularly good idea, but it's definitely better than the other option: we've recently dropped support for the generic calling convention from our AS classes completely. It was just too much of a burden, doubling the amount of code for no practical reason.

Unfortunately there is no other way for AngelScript to be able to support this type on Linux 64bit. The code in as_callfunc_x64_gcc.cpp needs to know that the gcc compiler would classify the type as floats and return it in XMM0:XMM1 instead of RAX:RDX.

The information may be given in a different form, but right now I think the flag is the way that will take the least amount of work. With the flag it should be possible to quite easily change the code to retrieve the return value from XMM0:XMM1, similarly to how GetReturnedFloat() is done.

Since you do not want to use the generic calling convention (even with autowrappers) and for some reason cannot change the type to include a copy constructor, perhaps you would be interested in adding the support for ALLFLOATS?

The vector is on the C side, so no constructors there. In C++ this would be no problem, since returning by value works with explicit copy constructors.
But I was able to fix the issue, at least temporarily, by unioning the 3 float components with 3 integer components.

Just out of curiosity, what is it that you're doing in pure C? Why have you chosen to use C instead of C++?

C is the language our game was originally written in. Converting to C++ would simply be a waste of time for no real gain.

You don't need to convert it. C++ is supposed to be backwards compatible with C. You just need to compile the C code as if it was C++, then you'll be able to add the copy constructor to the vec3 struct.

Anyway, I decided to try to implement ALLFLOATS myself. It should hopefully be only a few minor changes that I can do without the need for debugging.

You don't need to convert it.

That's another option, but it's more of a half-assed effort. I mean, is it worth the trouble of altering project files, makefiles, compiler settings, etc. just to add one(!) constructor to a single(!) class?

Done!

I added the flags asOBJ_APP_CLASS_ALLINTS and asOBJ_APP_CLASS_ALLFLOATS to allow these two alternatives to work properly on Linux 64bit. The tests I included in test_feature seem to work properly on the buildbot, so hopefully it should work just fine for you as well.

You'll need revision 980 if you wish to try this.

The only change to your code is that the asvec3 type should be registered with the flags asOBJ_VALUE | asOBJ_POD | asOBJ_APP_CLASS | asOBJ_APP_CLASS_ALLFLOATS.

Don't forget to remove the union with the ints that was added as a workaround. If it is still there, then the flag must be asOBJ_APP_CLASS_ALLINTS instead.

Let me know how it works.

Thanks!

I will try to find the time this weekend to update our AS to the latest version and test the new flags.

So, how did the test go?

I have a similar problem with Ogre3D, but haven't had the time to test yet.. ;)

I just now checked in another improvement for 64bit Linux and Mac OS X. (revision 1070)

Objects registered with asOBJ_APP_CLASS_ALLINTS or asOBJ_APP_CLASS_ALLFLOATS can now also be passed by value to application functions. Previously they could only be returned by value, but the parameters had to be sent by reference to work.