
Re: vector3 pod types (in C this time)

This topic is 2272 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi!

There was a topic about a problem similar to the one we are experiencing: http://www.gamedev.net/topic/607167-ogre-vector3-and-amd64/ .
What is different in our case is that our vector type is in pure C, so writing explicit constructors/copy constructors is not an option for us.
And yes, the problem is returning a simple struct type by value on Linux x64 (GCC 4.6.1): the vector values come back scrambled.
For example, returning a vector with values (1,2,3) gives (3,0,1).
The vector struct looks like this:

typedef float vec_t;
typedef vec_t vec3_t[3];
typedef struct asvec3_s
{
    vec3_t v;
} asvec3_t;


We register the type with flags (asOBJ_VALUE | asOBJ_POD | asOBJ_APP_CLASS).
What are possible options that we have to fix this?

Linux 64bit will probably return this type in the registers RAX:RDX. However, it looks like the values might be switched, so you're getting the higher elements in the lower positions. The 0 is probably just a default value, as the type only has 3 floats and not 4.

This might be a bug in as_callfunc.cpp (lines 513-514) or it may be a specific situation for this type that is not handled yet. I'll try to investigate this and see if I can figure it out.


In the meantime, you should be able to use the autowrappers to solve this problem.


Small update, I found something that temporarily fixes our problem. Any of the following methods produce the vector (1,2,3) correctly:

struct vector {
    union {
        float xyz[3];
        double _as_64bit_hack[3];
    };
};

struct vector {
    float xyz[3];
    float pad[2]; // <- has to be at least 2 elements
};

struct vector {
    double xyz[3]; // works as double, see first method
};


I tried different alignments with the GCC aligned attribute, but none of them worked like these, so I'm not sure whether this is an alignment issue, a frame size issue, or something else.
Naturally this isn't a 100% satisfactory workaround and I hope that you find a "correct" solution for this :)

All of these make the structure larger than 128 bits, which makes the gcc compiler return the type in memory rather than in registers.
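As a rough check of the sizes involved, here is a minimal C sketch. The struct names are changed for illustration (the post above calls all three "vector"), the union is given a member name u so that it compiles as C99, and returned_in_memory is a hypothetical helper that only encodes the 16-byte (128-bit) threshold mentioned above, not gcc's full rules.

```c
#include <stddef.h>

/* Original 12-byte layout, for comparison. */
struct vector_plain  { float xyz[3]; };

/* The three workaround layouts, renamed so they can coexist in one file. */
struct vector_union  { union { float xyz[3]; double _as_64bit_hack[3]; } u; };
struct vector_padded { float xyz[3]; float pad[2]; };
struct vector_double { double xyz[3]; };

/* Hypothetical helper: returns 1 if the size exceeds 16 bytes (128 bits). */
int returned_in_memory(size_t size)
{
    return size > 16;
}
```

Assuming a 4-byte float and an 8-byte double, the sizes are 12, 24, 20 and 24 bytes, so only the plain layout stays under the threshold. It would also be consistent with the note that pad needs at least 2 elements: a single extra float gives exactly 16 bytes, which is not larger than 128 bits.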

What stumps me is that I already have a test for validating that a class similar to this works properly on linux 64bit, and it is working as it should.

Unfortunately I haven't had the time to investigate this problem in detail yet, but hopefully before the end of the week I will at least be able to understand the cause.

I made some tests with asvec3_t, and I get the same result as you do, i.e. {3,0,1} where {1,2,3} is expected.

For some reason that I have yet to determine, gcc is treating the following class:



class Class3
{
    asDWORD a;
    asDWORD b;
    asDWORD c;
};


differently than the asvec3_t type, even though both are of the same size and both contain only primitives. Both seem to be returned in the RAX and RDX registers, however the order of the registers is swapped for asvec3_t versus Class3.



Apparently I need a flag other than asOBJ_APP_CLASS to identify that the asvec3_t type should be returned in a swapped order; however, I do not know how the application developer would know when to use one or the other.


If you have any idea on what the rule might be as to when gcc does it one way or the other I would really like to hear it.



[quote]
class Class3
{
    asDWORD a;
    asDWORD b;
    asDWORD c;
};
[/quote]
At least this allows us to keep the 12-byte size on the vector: union { float v[3]; int i[3]; }; :D

[quote]
Apparently I need to have another flag than asOBJ_APP_CLASS to identify that the asvec3_t type should be returned in a swapped order, however I do not know how the application developer should know when to use one or the other.
[/quote]
Yes, this is a bad idea. Even Vicious doesn't approve :)

[quote]
If you have any idea on what the rule might be as to when gcc does it one way or the other I would really like to hear it.
[/quote]
I'm afraid I'm not that well equipped to investigate the bug at the gcc/asm level.

I wonder whether the issues in this thread are related: http://www.gamedev.n...urning-doubles/
Especially those "bigger changes to the code that implements the native calling conventions in version 2.20.2"?
Our version is 2.21.0, by the way.

No, this problem is different from the one described in the other thread. There, floats and doubles are returned in the XMM register, and for some reason the value gets lost when compiling with optimizations. Probably the gcc compiler doesn't see that the XMM register is used and ends up removing the instructions that set it.


I'm trying to find some documentation that explains why gcc behaves this way for this type so I can add proper support for it.

After reading the documentation at http://www.x86-64.or...ntation/abi.pdf I think the case might be that this structure is actually returned in XMM0 and not RAX:RDX, because it only contains float values. The fact that we get {3,0,1} may be a coincidence.

Would it be possible for you to compile the following function:


asvec3_t vec3_123()
{
    asvec3_t v = {1,2,3};
    return v;
}


into assembler, so I can see how the return value is loaded into the registers?

You should be able to do this by compiling with 'gcc -S test.c'. It will generate the file test.s instead of test.obj.

asvec3.c

#define _TEST_ONLY_

#ifndef _TEST_ONLY_
#include <stdio.h>
#endif

typedef float vec3_t[3];

typedef struct {
    vec3_t v;
} asvec3_t;

asvec3_t asvec3_123()
{
    asvec3_t v = { {1,2,3} };
    return v;
}

#ifndef _TEST_ONLY_
int main()
{
    asvec3_t v = asvec3_123();
    printf( "%f, %f, %f\n", v.v[0], v.v[1], v.v[2] );
    return 0;
}
#endif


asvec3.s

.file "asvec3.c"
.text
.globl asvec3_123
.type asvec3_123, @function
asvec3_123:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0x3f800000, %eax
movl %eax, -32(%rbp)
movl $0x40000000, %eax
movl %eax, -28(%rbp)
movl $0x40400000, %eax
movl %eax, -24(%rbp)
movq -32(%rbp), %rax
movq %rax, -16(%rbp)
movl -24(%rbp), %eax
movl %eax, -8(%rbp)
movq -16(%rbp), %rdx
mov -8(%rbp), %eax
movq %rdx, -56(%rbp)
movq -56(%rbp), %xmm0
movq %rax, -56(%rbp)
movq -56(%rbp), %xmm1
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size asvec3_123, .-asvec3_123
.ident "GCC: (GNU) 4.6.1 20110819 (prerelease)"
.section .note.GNU-stack,"",@progbits


Yes it seems to use the SSE registers for return value. I found a function attribute sseregparm for GCC, but unfortunately I can't find a complementary attribute to it.
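
One possible reading of the listing above, written as a small C sketch: just before ret, RDX still holds v[0] and v[1], RAX holds v[2] zero-extended, and the same two values have been copied into the low halves of XMM0 and XMM1. The regs_t struct and both helper functions are hypothetical names used only for this illustration, and the sketch assumes a little-endian machine such as x86-64. It models the register contents with plain integers; it does not read real registers.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float v[3]; } asvec3_t;

/* Hypothetical snapshot of the relevant registers at `ret`. */
typedef struct {
    uint64_t rax, rdx;          /* scratch values left behind by the listing */
    uint64_t xmm0_lo, xmm1_lo;  /* low 64 bits of XMM0 and XMM1 */
} regs_t;

/* Register contents for the vector {1,2,3}, following the listing. */
regs_t regs_after_asvec3_123(void)
{
    const float v[3] = {1.0f, 2.0f, 3.0f};
    uint32_t third;
    regs_t r;

    memcpy(&r.rdx, v, 8);        /* movq -16(%rbp), %rdx : v[0] and v[1] */
    memcpy(&third, &v[2], 4);
    r.rax = third;               /* mov -8(%rbp), %eax : v[2], upper half zeroed */
    r.xmm0_lo = r.rdx;           /* %rdx -> stack -> %xmm0 */
    r.xmm1_lo = r.rax;           /* %rax -> stack -> %xmm1 */
    return r;
}

/* Build the 12-byte struct from two 64-bit halves: 8 bytes, then 4 bytes. */
asvec3_t from_halves(uint64_t first, uint64_t second)
{
    asvec3_t out;
    memcpy(&out.v[0], &first, 8);
    memcpy(&out.v[2], &second, 4);
    return out;
}
```

Under these assumptions, from_halves(r.rax, r.rdx) yields (3,0,1), which is the scrambled result reported at the start of the thread, while from_halves(r.xmm0_lo, r.xmm1_lo) yields (1,2,3).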

Here is a nice document that lists the ABIs of various compilers (including MSVC and GCC) for various platforms (including 16-bit, 32-bit and 64-bit PC): http://www.agner.org/optimize/calling_conventions.pdf (page 16)



[quote]
Yes it seems to use the SSE registers for return value. I found a function attribute sseregparm for GCC, but unfortunately I can't find a complementary attribute to it.
[/quote]

Indeed. As the ABI says, the structure is returned in the XMM0:XMM1 registers rather than in RAX:RDX. I'm not sure why the structure is not returned only in XMM0, as it would fit nicely in there, but the other article that Martins posted describes that only the lower 64 bits of the XMM registers are used.
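
A heavily simplified sketch of the classification idea in the ABI document, to show why a float-only struct and an integer-only struct of the same size can end up in different registers. All names here (abi_class_t, field_t, classify_eightbyte) are made up for this illustration and are not part of AngelScript or gcc. It only handles aligned float, double and integer scalars in structs of up to 16 bytes; the real rules cover more cases.

```c
#include <stddef.h>

typedef enum { CLASS_NONE, CLASS_SSE, CLASS_INTEGER, CLASS_MEMORY } abi_class_t;

/* One scalar member: byte offset, byte size, and whether it is float/double. */
typedef struct { size_t offset; size_t size; int is_float; } field_t;

/* Example member lists (for a union, list the members of every alternative). */
const field_t vec3_fields[]   = { {0, 4, 1}, {4, 4, 1}, {8, 4, 1} };  /* asvec3_t */
const field_t class3_fields[] = { {0, 4, 0}, {4, 4, 0}, {8, 4, 0} };  /* Class3   */

/* Classify one 8-byte chunk (index 0 or 1) of a struct. */
abi_class_t classify_eightbyte(const field_t *f, size_t n,
                               size_t total_size, int index)
{
    abi_class_t cls = CLASS_NONE;
    size_t lo = (size_t)index * 8;
    size_t hi = lo + 8;
    size_t i;

    if (total_size > 16)
        return CLASS_MEMORY;                 /* too large for two registers */

    for (i = 0; i < n; i++) {
        if (f[i].offset + f[i].size <= lo || f[i].offset >= hi)
            continue;                        /* member is outside this chunk */
        if (!f[i].is_float)
            cls = CLASS_INTEGER;             /* any integer makes the chunk INTEGER */
        else if (cls == CLASS_NONE)
            cls = CLASS_SSE;                 /* floats only, so far */
    }
    return cls;
}
```

With these inputs, both chunks of asvec3_t come out as SSE (XMM0:XMM1) and both chunks of Class3 as INTEGER (RAX:RDX). Listing both the float and the int members of the union workaround at the same offsets also gives INTEGER, which would explain why that workaround behaves like Class3.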


Anyway, I may be able to add a special case for this kind of structure, say by adding the flag asOBJ_APP_CLASS_ALLFLOATS, or something like that. It would then be possible to have the code in as_callfunc_x64_gcc.cpp use that flag to handle the type correctly. However, this will have to wait for a future version, unless you feel up to implementing it yourself.

For now I'll just make sure the library properly checks for the return of simple types and gives an error, since there is no way the code can predict how it should be handled.


The unfortunate final verdict is that on Linux 64bit you will only be able to return this type by value in registered functions if you use the asCALL_GENERIC calling convention via function wrappers. However, don't feel too bad about this, because I believe the asCALL_GENERIC calling convention is actually faster than the native calling convention on Linux 64bit, as there are fewer dynamic decisions that the library has to make.
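
The general idea behind such a wrapper can be sketched in plain C. This is not the AngelScript generic interface or the autowrapper code; asvec3_123_wrapper and its void pointer parameter are hypothetical stand-ins. The point is only that the by-value call happens inside code built by the same compiler, and the caller receives the result through memory it supplied, so it never has to guess which registers were used.

```c
typedef struct { float v[3]; } asvec3_t;

/* The by-value function from earlier in the thread. */
asvec3_t asvec3_123(void)
{
    asvec3_t v = { {1, 2, 3} };
    return v;
}

/* Hypothetical wrapper: the caller passes the address of the return storage.
   The compiler handles the register-based return of asvec3_123() itself. */
void asvec3_123_wrapper(void *ret_storage)
{
    *(asvec3_t *)ret_storage = asvec3_123();
}
```

A caller would declare an asvec3_t, pass its address to asvec3_123_wrapper, and read the three components back from that variable.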


Thanks for the help in clearing out the doubts.


Regards,

Andreas

asOBJ_APP_CLASS_ALLFLOATS doesn't sound like a particularly good idea, but it's definitely better than the other option: we've recently dropped support for the generic calling convention from our AS classes completely. It was just too much of a burden, doubling the amount of code for no practical reason.

Unfortunately there is no other way for AngelScript to be able to support this type on Linux 64bit. The code in as_callfunc_x64_gcc.cpp needs to know that the gcc compiler would classify the type as floats and return it in XMM0:XMM1 instead of RAX:RDX.

The information may be given in a different form, but right now I think the flag is the way that will require the least amount of work. With the flag it should be possible to quite easily change the code to retrieve the return value from XMM0:XMM1, similarly to how GetReturnedFloat() is done.

Since you do not want to use the generic calling convention (even with autowrappers) and for some reason cannot change the type to include a copy constructor, perhaps you would be interested in adding the support for ALLFLOATS?


The vector lives on the C side, so there are no constructors there. In C++ this would be no problem, since returning by value works with explicit copy constructors.
But I was able to fix the issue, at least temporarily, by unioning the 3 float components with 3 integer components :)

C is the language our game was originally written in. Converting to C++ would simply be a waste of time for no real gain.

You don't need to convert it. C++ is supposed to be backwards compatible with C. You just need to compile the C code as if it were C++; then you'll be able to add the copy constructor to the vec3 struct.

Anyway, I decided to try to implement ALLFLOATS myself. It should hopefully be only a few minor changes that I can do without the need for debugging.


[quote]
You don't need to convert it.
[/quote]


That's another option but it's more of a half-assed effort :) I mean, is it worth the trouble of altering project and makefiles, compiler settings, etc just to add one(!) constructor to a single(!) class?

Done!

I added the flags asOBJ_APP_CLASS_ALLINTS and asOBJ_APP_CLASS_ALLFLOATS to allow these two alternatives to work properly on Linux 64bit. The tests I included in test_feature seem to work properly on the buildbot, so hopefully it should work just fine for you as well.

You'll need revision 980 if you wish to try this.

The only change to your code is that the asvec3 type should be registered with the flags asOBJ_VALUE | asOBJ_POD | asOBJ_APP_CLASS | asOBJ_APP_CLASS_ALLFLOATS.

Don't forget to remove the union with the ints that was added as a workaround. If it is still there, the flag must be asOBJ_APP_CLASS_ALLINTS instead.


Let me know how it works.

I will try to find the time to update our AS to the latest version this weekend and test the new flags.

So, how did the test go?

I have a similar problem with Ogre3D, but haven't had the time to test yet.. ;)

I just now checked in another improvement for 64bit Linux and Mac OS X. (revision 1070)

Objects registered with asOBJ_APP_CLASS_ALLINTS or asOBJ_APP_CLASS_ALLFLOATS can now also be passed by value to application functions. Previously they could only be returned by value, but the parameters had to be sent by reference to work.
