About quarnster

  1. Alignment requirements

    For a reference see Intel® Advanced Vector Extensions Programming Reference:

    [table]
    [th='2']Table 2-4. Instructions Requiring Explicitly Aligned Memory[/th]
    [tr][td]Require 16-byte alignment[/td][td]Require 32-byte alignment[/td][/tr]
    [tr][td](V)MOVDQA xmm, m128[/td][td]VMOVDQA ymm, m256[/td][/tr]
    [tr][td](V)MOVDQA m128, xmm[/td][td]VMOVDQA m256, ymm[/td][/tr]
    [tr][td](V)MOVAPS xmm, m128[/td][td]VMOVAPS ymm, m256[/td][/tr]
    [tr][td](V)MOVAPS m128, xmm[/td][td]VMOVAPS m256, ymm[/td][/tr]
    [tr][td](V)MOVAPD xmm, m128[/td][td]VMOVAPD ymm, m256[/td][/tr]
    [tr][td](V)MOVAPD m128, xmm[/td][td]VMOVAPD m256, ymm[/td][/tr]
    [tr][td](V)MOVNTPS m128, xmm[/td][td]VMOVNTPS m256, ymm[/td][/tr]
    [tr][td](V)MOVNTPD m128, xmm[/td][td]VMOVNTPD m256, ymm[/td][/tr]
    [tr][td](V)MOVNTDQ m128, xmm[/td][td]VMOVNTDQ m256, ymm[/td][/tr]
    [tr][td](V)MOVNTDQA xmm, m128[/td][td]VMOVNTDQA ymm, m256[/td][/tr]
    [/table]

    [table]
    [th]Table 2-5. Instructions Not Requiring Explicit Memory Alignment[/th]
    [tr][td](V)MOVDQU xmm, m128[/td][/tr]
    [tr][td](V)MOVDQU m128, m128[/td][/tr]
    [tr][td](V)MOVUPS xmm, m128[/td][/tr]
    [tr][td](V)MOVUPS m128, xmm[/td][/tr]
    [tr][td](V)MOVUPD xmm, m128[/td][/tr]
    [tr][td](V)MOVUPD m128, xmm[/td][/tr]
    [tr][td]VMOVDQU ymm, m256[/td][/tr]
    [tr][td]VMOVDQU m256, ymm[/td][/tr]
    [tr][td]VMOVUPS ymm, m256[/td][/tr]
    [tr][td]VMOVUPS m256, ymm[/td][/tr]
    [tr][td]VMOVUPD ymm, m256[/td][/tr]
    [tr][td]VMOVUPD m256, ymm[/td][/tr]
    [/table]

    In http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf, we can read in section 3.6.4 that:

    [quote]Misaligned data access can incur significant performance penalties. This is particularly true for cache line splits. The size of a cache line is 64 bytes in the Pentium 4 and other recent Intel processors, including processors based on Intel Core microarchitecture.

    An access to data unaligned on 64-byte boundary leads to two memory accesses and requires several µops to be executed (instead of one). Accesses that span 64-byte boundaries are likely to incur a large performance penalty, the cost of each stall generally are greater on machines with longer pipelines.[/quote]
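To make the two conditions above concrete, here is a minimal sketch of how one could check an address for 16/32-byte alignment and for a cache line split. The helper names `isAligned` and `splitsCacheLine` and the constant `kCacheLine` are made up for illustration; the 64-byte line size is taken from the quoted manual text and may differ on other hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size, taken from the quoted manual text (hypothetical constant).
constexpr std::uintptr_t kCacheLine = 64;

// True if addr is a multiple of alignment. alignment must be a power of two,
// e.g. 16 for the (V)MOVAPS xmm forms and 32 for the ymm forms.
constexpr bool isAligned(std::uintptr_t addr, std::uintptr_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// True if the byte range [addr, addr + size) touches more than one cache line.
constexpr bool splitsCacheLine(std::uintptr_t addr, std::size_t size) {
    return size != 0 &&
           (addr / kCacheLine) != ((addr + size - 1) / kCacheLine);
}
```

A 32-byte access that is itself 32-byte aligned can never split a 64-byte line, which is one reason the aligned forms are attractive even where the unaligned forms are allowed.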
  2. Premature destruction of object in Android

    It needs to be between the two include directives.
  3. Premature destruction of object in Android

    Others have touched the code since I first contributed the initial native ARM call conventions, so this might have changed, but they were originally not designed for interoperating with Thumb mode. Your problem might be as easy as making the NDK compile your code in ARM mode, unless you absolutely need Thumb. Just put this in your Android.mk: LOCAL_ARM_MODE := arm
  4. Angelscript on Raspberry Pi

    [quote name='Andreas Jonsson' timestamp='1356899504' post='5015847'] http://asarmjit.svn....asarmjit/trunk/  (edit: it seems this link doesn't work anymore. Perhaps it's possible to find the project on google? Or maybe quarnster has it somewhere else?) [/quote] It lives at https://github.com/quarnster/asarmjit nowadays. It's been untouched since I last worked on it, so it probably needs changes to work at all. I also never got around to implementing a satisfying solution for the floating point operations, so none of those byte codes were ever implemented.

    There's also my AOT compiler at https://github.com/quarnster/asaot which IIRC tested successfully on Android, but again it has been untouched for a while so it might need changes too.

    Not entirely related to AngelScript, but I also have a small standalone fork of Mozilla's nanojit at https://github.com/quarnster/nanojit/tree/master/nanojit if you want to write your own platform independent JIT. At the time of the fork the license was MPL1.1/GPL 2.0/LGPL 2.1, but it looks like the TOT version http://hg.mozilla.org/tamarin-redux/file/f5191c18b0e4/nanojit has been updated to MPL2.0. I don't recall exactly what the obligations under MPL1.1 are, but I know that MPL2.0 is not viral, so you can statically link in MPL2.0 code without having to make your whole program's source code available, nor do you need to make the object code (for re-linking) available. If you make any changes to the MPL2.0 licensed files themselves, you have to make those source code changes available though.
  5. Angelscript on Raspberry Pi

    Long thread, so this might have been mentioned already, but I wanted to make it explicit that passing floating point values in floating point registers (aka hard float) isn't a Raspberry Pi specific thing, and you can in fact get a soft-float (floating point values in int registers) Raspberry Pi OS from http://www.raspberrypi.org/downloads. If you make sure to never pass any floats between your program and precompiled libs, you could also (in theory[sup]tm[/sup]) use the hard float call convention in your programs on Android/iPhone/WinCE by providing the appropriate float ABI flags to the compilers.
  6. Patch for unaligned 64 bit arm android arguments

    [url="https://gist.github.com/3377029"]https://gist.github.com/3377029[/url] adds the asOBJ_CLASS_ALIGN8 flag which made all the current test_feature tests pass. I also found an issue that wasn't caught by the current test_feature set ((value&mask)==mask would always hold true when mask is 0 which wasn't my intent..) and implemented a new test to catch that. So now again with this patch all test_feature tests pass on Arm Android.
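The mask pitfall mentioned above is easy to reproduce in isolation. This is only a sketch with hypothetical helper names (`hasAllFlagsNaive`, `hasAllFlags`), not the code from the patch: `(value & mask) == mask` reduces to `0 == 0` whenever `mask` is 0, so an empty mask matches everything unless it is rejected explicitly.

```cpp
// Naive check: returns true for ANY value when mask == 0,
// because (value & 0) == 0 always holds.
constexpr bool hasAllFlagsNaive(unsigned value, unsigned mask) {
    return (value & mask) == mask;
}

// Guarded check: an empty mask never counts as a match.
constexpr bool hasAllFlags(unsigned value, unsigned mask) {
    return mask != 0 && (value & mask) == mask;
}
```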
  7. In as_callfunc_arm.cpp, replace the existing paramBuffer code with this [url="https://gist.github.com/3372000"]https://gist.github.com/3372000[/url] This makes all the current test_feature tests pass. However, there is a case that this patch doesn't fix and that the current set of tests doesn't catch, so a new test is needed. Replace testcdecl_class.cpp with the following [url="https://gist.github.com/3372005"]https://gist.github.com/3372005[/url] The new failure happens with the call to: [code]c4.class2_2ByVal(c)[/code] as the class2_2 argument should be 64 bit aligned. I do not know how to detect this and distinguish class2_2 from class2 though, as they both have the same size in memory and yet one must be aligned but not the other. Any suggestions? Some clever template trick perhaps?
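One possible direction for the "template trick", sketched under assumptions: with a C++11 compiler, `alignof` can tell apart two types of equal size but different alignment at the point where the type is registered. The structs `TwoInts` and `OneQword` are hypothetical stand-ins (the real class2 and class2_2 live in the test gist), and `needsAlign8` is a made-up helper. The result is ABI dependent; the values in the comments are what one would expect on typical 64-bit and ARM EABI targets.

```cpp
#include <cstdint>

// Hypothetical stand-ins: same size in memory, different alignment needs.
struct TwoInts  { std::uint32_t a, b; }; // typically size 8, alignment 4
struct OneQword { std::uint64_t v; };    // typically size 8, alignment 8

// Evaluated at compile time for the registered type, so the result could be
// turned into a registration flag.
template <typename T>
constexpr bool needsAlign8() {
    return alignof(T) >= 8;
}
```

Since the script engine only sees a size and flags at runtime, the information would still have to be passed in at registration time, for example as an extra flag derived from such a helper.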
  8. svn trunk not compiling with gcc

    [code]
    [ 0%] Building CXX object CMakeFiles/Angelscript.dir/Users/quarnster/code/3rdparty/angelscript/sdk/angelscript/source/as_builder.cpp.o
    In file included from /Users/quarnster/code/3rdparty/angelscript/sdk/angelscript/source/as_builder.h:43,
                     from /Users/quarnster/code/3rdparty/angelscript/sdk/angelscript/source/as_builder.cpp:40:
    /Users/quarnster/code/3rdparty/angelscript/sdk/angelscript/source/as_symboltable.h: In member function 'void asCSymbolTable<ENTRY_TYPE>::BuildKey(const asSNameSpace*, const asCString&, asCString&) const':
    /Users/quarnster/code/3rdparty/angelscript/sdk/angelscript/source/as_symboltable.h:527: error: invalid use of incomplete type 'const struct asSNameSpace'
    /Users/quarnster/code/3rdparty/angelscript/sdk/angelscript/source/as_symboltable.h:60: error: forward declaration of 'const struct asSNameSpace'
    [/code]

    Same error for multiple files. Just moving the definition of asSNameSpace into as_symboltable seemed to make it work.
  9. Discarding Modules

    Via the engine's [url="http://angelcode.com/angelscript/sdk/docs/manual/classas_i_script_engine.html#afb0ce55e5846eb18afdcf906aeb67cf7"]DiscardModule[/url] method, providing the name of the module you want to discard.
  10. [quote name='Tzarls' timestamp='1344483055' post='4967625'] I think I've found a potential problem in the way the JIT is implemented in Angelscript. Blind Mind Studios' web site shows a sample code for use with their JIT:

    [source lang="java"]
    int main()
    {
        asIScriptEngine* engine = asCreateScriptEngine(ANGELSCRIPT_VERSION);

        //Create the JIT Compiler. The build flags are explained below,
        //as well as in as_jit.h
        asCJITCompiler* jit = new asCJITCompiler(0);

        //Enable JIT helper instructions; without these,
        //the JIT will not be invoked
        engine->SetEngineProperty(asEP_INCLUDE_JIT_INSTRUCTIONS, 1);

        //Bind the JIT compiler to the engine
        engine->SetJITCompiler(jit);

        //.............Setup some other stuff, compile and execute your scripts.....

        //Clean up your engine. Code pages will automatically be cleared
        //by the JIT when the engine is released.
        DiscardModules();
        engine->Release();
        delete jit;

        return 0;
    }
    [/source]

    The problem is in the last lines: when we call engine->Release(), we're not destroying the engine, we're just telling it that we won't hold a reference to it anymore. The engine might continue to exist for some undefined time. Then we destroy (delete) jit, which actually destroys the JIT compiler (which is kind of the correct thing to do, otherwise we might be leaking memory). What if after that the engine (which still might be alive) tries to access the now defunct JIT? Crash! I think the JIT should also be a reference counted object. [/quote]

    IMO you should just call SetJITCompiler(NULL) on the engine before deleting the JIT. /f
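The ordering suggested in the reply can be illustrated with stand-in types. `MockEngine` and `MockJit` below are hypothetical and only mimic the relevant behaviour (a reference counted engine holding a raw, non-owning pointer to the JIT); they are not the AngelScript classes.

```cpp
// Hypothetical stand-ins, for illustration only.
struct MockJit {};

struct MockEngine {
    int refs = 1;           // our own reference
    MockJit* jit = nullptr; // non-owning pointer, like a bound JIT compiler

    void SetJITCompiler(MockJit* j) { jit = j; }
    void AddRef() { ++refs; }
    void Release() { --refs; } // a real engine would destroy itself at 0
    bool hasJit() const { return jit != nullptr; }
};

// Shutdown order from the reply: unbind first, then drop our engine
// reference, and only then destroy the JIT.
inline void shutdown(MockEngine& engine, MockJit*& jit) {
    engine.SetJITCompiler(nullptr); // engine can no longer reach the JIT
    engine.Release();               // engine may outlive this call
    delete jit;                     // safe: nothing points at it anymore
    jit = nullptr;
}
```

Even if someone else still holds an engine reference after `shutdown`, the engine no longer has a dangling pointer to follow.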
  11. Reference counting

    Got it, I'll live with the extra overhead until then. Cheers
  12. Reference counting

    I'm confused about how the reference counting works for native functions and for the generic call convention. In my engine I have:

    1) Functions returning objects increase the reference count of the returned object.
    2) If the argument isn't returned (see 1), the callee increases the reference count of arguments if and only if it stores a copy.

    This means no unneeded work is done, but the caller can't decrease its own reference until after the callee returns, which is exactly how you'd have to do it anyway from any C-like language. (Suppose it wasn't reference counted but a normal pointer: you can't create an object, delete it and then call into a function taking it as an argument, can you? You have to create it, call into the function and then delete it.) So the expectation is for example:

    [code]
    // Some function in c++
    Shape* sphere = world->CreateSphereShape(1.0); // sphere comes in as count +1
    RigidBody* sphereBody = world->CreateRigidBody(1.0, sphere); // spherebody comes in as +1, what sphere is we don't know and don't care, but it'll be at least 1 still
    world->AddRigidBody(sphereBody);
    // No longer references sphere and sphereBody here so we get rid of our references
    sphere->DelRef();
    sphereBody->DelRef();

    // ... snippity snip ...
    void PhysicsWorld::AddRigidBody(RigidBody* node)
    {
        if (something)
            return; // Meh, we didn't want it
        // yeah, we wanted it
        node->AddRef();
        mNodes.push_back(node);
    }

    // ... snippity snip ...
    Something* Something::Blah(Something* arg)
    {
        // We don't care about this object, but the call convention is to increase the ref count for stuff returned, so that's what we'll do
        arg->AddRef();
        return arg;
    }
    [/code]

    Now from [url="http://angelcode.com/angelscript/sdk/docs/manual/doc_obj_handle.html"]http://angelcode.com...obj_handle.html[/url] we can read that functions should proactively call AddRef when returning (same as I have), but also call Release on arguments (in other words, the caller increases the ref count before calling the function).

    Is there a specific reason for that? Imagine a long callstack where an object is passed through but only the endpoint actually needs to store the reference. With the AS call convention a reference is added and released in each step in the callstack, whereas in my version it's only done where needed. In a multithreaded environment changes to the reference counter need to be atomic operations, so they aren't necessarily free, and it's unneeded work anyway. Am I high? Have I missed something? Scripts wouldn't necessarily call into performance critical sections, but the rest of the engine might, and if that section happens to be reference counted we've now introduced unneeded overhead.

    For the generic call convention we have SetReturnObject and SetReturnAddress, which will (if needed) increase the reference count and do nothing to the count respectively. For arguments though, release is always called. This means that if I have a native function taking reference counted arguments, I need to decrease the reference count in the native function when calling the native function directly. If I have a wrapper though, I need to undo this release by calling AddRef in the wrapper. In a way that is consistent with the call convention used, but it's still all unneeded work, isn't it?

    Then there are auto handles, which give some aid here. To automatically fit the AS call convention into mine, I'd define functions as for example: [code]RigidBody@ CreateRigidBody(float, Shape@+)[/code] But there's the caveat mentioned:[quote]However, it is not recommended to use this feature unless you can't change the functions you want to register to properly handle the reference counters. When using the auto handles, AngelScript needs to process all of the handles which causes an extra overhead when calling application registered functions.[/quote] But this is an extra overhead that's not really needed in the first place, is it? Have I missed something?
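The overhead argument can be sketched with a toy counter. Everything here is hypothetical (`Counted`, `storeBorrowed`, `storeOwned`, `counterOps`), and the second convention is modelled only as the post describes it (the caller adds a reference before each call, the callee releases its argument), not taken from the AngelScript sources. The object is passed down `depth` intermediate calls and stored only at the end.

```cpp
#include <atomic>

// Toy reference counted object; counterOps tallies how often the atomic
// counter was touched (bookkeeping for illustration, no deletion logic).
struct Counted {
    std::atomic<int> refs{1};
    int counterOps = 0;
    void AddRef()  { ++refs; ++counterOps; }
    void Release() { --refs; ++counterOps; }
};

// Convention A (the poster's): arguments are borrowed, only the endpoint
// that stores the pointer adds a reference.
inline void storeBorrowed(Counted* obj, int depth, Counted*& slot) {
    if (depth > 0) { storeBorrowed(obj, depth - 1, slot); return; }
    obj->AddRef();
    slot = obj;
}

// Convention B (as described in the post): each callee owns the reference it
// is given, so every intermediate step adds one for the next callee and
// releases its own afterwards.
inline void storeOwned(Counted* obj, int depth, Counted*& slot) {
    if (depth > 0) {
        obj->AddRef();                    // reference handed to the next callee
        storeOwned(obj, depth - 1, slot);
        obj->Release();                   // drop the reference we were given
        return;
    }
    slot = obj;                           // keep the reference we were given
}
```

With three intermediate calls, convention A touches the counter once, while convention B touches it once in the original caller plus twice per intermediate step. Both end with the same final reference count.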
  13. CMake project fix for MSVC x86_64

    Since there's an assembly file for MSVC x86_64, this should be added before or after the ANDROID block in sdk/AngelScript/projects/cmake/CMakeLists.txt:

    [code]
    if(MSVC AND CMAKE_CL_64)
        enable_language(ASM_MASM)
        if(CMAKE_ASM_MASM_COMPILER_WORKS)
            set(ANGELSCRIPT_SOURCE ${ANGELSCRIPT_SOURCE} ../../source/as_callfunc_x64_msvc_asm.asm)
        else()
            message(FATAL_ERROR "MSVC x86_64 target requires a working assembler")
        endif()
    endif()
    [/code]

    I also had to edit the code block around line 1131 in angelscript.h from [code]if(sizeof(void*) == 4)[/code] to [code]#ifndef AS_64BIT_PTR[/code], otherwise the compiler would complain about not being able to static cast the data types involved when trying to register CallMe1 in testmultipleinheritance.cpp.

    BTW TestSaveLoad reports the following (but still says it passed):

    [code]
    The saved byte code is not of the expected size. It is 1760 bytes
    The saved byte code contains a different amount of zeroes than the expected. Counted 530
    The saved byte code has different checksum than the expected. Got 0x87C6163E
    [/code]

    Not sure if that's a real problem or not, as the test passed anyway.
  14. Arguments in native call convention

    Never mind, it seems the argument thing was a red herring. Why it worked when calling CallSystemFunctionNative I have no idea, but it seems the stack was wrong due to broken handling of return values in the AOT code generation.
  15. Arguments in native call convention

    I'm trying to make my AOT compiler "inline" native function calls, so that it generates a C signature based on the information available and thus bypasses the register copying mechanisms involved when making a non-generic native call. I've run into a problem I've yet to be able to figure out though. It seems there is a difference in how AngelScript stores the parameter for a call to:

    [code]static void CopyConstructString(const string &other, string *thisPointer);[/code]

    and a call to

    [code]static void func26(const MyClass3& a0);[/code]

    The AOT generated code for these calls is:

    [code]
    // (CopyConstructString)
    // _beh_0_, 10, 0, 2, 0, 0, 0
    // ret: 0, 0, 0, 1, 0, 0, 0
    // arg0: 67108876, 1, 7938, 0, 0, 0, 1, 0, 1, 1, 2, 2
    typedef void (*funcptr)(asQWORD&, void*);
    funcptr a = (funcptr)sysFunc->func;
    a(**(asQWORD**)(&args[0]), obj);
    [/code]

    [code]
    // func26, 2, 0, 2, 0, 0, 0
    // ret: 0, 0, 0, 1, 0, 0, 0
    // arg0: 67108879, 1, 7938, 0, 0, 0, 1, 0, 1, 1, 2, 2
    typedef void (*funcptr)(asQWORD&);
    funcptr a = (funcptr)sysFunc->func;
    a(**(asQWORD**)(&args[0]));
    [/code]

    with the comments being

    [code]
    snprintf(buf, 128, "// %s, %d, %d, %d, %d, %d, %d\n",
             descr->GetName(), sysFunc->callConv, sysFunc->takesObjByVal, sysFunc->paramSize,
             sysFunc->hostReturnInMemory, sysFunc->hasAutoHandles, sysFunc->scriptReturnSize);
    snprintf(buf, 128, "// ret: %d, %d, %d, %d, %d, %d, %d\n",
             sysFunc->hostReturnFloat, retType.IsObject(), retType.IsReference(), retType.IsPrimitive(),
             retType.GetSizeInMemoryDWords(), retType.GetSizeOnStackDWords(), sysFunc->hostReturnSize);
    snprintf(buf, 128, "// arg%d: %d, %d, %d, %d, %d, %d, %d, %d, %d, %d, %d, %d\n",
             n, descr->GetParamTypeId(n), dt.IsObject(), dt.GetObjectType()->GetFlags(), dt.IsObjectHandle(),
             dt.IsScriptObject(), dt.IsHandleToConst(), dt.IsReference(), dt.IsPrimitive(), dt.CanBeCopied(),
             dt.CanBeInstanciated(), dt.GetSizeInMemoryDWords(), dt.GetSizeOnStackDWords());
    [/code]

    So to me it looks like the string and the MyClass3 are as close to each other as can be, but the problem is that the call to CopyConstructString only works with

    [code]a(*(asQWORD*)(&args[0]), obj); // single dereference [/code]

    and the call to func26 only works with

    [code]a(**(asQWORD**)(&args[0])); // double dereference [/code]

    Now I hear you saying "maybe it's because you use an asQWORD rather than the real type". Unfortunately, even if I put the real type in there the situation is the same:

    [code]
    typedef void (*funcptr)(const std::string&, std::string*);
    funcptr a = (funcptr)sysFunc->func;
    a(*(std::string*)(&args[0]), (std::string*) obj); // Only single dereference works
    [/code]

    [code]
    typedef void (*funcptr)(const MyClass3&);
    funcptr a = (funcptr)sysFunc->func;
    a(**(MyClass3**)(&args[0])); // Only double dereference works
    [/code]

    So now the signature is a perfect match between the function pointer and the real function, agree? If you do, then to me it looks like AngelScript is indeed storing these two parameters differently on the stack, agree? How do I detect whether I should dereference once or twice?

    Worth mentioning is that if I call into CallSystemFunctionNative instead of inlining the call, everything works as expected, so the stack is properly set up. I tried looking in CallSystemFunctionNative for a few platforms to see if there's any special handling, but that only seems to be the case when sysFunc->takesObjByVal is set, which it isn't for these two functions. Or maybe the information needed isn't available when the JIT compiler is invoked to compile the function? Any ideas or suggestions? Cheers