Sign in to follow this  
TheAtom

Wrong function called on bytecode restoration

Recommended Posts

TheAtom    330

Tested on rev1549. The bug is sporadic and difficult to pinpoint, its root cause is likely in the bytecode generation rather than bytecode restoration. In a script with a registerd value-type variable placed in the global scope, sometimes another function is called instead of the variable's type constructor when the script is loaded from the bytecode.

 

Loading the script from the source is without problems. When loading from the generated bytecode, it either works correctly all of the time (for that generated bytecode), or it consistently calls the same, wrong function (which has no special relation to the correct one). Whether the bytecode is buggy or not seems to depend on things like order of compilation of scripts or whether some script was compiled prior to the one in question.

 

Calling the wrong function naturally leads to bad behaviour, here is the call stack I get when I stumble upon a resulting crash:

    CallCDeclFunctionObjFirst + 38, as_callfunc_x86.cpp (494)
    CallSystemFunctionNative + 885, as_callfunc_x86.cpp (207)
    CallSystemFunction + 294, as_callfunc.cpp (486)
    asCContext::ExecuteNext + 2756, as_context.cpp (2450)
    asCContext::Execute + 521, as_context.cpp (1158)
    asCModule::CallInit + 337, as_module.cpp (310)
    asCModule::ResetGlobalVars + 31, as_module.cpp (254)
    asCReader::Read + 278, as_restore.cpp (98)
    asCModule::LoadByteCode + 111, as_module.cpp (1189)

 

Maybe it is important that there could be some functions etc. that are registered to the engine after some scripts are already compiled (or loaded from the bytecode). This never caused any problems before though. For the record, the value-type in question is registered this way:

    if(ASEngine->RegisterObjectType("FuncBind", sizeof(ScriptFuncBind), asOBJ_VALUE) < 0)
        Log("%s FuncBind\n", fail);
    if(ASEngine->RegisterObjectBehaviour("FuncBind", asBEHAVE_CONSTRUCT, "void f()", asFUNCTION(ScriptFuncBind_Construct), asCALL_CDECL_OBJFIRST) < 0)
        Log("%s FuncBind ctor\n", fail);
    if(ASEngine->RegisterObjectBehaviour("FuncBind", asBEHAVE_DESTRUCT, "void f()", asFUNCTION(ScriptFuncBind_Destruct), asCALL_CDECL_OBJFIRST) < 0)
        Log("%s FuncBind dtor\n", fail);

Share this post


Link to post
Share on other sites
WitchLord    4677

This sounds like it may be the same problem reported here:

 

http://www.gamedev.net/topic/637417-losing-some-function-pointers-after-bytecode-reload/

 

Unfortunately I have not been able to reproduce the problem, much less figure out what causes it.

 

I'll see if the information you provided perhaps allow me to find something.

Share this post


Link to post
Share on other sites
TheAtom    330

There is yet another symptom of probably the same bug: after loading all scripts except the last one from the bytecode, and then the last one from the source, it did not compile and produced an interesting error message:
 

Info : Compiling void ask(Critter&inout, int, int, int) : 8, 1.
Error : No matching signatures to 'Is()' : 10, 5.
Info : Candidates are: : 10, 5.
Info : bool Flee(Critter&inout, bool) : 10, 5.


After some debugging, it turns out that the contents of asCScriptEngine::scriptFunctions apparently consisted of a total of 72852 entries, some 3/4 of which were nulls (I assume some padding, maybe removal of duplicates, in any case many array indices and function ids were unused). In particular, there were functions with index (= id) over 65535. The problem is that asCBuilder::GetFunctionDescription treats them as imported (it checks for the 0xFFFF0000 flag, even though I see that the imported functions have ids beginning from 0x40000001) and searches for them in asCScriptEngine::importedFunctions rather than asCScriptEngine::scriptFunctions, giving a match to some random function that has been imported somewhere rather than the right function.

Seems that the problem can only manifest itself in codebases large enough. Apparently we hit this hidden function limit some time ago. By the way, what is the reason for this index/id skipping in asCScriptEngine::scriptFunctions?

Edited by TheAtom

Share this post


Link to post
Share on other sites
WitchLord    4677
That shouldn't happen. The empty slots in scriptFunctions should be reused.

This may very well be the cause of trouble. I imagine you use shared functions and classes, am I right?

I'll investigate why the function ids aren't reused as they should.

Share this post


Link to post
Share on other sites
TheAtom    330

We do use shared classes. I notice that functions from a single module and functions that are specializations of some registered global function/methods that use classes from that module form uninterrupted blocks in scriptFunctions, as if null gaps are added after a module has been entirely loaded from the bytecode.

 

What about that possible inconsistency with GetFunctionDescription using 0xFFFF0000 flag to determine that the id denotes an imported function, where in reality their ids start from 0x40000001 (or so it seems)? We use ~17000 unique ids for non-imported ones at the moment. That's not 65535 yet, but it's the same order of magnitude.

Edited by TheAtom

Share this post


Link to post
Share on other sites
WitchLord    4677

Yes, the 0xFFFF0000 mask is likely why you're seeing the wrong functions being called. I will change this mask too, however, this is just an indirect cause because with only 17000 unique ids you shouldn't be reaching the 65535 limit. 

Share this post


Link to post
Share on other sites
WitchLord    4677

I believe I found the problem. I've checked in the fix in revision 1555.

 

This should solve the problem with the empty slots in scriptFunctions not being reused as they should. I haven't changed the mask for imported functions yet, but if I'm right your problem will already be solved with the changes I made.

 

Please let me know if this corrects your problem, and if you still see a lot of unused slots in scriptFunctions.

Share this post


Link to post
Share on other sites
TheAtom    330

I still see those gaps. I made a stack trace whenever asCScriptEngine::GetNextScriptFunctionId() returns id that is eventually a null entry in scriptFunctions, for each of ~100 functions I checked it's always the same:

 

asCScriptEngine::GetNextScriptFunctionId()  Line 4409    C++
asCReader::ReadFunction(bool & isNew=true, bool addToModule=false, bool addToEngine=false, bool addToGC=true)  Line 646 + 0xb bytes    C++
asCReader::ReadInner()  Line 350 + 0x12 bytes    C++
asCReader::Read(bool * wasDebugInfoStripped=0x00000000)  Line 71 + 0x8 bytes    C++
asCModule::LoadByteCode(asIBinaryStream * in=0x0d533f0c, bool * wasDebugInfoStripped=0x00000000)  Line 1187 + 0xf bytes    C++

 

Trying to step debug to the end of ReadFunction didn't reveal much, but I can tell that the function was imported.

 

[edit] A wild guess: imported but unused functions?

Edited by TheAtom

Share this post


Link to post
Share on other sites
TheAtom    330

as_restore.cpp:350   info->importedFunctionSignature = ReadFunction(isNew, false, false);

ReadFunction internally requests an id to be given, even though info looks to be an information about an imported function, which gets its own, high id anyway. Isn't that a problem?

Share this post


Link to post
Share on other sites
WitchLord    4677

Actually not. The id in this case isn't consumed.

But I have a couple of leads, I was able to identify some test cases that left a couple of empty slots in the scriptFunctions array. Tomorrow I'll investigate those to fix the problem.

 

[edit] My english seems to be getting worse :(

Edited by Andreas Jonsson

Share this post


Link to post
Share on other sites
TheAtom    330

If this is of any help,

int asCScriptEngine::GetNextScriptFunctionId()
{
    if( freeScriptFunctionIds.GetLength() )
        return freeScriptFunctionIds[freeScriptFunctionIds.GetLength()-1]; // this never happens...

    int id = (int)scriptFunctions.GetLength();
    scriptFunctions.PushLast(0);
    return id; // ... and so this returns 1, 2, 3 and so forth without ever repeating a value
}

Share this post


Link to post
Share on other sites
WitchLord    4677

Yes. I identified that bug today, as well as a couple of bugs where a function id was reused incorrectly. I'll have this fix checked in once I get home in a few hours.

Edited by Andreas Jonsson

Share this post


Link to post
Share on other sites
WitchLord    4677

I've checked in the new fixes in revision 1556.

 

Please let me know if this corrects your problem, or if you're still seeing gaps in the scriptFunctions array and/or having the wrong function being called after loading bytecodes.

Share this post


Link to post
Share on other sites
WitchLord    4677

Have you been able to confirm if the problem was fixed yet?

 

For now I'll assume this problem has been corrected and won't investigate further into this. Let me know if my assumption is incorrect and I'll take up the investigation again.

Share this post


Link to post
Share on other sites
TheAtom    330

I am now getting a crash in my first scenario and it happens somewhere inside this loop:

        if( engine->ep.initGlobalVarsAfterBuild )
            r = module->ResetGlobalVars(0);

 

The problem occurs with only one module (after many others have been loaded), and it is the only module where I have a registered value-type variable declared in (the module's) global scope. Everything here is loaded from previously saved bytecode. I'll see if I can provide more details, perhaps it's me at fault here.

Share this post


Link to post
Share on other sites
WitchLord    4677

Even if you find it was something you did incorrectly, please let me know what it was. Perhaps there is something I can do to prevent the crash and instead return a meaningful error code and/or message.

Share this post


Link to post
Share on other sites
TheAtom    330

If I put the offending script at the top of the modules to be loaded, it works every time, both from source and from bytecode. But when I move it to the middle of the scripts and load everything from the (already existing) bytecode again, I get a crash and likely a stack corruption (the callstack is garbage). That would indicate there's still some problem with the restoration routine.

Edited by TheAtom

Share this post


Link to post
Share on other sites
WitchLord    4677

I agree. Something is definitely still wrong.

 

How are the global variables in the module that crashes initialized? Is it just with default constructors, i.e. no explicit initialization, or do you have any initialization expressions?

 

Does the problem also happen if you comment out most of the global variables? Are you able to narrow it down to a single global variable?

Share this post


Link to post
Share on other sites
TheAtom    330

The constructor just initializes a single integer field to zero. Rest is by the book. There is only one global variable of my value-type (the type's name is FuncBind) in that module, by step debugging I see that ExecuteNext() reaches asBC_ALLOC (with a script object), then executes the following line

 

asDWORD *mem = (asDWORD*)m_engine->CallAlloc(objType);

where objType is what it's supposed to be - FuncBind. However at this point the func variable (which I understand is supposed to be the id of the type's constructor) is the id of some unrelated function. This function is called with CallSystemFunction, and the crash happens soon later on

 

if( a ) *a = mem;

at which point I see a corrupted stack.

 

It might be important that in our case the registration of certain global functions happens in between loading of modules (either from source or bytecode). Each module has a list of "dependencies" which are registered to the engine just prior to the compilation of this module, unless they are already loaded. The constructor that fails to be called (and the function called instead of it) are both registered like this.

 

edit: I can also reproduce this with func being the id of a nonregistered function, i.e. a script function. Naturally, CallSystemFunction fails with them on the spot. I have a question, where in ReadInner() "linking stage" there is the translation of registered object's constructor calls? And where are ids of these actually stored - I see that asBC_ALLOC opcodes are followed by a (correct) type pointer, and then pointer to function id - to where do those pointers point specifically?

Edited by TheAtom

Share this post


Link to post
Share on other sites
WitchLord    4677

I think I've discovered the problem in the code. 

 

For some reason that I can't recall why the asBC_ALLOC instruction doesn't have it's function id translated unless the object type is a script class. 

 

That would explain why it works when you always put the script with the global variable as the first module to be loaded, since then the function id will always be same every execution. 

 

I'll have a fix for this checked in soon.

Share this post


Link to post
Share on other sites
TheAtom    330

That would explain why it works when you always put the script with the global variable as the first module to be loaded, since then the function id will always be same every execution.

Strangely enough, I observed the bug even when the script was in the "middle" of all the loaded modules the whole time, without any moving. A case where ids should also stay the same.

 

Nevertheless, revision 1558 seems to have fixed the issue. I can no longer reproduce the bug in any scenario.

Share this post


Link to post
Share on other sites
WitchLord    4677

Thanks for confirming the fix. 

 

Most likely when you observed the bug even when in the "middle" something changed without you realizing it. It could have been a additional script function that was compiled, or template instance that was created, or something like it.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this