Yet another performance comparison (AS vs Small)


Hi all,

I know this matter has been covered several times, but all previous posts show a clear interest in the topic. So, having done my own performance comparison, I submit it to the community.

I understand that the scope of this comparison is very specific, but that comes from a particular need I have.

I currently build automation machines whose real-time control is done by a PC. Machine customization is done with a series of programs currently written in Small.

These programs are normally compiled into byte code, and the byte code is then injected into the real-time environment, where a virtual machine does the actual execution. The virtual machine exposes to the script several system function calls to operate on the physical machine hardware (reading/setting inputs/outputs, moving axes, setting variables, etc.).

 

Due to the limitations of the Small language (lack of structures, no typing, no doubles, ...) I am investigating the possibility of switching to AngelScript.

 

Having solved most of the interface problems, I now have an AngelScript compiler and a real-time virtual machine for execution up and running, so I am able to make some performance comparisons.

 

I started with a very simple script.

Here is the AS version:

int TestNum;

void main(void)
{
  int count = 0;
  for(int col = 0; col < 10000; col++)
  {
    count++;
    ExtVar = count;     // Line A: Makes one call to external environment
    TestNum = count;    // Line B: Only set a script variable
  }
}

Here is the Small version:

new TestNum;

main()
{
  new count = 0;
  for(new col = 0; col < 10000; col++)
  {
    count++;
    Set_ExtVar(count);   // Line A: Makes one call to external environment
    TestNum = count;     // Line B: Only set a script variable
  }
}

As you can see they are very similar; basically they consist of a single loop performing one local variable increment and then an assignment.

Lines A and B that you see in the code are alternatives: only one of them was present during each evaluation.

These are the results, in ms:

           AS         Small
Line A    4.37         1.85
Line B    3.05         1.07

Looks like AS is two to three times slower than Small.

A few final notes:

  1. The test was done on the same machine, with a similar load and the same real-time environment.
  2. I didn't investigate the JIT, because I am not sure it can even work in the real-time environment.
  3. The external call is the same in both tests, executing the same code, but it adds 0.8 ms in Small and 1.3 ms in AS. Thus it is possible to conclude that the external function call is at least 0.5 ms slower in AS.
  4. The jitter between test runs is on the order of 0.05 ms in all cases.
  5. The slight syntax difference between the two sources in line A is due to the fact that Small does not allow the definition of an external opAssign method, so an explicit function call is made (a sketch of how such an assignment can be registered in AS follows after this list). I believe this results in very similar byte code anyway.
  6. I am under the impression that the AS virtual machine makes some allocations/deallocations during the run of the test, even though everything looks static (or at least allocated at the beginning). I say this because if I let the test run repeatedly for one hour, when I shut down the real-time system the call to engine->Release() takes as much as 5 minutes to complete. At the moment I am not able to tell exactly why.
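
For reference, the registration code is not shown in this post, but a minimal sketch of how a variable like ExtVar might be exposed to AS with an opAssign that forwards to the external environment could look as follows. The type name ExtInt, the Set_ExternalVariable() call and the helper function are purely illustrative assumptions, not the actual code:

#include <angelscript.h>

void Set_ExternalVariable(int value);    // provided by the real-time environment (assumed)

class ExtInt
{
public:
  ExtInt &operator=(int value)
  {
    Set_ExternalVariable(value);         // one call into the external environment
    return *this;
  }
};

ExtInt g_ExtVar;

void RegisterExtVar(asIScriptEngine *engine)
{
  // Simple POD value type, so no constructor/destructor behaviours need to be registered
  engine->RegisterObjectType("ExtInt", sizeof(ExtInt), asOBJ_VALUE | asOBJ_POD);
  engine->RegisterObjectMethod("ExtInt", "ExtInt &opAssign(int)",
                               asMETHODPR(ExtInt, operator=, (int), ExtInt&),
                               asCALL_THISCALL);
  engine->RegisterGlobalProperty("ExtInt ExtVar", &g_ExtVar);
}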

Should you need further information or want me to run further tests, just ask.

Regards.

Mau.

 

PS: Unfortunately these results force us to stay with the current solution, but I will keep an eye on further AS development, especially in the performance area.


Thanks for letting me know about the performance comparisons. I'll definitely look into this and see what can be done to improve the performance in AngelScript.

 

Did you already follow the instructions in the manual for getting the most performance out of AS?

 

Also, in some cases the generic calling convention can prove to be faster than the native calling convention (due to fewer dynamic decisions to set up CPU registers, etc.). So it may be worth implementing the external opAssign method with the generic calling convention and seeing if that helps the performance.
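
As a rough sketch of that suggestion, reusing the illustrative ExtInt type from the sketch in the first post and assuming a hypothetical Set() method that forwards the value to the external environment, the generic-calling-convention registration might look like this:

#include <angelscript.h>

class ExtInt { public: void Set(int value); };   // as in the earlier sketch (hypothetical setter)

static void ExtInt_opAssign_generic(asIScriptGeneric *gen)
{
  ExtInt *self = static_cast<ExtInt*>(gen->GetObject());
  int value = int(gen->GetArgDWord(0));   // the single int argument
  self->Set(value);                       // forward to the external environment
  gen->SetReturnAddress(self);            // opAssign returns a reference to *this
}

void RegisterExtVarGeneric(asIScriptEngine *engine)
{
  engine->RegisterObjectMethod("ExtInt", "ExtInt &opAssign(int)",
                               asFUNCTION(ExtInt_opAssign_generic),
                               asCALL_GENERIC);
}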

 

 

 

I would also be very interested in understanding what it is that makes the engine release take 5 minutes to complete on your machine. That doesn't sound normal at all, and may very well be related to some bottleneck in the execution itself. If you can help me figure out exactly what the engine is doing during these 5 minutes it would be very helpful.


I'm assuming you're running a decent compiler optimization level and that the settings are the same across both tests?

 

For example, if it's a full debug build with optimizations disabled... that's not a very good test.


I use AngelScript as part of a real-time control system as well. I've never heard of Small, but I can say that AngelScript has been very stable and works well in a real-time environment. I have a simulator for the application on CodeProject that you can download (http://www.codeproject.com/Articles/755009/Open-Dyno-Real-Time-Control-System); maybe you can see how it compares to your current setup. I run an entire test cell (a large machine) with a code execution time of about 0.2-0.4 ms. It's easier to measure these values on Linux than Windows, though. I'm not sure what the support is like for Small. Andreas has been an excellent help over the years and continues to add new features to the language.


Hi all,

thanks for all the suggestions. I just want to add a few clarifications.

 

I wasn't aware of the fine-tuning page, but having a look at it now, I can tell that most of the suggestions were already in place:

My functions do not use anything inside the virtual machine, so there is no need for caching. I already have threads and the GC disabled, but I did notice the following issue.

 

I do need to have the line callback for debugging purposes. The minimum I need is a call to get the line number and the executing module, i.e. a call to ctx->GetLineNumber.

But I noticed that while doing my evaluations I also included calls to ctx->GetCallStackSize and ctx->GetFunction that were not really needed, so I ran the same test again, first with an empty callback and then with a single call to GetLineNumber.
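
For clarity, here is a minimal sketch of the two callback variants compared below (the actual debug bookkeeping is omitted, and RecordExecutedLine is a hypothetical helper, not the real code):

#include <angelscript.h>

void RecordExecutedLine(const char *section, int line);   // hypothetical helper storing the line for the editor

// "Empty" variant: registered, but does no work.
void LineCallbackEmpty(asIScriptContext *ctx, void *param)
{
}

// "GetLineNumber" variant: only queries the line and section currently executing.
void LineCallbackGetLine(asIScriptContext *ctx, void *param)
{
  const char *section = 0;
  int line = ctx->GetLineNumber(0, 0, &section);
  RecordExecutedLine(section, line);
}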

Here are the results (in ms):

           Empty     GetLineNumber
Line A      2.5         4.3
Line B      0.9         3.0

With the GetLineNumber call alone I got results similar to the previous tests (meaning that GetCallStackSize and GetFunction are probably not particularly expensive), but there is a significant change when the GetLineNumber call is removed too. I don't know what this call does internally, but it looks strange to me that it is so expensive. The current line and module should be something the virtual machine has at hand. Probably the fact that it also returns a string causes a linear search and a string copy/allocation, which is probably not needed here. Maybe this is worth investigating.

Just to make a small suggestion, I don't know how applicable it is here, but I can describe how Small manages this issue.

Small, after loading a script, requires a special scan of the script itself, during which the virtual machine patches the script with IDs instead of strings, function calls and other things. During this scan the virtual machine also calls the line callback function, passing the corresponding ID for each substituted item, so the application can build a dictionary or a table. During real execution only IDs have to be manipulated. The advantage of this approach is that the real-time environment can work with integers only, and only when there is a need to show something to the user (not in real time, obviously) is a time-consuming dictionary access and string manipulation performed. Hope this helps.

 

Continuing with your comments (somewhat in order):

 

"Release call"

I do not know what causes the 5-minute delay in the Release call. I cannot debug it since it is running in the real-time portion of the code. I can do some further investigation by placing some printouts. I will be back with details, but I noticed that this behavior looks incremental. I mean, if I run the test for a few seconds, the Release call returns immediately. If I run the test for 10 minutes or more, the Release call takes more and more time.

I suspect memory fragmentation or, worse, a memory leak. In the real-time environment memory management is done differently in order to guarantee predictable allocation times. This means that allocations/deallocations made during execution, which may go unnoticed in a standard environment, can cause problems in real time. I will do some more testing in this area.

 

"Compiler settings"

I am using the same compiler, with the same settings, and the whole application has only the minimum difference in the code base needed to run the two different scripts. I am not trying to denigrate AS here. I am really trying to make my best effort to be unbiased, and I am really willing to switch to AS, so I am really willing to see good numbers there ... I may have made mistakes, of course (like the line callback above), but I want to state here that I want to be as fair as I can.

 

"Performance measurement"

I am measuring the performance using the CPU clock (the rdtsc processor instruction), so the results are precise to the nanosecond, and the accuracy is that of a typical crystal (~100 ppm). And yes, SpeedStep and other clock optimizations/variations are switched off. Moreover, the point of measurement is the same in both cases. I am quite confident in the measuring method, also because it is the same timing method we have been using for years to time internal events in the real-time environment. Additionally, this method is available in all operating systems since it is actually a processor instruction; a small fragment of inline asm allows it to be used from C/C++.

 

"Support"

I had some support in the past from the Small community, but they have since moved in a different direction, more oriented towards application scripting, making the language less suitable for real-time execution. So I froze my sources at some point (before the switch to the new Pawn naming) and evolved from there with my own code.

In contrast, I very much appreciate the support offered by Andreas in the first place and by the whole community here.

Good job!

 

Thanks all,

Mau.


I made a quick test, placing some time reference markers inside the engine destructor in order to track down the reason it takes so long to complete in real time.

I placed a time marker between every major loop or instruction group inside the destructor code.

Here are the results (in ms), together with a small description to identify the code section.

Follow the ~asCScriptEngine source code for further details.

    1	Calling  asDELETE(const_cast<asCScriptEngine*>(this), asCScriptEngine);
    0	Calling ~asCScriptEngine
    0	Call to SetContextCallbacks
 1178	Loop on scriptModules[n]->Discard()
11997	Call to GarbageCollect
    0	if and call to defaultArrayObjectType->Release()
   13	Loop on templateInstanceTypes
  332	Call to GarbageCollect, FreeUnusedGlobalProperties, ClearUnusedTypes
    0	Loop on scriptTypes
    0	Call to GarbageCollect, FreeUnusedGlobalProperties, ClearUnusedTypes
    0	Loop on scriptFunctions
    0	Call to GarbageCollect, ClearUnusedTypes
    0	Test on refCount
    0	Test on ReportAndReleaseUndestroyedObjects
    4	Loop on mapTypeIdToDataType
  305	Call to RemoveConfiguration and loop on configGroups
  117	Iteration on registeredGlobalProps
    0	Loop on templateInstanceTypes
    0	Loop on allRegisteredTypes and on templateSubTypes
    3	Loop on registeredGlobalFuncs and 4 calls to ReleaseAllFunctions
    0	Loop on funcDefs
    2	Loop on stringConstants
    1	Loop on scriptSectionNames
    0	Loop on userData
    0	Loop on nameSpaces
   81	Call to Unprepare and return to main

The largest amount of time (12 seconds!) is taken by the first GarbageCollect call, even though I should have it disabled and I have no objects that expose GC behavior.

 

I don't believe I have enough knowledge to understand this matter or to delve deeper into the code, but if someone needs further investigation or some specific test, I am available.

Just remember that I cannot place breakpoints or examine variables in real time.

I can only add time markers, printouts, or counters to be dumped at the end or every now and then during operation.

 

Regards.

Mau.

This is valuable information. Thanks a lot for sharing.

I've not really designed the LineCallback for high performance; I expected it to be used only when debugging, not when running the application at full speed. Do you really need to use it at all times?


The garbage collection that happens when shutting down the engine is there to clean up the internal objects, for example potential circular references between two script class declarations, etc. It is not expected that this processing time increases the longer you execute the application, unless the scripts themselves create objects that require garbage collection (which is not the case with your test script). Are you perhaps building the script over and over again? Are you requesting new modules more than once in your application during the execution? Can you provide the same output again after letting the application run for a longer period? That way hopefully we'll understand what part of the code is taking up to 5 minutes to execute, and thus is related to the cause of the slowdown. Also, it may be helpful to print the information that asIScriptEngine::GetGCStatistics() provides.
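
A minimal sketch of such a dump, assuming printf (or whatever logging the real-time environment provides) is available:

#include <angelscript.h>
#include <cstdio>

void DumpGCStatistics(asIScriptEngine *engine)
{
  asUINT currentSize, totalDestroyed, totalDetected, newObjects, totalNewDestroyed;
  engine->GetGCStatistics(&currentSize, &totalDestroyed, &totalDetected,
                          &newObjects, &totalNewDestroyed);
  printf("GetGCStatistics: currentSize:%u, totalDestroyed:%u, totalDetected:%u, newObjects:%u, totalNewDestroyed:%u\n",
         currentSize, totalDestroyed, totalDetected, newObjects, totalNewDestroyed);
}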


Perhaps you can share the source code for your test application that uses AngelScript, so I can take a look at how you've implemented it? If you don't want to share it publicly you can send it to my e-mail (andreas@angelcode.com).



Based on your feedback, I've already begun looking at what can be improved in the library. On my computer I had to increase the number of iterations to 10 million to get close to 1 second of execution time. :) I've identified a few things that add unnecessary overhead with each call to an application-registered function (the opAssign method in your case). I'll be working on reducing this for the next release.

The hardware you have is really slow (about 1000 times slower than a standard desktop computer), and I can understand that the garbage collection at shutdown takes a long time on your machine. I'll definitely need to look into reducing the amount of work needed by the GC (if possible, avoid it altogether).

By the way, at what moment do you need to shut down the script engine? If it is when you're also shutting down the application, then perhaps you can skip releasing the engine altogether, since the memory will be automatically reclaimed by the system when the application shuts down anyway.


Hi, some more comments before doing some more tests.

 

 

"I've not really designed the LineCallback for high performance"

Actually, most of the time my LineCallback does nothing. But sometimes, during machine setup, we need to show what the real-time code is doing.

Since I cannot stop the execution, what I do is collect the lines that are executed at each code scan and show them highlighted in the editor. That's why I need to collect the line number and the executing function, nothing more. Sometimes I need to inspect the value of some global variable, but I believe this can be done by asking the engine or the execution context, and I can do that at a relaxed pace.

What still looks strange to me, as I said, is that the information I need should be at hand in the execution context, so why does it take so long?

 

 

"The garbage collection that happens when shutting down ..."

I will try to make a printout with GetGCStatistics() and submit it here. I will try to run for a longer time too.

 

 

"Perhaps you can share the source code ..."

The source will not even compile without the real-time extension. I will try to send you some fragments showing how I initialize and manage the virtual machine, so you can check whether I did something very wrong.

 

 

"I had to increase the number of iterations to 10 million ..."

Probably you missed the fact that my timings were in ms. You increased the loop 1000 times to get into the seconds range. That means that my PC is running at the same speed as yours, or at least in the same order of magnitude. ;)

As I described, I measure time using the processor clock. As long as the processor clock does not change, the timing is very precise. Some green technology allows the processor to reduce its own speed under low load, but this can normally be disabled with a BIOS switch. In our case, having to run in real time, we need the fastest response even when the processor has nothing else to do, so we normally disable all "green" settings in the BIOS.

The TSC is an internal 64-bit counter, driven by the processor clock, that counts up from zero when the processor is powered up. It can be read in C/C++ with this simple code fragment (MSVC):

inline __declspec(naked) __int64 RdPTime(void)
{
  _asm {
    rdtsc
    ret
  }
};

In Linux (GCC) I made some tests some time ago. I can't promise it still works, but here is the code:

inline int64_t RdPTime(void)
{
  int64_t ret;
  
  rdtscll(ret);
  return(ret);
};

My measurements are done by reading the counter just before and just after the code I need to time and taking the difference. The minimum granularity is the processor clock: with today's 2 GHz (or faster) machines we are talking about less than a ns (10^-9 s), normally far more resolution than necessary. The only problem is that the processor clock is usually not a known quantity. A mainboard sold as nominally 2.8 GHz may actually generate a clock of 2753 MHz, so if you need to precisely convert ticks to seconds you need to know the exact clock it is running at. Normally I compare the TSC counter difference against a known OS timing (like Sleep, or another suspend with timeout) over one second, and that is precise enough for our needs.

You get the idea ...
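
As a minimal sketch of that calibration, using the RdPTime() wrapper above (Sleep() is the Win32 call; the one-second delay is only approximate, which is good enough here):

#include <windows.h>

double CalibrateTicksPerSecond(void)
{
  __int64 start = RdPTime();
  Sleep(1000);                       // roughly one second, as timed by the OS
  __int64 stop = RdPTime();
  return (double)(stop - start);     // TSC ticks per second (~ the real CPU clock)
}

// elapsedSeconds = (double)(t1 - t0) / CalibrateTicksPerSecond();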

 

 

"... at what moment do you need to shutdown the script engine?"

You are right. I need to shut down the engine only when the application is closed, and on real machines this will never happen: machines are just turned off, with no shutdown procedure. So this may be a non-problem for me.

Nevertheless, I prefer that everything works cleanly before committing to this solution. I don't want to overlook some memory leak or memory fragmentation problem hiding in the code that may bite me back in some other situation.

Automatic machines may run for days or even weeks without being shut down, so any problem that tends to increase over time is a no-go.

 

Thanks all.

I will be back with further investigations in the shutdown area.

Regards.

Mau.


I have nailed down my problem a little; it happens only if I reload my script several times.
Let me explain a little what I am doing:
In my environment I have a single asIScriptEngine instance.
This engine is initialized once with the following code:

  engine->SetDefaultAccessMask(0xFFFFFFFF);
  RegisterScriptMath(engine);         // Register math and trig functions add on
  RegisterEngineProperties(engine);   // Register engine properties
  RegisterStdString(engine);          // Register string class add on
  RegisterScriptArray(engine, true);  // Register generic array add on
  RegisterSystemVarsDecl(engine);     // Register 3 system class types
  RegisterSystemEnums(engine);        // Register about 60 system enums with a total of about 1000 values
  RegisterSystemFunctions(engine);    // Register about 100 system functions
  RegisterSystemVariables(engine);    // Register about 400 system variables of the declared types
  RegisterUserVariables(engine);      // Register about 350 user variables of the declared types
  RegisterUserIo(engine);             // Register about 100 IO definitions (one of the declared types)

If needed I can show the classes I am using, but I can say they are very simple, with only a few methods and operators declared.
They simply represent a system integer, float and boolean type respectively. Variables of these types live outside
the script and are accessed by the script, when needed, through the instances declared during engine initialization.

 

Then I create a new module, load the byte code (compiled and saved separately), create a context, and set the line callback.
Apart from error checking and some simplification, the code is:

  mod = engine->GetModule(modname, asGM_ALWAYS_CREATE);
  mod->LoadByteCode(&bs);
  ctx = engine->CreateContext();
  ctx->SetLineCallback(asFUNCTION(LineCallback), &lplc, asCALL_CDECL);
  mod->ResetGlobalVars(ctx);
  mainEntry = mod->GetFunctionByDecl("void main(void)");
  ctx->Prepare(mainEntry);

After this, the real time loop executes the code repeatedly every real time tick as in:

  while(WaitForNextInterrupt())
  {
    st = ctx->Execute();
    if(st == asEXECUTION_FINISHED)
      ctx->Prepare(mainEntry);
  }

If I leave the above code running for a long time, the problem does not show.
If I shut down the program immediately, or even after one hour, the results are the same:
in both cases I get a GC statistics printout like:

GetGCStatistics: currentSize:732, totalDestroyed:0, totalDetected:0, newObjects:732, totalNewDestroyed:0

and the full shutdown takes about 600 ms.
This time is surprisingly long for what I expect it should be doing, but at least it is not minutes!
And the fact that it does not increase over time makes me feel I am on the safe side here.

Now, regarding the problem ...
Every now and then, while the real-time environment is still running, we need to change/fix our code and,
as fast as possible, replace the running version of the code with the fixed one.
Apart from some synchronization between the load phase and real-time execution, and the obvious error checking, the code basically does
the same sequence as above, from GetModule up to and including Prepare.
But this time the code creates a new module and uses a different set of mod and ctx, leaving the current ones alone
(they may actually be running in the real-time tick).
Then I wait until the engine is not executing any code (between one real-time tick and the next), and atomically swap the two sets of
ctx and mod, so on the next real-time tick the loop above executes the new code.
Once we are sure everything went OK, the old mod and ctx are destroyed/released.
For completeness I must say that the preparation code and the execution run on different threads.
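
A hypothetical sketch of that sequence; all names (ScriptSet, g_active, BuildNewSet, SwapSets) are illustrative, since the original code is not shown, and error checking, SetLineCallback and ResetGlobalVars are omitted for brevity:

#include <angelscript.h>
#include <atomic>

struct ScriptSet
{
  asIScriptModule   *mod;
  asIScriptContext  *ctx;
  asIScriptFunction *mainEntry;
};

std::atomic<ScriptSet*> g_active;   // the set executed by the real-time tick

// Loader thread: prepare a new set without touching the one currently running.
ScriptSet *BuildNewSet(asIScriptEngine *engine, const char *modname, asIBinaryStream *bs)
{
  ScriptSet *s = new ScriptSet;
  s->mod = engine->GetModule(modname, asGM_ALWAYS_CREATE);
  s->mod->LoadByteCode(bs);
  s->ctx = engine->CreateContext();
  s->mainEntry = s->mod->GetFunctionByDecl("void main(void)");
  s->ctx->Prepare(s->mainEntry);
  return s;
}

// Called between two real-time ticks, when no script is executing:
// swap atomically, then release the old set once it is certainly unused.
void SwapSets(ScriptSet *fresh)
{
  ScriptSet *old = g_active.exchange(fresh);
  old->ctx->Release();
  old->mod->Discard();
  delete old;
}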

Whether or not AS_NO_THREADS is set does not change the behavior as far as this problem is concerned.

 

When I reload the code again and again, the GC object count increases by about 700 objects each time,
and the time it takes to shut down increases proportionally.

The sequence I tested demonstrates that each time the preparation code is executed, about 700 objects end up hanging around in the GC.

Things get worse when the reload is repeated, but even the first time those 700 objects are there.

 

So the point is:
Why do I have about 700 objects in the garbage collector when I explicitly disabled it?
Additionally, is the above sequence correct, or should I call some other cleanup function during the load/reload process?

 

Sorry for these long posts, but I am trying to be as clear and detailed as possible.

Thanks for any help.

Mau.
 


Ah, now it is making much more sense. :)

 

The reason for the increase of objects in the GC for each recompilation is that every time a module is discarded, the script functions, classes, etc. are placed in the garbage collector so it can resolve the circular references between functions and so on. Since you've turned off the automatic garbage collection (with SetEngineProperty(asEP_AUTO_GARBAGE_COLLECT, false)) and don't manually call the GarbageCollect() method, these objects are never cleaned up until you shut down the engine.

 

You just need to add a call to GarbageCollect() after you discard the module in order to avoid having the objects accumulate. Since you already allow reloading the script during execution, this should not be a problem. If you feel that GarbageCollect() is too expensive to do in a single go, you can call it incrementally over a longer period, thus avoiding an impact on the performance of the real-time application.
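
A minimal sketch of that suggestion (assuming, as in this scenario, that everything left in the GC after the Discard() is garbage from the old module):

#include <angelscript.h>

void CleanupAfterReload(asIScriptEngine *engine, asIScriptModule *oldMod)
{
  oldMod->Discard();

  // Either clean everything up in one go ...
  // engine->GarbageCollect(asGC_FULL_CYCLE);

  // ... or spread the work over time from a non-real-time thread:
  asUINT currentSize = 0;
  do
  {
    engine->GarbageCollect(asGC_ONE_STEP);
    engine->GetGCStatistics(&currentSize);
  } while( currentSize > 0 );
}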

 

 

I will definitely look into reducing the amount of objects that are placed in the garbage collector after discarding a module. Though recompiling (or, in your case, reloading) scripts shouldn't really be considered something you'd do in a real-time environment. ;)

 

 

Even if your LineCallback normally doesn't do anything, it is still a function call that adds overhead to the execution time, thus degrading the performance. You should only set the LineCallback when you actually need to inspect the execution, and then remove it when you are no longer inspecting. That way you will only have a performance impact while you're inspecting the execution. You shouldn't need to stop the context to set or remove the LineCallback. It ought to be possible to set it from a second thread even while the context is running in the main thread.
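
For illustration, toggling the callback from a non-real-time supervision thread might look like this (LineCallbackGetLine refers to the hypothetical callback sketched earlier in the thread):

#include <angelscript.h>

void LineCallbackGetLine(asIScriptContext *ctx, void *param);   // as sketched earlier (hypothetical)

void EnableInspection(asIScriptContext *ctx, void *userData)
{
  ctx->SetLineCallback(asFUNCTION(LineCallbackGetLine), userData, asCALL_CDECL);
}

void DisableInspection(asIScriptContext *ctx)
{
  ctx->ClearLineCallback();   // execution continues with no per-line overhead
}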

 

The GetLineNumber() call is relatively slow because, contrary to what you think, this information is not readily available to the context. It has to be looked up by doing a binary search mapping the current bytecode position to the line number. Normally the context has no need to know the line number, so it doesn't make much sense to keep it in a faster but more memory-consuming structure. You can see what goes on when calling GetLineNumber() in as_scriptfunction.cpp (asCScriptFunction::GetLineNumber).


If it is possible for you to use our JIT, it would be fairly simple to add a custom line callback to it which caches the line number for each call. I'm not sure exactly what requirements exist for a real-time program, but the JIT's behavior is simpler than most of what AngelScript's own compiler has to do.


If I recall my university days correctly, the main requirement for a real-time application is that the response time is predictable, i.e. it cannot vary, e.g. due to an increasing amount of time spent doing memory allocations, garbage collection, etc.


That's my limited understanding as well, and I believe the JIT does satisfy that requirement as long as its allocator does as well (though that could be replaced with a suitable allocator if needed).


There's typically a response time requirement, such that a task must complete within X µs. This may be hard to guarantee with unpredictable/varying algorithms. In such cases, you need to verify that the worst case falls within your limits, and that's a very hot research topic at the moment. In short, it's much better to just not do unpredictable/varying stuff.

 

Then there are multiple levels of real-time and safety-critical systems. Some standards and certifications have extremely strict requirements where you can barely have branches in your code, while others are rather relaxed. If you are using something like an x86 CPU, then you are likely on the more relaxed side.


Hi All,

Thanks for all the information and suggestions.

 

GarbageCollect

I was under the impression that disabling the GC would cause objects to be released immediately when needed.

OK, my fault; I will add an explicit call and see. I will report my results here.

One additional question here:

I can call GarbageCollect on a thread with no strict real-time requirements, but I need to know how the execution of the GC will interact with the execution of the script interpreter, which must continue to run in real time on live objects.

I mean: does the GC just raise a semaphore at the beginning and drop it at the end (600 ms later), or does it protect individual critical accesses to shared areas with finer granularity?

Or (better), since the GC is disabled during script execution, does it just run on its "to be released" memory, being sure that nothing will be added there?

 

JIT

Regarding the JIT, this is probably not the best place to ask, but I need some general information to understand whether it is applicable to my case.

Just a few questions, if someone can give me a simple answer or point me to the available information ...

 

1) Does the JIT generate processor (x86 in my case) instructions, or is it some additional byte code optimization that is still interpreted by a virtual machine?

 

2) If real processor instructions are generated, I expect that they are first generated as data. So what method is used to "jump" to the data? Nowadays processors normally disable data execution, and only privileged instructions are allowed to change this.

 

3) If instead only some internal optimization is done and no real processor instructions are generated, what is the expected execution speed gain over the average script code?

 

Real Time

As previous posters have indicated there are several kinds of "real time". Sometimes they have to be certified and/or undergo particular scrutiny.

Fortunately, in my case, my code is not involved in deep space or in life-saving applications, so there is no need for certification (that's also why I can use code like AS or Small without the need to certify it).

Nevertheless, I need to control machine automation and compute the spatial trajectories of several motors, and this imposes strict timing requirements.

This requirement is what is normally called "hard real time", which means the deadline must never be missed, as opposed to "soft real time", where the requirement has to be fulfilled on average but a sparse deadline miss is acceptable (games or audio processors fall into this second category, where the problem is normally solved by a sufficiently deep buffer).

The "hard real time" requirement means that I must be sure there aren't bottlenecks or unnecessary critical sections that may cause priority inversion or other nasty (for real time) effects.

 

Thanks.

Mau.


JIT

Regarding the JIT, this is probably not the best place to ask, but I need some general information to understand whether it is applicable to my case.

Just a few questions, if someone can give me a simple answer or point me to the available information ...

 

1) Does the JIT generate processor (x86 in my case) instructions, or is it some additional byte code optimization that is still interpreted by a virtual machine?

 

2) If real processor instructions are generated, I expect that they are first generated as data. So what method is used to "jump" to the data? Nowadays processors normally disable data execution, and only privileged instructions are allowed to change this.

 

3) If instead only some internal optimization is done and no real processor instructions are generated, what is the expected execution speed gain over the average script code?

 

1) It produces native x86 instructions with fallback to the VM under various conditions (some specific types of calls it can't handle natively, script exceptions, and any new ops that might have been added since it was last updated).

 

2) The JIT requests a page from the OS which can be set to be executable. There is a rather simple class, CodePage, which is responsible for this allocation and can easily be changed. The JIT does expect that new code pages can be allocated dynamically, but a single large static page should be sufficient for most purposes. Jumping to the executable page is handled by the JIT instructions in the VM.

 

3) Native code runs between 2x and 10x faster depending on the exact code being executed and the architecture involved.



GarbageCollect
I was under the impression that disabling the GC would cause objects to be released immediately when needed.
OK, my fault; I will add an explicit call and see. I will report my results here.
One additional question here:
I can call GarbageCollect on a thread with no strict real-time requirements, but I need to know how the execution of the GC will interact with the execution of the script interpreter, which must continue to run in real time on live objects.
I mean: does the GC just raise a semaphore at the beginning and drop it at the end (600 ms later), or does it protect individual critical accesses to shared areas with finer granularity?
Or (better), since the GC is disabled during script execution, does it just run on its "to be released" memory, being sure that nothing will be added there?

 

The garbage collector is non-blocking, i.e. you can run it in a secondary thread while the primary thread continues to execute the script. (Of course, in this case you cannot compile the library with AS_NO_THREADS to turn off support for multithreading.)

 

Your scripts appear to be well written and do not generate garbage on their own (since you didn't get any memory accumulation during normal script execution), but even if your scripts did generate garbage, they wouldn't be blocked by the fact that the garbage collector was processing in a second thread.

 

Regards,

Andreas

2) The JIT requests a page from the OS which can be set to be executable. There is a rather simple class, CodePage, which is responsible for this allocation and can easily be changed. The JIT does expect that new code pages can be allocated dynamically, but a single large static page should be sufficient for most purposes. Jumping to the executable page is handled by the JIT instructions in the VM.

 

That's what I imagined.

The problem for me is that the real-time environment is not equivalent to the OS.

It is somewhat similar, but not identical, and far from complete.

I will investigate whether the page request call is available in my environment, or if some sort of workaround is possible.
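
For reference, on a full OS the executable code page is usually obtained with something like the following (a sketch of the typical OS calls, not the JIT's actual CodePage implementation), which may help when checking what the real-time environment offers:

#include <cstddef>

#ifdef _WIN32
#include <windows.h>
void *AllocExecutablePage(size_t size)
{
  // Memory that is both writable (to emit the code) and executable (to run it)
  return VirtualAlloc(0, size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
}
#else
#include <sys/mman.h>
void *AllocExecutablePage(size_t size)
{
  void *p = mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return (p == MAP_FAILED) ? 0 : p;
}
#endif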

 

Anyway, thanks for the hint.

Mau.


I have been studying the AngelScript docs and so far I am very impressed. I was planning to integrate it into our codebase next year; it looks pretty much perfect for our needs.

 

However, the talk about garbage collection and its implications raised some concerns.

 

Basically, I don't understand why garbage collection is really necessary at all, given that all objects are either stack-based or reference counted. Like ziomau, I also assumed that the memory used by the reference-counted objects would be released when they are destructed (i.e. on the last Release call, which sets the reference count to 0).

 

We use reference counting heavily in our code base (for real-time audio applications) and this approach never causes problems, as our design ensures that objects are never actually destroyed in the real-time threads: we ensure that the very last Release call is always made in the main thread. (Just to clarify: localized, without a global "garbage" list, hence avoiding the known drawbacks of that approach.)

 

Would it not make sense to have a compiler flag to release memory on object destruction instead of using a garbage collector at all? Then it is more under the application's / script writer's control.

 

I think the global garbage list also implies that objects created on additional threads will always be deleted in the main thread (assuming the garbage collector runs on the main thread). Is this correct? I would have to check, but this is also likely to cause issues for us.

 

Does that make sense, or am I missing something? Any further insight into how the garbage collection works would be helpful.


 

The garbage collector is non-blocking, i.e. you can run it in a secondary thread while the primary thread continues to execute the script. (of course, in this way you cannot compile the library with AS_NO_THREADS to turn off support for multithreading).

 

I guess that assumes that malloc and free are non-blocking, which I'm not sure about; I have always assumed that they are not (as it would be very difficult to write non-blocking allocators, at least fast ones using linked lists, for example).


OK, I see. Well, FWIW, for our needs it would be fine if circular references resulted in memory leaks and were considered the script programmer's fault and responsibility. I assume AS could easily report these errors at shutdown of the script engine, to provide feedback about their occurrence. (Of course they wouldn't be true leaks, as the memory would be released when the engine is deleted.)

 

Obviously I mean this as an optional compiler flag (which would basically be a "release on destruction" flag and no GC).


ziomau,

 

I've completely refactored the way the memory management for the internal script code is handled in revision 2094. It should no longer cause any problems to rebuild the scripts the way you do, as the internal script code no longer uses the garbage collector to resolve the internal circular references. Discarding the old script module is much faster, and there is no memory build-up. If your script code doesn't create any garbage-collected objects at all, you will not have to run the garbage collector at any time.

 

audioboy77,

 

You're thinking like a C++ developer, where the programmer has full responsibility for making sure the program behaves well. AngelScript is not C++ in script form. I cannot assume the script writer knows what he is doing and will not accidentally cause memory leaks or circular references between objects. AngelScript is meant to provide a sandboxed environment. The script writer should not be able to do something wrong even if he tries to (of course, this also depends on the functionality that the application exposes to the script engine).

 

If you prefer to put the responsibility on your script writers to make sure the scripts are well behaved, then you can easily turn off the automatic garbage collection and just do a trivial sweep over the objects in the GC. I doubt you'll have a performance problem due to that, especially since the GC is incremental and can be executed from a background thread if desired. I will however take your suggestion to heart and consider adding a flag that completely turns off the use of the GC for those who prefer that.

 

Yes, you're right, malloc and free are blocking. I didn't mean "blocking" in the sense of quick operations like malloc and free; I meant "blocking" in the sense of Java's garbage collector, which (used to) stop all processing until the garbage collection is finished, potentially for several seconds. (I mean no disrespect to Java developers; I'm basing my assumption on 10+ year old knowledge. I'm pretty sure Java has evolved since then. :))

 

I try to explain how AngelScript's garbage collector works in the manual. Hopefully it should answer most of your questions. If you do have further questions, I recommend you start a new forum post for that discussion rather than hijacking ziomau's thread. ;)



Hi Andreas

 

Thanks for the reply.

 

Yes I'm thinking like a C++ developer :) 

 

OK, I will turn off the GC and then call the cleanup manually; as you say, this will probably have minimal overhead, and I assume in practice it will be fine for our needs. But thanks for considering the suggested flag!

 

I will check the manual again, and I look forward to getting productive with AngelScript.

Thanks

 

James

