Yet another performance comparison (AS vs Small)

Started by
22 comments, last by audioboy77 9 years, 4 months ago

If it would be possible for you to use our JIT, it would be fairly simple to add a custom line callback to it which caches the line number for each call. I'm not sure exactly what requirements exist for a real time program, but the JIT's behavior is simpler than most of what AngelScript's own compiler has to do.


If I recall my university time correctly, the main requirement for a real-time application is that the response time is predictable, i.e. it must not vary, for example because of a growing amount of time spent on memory allocation, garbage collection, etc.

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

That's my limited understanding as well, and I believe the JIT does satisfy that requirement as long as its allocator does as well (though that could be replaced with a suitable allocator if needed).

There's typically a response time requirement, such that a task must complete within X µs. This may be hard to guarantee with unpredictable/varying algorithms. In such cases, you need to verify that the worst case falls within your limits, and that's a very hot research topic at the moment. In short, it's much better to just not do unpredictable/varying stuff.

Then there are multiple levels of real-time and safety-critical systems. Some standards and certifications have extremely strict requirements where you can barely have branches in your code, while others are rather relaxed. If you are using something like an x86 CPU then you are likely on the more relaxed side.

Hi All,

Thanks for all the information and suggestions.

GarbageCollect

I was under the impression that disabling the GC would cause objects to be released immediately when needed.

OK, my fault. I will add an explicit call and report my results here.

One additional question here:

I can call GarbageCollect on a thread with no strict real-time requirements, but I need to know how the execution of the GC will interact with the script interpreter, which must continue to run in real time on live objects.

I mean: does the GC just raise a semaphore at the beginning and drop it at the end (600 ms later), or does it protect individual accesses to shared areas with finer granularity?

Or (better), since the GC is disabled during script execution, does it just run on its "to be released" memory, knowing that nothing will be added there?

JIT

Regarding the JIT, this is probably not the best place to ask, but I need some general information to understand whether it is applicable to my case.

Just a few questions, if someone can give me a simple answer or point me to the available information ...

1) Does the JIT generate processor instructions (x86 in my case), or is it some additional bytecode optimization that is still interpreted by a virtual machine?

2) If real processor instructions are generated, I expect that they are first generated as data. So what method is used to "jump" to the data? Nowadays processors normally disable data execution, and only privileged instructions are allowed to change this.

3) If instead only some internal optimization is done and no real processor instructions are generated, what is the expected execution speed gain over average script code?

Real Time

As previous posters have indicated there are several kinds of "real time". Sometimes they have to be certified and/or undergo particular scrutiny.

Fortunately, in my case, my code is not involved in deep space or life-saving applications, so there is no need for certification (that's also why I can use code like AS or Small without having to certify them).

Nevertheless, I need to control machine automation and compute the space trajectories of several motors, and this imposes strict timing requirements.

This is what is normally called "hard real time", which means a deadline must never be missed. This is opposed to "soft real time", where the requirement has to be fulfilled on average but an occasional deadline miss is acceptable (games and audio processors fall into this second category, where the problem is normally solved with a sufficiently deep buffer).

The "hard real time" requirement means that I must be sure there aren't bottlenecks or unnecessary critical sections that may cause priority inversion or other nasty (for real time) effects.
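The hard/soft distinction above can be expressed as a small check over measured cycle times. This is only a sketch: `DeadlineReport`, `check_deadlines` and the two predicates are hypothetical names, and measured samples show observed behaviour only; they cannot prove that the true worst case stays within the deadline.

```cpp
#include <cstddef>
#include <vector>

// Summary of a set of measured cycle durations (hypothetical helper type).
struct DeadlineReport {
    std::size_t cycles = 0;   // number of measured cycles
    std::size_t misses = 0;   // cycles that exceeded the deadline
    long worst_us = 0;        // longest observed duration in microseconds
};

// Compare every measured duration against the deadline.
DeadlineReport check_deadlines(const std::vector<long>& durations_us,
                               long deadline_us) {
    DeadlineReport r;
    for (long d : durations_us) {
        ++r.cycles;
        if (d > deadline_us) ++r.misses;
        if (d > r.worst_us) r.worst_us = d;
    }
    return r;
}

// Hard real time: not a single deadline may be missed.
bool meets_hard_real_time(const DeadlineReport& r) {
    return r.misses == 0;
}

// Soft real time: misses are tolerated up to a chosen ratio.
bool meets_soft_real_time(const DeadlineReport& r, double max_miss_ratio) {
    if (r.cycles == 0) return true;
    return static_cast<double>(r.misses) / static_cast<double>(r.cycles)
           <= max_miss_ratio;
}
```

With a 500 µs deadline, the samples {200, 250, 900, 240} fail the hard criterion because of the single 900 µs cycle, but pass a soft criterion that tolerates a 25% miss ratio.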

Thanks.

Mau.

1) It produces native x86 instructions with fallback to the VM under various conditions (some specific types of calls it can't handle natively, script exceptions, and any new ops that might be added since it was last updated).

2) The JIT requests a page from the OS which can be set to be executable. There is a rather simple class, CodePage, which is responsible for this allocation and can easily be changed. The JIT does expect that new code pages can be allocated dynamically, but a single large static page should be sufficient for most purposes. Jumping to the executable page is handled by the JIT instructions in the VM.

3) Native code runs between 2x and 10x faster depending on the exact code being executed and the architecture involved.
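As a rough illustration of point 2, here is a minimal POSIX sketch of requesting a page from the OS, copying machine code into it, and then marking it executable. It is not the JIT's actual CodePage class: `ExecPage`, `load` and `exec_page_demo` are hypothetical names, the machine code bytes assume x86-64, and an environment without `mmap`/`mprotect` would need its own equivalent (on Windows that would be `VirtualAlloc`/`VirtualProtect`).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical stand-in for a code page; not the JIT's real CodePage class.
class ExecPage {
public:
    ExecPage() {
        size_ = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        // Ask the OS for one anonymous page, writable but not yet executable.
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        mem_ = (p == MAP_FAILED) ? nullptr : p;
    }
    ~ExecPage() { if (mem_) munmap(mem_, size_); }
    ExecPage(const ExecPage&) = delete;
    ExecPage& operator=(const ExecPage&) = delete;

    bool valid() const { return mem_ != nullptr; }

    // Copy machine code in, then flip the page from writable to executable.
    bool load(const std::uint8_t* code, std::size_t len) {
        if (!mem_ || len > size_) return false;
        std::memcpy(mem_, code, len);
        return mprotect(mem_, size_, PROT_READ | PROT_EXEC) == 0;
    }

    void* entry() const { return mem_; }

private:
    void* mem_ = nullptr;
    std::size_t size_ = 0;
};

// Returns the value produced by the generated code, or -1 if the demo
// was skipped (non-x86-64 build) or the OS refused the request.
int exec_page_demo() {
    ExecPage page;
    if (!page.valid()) return -1;
#if defined(__x86_64__)
    // x86-64 machine code for: mov eax, 42 ; ret
    const std::uint8_t code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
    if (!page.load(code, sizeof code)) return -1;
    using Fn = int (*)();
    Fn fn = reinterpret_cast<Fn>(page.entry());
    return fn();  // "jump" into the page
#else
    return -1;    // no machine code provided for other architectures
#endif
}
```

Writing first and switching to read+execute afterwards avoids needing a page that is writable and executable at the same time, which some systems refuse.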


The garbage collector is non-blocking, i.e. you can run it in a secondary thread while the primary thread continues to execute the script. (Of course, this means you cannot compile the library with AS_NO_THREADS, which turns off support for multithreading.)

Your scripts appear to be well written and do not generate garbage on their own (since you didn't get any memory accumulation during normal script execution) but even if your scripts did generate garbage they wouldn't be blocked by the fact that the garbage collector was processing in a second thread.
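The threading pattern described here can be sketched as follows: a worker thread without real-time requirements performs collection in small steps while the main loop keeps running. `FakeCollector` and its members are hypothetical stand-ins and say nothing about AngelScript's internal locking; in AngelScript itself the step would presumably be a call such as `engine->GarbageCollect(asGC_ONE_STEP)`, so check the library documentation for the exact flags.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for an incremental collector; not AngelScript's API.
struct FakeCollector {
    std::atomic<int> pending{1000};  // objects waiting to be released
    std::atomic<int> steps{0};       // collection steps performed so far

    // Do one small, bounded unit of work. Returns true while work remains.
    // Only the worker thread calls this, so load/store is sufficient.
    bool collect_one_step() {
        int n = pending.load();
        if (n <= 0) return false;
        pending.store(n - 1);
        steps.fetch_add(1);
        return n - 1 > 0;
    }
};

// Runs on a thread without real-time requirements.
void gc_worker(FakeCollector& gc, const std::atomic<bool>& stop) {
    while (!stop.load()) {
        if (!gc.collect_one_step()) {
            // Nothing left to do: sleep instead of spinning.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
}

// Returns the number of collection steps done in the background. If
// ticks_out is given, it receives how many "script time slices" the main
// loop completed meanwhile (this number varies from run to run).
int background_gc_demo(int* ticks_out = nullptr) {
    FakeCollector gc;
    std::atomic<bool> stop{false};
    std::thread worker(gc_worker, std::ref(gc), std::cref(stop));

    int ticks = 0;
    while (gc.pending.load() > 0) {
        ++ticks;                    // stand-in for executing one script slice
        std::this_thread::yield();
    }
    stop.store(true);
    worker.join();
    if (ticks_out) *ticks_out = ticks;
    return gc.steps.load();
}
```

Keeping each step small bounds how long any shared state is touched at once, which is the property that matters to the thread with the timing requirements.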

Regards,

Andreas


2) The JIT requests a page from the OS which can be set to be executable. There is a rather simple class, CodePage, which is responsible for this allocation and can easily be changed. The JIT does expect that new code pages can be allocated dynamically, but a single large static page should be sufficient for most purposes. Jumping to the executable page is handled by the JIT instructions in the VM.

That's what I imagined.

The problem for me is that the real-time environment is not equivalent to the OS: it is somewhat similar, but not identical and far from complete.

I will investigate if the page request call is available in my environment, or if some sort of workaround is possible.

Anyway, thanks for the hint.

Mau.

I have been studying the AngelScript docs and so far I am very impressed. I was planning to integrate it into our codebase next year, as it looks pretty much perfect for our needs.

However, the talk about garbage collection and its implications has raised some concerns.

Basically, I don't understand why garbage collection is really necessary at all, given that all objects are either stack based or reference counted. Like Ziomau, I also assumed that the memory used by reference-counted objects would be released when they are destructed (i.e. on the last Release call, which sets the retain count to 0).

We use reference counting heavily in our code base (for realtime audio applications) and this approach never causes problems, as our design ensures that objects are never actually destroyed in the realtime threads: the very last Release call is always made in the main thread. (To clarify, this is localised, without a global "garbage" list, hence avoiding the known drawbacks of one.)
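One way to get the "last Release always happens on the main thread" behaviour is for the real-time thread to hand its reference over through a preallocated queue instead of dropping it. The sketch below is an assumption about how such a design could look, not the poster's actual code: it uses `std::shared_ptr` in place of AddRef/Release, and `ReleaseQueue`, `Voice` and `deferred_release_demo` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <memory>
#include <utility>

// Single-producer/single-consumer ring that carries references from the
// real-time thread to the main thread. Holds at most N - 1 entries.
template <std::size_t N>
class ReleaseQueue {
public:
    // Real-time thread: hand over the reference without freeing anything.
    // On failure (queue full) the caller still owns the reference.
    bool push(std::shared_ptr<void>&& ref) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        const std::size_t next = (h + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[h] = std::move(ref);  // slot was empty, nothing is destroyed
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Main thread: drop queued references; destructors run here.
    std::size_t drain() {
        std::size_t released = 0;
        std::size_t t = tail_.load(std::memory_order_relaxed);
        while (t != head_.load(std::memory_order_acquire)) {
            slots_[t].reset();  // possibly the last reference
            t = (t + 1) % N;
            tail_.store(t, std::memory_order_release);
            ++released;
        }
        return released;
    }

private:
    std::array<std::shared_ptr<void>, N> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

int g_destroyed = 0;  // counts destructor calls for the demo

struct Voice {
    ~Voice() { ++g_destroyed; }
};

// Returns 1 if the object survived the hand-over and died in drain().
int deferred_release_demo() {
    g_destroyed = 0;
    ReleaseQueue<8> q;
    std::shared_ptr<void> v = std::make_shared<Voice>();

    bool ok = q.push(std::move(v));   // "real-time" side gives up its reference
    int before = g_destroyed;         // object must still be alive here
    std::size_t n = q.drain();        // "main thread" performs the final release

    return (ok && before == 0 && n == 1 && g_destroyed == 1) ? 1 : 0;
}
```

The queue is fixed-size and the hand-over only moves a pointer, so the real-time side neither allocates nor frees.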

Would it not make sense to have a compiler flag to release memory on object destruction instead of using a garbage collector at all? Then it would be more under the application's / script writer's control.

I think the global garbage list also implies that objects created on additional threads will always be deleted in the main thread (assuming the garbage collector runs on the main thread). Is this correct? I would have to check, but this is also likely to cause issues for us.

Does that make sense or am I missing something? Any further insight into how the garbage collection works would be helpful.

The garbage collector is non-blocking, i.e. you can run it in a secondary thread while the primary thread continues to execute the script. (Of course, this means you cannot compile the library with AS_NO_THREADS, which turns off support for multithreading.)

I guess that assumes that malloc and free are non-blocking. I'm not sure, but I have always assumed that they are not (as it would be very difficult to write non-blocking allocators, at least fast ones, using linked lists for example).
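A common way to sidestep the malloc/free question is to reserve memory up front and hand out fixed-size blocks from a free list, so that allocation and release are constant-time and never enter the system allocator. The sketch below shows that idea only; `FixedPool` and `pool_demo` are hypothetical names, and the pool is not thread-safe. As far as I know, AngelScript lets the application install its own allocation functions through `asSetGlobalMemoryFunctions`, but whether a fixed-size pool fits the library's allocation sizes is an open assumption.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool: all memory is reserved up front, so
// allocate/release are O(1) and never call malloc/free afterwards.
// Not thread-safe: use one pool per thread or add external synchronization.
class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(block_size_ * block_count) {
        // Thread every block onto an intrusive singly linked free list.
        for (std::size_t i = 0; i < block_count; ++i) {
            free_ = new (storage_.data() + i * block_size_) Node{free_};
        }
    }

    // Pop a block from the free list; nullptr when the pool is exhausted.
    void* allocate() {
        if (!free_) return nullptr;
        Node* n = free_;
        free_ = n->next;
        return n;
    }

    // Push a block obtained from allocate() back onto the free list.
    void release(void* p) {
        free_ = new (p) Node{free_};
    }

private:
    struct Node { Node* next; };

    // Blocks must fit a Node and keep maximum fundamental alignment.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    Node* free_ = nullptr;
};

// Returns 1 if the pool behaves as described in the comments below.
int pool_demo() {
    FixedPool pool(64, 2);
    void* a = pool.allocate();
    void* b = pool.allocate();
    void* c = pool.allocate();  // pool exhausted: nullptr, no fallback to malloc
    pool.release(a);
    void* d = pool.allocate();  // reuses the block that was just released
    return (a && b && a != b && c == nullptr && d == a) ? 1 : 0;
}
```

The price is a fixed capacity: when the pool runs out, the caller gets `nullptr` and has to handle that case explicitly.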

This topic is closed to new replies.
