I've now more or less completed the new memory manager, which takes a twofold approach. Messages in the inter-thread message queue are now allocated from a special buffer rather than the general free store; this buffer is managed by a lock-free stack that tracks allocations and frees. The other special case is the std::stack used in the message caching system, which now gets a custom allocator class; this allocator draws from a thread-local heap and thereby avoids locking on the general shared heap.
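To make the free-buffer idea concrete, here's a minimal sketch of a fixed-size block pool whose free list is maintained as a lock-free (Treiber-style) stack. The class and member names are invented for illustration, not taken from the actual code, and a production version would need ABA protection (tagged pointers or similar); this is just the shape of the technique.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch: fixed-size message blocks threaded onto a
// lock-free stack. Blocks never leave the pool, which sidesteps the
// worst lifetime issues, but a real implementation still needs to
// guard against ABA (e.g. with a version-tagged head pointer).
class MessageBlockPool {
public:
    MessageBlockPool(std::size_t blockSize, std::size_t blockCount)
        : storage(new char[blockSize * blockCount]), head(nullptr)
    {
        // Thread every block into the initial free list
        for (std::size_t i = 0; i < blockCount; ++i) {
            Node* node = reinterpret_cast<Node*>(storage + i * blockSize);
            node->next = head.load(std::memory_order_relaxed);
            head.store(node, std::memory_order_relaxed);
        }
    }
    ~MessageBlockPool() { delete[] storage; }

    // Pop a block off the free list without taking any lock
    void* Allocate() {
        Node* old = head.load(std::memory_order_acquire);
        while (old) {
            // On failure, compare_exchange reloads 'old' with the
            // current head and we simply retry
            if (head.compare_exchange_weak(old, old->next,
                                           std::memory_order_acquire))
                return old;
        }
        return nullptr;   // pool exhausted; caller can fall back to the heap
    }

    // Push a block back onto the free list
    void Free(void* p) {
        Node* node = static_cast<Node*>(p);
        node->next = head.load(std::memory_order_relaxed);
        while (!head.compare_exchange_weak(node->next, node,
                                           std::memory_order_release))
            ;             // expected value (node->next) is refreshed on failure
    }

private:
    struct Node { Node* next; };
    char* storage;
    std::atomic<Node*> head;
};
```

The key property is that contended allocations spin on a single compare-and-swap instead of blocking on a critical section, which is where the throughput difference described below comes from.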
For a while I was skeptical about the results, as it seemed like a simple critical-section-guarded allocation was going to be faster; but after a little profiling and some careful tuning, I've gotten substantial results: the initial implementation could pump about 55,000 messages per second on my dev machine, while the lock-free implementation regularly peaks above 60,000. I consider that a solid win.
I've also fixed a handful of bugs in various areas, mainly in the parser. Somewhere in the back of my mind is a vague notion that I also added a cool feature, but I can't quite recall what it could have been, so that will probably forever remain a mystery.
And the ever-shrinking task list now looks like this:
- Fix some bugs in nested response map support
- Improve syntax for nested structure initialization
- Type aliases (a.k.a. typedefs)
- Change task IDs to string variables for easier metaprogramming
- Perform complete code review for exception safety, documentation, code cleanliness, error handling robustness, and elimination of hardcoded strings/magic numbers
These are mostly small jobs (aside from the code review obviously) so they should hopefully go pretty fast. I'm still not willing to speculate on a release schedule, but I'll definitely say that R7 will be out in the very near future.
I had the idea of eliminating as many allocations and frees as I could across the entire codebase; the biggest consumer of tiny allocations is the r-value wrapper class, which propagates values from functions into other functions or into variables. Originally, the VM was designed to assign an r-value to every operation. However, it quickly became clear that this makes no sense, as the vast majority of operations just return null.
So I introduced a second execution path in the VM, named ExecuteFast. This path does not allocate r-values except in very specific circumstances (i.e. when an r-value is truly, desperately needed). This has led to a substantial speedup in the VM.
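The gist of the two-path idea can be sketched like this. Everything here is illustrative: the RValue struct and the toy instruction set are invented, and I'm only keeping the names Execute/ExecuteFast from the description above. The point is simply that the fast path skips the per-operation wrapper allocation unless the instruction actually produces a value.

```cpp
#include <memory>
#include <vector>

// Invented stand-ins for the real VM types
struct RValue { int value; };
enum class Op { Nop, Store, PushResult };
struct Instruction { Op op; int operand; };

// Slow path: one heap-allocated r-value per instruction, even for
// operations that effectively "return null". Returns the allocation
// count so the difference is measurable.
inline std::size_t Execute(const std::vector<Instruction>& code, int& accum) {
    std::size_t allocations = 0;
    for (const Instruction& inst : code) {
        std::unique_ptr<RValue> rv(new RValue{0});   // always allocated
        ++allocations;
        switch (inst.op) {
        case Op::Store:      accum = inst.operand; break;
        case Op::PushResult: rv->value = accum;    break;
        case Op::Nop:        break;
        }
    }
    return allocations;
}

// Fast path: allocate the wrapper only when an instruction truly
// needs to hand back a value
inline std::size_t ExecuteFast(const std::vector<Instruction>& code, int& accum) {
    std::size_t allocations = 0;
    for (const Instruction& inst : code) {
        switch (inst.op) {
        case Op::Store: accum = inst.operand; break;
        case Op::PushResult: {
            std::unique_ptr<RValue> rv(new RValue{accum});
            ++allocations;
            break;
        }
        case Op::Nop: break;
        }
    }
    return allocations;
}
```

Since most real instruction streams are dominated by value-less operations, the fast path turns the common case into zero allocations per instruction, which is exactly where the speedup comes from.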
Specifically, I've shattered the message throughput record: the current peak is just over 65,000 messages per second. Not bad for a 20-minute refactoring!
[Not Done Yet!]
Did some more quick hackery and optimization, and cranked the throughput up to 68,000 messages/second. I've also noticed several relatively straightforward memory allocation optimizations I could make. So at the moment I'm tempted to push back R7 in order to continue doing optimizations... but I'll try my best to resist and keep the major speed stuff for R8 or R9 [smile]