The STL for game engine architecture

Started by
62 comments, last by msn12b 18 years, 5 months ago
What are the pros and cons of using the STL for game engine architecture? And how about writing our own "STL"? Recently I have been reading David H. Eberly's latest book about game engine architecture. The author says that the STL is so slow that it is not suitable for a game engine, and that writing a small "STL" is more desirable. For my part, I would rather use the STL, because I think writing and maintaining another "STL" is really time-consuming, even if it is small.
always use the STL unless you can PROVE, with numbers, that it is too slow. It is faster than what you would write. Authors who claim that it is too slow are not smart people and you should consider throwing their books away.
snk_kid did a test a while ago, Doom3's custom container system vs the STL and the STL came out (significantly iirc) faster.

So, unless you can prove in your case that the STL is the bottleneck I'd keep on using the STL as much as I can.
So, is there no obvious advantage of writing our own "STL" for game engine?
Unless you are targeting your game at a platform where the STL is unavailable, no.

I don't think there are any current consoles where STL is not available.

Mark
The STL is made for general use, so in specific cases you can speed your code up by using specialized containers.
But for general-purpose use you are unlikely to beat the STL, because it is written under strict performance requirements (there is no range checking or anything else that would slow down your code).

Until you can prove that the STL, and not your own code that uses it, is the factor that makes your program slow, go with the STL; only then write custom containers that do better.
Quote:Original post by Glak
always use the STL unless you can PROVE, with numbers, that it is too slow. It is faster than what you would write. Authors who claim that it is too slow are not smart people and you should consider throwing their books away.


People who make blanket statements like this without checking the primary source should be thrown away.

Anytime you use STL components (or Java's container classes) for a high-performance application, you need to understand what the components are designed to do. Much of STL is designed to produce algorithms that meet certain constraints on asymptotic order. For example, "set" maintains a sorted collection to support an O(log n) insertion, deletion, and lookup for n elements. As with any asymptotic analysis, the "order" refers to a limit as n approaches infinity. Moreover, the order has a hidden constant. When you say the positive function f(n) is O(log n), that means f(n) <= C*log(n) for some positive constant C and for n >= n0 for some n0 "sufficiently large". The values "C" and "n0" are typically determined from empirical evidence--you run your program for increasingly larger values of n and analyze the results. In many applications, n is *not* sufficiently large to obtain the benefits of STL, and a hand-rolled substitute *specially designed for your application* will perform better. Computer graphics and computational geometry have always been about micromanaging your resources to boost performance--sometimes you just have to roll your own.

STL (and Java containers) also have potential issues with memory usage and memory management. My favorite example where both fail miserably is in "set". In my medical image applications, I have triangle meshes with millions of vertices. I need adjacency information for doing things like continuous level of detail, surface evolution, and intersection testing. My vertex class stored a "set" of pointers to triangles sharing the vertex. Both STL and Java required an inordinate amount of memory to store the adjacency information. It turns out the vertex "set" members contain on average about 6 elements (this is to be expected statistically for triangle meshes extracted from 3D images). The memory overhead for "set" was massive, but designed to support asymptotic speed constraints. Switching to a simple dynamically resizeable array with a default of 8 elements reduced the memory usage by half (from 1GB to 0.5GB). That said, other applications might not care about the memory overhead, in which case STL is acceptable.

A big issue with memory management for games on a console is that the designers and programmers need to specify a "memory budget" for each component. The graphics system needs to be handed a chunk of memory--that is all it gets. Similarly, the sound system, the AI system, and any other system of interest is given its budget. It is absolutely *essential* that each system not exceed its budget. If you use STL, or any other set of tools that calls 'new' and 'delete' in ways you cannot predict or control, you risk exceeding your budget at an unexpected time. Chris Butcher, from Bungie, talked about these issues in detail at Game Tech in December 2004. A major problem was that their physics component (Havok) was not designed to be given a memory budget, and that caused havoc in their development of Halo 2. When I make such claims in my books, it is because I attend these conferences and pay attention to what developers are saying, and because I also get to experience some of the problems in my own company's contracts.

All this said, I do use STL sometimes. But like any practicing engineer will do, I profile my applications and I monitor memory usage. If STL code is the bottleneck for speed or memory, I will try to understand what my code is doing to cause that and I will roll my own substitute and test to make certain it does better than STL.

Try a custom allocator if you need extra speed before writing your own container. Chances are the allocation is the aspect that could be optimized most in a custom container.

As a real world example, my bot (www.omni-bot.com) uses A* for pathfinding, and STL for the internal data structures. My open list is an std::vector, my closed list is an stdext::hash_map.

Performance was pretty good; the STL structures haven't been the hot spots in my profiles for a while, but I wanted to experiment with custom allocators to see how much extra performance I could get from two extra lines of code.

For my benchmarking I wrote a double for loop that ran A* from every waypoint to every other waypoint. This would ensure a real world performance measure, and not some meaningless benchmark of inserting into a container a million times in a loop.

Anyways, I ran this benchmark in the largest map I had waypointed at the time.

Here were the results using the raw data structures with default allocators:

with:
typedef stdext::hash_map<int, Waypoint*> WaypointHashMap;

generated 333506 paths in 59.764768 seconds: 5580.311141 paths/second

And then the results by simply using the boost::fast_pool_allocator:
typedef stdext::hash_compare<int> HashMapCompare;
typedef boost::fast_pool_allocator< std::pair< const int, Waypoint* >, boost::default_user_allocator_new_delete, boost::details::pool::default_mutex, 512 > HashMapAllocator;
typedef stdext::hash_map<int, Waypoint*, HashMapCompare, HashMapAllocator > WaypointHashMap;

generated 333506 paths in 46.905756 seconds: 7110.129485 paths/second

Personally I'd much rather use one or two lines of code and get a ~20% performance improvement (or more; this is the only allocator I've tried so far) than attempt to write a custom container that will end up less flexible than the STL.

Understanding the STL, using the right containers for the job, and applying custom allocators if you need more control over speed or memory allocation are, IMO, the requirements you should cover before reinventing them.

My 2c

[Edited by - DrEvil on October 23, 2005 11:19:25 AM]
Quote:Original post by DrEvil
As a real world example, my bot (www.omni-bot.com) uses A* for pathfinding, and STL for the internal data structures. My open list is an std::vector, my closed list is an stdext::hash_map.

You are using a std::vector as an O(n) priority queue and telling others about performance? ;)
No. I'm using std::push_heap along with the vector. Don't make assumptions.

