The STL for game engine architecture

Started by
62 comments, last by msn12b 18 years, 5 months ago
Quote:Original post by Dave Eberly
STL (and Java containers) also have potential issues with memory usage and memory management. My favorite example where both fail miserably is in "set". In my medical image applications, I have triangle meshes with millions of vertices. I need adjacency information for doing things like continuous level of detail, surface evolution, and intersection testing. My vertex class stored a "set" of pointers to triangles sharing the vertex. Both STL and Java required an inordinate amount of memory to store the adjacency information. It turns out the vertex "set" members contain on average about 6 elements (this is to be expected statistically for triangle meshes extracted from 3D images). The memory overhead for "set" was massive, but designed to support asymptotic speed constraints. Switching to a simple dynamically resizeable array with a default of 8 elements reduced the memory usage by half (from 1GB to 0.5GB). That said, other applications might not care about the memory overhead, in which case STL is acceptable.

A dynamically resizable array, like std::vector? Just because you didn't pick the right container in the first place doesn't mean you automatically need to move outside the standard library.

Quote:
A big issue with memory management for games on a console is that the designers and programmers need to specify a "memory budget" for each component. The graphics system needs to be handed a chunk of memory--that is all it gets. Similarly, the sound system, the AI system, and any other system of interest is given its budget. It is absolutely *essential* that each system not exceed its budget. If you use STL, or any other set of tools, that call 'new' and 'delete' in ways you cannot predict or control, you only risk exceeding your budget at an unexpected time. Chris Butcher, from Bungie, talked about these issues in detail at Game Tech in December 2004. A major problem was that their physics component (Havok) was not designed to be given a memory budget, and that caused havoc in their development of Halo 2. When I make such claims in my books, it is because I attend these conferences and pay attention to what developers are saying, and because I also get to experience some of the problems in my own company's contracts.

Which can be addressed by creating a budgeted allocator, which is a lot easier than writing an entire set of your own container classes.

Quote:
All this said, I do use STL sometimes. But like any practicing engineer will do, I profile my applications and I monitor memory usage. If STL code is the bottleneck for speed or memory, I will try to understand what my code is doing to cause that and I will roll my own substitute and test to make certain it does better than STL.

I notice you leave out a step where you try a different STL type, allocator or algorithm. Does that mean you don't try it?

Now I haven't read your book, but I hope you present better arguments there than you did in your post here, because all I see here are reasons to learn how to use the standard library better, not good reasons for writing your own containers.
I've talked with a programmer from Retro Studios who worked on Metroid Prime, and apparently they did not use the STL since they said it was too slow. However, they used Visual C++ 6.0 to compile, if I remember correctly, so I'd imagine that could definitely have been one of the sources of their problem. They also decided not to use any exceptions at all. Keep in mind that how good your STL implementation is depends on your compiler and the library implementations it provides.

Really though, I'd still recommend using the STL. Unless you can make a container with different complexity guarantees or added functionality that better fits your needs than the STL containers do, use them. If you need a dynamically sized array, use a vector (unless it's of bool); if you need a doubly-linked list, use a list; etc. The STL does not provide solutions to all problems, but the solutions that it does provide are [generally] designed very well, and their implementors are probably more experienced and spent more time on them than you would, not to mention that they can use non-portable optimizations which only apply to the specific compilers they ship with, and they are heavily bug-checked.
You already said it: they used VC6. VC6 is now almost 8 years old; it was released in the same year the C++ standard was finalized (and even back in 1998 it was a very bad compiler). Any conclusion drawn from the output of a compiler as bad as VC6 is garbage.

In the past there were valid reasons to write your own STL equivalent. Generally the reasons were related to how compilers handled templates. Code bloat used to be a real problem and if you were developing for a console this was of particular concern. New compilers are very good at reusing code, so good in fact that using templates can result in smaller code than using hand written specialized code.

Most of the other issues related to the performance of the STL really have nothing to do with the STL itself but with how people use it. Just read SiCrane’s post for an example. You need to know what to use and when to use it. Incorrect usage of any library, be it yours, the STL, or some other library, will result in inefficient code.

I only wish I had Dave Eberly’s math skills, but I completely disagree with his opinion of the STL. Correctly used with a good compiler, the STL is an exceptionally fast, optimized, generic library. If you can use it then do so, and don’t waste time trying to write your own; use that time to learn correct usage of the STL.

Edit: Fix link to SiCrane's post
Edit: Why can't I fix this link!
Glak, David Eberly is not "stupid", and his books bridge a very deep rift between cursory "Game Programming in 21 Days!" fare and what it takes to create a commercial-quality engine.

That said, attacking the STL around here is like throwing stones at a hornets' nest!

Quote:Original post by Dave Eberly
Anytime you use STL components (or Java's container classes) for a high-performance application, you need to understand what the components are designed to do.


Please don't make it sound like the C++ STL is comparable in performance to the Java container classes. The template nature of the STL allows for much more optimization, and the containers have significantly less overhead. It also allows for customization of the STL containers, which is very important for the memory concern under discussion.


Quote:Original post by Dave Eberly
A big issue with memory management for games on a console is that the designers and programmers need to specify a "memory budget" for each component. The graphics system needs to be handed a chunk of memory--that is all it gets. Similarly, the sound system, the AI system, and any other system of interest is given its budget. It is absolutely *essential* that each system not exceed its budget. If you use STL, or any other set of tools, that call 'new' and 'delete' in ways you cannot predict or control, you only risk exceeding your budget at an unexpected time.


This is what I do not understand. You can control how all STL containers allocate and deallocate memory: every one takes an allocator as a template parameter, and optimized pre-built STL-compatible allocators are available in Boost, so you don’t have to write your own. Only the default allocator uses new and delete.
You could also use boost::array, which gives you a std::vector-compatible interface to a fixed-size array, so you can still use all the STL algorithms on it.
You can also .reserve() space in the non-associative containers (e.g. vector) ahead of time to prevent multiple allocations.


Quote:
Chris Butcher, from Bungie, talked about these issues in detail at Game Tech in December 2004. A major problem was that their physics component (Havok) was not designed to be given a memory budget, and that caused havoc in their development of Halo 2. When I make such claims in my books, it is because I attend these conferences and pay attention to what developers are saying, and because I also get to experience some of the problems in my own company's contracts.


That's a more troublesome issue. It is a fundamentally different problem when you have constrained memory. General solutions assume you have sufficient memory to solve the problem at hand. When you have an insufficient amount of memory, you can no longer find the optimal solution; you have to start using heuristics to select the best solution from what you can calculate. Tic-Tac-Toe AI vs. Chess AI.

To me, that says the Havok engine does not scale, or the Halo 2 team was asking for too much precision given their constrained environment. Did they say how they solved the problem?
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
Quote:The memory overhead for "set" was massive, but designed to support asymptotic speed constraints. Switching to a simple dynamically resizeable array with a default of 8 elements reduced the memory usage by half (from 1GB to 0.5GB).


Otherwise known as a "vector"?

Seems to me the choice of STL container was the problem in this case, not the STL itself. If the type of object you are storing is, say, 12 bytes, a vector (dynamic array) of those things would be a little over numElements * 12 bytes in size (allowing for capacity being greater than size, and perhaps a few bytes of vector object overhead too). A set would have the overhead of two pointers and a colour enumeration per element, probably 12 bytes in total, making the size of each element 24. Yeah, that's doubled the size of your container. The moral of the story: prefer vector to set if memory use is more important than lookup speed. Fairly basic stuff.

If you put lawnmower oil in a Formula One race-car, you're not going to get much performance out of it.

[Edited by - SunTzu on October 24, 2005 4:15:47 AM]
Quote:Original post by Anonymous Poster
The STL is made for general use, so in specific cases you can speed your code up by using specialized containers.


I know you made a fair comment about the STL, but I would just like to point out this recurring misconception or "common myth": just because something is generalized does not mean it cannot take advantage of optimizations for *special cases*.

This is exactly the case for the STL. By its very nature, templates make virtually all information about a type available at compile time, and because the library is standardized, implementors can take advantage of their own compilers while still providing an invariant interface to the client.

I've been through this so many times before that it gets boring. There are many advanced techniques for exploiting *special cases* that have been around for a long while now, and your average Joe Bloggs C++ programmer just has no clue about them. Hence your average Joe Bloggs C++ programmer has virtually no chance of writing a customized replica that does any better (by the way, the STL is already customizable).

Just some of the things a typical implementation does:


  • Containers are parameterized by allocator type, allowing for custom memory models/schemes; the allocator concept separates allocation/deallocation from construction/destruction for efficiency reasons (as in the case of std::vector). I bet half of the self-proclaimed C++ programmers out there don't know you can do this.


  • If an allocator type is an empty type, which is typically the case since allocators are usually stateless, it may still take up storage as a data member. Most modern C++ compilers support an optimization called the empty member/base optimization (EMO/EBO), and most library implementors take advantage of it. Did you even know this problem/solution existed? I bet you didn't.

  • Each container provides a certain level of exception-safe code; they typically apply the resource acquisition is initialization (RAII) idiom, in a slightly different way from the norm, without sacrificing performance. When you wrote your custom solution, did you even think about exceptions thrown for allocation failure, among other things? I bet you didn't.


  • Wherever there is a possibility, optimizations for POD types (POD = plain old data, a technical term used in the standard) and/or contiguous memory are used, and this all happens at compile time.

    The canonical example is std::copy: wherever possible it will dispatch (at compile time) to use memmove (not memcpy, because the input and output ranges are permitted to overlap); failing that, a different method of iteration can be used for random-access iterators that is a candidate for compiler optimizations such as loop unrolling; failing that, the most general method of copying is used. std::vector typically does something similar internally.

    This is done using a simple template metaprogramming technique called type/tag dispatching, together with compile-time traits, in particular iterator traits and type traits. Now, most people don't know what a POD type is, let alone anything about template metaprogramming.


  • Most generic algorithms take advantage of the particular iterator category and dispatch (at compile time) to a better implementation for those *special cases*, again using compile-time tag/type dispatching with iterator and/or type traits, as mentioned earlier.


  • Even the containers use some form of template metaprogramming internally, such as tag/type dispatching.



I could go on, but I think I've covered the major ones, so now you get the point. For the average Joe Bloggs C++ programmer who does not own a copy of the standard and has no clue about these advanced techniques and idioms in C++ (which take about ~3 years to learn properly): don't waste your time. If it's there to use, use it.

By the way, yes, std::vector on modern C++ compilers pwns Doom 3's equivalent.

[Edited by - snk_kid on October 24, 2005 6:38:17 AM]
Hey,

Yeah, there are some great discussions about STL use in games that we've had in the past. A Google search should turn up a few of them; some contain points that may not have been raised here yet.

That said, I'd have to advise using the STL _until_ there's a compelling reason not to use it. I have rolled my own list classes into my current project, but only because they provide me with functionality that IIRC doesn't exist in the STL. That, and some STL containers are a bitch to get working with the memory management scheme that I was using at some point. These are nowhere near as fast as the STL ones [for obvious reasons] when doing STL stuff, but they allow my functionality [the stuff that I do a lot of] to be used quickly and cleanly.

But yeah, even your own basic set of containers is a bitch to maintain. And it isn't necessarily faster [but then again I'm not exactly writing my stuff for speed yet, and I am in no way an optimisation whiz].

–CJM
My opinion is that you should use the STL whenever possible.
As for performance, even if you code your own core libraries to be fast, the numbers differ very little between the STL and your own library, and you will have wasted a lot of time coding them.
The STL offers good, usable types with little or no hassle.
I'm currently coding an engine in C, where you must code your own type libraries (like hashtables and such), and it's generally a pain.
If you still want to create your own STL, do it for purposes of study, or if you have a lot of time on your hands ;)
--- sandelz
Quote:My opinion is that you should use the STL whenever possible.

Agree. Spend the time learning how/when/why to use STL instead of coding a replacement.
Quote:New compilers are very good at reusing code, so good in fact that using templates can result in smaller code than using hand written specialized code

I've heard this to be true (i.e., all templates instantiated with a pointer type can share the same code), but I haven't actually seen it working.

In one (PS2) game (Battlefield - Modern Combat) I managed to get the .elf size down from 6.5 MB to just over 3 MB by modifying just two class templates.
One was my own (~1 MB), the other was std::basic_string (~2 MB).

A common misconception is that templates = inlined code, and indeed for the STL we were using (STLport) this was true.

We only had two instantiations of std::basic_string: char and unsigned short (for Unicode strings).
So I split the longer std::basic_string methods (operator+, substr, find, etc.) into two header files (think .h and .cpp).
Then I used explicit instantiation for the two types we used; all other source code is unaffected by this.

Now whenever I create a new templated class I tend to always split it into two files and use explicit instantiation.
In fact, the new .elf was faster than the old one (it fitted more code in the cache, I guess).

There was also a source file with about 50 std::map<>::find calls (in a setup function), each and every one of which was inlined by the compiler (I think the compiled code for that function was around 128 KB!).
So I simply wrapped the map.find() call in another method and instead got 50 function calls, reducing the code size to nearly nothing.

If you're aware of these small things STL and templated classes can be very effective.

Whenever I use a method like find, erase, or insert, I tend to think: "Do I want 1 KB of inlined code right here?" If the answer is no (I'm using the same method again later, etc.), I create a wrapper function (which is more or less the same as letting the compiler decide when to inline).

Just my 2c

This topic is closed to new replies.
