const char * or std::string?

27 comments, last by ToohrVyk 16 years, 1 month ago
So memory fragmentation is something that I don't know a lot about.

For example, you can have platforms that have virtual memory and others that don't, and memory fragmentation behaves differently.

Also when a vector re-allocates its size, in most cases it actually moves to a new spot in memory? (the entire vector) leaving the old spot not usable...

So in that case, it's really important to call reserve and attempt to guess at the biggest size your vector should be. This is really important for programs or games that have a vector that is changing size all the time (push_back, and swap deleting only, otherwise use a list)

I'm assuming list only allocates memory per item in the list, so there should be little to no fragmentation here, it should also re-use the memory when you're removing and re-adding items..? Vector will also, that is until you push back into the vector and it has to re-allocate to support the new size.
Black Sky A Star Control 2/Elite like game
It would help if you gave an example. When it comes to function parameters, it is perfectly fine to take a const char*, if all you do for example is pass it through to the API that takes the filename. e.g.
void OpenFileAndDoStuff(const char *filename)
{
    DoStuff(Open(filename)); // where Open takes a const char*
}
By resorting to the lowest level type, you increase flexibility, as you can now use the function with a raw string literal as well as a std::string (via c_str()), without having to go to the trouble of constructing any temporaries. IMO this is the best thing to do in the above situation.
However if you were to manipulate the filename within this function, such as to append a file extension, then it might be a good idea to have the interface take a std::string anyway.
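To make the distinction concrete, here is a minimal sketch of the two cases. FileExists and WithExtension are hypothetical helpers made up for illustration; only std::fopen, std::fclose and std::string are real library names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: the name is only passed through to a C API,
// so a const char* parameter is enough and no std::string is built.
bool FileExists(const char* filename) {
    std::FILE* f = std::fopen(filename, "rb");
    if (!f) return false;
    std::fclose(f);
    return true;
}

// Hypothetical helper: the name is manipulated inside the function,
// so taking (and returning) a std::string is the more natural interface.
std::string WithExtension(const std::string& base, const char* ext) {
    std::string result = base;
    result += '.';
    result += ext;
    return result;
}
```

A caller holding a std::string can still use the first function through c_str(), e.g. FileExists(WithExtension("level1", "dat").c_str()).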
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Original post by ViperG

For example, you can have platforms that have virtual memory and others that don't, and memory fragmentation behaves differently.


Virtual memory has nothing to do with it. Allocation strategy determines fragmentation.

Quote:Also when a vector re-allocates its size, in most cases it actually moves to a new spot in memory? (the entire vector) leaving the old spot not usable...


Nope. std::allocator calls new to claim memory, and delete to release it. Memory released this way becomes available again.

Quote:I'm assuming list only allocates memory per item in the list, so there should be little to no fragmentation here, it should also re-use the memory when you're removing and re-adding items..? Vector will also, that is until you push back into the vector and it has to re-allocate to support the new size.


List is by far the worst with regard to fragmentation. A custom allocator that places the nodes in some pre-allocated storage can improve performance by a factor of 50 or so.

Quote:Vector will also, that is until you push back into the vector and it has to re-allocate to support the new size.


Vector will allocate more elements than it currently needs, and will rarely reduce storage size. This means that it will allocate and release rarely.

List on the other hand will be making allocations on every addition, and release on every erase. These will all be small allocations, which are horrible.

Consider list<int>. On every addition, it needs to allocate a new node holding the int (plus the link pointers). Each time an element is removed, that node is deleted. After frequent updates, the list will leave small holes of a few bytes each across all memory.

Vector on the other hand will eventually allocate a single array large enough, in a single block. Better still, once that capacity is reached, adding or removing an element requires no allocations at all.

This also has a beneficial effect on locality. Data in a vector is contiguous. Traversing it gains the benefits of a high cache hit rate. A list is scattered and non-contiguous. Traversing a list will be considerably slower: even if the traversal looks the same, most of the time you'll be getting cache misses, which will kill performance.
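The allocation pattern described above can be observed directly with a counting allocator. This is a rough sketch: CountingAllocator, g_allocations and the two Count... functions are names invented here, and the exact counts depend on the standard library implementation (growth factor, sentinel nodes, debug proxies).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Global counter of allocate() calls; illustrative only, not thread-safe.
static std::size_t g_allocations = 0;

// Minimal C++11 allocator that forwards to operator new/delete and counts.
template <class T>
struct CountingAllocator {
    typedef T value_type;
    CountingAllocator() {}
    template <class U> CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Number of allocations needed to push n ints into a vector.
std::size_t CountVectorAllocations(std::size_t n, bool reserveFirst) {
    g_allocations = 0;
    std::vector<int, CountingAllocator<int> > v;
    if (reserveFirst) v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    return g_allocations;
}

// Number of allocations needed to push n ints into a list (one node each).
std::size_t CountListAllocations(std::size_t n) {
    g_allocations = 0;
    std::list<int, CountingAllocator<int> > l;
    for (std::size_t i = 0; i < n; ++i) l.push_back(static_cast<int>(i));
    return g_allocations;
}
```

On common implementations, pushing 1000 ints should cost the list at least 1000 allocations, the vector a few dozen at most, and the vector with reserve fewer still.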
Making constructors that only take std::string is very very stupid if you ever pass in string literals. For every constructor or assignment function you make that takes std::string, make an identical one that takes const char*. You'll avoid a ton of useless temporaries and fragmentation this way, and it doesn't violate the "zomg const char* is so much harder to use than std::string" vibe that permeates these forums.

You really should care about fragmentation if you're going anywhere near a console or embedded platform, and it's good practice to avoid it anyway. If you pass a std::string by value you're guaranteed a fragment if it stores a copy anywhere inside the class, unless you use a pooled allocator. Even if you solve the fragment, creating extra temporaries has the potential to bloat your code, so take a couple seconds to think about how your class interface can be designed to avoid it and you'll find yourself using const char*.

Where all of these std::string zealots are (mostly) right is that you don't want to be manipulating fixed size char arrays if you have access to std::string... you want to be manipulating (ideally fixed size) std::strings. I'd try to use a std::string that uses alloca for allocations if I was doing anything in local functions and pass in a reserve on construction, again to avoid fragmentation.

I may sound bitter, but it's only because I've seen systems that were horribly bloated because of a stupid design choice to use a string class over const char*. Anyone who tells you that std::string is better than const char* in every situation is horribly mistaken.

---- EDIT ----

By the way, you should consider using hashes instead of strings (regardless of type) whenever possible. Some decent public domain code for this is:
http://burtleburtle.net/bob/c/crc.c

With hashes you can do fast comparisons and lookups that just aren't possible with strings. It also forces you into a bit better usage pattern, and cuts down on your memory usage and storage. It's easy to wrap the code above into a macro or inline function that makes working with the hashes a bit easier... in my home project I do this:
#define ConstID(a,b) a
#ifdef NDEBUG
inline hashID MakeID (hashID id, const char* pString) { return id; }
#else
extern hashID MakeID (hashID id, const char* pString);
#endif

In debug builds, MakeID will double check that the string and ID match. I have a macro in my editor that generates the hashes from a string for me very quickly and easily. In the example of filenames, I'd store a table of contents as a tree of hashed directory and file names in my pack file format, then convert all path names to hashes and reference them that way.
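For anyone wondering what the debug-build check might look like, here is a minimal sketch. It is not the code from my project: HashString is a made-up name, and it uses 32-bit FNV-1a as a stand-in for the CRC routine linked above, purely to keep the example short and self-contained.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t hashID;

// 32-bit FNV-1a; a stand-in for the CRC routine linked above (assumption).
inline hashID HashString(const char* s) {
    hashID h = 2166136261u;                  // FNV offset basis
    for (; *s; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 16777619u;                      // FNV prime
    }
    return h;
}

#ifdef NDEBUG
// Release: trust the precomputed ID, the string is never touched.
inline hashID MakeID(hashID id, const char*) { return id; }
#else
// Debug: verify that the precomputed ID really belongs to the string.
inline hashID MakeID(hashID id, const char* pString) {
    assert(id == HashString(pString) && "precomputed ID does not match string");
    return id;
}
#endif
```

Comparing two IDs is then a single integer comparison, at the cost of having to deal with possible hash collisions somewhere in the tool chain.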
C-style string handling really isn't that difficult once you get down to it. To be utterly honest with you, at work I mix and match C-style strings with stl strings all the time. Not because it's fun or exciting, but because it's necessary. Suggest you read up on the C string API.

STL strings preallocating a small buffer of a few bytes is not specified in the standard, and so you can't rely on that behaviour. Believe me when I say that developing a fast STL for the developers is the last thing on a console vendor's mind. Professional game developers do write their own string class, and it's written to use C style strings. Now it doesn't matter which compiler vendor you're using for which platform, your code will always be the same.

If you're worried about memory fragmentation then it's time to think about memory management algorithms, memory managers and custom allocators; not strings and virtual memory. Typical game development involves many small object allocations, and thus a memory management scheme which favours this behaviour is the best way to go (hint: memory pooling).
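As a rough illustration of the memory pooling hint, here is a minimal fixed-size block pool. FixedPool and its members are hypothetical names, and a real allocator would also handle growth, debugging aids and thread safety; this only shows the free-list idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity pool of equally sized blocks carved from one allocation.
// Free blocks are chained through their own first bytes (a free list).
class FixedPool {
public:
    FixedPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(RoundUp(blockSize)),
          storage_(blockSize_ * blockCount),
          freeList_(nullptr) {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < blockCount; ++i) {
            Free(&storage_[i * blockSize_]);
        }
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a block, or nullptr when the pool is exhausted.
    void* Allocate() {
        if (!freeList_) return nullptr;
        void* block = freeList_;
        freeList_ = *static_cast<void**>(block);
        return block;
    }

    // Returns a block to the pool; it becomes the next one handed out.
    void Free(void* block) {
        *static_cast<void**>(block) = freeList_;
        freeList_ = block;
    }

private:
    // Blocks must hold a pointer and keep every block maximally aligned.
    static std::size_t RoundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(void*)) n = sizeof(void*);
        return (n + a - 1) / a * a;
    }

    std::size_t blockSize_;
    std::vector<unsigned char> storage_;
    void* freeList_;
};
```

Because every block has the same size and comes from one up-front allocation, freeing and reallocating never leaves holes that other allocations could land in.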

The best thing you can do for practise is to get in on the homebrew development scene for some portable console you have. There's no point in being paranoid unless you know what problems you're facing; that's impractical and is going to make your life more difficult than it needs to be. The GP32X would probably be a good start if you don't already have something in mind, and should be relatively friendly compared to something like the PSP or NDS. The fact that out of the box it's open is also a plus.

Hope that helps. Good luck!
Quote:Original post by exwonder
Even if you solve the fragment, creating extra temporaries has the potential to bloat your code, so take a couple seconds to think about how your class interface can be designed to avoid it and you'll find yourself using const char*.

Or, you know, pass strings by reference. That way the problem with temporaries just... goes away. But of course, that'd be too easy, when we might otherwise have an excuse to shoot ourselves in the foot with lots and lots of pointers.

Quote:Professional game developers do write their own string class, and it's written to use C style strings.

Professional game developers do a lot of stupid things.
Some of the worst code I've seen has been from professional games.
There can be reasons to stick with C style strings (like you say, if your platform doesn't have access to a sane STL implementation), but saying "real game developers do it" isn't much of an argument.
Quote:Original post by Spoonbender
That way the problem with temporaries just... goes away.


It's still there.
void frobnicate(const std::string & work)
{
  std::ifstream input(work.c_str());
  // ...
}

frobnicate("Hello");
I guess a good approach in terms of performance-versus-work here would be to use a const char* argument if you can, and std::string if you must. Since you can always transparently convert a const char* argument to an std::string, because of the implicit constructor, the only difficulty of the approach is determining whether you need std::string functionality inside the function or not.

Besides, it's quite trivial to generate a string literal wrapper class to handle literals without creating temporaries, having pointer problems, or having ownership problems, and simply change "const literal &" to "const std::string &" when you need to work on something other than a literal.
class literal
{
  const char *value;
  literal(const literal &);
public:
  literal(const char *value) : value(value) { assert (value); }
  operator const char*() const { return this -> value; }
  operator std::string() const { return this -> value; }
};

void frobnicate(const literal & work)
{
  std::ifstream input(work); // 'work' is always a valid string.
}

void test(const std::string &str)
{
  frobnicate("Hello");     // No temporary value required.
  frobnicate(str.c_str()); // Lifetime of the literal is shorter than
                           // lifetime of str, and it is noncopyable
}
Quote:Original post by f8k8
I know most people will say you should always use std::string over const char *, but for things like filenames, would const char * be preferred? I'm thinking for cross-platform, especially consoles with limited memory, where creating std::strings would potentially create small allocations on the heap, just before files were loaded in, thus fragmenting memory.


Just make your own string class which shares a common memory pool. [smile]
while (tired) DrinkCoffee();
Quote:Original post by polymorphed
Quote:Original post by f8k8
I know most people will say you should always use std::string over const char *, but for things like filenames, would const char * be preferred? I'm thinking for cross-platform, especially consoles with limited memory, where creating std::strings would potentially create small allocations on the heap, just before files were loaded in, thus fragmenting memory.


Just make your own string class which shares a common memory pool. [smile]


No! std::string already does this itself as long as you supply a pool allocator (boost::pool_allocator for instance).
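To show the mechanism without pulling in Boost, here is a sketch that plugs a hand-rolled allocator into std::basic_string. Arena, ArenaAllocator and ArenaString are hypothetical names, and the bump arena is a deliberately simple stand-in for a real pool such as boost::pool_allocator: it never reuses memory, and it assumes the buffer it is given is suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical bump arena over a caller-supplied, suitably aligned buffer.
struct Arena {
    unsigned char* buffer;
    std::size_t capacity;
    std::size_t offset;
};

// Minimal C++11 allocator that hands out memory from an Arena.
template <class T>
struct ArenaAllocator {
    typedef T value_type;
    Arena* arena;

    explicit ArenaAllocator(Arena* a) : arena(a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

    T* allocate(std::size_t n) {
        const std::size_t align = alignof(T);
        const std::size_t start = (arena->offset + align - 1) / align * align;
        const std::size_t bytes = n * sizeof(T);
        if (start + bytes > arena->capacity) throw std::bad_alloc();
        arena->offset = start + bytes;
        return reinterpret_cast<T*>(arena->buffer + start);
    }
    // A bump arena releases everything at once, so this is a no-op.
    void deallocate(T*, std::size_t) {}
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena != b.arena;
}

// A std::string in everything but where its memory comes from.
typedef std::basic_string<char, std::char_traits<char>, ArenaAllocator<char> >
    ArenaString;
```

An ArenaString is constructed by passing the allocator explicitly, e.g. ArenaString name("some/long/path.png", ArenaAllocator<char>(&arena)). Note that it is a different type from std::string, so the two do not convert into each other implicitly.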

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Quote:Original post by swiftcoder
No! std::string already does this itself as long as you supply a pool allocator (boost::pool_allocator for instance).


But that will never do pooled (shared) strings, which is a common and frequent approach to dealing with them on console. (I've worked on at least three different console codebases that basically did the same thing.) Make a new string class that stores a pointer to a shared instance of string data. When you construct one of those (eg, from a const char*), see if it's in the pool already. If so, just point at the shared copy (and increment ref-count). Otherwise, add it.

Benefits:

- Lower total memory cost. You don't pay anything extra for having N copies of any given string being referenced throughout your code.
- Constant-time comparison. Because of the semantics of the shared pool, you can just do pointer-equality to compare strings for equality. This is very, very nice.

On the other hand, conversion to such a string from a const char* or other data is slow (usually a hash + table search), so you do have to be careful about usage. You want to avoid unnecessary conversions to/from the shared string type.
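A stripped-down sketch of the idea, under some loud assumptions: SharedString and Intern are invented names, the pool is a function-local std::unordered_set, entries are never removed (so the ref-counting mentioned above is left out), and nothing here is thread-safe.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Simplified shared (interned) string: each distinct text is stored once,
// and every SharedString just points at the pooled copy.
class SharedString {
public:
    explicit SharedString(const char* text) : data_(Intern(text)) {}

    // Constant-time comparison: equal text implies the same pooled object.
    bool operator==(const SharedString& other) const { return data_ == other.data_; }
    bool operator!=(const SharedString& other) const { return data_ != other.data_; }

    const char* c_str() const { return data_->c_str(); }

private:
    // The slow part: hash the text and look it up, inserting if missing.
    static const std::string* Intern(const char* text) {
        // Pointers to unordered_set elements stay valid across rehashing.
        static std::unordered_set<std::string> pool;
        return &*pool.insert(std::string(text)).first;
    }

    const std::string* data_;
};
```

Construction pays for the hash and lookup once; after that, copying a SharedString copies one pointer and comparing two of them compares one pointer.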

This topic is closed to new replies.
