f8k8

const char * or std::string?


f8k8    171
I know most people will say you should always use std::string over const char *, but for things like filenames, would const char * be preferred? I'm thinking for cross-platform, especially consoles with limited memory, where creating std::strings would potentially create small allocations on the heap, just before files were loaded in, thus fragmenting memory.

osmanb    2082
Even on console, you still want to use things that make your life easier (and make coding simpler). As for fragmentation, you don't want one giant heap (that's always going to end badly), so you have multiple pools -- so allocations are sorted either by lifetime/function or by size. If you do that, then your temporary string allocations for things like filenames won't be in the same pool as file data anyways, so it doesn't really matter.

Simian Man    1022
Don't make coding harder on yourself just because you *may* want to target some platform where std::string *may* become a problem. You should only worry about this level of optimization when you're sure it's a problem.

Anyways, I believe most std::string implementations actually keep a small char array around to avoid dynamic memory allocation for very small strings. So it may be a total non factor.

Iftah    413
osmanb:
can you give pointers on using different heaps? (no pun intended)

suppose I have a code section which uses a few thousands new/delete pairs and I don't want to fragment the heap, how can I tell it to use an alternative heap?

temporarily overload the new/delete operators?

Antheus    2409
Quote:
Original post by f8k8

where creating std::strings would potentially create small allocations on the heap, just before files were loaded in, thus fragmenting memory.


How long will these strings live?

If they are created and destroyed, it's unlikely they'll contribute much to fragmentation, especially if we're talking hundreds or thousands of such strings per application life-time.

Quote:
but for things like filenames


Ok, so you're traversing the disk 10 times a second, allocating tens of thousands of filenames..... Why? And if not, how many allocations/de-allocations are you making per second?


My experience with files has been that whenever you try to manipulate the filenames (concat, for example), char strings are an absolute nightmare.

Consider this - you have your path string, to which you need to append '/', then filename, then extension. How will you handle this?

You can obviously create a target buffer on the stack (char[8192]; // just to be safe). Next, you need to juggle strcat and similar mess, relying on strlen. One invalid string, and everything enters the realm of undefined behavior. And in the end, you need to create a properly sized buffer on the heap, and pass that on.

So, in the end, you are doing exactly the same thing std::string does, except that you need to do the same thing *for every string manipulation*. And since releasing memory this way is unreliable, you'll write a wrapper class. And presto - you re-invented std::string.
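
To make the comparison above concrete, here is a minimal sketch of the same path-building job done both ways. The function names `join_path_c` and `join_path` are made up for this illustration; the C version assumes the caller supplies the buffer and checks the return value for truncation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// C-string version: the caller owns the buffer and must check for truncation.
// Returns false if the result did not fit in `out` (capacity `cap`).
bool join_path_c(char* out, std::size_t cap,
                 const char* dir, const char* name, const char* ext)
{
    const int written = std::snprintf(out, cap, "%s/%s%s", dir, name, ext);
    return written >= 0 && static_cast<std::size_t>(written) < cap;
}

// std::string version: sizing and cleanup are handled by the class.
std::string join_path(const std::string& dir, const std::string& name,
                      const std::string& ext)
{
    std::string result;
    // Reserve once so the appends below never need to regrow the buffer.
    result.reserve(dir.size() + 1 + name.size() + ext.size());
    result += dir;
    result += '/';
    result += name;
    result += ext;
    return result;
}
```

Every caller of the C version has to pick a buffer size and handle the failure case; the std::string version has no failure case short of running out of memory.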

-----

Alternate answer - if you need to ask this question, you're not ready to deal with such a low-level approach. If you knew the actual allocation issues, you wouldn't need to ask this, you'd evaluate allocation patterns, then write a std::allocator for your strings, and use that.

Quote:
especially consoles with limited memory


They'll almost certainly be using pre-allocated memory blocks, so the question of how you access them becomes moot.

Quote:
suppose I have a code section which uses a few thousands new/delete pairs and I don't want to fragment the heap, how can I tell it to use an alternative heap?


For std::containers, there's std::allocator. For other purposes, see how templated or type based allocators work.

First way to improve performance is not to allocate anything. Only when you're absolutely positively sure that you cannot do what you want on stack, use pre-allocated storage.

Why do you have thousands of heap allocations? I manage to get by with dozens per application.

See Placement new
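
As a rough illustration of placement new over pre-allocated storage, the sketch below constructs objects inside a fixed buffer owned by the container itself, so no per-object heap call is made. `FixedStore` and `Particle` are hypothetical names invented for this example, and it assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Fixed-capacity storage that lives wherever its owner lives (stack, static
// data, or one big block grabbed at startup).
template <typename T, std::size_t Capacity>
class FixedStore {
public:
    FixedStore() : count_(0) {}
    ~FixedStore() { clear(); }
    FixedStore(const FixedStore&) = delete;
    FixedStore& operator=(const FixedStore&) = delete;

    // Constructs a T in the next free slot; returns nullptr when full.
    template <typename... Args>
    T* emplace(Args&&... args) {
        if (count_ == Capacity) return nullptr;
        // Placement new: runs the constructor in existing memory, allocates nothing.
        T* obj = new (slot(count_)) T(std::forward<Args>(args)...);
        ++count_;
        return obj;
    }

    // Destroys all objects in reverse order; the storage itself is reused.
    void clear() {
        while (count_ > 0) {
            --count_;
            static_cast<T*>(slot(count_))->~T();  // explicit destructor call
        }
    }

    std::size_t size() const { return count_; }

private:
    void* slot(std::size_t i) { return storage_ + i * sizeof(T); }

    alignas(T) unsigned char storage_[Capacity * sizeof(T)];
    std::size_t count_;
};

// Hypothetical payload type used only to exercise the store.
struct Particle {
    float x, y;
    Particle(float px, float py) : x(px), y(py) {}
};
```

The price is a hard capacity limit and the obligation to call destructors by hand, which is why this usually ends up wrapped in a class like the one above.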

SiCrane    11839
Quote:
Original post by Iftah
osmanb:
can you give pointers on using different heaps? (no pun intended)

suppose I have a code section which uses a few thousands new/delete pairs and I don't want to fragment the heap, how can I tell it to use an alternative heap?

temporarily overload the new/delete operators?


For the C++ standard library, std::basic_string has three template arguments. You can use the third to specify an allocator. For example, you can use typedef std::basic_string<char, std::char_traits<char>, boost::pool_allocator<char> > MyPoolAllocatedString.
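
For readers without Boost, the same third-argument mechanism can be sketched with a hand-rolled allocator. Everything named here (`string_arena`, `ArenaAllocator`, `ArenaString`, the 64 KiB size) is an assumption made up for illustration; it is a simple linear arena rather than a real pool like boost::pool_allocator, and it assumes a C++11 standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <string>

// Hypothetical fixed arena: 64 KiB of static storage handed out linearly.
namespace string_arena {
    const std::size_t kSize = 64 * 1024;
    alignas(std::max_align_t) static unsigned char buffer[kSize];
    static std::size_t used = 0;

    inline void* take(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > kSize - used) throw std::bad_alloc();
        void* p = buffer + used;
        used += rounded;
        return p;
    }

    // True when p points into the arena buffer.
    inline bool owns(const void* p) {
        const unsigned char* c = static_cast<const unsigned char*>(p);
        std::less<const unsigned char*> before;
        return !before(c, buffer) && before(c, buffer + kSize);
    }

    inline void reset() { used = 0; }  // only valid once no arena strings remain
}

// Minimal C++11 allocator; std::allocator_traits supplies the rest.
template <typename T>
struct ArenaAllocator {
    typedef T value_type;
    ArenaAllocator() {}
    template <typename U> ArenaAllocator(const ArenaAllocator<U>&) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(string_arena::take(n * sizeof(T)));
    }
    void deallocate(T*, std::size_t) {}  // memory is released in bulk by reset()
};
template <typename T, typename U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

// Same interface as std::string, different memory source.
typedef std::basic_string<char, std::char_traits<char>, ArenaAllocator<char> > ArenaString;
```

Note that `ArenaString` and `std::string` are distinct types, so passing one where the other is expected needs a conversion through `c_str()`.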

Ashkan    451
I'd say this isn't worth the extra effort. You can use strings with a custom allocator such as boost::pool_allocator and boost::fast_pool_allocator.

BrianL    530
Depends entirely on your needs.

If you primarily need temporary strings, you could make a simple TString<MAX_PATH> class that provides a std::string-style interface. With a template parameter specifying the internal array size, you wouldn't need to worry about heap allocations. A std::string-like interface would make it easy to change between implementations.
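
A fixed-capacity string along those lines might look roughly like this. The class name `FixedString`, the truncate-and-flag overflow policy, and the small subset of members are all assumptions chosen for the sketch, not a description of any existing TString class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fixed-capacity string: N characters of inline storage plus a terminator.
// Appends that would overflow are truncated and flagged.
template <std::size_t N>
class FixedString {
public:
    FixedString() : size_(0), truncated_(false) { data_[0] = '\0'; }
    FixedString(const char* text) : size_(0), truncated_(false) {
        data_[0] = '\0';
        append(text);
    }

    FixedString& append(const char* text) {
        const std::size_t len = std::strlen(text);
        const std::size_t room = N - size_;
        const std::size_t n = len < room ? len : room;
        if (n < len) truncated_ = true;
        std::memcpy(data_ + size_, text, n);
        size_ += n;
        data_[size_] = '\0';
        return *this;
    }
    FixedString& operator+=(const char* text) { return append(text); }

    // Same names as the std::string members they mirror.
    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

    // True if any append had to drop characters.
    bool truncated() const { return truncated_; }

private:
    char data_[N + 1];
    std::size_t size_;
    bool truncated_;
};
```

Because the member names match std::string, code written against `c_str()`, `size()` and `+=` can switch between the two with a typedef.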

The two extremes -

You could try to do away with strings entirely and switch over to an ID based approach (if you plan on having a huge number of identifiers, etc), no dynamic content, etc.

You could just go with std::strings and procrastinate on the issue until you hit problems.

Both are totally viable and depend on the needs of your project and what tradeoffs you need to make.

dmatter    4821
Just use std::string, there's no reason not to, and if at a later date you do find that the string allocations are causing you problems (not that you're likely to) then just plug in a smarter allocator.
Writing your own string class, or manually managing char* arrays is going to cause you more headaches in both the short and long term; not to mention the probable loss in efficiency of doing it yourself.

Cygon    1219
On top of that, most std::string implementations have a small buffer (12-16 characters) as a fixed array in the class. That means that for short strings, no heap allocations will take place at all while a char * approach would not only harass you with memory management, but actually _reduce_ performance.
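
Whether a given standard library actually does this can be probed rather than assumed. The helper below (`stored_inline` is a made-up name) checks whether the character data sits inside the std::string object itself; the standard does not promise any particular result for short strings, so treat it as a diagnostic sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>

// True when the character data lives inside the std::string object itself,
// i.e. the small-buffer case where no separate heap block is needed.
bool stored_inline(const std::string& s)
{
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* data = s.data();
    std::less<const char*> before;  // gives a total order even for unrelated pointers
    return !before(data, begin) && before(data, end);
}
```

Calling it on a short filename and on a long path shows where the cut-off lies for the implementation in use; a string longer than `sizeof(std::string)` can never be stored inline.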

-Markus-

ViperG    206
So memory fragmentation is something that I don't know a lot about.

For example, you can have platforms that have virtual memory and others that don't, and memory fragmentation behaves differently.

Also when a vector re-allocates its storage, in most cases it actually moves to a new spot in memory? (the entire vector) leaving the old spot not usable...

So in that case, it's really important to call reserve and attempt to guess at the biggest size your vector should be. This is really important for programs or games that have a vector that is changing size all the time (push_back, and swap deleting only, otherwise use a list)

I'm assuming list only allocates memory per item in the list, so there should be little to no fragmentation here, it should also re-use the memory when you're removing and re-adding items..? Vector will also, that is until you push back into the vector and it has to re-allocate to support the new size.

iMalc    2466
It would help if you gave an example. When it comes to function parameters, it is perfectly fine to take a const char*, if all you do for example is pass it through to the API that takes the filename. e.g.

void OpenFileAndDoStuff(const char *filename)
{
    DoStuff(Open(filename)); // where Open takes a const char*
}
By resorting to the lowest level type, you increase flexibility as you can now use the function with a raw string literal, as well as a std::string (via c_str()), without having to go to the trouble of constructing any temporaries. IMO this is the best thing to do in the above situation.
However if you were to manipulate the filename within this function, such as to append a file extension, then it might be a good idea to have the interface take a std::string anyway.

Antheus    2409
Quote:
Original post by ViperG

For example, you can have platforms that have virtual memory and others that don't, and memory fragmentation behaves differently.


Virtual memory has nothing to do with it. Allocation strategy determines fragmentation.

Quote:
Also when a vector re-allocates its storage, in most cases it actually moves to a new spot in memory? (the entire vector) leaving the old spot not usable...


Nope. std::allocator calls new to claim memory, and delete to release it. Memory released this way becomes available again.

Quote:
I'm assuming list only allocates memory per item in the list, so there should be little to no fragmentation here, it should also re-use the memory when you're removing and re-adding items..? Vector will also, that is until you push back into the vector and it has to re-allocate to support the new size.


List is absolutely the worst container with regard to fragmentation. A custom allocator will improve performance some 50 times, if it allocates the nodes in some pre-allocated storage.

Quote:
Vector will also, that is until you push back into the vector and it has to re-allocate to support the new size.


Vector will allocate more elements than it currently needs, and will rarely reduce storage size. This means that it will allocate and release rarely.

List on the other hand will be making allocations on every addition, and release on every erase. These will all be small allocations, which are horrible.

Consider list<int>. On every addition, it needs to allocate a new node (the int plus its link pointers). Each time an element is removed, that node is deleted. After frequent updates, the list will leave small holes scattered across memory.

Vector on the other hand will eventually allocate a single array large enough and in single block. Even more, adding or removing an element requires no allocations at all.

This also has a beneficial effect on locality. Data in a vector is contiguous. Traversing it gains the benefits of a high cache hit rate. A list is scattered and non-contiguous. Traversing a list will be considerably slower: even if the traversal looks the same, most of the time you'll be getting cache misses, which will kill performance.
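
The difference in allocation counts is easy to measure with a counting allocator. In this sketch `CountingAllocator`, `g_allocation_count`, `vector_allocations` and `list_allocations` are names invented for the example; the exact counts depend on the standard library (some debug builds make an extra bookkeeping allocation), so only the relative difference should be relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <vector>

static std::size_t g_allocation_count = 0;  // shared by every CountingAllocator<T>

// Minimal C++11 allocator that forwards to operator new and counts the calls.
template <typename T>
struct CountingAllocator {
    typedef T value_type;
    CountingAllocator() {}
    template <typename U> CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocation_count;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Allocator calls made while filling a vector that reserved its storage up front.
std::size_t vector_allocations(int count) {
    g_allocation_count = 0;
    std::vector<int, CountingAllocator<int> > v;
    v.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) v.push_back(i);
    return g_allocation_count;
}

// Allocator calls made while filling a list: at least one per element.
std::size_t list_allocations(int count) {
    g_allocation_count = 0;
    std::list<int, CountingAllocator<int> > l;
    for (int i = 0; i < count; ++i) l.push_back(i);
    return g_allocation_count;
}
```

Swapping `CountingAllocator` for one that hands out nodes from pre-allocated storage is the custom-allocator route suggested above for lists.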

exwonder    100
Making constructors that only take std::string is very very stupid if you ever pass in string literals. For every constructor or assignment function you make that takes std::string, make an identical one that takes const char*. You'll avoid a ton of useless temporaries and fragmentation this way, and it doesn't violate the "zomg const char* is so much harder to use than std::string" vibe that permeates these forums.

You really should care about fragmentation if you're going anywhere near a console or embedded platform, and it's good practice to avoid it anyway. If you pass a std::string by value you're guaranteed a fragment if it stores a copy anywhere inside the class, unless you use a pooled allocator. Even if you solve the fragment, creating extra temporaries has the potential to bloat your code, so take a couple seconds to think about how your class interface can be designed to avoid it and you'll find yourself using const char*.

Where all of these std::string zealots are (mostly) right is that you don't want to be manipulating fixed size char arrays if you have access to std::string... you want to be manipulating (ideally fixed size) std::strings. I'd try to use a std::string that uses alloca for allocations if I was doing anything in local functions and pass in a reserve on construction, again to avoid fragmentation.

I may sound bitter, but it's only because I've seen systems that were horribly bloated because of a stupid design choice to use a string class over const char*. Anyone who tells you that std::string is better than const char* in every situation is horribly mistaken.

---- EDIT ----

By the way, you should consider using hashes instead of strings (regardless of type) whenever possible. Some decent public domain code for this is:
http://burtleburtle.net/bob/c/crc.c

With hashes you can do fast comparisons and lookups that just aren't possible with strings. It also forces you into a bit better usage pattern, and cuts down on your memory usage and storage. It's easy to wrap the code above into a macro or inline function that makes working with the hashes a bit easier... in my home project I do this:
#define ConstID(a,b) a

#ifdef NDEBUG
inline hashID MakeID (hashID id, const char* pString) { return id; }
#else
extern hashID MakeID (hashID id, const char* pString);
#endif

In debug builds, MakeID will double check that the string and ID match. I have a macro in my editor that generates the hashes from a string for me very quickly and easily. In the example of filenames, I'd store a table of contents as a tree of hashed directory and file names in my pack file format, then convert all path names to hashes and reference them that way.
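
A self-contained sketch of the debug-checked ID idea follows. It swaps in 32-bit FNV-1a instead of the CRC code linked above, purely to keep the example short, and it assumes `hashID` is a 32-bit unsigned type; `HashString` is a name made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t hashID;  // assumed width; the original typedef is not shown

// 32-bit FNV-1a over a NUL-terminated string (stand-in for the linked CRC).
inline hashID HashString(const char* text)
{
    hashID hash = 2166136261u;             // FNV offset basis
    for (; *text != '\0'; ++text) {
        hash ^= static_cast<unsigned char>(*text);
        hash *= 16777619u;                 // FNV prime
    }
    return hash;
}

// Returns the precomputed id. In debug builds it also verifies that the id
// really is the hash of the string; with NDEBUG the check compiles away.
inline hashID MakeID(hashID id, const char* text)
{
    assert(HashString(text) == id && "precomputed id does not match string");
    (void)text;  // unused when NDEBUG is defined
    return id;
}
```

In release builds the string argument costs nothing at run time, while debug builds catch an ID that was pasted next to the wrong string.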

TheGilb    372
C-style string handling really isn't that difficult once you get down to it. To be utterly honest with you, at work I mix and match C-style strings with stl strings all the time. Not because it's fun or exciting, but because it's necessary. Suggest you read up on the C string API.

STL strings preallocating a small buffer of a few bytes is not specified in the standard, and so you can't rely on that behaviour. Believe me when I say that developing a fast STL for developers is the last thing on a console vendor's mind. Professional game developers do write their own string class, and it's written to use C style strings. Now it doesn't matter which compiler vendor you're using for which platform, your code will always be the same.

If you're worried about memory fragmentation then it's time to think about memory management algorithms, memory managers and custom allocators; not strings and virtual memory. Typical game development involves many small object allocations, and thus a memory management scheme which favours this behaviour is the best way to go (hint: memory pooling).

The best thing you can do for practice is to get in on the homebrew development scene for some portable console you have. There's no point in being paranoid unless you know what problems you're facing; that's impractical and is going to make your life more difficult than it needs to be. The GP32X would probably be a good start if you don't already have something in mind, and should be relatively friendly compared to something like the PSP or NDS. The fact that out of the box it's open is also a plus.

Hope that helps. Good luck!

Spoonbender    1258
Quote:
Original post by exwonder
Even if you solve the fragment, creating extra temporaries has the potential to bloat your code, so take a couple seconds to think about how your class interface can be designed to avoid it and you'll find yourself using const char*.

Or, you know, pass strings by reference. That way the problem with temporaries just... goes away. But of course, that'd be too easy, when we might otherwise have an excuse to shoot ourselves in the foot with lots and lots of pointers.

Quote:
Professional game developers do write their own string class, and it's written to use C style strings.

Professional game developers do a lot of stupid things.
Some of the worst code I've seen has been from professional games.
There can be reasons to stick with C style strings (like you say, if your platform doesn't have access to a sane STL implementation), but saying "real game developers do it" isn't much of an argument.

ToohrVyk    1595
Quote:
Original post by Spoonbender
That way the problem with temporaries just... goes away.


It's still there.
void frobnicate(const std::string & work)
{
    std::ifstream input(work.c_str());
    // ...
}

frobnicate("Hello"); // a temporary std::string is built from the literal here
I guess a good approach in terms of performance-versus-work here would be to use a const char* argument if you can, and std::string if you must. Since you can always transparently convert a const char* argument to an std::string, because of the implicit constructor, the only difficulty of the approach is determining whether you need std::string functionality inside the function or not.

Besides, it's quite trivial to generate a string literal wrapper class to handle literals without creating temporaries, having pointer problems, or having ownership problems, and simply change "const literal &" to "const std::string &" when you need to work on something other than a literal.
class literal
{
    const char *value;
    literal(const literal &);
public:
    literal(const char *value) : value(value) { assert (value); }
    operator const char*() const { return this -> value; }
    operator std::string() const { return this -> value; }
};

void frobnicate(const literal & work)
{
    std::ifstream input(work); // 'work' is always a valid string.
}

void test(const std::string &str)
{
    frobnicate("Hello"); // No temporary value required.

    frobnicate(str.c_str()); // Lifetime of the literal is shorter than
                             // lifetime of str, and it is noncopyable
}

polymorphed    272
Quote:
Original post by f8k8
I know most people will say you should always use std::string over const char *, but for things like filenames, would const char * be preferred? I'm thinking for cross-platform, especially consoles with limited memory, where creating std::strings would potentially create small allocations on the heap, just before files were loaded in, thus fragmenting memory.


Just make your own string class which shares a common memory pool. [smile]

swiftcoder    18426
Quote:
Original post by polymorphed
Quote:
Original post by f8k8
I know most people will say you should always use std::string over const char *, but for things like filenames, would const char * be preferred? I'm thinking for cross-platform, especially consoles with limited memory, where creating std::strings would potentially create small allocations on the heap, just before files were loaded in, thus fragmenting memory.


Just make your own string class which shares a common memory pool. [smile]


No! std::string already does this itself as long as you supply a pool allocator (boost::pool_allocator for instance).

osmanb    2082
Quote:
Original post by swiftcoder
No! std::string already does this itself as long as you supply a pool allocator (boost::pool_allocator for instance).


But that will never do pooled (shared) strings, which is a common and frequent approach to dealing with them on console. (I've worked on at least three different console codebases that basically did the same thing.) Make a new string class that stores a pointer to a shared instance of string data. When you construct one of those (eg, from a const char*), see if it's in the pool already. If so, just point at the shared copy (and increment ref-count). Otherwise, add it.

Benefits:

- Lower total memory cost. You don't pay anything extra for having N copies of any given string being referenced throughout your code.
- Constant-time comparison. Because of the semantics of the shared pool, you can just do pointer-equality to compare strings for equality. This is very, very nice.

On the other hand, conversion to such a string from a const char* or other data is slow (usually a hash + table search), so you do have to be careful about usage. You want to avoid unnecessary conversions to/from the shared string type.
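
A stripped-down sketch of such a shared string type is shown below. `InternedString` is a hypothetical name, the pool is a plain std::unordered_set, and reference counting is left out entirely (strings are never removed from the pool), so it only illustrates the lookup-on-construction and pointer-equality parts. It assumes C++11.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

class InternedString {
public:
    // The slow part: hash the text and look it up (or add it) in the pool.
    explicit InternedString(const char* text)
        : shared_(&*pool().insert(std::string(text)).first) {}

    const char* c_str() const { return shared_->c_str(); }

    // The fast part: equal text always shares one pool entry, so comparing
    // pointers is enough.
    friend bool operator==(const InternedString& a, const InternedString& b) {
        return a.shared_ == b.shared_;
    }
    friend bool operator!=(const InternedString& a, const InternedString& b) {
        return a.shared_ != b.shared_;
    }

private:
    // Pointers to unordered_set elements stay valid when the table rehashes.
    static std::unordered_set<std::string>& pool() {
        static std::unordered_set<std::string> instance;
        return instance;
    }

    const std::string* shared_;
};
```

Copying an `InternedString` copies one pointer, which is where the memory saving for N references to the same text comes from.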

iMalc    2466
Quote:
Original post by osmanb
But that will never do pooled (shared) strings, which is a common and frequent approach to dealing with them on console. (I've worked on at least three different console codebases that basically did the same thing.) Make a new string class that stores a pointer to a shared instance of string data. When you construct one of those (eg, from a const char*), see if it's in the pool already. If so, just point at the shared copy (and increment ref-count). Otherwise, add it.

Benefits:

- Lower total memory cost. You don't pay anything extra for having N copies of any given string being referenced throughout your code.
- Constant-time comparison. Because of the semantics of the shared pool, you can just do pointer-equality to compare strings for equality. This is very, very nice.

Actually, implementations are tending away from shared (ref-counted) strings. The problem is COW (copy-on-write): as multi-core CPUs become more common and multi-threading grows with them, the thread synchronisation required to access these strings slows things down much more than simply copying the string outright in the first place. Memory is vast and cheap too.
I agree it's a great idea for a single-threaded console app, but in the PC world it is definitely fading out.

exwonder    100
Quote:
Original post by ToohrVyk
Besides, it's quite trivial to generate a string literal wrapper class to handle literals without creating temporaries, having pointer problems, or having ownership problems, and simply change "const literal &" to "const std::string &" when you need to work on something other than a literal.
class literal
{
    const char *value;
    literal(const literal &);
public:
    literal(const char *value) : value(value) { assert (value); }
    operator const char*() const { return this -> value; }
    operator std::string() const { return this -> value; }
};

void frobnicate(const literal & work)
{
    std::ifstream input(work); // 'work' is always a valid string.
}

void test(const std::string &str)
{
    frobnicate("Hello"); // No temporary value required.

    frobnicate(str.c_str()); // Lifetime of the literal is shorter than
                             // lifetime of str, and it is noncopyable
}


I'm on the border between calling this really good advice and not understanding the point. Would you please summarize for me the benefit of using the literal class over const char*? The usage seems the same to me...

exwonder    100
Quote:
Original post by Spoonbender
Quote:
Original post by exwonder
Even if you solve the fragment, creating extra temporaries has the potential to bloat your code, so take a couple seconds to think about how your class interface can be designed to avoid it and you'll find yourself using const char*.

Or, you know, pass strings by reference. That way the problem with temporaries just... goes away. But of course, that'd be too easy, when we might otherwise have an excuse to shoot ourselves in the foot with lots and lots of pointers.


In the specific situation I had to deal with, IIRC you are right, using references made many of the temporaries go away, but this is because I was passing them through a large class hierarchy. Using pointers, especially const char* isn't shooting yourself in the foot, though. That's just a silly idea.

I'm talking about situations like this:
class NeedsToStoreAString
{
    std::string blah;
public:
    //NeedsToStoreAString (std::string s) : blah(s) { } // passing by value is stupid!
    NeedsToStoreAString (const std::string& s) : blah (s) { } // leaps and bounds better if you're actually passing in std::strings.
    NeedsToStoreAString (const char* s) : blah(s) { } // lets you use literals without any problems.
};

Now you can do this:
NeedsToStoreAString one ("one"); // no temporary created.
NeedsToStoreAString two (std::string("two")); // this is what would happen without the extra constructor
std::string three ("three");
NeedsToStoreAString four (three); // works fine (needs its own name; "NeedsToStoreAString (three);" would redeclare 'three')

How is adding a single extra constructor to the above shooting yourself in the foot in any way shape or form? I'm not disputing that std::string is superior for string manipulation. I'm disputing that it's useful for situations where you're passing literals.

I said this in another thread, but it'd be best if std::string and other string classes had explicit conversion constructors from const char*. Then you'd at least be forced to think about when you're making temporaries, at the cost of a little convenience. Also good would be placement-new style allocation, where you pass it a preallocated area of memory and size and it just uses that as its reserve (and cannot expand). I'd suggest taking these two considerations in mind if you're writing a string class for a console or embedded target.

ToohrVyk    1595
Quote:
Original post by exwonder
I'm on the border between calling this really good advice and not understanding the point. Would you please summarize for me the benefit of using the literal class over const char*? The usage seems the same to me...


There are several points:
  • It's a specific name used to represent a constant string represented as a C NUL-terminated character buffer, which always comes in handy when you need to look for all occurrences of that concept. Of course, this could also have been a typedef.

  • Unlike const char*, it cannot be null. You don't want to have to assert your string literals whenever you use them, so the constructor takes care of that.

  • It has no copy constructor, meaning that it will not allow storage of the literal beyond the lifetime which was granted to that literal by its creator. This comes in handy when you have code of the form:
    void assumes_string_literal(const char *c)
    {
    // store 'c' in a data structure somewhere.
    }

    void makes_no_assumptions(const char *c)
    {
    // Indirectly and opaquely calls 'assumes_string_literal(c)'
    // The assumptions made there fail to "climb" back up the
    // call tree and appear in this function's
    // signature.
    }

    void mistaken(std::string s)
    {
    // This call 'feels safe' because the called function
    // does not explicitly assume anything about the lifetime
    // of the pointer.
    makes_no_assumptions(s.c_str());

    // Here, the string dies and the value stored by
    // 'assumes_string_literal' becomes invalid. Program
    // dies a flashy death.
    }



    Then, const char* becomes a notification of "I need ownership" (which you may represent with another type that asserts in its constructor, for further ease of use) and as such (non-third-party) functions of the form foo(const char*) will only safely be called when the argument is a true character literal.

exwonder    100
Quote:
Original post by ToohrVyk
Quote:
Original post by exwonder
I'm on the border between calling this really good advice and not understanding the point. Would you please summarize for me the benefit of using the literal class over const char*? The usage seems the same to me...


There are several points:

I'd be up in the air about points one and two, but point 3 is pure gold. Thanks for the tip. (Though in practice using *literals* this way usually won't fail, but there's always the worry that you can still pass in character arrays that will indeed fail.)
