const char * or std::string?

Started by
27 comments, last by ToohrVyk 16 years, 1 month ago
Quote:Original post by osmanb
But that will never do pooled (shared) strings, which is a common and frequent approach to dealing with them on console. (I've worked on at least three different console codebases that basically did the same thing.) Make a new string class that stores a pointer to a shared instance of string data. When you construct one of those (eg, from a const char*), see if it's in the pool already. If so, just point at the shared copy (and increment ref-count). Otherwise, add it.

Benefits:

- Lower total memory cost. You don't pay anything extra for having N copies of any given string being referenced throughout your code.
- Constant-time comparison. Because of the semantics of the shared pool, you can just do pointer-equality to compare strings for equality. This is very, very nice.
Actually, implementations are moving away from shared (ref-counted) strings. The problem is COW (copy on write): as multi-core CPUs become more common and code becomes more heavily multi-threaded, the thread synchronisation required to access these strings slows things down much more than simply copying the string outright would. Memory is vast and cheap, too.
I agree it's a great idea for a single-threaded console app, but in the PC world it is definitely fading out.
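For reference, the pooled-string scheme quoted above could be sketched roughly as follows. This is a minimal illustration, not anyone's production code: the name PooledString is hypothetical, the sketch assumes a single thread, and it leaves out the reference counting and eviction osmanb mentions, so pooled strings are never freed.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical sketch: single-threaded, no ref-counting, pool never shrinks.
class PooledString {
public:
    explicit PooledString(const char* text) : data_(intern(text)) {}

    const char* c_str() const { return data_->c_str(); }

    // Equal contents share one pooled instance, so comparing addresses
    // is enough to compare the strings.
    friend bool operator==(const PooledString& a, const PooledString& b) {
        return a.data_ == b.data_;
    }
    friend bool operator!=(const PooledString& a, const PooledString& b) {
        return !(a == b);
    }

private:
    static const std::string* intern(const char* text) {
        // unordered_set is node-based: element addresses stay valid
        // when the table rehashes.
        static std::unordered_set<std::string> pool;
        return &*pool.emplace(text).first;
    }

    const std::string* data_;
};
```

Construction still pays for a hash lookup (and a temporary std::string in this sketch); only the comparison is constant-time. Making the pool safe to use from several threads would need a lock or similar around intern(), which is the synchronisation cost the reply above is concerned about.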
"In order to understand recursion, you must first understand recursion."
Quote:Original post by ToohrVyk
Besides, it's quite trivial to generate a string literal wrapper class to handle literals without creating temporaries, having pointer problems, or having ownership problems, and simply change "const literal &" to "const std::string &" when you need to work on something other than a literal.
class literal
{
  const char *value;
  literal(const literal &);
public:
  literal(const char *value) : value(value) { assert (value); }
  operator const char*() const { return this -> value; }
  operator std::string() const { return this -> value; }
};

void frobnicate(const literal & work)
{
  std::ifstream input(work); // 'work' is always a valid string.
}

void test(const std::string &str)
{
  frobnicate("Hello");     // No temporary value required.
  frobnicate(str.c_str()); // Lifetime of the literal is shorter than
                           // lifetime of str, and it is noncopyable
}


I'm on the border between calling this really good advice and not understanding the point. Would you please summarize for me the benefit of using the literal class over const char*? The usage seems the same to me...
Quote:Original post by Spoonbender
Quote:Original post by exwonder
Even if you solve the fragmentation, creating extra temporaries has the potential to bloat your code, so take a couple of seconds to think about how your class interface can be designed to avoid it and you'll find yourself using const char*.

Or, you know, pass strings by reference. That way the problem with temporaries just... goes away. But of course, that'd be too easy, when we might otherwise have an excuse to shoot ourselves in the foot with lots and lots of pointers.


In the specific situation I had to deal with, IIRC you are right: using references made many of the temporaries go away, but that was because I was passing them through a large class hierarchy. Using pointers, especially const char*, isn't shooting yourself in the foot, though. That's just a silly idea.

I'm talking about situations like this:
class NeedsToStoreAString
{
  std::string blah;
public:
  //NeedsToStoreAString (std::string s) : blah(s) { } // passing by value is stupid!
  NeedsToStoreAString (const std::string& s) : blah (s) { } // leaps and bounds better if you're actually passing in std::strings.
  NeedsToStoreAString (const char* s) : blah(s) { } // lets you use literals without any problems.
};

Now you can do this:
NeedsToStoreAString one ("one"); // no temporary created.
NeedsToStoreAString two (std::string("two")); // this is what would happen without the extra constructor
std::string three ("three");
NeedsToStoreAString four (three); // works fine

How is adding a single extra constructor to the above shooting yourself in the foot in any way shape or form? I'm not disputing that std::string is superior for string manipulation. I'm disputing that it's useful for situations where you're passing literals.
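To make the temporary visible, here is a small sketch that counts constructions. CountedString is a hypothetical stand-in for any class that allocates in its constructor; it is not std::string, and the other names are made up for the illustration as well.

```cpp
#include <cassert>

// Hypothetical stand-in for an allocating string class; it only counts
// how many objects get constructed.
struct CountedString {
    static int constructed;
    CountedString(const char*) { ++constructed; }
    CountedString(const CountedString&) { ++constructed; }
};
int CountedString::constructed = 0;

// Only offers the const-reference constructor.
struct RefOnly {
    CountedString name;
    RefOnly(const CountedString& s) : name(s) {}
};

// Adds the extra const char* constructor discussed above.
struct WithLiteralOverload {
    CountedString name;
    WithLiteralOverload(const CountedString& s) : name(s) {}
    WithLiteralOverload(const char* s) : name(s) {}
};

int constructions_for_ref_only() {
    CountedString::constructed = 0;
    RefOnly r("one");  // temporary CountedString, then a copy into 'name'
    (void)r;
    return CountedString::constructed;
}

int constructions_with_overload() {
    CountedString::constructed = 0;
    WithLiteralOverload w("one");  // 'name' is built directly from the literal
    (void)w;
    return CountedString::constructed;
}
```

The first function reports two constructions and the second reports one. How much that matters with a real std::string depends on the implementation: most current ones keep short strings in an internal buffer, so a short literal may not touch the heap at all.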

I said this in another thread, but it'd be best if std::string and other string classes had explicit conversion constructors from const char*. Then you'd at least be forced to think about when you're making temporaries, at the cost of a little convenience. Also good would be placement-new style allocation, where you pass it a preallocated area of memory and its size, and it just uses that as its reserve (and cannot expand). I'd suggest keeping these two considerations in mind if you're writing a string class for a console or embedded target.
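A rough sketch of the second suggestion, a string that works inside caller-supplied memory and cannot grow, might look like this. FixedString and its members are hypothetical names, and the interface is deliberately tiny. It has no constructor taking a const char* at all, so nothing converts to it implicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class: writes into caller-owned storage and never allocates.
class FixedString {
public:
    // 'buffer' must outlive this object; 'capacity' must be at least 1
    // so the terminating NUL always fits.
    FixedString(char* buffer, std::size_t capacity)
        : data_(buffer), capacity_(capacity), length_(0) {
        assert(buffer && capacity > 0);
        data_[0] = '\0';
    }

    FixedString(const FixedString&) = delete;
    FixedString& operator=(const FixedString&) = delete;

    // Appends 'text' if it fits; otherwise leaves the contents unchanged.
    bool append(const char* text) {
        const std::size_t n = std::strlen(text);
        if (n > capacity_ - 1 - length_) return false;
        std::memcpy(data_ + length_, text, n + 1);  // copies the NUL too
        length_ += n;
        return true;
    }

    const char* c_str() const { return data_; }
    std::size_t size() const { return length_; }

private:
    char* data_;
    std::size_t capacity_;
    std::size_t length_;
};
```

Typical use would be a stack array, e.g. `char storage[64]; FixedString s(storage, sizeof storage);`, with the caller deciding what to do when append() returns false.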
Quote:Original post by exwonder
I'm on the border between calling this really good advice and not understanding the point. Would you please summarize for me the benefit of using the literal class over const char*? The usage seems the same to me...


There are several points:
  • It's a specific name used to represent a constant string represented as a C NUL-terminated character buffer, which always comes in handy when you need to look for all occurrences of that concept. Of course, this could also have been a typedef.

  • Unlike const char*, it cannot be null. You don't want to have to assert your string literals whenever you use them, so the constructor takes care of that.

  • It has no copy constructor, meaning that it will not allow storage of the literal beyond the lifetime which was granted to that literal by its creator. This comes in handy when you have code of the form:
    void assumes_string_literal(const char *c)
    {
      // store 'c' in a data structure somewhere.
    }

    void makes_no_assumptions(const char *c)
    {
      // Indirectly and opaquely calls 'assumes_string_literal(c)'
      // The assumptions made there fail to "climb" back up the
      // call tree and appear in this function's
      // signature.
    }

    void mistaken(std::string s)
    {
      // This call 'feels safe' because the called function
      // does not explicitly assume anything about the lifetime
      // of the pointer.
      makes_no_assumptions(s.c_str());

      // Here, the string dies and the value stored by
      // 'assumes_string_literal' becomes invalid. Program
      // dies a flashy death.
    }


    Then, const char* becomes a notification of "I need ownership" (which you may represent with another type that asserts in its constructor, for further ease of use) and as such (non-third-party) functions of the form foo(const char*) will only safely be called when the argument is a true character literal.
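The convention described in these points can be sketched as below, assuming C++11. The literal class is a trimmed variant of the one posted earlier (the std::string conversion is left out for brevity); length_of, register_name and g_registered_name are hypothetical names added for the illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

class literal {
    const char* value;
public:
    literal(const char* value) : value(value) { assert(value); }
    literal(const literal&) = delete;  // cannot be copied or stored by value
    literal& operator=(const literal&) = delete;
    operator const char*() const { return value; }
};

// Takes 'const literal&': promises to use the text only during the call.
std::size_t length_of(const literal& text) {
    return std::strlen(text);
}

// Takes 'const char*': by the convention above, this signals that the
// pointer is kept, so callers should pass only data that outlives it.
const char* g_registered_name = nullptr;  // hypothetical storage
void register_name(const char* name) {
    g_registered_name = name;
}

static_assert(!std::is_copy_constructible<literal>::value,
              "literal must not be copyable");
```

Note that this documents intent more than it enforces it: because literal converts implicitly to const char*, a function taking const literal& can still hand the pointer on to something like register_name. The non-copyable type only stops the wrapper itself from being stored.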
Quote:Original post by ToohrVyk
Quote:Original post by exwonder
I'm on the border between calling this really good advice and not understanding the point. Would you please summarize for me the benefit of using the literal class over const char*? The usage seems the same to me...


There are several points:

I'd be up in the air about points one and two, but point 3 is pure gold. Thanks for the tip. (Though in practice using *literals* this way usually won't fail, but there's always the worry that you can still pass in character arrays that will indeed fail.)
Quote:Original post by exwonder
Making constructors that only take std::string is very very stupid if you ever pass in string literals. For every constructor or assignment function you make that takes std::string, make an identical one that takes const char*. You'll avoid a ton of useless temporaries and fragmentation this way


Except when you discover that the std::string functionality actually would be useful. And even when it isn't, you still shouldn't need to write the function out twice: just delegate the std::string one to the const char* one, passing the .c_str().
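A minimal sketch of that delegation, assuming a C++11 compiler (delegating constructors are not available in C++03). Label and set_text are hypothetical names.

```cpp
#include <cassert>
#include <string>

class Label {
public:
    // Primary constructor: builds the stored string straight from the C string.
    explicit Label(const char* text) : text_(text) {}

    // Delegating constructor: forwards to the const char* overload.
    explicit Label(const std::string& text) : Label(text.c_str()) {}

    // The same pattern for an assignment-style function.
    void set_text(const char* text) { text_ = text; }
    void set_text(const std::string& text) { set_text(text.c_str()); }

    const std::string& text() const { return text_; }

private:
    std::string text_;
};
```

One trade-off of going through c_str(): the length has to be measured again, and a std::string containing an embedded NUL would be cut short at that point.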

Quote:and it doesn't violate the "zomg const char* is so much harder to use than std::string" vibe that permeates these forums.


It's not a "vibe"; it's a very well documented and logically-to-be-expected phenomenon.

Quote:
I may sound bitter, but it's only because I've seen systems that were horribly bloated because of a stupid design choice to use a string class over const char*.


I would be willing to bet that what you actually saw were systems that were horribly bloated because of stupid implementation choices to do far more string manipulation work than was actually necessary to solve the problem.

Quote:With hashes you can do fast comparisons and lookups that just aren't possible with strings.


String comparison has a habit of failing fast. It usually works out fast enough. Profile first, anyway. And if you are using const char*'s for tokens (immutable strings), then an "interning" system will let you just compare pointer addresses.

Quote:in my home project I do this:
#define ConstID(a,b) a
#ifdef NDEBUG
inline hashID MakeID (hashID id, const char* pString) { return id; }
#else
extern hashID MakeID (hashID id, const char* pString);
#endif

In debug builds, MakeID will double check that the string and ID match.


You're willing to risk hash collisions in release mode?!?
Quote:Original post by Zahlman
I would be willing to bet that what you actually saw were systems that were horribly bloated because of stupid implementation choices to do far more string manipulation work than was actually necessary to solve the problem.

No, string manipulation in any form isn't the issue here. The issue is temporaries and fragmentation, both of which are created when you pass *any class that allocates in its constructor* by value or through an implicit conversion to a function that stores a copy in the class (most often the constructor).

Using pooled allocators is a solution, but as a wise man once said, the best solution to dealing with garbage is to not create any. You can avoid creating garbage by providing constructors that take const char* if you choose to pass in string literals. It's simple and easy, and yes you can delegate the functionality to a common function if you want (except in the case of initializer lists).

Quote:You're willing to risk hash collisions in release mode?!?

Collisions aren't the issue that's being checked. It's checking to make sure no one changed the string inside the MakeID macro without changing the hashed value.

A hash system is really very useful for reasons other than reduced memory usage and comparison speeds. You shouldn't throw it out with a hand-wavy "profile first" before seeing how it can help you.
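The debug-build side of MakeID is not shown in the thread, so the following is only a guess at its shape. It assumes hashID is a 32-bit unsigned integer and uses 32-bit FNV-1a as a placeholder hash; HashString is a hypothetical helper, and the real project may use a different type and function.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t hashID;  // assumption: the original type is not shown

// Assumed hash (32-bit FNV-1a); the thread does not say which one is used.
constexpr hashID HashString(const char* s, hashID h = 2166136261u) {
    return *s ? HashString(s + 1,
                           (h ^ static_cast<unsigned char>(*s)) * 16777619u)
              : h;
}

#ifdef NDEBUG
// Release: trust the precomputed ID and ignore the string.
inline hashID MakeID(hashID id, const char*) { return id; }
#else
// Debug: catch a string that was edited without updating its ID.
inline hashID MakeID(hashID id, const char* pString) {
    assert(HashString(pString) == id);
    return id;
}
#endif
```

As the post says, this check guards against a stale ID next to an edited string. It does nothing about two different strings hashing to the same value, which would need a separate check.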
Quote:Original post by ToohrVyk
There are several points:
  • It's a specific name used to represent a constant string represented as a C NUL-terminated character buffer, which always comes in handy when you need to look for all occurrences of that concept. Of course, this could also have been a typedef.

  • Unlike const char*, it cannot be null. You don't want to have to assert your string literals whenever you use them, so the constructor takes care of that.

  • It has no copy constructor, meaning that it will not allow storage of the literal beyond the lifetime which was granted to that literal by its creator. This comes in handy when you have code of the form:*** Source Snippet Removed ***

    Then, const char* becomes a notification of "I need ownership" (which you may represent with another type that asserts in its constructor, for further ease of use) and as such (non-third-party) functions of the form foo(const char*) will only safely be called when the argument is a true character literal.


I still don't see how the Literal class can disallow storage of the literal beyond its lifetime. Here's an example:
#include <iostream>
#include <string>

class Literal {
public:
	Literal( const char* value ) : value(value) {}
	operator const char*() const { return this -> value; }
	operator std::string() const { return this -> value; }
private:
	Literal( const Literal& literal );
	const char* value;
};

struct Data {
	const char* data;
};

Data data;

void assumes_string_literal( const Literal& literal )
{
	// store 'c' in a data structure somewhere.
	data.data = literal;
}

void makes_no_assumptions( const Literal& literal )
{
	// Indirectly and opaquely calls 'assumes_string_literal(c)'
	// The assumptions made there fail to "climb" back up the
	// call tree and appear in this function's
	// signature.
	assumes_string_literal( literal );
}

void mistaken(std::string s)
{
	// This call 'feels safe' because the called function
	// does not explicitly assume anything about the lifetime
	// of the pointer.
	makes_no_assumptions(s.c_str());

	// Here, the string dies and the value stored by
	// 'assumes_string_literal' becomes invalid. Program
	// dies a flashy death.
}

int main () {
	using namespace std;
	string str( "SomeString" );
	mistaken( str );
	// prints corrupted data
	cout << data.data;
	return 0;
}

void assumes_string_literal( const Literal& literal )
{
	// store 'c' in a data structure somewhere.
	data.data = literal;
}


Your code assumes a string which it can store, yet its argument is of a class which cannot be stored. The argument should be a const char*.

This topic is closed to new replies.
