Antonym

String vs char arrays

Concerning C++: which one should I use? Which one is better? For example, if I want to store a character's name or check user input, should I store them in a char array or a string? (Right now those are the only two things I am concerned about with char arrays and strings, but others might come up.) Is one better for certain things than the other, and vice versa? Could someone point out the best example? Thanks.

What platform? If it's not a limited one (like the GBA, or something like that), stick with strings. They're easier to handle and less of a hassle. Then again, if you are passing your strings to a lot of libraries (and they require char arrays), consider using char*.

The rule of thumb is to use std::string whenever possible. It's part of "good" C++. Checking user input (or any other form of text) is going to be a lot easier if it's a std::string rather than a char array. Example:
// With std::string:
std::string name = getSomeName();
if (name == "Bob")
{
    doSomething();
}

// With a C string (strcmp comes from <cstring>):
char* name = getSomeName();
if (strcmp(name, "Bob") == 0)
{
    doSomething();
}


You tell me which of the two is cleaner (if you say the second one you're crazy [smile]).

Windows XP would be the platform, I think. So in this case I should stick to strings? I think I know what you mean: I've been working with DirectX and it always uses char*. That's something I've wondered about, though: why do people use char* instead of char[]? I mean, I sort of know how it works, but why is the former method preferred?

Quote:
Original post by skullfire
What platform? If it's not a limited one (like the GBA, or something like that), stick with strings. They're easier to handle and less of a hassle. Then again, if you are passing your strings to a lot of libraries (and they require char arrays), consider using char*.


Have you been introduced to std::string::c_str()?

@ Antonym

Use std::string unless you can give an excellent reason not to. The alternative is lots of manual memory management, or fixed size strings.

Quote:

why do people use char* instead of char[]? I mean, I sort of know how it works, but why is the former method preferred?


The two are quite different. char [] is used inside a function or a type to create a fixed-size array.

char * is used to point to a dynamically sized array or to an array that must outlive the function that created it, and it is also what a function receives when you pass it any array. You cannot actually pass an array to a function by value; you can only pass it by reference or let it decay to a pointer.
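
To make the decay rule concrete, here is a minimal sketch. The two function names are made up for illustration; the only point is what each function can still learn about the caller's array.

```cpp
#include <cassert>
#include <cstddef>

// The caller's array has decayed to a pointer, so sizeof measures the
// pointer itself and the array length is lost.
std::size_t size_seen_through_pointer(const char* text) {
    return sizeof(text);
}

// Passing by reference keeps the array type, so the length N is still known.
template <std::size_t N>
std::size_t size_seen_through_reference(const char (&)[N]) {
    return N;
}
```

Given `char name[16] = "Bob";`, the first function reports the size of a pointer on the target platform, while the second reports 16.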

Quote:

I've been working with DirectX and it always uses char*.


You will find that many (maybe even the vast majority of) APIs use raw char arrays because it places the fewest demands on calling code. In particular, C APIs don't have a choice.

Quote:
Original post by Antonym
Windows XP would be the platform, I think. So in this case I should stick to strings? I think I know what you mean: I've been working with DirectX and it always uses char*. That's something I've wondered about, though: why do people use char* instead of char[]? I mean, I sort of know how it works, but why is the former method preferred?

Yes, but if you use std::string, you can retrieve a const char* from the string whenever you need one by calling c_str() on it. An example:

std::string name = "Demon";
const char* str = name.c_str();

This means that you can easily convert whatever std::string you have to a const char* if you ever need to. Many libraries use const char* because of C code bases, and since C doesn't have std::string they can't use it.
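
As a rough sketch of that interop, assume a C-style function that only accepts const char*. Here `legacy_name_length` is a made-up stand-in, not a real library call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API that only understands const char*.
std::size_t legacy_name_length(const char* name) {
    return std::strlen(name);
}

// Keep the data in a std::string and convert only at the API boundary.
std::size_t call_legacy_api(const std::string& name) {
    // c_str() returns a NUL-terminated pointer owned by the string; it stays
    // valid only until the string is modified or destroyed.
    return legacy_name_length(name.c_str());
}
```

The thing to watch is lifetime: don't store the pointer returned by c_str() and use it after the string has changed or gone out of scope.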

[edit]

Quote:
Original post by Antonym
No.. :o


Here is a tutorial on std::string and how to use its awesomeness.
Here is a good reference for the std::string class that will tell you what each member function does and how to use it.

Quote:
Original post by rip-off
Have you been introduced to std::string::c_str()?

Of course... I meant that if you are using full char array libraries, it might be more efficient just using plain char arrays instead of calling the c_str function all the time. :)

Quote:

... it might be more efficient just using plain char arrays instead of calling c_str function all the time.


You'd be surprised. C strings are extremely inefficient for operations that require the length to be known. std::string::length() (or size()) is an O(1) operation, while strlen() is O(n).
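
A small sketch of the difference, with hypothetical wrapper names. The constant-time guarantee for size() is the one the standard has given since C++11; older implementations were constant time in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Walks the buffer until it reaches the terminating NUL: O(n) on every call.
std::size_t c_string_length(const char* text) {
    return std::strlen(text);
}

// Reads a length the string object already stores: O(1) on every call.
std::size_t std_string_length(const std::string& text) {
    return text.size();
}
```

A side effect of storing the length is that std::string can hold embedded NUL characters: `std::string("ab\0cd", 5)` has size 5, while strlen() on its c_str() stops at 2.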

If you are smart enough to avoid excessive copies of std::string instances then I wouldn't expect std::string to be particularly slower.

There is also programmer efficiency to consider. std::string is orders of magnitude easier to work with than trying to manually control the lifetime of raw char arrays. Code manipulating a C string is far more likely to contain bugs than the equivalent code that uses std::string.

Finally, whether code is "efficient" is something we decide after the fact. If the code runs acceptably, why put a lot of effort into improving the efficiency if you might not be able to measure the result?

Far better to wait until the code runs slowly. You can then run the code in a profiler and determine the actual bottlenecks - rather than guess and hope in advance.

Quote:
Original post by Antonym
Which one should I use? Which one is better?

std::string

Quote:
Original post by Antonym
For example in the case I would want to store a character's name or check user input should I store them in a char array or a string

You can't safely use a fixed-size char array to store user input because you don't know how much the user will input. For example:

char input[1000]; // let's hope the user doesn't input more than 999 characters
std::cin >> input; // if he does, we have a buffer overflow and thus a security hole

Code like this is behind a large share of the security holes in today's systems.

My advice is to NEVER use char* for data that varies at runtime, except when you have to interoperate with C APIs.
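
For comparison, a minimal sketch of the std::string version. `read_name` and `demo_long_input` are illustrative names, and the std::istringstream only stands in for std::cin so the behaviour can be checked without a console.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads one line of any length; the string grows as needed, so there is
// no fixed buffer to overflow.
std::string read_name(std::istream& in) {
    std::string name;
    std::getline(in, name);
    return name;
}

// Feeds a 5000-character line through read_name and reports how much of it
// arrived, far more than the 1000-byte array above could hold.
std::size_t demo_long_input() {
    std::istringstream fake_input(std::string(5000, 'x') + "\nignored");
    return read_name(fake_input).size();
}
```

In a real program you would call `read_name(std::cin)` after including `<iostream>`.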

Quote:
Original post by MikeTacular
The rule of thumb is to use std::string whenever possible. It's part of "good" C++. Checking user input (or any other form of text) is going to be a lot easier if it's a std::string rather than a char array. Example:
// With std::string:
std::string name = getSomeName();
if (name == "Bob")
{
    doSomething();
}

// With a C string (strcmp comes from <cstring>):
char* name = getSomeName();
if (strcmp(name, "Bob") == 0)
{
    doSomething();
}


You tell me which of the two is cleaner (if you say the second one you're crazy [smile]).



Hehe. I remember bits and pieces from my studies for my computer science degree. One of them was about ambiguity when overloading the "==" operator for comparison operations: because it's C++, it could mean anything you like. Using the function strcmp describes a little more about what the function does. For example, in the first example it's unclear (if you had never seen the string library before) whether the comparison is case sensitive or not. (I guess you could argue the same for the second example, but at least the comparison is a little more descriptive as to what it is comparing.)

Also, when I was learning Java and was confronted with string classes, I got very frustrated with the higher level of abstraction they provided; I felt it was a little too far away from "just storing some numbers in an array". Of course there are big advantages to the string class, but sometimes a char array will do. I hate having to trawl through documentation detailing how to get an ASCII character number out of a string when I've forgotten the function name, or vice versa, when I know it's probably stored as an ASCII number anyway. (I know, because I'm the one who put it there.)

I do truly believe there is nothing wrong with a char array, just as there is nothing wrong with using the string class to store strings; it's all about abstraction and who provides that abstraction, the API or the programmer. A bad programmer (or one with little understanding of how a string is interpreted by the hardware) is going to misuse a char array. If somebody looks over my code and frowns at my char array, and the only answer they can give as to why it's bad is "you should be using the std::string class instead", I'd like to belt them into submission until I get a more rational explanation out of them. Choose one method over the other only when you're sure you know the differences and why you have chosen that method.

rant over.

Quote:
Original post by moosedude
One of them was about ambiguity when overloading the "==" operator for comparison operations: because it's C++, it could mean anything you like.

That argument never gets old, does it? :)
Operator overloading doesn't involve more ambiguity than using functions. strcmp could do ANYTHING as well.

Quote:
Original post by moosedude
Of course there are big advantages to the string class, but sometimes a char array will do.

There are so many differences, I don't know where to start... For example, Strings are immutable in Java; char arrays are not.

Quote:
Original post by moosedude
I hate having to trawl through documentation detailing how to get an ASCII character number out of a string when I've forgotten the function name

That's where operator overloading kicks in.

std::string s = "hello";
char c = s[4]; // awesome, isn't it?

Unfortunately, Java doesn't provide operator overloading, so you have to call the member function charAt(index). Just write "s." and then press Ctrl+Space in Eclipse, and you'll find the function pretty quickly.
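
One detail worth knowing about the C++ side: operator[] does no bounds checking, while the at() member throws std::out_of_range on a bad index. A small sketch, with `char_at_or` as a made-up helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the character at index, or fallback when index is out of range.
char char_at_or(const std::string& s, std::size_t index, char fallback) {
    try {
        return s.at(index);  // at() checks the index and throws if it is bad
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

With `std::string s = "hello";`, `char_at_or(s, 4, '?')` gives 'o' and `char_at_or(s, 10, '?')` gives '?', whereas `s[10]` would be undefined behaviour.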

Quote:
Original post by moosedude
I do truly believe there is nothing wrong with a char array

Arrays are second-class citizens in C and C++, which means you cannot pass them to functions by value, you cannot return them from functions, and you cannot assign them. Doesn't that bother you at all?

I tend to disagree when people say to use std::string all the time, having been bitten by string classes before because of memory fragmentation. On the first commercial project I worked on, everyone had abused the string class all over, and the last resort was to make it use its own heap for allocation, which needed to be 8MB! On a PS2, that is a pretty big chunk of memory to lose to strings!

Though I will say, developing for PC? Use std::string, or whatever string class you want if you want. I'll personally stick with char arrays so I know I'm in control!

Quote:
Original post by rip-off
Like many of the standard container classes, std::basic_string has an allocator type parameter.


That still potentially gives you unnecessary memory overhead. A heap size of 64 KB does NOT give you a total of 64 KB to play with; how much is usable depends entirely on the number of allocations.

However, on the other side of the argument, it could save you memory, as you won't be allocating nearly as much as needed. But the heap you've created to allocate from is typically going to have a fixed maximum size, which would need to cater for the worst case anyway.

PS: I normally won't go and tell anyone to use std::string or char arrays because one is better - simply because I don't really believe either is globally 'better'. I actually avoid ALL STL in my engine and game code as I prefer to be in full control of memory and keep allocation count to a minimum. In my toolchain - I don't care, I allocate freely, use STL, and don't worry myself since I want convenience there.
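
For readers who haven't used the allocator parameter of std::basic_string mentioned in the quote, here is a minimal C++11 sketch. `CountingAllocator`, `CountedString` and `allocations_for` are made-up names, and the allocator only counts requests and forwards to operator new; a real engine would hand out memory from its own heap instead.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Global counter so the number of heap requests made by strings can be inspected.
static std::size_t g_allocations = 0;

// Minimal allocator: counts each request, then forwards to operator new.
template <typename T>
struct CountingAllocator {
    typedef T value_type;

    CountingAllocator() {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// A std::string look-alike that routes its memory through the allocator above.
typedef std::basic_string<char, std::char_traits<char>, CountingAllocator<char> > CountedString;

// Builds a string of the given length and reports how many allocations it caused.
std::size_t allocations_for(std::size_t length) {
    const std::size_t before = g_allocations;
    CountedString text(length, 'x');
    assert(text.size() == length);
    return g_allocations - before;
}
```

Counting like this is one way to measure how allocation-heavy string usage really is before deciding whether it is a problem. Short strings may report zero allocations on implementations that use a small-string optimization.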

- Do you need to modify strings?
- If they are modified, how?
- What is the life-cycle of your strings?
- What interoperability do you need (C fallback or similar)?
- What are the characteristics of your memory?
- What type of computing power do you have?
- What are the usage patterns for your strings?
- What about internationalization?
- Which methodology are users of the code most familiar with?
- Which can you afford, how much will improper choice cost?
- Are you aware of portability issues of basic_string and even const char *?
- Is serialization/persistence important?

And more....

After understanding the implications of your entire problem domain, one can make an informed decision over which is optimal.

Note that either choice meets only a small number of above requirements. Sometimes neither is directly usable. Often, std::string is good enough, and covers more ground than plain char strings.

Quote:
Original post by Richy2k
I'll personally stick with char arrays so I know I'm in control!

I'd prefer to write a fixed-length equivalent of std::string than piss all control over invariants (like keeping the string NUL terminated) into the wind, which is exactly what using char arrays directly would do.
Quote:
everyone had abused the string class all over

You'll get bitten by anything if you abuse it enough.

[Edited by - MaulingMonkey on November 8, 2008 12:31:27 PM]

Quote:
Original post by moosedude
One of them was about ambiguity when overloading the "==" operator for comparison operations: because it's C++, it could mean anything you like. Using the function strcmp describes a little more about what the function does. For example, in the first example it's unclear (if you had never seen the string library before) whether the comparison is case sensitive or not. (I guess you could argue the same for the second example, but at least the comparison is a little more descriptive as to what it is comparing.)

While you could overload the operators to do something weird, why would you? Any sane programmer is going to make their overloaded operators logical. And since the STL is made by good programmers, any noob could easily guess correctly what == does. strcmp() is no more descriptive than ==. With ==, you know you are testing whether one string is equal to another. strcmp() compares two strings. How is strcmp() more descriptive than ==? Besides, strcmp() returns an integer, not a boolean, and dealing with negative, zero, and positive results when testing equality is much more cryptic than a simple true/false.
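
A short sketch of what that integer result looks like in practice. The helper names are made up; the standard only promises the sign of strcmp()'s result, not the exact values -1 and 1, which is why `compare_sign` normalizes it.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Collapses strcmp's result to -1, 0 or 1.
int compare_sign(const char* a, const char* b) {
    const int r = std::strcmp(a, b);
    return (r > 0) - (r < 0);
}

// Equality the C way: remember that 0 means "equal".
bool equal_c(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Equality the std::string way: a plain bool.
bool equal_cpp(const std::string& a, const std::string& b) {
    return a == b;
}
```

Both comparisons are case sensitive and compare character by character, so "Bob" and "bob" are different strings either way.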

Quote:
Original post by moosedude
Also, when I was learning Java and was confronted with string classes, I got very frustrated with the higher level of abstraction they provided; I felt it was a little too far away from "just storing some numbers in an array". Of course there are big advantages to the string class, but sometimes a char array will do. I hate having to trawl through documentation detailing how to get an ASCII character number out of a string when I've forgotten the function name, or vice versa, when I know it's probably stored as an ASCII number anyway. (I know, because I'm the one who put it there.)

I personally am not a big fan of Java's string class implementation. Why? Because I like my operator overloading :). But this is C++ we are talking about, and even though some of the details are abstracted away in std::string, std::string gives you all the power and access you need if you want to go to a lower level. But in both C++ and Java, isn't abstraction kind of the whole point of OOP?

Quote:
Original post by moosedude
I do truly believe there is nothing wrong with a char array, just as there is nothing wrong with using the string class to store strings; it's all about abstraction and who provides that abstraction, the API or the programmer. A bad programmer (or one with little understanding of how a string is interpreted by the hardware) is going to misuse a char array. If somebody looks over my code and frowns at my char array, and the only answer they can give as to why it's bad is "you should be using the std::string class instead", I'd like to belt them into submission until I get a more rational explanation out of them. Choose one method over the other only when you're sure you know the differences and why you have chosen that method.

Yes, I agree that there are definitely times where a char array is perfectly acceptable. If what you really need is a char array and not a whole string class, then go ahead and use a char array. But std::string is often simpler, cleaner, and more elegant. For this reason people should try to use std::string if they can, but if they can't it's not the end of the world.

Quote:
Original post by MaulingMonkey
I'd prefer to write a fixed-length equivalent of std::string than piss all control over invariants (like keeping the string NUL terminated) into the wind, which is exactly what using char arrays directly would do.


I was actually tempted to do this, however I'd prefer to keep a template parameter for maximum size, rather than just have, say, 1024 as maximum size, which would make performing comparisons via ==, or any other operation, difficult. Probably possible and easy, but as of now, I don't know how to make the following work:


MyString< 128 > StringA;
MyString< 256 > StringB;

if( StringA == StringB )
{
    // ...
}




Quote:
Original post by MaulingMonkey
You'll get bitten by anything if you abuse it enough.


Agreed [smile] But to prevent it happening with strings, I've strayed away for the meantime. Well, I lie slightly...as they are technically used inside of Python since I bind that to my engine and framework, however, high level string manipulation in a high level language is how I'll keep it for now.

Actually, I'll mention another reason I stick to char arrays personally. This is not a reason to stick entirely to them, but a reason to use them where you might have once considered using std::string:


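// Trailing numbers are the running size in bytes after each member.
struct CsResource
{
    BcU32 ResourceType_;        // 4
    BcChar ResourceName_[128];  // 132
    BcU32 ResourceID_;          // 136
    BcU32 DataOffset_;          // 140
    BcU32 DataSize_;            // 144
    void* pActualResource_;     // 148 (assumes a 4-byte pointer)

    // Pad to a multiple of 32 bytes (160 in total)
    BcU8 Padding_[12];
};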





CsResource is a structure stored in a file, packed tightly into an array of x resources. Using char arrays, I don't need to mess about with varying-length strings. (Before someone points it out: yes, I know void* is not necessarily a fixed size; it's something I'm busy fixing in my engine for 64-bit compatibility.)
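
If you go the fixed-layout route, it helps to let the compiler check the layout. Below is a sketch under some assumptions: `ResourceRecord` and `set_name` are hypothetical names, the Bc* typedefs are replaced by the standard fixed-width types, and the runtime pointer is left out of the on-disk record altogether.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical on-disk record, loosely modeled on the struct above.
struct ResourceRecord {
    std::uint32_t type;
    char          name[128];
    std::uint32_t id;
    std::uint32_t data_offset;
    std::uint32_t data_size;
    std::uint8_t  padding[16];  // rounds the record up to 160 bytes
};

// Fail the build, rather than corrupt files, if the layout ever drifts.
static_assert(std::is_trivially_copyable<ResourceRecord>::value,
              "record must be safe to read and write as raw bytes");
static_assert(sizeof(ResourceRecord) == 160, "unexpected record size");
static_assert(offsetof(ResourceRecord, id) == 132, "unexpected field offset");

// Copies at most 127 characters and always leaves the name NUL-terminated.
// strncpy also zero-fills the unused tail, so file contents stay deterministic.
void set_name(ResourceRecord& record, const char* name) {
    std::strncpy(record.name, name, sizeof(record.name) - 1);
    record.name[sizeof(record.name) - 1] = '\0';
}
```

This does nothing about endianness; the static_asserts only guard size and offsets on the platforms you actually compile for.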

Quote:

Probably possible and easy, but as of now, I don't know how to make the following work


It would look something like this. If you can't implement it (or similar things) through the public interface, it might still be possible to make all MyString instantiations friends of each other.


template <unsigned LMax, unsigned RMax>
bool operator==(const MyString<LMax>& lhv, const MyString<RMax>& rhv)
{
    if (lhv.size() != rhv.size()) return false;
    return std::equal(lhv.c_str(), lhv.c_str() + lhv.size(), rhv.c_str());
}

Quote:
Original post by Richy2k
CsResource is a structure stored in a file, packed tightly into an array of x resources. Using char arrays, I don't need to mess about with varying-length strings. (Before someone points it out: yes, I know void* is not necessarily a fixed size; it's something I'm busy fixing in my engine for 64-bit compatibility.)


I remember doing that under DOS.

These days, considering the portability implications and the overall performance increase, I use proper serialization.

It's simple reasoning. I need to load data once and use it a million times. So I prefer to organize my data in the format that is easiest for the CPU to execute (endianness, pointer sizes, etc.), rather than the format which is optimal for file layout.

In the above example you have 32-bit ints, and in order to maintain compatibility with the file format you need to pack the structure tighter, regardless of the platform's preferred data alignment.

Even more, changes to code cause huge problems with backward compatibility. It's also problematic for variable-sized character encodings.

The whole format specification is also highly problematic because of manual padding and certain assumptions (char gets padded to 32 bits). General recommendation is to sort data largest to smallest to avoid such issues.

I'd consider such code to be incredibly fragile and unsuitable for anything besides shared memory IPC or MPI-style data passing. For serialization there exist methods which are not only exactly as fast (reading from disk is 1000 times slower than any justified processing during load), but are also backward-compatible and fully portable. For example, the 32/64/n-bit problem simply becomes a non-issue.

Also: std::vector<std::string> or something that uses string internally has exactly the same memory layout as you suggest.
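
To give an idea of what "proper serialization" can look like, here is a minimal sketch. It is one possible scheme, not any particular library's format: integers are written byte by byte in little-endian order and strings are length-prefixed. All names (`Bytes`, `write_u32`, `read_string`, and so on) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Appends value as four bytes, least significant first, whatever the host endianness.
void write_u32(Bytes& out, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFFu));
}

// Reads four bytes written by write_u32 and advances pos past them.
std::uint32_t read_u32(const Bytes& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t value = 0;
    for (int shift = 0; shift < 32; shift += 8)
        value |= static_cast<std::uint32_t>(in[pos++]) << shift;
    return value;
}

// Writes the length first, then the characters; no fixed-size field is needed.
void write_string(Bytes& out, const std::string& text) {
    write_u32(out, static_cast<std::uint32_t>(text.size()));
    out.insert(out.end(), text.begin(), text.end());
}

// Reads a string written by write_string and advances pos past it.
std::string read_string(const Bytes& in, std::size_t& pos) {
    const std::uint32_t length = read_u32(in, pos);
    assert(pos + length <= in.size());
    std::string text(in.begin() + pos, in.begin() + pos + length);
    pos += length;
    return text;
}
```

Because every field is written explicitly, the in-memory struct can use whatever padding, pointer size and string type the platform prefers without changing the file format.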

Quote:
Original post by Antheus
Also: std::vector<std::string> or something that uses string internally has exactly the same memory layout as you suggest.


Don't mean to be arrogant, but that is an absolute load of crap. std::string, and std::vector allocate memory themselves, so just hide that from you.
Besides, std::string or vector in a struct, I can't use a single file operation to write or read them [smile] Path of least resistance.

Time for a necro post!

Quote:
Original post by Richy2k
Quote:
Original post by Antheus
Also: std::vector<std::string> or something that uses string internally has exactly the same memory layout as you suggest.

Don't mean to be arrogant, but that is an absolute load of crap. std::string, and std::vector allocate memory themselves, so just hide that from you.

Surely you don't plan to use your void* just to point at various globals? And do you really plan to never have CsResources anywhere but on the stack?

Quote:
Besides, std::string or vector in a struct, I can't use a single file operation to write or read them [smile] Path of least resistance.

You can't do that with your current struct either without all the drawbacks mentioned.


Quote:
Original post by Richy2k
I was actually tempted to do this, however I'd prefer to keep a template parameter for maximum size, rather than just have, say, 1024 as maximum size, which would make performing comparisons via ==, or any other operation, difficult. Probably possible and easy, but as of now, I don't know how to make the following work:


Here's some stuff to get you started.

// Needs <algorithm>, <cassert>, <cstddef>, <iterator>, <ostream>, <string>
// and <boost/static_assert.hpp>.
// M is the capacity in characters, including the terminating NUL.
// CharT comes first so the class matches the fixed_string<CharT,N> uses below.
template < typename CharT, std::size_t M > class fixed_string {
    BOOST_STATIC_ASSERT( M > 0 ); // totally optional dude

    std::size_t length; // not including NUL -- we count here to avoid O(N) size() checks
    CharT data[M];      // should always be NUL terminated for c_str() to return a valid string
public:
    CharT* begin() { return data+0; }
    CharT* end() { return data+length; }
    const CharT* begin() const { return data+0; }
    const CharT* end() const { return data+length; }

    template < typename InIterator >
    void assign( InIterator first, InIterator last ) {
        const std::size_t count = static_cast<std::size_t>( std::distance(first,last) ); // I'm lazy
        assert( count < M ); // leave room for the NUL
        std::copy( first, last, data );
        length = count;
        data[length] = CharT();
    }

    fixed_string(): length(0) { data[0] = CharT(); }
    fixed_string( const CharT* original ) { assign( original+0, original+std::char_traits<CharT>::length(original) ); }
    template < std::size_t N > fixed_string( const CharT (&original)[N] ) {
        // You may not want this constructor!
        // literal optimized version of fixed_string( const CharT* )

        // Double check that the body of the string has no NULs:
        assert( std::find(original+0,original+N-1,CharT()) == original+N-1 );
        // Double check that the string does end in a NUL:
        assert( original[N-1] == CharT() );
        // If either of the above fail, it's probably not really a literal, but a char array.
        // If you need to handle such a case, just remove this version of the ctor.

        assign( original+0, original+N-1 ); // skip the literal's NUL; assign() adds its own
    }
    fixed_string( const std::basic_string<CharT>& original ) { assign( original.begin(), original.end() ); }
    template < std::size_t N > fixed_string( const fixed_string<CharT,N>& original ) { assign( original.begin(), original.end() ); }
    template < typename InIterator > fixed_string( InIterator first, InIterator last ) { assign(first,last); }

    std::size_t size() const { return length; }
    CharT& operator[]( std::size_t index ) { assert(index<length); return data[index]; }
    CharT operator[]( std::size_t index ) const { assert(index<length); return data[index]; }

    template < std::size_t N > fixed_string& operator+=( const fixed_string<CharT,N>& appendee ) {
        assert( length+appendee.size() < M ); // leave room for the NUL
        std::copy( appendee.begin(), appendee.end(), data+length );
        length += appendee.size();
        data[length] = CharT(); // keep the string NUL terminated
        return *this;
    }
    const CharT* c_str() const { return data; }
};

template < typename CharT, std::size_t LS, std::size_t RS >
inline bool operator==( const fixed_string<CharT,LS>& lhs, const fixed_string<CharT,RS>& rhs ) {
    return (lhs.size() == rhs.size()) && std::equal( lhs.begin(), lhs.end(), rhs.begin() );
}

template < typename CharT, std::size_t LS, std::size_t RS >
inline bool operator!=( const fixed_string<CharT,LS>& lhs, const fixed_string<CharT,RS>& rhs ) {
    return !(lhs==rhs);
}

template < typename CharT, std::size_t LS, std::size_t RS >
inline fixed_string<CharT,LS+RS> operator+( const fixed_string<CharT,LS>& lhs, const fixed_string<CharT,RS>& rhs ) {
    // Should be NRVO optimizable by MSVC...
    fixed_string<CharT,LS+RS> fs;
    fs += lhs;
    fs += rhs;
    return fs;
}

template < typename CharT, std::size_t S >
inline std::basic_ostream<CharT>& operator<<( std::basic_ostream<CharT>& os, const fixed_string<CharT,S>& s ) {
    os << s.c_str();
    return os;
}

// No operator>> or getline() version because I am a lazy bum




HTH

(You can also get a similar effect with a custom allocator to std::basic_string, but that's a less straightforward, less efficient, and less understandable alternative -- both to use and to write -- in this case)
