webwraith

std::string vs std::wstring


I want to use the wstring version of the C++ standard library, but I have a couple of worries. First, will the c_str() function work 'correctly', i.e., will it return a pointer to a zero-terminated array of wchar_t? And will this work with the wide-character Win32 API functions? I know that Win32 hides whether you're using wide or narrow (ANSI) character strings by default, but I wanted to know if something like the following is correct:
std::wstring wstr = L"hello, blah,blah..."; // for example; note the L prefix for a wide literal
...
CreateDirectory(wstr.c_str(),NULL);

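Yes and yes. std::wstring::c_str() returns a const wchar_t* to a null-terminated array, and it is fine to pass to the wide-character Win32 functions. Note that CreateDirectory only resolves to the wide version (CreateDirectoryW) when UNICODE is defined; otherwise, call CreateDirectoryW explicitly.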
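
To illustrate the first point without needing the Windows headers, here is a minimal, portable sketch. The helper name is_null_terminated is made up for this example, and the length check assumes the string has no embedded null characters.

```cpp
#include <cwchar>
#include <string>
#include <type_traits>
#include <utility>

// Compile-time check: c_str() on a std::wstring yields a const wchar_t*.
static_assert(std::is_same<decltype(std::declval<const std::wstring&>().c_str()),
                           const wchar_t*>::value,
              "std::wstring::c_str() returns const wchar_t*");

// Hypothetical helper: confirms the buffer from c_str() has L'\0' at index
// size(), which is what C-style wide-character APIs rely on.
// Assumes the string contains no embedded null characters.
bool is_null_terminated(const std::wstring& s)
{
    const wchar_t* p = s.c_str();
    return p[s.size()] == L'\0' && std::wcslen(p) == s.size();
}
```

On Windows, the same pointer is what you would hand to CreateDirectoryW(wstr.c_str(), NULL); that call is left out here so the sketch compiles anywhere.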

If you really want Unicode independence, you may want something like

#ifdef UNICODE
typedef std::wstring tstring;
#else
typedef std::string tstring;
#endif

tstring str = TEXT("hello, blah,blah...");
CreateDirectory(str.c_str(),NULL);


This is similar to what Win32 does for all its functions (i.e., it maps each name to the A or W variant depending on the UNICODE define). TEXT is a Windows macro that prefixes the string literal with L if UNICODE is defined.

Edit:
or typedef std::basic_string<TCHAR> tstring;

Quote:
Original post by Mike nl
If you really want Unicode independence, you may want something like

*** Source Snippet Removed ***
Similar to what Win32 does for all its functions (i.e., define the A or W variant depending on the UNICODE define). TEXT is a Windows define that prefixes the string with L if UNICODE is defined

Edit:
or typedef std::basic_string<TCHAR> tstring;


I used to do that. Unfortunately, after using it for a while, I realized that it doesn't actually create Unicode independence. It gives you code that works with either Unicode or ANSI in a given build, but never both at once, and it creates more maintenance overhead than it saves.

What you want instead, if you want Unicode independence, is functions and classes that are templated on the character type (e.g., char vs. wchar_t), with std::basic_string<CharType> used throughout your code. You will need to write overloads for some of the common functions, because the standard C functions use separate names for their narrow and wide versions (C doesn't support overloading). For example:



template <class TCharType>
void doSomething(const std::basic_string<TCharType>& str)
{
...
}
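
As a rough sketch of what such overloads might look like, the following wraps std::toupper and std::towupper under one name so a template can call either. The names to_upper_char and toUpper are made up for this example, and the conversion follows the current C locale.

```cpp
#include <cctype>
#include <cwctype>
#include <string>

// Overloads giving the narrow and wide C functions a common name.
// (to_upper_char is a hypothetical name used for illustration.)
inline char to_upper_char(char c)
{
    // Cast through unsigned char: std::toupper is undefined for negative values.
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

inline wchar_t to_upper_char(wchar_t c)
{
    return static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(c)));
}

// Templated on the character type; overload resolution picks the right
// to_upper_char for char or wchar_t.
template <class TCharType>
std::basic_string<TCharType> toUpper(const std::basic_string<TCharType>& str)
{
    std::basic_string<TCharType> result(str);
    for (typename std::basic_string<TCharType>::size_type i = 0; i < result.size(); ++i)
        result[i] = to_upper_char(result[i]);
    return result;
}
```

Note that template argument deduction needs an actual std::string or std::wstring argument; passing a bare string literal will not deduce TCharType.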

You could also use UTF-8 strings, which means that for normal (inline) string literals you can still use the "---" notation, as long as you don't use any characters with a code > 127 in your source files. Any character with a code > 127 is encoded as a sequence of multiple bytes, each with a value > 127, but it's generally a good idea not to use those in inline strings anyway.

Most string operations, like searching for a specific character or character sequence, still work as they do with normal char arrays; only some things, like reversing a string, are a bit trickier.

This is what GLib/GTK uses, and it works really well.
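
To show why reversing is the tricky case, here is a small sketch that reverses a UTF-8 std::string one code point at a time instead of one byte at a time. The helper names are made up for this example, and it assumes the input is valid UTF-8 (it does not handle combining characters either).

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: reverses the order of code points while keeping the
// bytes of each multi-byte sequence in their original order.
// Assumes valid UTF-8 input.
std::string utf8_reverse(const std::string& s)
{
    std::string out;
    out.reserve(s.size());
    std::size_t end = s.size();
    while (end > 0) {
        std::size_t start = end - 1;
        // Walk back over continuation bytes to reach the lead byte.
        while (start > 0 && is_continuation(static_cast<unsigned char>(s[start])))
            --start;
        out.append(s, start, end - start);
        end = start;
    }
    return out;
}
```

Searching, by contrast, needs nothing special: std::string::find on an ASCII character or on a UTF-8 substring works bytewise, and the position it returns is a byte offset.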

Quote:
Original post by Rydinare
What you want, instead, if you want Unicode independence is methods and classes that are templated based on char type (e.g.: char vs. wchar_t) and then you make use of std::basic_string<CharType> in all of your code.

I wrote something for char type "independence" after stumbling upon partial template specialization.

There's probably a much easier way to achieve it, but it gave me some practice (and more headaches) with Boost.Preprocessor anyway. It does require a macro around the Windows function of interest, which unfortunately kills IntelliSense in VS and tends to clutter up the code if you're doing a lot of string manipulation with the Windows functions.

EDIT: Forum seems to be eating the backslashes in the macros so I've pasted it here

The IfThenElse class is from the Josuttis book:


// copied from C++ Templates: The Complete Guide
//
template<bool cond, class TrueArg, class FalseArg>
struct IfThenElse;

template<class TrueType, class FalseType>
struct IfThenElse<true, TrueType, FalseType>
{
    typedef TrueType type;
};

template<class TrueType, class FalseType>
struct IfThenElse<false, TrueType, FalseType>
{
    typedef FalseType type;
};





An example usage:

template<class Elem>
void GetModuleDirectory(const std::basic_string<Elem>& moduleName, std::basic_string<Elem>& dir)
{
    Elem buf[MAX_PATH] = {0};
    // WIN_FUNC is the macro from the pastebin link above.
    // The local handle is named hModule so it doesn't clash with the parameter.
    HMODULE hModule = WIN_FUNC(GetModuleHandle)(moduleName.c_str());
    if(hModule)
    {
        WIN_FUNC(GetModuleFileName)(hModule, buf, MAX_PATH);
        WIN_FUNC(PathRemoveFileSpec)(buf);
        WIN_FUNC(PathAddBackslash)(buf);
        dir = buf;
    }
}



Wow, thanks to all for the replies! In actual fact, I intend to just use Unicode rather than make it compatible with ASCII or anything else, so "independence" isn't too high on my list of requirements. The only real help I needed was with the difference between std::string and std::wstring, and what I need to watch out for when using the latter instead of the former.

Interesting. Can you show the code for WIN_FUNC (I assume a macro) and how you tied in the IfThenElse template? Thanks.

Quote:
Original post by adeyblue
It's the first link (pastebin) in my post above.


Gotcha. Sorry, missed that earlier. That's pretty slick. It hadn't occurred to me to try to use such a technique to call the proper version. Nice work.

That will easily take care of the Win32 cases, because of the consistent style. I suppose the technique could also be applied to standard C functions with some care.
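
For the standard C functions, a rough sketch of how that might look follows. To be clear, this is not adeyblue's WIN_FUNC macro (which lives behind the pastebin link); it only reuses the IfThenElse idea, and NarrowFuncs, WideFuncs, FuncsFor and sameText are names made up for this example. It also assumes sizeof(wchar_t) is larger than sizeof(char), which holds on the mainstream platforms.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

template<bool cond, class TrueArg, class FalseArg>
struct IfThenElse;

template<class TrueType, class FalseType>
struct IfThenElse<true, TrueType, FalseType> { typedef TrueType type; };

template<class TrueType, class FalseType>
struct IfThenElse<false, TrueType, FalseType> { typedef FalseType type; };

// Hypothetical wrappers giving the narrow and wide C functions common names.
struct NarrowFuncs
{
    static std::size_t length(const char* s) { return std::strlen(s); }
    static int compare(const char* a, const char* b) { return std::strcmp(a, b); }
};

struct WideFuncs
{
    static std::size_t length(const wchar_t* s) { return std::wcslen(s); }
    static int compare(const wchar_t* a, const wchar_t* b) { return std::wcscmp(a, b); }
};

// Selects NarrowFuncs when Elem is char-sized, WideFuncs otherwise.
template<class Elem>
struct FuncsFor
{
    typedef typename IfThenElse<sizeof(Elem) == sizeof(char),
                                NarrowFuncs, WideFuncs>::type type;
};

// Works for both std::string and std::wstring without any macros.
template<class Elem>
bool sameText(const std::basic_string<Elem>& a, const std::basic_string<Elem>& b)
{
    typedef typename FuncsFor<Elem>::type Funcs;
    return Funcs::compare(a.c_str(), b.c_str()) == 0;
}
```

The wrapper structs play the role that the A and W suffixes play in Win32: the selection happens at compile time, so only the matching function is ever instantiated for a given Elem.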

Would there be any advantage, then, to using std::wstring over std::string? Would an application that only outputs characters in the lower half of the byte range (0 to 127) be recognized as UTF-8, instead of or as well as ASCII?

If you're just storing the string, then I don't know if that would be a problem. What would be a problem is manipulating the string in some way (substrings, case conversions, retrieving the length), as the routines would be operating at the byte level rather than the character level.
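
A small sketch of the length problem, assuming the std::string holds valid UTF-8. The helper name utf8_length is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8 input.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For a string like "na\xC3\xAFve" (the word naïve, with the ï stored as two bytes), size() reports bytes while utf8_length reports code points, so the two disagree. Likewise, substr(0, 3) on that string cuts the two-byte sequence in half and leaves an invalid trailing byte.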
