std::string vs std::wstring


I want to use the wstring version of the C++ stdlib, but I have a couple of worries. First, will the c_str() function work 'correctly', i.e., will it return a pointer to a zero-terminated array of wchar_t? And will this work with the wide-character Win32 API functions? I know that Win32 hides whether you're using the wide or ASCII character strings by default, but I wanted to know if something like the following is correct:
std::wstring wstr = "hello, blah,blah..."; // for example
...
CreateDirectory(wstr.c_str(),NULL);

Quote:
 Original post by webwraith
 I want to use the wstring version of the C++ stdlib, but I have a couple of worries. First, will the c_str() function work 'correctly', i.e.; will it return a pointer to a zero-terminated array of wchar_t? And will this work in the wide-character win32 api functions? I know that win32 hides whether you're using the wide- or ASCII- character strings by default, but I wanted to know if something like the following is correct;
 [source snippet removed]

Yes and yes. std::wstring will give you a const wchar_t* for c_str(), and is fine to pass to Win32 functions.
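
As a rough illustration of that answer, here is a minimal portable sketch. `takes_wide_c_string` is a hypothetical stand-in for a wide-character API function (such as CreateDirectoryW, which is what CreateDirectory resolves to when UNICODE is defined); it is not a real Win32 call. The sketch only checks that c_str() yields a zero-terminated const wchar_t*.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a wide-character API: it accepts a
// zero-terminated const wchar_t* and returns the number of characters.
std::size_t takes_wide_c_string(const wchar_t* s)
{
    assert(s != nullptr);
    return std::wcslen(s); // relies on the terminating L'\0'
}

// Compile-time check: c_str() on a std::wstring returns const wchar_t*.
static_assert(
    std::is_same<decltype(std::declval<const std::wstring&>().c_str()),
                 const wchar_t*>::value,
    "std::wstring::c_str() returns const wchar_t*");

// Builds a wide string and hands its c_str() to the stand-in API.
std::size_t wide_length_via_c_str()
{
    std::wstring wstr = L"hello"; // note the L prefix on the literal
    return takes_wide_c_string(wstr.c_str());
}
```

If you only ever want the wide variant, calling the W function by name (for example CreateDirectoryW) avoids depending on the UNICODE define at all.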

But don't assume that "---" is a wchar_t string. In fact, it's a regular char string! L"---" creates a wide-character (wchar_t) string, so the literal in your example needs the L prefix before it can initialize a std::wstring. I don't know exactly what you might want to google for more info, though. Just "C++ Unicode" gets a lot of irrelevant pages.
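
A small sketch of the difference between the two literal forms, assuming a C++11 compiler. The function name `make_wide` is made up for the example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// A plain literal is an array of const char; the L prefix makes it an
// array of const wchar_t. Both checks happen at compile time.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "narrow literal");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value,
              "wide literal");

// std::wstring bad = "hello";  // would not compile: const char* is not
//                              // convertible to std::wstring

// Hypothetical helper: initializes a std::wstring from a wide literal.
std::wstring make_wide()
{
    std::wstring w = L"hello, blah,blah...";
    assert(!w.empty());
    return w;
}
```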

If you really want Unicode independence, you may want something like

#ifdef UNICODE
typedef std::wstring tstring;
#else
typedef std::string tstring;
#endif

tstring str = TEXT("hello, blah,blah...");
CreateDirectory(str.c_str(), NULL);

This is similar to what Win32 does for all its functions (i.e., it defines the A or W variant depending on the UNICODE define). TEXT is a Windows macro that prefixes the string literal with L if UNICODE is defined.

Edit:
or typedef std::basic_string<TCHAR> tstring;
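
To show the mechanism without needing the Windows headers, here is a portable sketch. `DEMO_UNICODE`, `DEMO_TEXT`, `demo_tchar`, `demo_tstring` and `make_path` are all hypothetical stand-ins for UNICODE, TEXT, TCHAR and tstring; the real Windows macros are defined by the SDK and may differ in detail.

```cpp
#include <cassert>
#include <string>

// Hypothetical switch playing the role of UNICODE.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;      // stand-in for TCHAR
#define DEMO_TEXT(s) L##s        // pastes an L prefix onto the literal
#else
typedef char demo_tchar;
#define DEMO_TEXT(s) s
#endif

// Equivalent in spirit to: typedef std::basic_string<TCHAR> tstring;
typedef std::basic_string<demo_tchar> demo_tstring;

// The literal is narrow or wide depending on the switch above.
demo_tstring make_path()
{
    demo_tstring s = DEMO_TEXT("hello, blah,blah...");
    assert(!s.empty());
    return s;
}
```

Note that the character type is fixed for the whole build, so only one of the two variants ever gets compiled in a given configuration.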

Quote:
 Original post by Mike nl
 If you really want Unicode independence, you may want something like
 [source snippet removed]
 Similar to what Win32 does for all its functions (i.e., define the A or W variant depending on the UNICODE define). TEXT is a Windows define that prefixes the string with L if UNICODE is defined
 Edit:
 or typedef std::basic_string<TCHAR> tstring;

I used to do that. Unfortunately, after using it for a while, I realized that it doesn't actually create Unicode independence. It creates a situation where your code works with Unicode or with ASCII in any given build, but never both, and it adds more maintenance overhead than it saves.

What you want instead, if you want Unicode independence, is functions and classes that are templated on the character type (e.g., char vs. wchar_t), and then you use std::basic_string<CharType> throughout your code. You will need to write overloads for some of the common functions, because the standard C functions come as separately named char and wchar_t versions (C doesn't support overloading). Such as:

template <class TCharType>
void doSomething(const std::basic_string<TCharType>& str)
{
   ...
}
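
A minimal sketch of that idea, assuming strlen/wcslen as the pair of C functions to wrap. The names `c_length` and `length_up_to_first_nul` are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// Overloads (hypothetical names) that give strlen and wcslen one spelling,
// so template code can call them without knowing the character type.
inline std::size_t c_length(const char* s)
{
    assert(s != nullptr);
    return std::strlen(s);
}

inline std::size_t c_length(const wchar_t* s)
{
    assert(s != nullptr);
    return std::wcslen(s);
}

// Works for std::string and std::wstring alike; overload resolution picks
// the matching C function from the type of str.c_str().
template <class TCharType>
std::size_t length_up_to_first_nul(const std::basic_string<TCharType>& str)
{
    return c_length(str.c_str());
}
```

Both character types are usable in the same build here, which is the difference from the tstring approach.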

You could also use UTF-8 strings, which means that for normal (inline) string literals you can still use the "---" notation, as long as you don't use any characters with a code > 127 inside your source files. Each character with a code > 127 is encoded as a sequence of several bytes, all with values > 127, but it's generally a good idea not to put those in inline strings.

Most string operations, like searching for a specific character, or character sequence still work like with normal char arrays, only some things, like reversing a string, may be a bit trickier.

This is what glib/gtk uses, and it works really well.
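
A small sketch of both points, assuming the text is stored as UTF-8 in a std::string. The helper names are made up, and `looks_like_utf8` is only a structural check (lead byte followed by the right number of continuation bytes), not a full validator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// "cafe au lait" with an e-acute, written with escapes so the source file
// stays pure ASCII. The e-acute is the two bytes 0xC3 0xA9.
std::string sample()
{
    return std::string("caf\xC3\xA9") + " au lait";
}

// Byte-wise searching still finds the right positions.
bool find_still_works()
{
    const std::string s = sample();
    return s.find("au") == 6 && s.find("\xC3\xA9") == 3;
}

// Structural check: every lead byte must be followed by the expected
// number of continuation bytes (10xxxxxx).
bool looks_like_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (b < 0x80)                extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return false; // stray continuation byte or invalid lead byte
        if (i + extra >= s.size()) return false; // sequence is cut short
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}

// Naive reversal swaps the bytes inside multi-byte sequences.
std::string bytewise_reverse(std::string s)
{
    std::reverse(s.begin(), s.end());
    return s;
}
```

Reversing correctly would mean reversing whole code points (or, for user-visible text, whole grapheme clusters) rather than bytes.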

Quote:
 Original post by Rydinare
 What you want, instead, if you want Unicode independence is methods and classes that are templated based on char type (e.g.: char vs. wchar_t) and then you make use of std::basic_string<CharType> in all of your code.

I wrote something for char type "independence" after stumbling upon partial template specialization.

There's probably a much easier way to achieve it, but it gave me some practice (and more headaches with the Boost preprocessor library) anyway. It does require a macro around the Windows function of interest, which unfortunately kills IntelliSense in VS and tends to clutter up the code if you're doing a lot of string manipulation with the Windows functions.

EDIT: Forum seems to be eating the backslashes in the macros so I've pasted it here

The IfThenElse class is from the Josuttis book:

// copied from C++ Templates: A Complete Guide
//
template<bool cond, class TrueArg, class FalseArg>
struct IfThenElse;

template<class TrueType, class FalseType>
struct IfThenElse<true, TrueType, FalseType>
{
	typedef TrueType type;
};

template<class TrueType, class FalseType>
struct IfThenElse<false, TrueType, FalseType>
{
	typedef FalseType type;
};

An example usage
template<class Elem>
void GetModuleDirectory(const std::basic_string<Elem>& module, std::basic_string<Elem>& dir)
{
    Elem buf[MAX_PATH] = {0};
    // Named hModule so it does not redeclare the 'module' parameter.
    HMODULE hModule = WIN_FUNC(GetModuleHandle)(module.c_str());
    if(hModule)
    {
        WIN_FUNC(GetModuleFileName)(hModule, buf, MAX_PATH);
        WIN_FUNC(PathRemoveFileSpec)(buf);
        WIN_FUNC(PathAddBackslash)(buf);
        dir = buf;
    }
}
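
The WIN_FUNC macro itself is not reproduced in this thread, so the following is not that macro. It is only a portable sketch of how an IfThenElse-style selection on the character type could pick between a narrow and a wide implementation. `NarrowApi`, `WideApi` and `api_length` are hypothetical stand-ins for an A/W function pair, and std::is_same (C++11) is assumed for the type test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>
#include <type_traits>

// Same shape as the IfThenElse template quoted above.
template<bool cond, class TrueType, class FalseType>
struct IfThenElse;

template<class TrueType, class FalseType>
struct IfThenElse<true, TrueType, FalseType>  { typedef TrueType type; };

template<class TrueType, class FalseType>
struct IfThenElse<false, TrueType, FalseType> { typedef FalseType type; };

// Hypothetical stand-ins for the A and W variants of one API function.
struct NarrowApi {
    static std::size_t length(const char* s) { return std::strlen(s); }
};
struct WideApi {
    static std::size_t length(const wchar_t* s) { return std::wcslen(s); }
};

// Chooses NarrowApi when Elem is char and WideApi otherwise, at compile time.
template<class Elem>
std::size_t api_length(const std::basic_string<Elem>& str)
{
    typedef typename IfThenElse<std::is_same<Elem, char>::value,
                                NarrowApi, WideApi>::type Api;
    assert(str.c_str() != nullptr);
    return Api::length(str.c_str());
}
```

Unlike a macro wrapper, this form keeps ordinary function-call syntax, at the cost of writing one small struct pair per wrapped function.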

Wow, thanks to all for the replies! In actual fact, I intend to just use Unicode rather than make it compatible with ASCII or anything else, so "independence" isn't too high on my list of requirements. The only real help I needed was with the difference between std::string and std::wstring, and what I needed to watch out for when using the latter instead of the former.

Interesting. Can you show the code for WIN_FUNC (I assume a macro) and how you tied in the IfThenElse template? Thanks.

It's the first link (pastebin) in my post above.

Quote:
 Original post by adeyblue
 It's the first link (pastebin) in my post above.

Gotcha. Sorry, missed that earlier. That's pretty slick. It hadn't occurred to me to try to use such a technique to call the proper version. Nice work.

That will easily take care of the Win32 cases, because of the consistent style. I suppose the technique could also be applied to standard C functions with some care.

Would there be any advantage, then, to using std::wstring over std::string? Would output from an application that only uses characters in the lower half of the byte range (codes 0 to 127) be recognized as UTF-8, instead of or as well as ASCII?

If you're just storing the string then I don't know if that would be a problem. What would be a problem is if you were manipulating the string in some way (substrings, case conversions, retrieving the length) as the routines would be operating on the byte rather than character level.
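
A short sketch of the byte-versus-character difference for UTF-8 held in a std::string. `count_code_points` is a made-up helper that assumes its input is already valid UTF-8. Text that uses only codes 0 to 127 is both valid ASCII and valid UTF-8, and for such text the two counts agree.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes the input is valid UTF-8; it does not validate anything.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        if ((b & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

// "naive" with an i-diaeresis (bytes 0xC3 0xAF), written with escapes so
// the source file stays pure ASCII.
std::string sample_utf8()
{
    return std::string("na\xC3\xAF") + "ve";
}
```

Here size() reports 6 bytes while the string holds 5 code points, so substr or indexing by byte position can cut a character in half.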
