Convert std::string to WCHAR ?

20 comments, last by iMalc 10 years, 11 months ago

This code is deeply broken. You're returning a pointer to memory that is freed when the function exits (because the temporary wstring will be destructed). It may seem to work, but that's the horrifying reality of undefined behavior - it often seems to work until it bites you viciously.

+1 for noticing this, I totally read over it, to be honest. Especially noteworthy that this can, and will, lead to many horrible things happening. For example, in my first resource cache class, I was storing the resources keyed by LPCWSTR. It seemed to work fine while I was only hardcoding everything in the code, but after a while I hit an issue where setting constants on my effects only worked sporadically. It turned out the LPCWSTR keys got trashed, and whenever an effect was accessed from the cache, a new instance was loaded and stored under another key that then became trashed too, and so forth. I wouldn't even have thought of looking there in the first place, as it appeared to be a graphics-related error. Moral of the story: always double-check how you use your strings.
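To illustrate the fix for that kind of bug: key the cache by std::wstring instead of a raw LPCWSTR, so the map owns a copy of the characters. This is a minimal sketch; Effect, makeCache and the "water.fx" name are hypothetical stand-ins, not code from the thread.

```cpp
#include <map>
#include <string>

struct Effect { int id; };  // hypothetical cached resource

// Keying by std::wstring (not LPCWSTR): the map copies the characters,
// so the key stays valid even after the caller's original buffer is gone.
std::map<std::wstring, Effect> makeCache() {
    std::map<std::wstring, Effect> cache;
    std::wstring name = L"water.fx";  // temporary owner of the characters
    cache[name] = Effect{42};         // the map stores its own copy of the key
    return cache;                     // name dies here; the stored key does not
}
```

With an LPCWSTR key, the map would have stored only the pointer, which is exactly what got trashed in the story above.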


Here is my implementation:


		template <typename T>
			size_t SizeOfStr(T * str)
			{
				if(!str) return 0 ;
				size_t i = 0 ;
				while(str[i++]);
				return i ;
			}

		template <typename T , typename U >
			U * ConvertStr(T * t, size_t Size = 0)
			{
				if(!Size)
				{
					Size = SizeOfStr(t);
				}

				U * u = new U[Size]; 
				for(size_t i = 0 ; i < Size ; i++ ) u[i] = U(t[i]);
				return u ;
			}

Do not forget to clean up the resources (delete[]) once you are done. :)
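For reference, here is a self-contained sketch of those templates with the copy loop writing element-by-element (the conversion target is given explicitly as the first template argument, a small reordering for convenience; this is my variation, not the exact code above):

```cpp
#include <cstddef>

template <typename T>
size_t SizeOfStr(const T* str) {
    if (!str) return 0;
    size_t i = 0;
    while (str[i++]);       // count includes the terminating null
    return i;
}

template <typename U, typename T>
U* ConvertStr(const T* t, size_t size = 0) {
    if (!size) size = SizeOfStr(t);
    U* u = new U[size];
    for (size_t i = 0; i < size; ++i)
        u[i] = U(t[i]);     // widen (or narrow) one unit at a time
    return u;
}
```

Usage: `wchar_t* w = ConvertStr<wchar_t>("text");` followed by `delete[] w;` when done. Note this only byte-widens, with the same non-ASCII caveat discussed further down the thread.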

I solve my problem by doing the following:


std::wstring s2ws(const std::string& s)
{
	int len;
	int slength = (int)s.length() + 1;
	len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0);
	wchar_t* buf = new wchar_t[len];
	MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf, len);
	std::wstring r(buf);
	delete[] buf;
	return r;
}

#ifdef UNICODE
std::wstring stemp = s2ws(filename); // Temporary buffer is required
LPCTSTR L_filename = stemp.c_str();
#else
LPCTSTR L_filename = filename.c_str();
#endif
	
D3DX11CreateShaderResourceViewFromFile(device, L_filename, NULL, NULL, &texture, NULL);

I just couldn't imagine such a simple thing being that complicated.


I just couldn't imagine such a simple thing being that complicated.

And you didn't do it as simply as:


std::wstring s2ws(const std::string& s)
{   
	return std::wstring(s.begin(), s.end());
}

#ifdef UNICODE
std::wstring stemp = s2ws(filename); // Temporary buffer is required
#else
std::wstring stemp = s;
#endif
	
D3DX11CreateShaderResourceViewFromFile(device, stemp.c_str(), NULL, NULL, &texture, NULL);

as suggested, because of... ?

behold the magic trick 1:

D3DX11CreateShaderResourceViewFromFileA(device, filename.c_str(), NULL, NULL, &texture, NULL);


Notice the final A? That will give you the (char*) version of the function as opposed to the WCHAR* version of it.

If you really want to use the WCHAR* .. the code is quite easy:

wstring l_filename(filename.begin(),filename.end());
D3DX11CreateShaderResourceViewFromFile(device, l_filename.c_str(), NULL, NULL, &texture, NULL);

Stefano Casillo
TWITTER: [twitter]KunosStefano[/twitter]
AssettoCorsa - netKar PRO - Kunos Simulazioni

Here is one nice small snippet.


const wchar_t* to_wide( const std::string& strToConvert ) {
  return std::wstring( strToConvert.begin(), strToConvert.end() ).c_str();
}
 

This code is deeply broken. You're returning a pointer to memory that is freed when the function exits (because the temporary wstring will be destructed). It may seem to work, but that's the horrifying reality of undefined behavior - it often seems to work until it bites you viciously.

You are right! :)

+1 for noticing it.

My C++ is a bit rusty ... but perhaps


std::wstring toWideString(std::string & str) {
  std::wstringstream ws;
  ws << str.c_str();
  return ws.str();
} 

My C++ is a bit rusty ... but perhaps

std::wstring toWideString(std::string & str) {
std::wstringstream ws;
ws << str.c_str();
return ws.str();
}

Should work, but is still more complicated than the wstring-ctor solution offered above many times.

Here is my implementation:


		template <typename T>
			size_t SizeOfStr(T * str)
			{
				if(!str) return 0 ;
				size_t i = 0 ;
				while(str[i++]);
				return i ;
			}

		template <typename T , typename U >
			U * ConvertStr(T * t, size_t Size = 0)
			{
				if(!Size)
				{
					Size = SizeOfStr(t);
				}

				U * u = new U[Size]; 
				for(size_t i = 0 ; i < Size ; i++ ) u[i] = U(t[i]);
				return u ;
			}

Do not forget to clean resources once you are done.. smile.png

Converting character code points doesn't work like that (at least for any code points requiring multiple bytes). If the input string is UTF-8 (and has any code points requiring more than one byte), that will not convert it correctly. For example, the two-byte UTF-8 sequence 0xC4 0x82 ("Ă", U+0102) will be converted to the two wide characters 0x00C4 and 0x0082, which is not the right conversion.
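To make the failure concrete, here is a minimal sketch of what the wstring(begin, end) trick effectively does: it copies each *byte* into a wide char, so a two-byte UTF-8 sequence becomes two bogus code units instead of one. (naiveWiden is my name for the pattern, not code from the thread.)

```cpp
#include <string>

// Byte-by-byte widening, equivalent to std::wstring(s.begin(), s.end())
// modulo char signedness. Correct only for pure ASCII input.
std::wstring naiveWiden(const std::string& s) {
    std::wstring out;
    for (unsigned char c : s)  // unsigned char avoids sign-extension surprises
        out.push_back(static_cast<wchar_t>(c));
    return out;
}
```

Feeding it "\xC4\x82" (UTF-8 for U+0102) yields the two wide chars 0x00C4 and 0x0082 rather than the single character 0x0102.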

[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]

If the input string is ASCII, you can easily expand the string using the methods given above. However, if the input string is Unicode, that won't work. In Unicode a single character can consist of multiple code points, which in turn can consist of multiple bytes. Typically you don't think in characters when encoding strings, but in code points. This is because Unicode can combine characters from two code points; for example, ' and e become é (although in this particular case there is a code point for é as well). This is also why you can never compare two Unicode strings for equality without normalizing them first.

Unicode also has over a million code points (although not all are used yet), but a byte can only hold 256 values and a 16-bit wchar only 65536. In order to store all million code points, both UTF-8 and UTF-16 (although lots of people seem to forget this about the latter) encode code points in one or more units. For UTF-8 this means a single code point can take 1, 2, 3 or 4 bytes; for UTF-16 it's always 2 or 4. That means when converting, you first have to decode the code point and then encode it again; there is no direct byte-by-byte conversion possible.

You could do this yourself; it's not that hard, and writing a UTF-8 to UTF-16 converter is a good exercise. It would be useful if you start mixing the two, to actually understand how it works, rather than copy/pasting a piece of code that may or may not work, or code that will later fail (the wstring constructor trick fails horribly if your string is non-ASCII). Or look up Boost; it has some conversion stuff in it as well.
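For anyone attempting the exercise, here is a minimal sketch of such a converter: decode each UTF-8 code point, then re-encode it as UTF-16, using a surrogate pair above U+FFFF. This assumes well-formed input; a real converter must also reject malformed sequences (overlong encodings, truncated continuations, etc.).

```cpp
#include <cstdint>
#include <string>

// Decode UTF-8 code points and re-encode them as UTF-16.
// No validation of malformed input -- sketch only.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    size_t i = 0;
    while (i < in.size()) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t len;
        if      (b < 0x80) { cp = b;        len = 1; }  // 1-byte: ASCII
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }  // 2-byte sequence
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }  // 3-byte sequence
        else               { cp = b & 0x07; len = 4; }  // 4-byte sequence
        for (size_t k = 1; k < len; ++k)                // fold in continuation bytes
            cp = (cp << 6) | (in[i + k] & 0x3F);
        i += len;
        if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                                        // needs a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the two-byte sequence 0xC4 0x82 from the post above correctly becomes the single UTF-16 code unit 0x0102.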

This topic is closed to new replies.
