Optimizing char translation

Started by
5 comments, last by doynax 19 years, 3 months ago
Is it possible to optimize the following piece of code? There are many unnecessary string copies...


std::wstring ascii_cast(const char *Ascii)
{
	// Get the number of chars in the string
	size_t Length = std::strlen(Ascii);

	// Create a temporary buffer
	std::vector<wchar_t> Wide(Length + 1);

	// Convert 'Ascii' to Unicode
	MultiByteToWideChar(CP_ACP, 0, Ascii, -1, &Wide[0], (int)Length + 1);

	// Create a temporary wstring
	std::wstring WideStr(&Wide[0]);

	// Return the Unicode string
	return WideStr;
}



Doesn't std::wstring inherit from std::vector anyway, so shouldn't it be possible to convert directly into the result string? Let's say by calling the reserve() method to ensure that the string is large enough and letting MultiByteToWideChar write directly into it.
Also, it should be more efficient to pass the wstring as a reference instead of returning a copy.
If you really need to maximize performance, you could make an initial guess about the maximum string size and gradually convert more as needed. Calling strlen() on memory-mapped multi-megabyte documents is far from free.
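A minimal sketch of the "convert more as needed" idea, under some loud assumptions: widen_ascii is a hypothetical stand-in for MultiByteToWideChar (so the code builds off Windows) that handles 7-bit ASCII only, and the chunk size of 256 is arbitrary. Splitting the input at fixed offsets is only safe for single-byte encodings; on a multi-byte code page a chunk boundary could cut a character in half.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for MultiByteToWideChar so the sketch builds anywhere.
// Widens plain 7-bit ASCII only; returns the number of characters written.
static int widen_ascii(const char* src, int srcLen, wchar_t* dst, int dstLen)
{
	int n = srcLen < dstLen ? srcLen : dstLen;
	for (int i = 0; i < n; ++i)
		dst[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
	return n;
}

// Converts a NUL-terminated string chunk by chunk, so strlen() is never
// run over the whole input up front.
std::wstring ascii_cast_chunked(const char* ascii)
{
	assert(ascii != nullptr);
	const int kChunk = 256; // arbitrary chunk size chosen for this sketch
	wchar_t wide[kChunk];
	std::wstring result;

	for (;;) {
		// Find the end of this chunk: the terminator or kChunk bytes,
		// whichever comes first.
		int len = 0;
		while (len < kChunk && ascii[len] != '\0')
			++len;

		int written = widen_ascii(ascii, len, wide, kChunk);
		result.append(wide, static_cast<std::size_t>(written));

		if (len < kChunk)
			break; // reached the terminator
		ascii += len;
	}
	return result;
}
```

Whether this beats a single strlen() plus one conversion call depends on the input size, so it is worth measuring before adopting it.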
> Doesn't std::wstring inherit from std::vector anyway

Imho not.


> so shouldn't it be possible to convert directly onto the result string

I don't think there is a way to write directly to a std::string...


> Also, it should be more efficient to pass the wstring as reference instead of returning a copy.

Wouldn't the local wstring be destroyed before that?
Quote:Original post by TrueTom
Quote:Original post by doynax
Doesn't std::wstring inherit from std::vector anyway

Imho not.

Maybe not, and it's definitely not guaranteed. I just wanted to point out that it could be, since it's nothing more than a special case of a vector in most implementations anyway.

Quote:from SGI's documentation on basic_string
Note that the C++ standard does not specify the complexity of basic_string operations. In this implementation, basic_string has performance characteristics very similar to those of vector: access to a single character is O(1), while copy and concatenation are O(N).

Quote:Original post by TrueTom
Quote:Original post by doynax
so shouldn't it be possible to convert directly onto the result string

I don't think there is a way to write directly to a std::string...

Maybe not a safe way. But I seriously doubt that you'll find a non-vector implementation, so a fast hack should be possible (maybe coupled with an #ifdef just in case you find one where it doesn't work).
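For what it's worth, since C++11 the standard guarantees that std::basic_string stores its characters contiguously, so writing through &result[0] is no longer a hack. A rough sketch follows, assuming a hypothetical widen_ascii stand-in for MultiByteToWideChar (7-bit ASCII only) so it compiles off Windows. Note that it uses resize() rather than reserve(): reserve() only allocates capacity, and writing past size() is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for MultiByteToWideChar so the sketch builds anywhere.
// Widens plain 7-bit ASCII only; returns the number of characters written.
static int widen_ascii(const char* src, int srcLen, wchar_t* dst, int dstLen)
{
	int n = srcLen < dstLen ? srcLen : dstLen;
	for (int i = 0; i < n; ++i)
		dst[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
	return n;
}

// Converts straight into the result string, with no intermediate vector.
std::wstring ascii_cast_direct(const char* ascii)
{
	assert(ascii != nullptr);
	std::size_t length = std::strlen(ascii);

	// Size the string first so the elements exist before they are written.
	std::wstring result(length, L'\0');
	if (length == 0)
		return result;

	int written = widen_ascii(ascii, static_cast<int>(length),
	                          &result[0], static_cast<int>(length));

	// Trim in case the converter produced fewer characters than bytes read.
	result.resize(static_cast<std::size_t>(written));
	return result;
}
```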

Quote:Original post by TrueTom
Quote:Original post by doynax
Also, it should be more efficient to pass the wstring as reference instead of returning a copy.

Wouldn't the local wstring be destroyed before that?

Yes, and that's the problem. It creates a local copy of the string, which is later copied again into the return value, which is probably copied once more when the caller processes it.
A reference parameter would avoid this.
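A sketch of the reference-parameter variant, again with a hypothetical widen_ascii stand-in for MultiByteToWideChar (7-bit ASCII only); the name ascii_cast_into is made up for this example. Many compilers already elide the copy of a returned local, so the more reliable gain here is probably that the caller can reuse one string's buffer across many calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for MultiByteToWideChar so the sketch builds anywhere.
// Widens plain 7-bit ASCII only; returns the number of characters written.
static int widen_ascii(const char* src, int srcLen, wchar_t* dst, int dstLen)
{
	int n = srcLen < dstLen ? srcLen : dstLen;
	for (int i = 0; i < n; ++i)
		dst[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
	return n;
}

// Writes the converted text into a caller-owned string instead of returning one.
void ascii_cast_into(const char* ascii, std::wstring& out)
{
	assert(ascii != nullptr);
	std::size_t length = std::strlen(ascii);

	// resize() typically reuses the existing allocation when it is big enough.
	out.resize(length);
	if (length == 0)
		return;

	int written = widen_ascii(ascii, static_cast<int>(length),
	                          &out[0], static_cast<int>(length));
	out.resize(static_cast<std::size_t>(written));
}
```

A caller converting many strings in a loop would declare one std::wstring outside the loop and pass it to every call.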

If performance becomes a problem and you want to avoid any ugly hacks, you should consider using a custom string class instead. That way you gain a lot of flexibility. Knowing the input string's size in advance would also help to speed things up a bit (that's my main problem with C strings).
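#include <malloc.h>
#include <string>
#include <windows.h>

std::wstring ascii_cast(const char* ascii, int length)
{
	// Temporary buffer on the stack; released automatically on return
	wchar_t* buffer = reinterpret_cast<wchar_t*>(_alloca(length * sizeof(wchar_t)));

	// Convert 'length' bytes; the result is the number of wide chars written
	int written = MultiByteToWideChar(CP_ACP, 0, ascii, length, buffer, length);

	return std::wstring(buffer, written);
}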

Commentary: caller probably has the length of the ASCII text, so make them pass it in. Allocates a temporary buffer *on the stack* (which is extremely fast), converts the text (if you have the length, no need to make MultiByteToWideChar compute it all over again!), and then tries to facilitate the return value optimization.
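One caveat with the stack approach: _alloca is compiler-specific, and an unbounded length can overflow the stack. A portable sketch of the same idea uses a fixed-size stack array with a heap fallback for long inputs. The 512-character threshold is arbitrary, and widen_ascii is a hypothetical stand-in for MultiByteToWideChar (7-bit ASCII only) so the code builds off Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for MultiByteToWideChar so the sketch builds anywhere.
// Widens plain 7-bit ASCII only; returns the number of characters written.
static int widen_ascii(const char* src, int srcLen, wchar_t* dst, int dstLen)
{
	int n = srcLen < dstLen ? srcLen : dstLen;
	for (int i = 0; i < n; ++i)
		dst[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
	return n;
}

// The caller supplies the length, as in the post above.
std::wstring ascii_cast_n(const char* ascii, int length)
{
	assert(ascii != nullptr && length >= 0);
	const int kStackChars = 512; // arbitrary threshold chosen for this sketch

	if (length <= kStackChars) {
		// Short input: convert into a fixed-size buffer on the stack.
		wchar_t stackBuf[kStackChars];
		int written = widen_ascii(ascii, length, stackBuf, kStackChars);
		return std::wstring(stackBuf, static_cast<std::size_t>(written));
	}

	// Long input: fall back to the heap rather than risk a stack overflow.
	std::vector<wchar_t> heapBuf(static_cast<std::size_t>(length));
	int written = widen_ascii(ascii, length, &heapBuf[0], length);
	return std::wstring(&heapBuf[0], static_cast<std::size_t>(written));
}
```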
> A reference parameter would avoid this.

You are right, it can be avoided.

But returning the vector is shorter:

return &Wide[0];
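This works because std::wstring has a non-explicit constructor taking a const wchar_t*, so the pointer is converted implicitly at the return statement. The string still scans for the terminator and copies the characters, so it is shorter to write rather than cheaper to run. A sketch of the whole function in that style, with a hypothetical widen_ascii stand-in for MultiByteToWideChar (7-bit ASCII only):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for MultiByteToWideChar so the sketch builds anywhere.
// Widens plain 7-bit ASCII only; returns the number of characters written.
static int widen_ascii(const char* src, int srcLen, wchar_t* dst, int dstLen)
{
	int n = srcLen < dstLen ? srcLen : dstLen;
	for (int i = 0; i < n; ++i)
		dst[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
	return n;
}

std::wstring ascii_cast_short(const char* ascii)
{
	assert(ascii != nullptr);
	std::size_t length = std::strlen(ascii);

	// The vector value-initializes its elements, so wide[length] is already L'\0'.
	std::vector<wchar_t> wide(length + 1);
	widen_ascii(ascii, static_cast<int>(length), &wide[0], static_cast<int>(length));

	// const wchar_t* converts implicitly to std::wstring here.
	return &wide[0];
}
```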


Thanks for your help. Performance isn't such a problem; I just wanted to do it without wasting too many resources.
Quote:Original post by TrueTom
Thanks for your help. Performance isn't such a problem; I just wanted to do it without wasting too many resources.
Yeah, just do whatever you feel is easiest until you run into performance problems.

I'll post this anyway since I had fun writing it, maybe you'll need it someday.
typedef std::vector<char> aString;
typedef std::vector<wchar_t> uString;

bool asciiToUnicode(const aString &input, uString &output) {
	size_t iSize = input.size();
	// resize(), not reserve(): the elements must exist before they are written
	output.resize(iSize);
	if (!iSize)
		return true;
	int oSize = MultiByteToWideChar(
		CP_ACP,
		0,
		&input[0],
		(int)iSize,
		&output[0],
		(int)iSize
	);
	output.resize(oSize);
	return oSize != 0;
}

It's untested but apart from optimizing the vector allocations this should be about as fast as a MultiByteToWideChar wrapper will get.

This topic is closed to new replies.