TrueTom

Optimizing char translation


Is it possible to optimize the following piece of code? There are many unnecessary string copies...

#include <cstring>
#include <string>
#include <vector>
#include <windows.h>

std::wstring ascii_cast(const char *Ascii)
{
	// Get the number of chars in the string
	size_t Length = std::strlen(Ascii);

	// Create a temporary buffer (including room for the terminator)
	std::vector<wchar_t> Wide(Length + 1);

	// Convert 'Ascii' to Unicode
	MultiByteToWideChar(CP_ACP, 0, Ascii, -1, &Wide[0], (int)Length + 1);

	// Create a temporary wstring
	std::wstring WideStr(&Wide[0]);

	// Return the Unicode string
	return WideStr;
}



Doesn't std::wstring inherit from std::vector anyway, so shouldn't it be possible to convert directly onto the result string? Let's say by calling the reserve() method to ensure that the string is large enough and letting MultiByteToWideChar write directly into it.
Also, it should be more efficient to pass the wstring as a reference instead of returning a copy.
If you really need to maximize the performance, you could make an initial guess about the maximum string size and gradually convert more as needed. Calling strlen() on memory-mapped multi-megabyte documents is far from free.
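
A minimal sketch of the "convert directly into the result string" idea, assuming a C++11 or later standard library, where std::wstring storage is guaranteed to be contiguous. Note that resize() rather than reserve() is what makes the characters writable. widen_bytes is a hypothetical, portable stand-in with the same general shape as MultiByteToWideChar (source pointer and length, destination pointer and capacity, returns the count written); on Windows the real call would go in its place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for MultiByteToWideChar: widens each byte unchanged
// and returns the number of wide characters written (0 if dst is too small).
inline int widen_bytes(const char* src, int srcLen, wchar_t* dst, int dstLen)
{
    if (srcLen > dstLen) return 0;
    for (int i = 0; i < srcLen; ++i)
        dst[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
    return srcLen;
}

// Converts straight into the caller's string: no temporary vector and no
// second copy into a wstring.
inline void ascii_cast(const char* ascii, std::wstring& out)
{
    assert(ascii != nullptr);
    const std::size_t length = std::strlen(ascii);
    out.resize(length);          // resize, not reserve: size() must cover the writes
    if (length == 0) return;     // nothing to convert
    const int written = widen_bytes(ascii, static_cast<int>(length),
                                    &out[0], static_cast<int>(length));
    out.resize(static_cast<std::size_t>(written));  // trim if fewer were produced
}
```

The caller owns the string, so something like `std::wstring w; ascii_cast("hello", w);` fills w in place.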

> Doesn't std::wstring inherit from std::vector anyway

IMHO, no.


> so shouldn't it be possible to convert directly onto the result string

I don't think there is a way to write directly to a std::string...


> Also, it should be more efficient to pass the wstring as reference instead of returning a copy.

Wouldn't the local wstring be destroyed first?

Quote:
Original post by TrueTom
Quote:
Original post by doynax
Doesn't std::wstring inherit from std::vector anyway

IMHO, no.

Maybe not, and it's definitely not guaranteed. I just wanted to point out that it could be, since it's nothing more than a special case of a vector in most implementations anyway.

Quote:
from SGI's documentation on basic_string
Note that the C++ standard does not specify the complexity of basic_string operations. In this implementation, basic_string has performance characteristics very similar to those of vector: access to a single character is O(1), while copy and concatenation are O(N).

Quote:
Original post by TrueTom
Quote:
Original post by doynax
so shouldn't it be possible to convert directly onto the result string

I don't think there is a way to write directly to a std::string...

Maybe not a safe way. But I seriously doubt that you'll find a non-vector implementation, so a fast hack should be possible (maybe coupled with an #ifdef just in case you find one where it doesn't work).

Quote:
Original post by TrueTom
Quote:
Original post by doynax
Also, it should be more efficient to pass the wstring as reference instead of returning a copy.

Wouldn't the local wstring be destroyed first?

Yes - and that's the problem. It creates a local copy of the string, which is later copied again to the return variable, which is probably copied once more when it's later processed by the caller.
A reference parameter would avoid this.

If performance becomes a problem and you want to avoid any ugly hacks, you should consider using a custom string class instead. That way you gain a lot of flexibility. Knowing the input string's size in advance would also help to speed things up a bit (that's my main problem with C strings).
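
A rough sketch of both points, the reference parameter and the known input size. Worth noting: compilers with copy elision and C++11 move semantics often remove the return-by-value copies described above, so it pays to measure first. The per-character widening is a portable stand-in for the MultiByteToWideChar call and is only valid for 7-bit ASCII input; total_wide_length is a hypothetical caller added purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length-aware variant: the caller passes the size it already knows, so no
// strlen() pass is needed.
inline void ascii_cast(const char* ascii, std::size_t length, std::wstring& out)
{
    assert(ascii != nullptr || length == 0);
    out.resize(length);   // reuses existing capacity when it is large enough
    for (std::size_t i = 0; i < length; ++i)
        out[i] = static_cast<wchar_t>(static_cast<unsigned char>(ascii[i]));
}

// Hypothetical caller: one scratch buffer serves many conversions.
inline std::size_t total_wide_length(const std::string* items, std::size_t count)
{
    std::wstring scratch;   // grows as needed, then gets reused
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        ascii_cast(items[i].data(), items[i].size(), scratch);
        total += scratch.size();
    }
    return total;
}
```

The gain from the reference parameter shows up mostly in loops like the one above, where the same output string is filled repeatedly.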


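#include <malloc.h>
#include <string>
#include <windows.h>

std::wstring ascii_cast(const char* ascii, int length)
{
	// Temporary buffer on the stack; only suitable for reasonably short strings.
	wchar_t* buffer = static_cast<wchar_t*>(_alloca(length * sizeof(wchar_t)));
	// Use the returned count: it can be smaller than length for multibyte code pages.
	int written = MultiByteToWideChar(CP_ACP, 0, ascii, length, buffer, length);
	return std::wstring(buffer, written);
}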


Commentary: caller probably has the length of the ASCII text, so make them pass it in. Allocates a temporary buffer *on the stack* (which is extremely fast), converts the text (if you have the length, no need to make MultiByteToWideChar compute it all over again!), and then tries to facilitate the return value optimization.
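
One caveat with _alloca is that the memory comes from the stack, so a very long input can overflow it. A hedged sketch of one common compromise: a fixed-size stack array for short strings, with a heap buffer as the fallback for long ones. The 256-character threshold and the helper name widen_into are assumptions for illustration, and the byte-widening loop stands in for MultiByteToWideChar so the sketch stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Portable stand-in for the conversion call: widens each byte unchanged.
inline void widen_into(const char* src, std::size_t length, wchar_t* dst)
{
    for (std::size_t i = 0; i < length; ++i)
        dst[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
}

inline std::wstring ascii_cast(const char* ascii, std::size_t length)
{
    assert(ascii != nullptr || length == 0);
    const std::size_t kStackChars = 256;   // assumed threshold, tune as needed
    if (length <= kStackChars) {
        wchar_t small[kStackChars];        // lives on the stack, no allocation
        widen_into(ascii, length, small);
        return std::wstring(small, length);
    }
    std::vector<wchar_t> large(length);    // heap fallback for long inputs
    widen_into(ascii, length, &large[0]);
    return std::wstring(large.begin(), large.end());
}
```

Short strings keep the speed of a stack buffer, and long ones cost one extra allocation instead of risking a crash.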

> A reference parameter would avoid this.

You are right, it can be avoided.

But returning the vector is shorter:

return &Wide[0];
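
This works because std::wstring has a converting constructor that takes a const wchar_t*, which then scans for the terminating L'\0'. A small sketch of the difference between that form and the pointer-plus-length form, assuming C++11 for vector::data(); the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returning a wchar_t pointer from a function declared to return std::wstring
// invokes the converting constructor, which scans for the terminating L'\0'.
inline std::wstring from_pointer(const std::vector<wchar_t>& wide)
{
    assert(!wide.empty());   // the buffer must hold at least the terminator
    return &wide[0];         // implicit std::wstring(const wchar_t*)
}

// Passing the length explicitly skips that scan and keeps embedded nulls.
inline std::wstring from_pointer_and_length(const std::vector<wchar_t>& wide,
                                            std::size_t length)
{
    assert(length <= wide.size());
    return std::wstring(wide.data(), length);
}
```

The short form is convenient, but it walks the buffer once more to find the end of the string.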


Thanks for your help. Performance isn't such a problem; I just wanted to do it without wasting too many resources.

Quote:
Original post by TrueTom
Thanks for your help. Performance isn't such a problem; I just wanted to do it without wasting too many resources.
Yeah, just do whatever you feel is easiest until you run into performance problems.

I'll post this anyway since I had fun writing it, maybe you'll need it someday.

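#include <vector>
#include <windows.h>

typedef std::vector<char> aString;
typedef std::vector<wchar_t> uString;

bool asciiToUnicode(const aString &input, uString &output) {
	size_t iSize = input.size();
	// resize(), not reserve(): the elements must exist before they are written
	output.resize(iSize);
	if (iSize == 0) return true; // nothing to convert

	size_t oSize = MultiByteToWideChar(
		CP_ACP,
		0,
		&input[0],
		static_cast<int>(iSize),
		&output[0],
		static_cast<int>(iSize)
	);
	output.resize(oSize); // trim to what was actually written

	return oSize != 0;
}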

It's untested, but apart from optimizing the vector allocations, this should be about as fast as a MultiByteToWideChar wrapper will get.
