elih1

loop through array of wchar_t


Hello, is the following code correct? A Unicode character is not always 2 bytes long.

    #include <cwchar>

    wchar_t arr[20] = L"Hello";
    std::size_t num = std::wcslen(arr);
    for (std::size_t i = 0; i < num; i++) {
        // do something with arr[i]
    }

[Edited by - elih1 on June 4, 2009 1:53:35 AM]

There was a thread on this topic not too long ago. As far as I can recall, C++ has no notion of "Unicode"; wchar_t is a signed short, and nothing more.

Quote:
Original post by elih1
cause Unicode character is not always 2 bytes long


An encoded Unicode character is not always two bytes long. A decoded Unicode character is always an integer of some predetermined size: generally a wchar_t/short, sometimes an int.
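To make the encoded/decoded distinction concrete, here is a small sketch. It uses C++11's char16_t and char32_t string types (which postdate this thread) instead of wchar_t, so the result doesn't depend on wchar_t's platform-specific size. The names `encoded`, `decoded`, `encoded_length` and `decoded_length` are just illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so it does not fit
// in a single 16-bit unit.
// Encoded as UTF-16, it is stored as two char16_t units (a surrogate pair).
const std::u16string encoded = u"\U0001F600";
// Decoded, it is a single integer; char32_t can hold any code point.
const std::u32string decoded = U"\U0001F600";

// Number of storage units each representation uses.
std::size_t encoded_length() { return encoded.size(); }
std::size_t decoded_length() { return decoded.size(); }
```

Here `encoded_length()` is 2 while `decoded_length()` is 1: same character, different number of units.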

And to answer your question, yes, that code is correct.

[Edited by - _fastcall on June 4, 2009 2:14:06 AM]

I will assume you're using Windows, where wchar_t is 16 bits.

Whether it is correct depends on what you want to achieve.
This counts code values (16-bit UTF-16 code units), not characters.

If you want to count code points or graphemes, it's wrong.
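A rough sketch of counting code points instead of code values, assuming UTF-16 input. It uses C++11's char16_t as a stand-in for Windows' 16-bit wchar_t so it behaves the same everywhere, and the helper names are made up for illustration. It still does not handle graphemes: a base letter followed by a combining accent is counted as two code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for a high (leading) surrogate, 0xD800..0xDBFF.
bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
// True for a low (trailing) surrogate, 0xDC00..0xDFFF.
bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts code points in a UTF-16 string: a high surrogate followed by a
// low surrogate counts once; every other unit (including an unpaired
// surrogate) counts as one.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1]))
            ++i;  // skip the trailing half of the pair
        ++count;
    }
    return count;
}
```

For u"Hello" the code value count and the code point count agree (5), but u"A\U0001F600B" holds 4 code values and only 3 code points.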

Quote:
cause Unicode character is not always 2 bytes long

Unicode code points need 21 bits (the range is U+0000 to U+10FFFF).
In UTF-16, every code point is encoded as one or two code values.
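Here is how two code values combine back into one 21-bit code point, as a minimal sketch (`decode_surrogate_pair` is a hypothetical helper name). Each surrogate carries 10 bits of payload, and the result is offset by 0x10000.

```cpp
#include <cassert>

// Combines a UTF-16 surrogate pair into the code point it encodes.
// 10 bits from each half give 2^20 values, covering U+10000..U+10FFFF.
char32_t decode_surrogate_pair(char16_t high, char16_t low) {
    assert(high >= 0xD800 && high <= 0xDBFF);  // leading (high) surrogate
    assert(low >= 0xDC00 && low <= 0xDFFF);    // trailing (low) surrogate
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

For example, the pair 0xD83D 0xDE00 decodes to U+1F600, and 0xDBFF 0xDFFF decodes to U+10FFFF, the largest code point.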

Quote:
Original post by _fastcall
There was a thread on this topic not too long ago. As far as I can recall, C++ has no notion of "Unicode"; wchar_t is a signed short, and nothing more.

In standard C++, wchar_t is a distinct type from all the other integral types. Its size and whether it is signed or unsigned are both implementation-defined. Of the compilers I'm familiar with, only Borland's C++ compilers use a signed 16-bit wchar_t. Other Windows compilers, such as MSVC and the MinGW port of gcc, use an unsigned 16-bit wchar_t. On every *nix platform I've worked with, wchar_t is a 32-bit type.
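Since both properties are implementation-defined, it's easy to check what your own compiler does rather than guess. A small sketch (the function names are illustrative; constexpr needs C++11):

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Width of wchar_t in bits on this implementation
// (e.g. 16 with MSVC or MinGW, 32 on typical *nix systems).
constexpr int wchar_bits() { return static_cast<int>(sizeof(wchar_t) * CHAR_BIT); }

// Whether this implementation's wchar_t is a signed type.
constexpr bool wchar_is_signed() { return std::numeric_limits<wchar_t>::is_signed; }
```

Printing these two values on each target compiler tells you which of the cases described above you are dealing with.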

And whether or not the code is correct depends on the encoding wchar_t uses on the platform the code is compiled for and what "do something with arr[i]" actually means.
