How to store "unsigned long" character in a string?

Started by
8 comments, last by taby 16 years, 1 month ago
I'm using FreeType to convert fonts to my own bitmap font format for use with OpenGL. FreeType returns character codes as unsigned longs. My bitmap font format inherits from my texture atlas format, which contains a std::map keyed by a std::string id, mapping to the data for each of the individual images (letters, in the case of a font) in the atlas.

What I want to do is assign this unsigned long to the string, so that I can access a letter by just typing "a" instead of its numeric character code. If I directly assign the unsigned long to the std::string, it works... for letters with a character code < 255. But if the charcode is higher, it just overwrites a smaller image/letter in the std::map. (So a character with code 0x161 overwrites the one at 0x61... make sense?)

Do std::strings only allow chars of up to 255? If so, any ideas for a workaround? Thanks
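Here's a minimal sketch of what I mean (the names are made up, but this is the behaviour I'm seeing):

#include <iostream>
#include <string>

int main()
{
    unsigned long charcode = 0x161;   // a glyph code above 255
    std::string key;
    key = charcode;                   // implicitly narrowed to char: 0x161 becomes 0x61

    std::cout << key << std::endl;    // prints "a", so 0x161 and 0x61 end up with the same map key
    return 0;
}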
If you're getting Unicode characters, use std::wstring and the wide variants of the stream classes (std::wcout, etc.).
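For example, something like this (just a sketch; I'm assuming the code comes straight from FreeType as an unsigned long, and note that if your wchar_t is only 2 bytes, codes above 0xFFFF still won't fit):

#include <map>
#include <string>

struct GlyphData { /* texture coordinates, metrics, etc. */ };

int main()
{
    std::map<std::wstring, GlyphData> glyphs;

    unsigned long charcode = 0x161;                        // code above 255
    std::wstring key(1, static_cast<wchar_t>(charcode));   // one-character wide-string key
    glyphs[key] = GlyphData();                             // no longer collides with the entry for 0x61

    glyphs[L"a"] = GlyphData();                            // lookups can still use plain literals
    return 0;
}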
std::wstring looks like just what I needed. Thanks.

I remember reading about wstring once or twice, but most books/articles I've read about C++ just talk about std::string. Are there any differences I should be aware of? And what about converting between them? Is it best to avoid this, and only use either wstring or string throughout your code?

Also, is there a practical performance/memory difference? (wstring obviously must take up more space, but is it basically negligible under standard circumstances, such as having maybe a few hundred strings?)

Or are there any good articles/references on it? I'm the curious type [wink].
std::string is just a typedef for std::basic_string<char>, and std::wstring is just a typedef for std::basic_string<wchar_t>, so they behave the same way. You shouldn't have any trouble just using std::wstring everywhere.
Quote:Original post by BeauMN

And what about converting between them?


Conversions aren't recommended, since wstring uses Unicode characters (64k different ones, or more depending on the platform) and string uses ASCII characters (256 different ones). Unfortunately, neither of them explicitly defines a locale, so while you can perform a by-value conversion, the result for non-ASCII characters may be complete gibberish. There are various compiler/platform-specific conversions.

But as said, there's no reliable way to convert between them, so unless it's for debug output, prefer not to do it without an i18n library. For debugging, however, a simple conversion might work.
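Something along these lines is usually enough for that (a lossy sketch only; the function name is made up, and anything above plain ASCII is simply replaced):

#include <string>

// Debug-only, lossy conversion: anything above 0x7F is replaced with '?'.
std::string narrow_for_debug(const std::wstring& ws)
{
    std::string out;
    out.reserve(ws.size());
    for (std::wstring::size_type i = 0; i < ws.size(); ++i)
        out += (ws[i] <= 0x7F) ? static_cast<char>(ws[i]) : '?';
    return out;
}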

Quote:Also, is there a practical performance/memory difference.


Yes. It's specialized for the wchar_t type instead of char. Unfortunately, the size of wchar_t is not standardized: MSVC uses 2 bytes (short), gcc uses 4 bytes (int). This can become a problem if you want to access the characters directly as arrays, or if you're serializing them.
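You can see what your compiler uses with a one-liner like this:

#include <iostream>

int main()
{
    // Typically prints 2 with MSVC and 4 with gcc on Linux.
    std::cout << "sizeof(wchar_t) = " << sizeof(wchar_t) << std::endl;
    return 0;
}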

Quote:( wstring obviously must take up more space, but is it basically negligible under standard circumstances such as just having maybe a few hundred strings? )


No, it's not really an issue, it's just a factor of 2 or 4. And if it is, you'll have to deal with many other problems first.

Quote:Or are there any good articles/references on it? I'm the curious type [wink].


No, because it's exactly the same class template as the regular string, just instantiated with a different character type.

A couple more questions:

Does the member function size() in std::string and std::wstring return the number of characters or the number of bytes in the string? Different references I've checked say different things on this.

For saving my string data to a binary file, I've always used the size() function to get the number of bytes and data() to get a pointer to the first byte. This worked fine for std::string, since 1 char = 1 byte. But with wchar_t, will it still work? Or will I need to do size() * sizeof(wchar_t) to get the size in bytes?

Also, what do people here generally use: std::string or std::wstring? I'm asking because, right now, I only really need std::string. I could just drop the characters with a code past 255, as everything I'm making is in English. But I feel that it might be a good idea to get in the habit of using std::wstring. That way if at some point I want to have a program I've made translated into another language, the process will be easier. What are people's opinions on this?
AFAIK, size() returns the number of characters, and returns the same value as length(). So, yes, you will need to multiply by sizeof(wchar_t).
Quote:Original post by Sc4Freak
AFAIK, size() returns the number of characters, and returns the same value as length(). So, yes, you will need to multiply by sizeof(wchar_t).


Strings also have a .length(). It returns the same value as size(): the number of characters.

Quote:For saving my string data to a binary file, I've always used the size() function to get the number of bytes and the data() to get a pointer to the first byte. This worked fine for std::string as 1 char = 1 byte. But with wchars will it still work? Or will I need to do size() * sizeof(wchar) to get the size in bytes?


This would be a generic version that accepts an arbitrary string:
template <class CharType, class Traits, class Allocator>
void write(const std::basic_string<CharType, Traits, Allocator>& s)
{
    typedef typename Traits::char_type char_type;

    // write_bytes() is whatever routine writes raw bytes to the output file
    write_bytes(s.c_str(), s.size() * sizeof(char_type));
}

Although it should be noted that it's not generic enough, since it doesn't take into consideration many other aspects of string representation that are supported by basic_string.

If you need a fully generic version, you'll need to use iterators. Those might cover everything, but they are considerably slower.
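Something along these lines, for instance (a rough sketch of the iterator approach; write_bytes is the same assumed output routine as above):

#include <iterator>

template <class Iterator>
void write_each(Iterator first, Iterator last)
{
    typedef typename std::iterator_traits<Iterator>::value_type char_type;

    // One write per character: slower, but independent of the container's internal layout.
    for (Iterator it = first; it != last; ++it)
    {
        char_type c = *it;
        write_bytes(&c, sizeof(char_type));
    }
}

Called as write_each(s.begin(), s.end()), it will also accept a vector of characters or anything else with compatible iterators.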

Quote:Also, what do people here generally use: std::string or std::wstring? I'm asking because, right now, I only really need std::string. I could just drop the characters with a code past 255, as everything I'm making is in English. But I feel that it might be a good idea to get in the habit of using std::wstring. That way if at some point I want to have a program I've made translated into another language, the process will be easier. What are people's opinions on this?


- for most western languages there's no need to internationalize
- internationalization is very expensive from production perspective
- wstring is a very poor way to do i18n, there's good packages that do that
- i18n is not just Unicode. It involves either multiple compilation or run-time switching, as well as asset loading.
- wstring is costly (in a relative manner), and redundant for everything except UI (there really is no need to translate status logs into n languages)
- some projects use wstring consistently
- many projects rely heavily on C code or C with classes.
- memory constraints
Thanks for all the info!

After considering everything I decided to ditch wstring and font glyphs above 255. It doesn't seem worth the trouble for now. ASCII will work fine; no need to go overboard.

Thanks again!

This topic is closed to new replies.
