CoffeeMug

C++ locale questions


I can't seem to figure this one out. Internally, my program represents all strings as wchar_t-based objects. At some points I need to read files that may have different encodings (anything from a simple ASCII file to any other code page one can think of). Assuming I know the encoding of a file, how do I read it using istream_iterator into my wchar_t strings? I tried to use wifstream, but I keep bumping into problems. How do I tell wifstream which encoding the file is in? In particular, how do I do it for a simple ASCII file and for some other arbitrary encoding (e.g. what's the name of the simple ASCII locale)? Thanks.

Quote:
Original post by CoffeeMug
Assuming I know the encoding of a file, how do I read it using istream_iterator into my wchar_t strings?


It won't make a difference to the code of istream_iterator; you just need to give it a different character type, in this case wchar_t. The conversion from the file's external representation (bytes) to the internal one (wchar_t), and vice versa, is done by one of the facets contained in a locale: std::codecvt, or its named variant std::codecvt_byname. Assuming you really want to use istream_iterator for std::basic_string and not istreambuf_iterator, then it's still:


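#include <iterator>
#include <fstream>
#include <string>

int main() {

    std::wifstream ifs("foo");

    // ..... (imbue a different locale here, before reading, if needed)

    // istream_iterator<wchar_t, wchar_t> reads through operator>>, which
    // skips whitespace by default; use std::istreambuf_iterator<wchar_t>
    // to get the characters verbatim. The extra parentheses around the
    // first argument stop this being parsed as a function declaration.
    std::wstring s((std::istream_iterator<wchar_t, wchar_t>(ifs)),
                   std::istream_iterator<wchar_t, wchar_t>());

}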


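Use the stream's imbue() member function to make it use a different locale instance. Do this before reading anything from the stream (ideally before opening the file), because switching a file stream's conversion facet part-way through a file can have undefined behavior.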

Quote:
Original post by CoffeeMug
I tried to use wifstream, but I keep bumping into problems. How do I tell wifstream which encoding the file is in?


In most cases you just imbue the stream with a named locale object (constructed from the right name string). In other cases you'll want to explicitly set up a locale object with a std::codecvt_byname facet configured for a different conversion between the external and internal representations, and in rare cases you might need to write your own codecvt facet by deriving from std::codecvt.

You'll need to check which character encodings your compiler and C library support for std::codecvt_byname; the valid locale names are platform-specific.

Quote:
Original post by CoffeeMug
In particular, how do I do it for a simple ASCII file and for some other arbitrary encoding (e.g. what's the name of the simple ASCII locale)?


You can create locales in different ways:

classic locale - the "C" locale, effectively plain ASCII with U.S. English conventions; obtained by calling std::locale::classic().

global locale - always available; a default-constructed std::locale is a copy of the global locale. It equals the classic locale when the program starts, but any code can replace it with std::locale::global(), so don't assume it is still classic.

native locale - constructed with std::locale foo(""); this is the locale configured in the user's OS environment and may differ from the global one.

named locale - a locale set up for a specific place (plus some other attributes, such as the encoding), constructed from its name; passing the name "C" also gives the classic ASCII locale.

combined locale - a locale combining facets or categories taken from any of the above.

Luckily, Bjarne Stroustrup has made Appendix D (Locales) of "The C++ Programming Language, Special Edition" available online. It is not as detailed as the book "Standard C++ IOStreams and Locales: Advanced Programmer's Guide and Reference", of course.
