hkBattousai

Simple code for reading and writing file in UTF-16 mode


I have an input file named "Input.txt". I prepared it in Notepad++ and converted it to UTF-16 via "Notepad++ Menu > Encoding > Convert to UCS-2 LE BOM". I want to read the contents of this file line by line, then write these lines to a file named "Output.txt".

 

Here is my code:

#include <vector>
#include <fstream>
#include <string>
#include <locale>
#include <codecvt>

int wmain(int argc, wchar_t *argv[])
{
	std::vector<std::wstring> Lines;
	std::wstring Line;
	const std::wstring LINE_END = L"\n";

	std::wifstream InputFileStream(L"C:\\Users\\Administrator\\Desktop\\Test\\Input.txt", std::wifstream::in);
	InputFileStream.imbue(std::locale(InputFileStream.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));
	while (std::getline(InputFileStream, Line))
	{
		if (Line.size())	// Delete the '\r' character from the line ending mark if it exists.
		{
			if (Line.back() == L'\r')
			{
				Line.pop_back();
			}
		}
		Lines.emplace_back(std::move(Line));
	}
	InputFileStream.close();
	
	std::wofstream OutputFileStream(L"C:\\Users\\Administrator\\Desktop\\Test\\Output.txt", std::ios_base::out | std::ios_base::trunc);
	OutputFileStream.imbue(std::locale(OutputFileStream.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));
	
	for (uintmax_t i=0; i<Lines.size(); i++)
	{
		OutputFileStream << Lines[i] /*<< LINE_END*/ << std::endl;
	}
	OutputFileStream.close();
	
	return 0;
}

Input.txt

Hello World!

Variable1 = 12.345
Variable2 = "Test"

End

 

Output.txt

Hello World!??
Variable1 = 12.345????????????????????
?????

 

Input.txt (screenshot, image not preserved: xIhxBIQ.png)

Output.txt (screenshot, image not preserved: GYWuOcF.png)

 

Requirements

  • The input and output data must always be kept in UTF-16 encoding.
  • The same text must appear at the output.
  • The line ending format may change. \r\n line endings in the input file may change to \n in the output file, and vice versa.
  • There must be cyclic consistency between the input and output. If I feed the output of the (n-1)th run to the nth run as input, it must produce exactly the same output. As I stated in the previous item, the line endings may change after the first run.

 

How do I make this code run?

 


Are you sure you need to imbue the input/output streams? Since you do not want to modify the encoding that seems like one additional point of failure...

 

No, I'm not sure. I read the documentation of std::codecvt, but I didn't understand much. I can't find an online example matching my use case. I am totally stuck.


The problem seems to be std::endl outputting ASCII line endings instead of UTF-16 encoded line endings (notice the missing 00 between them, which is there in the input).

 

This shifts the rest of the string, so the code points after it are no longer aligned to word boundaries. The only reason the second line works at all is that there are two line endings after the first line, which brings the string back into alignment.
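
To make the misalignment concrete, here is a minimal sketch that models it on plain byte vectors. It assumes the extra byte comes from Windows text-mode translation (every 0x0A byte written becomes 0x0D 0x0A), which is consistent with the binary-mode advice later in the thread but is not confirmed by the original poster. The helper names EncodeUtf16Le and SimulateTextModeWrite are made up for this illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode UTF-16 code units as little-endian bytes (low byte first).
std::vector<std::uint8_t> EncodeUtf16Le(const std::u16string &Text)
{
	std::vector<std::uint8_t> Bytes;
	for (char16_t Unit : Text)
	{
		Bytes.push_back(static_cast<std::uint8_t>(Unit & 0xFF));
		Bytes.push_back(static_cast<std::uint8_t>(Unit >> 8));
	}
	return Bytes;
}

// Assumed model of a text-mode write: a 0x0D byte is inserted before every
// 0x0A byte, without any knowledge of the 16-bit code units around it.
std::vector<std::uint8_t> SimulateTextModeWrite(const std::vector<std::uint8_t> &Bytes)
{
	std::vector<std::uint8_t> Out;
	for (std::uint8_t Byte : Bytes)
	{
		if (Byte == 0x0A)
		{
			Out.push_back(0x0D);
		}
		Out.push_back(Byte);
	}
	return Out;
}
```

For the text "A\nB", EncodeUtf16Le gives 41 00 0A 00 42 00, and SimulateTextModeWrite turns that into 41 00 0D 0A 00 42 00. The byte count becomes odd, so every code unit after the line ending is read across the wrong byte pair. A second line ending inserts a second stray byte, which restores even alignment.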

 

Unfortunately I have no idea why std::endl would behave this way when you have set the locale on the stream. I would expect that to work, but I can't say for sure: I haven't used UTF-16 strings much, since I usually keep my strings in UTF-8.

Edited by Olof Hedman

I would just try to read/write the data using standard streams on char16_t instead of char while not trying to change the locale or anything else.

Well, actually I would avoid UTF-16 like the plague.
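
One way to follow the "don't touch the locale" idea is sketched below. It does not use a stream of char16_t. Instead it reads and writes raw bytes through ordinary char streams in binary mode and assembles the char16_t code units by hand, which is a different technique from what the post literally suggests. The function names ReadUtf16LeFile and WriteUtf16LeFile are hypothetical, and the sketch assumes the file is little-endian with an optional BOM.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Read a whole UTF-16LE file into code units. A leading BOM (U+FEFF) is
// dropped; a trailing odd byte, if any, is ignored.
std::u16string ReadUtf16LeFile(const std::string &Path)
{
	std::ifstream In(Path, std::ios_base::in | std::ios_base::binary);
	const std::string Bytes((std::istreambuf_iterator<char>(In)), std::istreambuf_iterator<char>());

	std::u16string Text;
	for (std::size_t i = 0; i + 1 < Bytes.size(); i += 2)
	{
		const unsigned Low = static_cast<unsigned char>(Bytes[i]);
		const unsigned High = static_cast<unsigned char>(Bytes[i + 1]);
		Text.push_back(static_cast<char16_t>(Low | (High << 8)));
	}
	if (!Text.empty() && Text.front() == u'\xFEFF')
	{
		Text.erase(0, 1);
	}
	return Text;
}

// Write code units as UTF-16LE bytes, preceded by a BOM (FF FE).
bool WriteUtf16LeFile(const std::string &Path, const std::u16string &Text)
{
	std::ofstream Out(Path, std::ios_base::out | std::ios_base::trunc | std::ios_base::binary);
	Out.put('\xFF').put('\xFE');
	for (char16_t Unit : Text)
	{
		Out.put(static_cast<char>(Unit & 0xFF));
		Out.put(static_cast<char>(Unit >> 8));
	}
	Out.flush();
	return static_cast<bool>(Out);
}
```

Because the text is copied unit for unit, line endings pass through unchanged, and feeding the output back in as input yields the same file again. Splitting the result into lines is left to the caller.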


On Windows (but preferably everywhere else too) always use binary mode file streams, because text mode file streams may replace end of line characters:

std::wofstream OutputFileStream(L"C:\\Users\\Administrator\\Desktop\\Test\\Output.txt", std::ios_base::out | std::ios_base::trunc | std::ios_base::binary);
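
Putting the binary-mode advice together with the original code, a round trip could look roughly like the sketch below. It keeps the codecvt_utf16 facet from the question, opens both streams in binary mode, writes L'\n' instead of std::endl, and handles the BOM by hand (strip U+FEFF from the first line when reading, write it first when writing). The function names ReadUtf16LeLines and WriteUtf16LeLines are hypothetical, narrow path strings are used instead of the wide paths in the question, and <codecvt> is deprecated since C++17, so treat this as an illustration rather than a definitive fix.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Same facet as in the question: UTF-16 little-endian bytes <-> wchar_t.
using Utf16LeFacet = std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>;

std::vector<std::wstring> ReadUtf16LeLines(const std::string &Path)
{
	std::vector<std::wstring> Lines;
	std::wifstream In;
	In.imbue(std::locale(In.getloc(), new Utf16LeFacet));	// imbue before any I/O
	In.open(Path, std::ios_base::in | std::ios_base::binary);

	std::wstring Line;
	bool First = true;
	while (std::getline(In, Line))
	{
		if (First && !Line.empty() && Line.front() == L'\xFEFF')
		{
			Line.erase(0, 1);	// drop the byte order mark
		}
		First = false;
		if (!Line.empty() && Line.back() == L'\r')
		{
			Line.pop_back();	// accept \r\n line endings as well
		}
		Lines.push_back(Line);
	}
	return Lines;
}

bool WriteUtf16LeLines(const std::string &Path, const std::vector<std::wstring> &Lines)
{
	std::wofstream Out;
	Out.imbue(std::locale(Out.getloc(), new Utf16LeFacet));
	Out.open(Path, std::ios_base::out | std::ios_base::trunc | std::ios_base::binary);

	Out.put(L'\xFEFF');	// encoded by the facet as the bytes FF FE
	for (const std::wstring &Line : Lines)
	{
		Out << Line << L'\n';	// one 16-bit line feed, no byte-level translation
	}
	Out.flush();
	return static_cast<bool>(Out);
}
```

Calling WriteUtf16LeLines("Output.txt", ReadUtf16LeLines("Input.txt")) would then cover the requirements in the question: \r\n endings in the input become \n in the output, and running the output through the same code again should reproduce it unchanged.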

