# Simple code for reading and writing file in UTF-16 mode



I have an input file named "Input.txt". I created it in Notepad++ and then converted it to UTF-16 via "Notepad++ Menu > Encoding > Convert to UCS-2 LE BOM". I want to read the contents of this file line by line and then write these lines to a file named "Output.txt".

Here is my code:

```cpp
#include <codecvt>
#include <cstddef>
#include <fstream>
#include <locale>
#include <string>
#include <utility>
#include <vector>

int wmain(int argc, wchar_t *argv[])
{
    std::vector<std::wstring> Lines;
    std::wstring Line;
    const std::wstring LINE_END = L"\n";

    std::wifstream InputFileStream("Input.txt");
    InputFileStream.imbue(std::locale(InputFileStream.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));
    while (std::getline(InputFileStream, Line))
    {
        if (Line.size())    // Delete the '\r' character from the line ending mark if it exists.
        {
            if (Line.back() == L'\r')
            {
                Line.pop_back();
            }
        }
        Lines.emplace_back(std::move(Line));
    }
    InputFileStream.close();

    std::wofstream OutputFileStream("Output.txt");
    OutputFileStream.imbue(std::locale(OutputFileStream.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));

    for (std::size_t i = 0; i < Lines.size(); i++)
    {
        OutputFileStream << Lines[i] /*<< LINE_END*/ << std::endl;
    }
    OutputFileStream.close();

    return 0;
}
```


Input.txt

```
Hello World!

Variable1 = 12.345
Variable2 = "Test"

End
```

Output.txt

```
Hello World!??
Variable1 = 12.345????????????????????
?????
```

*[Images: Input.txt, Output.txt]*

Requirements

• The input and output data must always be kept in UTF-16 encoding.
• The same text must appear at the output.
• The line ending format may change. \r\n line endings in the input file may change to \n in the output file, and vice versa.
• There must be cyclic consistency between the input and the output: if I feed the output of run n-1 to run n as input, run n must produce correct output that is exactly the same. As stated in the previous item, the line endings may change after the first run.

How do I make this code work correctly?

---
Are you sure you need to imbue the input/output streams? Since you do not want to modify the encoding, that seems like one additional point of failure...

---

> Are you sure you need to imbue the input/output streams? Since you do not want to modify the encoding, that seems like one additional point of failure...

No, I'm not sure. I read the documentation of std::codecvt, but I didn't understand much. I can't find an online example matching my use case. I am totally stuck.

---

The problem seems to be std::endl outputting ASCII line endings instead of UTF-16-encoded line endings (notice the missing 00 between them, which is there in the input).

This shifts the whole string after it, so the 16-bit code units are no longer aligned to 2-byte boundaries. The only reason the second line works at all is that there are two line endings after the first line, which makes the string aligned again.

Unfortunately, I have no idea why std::endl would behave this way when you have set the locale on the stream... I would expect that to work, but I don't know for sure, since I haven't used UTF-16 strings much; I usually keep my strings in UTF-8.

Edited by Olof Hedman

---
I would just try to read/write the data using standard streams on char16_t instead of char, without trying to change the locale or anything else.

Well, actually I would avoid UTF-16 like the plague.

---

On Windows (but preferably everywhere else too), always use binary-mode file streams, because text-mode file streams may replace end-of-line characters:

```cpp
std::wofstream OutputFileStream(L"C:\\Users\\Administrator\\Desktop\\Test\\Output.txt",
                                std::ios_base::out | std::ios_base::trunc | std::ios_base::binary);
```

