
Unicoding fun


Afternoon everyone,

Been a while since I updated my journal and even longer since I put anything remotely useful in it [grin]

So I gave up on that stupid D3D shader thing. No matter what I try, it seems to go wrong [headshake]. I'll give it some time and get back to it when I come back from Spain.

In the meantime I refactored the F1CM codebase to Unicode (that is, wchar_t and std::wstring). From somewhere between two and three thousand errors I'm down to a mere 73 now [oh]

I'm getting really good at breaking stuff

Anyway, the biggest headache I've had is external I/O (a.k.a. file reading/writing). Based on some googling, it seems that getting the STL to eat wide characters is a brutally painful affair, and I'm not the only one who has had to go through it. "Upgrading an STL-based application to use Unicode" is a good article on the subject.

STL is without a doubt the best and worst thing about C++ programming

So I decided to roll my own Little Endian Unicode Text File Library [rolleyes]

I've tested it against my source data and it seems to work as expected, and I'm quite content that the code is clean enough, but it's only been empirically tested by me, so who knows [grin]

For anyone who wants the code:

UnicodeUtils.h:
//=============================================================================
// FILE: UnicodeUtils.h
// AUTHOR: Jack Hoxley
// VERSION: 1.0 (16th August 2005)
//
// This source code is provided "as-is" and you use this source code
// entirely at your own risk. Whilst testing has been completed
// it is not guaranteed to be completely bug free.
//=============================================================================





// Defines
//--------
#define UNICODE
#define _UNICODE

// Standard headers
//-----------------
#include <string>


// Inclusion Guards
//-----------------
#ifndef INC_UNICODEUTILS_H
#define INC_UNICODEUTILS_H



namespace UnicodeUtils
{

bool isFileUnicode( const std::string& filename );

void LoadUnicodeFile( const std::string& filename, std::wstring& data );

void WriteUnicodeFile( const std::string& filename, const std::wstring& data );

}; // UnicodeUtils



#endif // INC_UNICODEUTILS_H



UnicodeUtils.cpp:
//=============================================================================
// FILE: UnicodeUtils.cpp
// AUTHOR: Jack Hoxley
// VERSION: 1.0 (16th August 2005)
//
// This source code is provided "as-is" and you use this source code
// entirely at your own risk. Whilst testing has been completed
// it is not guaranteed to be completely bug free.
//=============================================================================





// Defines
//--------
#define UNICODE
#define _UNICODE

// Standard Headers
//-----------------
#include <fstream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Project Headers
//----------------
#include "UnicodeUtils.h"

// Macros
//-------
#ifndef SAFE_DELETE
#define SAFE_DELETE( ptr ) {if(ptr){delete ptr;ptr=0;}}
#endif

#ifndef SAFE_DELETE_ARRAY
#define SAFE_DELETE_ARRAY( ptr ) {if(ptr){delete[] ptr;ptr=0;}}
#endif




namespace UnicodeUtils
{

// -------------------------
// i s F i l e U n i c o d e
// -------------------------
//
// AUTHOR:
// Jack Hoxley
// DATE:
// 16th August 2005
//
// DESCRIPTION:
// This utility function will examine the first two bytes of the provided
// file and return true if they indicate that it is a unicode text file.
//
// The first two bytes should be 0xFF followed by 0xFE (-1 and -2 respectively)
//
// PARAMS:
// filename - the file to examine, can be absolute or relative.
//
// NOTES:
// - This function only supports little endian unicode files.
//
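bool isFileUnicode( const std::string& filename )
{
    // Open in binary mode so the header bytes are read untranslated
    std::ifstream ifs( filename.c_str(), std::ios::binary );

    if( !ifs.is_open( ) )
    {
        return false;
    }

    // Read the two-byte header into a stack buffer, so there is
    // nothing to delete afterwards
    char header[ 2 ] = { '\0', '\0' };
    ifs.read( header, 2 );

    // A file shorter than two bytes cannot carry the header
    if( !ifs )
    {
        return false;
    }

    // Compare as unsigned so the test does not depend on whether
    // plain char is signed on this compiler. The stream closes
    // itself when it goes out of scope.
    return ( static_cast< unsigned char >( header[ 0 ] ) == 0xFF )
        && ( static_cast< unsigned char >( header[ 1 ] ) == 0xFE );
}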


// -----------------------------
// L o a d U n i c o d e F i l e
// -----------------------------
//
// AUTHOR:
// Jack Hoxley
// DATE:
// 16th August 2005
//
// DESCRIPTION:
// This utility function will load the requested file and read in the data
// interpreting each pair of bytes as a unicode character.
//
// PARAMS:
// filename - the file to read from, path can be absolute or relative
// data - storage for the data that is loaded. Will be emptied.
//
// NOTES:
// - The first parameter is narrow/ascii characters and the second is
// wide/unicode characters!
// - This function only supports little endian unicode files.
//
void LoadUnicodeFile( const std::string& filename, std::wstring& data )
{
    // Make sure we've got a clean output to work with
    data.clear( );

    // Check that we're actually attempting to load a valid unicode file
    if( !isFileUnicode( filename ) )
    {
        std::ostringstream oss;
        oss << "LoadUnicodeFile() - Specified file is not stored in little-endian unicode format."
            << std::endl
            << "File: "
            << filename;
        throw std::runtime_error( oss.str() );
    }

    // Open up the byte stream for reading
    std::ifstream ifs;
    ifs.open( filename.c_str(), std::ios::binary );

    // Only proceed if we successfully opened this file
    if( ifs.is_open( ) )
    {
        char b1 = '\0';
        char b2 = '\0';

        // Read past the standard unicode header, the isFileUnicode()
        // call made at the top of this function allows us to safely
        // ignore the contents of the first two bytes.
        ifs.read( &b1, 1 ); //0xFF
        ifs.read( &b2, 1 ); //0xFE

        // Loop through each pair of bytes in the file. Testing the reads
        // themselves stops the loop as soon as a full pair is no longer
        // available, so no stale character is appended at end of file
        // and a trailing odd byte in a corrupted file is ignored.
        while( ifs.read( &b1, 1 ) && ifs.read( &b2, 1 ) )
        {
            // Go through unsigned char so that bytes of 0x80 and above
            // are not sign-extended before being combined
            const unsigned int lo = static_cast< unsigned char >( b1 );
            const unsigned int hi = static_cast< unsigned char >( b2 );
            data += static_cast< wchar_t >( lo | ( hi << 8 ) );
        }

        // Before we leave, we need to close the file
        ifs.close( );
    }
    else
    {
        // Output an error indicating what's gone wrong...
        std::ostringstream oss;
        oss << "LoadUnicodeFile() Failed - The specified filename could not be opened."
            << std::endl
            << "File: "
            << filename;
        throw std::runtime_error( oss.str() );
    }
}



// -------------------------------
// W r i t e U n i c o d e F i l e
// -------------------------------
//
// AUTHOR:
// Jack Hoxley
// DATE:
// 16th August 2005
//
// DESCRIPTION:
// Will open the specified file for output and write the wide characters out
// in little endian format. Effectively the functional opposite of LoadUnicodeFile
//
// PARAMS:
// filename - The file to write to, will be overwritten if exists.
// data - The wide characters to write to the file
//
// NOTES:
// - The first parameter is narrow/ascii characters and the second is
// wide/unicode characters!
// - This function only supports little endian unicode files.
//
void WriteUnicodeFile( const std::string& filename, const std::wstring& data )
{
    // Open up the file to write to
    std::ofstream ofs;
    ofs.open( filename.c_str(), std::ios::binary | std::ios::out );

    // Only continue if we successfully opened the file
    if( ofs.is_open() )
    {
        char b1 = static_cast< char >( 0xFF );
        char b2 = static_cast< char >( 0xFE );

        // Write out the unicode header bytes
        ofs.write( &b1, 1 );
        ofs.write( &b2, 1 );

        // Loop through the contents of the string and write it to the file
        for( std::wstring::const_iterator it = data.begin(); it != data.end(); ++it )
        {
            // Get the character at this position
            wchar_t wc = *it;

            // Break into 2 bytes
            b1 = static_cast< char >( wc & 0xFF );
            b2 = static_cast< char >( ( wc >> 8 ) & 0xFF );

            // Write the low 8 bits out first
            ofs.write( &b1, 1 );

            // Write the high 8 bits out second
            ofs.write( &b2, 1 );
        }

        // When we've finished writing the characters out
        ofs.close( );
    }
    else
    {
        // Something went wrong, report an error
        std::ostringstream oss;
        oss << "WriteUnicodeFile() - Could not open file for output."
            << std::endl
            << "File: "
            << filename;
        throw std::runtime_error( oss.str() );
    }
}

}; // UnicodeUtils
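
For anyone who wants something a bit more systematic than eyeballing the output, a round-trip check is one option. The sketch below is not part of UnicodeUtils: encodeUtf16Le, decodeUtf16Le and roundTrips are hypothetical helpers that do the same byte splitting and joining on an in-memory std::string instead of a file, and they assume every character fits in 16 bits (no surrogate pairs).

```cpp
#include <string>

// Hypothetical helper: pack a wide string into little-endian byte pairs,
// preceded by the 0xFF 0xFE header. Assumes each character fits in 16 bits.
std::string encodeUtf16Le( const std::wstring& text )
{
    std::string bytes;
    bytes += static_cast< char >( 0xFF );
    bytes += static_cast< char >( 0xFE );

    for( std::wstring::const_iterator it = text.begin(); it != text.end(); ++it )
    {
        const unsigned int unit = static_cast< unsigned int >( *it ) & 0xFFFFu;
        bytes += static_cast< char >( unit & 0xFFu );          // low byte first
        bytes += static_cast< char >( ( unit >> 8 ) & 0xFFu ); // high byte second
    }

    return bytes;
}

// Hypothetical helper: the reverse of encodeUtf16Le. Returns false when the
// header is missing or the byte count is odd.
bool decodeUtf16Le( const std::string& bytes, std::wstring& text )
{
    text.clear( );

    if( bytes.size( ) < 2 || bytes.size( ) % 2 != 0 )
    {
        return false;
    }

    if( static_cast< unsigned char >( bytes[ 0 ] ) != 0xFF ||
        static_cast< unsigned char >( bytes[ 1 ] ) != 0xFE )
    {
        return false;
    }

    for( std::string::size_type i = 2; i < bytes.size( ); i += 2 )
    {
        // Via unsigned char, so bytes of 0x80 and above are not sign-extended
        const unsigned int lo = static_cast< unsigned char >( bytes[ i ] );
        const unsigned int hi = static_cast< unsigned char >( bytes[ i + 1 ] );
        text += static_cast< wchar_t >( lo | ( hi << 8 ) );
    }

    return true;
}

// True when encoding and then decoding gives back the original text
bool roundTrips( const std::wstring& text )
{
    std::wstring decoded;
    return decodeUtf16Le( encodeUtf16Le( text ), decoded ) && decoded == text;
}
```

Feeding it something like roundTrips( L"caf\u00E9 \u20AC" ) is a reasonable first test, since both of those non-ASCII characters produce a low byte of 0x80 or above, which is where signed char handling tends to bite.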



Depending on any feedback you guys have I might put out a message in the General Programming forum to see if anyone wants to make use of it [smile]

Till next time...