
Unicoding fun

jollyjeffers


Afternoon everyone,

Been a while since I updated my journal and even longer since I put anything remotely useful in it [grin]

So I gave up on that stupid D3D shader thing; no matter what I try, it seems to go wrong [headshake]. I'll give it some time and come back to it when I'm back from Spain.

In the meantime I refactored the F1CM codebase to Unicode (that is, wchar_t and std::wstring). From 2-3 thousand errors I'm down to a mere 73 now [oh]

I'm getting really good at breaking stuff

Anyway, the biggest headache I've had is external I/O (a.k.a. file reading/writing). Based on some googling, getting the STL to eat wide characters is a brutally painful affair, and I'm clearly not the only one who has had to suffer through it. "Upgrading an STL-based application to use Unicode" is a good article on the subject.

STL is without a doubt the best and worst thing about C++ programming

So I decided to roll my own Little Endian Unicode Text File Library [rolleyes]
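At its core, such a library is just byte ordering. A little-endian UTF-16 file starts with the byte-order mark FF FE, and then every 16-bit code unit is written low byte first. The sketch below builds those bytes in memory. `EncodeUtf16LE` is a hypothetical name and is not part of the library listed further down. The sketch assumes each wchar_t holds exactly one 16-bit code unit, which is true on Windows.

```cpp
#include <string>

// Hypothetical helper, not part of UnicodeUtils: turns a wide string into the
// exact bytes of a UTF-16LE text file (byte-order mark first).
// Assumes each wchar_t holds one 16-bit code unit, as it does on Windows;
// on platforms with a 32-bit wchar_t, values above 0xFFFF are truncated here.
std::string EncodeUtf16LE(const std::wstring& text)
{
    std::string bytes;
    bytes.reserve(2 + text.size() * 2);

    // Byte-order mark U+FEFF, written low byte first: FF FE
    bytes += static_cast<char>(0xFF);
    bytes += static_cast<char>(0xFE);

    for (wchar_t wc : text)
    {
        const unsigned long unit = static_cast<unsigned long>(wc) & 0xFFFFul;
        bytes += static_cast<char>(unit & 0xFFul);        // low byte first...
        bytes += static_cast<char>((unit >> 8) & 0xFFul); // ...then the high byte
    }
    return bytes;
}
```

You can write the returned string to a std::ofstream opened with std::ios::binary using a single write() call, instead of making two one-byte writes per character.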

I've tested it against my source data and it seems to work as expected, and I'm quite content that the code is clean enough - but it's only been empirically tested by me, so who knows [grin]

For anyone who wants the code:

UnicodeUtils.h:
//=============================================================================
// FILE: UnicodeUtils.h
// AUTHOR: Jack Hoxley
// VERSION: 1.0 (16th August 2005)
//
// This source code is provided "as-is" and you use this source code
// entirely at your own risk. Whilst testing has been completed
// it is not guaranteed to be completely bug free.
//=============================================================================





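// Inclusion Guards
//-----------------
#ifndef INC_UNICODEUTILS_H
#define INC_UNICODEUTILS_H

// Defines (guarded so a project-wide definition doesn't trigger redefinition warnings)
//--------
#ifndef UNICODE
#define UNICODE
#endif
#ifndef _UNICODE
#define _UNICODE
#endif

// Standard headers
//-----------------
#include <string>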



namespace UnicodeUtils
{

bool isFileUnicode( const std::string& filename );

void LoadUnicodeFile( const std::string& filename, std::wstring& data );

void WriteUnicodeFile( const std::string& filename, const std::wstring& data );

}; // UnicodeUtils



#endif // INC_UNICODEUTILS_H
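isFileUnicode() gives only a yes or no answer for the little-endian mark. If you ever need to tell the two UTF-16 byte orders apart, a sketch along these lines will classify the first bytes of a file. `Utf16Bom` and `DetectUtf16Bom` are hypothetical names and are not part of UnicodeUtils. A UTF-32LE file also begins with FF FE (followed by 00 00), so two bytes alone cannot rule that encoding out.

```cpp
#include <string>

// Hypothetical names, not part of UnicodeUtils.
enum class Utf16Bom { None, LittleEndian, BigEndian };

// Classifies the first two bytes of a file's contents by UTF-16 byte-order mark.
Utf16Bom DetectUtf16Bom(const std::string& bytes)
{
    if (bytes.size() < 2)
        return Utf16Bom::None;

    // Compare as unsigned char: whether plain char is signed varies by
    // compiler and platform, so comparing against -1/-2 isn't portable.
    const unsigned char first = static_cast<unsigned char>(bytes[0]);
    const unsigned char second = static_cast<unsigned char>(bytes[1]);

    if (first == 0xFF && second == 0xFE)
        return Utf16Bom::LittleEndian;
    if (first == 0xFE && second == 0xFF)
        return Utf16Bom::BigEndian;
    return Utf16Bom::None;
}
```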



UnicodeUtils.cpp:
//=============================================================================
// FILE: UnicodeUtils.cpp
// AUTHOR: Jack Hoxley
// VERSION: 1.0 (16th August 2005)
//
// This source code is provided "as-is" and you use this source code
// entirely at your own risk. Whilst testing has been completed
// it is not guaranteed to be completely bug free.
//=============================================================================





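// Defines (guarded so a project-wide definition doesn't trigger redefinition warnings)
//--------
#ifndef UNICODE
#define UNICODE
#endif
#ifndef _UNICODE
#define _UNICODE
#endif

// Standard Headers
//-----------------
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>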

// Project Headers
//----------------
#include "UnicodeUtils.h"

// Macros
//-------
#ifndef SAFE_DELETE
#define SAFE_DELETE( ptr ) {if(ptr){delete ptr;ptr=0;}}
#endif

#ifndef SAFE_DELETE_ARRAY
#define SAFE_DELETE_ARRAY( ptr ) {if(ptr){delete[] ptr;ptr=0;}}
#endif




namespace UnicodeUtils
{

// -------------------------
// i s F i l e U n i c o d e
// -------------------------
//
// AUTHOR:
// Jack Hoxley
// DATE:
// 16th August 2005
//
// DESCRIPTION:
// This utility function will examine the first two bytes of the provided
// file and return true if they indicate that it is a unicode text file.
//
// The first two bytes should be 0xFF followed by 0xFE (the little-endian byte-order mark).
//
// PARAMS:
// filename - the file to examine, can be absolute or relative.
//
// NOTES:
// - This function only supports little endian unicode files.
//
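bool isFileUnicode( const std::string& filename )
{
    // Open in binary mode so no newline translation can touch the header bytes
    std::ifstream ifs( filename.c_str(), std::ios::binary );
    if( !ifs.is_open( ) )
        return false;

    // A fixed-size local buffer (the old heap-allocated header was never deleted)
    char header[ 2 ] = { 0, 0 };
    ifs.read( header, 2 );
    if( ifs.gcount( ) != 2 )
        return false;

    // Compare as unsigned char: plain char may be signed or unsigned,
    // so testing against -1 and -2 is not portable
    return ( static_cast<unsigned char>( header[ 0 ] ) == 0xFF )
        && ( static_cast<unsigned char>( header[ 1 ] ) == 0xFE );
}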


// -----------------------------
// L o a d U n i c o d e F i l e
// -----------------------------
//
// AUTHOR:
// Jack Hoxley
// DATE:
// 16th August 2005
//
// DESCRIPTION:
// This utility function will load the requested file and read in the data,
// interpreting each pair of bytes as a little-endian unicode character.
//
// PARAMS:
// filename - the file to read from, path can be absolute or relative
// data - storage for the data that is loaded. Will be emptied.
//
// NOTES:
// - The first parameter is narrow/ascii characters and the second is
// wide/unicode characters!
// - This function only supports little endian unicode files.
//
void LoadUnicodeFile( const std::string& filename, std::wstring& data )
{
    // Make sure we've got a clean output to work with
    data.clear( );

    // Check that we're actually attempting to load a valid unicode file
    if( !isFileUnicode( filename ) )
    {
        std::ostringstream oss;
        oss << "LoadUnicodeFile() - Specified file is not stored in little-endian unicode format."
            << std::endl
            << "File: "
            << filename;
        throw std::runtime_error( oss.str() );
    }

    // Open up the byte stream for reading
    std::ifstream ifs( filename.c_str(), std::ios::binary );

    // Only proceed if we successfully opened this file
    if( !ifs.is_open( ) )
    {
        // Output an error indicating what's gone wrong...
        std::ostringstream oss;
        oss << "LoadUnicodeFile() Failed - The specified filename could not be opened."
            << std::endl
            << "File: "
            << filename;
        throw std::runtime_error( oss.str() );
    }

    // Skip the 0xFF 0xFE header; the isFileUnicode() call above
    // has already validated it.
    ifs.ignore( 2 );

    // Read one pair of bytes at a time. The stream is tested *after*
    // each read, so the failed read at end-of-file never appends a
    // leftover character and no trimming is needed afterwards.
    // A trailing odd byte from a corrupted file is ignored.
    char bytes[ 2 ];
    while( ifs.read( bytes, 2 ) )
    {
        // Go through unsigned char so bytes >= 0x80 aren't sign-extended
        const unsigned int lo = static_cast<unsigned char>( bytes[ 0 ] );
        const unsigned int hi = static_cast<unsigned char>( bytes[ 1 ] );
        data += static_cast<wchar_t>( lo | ( hi << 8 ) );
    }

    // The file is closed automatically when ifs goes out of scope
}



// -------------------------------
// W r i t e U n i c o d e F i l e
// -------------------------------
//
// AUTHOR:
// Jack Hoxley
// DATE:
// 16th August 2005
//
// DESCRIPTION:
// Will open the specified file for output and write the wide characters out
// in little endian format. Effectively the functional opposite of LoadUnicodeFile
//
// PARAMS:
// filename - The file to write to, will be overwritten if exists.
// data - The wide characters to write to the file
//
// NOTES:
// - The first parameter is narrow/ascii characters and the second is
// wide/unicode characters!
// - This function only supports little endian unicode files.
//
void WriteUnicodeFile( const std::string& filename, const std::wstring& data )
{
// Open up the file to write to
std::ofstream ofs;
ofs.open( filename.c_str(), std::ios::binary | std::ios::out );

// Only continue if we successfully opened the file
if( ofs.is_open() )
{
char b1 = -1; // Signed equivalent of 0xFF
char b2 = -2; // Signed equivalent of 0xFE

// Write out the unicode header bytes
ofs.write( &b1, 1 );
ofs.write( &b2, 1 );

// Loop through the contents of the string and write it to the file
for( std::wstring::const_iterator it = data.begin(); it != data.end(); ++it )
{
// Get the character at this position
wchar_t wc = *it;

// Break into 2 bytes
b1 = (char)(wc & 0xFF);
b2 = (char)( ( wc >> 8 ) & 0xFF );

// Write the low 8 bits out first
ofs.write( &b1, 1 );

// Write the high 8 bits out second
ofs.write( &b2, 1 );

}

// When we've finished writing the characters out
ofs.close( );

}
else
{

// Something went wrong, report an error
std::ostringstream oss;
oss << "WriteUnicodeFile() - Could not open file for output."
<< std::endl
<< "File: "
<< filename
<< std::ends;
throw std::runtime_error( oss.str().c_str() );

}

}

}; // UnicodeUtils
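Reading loops like this one have two classic pitfalls. First, a loop that checks the stream state before reading and then appends unconditionally adds one stale character at end-of-file. Second, widening a signed char before shifting corrupts any byte of 0x80 or above. The sketch below avoids both. `DecodeUtf16LE` is a hypothetical name and is not part of UnicodeUtils. It assumes 16-bit code units map directly onto wchar_t, as they do on Windows.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of UnicodeUtils: decodes UTF-16LE code units
// from a binary stream positioned just after the byte-order mark.
// Assumes each 16-bit code unit maps onto one wchar_t (true on Windows);
// surrogate pairs are passed through as two units, not combined.
std::wstring DecodeUtf16LE(std::istream& in)
{
    std::wstring result;
    char pair[2];

    // The stream is tested *after* each read, so the failed read at
    // end-of-file never appends a leftover character.
    while (in.read(pair, 2))
    {
        // Widen via unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned int lo = static_cast<unsigned char>(pair[0]);
        const unsigned int hi = static_cast<unsigned char>(pair[1]);
        result += static_cast<wchar_t>(lo | (hi << 8));
    }
    // A single trailing byte (a truncated or corrupt file) is ignored.
    return result;
}

// Convenience overload for bytes already in memory (BOM already stripped).
std::wstring DecodeUtf16LE(const std::string& bytes)
{
    std::istringstream in(bytes);
    return DecodeUtf16LE(in);
}
```

For a file, pass a std::ifstream opened with std::ios::binary after skipping the two header bytes.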



Depending on any feedback you guys have I might put out a message in the General Programming forum to see if anyone wants to make use of it [smile]

Till next time...