
binary I/O


I have a structure called an Anibox. It just plays cartoons. All it contains is the number of images (int isize), the type of Anibox (int itype), where to start looping (int iloop), and an array of integers which are the names of images loaded into OpenGL (int ipics[isize]). That's it. Just those four things. Three ints and one array of ints. (oh and also it contains the string name which is the name of itself and the file it will write to) I want to write the Anibox to a file and be able to retrieve it later. I want to write the names of the images though, not the OpenGL number which of course will be useless later. I have a separate class that holds the names of all the images loaded into OpenGL and the int that OpenGL named it. In this case, it's *picbox. picbox has a function called: string NameGrab2(int); which will return the name of the image if you send it the OpenGL name (int). So I did this:
void Anibox::Writebox()
{
   ofstream outfile;
   int ix;
   int length;
   string saver;
   char *copy;

   outfile.open(name.c_str(),ios::out|ios::binary);
   outfile.write(reinterpret_cast<char *>(&itype),sizeof(int));
   outfile.write(reinterpret_cast<char *>(&isize),sizeof(int));
   outfile.write(reinterpret_cast<char *>(&iloop),sizeof(int));
   for(ix=0;ix<isize;ix++)
   {
      saver=picbox->NameGrab2(ipics[ix]);
      length=saver.length();
      copy=new char[length+1];
      strcpy(copy,saver.c_str());
      outfile.write(reinterpret_cast<char *>(&length),sizeof(int));
      outfile.write(copy,length);
      delete[] copy;
   }
   outfile.close();
}


It works! It does exactly what I want it to do. (I think) But isn't there a better way to do it? Is there no way I can just use string.length() and the string itself directly in outfile.write? Dynamically allocating a char array just seems kind of silly.

Share this post


Link to post
Share on other sites
...oh yeah. I guess that is a char*. What am I smoking. I don't remember why I made a separate int to record length() either. Man.

Oh wait. I have to record string.length() into an int because write() wants everything cast as a char*. I mean this doesn't make any sense does it?:
outfile.write(reinterpret_cast<char *>(&picbox->NameGrab2(ipics[ix]).length()),sizeof(int));

I don't think you can reference a return value.

Share this post


Link to post
Share on other sites
Alright, here it is, new and improved. I don't see any way to get around copying length() into int length.


void Anibox::Writebox()
{
   ofstream outfile;
   int ix;
   int length;

   outfile.open(name.c_str(),ios::out|ios::binary);
   outfile.write(reinterpret_cast<char *>(&itype),sizeof(int));
   outfile.write(reinterpret_cast<char *>(&isize),sizeof(int));
   outfile.write(reinterpret_cast<char *>(&iloop),sizeof(int));
   for(ix=0;ix<isize;ix++)
   {
      length=picbox->NameGrab2(ipics[ix]).length();
      outfile.write(reinterpret_cast<char *>(&length),sizeof(int));
      outfile.write(picbox->NameGrab2(ipics[ix]).c_str(),length);
   }
   outfile.close();
}

Share this post


Link to post
Share on other sites
Okay I wrote this as the opposite of above. It reads from one of the files I outputted with the function above and reconstructs the Anibox:

bool Anibox::Spawnbox(string anifile, ImageVault *picbox)
{
   ifstream infile;
   int ix;
   string strtemp;
   int length;
   char *temp;
   bool breturn=true;

   infile.open(anifile.c_str(),ios::in|ios::binary);
   if(infile.fail()==false)
   {
      if(ipics!=NULL)
         delete[] ipics;

      name=anifile;
      infile.read(reinterpret_cast<char *>(&itype),sizeof(int));
      infile.read(reinterpret_cast<char *>(&isize),sizeof(int));
      infile.read(reinterpret_cast<char *>(&iloop),sizeof(int));
      ipics=new GLuint[isize];

      for(ix=0;ix<isize;ix++)
      {
         infile.read(reinterpret_cast<char *>(&length),sizeof(int));
         temp=new char[length+1];
         temp[length]='\0';
         infile.read(temp,length);
         strtemp="Data/Images/";
         strtemp+=temp;
         picbox->ImageLoader(strtemp,ipics[ix]);
         delete[] temp;
      }
   }
   else
   {
      breturn=false;
   }
   return breturn;
}



Again, it works perfectly. But I don't like it. Is there a way to avoid the dynamic char array again? I'm gonna read about strings but I don't know if there's a c_str() trick for reading into strings.
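One way to drop the temporary char array is to size a std::string first and let read() fill its own buffer. The following is only a sketch under a few assumptions: the helper names writeString, readString and roundTrip are made up for illustration, the length is stored as a fixed 32-bit value rather than an int, and the compiler is C++11 or later (which guarantees that a std::string's characters are contiguous).

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a 32-bit length, then the raw characters.
void writeString(std::ostream& out, const std::string& s)
{
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof length);
    out.write(s.data(), length);
}

// Hypothetical helper: read back a string written by writeString.
// Returns false if the stream runs out of data.
bool readString(std::istream& in, std::string& s)
{
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&length), sizeof length))
        return false;
    s.resize(length);                       // the string owns the buffer
    if (length > 0 && !in.read(&s[0], length))
        return false;
    return true;
}

// Round trip through an in-memory stream, standing in for the file.
std::string roundTrip(const std::string& name)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    writeString(buffer, name);
    std::string result;
    const bool ok = readString(buffer, result);
    assert(ok);
    return result;
}
```

The same two helpers would work on an ofstream and ifstream opened with ios::binary. Note that a corrupt length field makes resize() try to allocate whatever size the file claims, so real code may want an upper bound.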

Share this post


Link to post
Share on other sites
You're really terribly misusing stream objects here.

For your serialization (writing class to a sequence of bytes):

void Anibox::Writebox()
{
   ofstream outfile(name.c_str(), ios::out | ios::binary);
   outfile << itype << L"\n";
   outfile << isize << L"\n";
   outfile << iloop << L"\n";
   for(int i = 0; i < isize; ++i)
   {
      std::string thename = picbox->NameGrab2(ipics[i]);
      outfile << thename << L"\n";
   }
}






And to deserialize:

bool Anibox::Spawnbox(string anifile, ImageVault *picbox)
{
   ifstream infile(anifile.c_str(), ios::in | ios::binary);

   if(infile.fail())
      return false;

   delete[] ipics;

   name=anifile;
   infile >> itype;
   infile >> isize;
   infile >> iloop;

   ipics = new GLuint[isize];

   for(int i = 0; i < isize; ++i)
   {
      std::string temp;
      getline(infile, temp);
      temp = L"Data/Images/" + temp;
      picbox->ImageLoader(temp, ipics[i]);
   }

   return true;
}





Note that we have no need to store the name string's length anymore. Also observe that we need no temporary buffers; in general if you are writing C++ and you start using char* for anything, you are probably making a huge mistake. std::string is incredibly well supported in the C++ standard library, and chances are you can use it directly for better performance and better safety.

You may also want to think of a way to avoid hard coding that "Data/Images/" path into your Anibox code. Maybe ImageLoader() should add it internally?

Share this post


Link to post
Share on other sites
What are all those capital L's before your string literals? I've never seen those before.

I read that when doing binary I/O I was supposed to use read() and write()

I don't think there's any way for string to use read(). I guess I'll try your way.

Share this post


Link to post
Share on other sites
Here's a sample of what your write function created. I thought I tried using the << >> operators with ints but I guess I didn't. So I guess it works this way.

00x80628c4100x80628c400x80628c4water1.jpg0x80628c4water2.jpg0x80628c4water3.jpg
0x80628c4water4.jpg0x80628c4water5.jpg0x80628c4water6.jpg0x80628c4water7.jpg
0x80628c4water8.jpg0x80628c4water9.jpg0x80628c4water10.jpg0x80628c4

So is this THE WAY to do it? Serialize. ios::binary. << >> operators. '\n' as delimiter and use getline()? And every other way is the dumb way? Just making sure this is the last time I have to rewrite this.

Does anybody know what that capital L is?

Share this post


Link to post
Share on other sites
I can't make it work Apoch's way. It seems like the serializing part is working. But when I try to deserialize it doesn't work. It opens the infile okay. But then it runs these three:

infile >> itype;
infile >> isize;
infile >> iloop;

And afterwards all three are still zero. It's not reading anything in. I don't get it.

*******

I tried just declaring a string and reading >> the entire file into a string and that worked. So it's reading the file. It's just not reading the serial info into ints for some reason.

Share this post


Link to post
Share on other sites
This page seems pretty sure that << and >> are for text format and read() write() are for binary format but it doesn't say why:

http://www.parashift.com/c++-faq-lite/serialization.html#faq-36.3

I'm starting to wonder if maybe I should go with text format, anyway. I just realized all I have to do is put spaces between ints and then I can use << >>. ...I didn't know that. Spaces. That's so easy. Yeah I'll probably be going with text format. This isn't a massive project and every struct makes its own little files. I can probably master serialization some other day.

That still bugs me why I couldn't deserialize those ints. I can't figure it out.

Share this post


Link to post
Share on other sites
I can't find a flag for opening an fstream in text mode. I guess if you don't do ios::binary then it's automatically text mode.

Share this post


Link to post
Share on other sites
Quote:
Original post by icecubeflower
Does anybody know what that capital L is?


It is the prefix for wide-character string literals.

"foo" is a const char[4].
L"foo" is a const wchar_t[4].

Quote:
This page seems pretty sure that << and >> are for text format and read() write() are for binary format but it doesn't say why


On a stream, the 'text' or 'binary' flag only controls whether end-of-line character translation should be performed (text) or not (binary). The translation happens underneath the stream, so it applies to formatted and unformatted I/O alike. For example, on Win32 systems, when you write a '\n' onto a "text" file stream, it really writes "\r\n" into the file. When you read back, the "\r\n" sequence is translated back into a single '\n'. Opening a file in "binary" mode disables that translation. When you write a '\n', it'll go as a '\n' into the file.


Formatted I/O is what you do when you use << and >>. The stream framework takes the value you pass it, formats it in a human-readable format (e.g. the integer 42 becomes the character sequence '4' and '2') and writes it to the stream. Conversely, when reading, it does the inverse transformation. If you have written things without delimiters (such as '\n'), when you read back, you'll get a mess (write a 4 and then a 2 and you'll read back a 42). Robustly reading formatted data is a complicated (and sometimes complex) endeavor.

Unformatted I/O (read() and write()) takes the chunk of memory you pass it and writes it 'as is' into the stream, or takes a chunk of stream and copies it back into memory. This is what you really mean by 'binary'. The pieces of data are fixed-size (e.g. an int is always sizeof(int) bytes) so there are no overruns from one chunk into the next. The downside is, obviously, that the data is not human-readable. It is unformatted.
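To make the difference concrete, here is a small sketch that pushes the same int through both kinds of output into an in-memory stream and returns the bytes produced. The function names formattedBytes and unformattedBytes are invented for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formatted output: the value becomes human-readable digit characters.
std::string formattedBytes(int value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

// Unformatted output: the value's in-memory bytes are copied as they are.
std::string unformattedBytes(int value)
{
    std::ostringstream out(std::ios::out | std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    assert(out.good());
    return out.str();
}
```

formattedBytes(42) yields the two characters '4' and '2', while unformattedBytes(42) always yields sizeof(int) bytes whose order depends on the machine. Concatenating formattedBytes(4) and formattedBytes(2) gives "42", which is the missing-delimiter mess described above.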

Share this post


Link to post
Share on other sites
Quote:
Original post by icecubeflower
What are all those capital L's before your string literals? I've never seen those before.


I'll answer this question as it's harder to find it out if you don't know what you are looking for.

The L is a prefix that means the string or character literal is "wide" (most commonly associated with Unicode).

For example, if your current project is set to a "Multi-Byte Character Set", which you can change in the Visual Studio project properties under the General tab's Character Set entry, the generic Win32 function names resolve to their non-wide versions. You will notice this when calling Win32 functions: if you pass the wrong kind of string, you get compiler errors. For a simple example, MessageBoxA(0, L"", L"", 0); will always fail to compile, and so will MessageBoxW(0, "", "", 0);, because the A and W versions take narrow and wide strings respectively. The generic MessageBox(0, "", "", 0); compiles on a Multi-Byte Character Set project but fails to compile on a Unicode one, where the macro expands to MessageBoxW.

If you execute the code: std::cout << "Hi"; you will see "Hi" printed out. If you were to execute the code: std::cout << L"Hi";, then you would not see "Hi", you would see the address of the string since cout will only output non-wide characters and strings.

If you were to execute the code: std::wcout << L"Hi";, then you would see "Hi" since wcout is "wide cout" and since you are passing in a wide string, it works properly. Likewise, you can also use std::wcout << "Hi"; and you will see "Hi" due to how the C++ operator << coupled with the special object wcout is used. You shouldn't get errors here like you would previously.

[edit]Ninja'ed by Fruny, I so didn't see that one coming [grin]

Share this post


Link to post
Share on other sites
Quote:
Original post by icecubeflower
Does anybody know what that capital L is?
Wide strings. Each character will be two bytes instead of one.

Share this post


Link to post
Share on other sites
Quote:
Original post by icecubeflower
Here's a sample of what your write function created. I thought I tried using the << >> operators with ints but I guess I didn't. So I guess it works this way.

00x80628c4100x80628c400x80628c4water1.jpg0x80628c4water2.jpg0x80628c4water3.jpg
0x80628c4water4.jpg0x80628c4water5.jpg0x80628c4water6.jpg0x80628c4water7.jpg
0x80628c4water8.jpg0x80628c4water9.jpg0x80628c4water10.jpg0x80628c4

So is this THE WAY to do it? Serialize. ios::binary. << >> operators. '\n' as delimiter and use getline()? And every other way is the dumb way? Just making sure this is the last time I have to rewrite this.

Does anybody know what that capital L is?



My mistake - I'm used to working with wide strings so I just added the L prefixes out of habit [smile]

Removing them in your code should get the serialization half to format more nicely, and in turn should get the deserialization code to work correctly. (Just make sure that if you use something like the std::hex formatter that you use it on both ends.)

Share this post


Link to post
Share on other sites
Quote:
Original post by iMalc
Quote:
Original post by icecubeflower
Does anybody know what that capital L is?
Wide strings. Each character will be two bytes instead of one.


wchar_t's actual size is implementation-defined. It could be one (same as a char), just as it could be four. You can only portably rely on it being sizeof(wchar_t).
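A quick way to see this on your own compiler is to compare the sizes of the two literals. This is just a sketch; the function names are made up, and the only thing it relies on is that both literals hold three characters plus a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "foo" is a const char[4]: three characters plus the terminating '\0'.
std::size_t narrowLiteralBytes()
{
    return sizeof("foo");
}

// L"foo" is a const wchar_t[4], so its byte size scales with sizeof(wchar_t).
std::size_t wideLiteralBytes()
{
    const std::wstring text = L"foo";
    assert(text.size() == 3);   // size() counts characters, not bytes
    return sizeof(L"foo");
}

// Holds on every implementation, whatever the size of wchar_t is.
static_assert(sizeof(L"foo") == 4 * sizeof(wchar_t),
              "four wchar_t elements including the terminator");
```

narrowLiteralBytes() is always 4, while wideLiteralBytes() is 4 * sizeof(wchar_t), which varies between platforms.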

Share this post


Link to post
Share on other sites
Quote:

On a stream, the 'text' or 'binary' flag only controls whether end-of-line character translation should be performed

That's it? That's all it does? That doesn't make any sense to me. Wait, hold on.
Augh! That's all it does! I took out the capital L's and did Apoch's write function and got this:

0
29
19
cartoon1.jpg
cartoon2.jpg
cartoon3.jpg
cartoon4.jpg
cartoon5.jpg
cartoon6.jpg
cartoon7.jpg
cartoon8.jpg
cartoon9.jpg
cartoon10.jpg
cartoon11.jpg
cartoon12.jpg
cartoon13.jpg
cartoon14.jpg
cartoon15.jpg
cartoon16.jpg
cartoon17.jpg
cartoon18.jpg
cartoon19.jpg
cartoon20.jpg
cartoon21.jpg
cartoon22.jpg
cartoon23.jpg
cartoon24.jpg
cartoon25.jpg
cartoon26.jpg
cartoon27.jpg
cartoon28.jpg
cartoon29.jpg



I thought binary was what was making all those 00xAADFD things. Man.

That's funny. I went and redid everything in text mode and then I realized the functions I wrote were the same as Apoch's.

Except getline() wasn't working for me. The very first getline() read an empty string. And then after that it read cartoon1, cartoon2, etc. So then I switched to the >> operator and that worked. I wonder if I did it Apoch's way now in binary if getline() will work. It basically worked before, it's just in text mode somewhere between the 19, the '\n' and cartoon1 it goofed up and read an empty string.

Share this post


Link to post
Share on other sites
Quote:
Original post by icecubeflower
Quote:

On a stream, the 'text' or 'binary' flag only controls whether end-of-line character translation should be performed

That's it? That's all it does? That doesn't make any sense to me.

It's ... historical.

Quote:
I thought binary was what was making all those 00xAADFD things. Man.


If you open a text editor and read "0xAADF", it is still text. You just formatted it as hexadecimal rather than decimal. If you open a hex editor and read AA DF, then it really is unformatted binary.

The result depends on the functions you use to write the data, not on the mode you used to open the file -- though if you open the file in text mode and try to do unformatted I/O, you are asking for trouble. :)

Share this post


Link to post
Share on other sites
Apoch's deserialization still doesn't work for me. I can do it exactly the way he does it but getline() always reads an empty string on i==0. I just changed it to the >> operator and then it works fine.


bool Anibox::Spawnbox(string anifile, ImageVault *picbox)
{
   ifstream infile(anifile.c_str(), ios::in | ios::binary);

   if(infile.fail())
      return false;

   delete[] ipics;

   name=anifile;
   infile >> itype;
   infile >> isize;
   infile >> iloop;

   ipics = new GLuint[isize];

   for(int i = 0; i < isize; ++i)
   {
      std::string temp;
      //getline(infile, temp);
      infile>>temp;
      temp = "Data/Images/" + temp;
      picbox->ImageLoader(temp, ipics[i]);
   }

   return true;
}

Share this post


Link to post
Share on other sites
Yeah, another oversight on my part - when you read the last integer, it only skips past the numerals themselves in the input stream. When the next read comes along (the getline call), the next data available to be read is the \n immediately following the integer. So that gets read (which results in an empty read) and then you proceed on to the next few strings as normal.

As long as you don't have spaces in your filenames, using the >> operator should work fine.

Share this post


Link to post
Share on other sites
Hey Apoch, you said this:
Quote:

You may also want to think of a way to avoid hard coding that "Data/Images/" path into your Anibox code. Maybe ImageLoader() should add it internally?

So what's the professional way to handle file I/O? The way I have it right now I assume there's a directory called "Data" in the same directory as the executable. And in the Data directory there are other directories like "Images" and "Sounds".

project5/project5 (executable)
project5/Data/Sounds
project5/Data/Images
project5/Data/Maps
project5/Data/Anibox

But now in my code there's lots of places where I hard code "Data/Images" or whatever and load it into a string. It's a little sporadic where I do it at. Is it better form to put all that directory info into one place? Like ImageLoader ALWAYS looks for files in "Data/Images" and if it's not there then it says it can't find it? And the map loader always loads and writes to "Data/Maps" so it never needs to be passed directory info?

And then would it be even better to have a separate .h file where I #define all the directory names like

#define IMAGEDIR "Data/Images"
#define MAPDIR "Data/Maps"

Share this post


Link to post
Share on other sites
Using #defines is not really a great idea. If anything you should just use constant strings.

If you want to keep things flexible, store your directory names in a config file and read that prior to doing any other file IO.

Note that there's nothing wrong with hardcoding your paths per se; but you should centralize that information instead of scattering it all across your code.


I'd suggest just a simple solution like this:

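namespace FileLocations
{
   // The pointers themselves are const too, so this can live in a shared header
   // without causing multiple-definition errors at link time.
   const char* const PathToImages = "path/to/images/";
   const char* const PathToAudio = "path/to/audio/";
   // etc.
}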



Then you can access the path definition with a simple FileLocations::PathToFoo, which is clear, self-explanatory, and avoids needless repetition of the path information.

Share this post


Link to post
Share on other sites
Quote:
Original post by icecubeflower
Quote:

On a stream, the 'text' or 'binary' flag only controls whether end-of-line character translation should be performed

That's it? That's all it does? That doesn't make any sense to me.


Like Fruny says, it's "historical". There actually is one more difference on some platforms: in text mode, a specific byte may be interpreted as an "end of file" marker. Under Windows, this is the byte with value 26 (control-Z). Linux makes no distinction between text and binary mode, so a byte with value 4 inside a file is just data; typing control-D at a terminal does signal end of input, but that is handled by the terminal rather than by a marker in the stream. You can, incidentally, trigger "end of file" on std::cin from the keyboard with control-Z on Windows and control-D on Linux. (The control key plus A through Z are mapped to characters 1 through 26, oddly enough. And if you want to annoy people (including yourself), try writing character 7 to std::cout multiple times.)

Quote:
I thought binary was what was making all those 00xAADFD things. Man.


All files are composed of bytes. They do not "contain" text nor numbers nor anything else. That is an interpretation we impose upon them. Each byte, if you're on a sufficiently "normal" computer and are not using Unicode, is an 8-bit integral value that is interpreted as a character of text using the 'ASCII' mapping.

Consider the abstract concept of the number sixty-five. This is a small enough number to represent in 8 bits, so there is a single byte with that value. A file could well contain a byte with that value, and if you looked at it with a text editor, you would see the character 'A'. If you looked at it in a hex editor, you would see the value 41, which is sixty-five written in hexadecimal (base sixteen) - hence "hex". The hex editor would display the character for the digit symbol '4' and the character for the digit symbol '1', because that's how it's programmed.

That file could be meant to represent the text "A". But it could also be meant to represent the number sixty-five. The interpretation is up to whatever program is reading the file.

You could also, however, have a file which contains the two bytes that represent the text "65", which would then be interpreted by a person reading the file as the number sixty-five. If you open this file in a text editor, you see the character '6' followed by the character '5', making the text "65" (which happens to be something we interpret as a number) on screen. If you open it in a hex editor, you see the characters '3635', where '36' represents the first byte (value 54, written in hex) and '35' represents the second byte (value 53, written in hex).

Now, when you read in from a file using operator>>, with an integer type variable as the destination (short, int, long, or the unsigned variations of those), the resulting code attempts to read several bytes from the current file position "as a human would". That is, it would read the byte '6' (with value 54) and the byte '5' (with value 53) from the second file, and interpret those as digits 6 and 5 of a base ten number, and store the value 65 in the variable. It would fail on the first file, because the symbol 'A' is not used for writing numbers in base ten.

The behaviour of operator>> depends on the type of the destination variable. This is a deliberate design decision made so that the operator can "do the right thing" and interpret "human-readable" (a term often confused with "text") data. However, for all primitive types (and you'll need to remember this for later), the operator will skip past any "leading" whitespace at the current position in the stream, but leave any "trailing" whitespace behind, if a value is successfully read.

If you read into a 'char', for example, the stream will skip any whitespace, and then (assuming there is any data left in the source at all) read the first non-whitespace byte of the file into the variable. (Remember, a char is a byte, even when a char is not 8 bits - i.e. a byte is not necessarily 8 bits in C++! That is only a minimum value; bytes are allowed to be larger. But "on sufficiently normal systems", they are not larger.)

Quote:
Except getline() wasn't working for me. The very first getline() read an empty string. And then after that it read cartoon1, cartoon2, etc.


std::getline() reads up to the first delimiter ('\n', by default) that it finds in the file. An empty string is a perfectly valid line of text - an empty line. Thus, if the stream is at a point where the very next character is '\n', then an empty line is read.

Now, consider what happens if your file contains a human-readable number, immediately followed by '\n', and then you use operator>> to read the human-readable number. The trailing whitespace is left alone, and '\n' is a kind of whitespace, so the stream is now at a point where the very next character is '\n'. :)
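The effect is easy to reproduce with an in-memory stream. In this sketch (the function names are invented for illustration), the first function mixes operator>> and std::getline() directly, and the second discards the leftover whitespace with the standard std::ws manipulator before reading the line.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads an int with >> and then a line, with nothing in between.
// The '\n' left behind by >> makes getline return an empty line.
std::string naiveLineAfterInt(const std::string& text)
{
    std::istringstream in(text);
    int number = 0;
    in >> number;
    assert(in);                 // the example input starts with a number
    std::string line;
    std::getline(in, line);
    return line;
}

// Same, but std::ws eats the pending whitespace before getline runs.
std::string lineAfterIntSkippingWhitespace(const std::string& text)
{
    std::istringstream in(text);
    int number = 0;
    in >> number;
    assert(in);
    std::string line;
    std::getline(in >> std::ws, line);
    return line;
}
```

Given the text "19\ncartoon1.jpg\n", the first function returns an empty string and the second returns "cartoon1.jpg". One caveat: std::ws also swallows blank lines and leading spaces on the next line, which may or may not be what you want.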

Quote:
So then I switched to the >> operator and that worked.


The operator>>, when told to read into a string (either char* or std::string), will only read up to the next whitespace. If you want to read multiple words, it will not work.
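A short sketch of that behaviour, using a made-up helper that collects everything operator>> hands back from a piece of text:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Extracts strings with >> until the input runs out.
// Each extraction stops at the next whitespace character.
std::vector<std::string> splitWithExtraction(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    assert(in.eof());           // string extraction only fails at end of input
    return words;
}
```

A file name such as "my picture.jpg" comes back as two separate strings, "my" and "picture.jpg", which is why >> is only safe for names without spaces.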

Quote:
I wonder if I did it Apoch's way now in binary if getline() will work.


Whether the file is binary or not has almost no bearing on what will happen when you call std::getline(). The only thing that happens is that in text mode, \r\n sequences in the file data will be interpreted as if the file actually contained \n (so the lines that you read in will not contain \r).

Quote:
It basically worked before, it's just in text mode somewhere between the 19, the '\n' and cartoon1 it goofed up and read an empty string.


I still don't think you were understanding this properly (streams don't spontaneously change mode, which is what it sounds like you're saying), but hopefully the above has cleared things up.




If you want to avoid the problem with mixing operator>> and std::getline(), the standard recommendation is to:

1) Always use std::getline() for initial reads from the file; and then
2) If you need to extract human-readable numbers (or other stuff) out of a line, construct a std::stringstream object from the read-in line, and call operator>> on the stringstream.

This also gains you some robustness in the face of corrupt data. (Axiom: input is always harder to do than output. Of course, the part in between is usually even harder. :) ) If a line is supposed to contain a number but doesn't, the std::stringstream will "fail", but the file stream is unaffected.
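Here is a rough sketch of that recommendation applied to a file laid out like the one in this thread: three numbers, one per line, followed by one image name per line. The Header struct and the function names are assumptions made for the example, not part of the original Anibox code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the three ints stored in an Anibox file.
struct Header
{
    int type;
    int size;
    int loop;
};

// Step 1: read a whole line. Step 2: parse it with a separate stringstream.
// A malformed line fails the stringstream only; the file stream stays usable.
bool readIntLine(std::istream& in, int& value)
{
    std::string line;
    if (!std::getline(in, line))
        return false;
    std::istringstream fields(line);
    return static_cast<bool>(fields >> value);
}

// Reads the header and then header.size names, one per line.
bool readAnimation(std::istream& in, Header& header,
                   std::vector<std::string>& names)
{
    if (!readIntLine(in, header.type) ||
        !readIntLine(in, header.size) ||
        !readIntLine(in, header.loop))
        return false;
    if (header.size < 0)
        return false;

    names.clear();
    for (int i = 0; i < header.size; ++i)
    {
        std::string line;
        if (!std::getline(in, line))
            return false;
        names.push_back(line);  // whole line, so spaces in names survive
    }
    assert(names.size() == static_cast<std::size_t>(header.size));
    return true;
}
```

Because every read goes through std::getline(), there is no leftover '\n' to trip over, and names containing spaces are read intact.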

Of course, that doesn't work so well if you have an operator>> overload for a class that expects to read multiple lines ;) In general, you have to Think(TM). Sorry! :)

Share this post


Link to post
Share on other sites