Convenient way to read data from binary files (source included)

9 comments, last by Void 16 years, 2 months ago
I have been working on a piece of code that reads data from a binary file using the std::ifstream class. While working on the code, I couldn't help but feel that there must be a more convenient way to work with binary files, so I came up with this little class:

class BinaryFile : public std::ifstream {
public:
	explicit BinaryFile(const std::string &name):std::ifstream(name.c_str(), std::ios::binary) {}

	void readChunk(void *data, int size) {
		std::ifstream::read((char*)data, size);
	}

	// This hides the base class version of read()
	template <class T>
	void read(T &data, int count = 1) {
		readChunk(&data, count * sizeof(T));
	}

	template <class T>
	void read(std::vector<T> &vec, int count) {
		vec.resize(count);
		read(vec[0], count);
	}
};


Using this class, the following code:

ifstream file("file.dat", ios::binary);
int i;
file.read((char*)&i, sizeof(int));
float f[50];
file.read((char*)&f, 50 * sizeof(float));
vector<double> vec;
vec.resize(20);
file.read((char*)&vec[0], 20 * sizeof(double));
becomes this:

BinaryFile file("file.dat");
int i;
file.read(i);
float f[50];
file.read(f, 50);
vector<double> vec;
file.read(vec, 20);
Much more convenient and less error-prone. Hopefully this code will be useful to someone. Any comments are appreciated. Also, is there a library that achieves something similar?

EDIT: Ironically, I just now thought about looking at boost.serialization, which seems to have the benefits of my class, and much more. I suppose the only possible advantages of my class are simplicity and shorter compile times (I heard that boost.serialization significantly increases compile times). Of course, boost.serialization achieves a completely different goal (namely, serialization), so the comparison is rather stupid, but for the simple task of reading binary files, the above advantages may be relevant. Still, any comments are appreciated, and I still hope that someone will find this useful.

[Edited by - Gage64 on February 7, 2008 7:34:16 AM]
That's actually a somewhat dangerous way for read() to work. If you want to read into an array, passing a reference to the first element is counterintuitive. Also, a non-destructive read into containers would be nicer.

Also, I would accept char* in readChunk, and do the cast when passing to it... void*, just say no. Actually, then you can just use the base version. It's hidden, you say? That's why you use it. ;)

Off the top of my head:

struct BinaryFile : std::ifstream {
	explicit BinaryFile(const std::string &name):std::ifstream(name.c_str(), std::ios::binary) {}

	using std::ifstream::read;

	template <typename T>
	void read(T& data) {
		read(reinterpret_cast<char*>(&data), sizeof(T));
	}

	// Er, I *think* this template will be preferred for pointer types...
	// I hope you weren't hoping to read address values from file ;)
	template <typename T>
	void read(T* data, int count = 1) {
		read(reinterpret_cast<char*>(data), count * sizeof(T));
	}

	// The element type is taken from the container so that it can be deduced
	template <typename C>
	void appendTo(C& container, int count) {
		// my <algorithm>-fu fails me here :(
		for (int i = 0; i < count; ++i) {
			typename C::value_type data;
			read(data);
			container.push_back(data);
		}
	}

	// Specialized for performance...
	template <typename T>
	void appendTo(std::vector<T>& container, int count) {
		int oldsize = container.size();
		container.resize(oldsize + count);
		read(&(container[oldsize]), count);
	}
};

1) I explicitly avoided the using statement because I don't want the base class version to be available (that's why I added the comment). I don't really have a strong reason for this, I just preferred the readChunk() method.

2) Using a void* instead of a char* means the user doesn't have to cast to char*, which makes it more convenient to use. In this case, I don't think using char* will make it any safer.

3) The function that takes a T* doesn't allow you to read an array without specifying its size (because the parameter is a T* and not a T[known size]). Also, I don't understand what you mean by "passing the first element of the array". You pass the array's address (i.e., its name), just like you would with the base class version of read() (see the source snippet in my post).
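For what it's worth, the array-size problem in point 3 can be sidestepped by taking the array by reference, so the compiler deduces the element count. This is only a sketch: readArray and demoReadArray are made-up names, and it reads from a plain std::istream rather than the BinaryFile class so that it stands on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the class posted above: N is deduced from
// the array type, so the caller cannot pass a mismatched element count.
template <class T, std::size_t N>
void readArray(std::istream &in, T (&arr)[N]) {
	in.read(reinterpret_cast<char*>(arr),
	        static_cast<std::streamsize>(N * sizeof(T)));
}

// Usage sketch: an in-memory stream stands in for the file.
inline bool demoReadArray() {
	std::istringstream src(std::string("\x01\x02\x03\x04", 4));
	unsigned char bytes[4] = {0, 0, 0, 0};
	readArray(src, bytes);  // N is deduced as 4, no size argument needed
	assert(src.gcount() == 4);
	return bytes[0] == 1 && bytes[3] == 4;
}
```

A pointer argument will not match this overload, so reading into dynamically allocated memory would still need the version that takes an explicit count.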
Your code isn't endian-safe. It will run without complaint, but it will produce wrong values when a file written on a platform with one byte order is read on a platform with a different one.
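One common way around the byte-order issue is to fix the order in the file format and assemble each value from individual bytes. The sketch below assumes the file stores 32-bit integers least-significant byte first and that a C++11 compiler is available for <cstdint>; readU32LE and demoReadU32LE are hypothetical names, not something from the original class.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a 32-bit unsigned integer that was stored
// least-significant byte first, independent of the host's byte order.
// Returns false if fewer than four bytes could be read.
inline bool readU32LE(std::istream &in, std::uint32_t &out) {
	unsigned char b[4];
	if (!in.read(reinterpret_cast<char*>(b), 4))
		return false;
	out = static_cast<std::uint32_t>(b[0])
	    | (static_cast<std::uint32_t>(b[1]) << 8)
	    | (static_cast<std::uint32_t>(b[2]) << 16)
	    | (static_cast<std::uint32_t>(b[3]) << 24);
	return true;
}

// Usage sketch: an in-memory stream stands in for the file.
inline std::uint32_t demoReadU32LE() {
	std::istringstream src(std::string("\x78\x56\x34\x12", 4));
	std::uint32_t value = 0;
	bool ok = readU32LE(src, value);
	assert(ok);
	(void)ok;  // silences the unused warning when assertions are disabled
	return value;
}
```

Because the shifts work on values rather than on memory layout, the same code gives the same result on little-endian and big-endian hosts.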

There is no error checking. You'll happily be reading, blissfully ignorant of the fact that read is failing. After you're done, everything will be fine, but your variables will contain garbage.
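As a rough illustration of the missing check, a read helper can report whether the stream is still good afterwards. tryRead and demoTryRead are made-up names, T is assumed to be a plain-data type, and the sketch works on any std::istream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads sizeof(T) raw bytes into value and reports
// whether the whole object was filled.
template <class T>
bool tryRead(std::istream &in, T &value) {
	in.read(reinterpret_cast<char*>(&value), sizeof(T));
	// read() sets failbit when fewer bytes than requested were available.
	return !in.fail();
}

// Usage sketch: the stream holds two bytes, so the second read comes up short.
inline bool demoTryRead() {
	std::istringstream src(std::string("\x01\x02", 2));
	unsigned char first = 0;
	unsigned char tooBig[8] = {0};
	bool firstOk = tryRead(src, first);    // one byte requested: succeeds
	assert(first == 1);
	bool secondOk = tryRead(src, tooBig);  // eight bytes requested: fails
	return firstOk && !secondOk;
}
```

Whether to return a bool, throw, or enable stream exceptions with exceptions() is a matter of taste; the point is only that the caller finds out.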

Quote:// my <algorithm>-fu fails me here :(


Inserters are useful for generic code. Unfortunately, not every container supports every inserter (std::back_inserter needs push_back(), for instance), and for certain containers they can be somewhat less efficient than resizing once and reading in bulk.

For containers, you can also use begin() and end() to read ranges. It depends on style.
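The inserter approach might look roughly like the following, using std::generate_n with std::back_inserter in place of the hand-written loop. RawReader, this free-function appendTo and demoAppend are hypothetical names for illustration, and the element type is assumed to be plain data.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <list>
#include <sstream>
#include <string>

// Hypothetical functor: each call pulls one T out of the stream as raw bytes.
template <class T>
struct RawReader {
	explicit RawReader(std::istream &in) : in_(&in) {}
	T operator()() const {
		T value = T();
		in_->read(reinterpret_cast<char*>(&value), sizeof(T));
		return value;
	}
	std::istream *in_;
};

// Appends count elements to any container that provides push_back().
template <class C>
void appendTo(std::istream &in, C &container, int count) {
	typedef typename C::value_type T;
	std::generate_n(std::back_inserter(container), count, RawReader<T>(in));
}

// Usage sketch: existing elements are kept and new ones are added at the end.
inline std::list<unsigned char> demoAppend() {
	std::istringstream src(std::string("\x0A\x0B\x0C", 3));
	std::list<unsigned char> bytes(1, 0xFF);
	appendTo(src, bytes, 3);
	assert(bytes.size() == 4);
	return bytes;
}
```

This reads one element per call, so for std::vector the resize-and-read-once version is still likely to be faster.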
BinaryFile file("file.dat");
std::string s;
file.read(s); // boom
Everything said so far is true, but I just want to say that this class was created for the very specific purpose of simplifying code that reads binary files (such as a model loader, which is what I was using it for), therefore I didn't think about things like being endian-safe or reading into std::strings because I didn't need it (I don't have an excuse for omitting error checking :).

So, as you have pointed out, this class is basically a toy, but one that can be useful in some limited situations. My loading function has become much cleaner since I converted to this class, and I plan on using it in the future. I still think it can be useful despite its limitations, mostly because of its simplicity.

If you have any more comments or suggestions for improvement, I would be happy to hear them.
At the very least you should add a BOOST_STATIC_ASSERT(boost::is_pod<T>::value) to your read function.

Also, consider refactoring your class around a non-member function interface to ease composition for additional types.
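A compile-time guard of the kind suggested here could be sketched as follows. To keep the example free of external dependencies it assumes a C++11 compiler and swaps in the standard static_assert and std::is_trivially_copyable for the Boost macro and trait; readRaw, Vertex and demoReadRaw are made-up names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical free function. The standard trait stands in for the Boost
// check mentioned above; it rejects types that cannot be copied byte by byte.
template <class T>
void readRaw(std::istream &in, T &data) {
	static_assert(std::is_trivially_copyable<T>::value,
	              "readRaw() only supports types that can be copied byte by byte");
	in.read(reinterpret_cast<char*>(&data), sizeof(T));
}

struct Vertex { float x, y, z; };  // a simple struct is accepted

// Usage sketch: round-trips a Vertex through an in-memory stream.
inline bool demoReadRaw() {
	Vertex written = {1.0f, 2.0f, 3.0f};
	std::string bytes(reinterpret_cast<const char*>(&written), sizeof(Vertex));
	std::istringstream src(bytes);
	Vertex loaded = {0.0f, 0.0f, 0.0f};
	readRaw(src, loaded);
	assert(!src.fail());
	// std::string s; readRaw(src, s);  // would be rejected at compile time
	return loaded.x == 1.0f && loaded.y == 2.0f && loaded.z == 3.0f;
}
```

With a guard like this, the std::string case shown earlier in the thread turns into a compile error instead of a crash at run time.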
Quote:Original post by SiCrane
At the very least you should add a BOOST_STATIC_ASSERT(boost::is_pod<T>::value) to your read function.


That looks very cool [smile], but does is_pod work with non built-in types (i.e., simple structs)? If so, how can the template identify something like that? (just out of curiosity)

Quote:Also, consider refactoring your class around a non-member function interface to ease composition for additional types.


That sounds like a good idea, but I'm not sure how to do that. Did you mean something like a global function template that can be specialized for specific types?
Quote:Original post by Gage64
That looks very cool [smile], but does is_pod work with non built-in types (i.e., simple structs)? If so, how can the template identify something like that? (just out of curiosity)
You might take a look at the documentation for is_pod<>. It includes some information on what types is_pod<> can and cannot accurately classify, and under what conditions.
Regular function overloading would probably be best. The idea is to create a unified extensible interface that can be used by clients. For example, let's say you removed the read() member functions. You could instead specify the interface like:

void read(BinaryFile & bf, char & data) {
  bf.readChunk(&data, sizeof(char));
}

void read(BinaryFile & bf, short & data) {
  bf.readChunk(&data, sizeof(short));
}

//... more definitions for primitive types.

Now you can build read functions for new objects in terms of primitives. Ex:
template <typename T1, typename T2>
void read(BinaryFile & bf, std::pair<T1, T2> & data) {
  read(bf, data.first);
  read(bf, data.second);
}

template <typename T>
void read(BinaryFile & bf, std::vector<T> & data) {
  size_t elements;
  read(bf, elements);
  data.resize(elements);
  for (size_t i = 0; i < elements; i++) {
    read(bf, data[i]);
  }
}

If you didn't inherit publicly from std::ifstream you could even rename the read() functions operator>>().
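To make the operator>>() idea concrete, here is a minimal sketch in which the reader holds a stream by reference instead of deriving from std::ifstream, so the formatted extraction operators of the base class never come into play. BinaryReader, ok() and demoExtract are hypothetical names and only two primitive overloads are shown.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper: composition instead of public inheritance.
class BinaryReader {
public:
	explicit BinaryReader(std::istream &in) : in_(in) {}
	void readChunk(char *data, std::streamsize size) { in_.read(data, size); }
	bool ok() const { return !in_.fail(); }
private:
	std::istream &in_;
};

// One overload per primitive type, in the same spirit as the read() functions.
inline BinaryReader &operator>>(BinaryReader &br, char &data) {
	br.readChunk(&data, sizeof(char));
	return br;
}

inline BinaryReader &operator>>(BinaryReader &br, unsigned char &data) {
	br.readChunk(reinterpret_cast<char*>(&data), sizeof(unsigned char));
	return br;
}

// Usage sketch: returning the reader by reference allows chained extraction.
inline bool demoExtract() {
	std::istringstream src(std::string("AB", 2));
	BinaryReader br(src);
	char first = 0;
	unsigned char second = 0;
	br >> first >> second;
	assert(br.ok());
	return first == 'A' && second == 'B';
}
```

Overloads for composite types such as std::pair or std::vector could then be written in terms of these, just like the read() versions.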
