Lode

Why was the way to load files made the way it is in C++?


If you want to read or write a file with C++, what most people probably want is functionality like string or vector = loadFile("filename.txt"), writeFile(string or vector, "filename.txt"), appendToFile(string or vector, "filename.txt") and getFileSize("filename.txt"). Why doesn't the C++ standard define functions like that, instead of the clumsy thing it has now, with weird class names and an unintuitive way to get the size of a file? Or why wasn't something like this at least added on top of the existing system? I see no argument against more intuitive ways to get such things done.

Quote:
Original post by Lode
If you want to read or write a file with C++, what most people probably want is functionality like string or vector = loadFile("filename.txt"), writeFile(string or vector, "filename.txt"), appendToFile(string or vector, "filename.txt") and getFileSize("filename.txt").

Why doesn't the C++ standard define functions like that, instead of the clumsy thing it has now, with weird class names and an unintuitive way to get the size of a file? Or why wasn't something like this at least added on top of the existing system? I see no argument against more intuitive ways to get such things done.


Not everyone is using text files... How would that work with binary saved files? Or how would you make a binary file in the first place?

EDIT: Wrong word [embarrass]


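// Needs <fstream>, <iterator> and <vector>.
// Reads whitespace-separated integers from the file into a vector.
// The extra parentheses keep this from being parsed as a function declaration.
std::ifstream ifs("listofnumbers.txt");
std::vector<int> numbers( (std::istream_iterator<int>(ifs)),
                          (std::istream_iterator<int>()) );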

If you want to create a wrapper function to give it an intuitive name, then go right ahead.
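One possible wrapper along those lines, as a minimal sketch. The name loadNumbers and the decision to throw when the file cannot be opened are assumptions made for illustration; nothing like this exists in the standard library.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper: reads whitespace-separated integers from a text file.
// Reading stops at end of file or at the first token that is not an integer.
std::vector<int> loadNumbers(const std::string& filename)
{
    std::ifstream ifs(filename.c_str());
    if (!ifs)
        throw std::runtime_error("could not open " + filename);

    return std::vector<int>(std::istream_iterator<int>(ifs),
                            std::istream_iterator<int>());
}
```

A call then reads as std::vector<int> numbers = loadNumbers("listofnumbers.txt");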

std::vector<T> vector = loadFile("filename.txt") would never work without passing lots of policies to loadFile. How should a string be converted to T? What defines the end of an element? How are errors handled? And so on. We already have something like writeFile: it's called operator<<, and it is used because it fits with the rest of the iostream library. If you had a writeFile function you wouldn't be able to do polymorphism properly. We don't write a vector directly because how should it be written? What about a string? Should it be zero-terminated or should it store its own length? Zero termination could lead to problems when reading files, and prefixing the length of the string could hurt readability. If you want functionality like this you could do something like the following (this is just to show the idea; treat it as an untested sketch):

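// Needs <ostream> and <vector>.
template<class T>
class writeable_vector : public std::vector<T>
{
public:
    // Writes the elements separated by ", ", with no trailing separator.
    friend std::ostream& operator<<(std::ostream& stream, const writeable_vector<T>& out)
    {
        if( out.empty() )
            return stream;
        // Declared outside the loop so the last element can be written after it.
        typename std::vector<T>::const_iterator iter = out.begin();
        for( ; iter != (out.end()-1); ++iter )
        {
            stream << *iter << ", ";
        }
        stream << *iter;
        return stream;
    }
};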



We don't need appendToFile either; we just use operator<< on a file that was opened for appending. We also don't have getFileSize because C++ has no way to get information about the file system, which that would require; it can only open and close files, and read from and write to them.
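As a rough sketch of what that looks like in practice: appending is a matter of opening the stream with std::ios::app, and a size can be obtained on common platforms by opening in binary mode, seeking to the end and asking for the position. The helper names appendLine and sizeBySeeking are made up for this example. Later standards did add file system queries (C++17 has std::filesystem::file_size), but that postdates this discussion.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: appends one line of text to a file,
// creating the file if it does not exist yet.
bool appendLine(const std::string& filename, const std::string& text)
{
    std::ofstream out(filename.c_str(), std::ios::out | std::ios::app);
    if (!out)
        return false;
    out << text << '\n';
    return out.good();
}

// Hypothetical helper: reports the stream position at the end of the file,
// which matches the size in bytes on common platforms. Returns -1 on failure.
std::streamoff sizeBySeeking(const std::string& filename)
{
    std::ifstream in(filename.c_str(),
                     std::ios::in | std::ios::binary | std::ios::ate);
    if (!in)
        return -1;
    return std::streamoff(in.tellg());
}
```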

I don't think your functions are any more intuitive.

Guest Anonymous Poster
Quote:
Original post by Lode
If you want to read or write a file with C++, what most people probably want is functionality like string or vector = loadFile("filename.txt"), writeFile(string or vector, "filename.txt"), appendToFile(string or vector, "filename.txt") and getFileSize("filename.txt").

Why doesn't the C++ standard define functions like that, instead of the clumsy thing it has now, with weird class names and an unintuitive way to get the size of a file? Or why wasn't something like this at least added on top of the existing system? I see no argument against more intuitive ways to get such things done.


You are probably new to C++ and don't seem to realize that there are many different ways to deal with the data involved, so a single uniform way isn't necessarily ideal, or even appropriate, in the first place.
However, if you are mainly interested in doing generic file I/O without spending too much time making up file formats and I/O policies, you should probably check out the existing serialization libraries for C++ (e.g. in Boost). Depending on the scope of your problem, dumping an arbitrary object to a file can be extremely simple and concise. Likewise, reading data from files back into objects can usually be done pretty quickly.




Quote:
Original post by Anonymous Poster
You are probably new to C++ and don't seem to realize that there are many different ways to deal with the data involved, so a single uniform way isn't necessarily ideal, or even appropriate, in the first place.
However, if you are mainly interested in doing generic file I/O without spending too much time making up file formats and I/O policies, you should probably check out the existing serialization libraries for C++ (e.g. in Boost). Depending on the scope of your problem, dumping an arbitrary object to a file can be extremely simple and concise. Likewise, reading data from files back into objects can usually be done pretty quickly.



Actually I'm really not new to C++ (well, I don't know whether using it for 4 years counts as new or not). I have just always avoided weird data types. To me a file is BYTES, so I use either std::string or std::vector<unsigned char> to represent files, and I immediately made wrappers with the names above for that (plus some extra functions to turn an int/float/... into 2/4/8/... bytes appended to the end of the string or vector, and vice versa), so that I never had to look back at the original C++ way. I have been using "loadFile" and the like ever since I started using C++.

I'd never trust something that somehow places ints instead of bytes into a file by itself.

So whenever I see the standard C++ way I still find it weird, and finally I had to make a post about it.
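For readers who have not seen that style, the byte-level helpers described above might look roughly like this. It is only a sketch: the names appendUint32 and readUint32, the fixed 32-bit width and the little-endian byte order are all assumptions, not the poster's actual code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: appends a 32-bit value as 4 bytes,
// least significant byte first.
void appendUint32(std::vector<unsigned char>& bytes, unsigned long value)
{
    for (int i = 0; i < 4; ++i)
        bytes.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: reads back 4 bytes starting at offset,
// using the same byte order. Throws if fewer than 4 bytes are available.
unsigned long readUint32(const std::vector<unsigned char>& bytes, std::size_t offset)
{
    if (offset > bytes.size() || bytes.size() - offset < 4)
        throw std::out_of_range("readUint32: not enough bytes");

    unsigned long value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<unsigned long>(bytes[offset + i]) << (8 * i);
    return value;
}
```

Because the byte order is fixed by the code rather than by the machine, a file written this way has the same layout on every platform.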

In that case, your problem is that the C++ standards committee doesn't believe in promoting inefficient or otherwise inappropriate programming techniques. There's no single function to read a file into a string or std::vector<char> for much the same reason that there's no push_front() member function for std::vector<>: it would make it too easy to do something that is largely inefficient. Reading the entire file as bytes into a byte-oriented container like std::vector<char> would involve multiple copies of the same data into successive buffers. The C++ file stream I/O interfaces are built around the concept that the file stream itself will handle the buffering and that the user will extract the data as needed.
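To illustrate the "extract the data as needed" style, here is a minimal sketch that processes a file in fixed-size chunks instead of loading it whole. The function name sumBytes and the 4096-byte chunk size are arbitrary choices for the example.

```cpp
#include <fstream>
#include <string>

// Hypothetical example: adds up every byte of a file without ever holding
// more than one small chunk of it in memory.
// Returns 0 if the file cannot be opened.
unsigned long sumBytes(const std::string& filename)
{
    std::ifstream in(filename.c_str(), std::ios::in | std::ios::binary);
    char chunk[4096];
    unsigned long sum = 0;

    // read() may deliver a short final chunk; gcount() reports how many
    // bytes actually arrived, so the tail of the file is still processed.
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0)
    {
        const std::streamsize got = in.gcount();
        for (std::streamsize i = 0; i < got; ++i)
            sum += static_cast<unsigned char>(chunk[i]);
    }
    return sum;
}
```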

Quote:
Original post by SiCrane
In that case, your problem is that the C++ standards committee doesn't believe in promoting inefficient or otherwise inappropriate programming techniques.

Let me offer the opinion that the C++ standards committee is a bunch of nitwits who will be the first against the wall when the revolution comes.

Quote:
Original post by SiCrane
In that case, your problem is that the C++ standards committee doesn't believe in promoting inefficient or otherwise inappropriate programming techniques. There's no single function to read a file into a string or std::vector<char> for much the same reason that there's no push_front() member function for std::vector<>: it would make it too easy to do something that is largely inefficient. Reading the entire file as bytes into a byte-oriented container like std::vector<char> would involve multiple copies of the same data into successive buffers. The C++ file stream I/O interfaces are built around the concept that the file stream itself will handle the buffering and that the user will extract the data as needed.


I resize the std::vector to the size of the file first. Then I fill in the bytes one by one, using []. I think the problem of constant copying you mentioned doesn't happen then, right?

No, it does. First the streambuf asks the operating system for the data, which it reads into its own buffer, and then you read from that buffer into the vector. And if you're using operator[] to fill in your vector, you've probably got a lot more unnecessary copies floating around than just that, as it's fairly inefficient to read from an fstream one byte at a time.

Quote:
Original post by SiCrane
No, it does. First the streambuf asks the operating system for the data, which it reads into its own buffer, and then you read from that buffer into the vector. And if you're using operator[] to fill in your vector, you've probably got a lot more unnecessary copies floating around than just that, as it's fairly inefficient to read from an fstream one byte at a time.


So... is this method here very inefficient? Sorry, I was wrong to say that I read the bytes in one by one (though maybe that happens internally...); I just pass the address of the first element.


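// Needs <fstream> and <vector>.
std::ifstream file("file.bla", std::ios::in | std::ios::binary);

//get filesize
file.seekg(0, std::ios::end);
std::streamoff size = file.tellg();
file.seekg(0, std::ios::beg);
size -= file.tellg();

//read contents of the file into vector
std::vector<unsigned char> buffer;
if (size > 0) //skip empty or unreadable files; &buffer[0] needs a non-empty vector
{
    buffer.resize(static_cast<std::size_t>(size));
    file.read(reinterpret_cast<char*>(&buffer[0]), size);
}
file.close();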



Well, it's nowhere near as inefficient as what I thought you were doing when you said that you were reading them in a byte at a time, but it does do the double buffer copying that I was talking about. It also suffers from the problem that vector<>::resize() zeros out the vector's data when you resize it like that, so you're writing to your vector buffer twice: once when you call resize() and once when you call read().

Quote:
Original post by SiCrane
Well, it's nowhere near as inefficient as what I thought you were doing when you said that you were reading them in a byte at a time, but it does do the double buffer copying that I was talking about. It also suffers from the problem that vector<>::resize() zeros out the vector's data when you resize it like that, so you're writing to your vector buffer twice: once when you call resize() and once when you call read().


How do I resize the vector in a way that doesn't write zeros there? First I thought maybe with reserve(), but then I remembered that reserve() doesn't actually set the size. Thanks a lot for your interesting info.

Quote:
Original post by Arild Fines
Quote:
Original post by SiCrane
In that case, your problem is that the C++ standards committee doesn't believe in promoting inefficient or otherwise inappropriate programming techniques.

Let me offer the opinion that the C++ standards committee is a bunch of nitwits who will be the first against the wall when the revolution comes.


Consider that the origin of the spec dates back over 10 years. I believe a lot of the decisions made back then would be made differently today: we've learned (in part from the mistakes of the C++ standards committee), and we've gotten more powerful computers. I'd quibble more over the std::vector<bool> specialization allowances, the slow integration of the Boost libraries into the main spec, etc., than over a safety feature which, although I don't remember ever having relied on it with the STL, has let me avoid some slow operations with boost::mpl, for example (where a certain container implements pop_front but not pop_back). It's also easier to pull off this kind of thing in C#, where everything is GCed and strings are immutable. In C++, where everything is copied and strings can be modified, it suddenly becomes a bit harder to do in a timely fashion: fewer invariants you can rely on, and fewer options for low-cost shared data (std::auto_ptr doesn't cut the mustard, and other smart pointers have yet to make it out of TR1 and into std).

Quote:
Original post by Lode
How do I resize the vector in a way that doesn't write zeros there? First I thought maybe with reserve(), but then I remembered that reserve() doesn't actually set the size. Thanks a lot for your interesting info.


Simple: One uses one of the container's member functions that assigns the data at the same time as changing size().

See:
std::vector::
...assign
...insert
...push_back

As well as the constructor( iterator , iterator ) version as used in SiCrane's snippet earlier.
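As a sketch of that suggestion applied to whole-file reading, the iterator-range constructor can be combined with std::istreambuf_iterator, which hands over raw bytes without skipping whitespace. The function name loadBytes is invented for the example, and whether this is actually faster than resize() plus read() depends on the library implementation, so it is worth measuring.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: fills the vector straight from the stream buffer, so
// size() grows as data arrives and nothing is zero-filled first.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> loadBytes(const std::string& filename)
{
    std::ifstream file(filename.c_str(), std::ios::in | std::ios::binary);
    if (!file)
        return std::vector<unsigned char>();

    return std::vector<unsigned char>(std::istreambuf_iterator<char>(file),
                                      std::istreambuf_iterator<char>());
}
```

The same iterator pair also works with assign() or insert() on an existing vector.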
