[Solved] std::vectors and reading from files

Started by
11 comments, last by MENTAL 17 years, 6 months ago
OK, I've discovered a bottleneck in my model loading code. At the moment, I am currently reading in each element (vertex, triangle, bone, etc) one at a time and storing them in a vector. As you can see from the code below, this isn't very efficient as it causes a lot of hard drive thrashing and lots of memory allocation inside m_object_list. Example code

MyObject object;    // Simple struct with no inheritance.

fread (&object, sizeof (object), 1, file);

m_object_list.push_back (object);


The memory realocation issue can be solved by calling reserve on m_object_list with the number of objects it is going to contain. This leaves the disk thrashing issue. I am unsure how to solve this one without the creation of a lot of temporary memory or a lot of wasted copies. The only thing I can come up with is to create an array of MyObject using new, fread the data into that object, and then go through each object and call m_object_list.push_back on it (obviously call reserve on m_object_list first). However, this still quite slow, as you are having to duplicate the object data and then delete the temporary array afterwards. Ideally, I would like a way of telling m_object_list to set its data as the temporary array, providing you tell it how many elements are in the array. Something like:

MyObject *objects = new MyObject [header.object_count];

fread (objects, sizeof (MyObject), header.object_count, file);

m_object_list.setdata (objects, header.object_count);    // Made-up function.


There would be no need to delete objects as the ownership of that memory has been passed to m_object_list. Does anyone have any ideas? When loading a couple of models the performance issues are negligable, but some scenes have over 200 :/. I also have other binary file formats and am experiencing the same issues with them too. Thanks in advance [Edited by - MENTAL on September 28, 2006 11:26:23 AM]
Advertisement
std::vector<Object> objects(count_of_objects);
fread(&objects[0], sizeof(Object), count_of_objects, file);

or

std::vector<Object> objects(count_of_objects);
for (int i=0; i<count_of_objects; i++)
{
fread(&objects, sizeof(Object), 1, file);
}
Instead of calling .reserve(), you could instead call .resize() on the vector, and then pass the address of the first element to fread(). std::vector<>s are guaranteed to be in one contiguous chunk, so this is perfectly safe.
m_object_list.resize(header.object_count);fread (&m_object_list[0], sizeof (MyObject), m_object_list.size(), file);
If your default constructor does nothing, the .resize() call shouldn't really cost much at all.

As a side note, it really concerns me seeing an object being read straight from a file. If the object is plain-old-data, and doesn't contain pointers, I suppose it's okay, but it just sets my warning alarm off. Also, I'd suggest using std::fstream instead, and then perhaps overloading the >> operator so that you don't care if the file is an exact binary representation of the object or not; the overloaded >> operator would take care of the details.

Also, I honestly have to say that I have some doubt that multiple fread() calls are really going to be much slower than a single long fread() call. Perhaps it's worth a try, but I wouldn't be too optimistic.
"We should have a great fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke
Reading the data from the file is a bottleneck a few orders of magnitude more severe than copying data from one region of memory to another. At any rate, unless you request it not be so, file reads are buffered. Individual calls to fread do not cause individual file accesses... although you should use an ifstream here anyway. They are explicitely buffered.

Regarding the container itself, if you do not know the number of objects in advance, loading them into a deque is faster than loading into a vector -- although that probably won't help much for model data that you expect to pass directly to graphic APIs (since they need to be in a contiguous buffer). Moving the data from the deque to the vector once you're done loading may give you a speed improvement by reducing the number of memory allocations (as well as limit memory fragmentation), but in most implementations, it shouldn't really save on copying (if vector doubles its storage on reallocation). reserve() obviously works better if you know the number of elements

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
Argh I completely overlooked the resize function!!

Thank you all for the very swift replies. I know using fread is less safe than using fstreams, but I'm currently porting pre-STL/Boost code to STL/Boost, and the freads are the last of my worries compared to the state the rest of the code is in.

As for the objects, are indeed simple plain old data - nothing but floats and the very occasional int.

Thanks again :)
Hello,

To me, the problem comes from the fact that you are storing into your std::vector a structure that was allocated on the stack : The vector will then have to create a copy of this structure in order to keep track of it.
In your scenario, your structure will be allocated once in your function's stack, then another time by your vector, and then freed when you exit your function.

As you mentioned, a better approach might be to store the structures' pointer into your std::vector --rather than the structure themselves. I must admit that I do not understand when you say "(...)this still quite slow, as you are having to duplicate the object data and then delete the temporary array afterwards." though...
To me, you do not duplicate anything if you proceed like this:

MyObject *object = malloc...
std::vector<MyObject *> m_object_list;
m_object_list.push_back(object);

(...)
free(object);

The only things you can duplicate are the MyObject pointers values...
My two cents.
StratBoy61
Quote:Original post by StratBoy61
Hello,

To me, the problem comes from the fact that you are storing into your std::vector a structure that was allocated on the stack : The vector will then have to create a copy of this structure in order to keep track of it.
In your scenario, your structure will be allocated once in your function's stack, then another time by your vector, and then freed when you exit your function.

As you mentioned, a better approach might be to store the structures' pointer into your std::vector --rather than the structure themselves. I must admit that I do not understand when you say "(...)this still quite slow, as you are having to duplicate the object data and then delete the temporary array afterwards." though...
To me, you do not duplicate anything if you proceed like this:

MyObject *object = malloc...
std::vector<MyObject *> m_object_list;
m_object_list.push_back(object);

(...)
free(object);

The only things you can duplicate are the MyObject pointers values...
My two cents.
StratBoy61


Oh my god, please tell me you're kidding.

0) The duplication the OP was talking about would arise from reading into a single allocated chunk, and then copying the chunk into the vector's memory allocation.
1) Previous posters illustrated that you can fread directly into the vector's memory allocation, which is guaranteed to be contiguous.
2) This is C++; we don't use malloc() any more.
3) You're going to add, per object, an extra pointer's worth of memory, and also totally destroy compatibility if he actually needs a vector of actual structures (quite likely if working with a graphics API).
4) Your solution to avoiding a copy from the stack to an already-allocated chunk of memory, is to dynamically allocate a tiny chunk, write to the chunk directly, and set a pointer in the already-allocated chunk of memory to the new allocation. That's actually going to be much slower, and cause huge amounts of fragmentation.
5) You free 'object', which rather suggests freeing immediately, which would be a huge mistake.

Quote:Original post by Agony
If the object is plain-old-data, and doesn't contain pointers, and you'll never have to worry about endianness, and your struct padding is consistent, and you don't mind casting the structure's data members in stone that much more (consider what happens if you want to add a cache for a frequently calculated value) I suppose it's okay, but it just sets my warning alarm off.


[smile]
Quote:Original post by Zahlman
Oh my god, please tell me you're kidding.

0) The duplication the OP was talking about would arise from reading into a single allocated chunk, and then copying the chunk into the vector's memory allocation.
1) Previous posters illustrated that you can fread directly into the vector's memory allocation, which is guaranteed to be contiguous.
2) This is C++; we don't use malloc() any more.
3) You're going to add, per object, an extra pointer's worth of memory, and also totally destroy compatibility if he actually needs a vector of actual structures (quite likely if working with a graphics API).
4) Your solution to avoiding a copy from the stack to an already-allocated chunk of memory, is to dynamically allocate a tiny chunk, write to the chunk directly, and set a pointer in the already-allocated chunk of memory to the new allocation. That's actually going to be much slower, and cause huge amounts of fragmentation.
5) You free 'object', which rather suggests freeing immediately, which would be a huge mistake.

Yeah, I am kidding...
No, seriously folks, the problem is that we do not know what the structure MENTAL mentions looks like ; does it make 1 byte, 1Ko, 1Mo ?
My solution was just to avoid duplicating the structure allocated on the stack.
That's all folks, I did not mean anything else... obvisouly, if the structure is 1 int and 1 float, yeah I am kidding, I do not see the point of dynamically allocating it. It all depends on the nb_of_elements/size_of_struct ratio.
Nevertheless, it still makes sense to me if we create an array of this structure and store the pointers to each elements rather than copying them... which is to me pretty much what freading into a vector does.

As for malloc and the structure, please let me know what sophisticated developers use.
Cheers
StratBoy61
> As for malloc and the structure,
> please let me know what sophisticated developers use.

new/delete?
StratBoy61, I believe you are confused...

The std::vector<MyObject> is created on the stack, but all the data contained in the vector is created on the heap (the array of MyObject's). The vector class will most likely only take up a few bytes on the stack (enough for a pointer, size variable, and maybe some other small data).
<a href="http://www.slimcalcs.com>www.slimcalcs.com

This topic is closed to new replies.

Advertisement