boost::ptr_vector performance problems

Started by
15 comments, last by jpetrie 16 years, 4 months ago
I think you're confusing packed arrays and interleaved arrays.
I don't think so...what gives you that impression?
Quote:
If you are loading from a file format like obj that stores "vertex data" like position, normal, and textureCoords separately then it is necessary to store them in an intermediate format as independent vector/arrays...in addition it seems easier to do calculations such as dynamic changes to the mesh or calculation of vertex tangents.

This gets into specific requirements, so things start to get up in the air. In general, though, data intended for rendering should be stored in a format that is efficient for rendering, which usually means keeping it as close as possible to the native format of the render subsystem, to minimize swizzling data around prior to rendering. While loading geometry from a file format, it may be necessary to store data in intermediate forms, but that doesn't mean the intermediate form should be the final form of the data if efficient rendering is the goal.

If the goal of the application is to manipulate mesh data, then yes, certain other representations may be more useful.
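As a rough sketch of the "intermediate form versus render form" distinction, the following converts separate per-attribute arrays (the way an OBJ-style loader might hold them) into one interleaved array. The names Float2, Float3, Vertex, and interleave are hypothetical and not from this thread, and the sketch assumes the loader has already resolved the data to one position, normal, and texture coordinate per vertex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical attribute types, for illustration only.
struct Float2 { float u, v; };
struct Float3 { float x, y, z; };

// One interleaved vertex: all attributes of a vertex sit next to each other.
struct Vertex {
    Float3 position;
    Float3 normal;
    Float2 texCoord;
};

// Combine three parallel attribute arrays (loader-side, intermediate form)
// into a single contiguous, interleaved array (render-side form).
std::vector<Vertex> interleave(const std::vector<Float3>& positions,
                               const std::vector<Float3>& normals,
                               const std::vector<Float2>& texCoords)
{
    // Assumption: the three arrays are parallel, one entry per vertex.
    assert(positions.size() == normals.size());
    assert(positions.size() == texCoords.size());

    std::vector<Vertex> vertices;
    vertices.reserve(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        Vertex v = { positions[i], normals[i], texCoords[i] };
        vertices.push_back(v);
    }
    return vertices;
}
```

Once the interleaved array has been built, the three intermediate arrays can be discarded unless the application also needs them for mesh manipulation.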

Quote:
Storing the data in a custom structure like "Vector3" makes it a lot easier to do arithmetic, which would be the only purpose of saving that data that I can think of. But if you don't use a vector of pointers, then you have to invoke a lot of extra constructors/assignment-ops, which is a lot less efficient...and you also cannot conveniently use the constructor when adding new objects like you can with pointers. So, it seems to me that a vector of pointers is the way to go.

You can store a contiguous array of "Vector3" objects and have all the benefits of having a nice class for your vectors, as long as the class is suitably designed and implemented such that its representation in memory is correct. That is, something like
struct Vector3 { float x, y, z; };

with no virtuals and appropriate compiler-specific options applied to ensure tightly packed, unpadded organization, et cetera. This is quite common. There is no reason, in general, why a collection of such objects needs to be stored as a vector of pointer-to-Vector3. It's not like these things are expensive to copy, and even then, you're faced with the question of which hurts more: lots of copies of a couple of bytes to probably-in-cache memory, or lots of copies of a slightly smaller number of bytes with frequent references to out-of-cache memory. Most modelling tools that do the kind of frequent manipulation of mesh data you describe use higher-order representations of the data anyway. Only primitive vertex-only-based modellers do extensive per-vertex manipulation in random access. Most games, when they process their vertices, do it in a highly sequential fashion that would cause the pointer-to-Vector3 approach to cache miss all over.

EDIT: Also, what do you mean by "can't use the constructor" when adding new elements? Of course you can (push_back(Vector3(x,y,z))).
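A minimal sketch of the contiguous approach described above, assuming a C++11 compiler (static_assert and std::vector::data postdate this thread). The helper names makeTriangle and flatten are made up for illustration, and flatten only stands in for whatever call would hand the bytes to a graphics API.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct Vector3 {
    float x, y, z;
    Vector3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Fails to compile if the compiler padded the struct.
static_assert(sizeof(Vector3) == 3 * sizeof(float),
              "Vector3 must be tightly packed");

// Elements are added by value, using the constructor directly.
std::vector<Vector3> makeTriangle()
{
    std::vector<Vector3> v;
    v.reserve(3);
    v.push_back(Vector3(0.0f, 0.0f, 0.0f));
    v.push_back(Vector3(1.0f, 0.0f, 0.0f));
    v.push_back(Vector3(0.0f, 1.0f, 0.0f));
    return v;
}

// Hypothetical stand-in for an API upload: because the vector's storage is
// one contiguous block, its raw bytes can be copied out in a single call.
std::vector<float> flatten(const std::vector<Vector3>& v)
{
    std::vector<float> out(v.size() * 3);
    if (!v.empty())
        std::memcpy(out.data(), v.data(), v.size() * sizeof(Vector3));
    return out;
}
```

With a vector of pointers, the same upload would need a per-element gather loop, since the Vector3 objects would be scattered across the heap.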

Quote:
If you use D3DPOOL_MANAGED then there is no need to maintain a contiguous container in GPU form because you'll never need to pass them manually to the graphics card more than once, so there would be no point to maintaining an array in that form anyway.

There is, in fact, no point to maintaining the storage at all in many cases. But you still need a suitable representation to provide to the API that one time.
Quote: Original post by jpetrie
You can store a contiguous array of "Vector3" objects and have all the benefits of having a nice class for your vectors, as long as the class is suitably designed and implemented such that its representation in memory is correct. That is, something like

with no virtuals and appropriate compiler-specific options applied to ensure tightly packed, unpadded organization, et cetera. This is quite common.

....
EDIT: Also, what do you mean by "can't use the constructor" when adding new elements? Of course you can (push_back(Vector3(x,y,z))).


This is something that I have debated mentally for a while, so I'm glad to hear your input.

Thanks for your input. If there is a better solution than what I'm doing, I'm all ears.

It is my understanding that in order to get those benefits, it must be a POD type, meaning that it has no non-trivial constructor. If it doesn't have this, then you can't do "push_back(Vector3(x,y,z))"

Secondly, if you do "push_back(Vector3(x,y,z))", then you first call Vector3 constructor when creating it and then call Vector3 copy constructor when assigning it. In previous time trials I have done, I found that calling the Vector3 constructor occupied a very large percentage of my processing time...far far more expensive than any of the internal vector arithmetic, so I have attempted to minimize the necessary constructor calls.

I started out by using POD types for Vector3, but it is such an inconvenience to use without a normal constructor when you want to insert into a vector like this.

Oops, I have to go...no time to finish these thoughts.
Quote:
It is my understanding that in order to get those benefits, it must be a POD type,

The "benefits" (having the type compact) need to be manually ensured regardless of whether or not the type is POD. A POD type may be aligned or padded by the compiler the same way a non-POD type may be. You don't have to worry about adding a virtual table pointer for a POD type, but you do need to worry about the other stuff. In this case there is really little reason to prefer POD to non-POD, as non-POD gives you the constructor you want.
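One way to check this point with a modern compiler is sketched below. It assumes C++11 type traits, which did not exist when this thread was written, and the struct names are hypothetical. The last assertion relies on the usual implementation behaviour of adding a virtual table pointer, which the language standard does not strictly guarantee.

```cpp
#include <type_traits>

// Plain aggregate: POD in the C++03 sense.
struct PodVec3 { float x, y, z; };

// Same data plus a constructor: no longer POD, but the layout is unchanged.
struct CtorVec3 {
    float x, y, z;
    CtorVec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Same data plus a virtual function: the object model changes.
struct VirtualVec3 {
    float x, y, z;
    virtual ~VirtualVec3() {}
};

// Adding a constructor does not disturb layout or byte-wise copyability.
static_assert(std::is_standard_layout<CtorVec3>::value,
              "constructor does not affect layout");
static_assert(std::is_trivially_copyable<CtorVec3>::value,
              "still safe to copy byte-wise");
static_assert(sizeof(CtorVec3) == sizeof(PodVec3),
              "same size as the POD version");

// A virtual function is what actually breaks the simple layout.
static_assert(!std::is_standard_layout<VirtualVec3>::value,
              "virtual functions break standard layout");
static_assert(sizeof(VirtualVec3) > sizeof(PodVec3),
              "typical implementations add a vtable pointer");
```

Note that none of these checks rule out padding on their own; a size check against 3 * sizeof(float) is still needed for that, whether or not the type is POD.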

Quote:
Secondly, if you do "push_back(Vector3(x,y,z))", then you first call Vector3 constructor when creating it and then call Vector3 copy constructor when assigning it.

Correct. You will construct the object, then the container will copy it. This happens regardless of whether or not the type is POD or non-POD. The other comparison you've been making is pointer-to-type versus non-pointer-to-type, so let's examine that case. It would look like vec.push_back(new Vector3(x,y,z)). Here we have an allocation, which involves a walk through the heap to find an appropriate block, an invocation of the appropriate constructor, and we also copy the resulting pointer within the container. The allocation is expensive, but may be avoided if we're using a fancy pooling allocation scheme somewhere, or whatever. We still have the same overhead from the construction, so that washes. Then we have the copy of the pointer.

In effect, the non-pointer option involves a copy of sizeof(float) * 3 bytes (probably 12) and the pointer option involves a copy of sizeof(Vector3*) bytes (probably 4) and a potentially expensive allocation. You're concerned about 8 bytes?
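The two insertion paths compared above can be sketched side by side. This is an illustration under assumptions: Vec3, addByValue, and addByPointer are hypothetical names, and std::unique_ptr (C++11) is used as a stand-in for the raw pointers or boost::ptr_vector discussed in this thread so that the sketch does not leak.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Vec3 {
    float x, y, z;
    Vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Bytes the container copies per element in each approach.
const std::size_t kValueCopyBytes   = sizeof(Vec3);   // three floats
const std::size_t kPointerCopyBytes = sizeof(Vec3*);  // platform dependent

// By value: one construction, then a copy of sizeof(Vec3) bytes.
void addByValue(std::vector<Vec3>& out, float x, float y, float z)
{
    out.push_back(Vec3(x, y, z));
}

// By pointer: a heap allocation, the same construction, then a pointer-sized
// move into the container.
void addByPointer(std::vector<std::unique_ptr<Vec3> >& out,
                  float x, float y, float z)
{
    out.push_back(std::unique_ptr<Vec3>(new Vec3(x, y, z)));
}
```

On a 64-bit build the pointer is typically 8 bytes rather than 4, which narrows the difference in copied bytes further while the allocation cost remains.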

Quote:
In previous time trials I have done, I found that calling the Vector3 constructor occupied a very large percentage of my processing time

Without the actual benchmark this doesn't mean much, but I would even contend that it's information that washes. Constructing the vector must be done, period, in both the non-pointer and pointer approaches. Given the same implementation of Vector3 for both tests, if the test is structured correctly, then you should spend the same amount of time constructing the Vector3. Copying the pointer instead of the Vector3 saves on the order of (typically) eight bytes per copy. I'm skeptical that saving those eight bytes on a collection of vertices is worthwhile, especially if you are going to be doing extensive linearized processing of the vertices.

And again, the bulk of the processing done on buffers of vertices is linear traversal. It is rare that you are actually moving vertices around within the vertex buffer itself with high frequency. That kind of operation tends to be present in simplistic point-and-poly based modelling tools, and even then you'd probably be better off moving the indices in an index buffer around, rather than the vertices themselves, for most operations. Even then there is little use for moving the vertices around beyond adding and removing them, except for mesh optimization, which is almost always going to be an offline process where speed is irrelevant and code clarity is more important. This is why modelling tools typically use different internal representations to actually work on, and if that's what you're doing, this discussion doesn't quite pertain.
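As one small illustration of working on the index buffer rather than the vertices, the sketch below reverses the winding order of every triangle by swapping indices only. The function name flipWinding and the choice of operation are assumptions made for the example, not something taken from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Reverse the winding of each triangle in an indexed triangle list.
// Only the small integer indices move; the vertex data is never touched.
void flipWinding(std::vector<unsigned int>& indices)
{
    // Assumption: the buffer is a plain triangle list, three indices each.
    assert(indices.size() % 3 == 0);
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3)
        std::swap(indices[i + 1], indices[i + 2]);
}
```

For a quad made of the triangles (0, 1, 2) and (2, 1, 3), calling flipWinding turns the index list into (0, 2, 1) and (2, 3, 1) while the four vertices stay where they are.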

Quote:
I started out by using POD types for Vector3, but it is such an inconvenience to use without a normal constructor when you want to insert into a vector like this

Now I'm confused again. If the type is POD, it can't have a user-declared constructor. But earlier you said "and you also cannot conveniently use the constructor when adding new objects like you can with pointers" which leads me to believe you've got a disconnect somewhere. Whether or not you're involving pointers is orthogonal to being a POD type. If the type is a POD, it has no such constructors you can use, even if you allocate it via new.
Quote:
This is why modelling tools typically use different internal representations to actually work on, and if that's what you're doing, this discussion doesn't quite pertain.


In fact this is what I am doing, storing the data in a format that is easiest for dynamic changes to the mesh -- such as optimizations, deformations, etc.

You say that in this case the discussion does not pertain, but if not doing this, then it seems there is no reason to maintain the vertex data in a std::vector at all...since it would be stored in a vertex buffer or display list that is managed by the graphics API already.

I have just done some timing tests, and you are absolutely right. In fact, it turns out that populating a std::vector of Vector3 is twice as fast as with Vector3*, and additionally, accessing the Vector3 data is twice as fast when not using pointers as well...not to mention that there is no need to delete that memory or deal with additional pointer dereferencing.
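The timing code behind the numbers above was not posted, so the following is only a guess at how such a comparison could be structured, assuming C++11 (std::chrono, std::unique_ptr, lambdas). All names are hypothetical, and std::unique_ptr stands in for the raw pointers used at the time. Results will vary with compiler, optimization settings, and allocator.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

struct Vec3 {
    float x, y, z;
    Vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Run a callable once and return the elapsed wall-clock time in milliseconds.
template <typename F>
double millisecondsFor(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Populate a vector of values, then traverse it linearly.
float sumXByValue(std::size_t n)
{
    std::vector<Vec3> v;
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(Vec3(static_cast<float>(i), 0.0f, 0.0f));
    float sum = 0.0f;
    for (const Vec3& e : v)
        sum += e.x;
    return sum;
}

// Populate a vector of owning pointers, then traverse it linearly.
float sumXByPointer(std::size_t n)
{
    std::vector<std::unique_ptr<Vec3> > v;
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(std::unique_ptr<Vec3>(new Vec3(static_cast<float>(i), 0.0f, 0.0f)));
    float sum = 0.0f;
    for (const std::unique_ptr<Vec3>& e : v)
        sum += e->x;
    return sum;
}
```

A caller would time each variant with something like millisecondsFor([&] { total += sumXByValue(n); }), keeping the returned sum in use so the optimizer cannot discard the work, and repeating the run several times before drawing conclusions.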
Quote:
You say that in this case the discussion does not pertain, but if not doing this, then it seems there is no reason to maintain the vertex data in a std::vector at all...since it would be stored in a vertex buffer or display list that is managed by the graphics API already.

The concept of a "managed pool" in the sense that D3D9 has one does not exist in all graphics APIs, and furthermore the managed pool is not always the optimal storage location for vertex data. In those cases, the buffer may need to be kept around.

