[c++] Find item position in a set

Started by
5 comments, last by Promethium 13 years, 3 months ago
I'm parsing a .obj file to create DirectX 10 vertex and index buffers.

The .obj format consists of lists of vertex positions, normals and texture coords, followed by a list of triangles made up of indices into these lists. A triangle is defined by three groups of three ints (e.g. "f 1/1/1 2/2/2 3/3/3"): for each of the three corners, an index into each of the position, texture coord and normal lists.

It's easy to do if I create a new vertex for each triangle corner, but this results in a lot of duplicate vertices. I'm trying to optimise my vertex buffers by only including unique vertices but it's proving difficult.

The way I was thinking was to iterate through each triangle corner, putting the indices into a std::set of structs holding the three ints. I would then end up with a set of unique vertices (or at least unique structs of indices into arrays holding the data from which I can create the vertices). I then iterate through the triangle list again, finding the appropriate item in the set and recording its position in the set, which would give me my indices for the index buffer. All well and good.
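(For clarity, the struct I have in mind looks something like this: a rough sketch, names invented just for illustration.)

#include <set>

// one entry per unique position/normal/uv index triple
struct CornerIndices
{
    int pos, norm, uv;

    // std::set needs a strict weak ordering over its elements
    bool operator<(const CornerIndices& o) const
    {
        if (pos != o.pos)   return pos  < o.pos;
        if (norm != o.norm) return norm < o.norm;
        return uv < o.uv;
    }
};

std::set<CornerIndices> uniqueCorners; // filled once per triangle corner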

However...... looking at the documentation for std::set it doesn't look like I'm able to find the position of an item in a set. Is there any way to do this - create a set of unique structs and then find their final position in the set without having to iterate over the whole set every time?

Or is there a completely different, better way of doing it that I've not considered?
Sets don't support random access, so you'd have to use something like std::distance, which as you mentioned can get quite expensive if used over and over since it's essentially a linear search for each element -- O(n²) overall. Alternatively, knowing that the set elements are ordered, you can copy the data into a vector and look up each vertex using std::lower_bound, which has logarithmic complexity and will be far cheaper -- O(n log n) overall. Plus you probably want your final vertex list in a vector anyway, so you can just keep it around for later.
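A sketch of that idea, reusing the CornerIndices struct sketched earlier in the thread (names are illustrative):

#include <algorithm>
#include <set>
#include <vector>

// Build the final, sorted vertex list once from the set.
// Set iteration is already in sorted order, so the vector is sorted too.
std::vector<CornerIndices> buildSortedList(const std::set<CornerIndices>& s)
{
    return std::vector<CornerIndices>(s.begin(), s.end());
}

// Each corner's index-buffer entry is then one binary search away: O(log n).
// The key is guaranteed to be found because the vector came from the same set.
size_t indexOf(const std::vector<CornerIndices>& sorted, const CornerIndices& key)
{
    return (size_t)(std::lower_bound(sorted.begin(), sorted.end(), key) - sorted.begin());
}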
I've had a bit more of a think and came up with this:

Create my set as before, so that I only have unique vertices (remember at this point a 'vertex' is just a structure holding a group of indices into the data arrays, it's not holding the actual data), then iterate through this, copying the items into a std::map<vertex, int> where the int is the item's position in the iteration order, and also into a std::vector<vertex>.

I can then iterate through the triangle list, using each vertex to get the index number quickly from the map (so I have my list of indices to put into the index buffer) and when I come to create my (proper, data-filled) vertices for the vertex buffer I can use these indices to index quickly into the vector.

This way, I only need to iterate through an entire collection once, when I use the set to populate the map and vector.
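A rough sketch of that two-pass scheme, again reusing the CornerIndices struct from the first post (all names are illustrative):

#include <map>
#include <set>
#include <vector>

std::map<CornerIndices, int> cornerToIndex; // vertex key -> final index, O(log n) lookup
std::vector<CornerIndices>   finalOrder;    // final index -> vertex key, O(1) lookup

void buildLookups(const std::set<CornerIndices>& uniqueCorners)
{
    int index = 0;
    for (std::set<CornerIndices>::const_iterator it = uniqueCorners.begin();
         it != uniqueCorners.end(); ++it, ++index)
    {
        cornerToIndex[*it] = index;
        finalOrder.push_back(*it);
    }
}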
You'll want something along the lines of....

#include <cstdint>
#include <set>
#include <vector>

struct VertexFaceIndex
{
  inline VertexFaceIndex(uint32_t _v, uint32_t _n, uint32_t _t) { v = _v; n = _n; t = _t; newIndex = 0; }
  inline VertexFaceIndex(const VertexFaceIndex& toCopy) { v = toCopy.v; n = toCopy.n; t = toCopy.t; newIndex = toCopy.newIndex; }

  uint32_t v;
  union
  {
    uint64_t combined; // n and t viewed as one 64-bit value, so they compare in a single step
    struct             // (anonymous struct inside a union is a common compiler extension)
    {
      uint32_t n;
      uint32_t t;
    };
  };
  uint32_t newIndex;

  inline bool operator < (const VertexFaceIndex& vfi) const
  {
    // order by v first, then by the combined n/t value
    if (v != vfi.v)
      return v < vfi.v;
    return combined < vfi.combined;
  }
};

typedef std::vector< std::set<VertexFaceIndex> > VertexFaceMap;
typedef std::vector< VertexFaceIndex > VertexFaceIndices;

// temporary look-up structure, just used when generating the new arrays
VertexFaceMap vertexMap;

// the final output vertex and index arrays
VertexFaceIndices finalVerts;
std::vector<uint32_t> finalIndices;

// call this to initialise the above structs
void initialise(uint32_t numVertices)
{
  // use the 'v' as a key to find previously used elements - therefore we need one entry for each vertex
  vertexMap.resize(numVertices);

  // just reserve some mem here. The final arrays will end up much bigger
  // (i.e. they will be re-allocated at least once), however by setting an initial size it
  // should reduce the number of re-allocations to something sane (performance wise)
  finalVerts.reserve(numVertices);
  finalIndices.reserve(numVertices);
}

// call this for every v/n/t combination
void insertVertexFace(uint32_t v, uint32_t n, uint32_t t)
{
  // construct a temp vertex face reference
  VertexFaceIndex temp(v, n, t);

  // see if this combination matches any previous entry that used the specified vertex index
  std::set<VertexFaceIndex>::iterator it = vertexMap[v].find( temp );

  // if not found....
  if( it == vertexMap[v].end() )
  {
    // generate a new entry in the final vertex list (and keep track of the index used)
    temp.newIndex = (uint32_t)finalVerts.size();
    finalVerts.push_back(temp);

    // store the index for this entry in the index list
    finalIndices.push_back(temp.newIndex);

    // add an entry into our map (to enable us to find it quickly again)
    vertexMap[v].insert(temp);
  }
  else
  {
    // an entry for the element exists, so we just need to re-use the previous index...
    finalIndices.push_back(it->newIndex);
  }
}
// done!
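For instance, a hypothetical driver loop (my illustration, assuming the face corners have already been parsed into zero-based index arrays):

// feed every parsed corner through insertVertexFace
void buildBuffers(uint32_t numPositions,
                  const std::vector<uint32_t>& faceV,  // position index per corner
                  const std::vector<uint32_t>& faceN,  // normal index per corner
                  const std::vector<uint32_t>& faceT)  // uv index per corner
{
  initialise(numPositions);
  for (size_t i = 0; i < faceV.size(); ++i)
    insertVertexFace(faceV[i], faceN[i], faceT[i]);

  // finalVerts now holds one entry per unique v/n/t triple, and
  // finalIndices is the matching index buffer.
}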
Why not use the string containing the face indices as a key into a map of indices? The face index string is guaranteed to be unique if the vertex is unique.

Here is what I use in my OBJ loader:
// assumed surrounding state (declared elsewhere in the loader):
//   std::tr1::unordered_map< std::string, size_t > face_map;
//   std::vector< Vertex > vertices;       // Vertex is the loader's own type
//   std::vector< size_t > face_indices;
for( size_t e = 0; e < tokens.size() - 1; ++e )
{
	std::string const& token = tokens[1 + e];
	std::tr1::unordered_map< std::string, size_t >::const_iterator index_it = face_map.find( token );
	if( index_it == face_map.end() )
	{
		if( token.empty() )
		{
			continue;
		}
		// *** Parse face ***
		size_t index = vertices.size();
		face_map[token] = index;
		face_indices.push_back( index );
		vertices.push_back( unique_vertex );
	}
	else
	{
		face_indices.push_back( index_it->second );
	}
}

Here tokens is the "f a/b/c ..." line split by whitespace. It's maybe not the fastest way to do it, but parsing OBJ files is slow anyway and this should be done as an offline pre-processing step where the OBJ is converted to an easier-to-use format.
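In case it isn't obvious how tokens gets produced, a minimal sketch (my own illustration, not the exact loader code):

#include <sstream>
#include <string>
#include <vector>

// split a face line such as "f 1/1/1 2/2/2 3/3/3" on whitespace
std::vector<std::string> splitOnWhitespace(const std::string& line)
{
	std::vector<std::string> tokens;
	std::istringstream stream(line);
	std::string token;
	while (stream >> token)
		tokens.push_back(token); // tokens[0] == "f", the rest are the index groups
	return tokens;
}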
Quote:Original post by Promethium
Why not use the string containing the face indices as a key into a map of indices? The face index string is guaranteed to be unique if the vertex is unique.


It would work, but by god is that inefficient! Let's say we have a model of 100,000 triangles. That's a potential maximum of 300,000 std::strings you'll be storing as keys into a map. Every one of those entries will be performing a new/delete, and a memcpy, above what is actually required.

Let's then assume our worst case key would be "100000/100000/100000". That's 20 bytes to store 12 bytes of data. Each strcmp will then be performing up to 20 per-character comparisons, compared to the 2 comparisons I used in my method. That additional amount of work will soon start to add up when you tally up all of the assets used within a modestly sized game.....

Quote:Original post by Promethium
It's maybe not the fastest way to do it, but parsing OBJ files is slow anyway and this should be done as an offline pre-processing step where the OBJ is converted to a easier-to-use format.


No, it's the *slowest* way to do it ;)

Whilst I agree it should be offline, I disagree that it's something you can code sloppily. It may make next to no difference to you, however any artist(s) working with you are soon going to tire of the turnaround times to update any assets they are working on. It will become a bottleneck in *their* productivity.....

If given a choice between a 'relatively efficient' and a 'screamingly inefficient' method, and the difference in the amount of code required between the two is negligible, you may as well plump for the more efficient method.

There is also one final point I'll add. Comparing the n and t indices, more often than not, doesn't compress the array as much as you'd think. There are a lot of modelling packages that spit out duplicate normals (Maya tends to do this a lot!). You can compress the data further by comparing the actual data values, uvs[t] and normals[n], instead.... That's something that can't be done if you are comparing the strings only. ;)
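For example, a sketch of a value-based comparison (illustrative only: the flat float arrays and the epsilon are my assumptions):

#include <cmath>

// compare the actual data values rather than the indices, so two corners that
// reference different-but-identical normals still weld to one vertex
const float WELD_EPSILON = 1e-6f;

bool sameVec3(const float* a, const float* b)
{
	return std::fabs(a[0] - b[0]) < WELD_EPSILON &&
	       std::fabs(a[1] - b[1]) < WELD_EPSILON &&
	       std::fabs(a[2] - b[2]) < WELD_EPSILON;
}

// corners (v, n0, t) and (v, n1, t) can share one vertex whenever
// sameVec3(&normals[3 * n0], &normals[3 * n1]) holds, even though n0 != n1;
// an index (or string) comparison can never detect that.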
@RobTheBloke I won't bother making a big discussion out of this, but I have to disagree with some of your points.


Quote:Original post by Promethium
Why not use the string containing the face indices as a key into a map of indices? The face index string is guaranteed to be unique if the vertex is unique.


It would work, but by god is that inefficient! Let's say we have a model of 100,000 triangles. That's a potential maximum of 300,000 std::strings you'll be storing as keys into a map. Every one of those entries will be performing a new/delete, and a memcpy, above what is actually required.

Let's then assume our worst case key would be "100000/100000/100000". That's 20 bytes to store 12 bytes of data. Each strcmp will then be performing up to 20 per-character comparisons, compared to the 2 comparisons I used in my method. That additional amount of work will soon start to add up when you tally up all of the assets used within a modestly sized game.....

Yeah, true, I use a lot more memory (but who cares about memory when doing offline processing on a PC?) and I do more comparisons. BUT, I do absolutely NO work on indices that I will end up rejecting, whereas you will have to tokenize and convert three strings to ints and then just throw all that work away. I actually did a quick test* where I rewrote my code to use a 64-bit integer hash (21 bits per vertex/texcoord/normal index) and stored that instead of the string, and it was SLOWER than storing the string, because of the extra work done.
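(For reference, the packed 64-bit key I tested looked something like this sketch; the exact code isn't shown here.)

#include <cstdint>

// 21 bits each for the position/texcoord/normal indices,
// so every index must stay below 2^21 (~2 million)
uint64_t packKey(uint32_t v, uint32_t t, uint32_t n)
{
	const uint64_t MASK = (1u << 21) - 1; // 0x1FFFFF
	return ((uint64_t)(v & MASK) << 42) |
	       ((uint64_t)(t & MASK) << 21) |
	        (uint64_t)(n & MASK);
}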

Let's analyse this a bit: In a smooth, closed mesh each vertex will be shared by at least three triangles. So at least 2/3 of the indices in such an OBJ are rejected. In a more complex mesh with non-smooth parts the ratio may be lower, i.e. fewer vertices are shared, but it is still reasonable to assume that on average each vertex will be shared by at least 2 triangles and probably more. So that is at least 50% wasted work you are doing when parsing face lines.

Also, at least in the STL version shipped by Microsoft, std::string has an internal buffer of 16 chars and only allocates if the stored string is longer than that. So for small indices new/delete will not be called.


Quote:Original post by Promethium
It's maybe not the fastest way to do it, but parsing OBJ files is slow anyway and this should be done as an offline pre-processing step where the OBJ is converted to an easier-to-use format.


No, it's the *slowest* way to do it ;)

My test shows otherwise. Do you have any numbers to back up your claim?


Whilst I agree it should be offline, I disagree that it's something you can code sloppily. It may make next to no difference to you, however any artist(s) working with you are soon going to tire of the turnaround times to update any assets they are working on. It will become a bottleneck in *their* productivity.....

If given a choice between a 'relatively efficient' and a 'screamingly inefficient' method, and the difference in the amount of code required between the two is negligible, you may as well plump for the more efficient method.

I disagree in calling my method sloppy. It is the simplest (KISS) and it is still faster than the (slightly) more complex integer hash method. Screamingly inefficient? Hardly.


There is also one final point I'll add. Comparing the n and t indices, more often than not, doesn't compress the array as much as you'd think. There are a lot of modelling packages that spit out duplicate normals (Maya tends to do this a lot!). You can compress the data further by comparing the actual data values, uvs[t] and normals[n], instead.... That's something that can't be done if you are comparing the strings only. ;)

This is a completely different argument. I think I would do this in a post-processing step, since the uniqueness of a vertex depends on all three components.

* I tested using the Sponza model, which is 20MB, ~450,000 text lines. When mapping on the string it took ~12s to parse (all included, also materials), with the integer hash it took ~16s. The largest index token is 19 chars.

