Vertex to Integer Key

3 comments, last by KreK 18 years ago
Hello all. I'm working on a 3D file loader and want to quickly discard duplicated vertices as they are being loaded. For this purpose, every vertex is being placed into an std::map. This requires me to create a 'key' value for every inserted vertex. After a bit of experimenting, I found out that this:

// Reinterpret the float attributes' memory as 64-bit integers and mix them
// with a prime (note: '*' binds tighter than '^').
__int64 key = ((__int64&)px ^ 333667 * (__int64&)py ^ 333667 * (__int64&)pz ^ 333667) *
              ((__int64&)nx ^ 333667 * (__int64&)ny ^ 333667 * (__int64&)nz ^ 333667) ^
              ((__int64&)tu ^ 333667 * (__int64&)tv ^ 333667);
(where px, py, pz is the position, nx, ny, nz the normal, and tu, tv the texture coordinates) seems to work for a limited set of my 3D files, but I have doubts about that method. The magic numbers are just some primes that seem to work. The problem is that the above key only works for this position/normal/texture-coordinate combination and needs to be changed significantly for other vertex formats; you cannot just drop terms for simpler vertices. Can anyone please point me at the right way (fast, simple and robust) to convert a vertex with arbitrary data into an integer key value? Or perhaps the above method is the right way?
Do you have more duplicate vertices than unique ones? If not, it's not worth it performance-wise to find the duplicates, as your hardware can (in most cases) render a reasonably sized model faster than you can do the duplicate searching.

If memory consumption is an issue, I would do the duplicate removal in the authoring phase already, and only fall back to doing it at load time if that isn't possible.

Niko Suni

Using the Panda X file exporter for 3ds Max, exporting a teapot with some insane tessellation can produce a file where over 50% of all vertices are just duplicates. Culling duplicates while loading such a file makes the difference between staying below the 16-bit index range and having a model with hundreds of thousands of vertices.

And the above method of inserting vertices into an std::map is -fast-. Before using a map, I was inserting vertices into a vector and comparing each new vertex with all the previous ones. With the map, loading a 250,000-vertex model went down to less than 3 seconds, compared to over 15 seconds for the per-attribute comparison against every previous vertex.
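
For illustration, the map-based approach amounts to something like the following sketch (Vertex, makeKey and buildIndexed are placeholder names, and the FNV-1a hash merely stands in for the key formula above):

#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Vertex { float px, py, pz, nx, ny, nz, tu, tv; };

// One possible key: FNV-1a over the raw float bits (an assumption, not the
// formula from the first post). Exact binary equality only.
std::uint64_t makeKey(const Vertex& v)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    std::uint64_t h = 14695981039346656037ULL;           // FNV offset basis
    for (std::size_t i = 0; i < sizeof(Vertex); ++i)
        h = (h ^ p[i]) * 1099511628211ULL;               // FNV prime
    return h;
}

// Build a deduplicated vertex buffer plus an index buffer in one pass,
// trusting the key completely (a hash collision would wrongly merge vertices).
void buildIndexed(const std::vector<Vertex>& in,
                  std::vector<Vertex>& outVertices,
                  std::vector<std::uint32_t>& outIndices)
{
    std::map<std::uint64_t, std::uint32_t> seen;         // key -> index of first occurrence
    for (const Vertex& v : in)
    {
        const std::uint64_t key = makeKey(v);
        auto it = seen.find(key);
        if (it == seen.end())
        {
            const std::uint32_t index = static_cast<std::uint32_t>(outVertices.size());
            outVertices.push_back(v);
            seen.emplace(key, index);
            outIndices.push_back(index);
        }
        else
        {
            outIndices.push_back(it->second);            // duplicate: reuse existing index
        }
    }
}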

The problem with the above key is that vertices have to match -exactly- at the binary level. Ideally, with float values, we would like to cull vertices within a certain tolerance threshold. But the method works fine as a first step; I can perform additional 'vertex welding' on the resulting (and much smaller) vertex buffer.
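
One possible way to get a tolerance into the key itself, sketched here only as an idea (the function name and cell sizes are made up), is to snap each float onto a grid before hashing so that nearly equal values produce identical integers:

#include <cmath>
#include <cstdint>

// Snap a float onto a grid of the given cell size; values within roughly half
// a cell of each other collapse to the same integer. The cell sizes you pick
// (e.g. 1e-4 for positions, 1e-3 for normals) are entirely application-specific.
std::int64_t quantize(float value, float cellSize)
{
    return static_cast<std::int64_t>(std::floor(value / cellSize + 0.5f));
}

Feeding quantized attributes into the key instead of raw bits catches most near-duplicates, but two values straddling a cell boundary can still land in different cells, so a final welding pass over the reduced buffer remains useful.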

Of course, I agree with you: this loading, culling and welding is part of our editor application. The final client will use binary dumps of DirectX vertex buffers for the fastest possible loading.

I was just wondering what methods other people are using to convert vertices to such hash key values. From the limited number of test files, the above key value seems to work, but I would hate it if one day our clients discovered deformed objects or missing parts thanks to an incorrect key.

Quote: Original post by KreK
I was just wondering what methods other people are using to convert vertices to such hash key values. From the limited number of test files, the above key value seems to work, but I would hate it if one day our clients discovered deformed objects or missing parts thanks to an incorrect key.


Just a thought here: are you using your hash as the only classifier?

Hashing functions by definition reduce the domain size. As such, the calculated hash is not, and can never be, guaranteed to be unique.

Seeing that you use 64-bit ints, are the vertex attributes really ints, or is that just a convenient cast for what are actually floats?

The duplicate check, as I would do it, would be something like this (a rough C++ sketch follows the list):
- calculate hash H(O) for object O
- if H(O) exists in the table:
--- for every object x stored under H(O), compare x with O
----- if some x is identical to O, discard O
--- if no stored x matches, add O to the bucket for H(O)
- else add {H(O), O} to the table
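
In code, that check might look roughly like this (the Vertex struct, identical() and addUnique() names are invented for the sketch, and the hash formula is left open):

#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

struct Vertex { float px, py, pz, nx, ny, nz, tu, tv; };

// Exact bitwise comparison; a tolerance-based compare could be substituted here.
bool identical(const Vertex& a, const Vertex& b)
{
    return std::memcmp(&a, &b, sizeof(Vertex)) == 0;
}

// Returns the index of v among the stored vertices, adding it only when no
// vertex in the same hash bucket compares identical. 'hash' is whatever key
// you compute for v.
std::uint32_t addUnique(const Vertex& v, std::uint64_t hash,
                        std::vector<Vertex>& vertices,
                        std::map<std::uint64_t, std::vector<std::uint32_t>>& buckets)
{
    std::vector<std::uint32_t>& bucket = buckets[hash];
    for (std::uint32_t index : bucket)              // only vertices sharing the hash
        if (identical(vertices[index], v))
            return index;                           // true duplicate: discard v
    const std::uint32_t index = static_cast<std::uint32_t>(vertices.size());
    vertices.push_back(v);                          // no match: keep v
    bucket.push_back(index);
    return index;
}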

Another improvement over the original might be to use knowledge about the individual attributes. The original value domains for coordinates, normals and texture coordinates vary greatly: coordinates might be in the range -10000..10000, normals in -1.0..1.0, and texture coordinates most likely 0.0..1.0. If that is the case, can you use this knowledge to create a 3-tiered hash table? For example, normals will most likely be spread fairly evenly across the surface of a sphere, while texture coordinates will probably have many elements with values of (0,0), (0,1), (1,0) and (1,1). This can skew the hash distribution.

The first tier is hashed by position. Each entry of that table contains a second table hashed by normal, which in turn contains a table hashed by texture coordinates; that last table holds the list of all objects with identical hashes.

When optimizing, looking at the distribution of the lengths of these last-tier lists gives you direct feedback on how efficient your hash functions are.

An ideal hash will minimize the variation in the lengths of the last-tier lists.
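
For illustration, the tiered structure could be declared along these lines (all names invented, per-attribute hash functions left to the reader):

#include <cstdint>
#include <map>
#include <vector>

// Tier 1: position hash -> tier 2: normal hash -> tier 3: texture hash ->
// list of candidate vertex indices sharing all three hashes.
using TexTier    = std::map<std::uint32_t, std::vector<std::uint32_t>>;
using NormalTier = std::map<std::uint32_t, TexTier>;
using PosTier    = std::map<std::uint32_t, NormalTier>;

// Fetch (creating it if needed) the candidate list for a vertex, given its
// three per-attribute hashes; only these candidates need a full comparison.
std::vector<std::uint32_t>& candidates(PosTier& table,
                                       std::uint32_t posHash,
                                       std::uint32_t normalHash,
                                       std::uint32_t texHash)
{
    return table[posHash][normalHash][texHash];
}

The sizes of those innermost lists are exactly the last-tier lengths whose distribution you would inspect when tuning the hashes.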
Antheus, thanks for the tips.

I like your idea about a 3-tiered hash table, I'll have to do some further research on that subject.

And yes, in the example of my integer key, I am casting vertex float attributes to their integer representation.

This topic is closed to new replies.
