fast vector class (source) * not so fast

78 comments, last by Abominacion 17 years, 1 month ago
Quote:Original post by _swx_
And it's much better to use templates for creating vector classes, then you can have vectors of arbitrary size and only need to code a single class :)


Yes, more flexible, but you'd lose all the hardcoded and highly optimized SSE code (and that's the whole point).

^^^

Not only that, but how many differently sized vectors have you used? Almost everything is done with 3 and 4 dimensions, some with 2, and I've never needed anything over 4. If I did, I'd probably be doing some sort of heavy math-related work, in which case I would just use some math package like pari/gp or matlab or something.

You're not even really getting flexibility by using templates.
Quote:
And RE: the overloaded == function, you may want to take an epsilon into account:
Quote:
I don't quite understand what that would be good for?



Well, in a simple example (and assuming in the context of a 3D game), what's the difference between A and B?:

vec3 A(0.000000089f, 0.00000012f, 0.0000000f);
vec3 B(0.0f, 0.0f, 0.0f);


A and B are so close that they can be considered to be the same. Your code does not allow for that.
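To make that concrete, here is a minimal sketch of a tolerance-based comparison. The `vec3` below is a hypothetical stand-in (the class from the original post isn't reproduced here), `nearly_equal` and `exactly_equal` are names made up for this example, and the default tolerance of `1e-6f` is an arbitrary illustrative value, not a recommendation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the thread's vec3; the real class is not shown here.
struct vec3 {
    float x, y, z;
    vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// True when every component of a and b differs by at most epsilon.
// The default of 1e-6f is an arbitrary illustrative tolerance.
inline bool nearly_equal(const vec3& a, const vec3& b, float epsilon = 1e-6f) {
    return std::fabs(a.x - b.x) <= epsilon
        && std::fabs(a.y - b.y) <= epsilon
        && std::fabs(a.z - b.z) <= epsilon;
}

// Exact component-wise comparison, for contrast.
inline bool exactly_equal(const vec3& a, const vec3& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}
```

With the A and B from above, the exact test reports them as different while the tolerance test treats them as the same point. A fixed absolute tolerance like this only makes sense when you know the scale of your data.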

And now for a slightly more complex example:

...
typedef vertex_t vector_t;

struct normal_vertex_t
{
   // GL_N3F_V3F
   vector_t n;
   vertex_t v;
};
...
void Model_md2::GenerateShadowVolume()
{
   bool* bEdgeUsed = NULL;
   int i = 0;

   bEdgeUsed = new bool[3 * nTriangles];
   for (int j = 0; j < 3 * nTriangles; bEdgeUsed[j++] = false);

   volume = new triangle_face_normals_t[4 * nTriangles];

   for (int t1 = 0; t1 < nTriangles; ++t1)
   {
      vector_t n1 = CalculateNormal(vertices[triangles[t1].vertex[0]].v, vertices[triangles[t1].vertex[1]].v, vertices[triangles[t1].vertex[2]].v);

      volume[i].vertex[0] = triangles[t1].vertex[0];
      volume[i].n[0]      = n1;
      volume[i].vertex[1] = triangles[t1].vertex[1];
      volume[i].n[1]      = n1;
      volume[i].vertex[2] = triangles[t1].vertex[2];
      volume[i++].n[2]    = n1;

      for (int t2 = (t1 + 1); t2 < nTriangles; ++t2)
      {
         vector_t n2 = CalculateNormal(vertices[triangles[t2].vertex[0]].v, vertices[triangles[t2].vertex[1]].v, vertices[triangles[t2].vertex[2]].v);

         for (int v1 = 0; v1 < 3; ++v1)
         {
            if (bEdgeUsed[t1 * 3 + v1] == false)
            {
               for (int v2 = 0; v2 < 3; ++v2)
               {
                  if (bEdgeUsed[t2 * 3 + v2] == false)
                  {
                     int v1_next = (v1 + 1) % 3; /* Wrap around. */
                     int v2_next = (v2 + 1) % 3; /* Wrap around. */

                     normal_vertex_t pEdge1[2], pEdge2[2];
                     pEdge1[0] = vertices[triangles[t1].vertex[v1]];
                     pEdge1[1] = vertices[triangles[t1].vertex[v1_next]];
                     pEdge2[0] = vertices[triangles[t2].vertex[v2]];
                     pEdge2[1] = vertices[triangles[t2].vertex[v2_next]];

                     if (pEdge1[1].v == pEdge2[0].v && pEdge1[0].v == pEdge2[1].v) //// This test is where it really matters! //////
                     {
                        /* Build a quad. */
                        volume[i].vertex[0] = triangles[t1].vertex[v1]; /* Triangle 1. */
                        volume[i].n[0]      = n1;
                        volume[i].vertex[1] = triangles[t2].vertex[v2_next];
                        volume[i].n[1]      = n2;
                        volume[i].vertex[2] = triangles[t2].vertex[v2];
                        volume[i++].n[2]    = n2;
                        volume[i].vertex[0] = triangles[t2].vertex[v2]; /* Triangle 2. */
                        volume[i].n[0]      = n2;
                        volume[i].vertex[1] = triangles[t1].vertex[v1_next];
                        volume[i].n[1]      = n1;
                        volume[i].vertex[2] = triangles[t1].vertex[v1];
                        volume[i++].n[2]    = n1;

                        bEdgeUsed[t1 * 3 + v1] = true;
                        bEdgeUsed[t2 * 3 + v2] = true;
                        break;
                     }
                  }
               }
            }
         }
      }
   }

   delete [] bEdgeUsed;

   nVolume = i;

   return;
}
Quote:Original post by x452Alba
Quote:Original post by _swx_
And it's much better to use templates for creating vector classes, then you can have vectors of arbitrary size and only need to code a single class :)


Yes, more flexible, but you'd lose all the hardcoded and highly optimized SSE code (and that's the whole point).


So use template specialization for the functions where you can produce better code than the compiler. And IMO it's kinda ugly to overload the == operator for anything other than exact equality, especially if you don't use the proper epsilon value. Use a function that accepts a relative error value to compare vectors for near equality.
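A rough sketch of what such a function could look like, leaving `operator==` for exact equality. The names `near_rel` and `near_rel_vec` and the `abs_floor` parameter are assumptions made up for this example; nothing like them appears in the original class.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Relative comparison of two floats: the allowed difference scales with the
// larger magnitude of the two inputs. abs_floor is a hypothetical extra
// parameter so that comparisons against zero are not doomed to fail.
inline bool near_rel(float a, float b, float rel_error, float abs_floor = 0.0f) {
    float diff  = std::fabs(a - b);
    float scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(rel_error * scale, abs_floor);
}

// Component-wise version for n-component vectors stored as float arrays.
inline bool near_rel_vec(const float* a, const float* b, unsigned int n,
                         float rel_error, float abs_floor = 0.0f) {
    for (unsigned int i = 0; i < n; ++i) {
        if (!near_rel(a[i], b[i], rel_error, abs_floor)) return false;
    }
    return true;
}
```

With a relative error of `1e-5f`, 1000000.0f and 1000001.0f compare as near-equal while 1.0f and 1.1f do not. Without the floor, 0.0f and 1e-8f are not near-equal either, which is exactly the case the A/B example earlier in the thread runs into.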

And if you want a really fast vector class you should be using expression templates.
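For anyone who hasn't seen expression templates: the idea is that `a + b + c` builds a lightweight expression object instead of temporaries, and the actual additions happen in one pass when the result is assigned. A bare-bones sketch, where `VecExpr`, `VecSum` and `Vec3` are all illustrative names and not anything from the posted class:

```cpp
#include <cassert>
#include <cstddef>

// CRTP base: lets operator+ accept any expression type E.
template <typename E>
struct VecExpr {
    float operator[](std::size_t i) const {
        return static_cast<const E&>(*this)[i];
    }
};

// Concrete 3-component vector.
struct Vec3 : VecExpr<Vec3> {
    float v[3];
    Vec3(float x = 0.0f, float y = 0.0f, float z = 0.0f) { v[0] = x; v[1] = y; v[2] = z; }

    // Evaluate any expression in a single loop, with no intermediate Vec3.
    template <typename E>
    Vec3(const VecExpr<E>& e) {
        for (std::size_t i = 0; i < 3; ++i) v[i] = e[i];
    }

    float operator[](std::size_t i) const { return v[i]; }
};

// Node representing "l + r"; nothing is computed until operator[] is called.
template <typename L, typename R>
struct VecSum : VecExpr<VecSum<L, R> > {
    const L& l;
    const R& r;
    VecSum(const L& l_, const R& r_) : l(l_), r(r_) {}
    float operator[](std::size_t i) const { return l[i] + r[i]; }
};

template <typename L, typename R>
VecSum<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecSum<L, R>(static_cast<const L&>(l), static_cast<const R&>(r));
}
```

Usage is just `Vec3 d = a + b + c;`. One caveat with this sketch: the nodes hold references, so storing an unevaluated expression past the end of the statement (for example with `auto`) leaves dangling references.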
Quote:Original post by x452Alba
Yes, more flexible, but you'd lose all the hardcoded and highly optimized SSE code (and that's the whole point).


You don't really lose any of the custom hardcoded optimizations with templates:

template< typename BaseType >
class TypeTraits
{
public:
...
    typedef BaseType Type;
    typedef TypeTraits< Type > MyType;
...
public:
...
    static void Add (Type & sum, Type a, Type b);
    static void AddArray (Type * sum, Type const * a, Type const * b, unsigned int size);
...
};

template< typename BaseType, unsigned int Dimension, typename BaseTypeTraits = TypeTraits< BaseType > >
class Vector
{
public:
...
    typedef BaseType Type;
    typedef BaseTypeTraits Traits;
    typedef Vector< Type, Dimension, Traits > MyType;
...
private:
...
    Type m_fields[Dimension];
...
public:
...
    friend MyType const operator+(MyType const & a, MyType const & b)
    {
        MyType result;
        Traits::AddArray(result.m_fields, a.m_fields, b.m_fields, Dimension);
        return result;
    }
...
};

There would be a default TypeTraits template that could be used with any type. After you get some profiler results, you can specialize the TypeTraits template and define the hardcoded, highly optimized SSE code there. Though I would leave the optimizations to the compiler - it is pretty awesome what it can do! :)
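As a sketch of that specialization step: below, the generic `TypeTraits` gets a plain loop, and `TypeTraits<float>` is specialized to add four floats at a time with SSE intrinsics when the compiler targets SSE. Only `AddArray` is shown, the `SKETCH_HAS_SSE` macro is made up for this example, and unaligned loads are used because nothing in the code above guarantees that `m_fields` is 16-byte aligned.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1   // hypothetical macro, only for this sketch
#else
#define SKETCH_HAS_SSE 0
#endif

// Generic fallback: works for any type that supports operator+.
template <typename BaseType>
class TypeTraits {
public:
    typedef BaseType Type;
    static void AddArray(Type* sum, Type const* a, Type const* b, unsigned int size) {
        for (unsigned int i = 0; i < size; ++i) sum[i] = a[i] + b[i];
    }
};

// Full specialization for float: four lanes per iteration where SSE is available.
template <>
class TypeTraits<float> {
public:
    typedef float Type;
    static void AddArray(float* sum, float const* a, float const* b, unsigned int size) {
        unsigned int i = 0;
#if SKETCH_HAS_SSE
        for (; i + 4 <= size; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);   // unaligned loads: no alignment assumed
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(sum + i, _mm_add_ps(va, vb));
        }
#endif
        for (; i < size; ++i) sum[i] = a[i] + b[i];   // leftover elements
    }
};
```

The `Vector` template doesn't change at all: `Vector< float, 4 >` picks up the specialized traits automatically, and every other element type keeps the generic loop.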
Quote:Original post by _swx_
the epsilon code is wrong :P

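Sorry, but your code does work well for numbers between 0 and 1, or negative numbers. FLT_EPSILON is the gap between 1.0f and the next representable float (usually described as the smallest number such that 1.0f + FLT_EPSILON != 1.0f), so it is only a meaningful tolerance for values near 1.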
Quote:Original post by stanlo
I'm a lazy bum so I'll ask: have you benchmarked this against say, the DirectX vectors?


no... I don't use DirectX... but if somebody tries it I'd like to know the results
Quote:Original post by Dragon_Strike
Quote:Original post by stanlo
I'm a lazy bum so I'll ask: have you benchmarked this against say, the DirectX vectors?


no... I don't use DirectX... but if somebody tries it I'd like to know the results


How about against a similar implementation that doesn't use SSE?
Quote:Original post by stanlo
Quote:Original post by Dragon_Strike
Quote:Original post by stanlo
I'm a lazy bum so I'll ask: have you benchmarked this against say, the DirectX vectors?


no... I don't use DirectX... but if somebody tries it I'd like to know the results


How about against a similar implementation that doesn't use SSE?


Well, I got it to about 30-40% faster...
You should use intrinsics (if supported by your compiler) instead of straight assembly. That way you get the generated instructions you want without having to worry about instruction scheduling, register allocation and loading/storing from/to memory, all of which will cause speed penalties in your code when calling several inlined SSE-using methods in a row (which is a fairly common case with core math classes).
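A small sketch of what the intrinsics approach might look like, assuming an x86/x86-64 compiler that ships `<xmmintrin.h>`. The `vec4` here is a hypothetical type written for this example, not the class from the original post, and `lerp` is just a convenient case of several inlined SSE operations in a row.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86 or x86-64 target

// Hypothetical 4-component vector kept in a single SSE register.
struct vec4 {
    __m128 m;
    vec4(float x, float y, float z, float w) : m(_mm_set_ps(w, z, y, x)) {}
    explicit vec4(__m128 v) : m(v) {}

    // Read one component back out (for inspection, not for hot loops).
    float get(int i) const {
        float t[4];
        _mm_storeu_ps(t, m);
        return t[i];
    }
};

inline vec4 operator+(const vec4& a, const vec4& b) {
    return vec4(_mm_add_ps(a.m, b.m));
}

inline vec4 operator*(const vec4& a, float s) {
    return vec4(_mm_mul_ps(a.m, _mm_set1_ps(s)));
}

// Chains three inlined SSE operations. With intrinsics the compiler is free
// to keep the intermediates in registers and schedule the instructions.
inline vec4 lerp(const vec4& a, const vec4& b, float t) {
    return a * (1.0f - t) + b * t;
}
```

With inline assembly blocks, each operator would typically load its operands from memory and store its result back, which is the penalty described above when several of them are called back to back.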

Regards.

This topic is closed to new replies.
