Started by Jul 11 2006 12:38 AM

6 replies to this topic

Posted 11 July 2006 - 12:38 AM

Hello,
I want to use quaternions to represent the rotation of an object that is rendered using instancing, because a quaternion takes only 4 floats while a rotation matrix uses 9. But what's the fastest way to rotate a float3 (position) by a quaternion that represents the model's rotation?
Regards,
Kenzo

Posted 11 July 2006 - 12:46 AM

Whatever it is, it'll probably be slower than doing the vector-matrix multiply; GPUs are *really* good at that :)

Posted 11 July 2006 - 12:51 AM

Yep, but for now I'm setting my per-instance data as vertex shader constants, and copying up to 21 floats per instance seems rather slow to me. (I think I only have 1024 constant floats, so with 21 floats per instance I can only render 48 instances per draw call, while with quaternions I'd "only" need 16 floats, which gives me 64.)
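A quick check of that constant budget as a Python sketch. It assumes 1024 float constants (256 float4 registers) and that the part being swapped out is a 9-float rotation matrix replaced by a 4-float quaternion (21 - 9 + 4 = 16). The second helper assumes each instance's data starts on a fresh float4 register, which is likely if the shader indexes the constants per instance; in that case the 21-float layout fits fewer instances than plain division suggests. The function names are hypothetical.

```python
# Per-draw-call instance budget for shader constants (hypothetical helper names).
# Assumption: 1024 float constants, i.e. 256 float4 registers, as stated in the post.
TOTAL_FLOATS = 1024
FLOATS_PER_REGISTER = 4


def instances_tightly_packed(floats_per_instance):
    """Instances that fit if per-instance data is packed float by float."""
    return TOTAL_FLOATS // floats_per_instance


def instances_register_aligned(floats_per_instance):
    """Instances that fit if each instance starts on a fresh float4 register."""
    registers_per_instance = -(-floats_per_instance // FLOATS_PER_REGISTER)  # ceiling division
    return (TOTAL_FLOATS // FLOATS_PER_REGISTER) // registers_per_instance


matrix_layout = 21                   # per-instance floats in the current layout
quat_layout = matrix_layout - 9 + 4  # assumption: the 3x3 rotation (9 floats) becomes a quaternion (4)

print(quat_layout)                                # 16
print(instances_tightly_packed(matrix_layout))    # 48
print(instances_tightly_packed(quat_layout))      # 64
print(instances_register_aligned(matrix_layout))  # 42
print(instances_register_aligned(quat_layout))    # 64
```

With 16 floats both packing assumptions give the same 64 instances, because 16 floats are exactly 4 whole registers.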

Regards,

Kenzo

Posted 11 July 2006 - 01:12 AM

Rotating with a matrix costs 6 additions and 9 multiplications. Rotating with quaternions has the following costs:

Generic quaternion multiplies: 24 add, 32 mul

specialized quaternion multiplies (best case): 17 add, 24 mul

convert to matrix, then transform: 18 add, 21 mul (including conversion cost of 12 add and 12 mul)
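For reference, the generic approach counted above, written out as a minimal Python sketch: two full quaternion (Hamilton) products, q * (0, v) * conj(q). Each product is 16 multiplies and 12 adds (counting subtractions as adds), which is where the 32 mul / 24 add figure comes from. Storing quaternions as (w, x, y, z) tuples and the function names are my own illustrative choices, not something from the thread.

```python
import math


def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z): 16 mul, 12 add/sub."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conjugate(q):
    """Conjugate; equal to the inverse when q has unit length."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate_generic(q, v):
    """Rotate 3-vector v by unit quaternion q as q * (0, v) * conj(q): two full products."""
    p = (0.0, v[0], v[1], v[2])
    _, x, y, z = quat_mul(quat_mul(q, p), quat_conjugate(q))
    return (x, y, z)


# (0.5, 0.5, 0.5, 0.5) is a 120-degree turn about (1, 1, 1)/sqrt(3): x -> y -> z -> x.
print(rotate_generic((0.5, 0.5, 0.5, 0.5), (1.0, 0.0, 0.0)))  # (0.0, 1.0, 0.0)

# 90 degrees about z: (cos 45, 0, 0, sin 45); the result is (0, 1, 0) up to rounding.
h = math.sqrt(0.5)
print(rotate_generic((h, 0.0, 0.0, h), (1.0, 0.0, 0.0)))
```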

As you can see, the best solution is probably to convert the quaternion to a matrix and use that to rotate the vector. It's obviously slower than using a matrix directly, but as you say, 4 floats are fewer than 9 (or 16, as the case may be).
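A sketch of the "convert to matrix, then transform" route in Python, assuming a unit quaternion stored as (w, x, y, z) and column vectors (v' = M v). This straightforward version is written for clarity rather than tuned to the exact 12 add / 12 mul conversion count quoted above, and the function names are illustrative.

```python
import math


def quat_to_matrix(q):
    """Rotation matrix (tuple of rows, for column vectors: v' = M v) from unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    xx, yy, zz = x * x, y * y, z * z
    xy, xz, yz = x * y, x * z, y * z
    wx, wy, wz = w * x, w * y, w * z
    return (
        (1 - 2 * (yy + zz), 2 * (xy - wz), 2 * (xz + wy)),
        (2 * (xy + wz), 1 - 2 * (xx + zz), 2 * (yz - wx)),
        (2 * (xz - wy), 2 * (yz + wx), 1 - 2 * (xx + yy)),
    )


def mat_vec(m, v):
    """3x3 matrix times 3-vector: 9 mul, 6 add."""
    return tuple(row[0] * v[0] + row[1] * v[1] + row[2] * v[2] for row in m)


def rotate_via_matrix(q, v):
    """Convert, then transform; in practice the matrix would be reused for many vertices."""
    return mat_vec(quat_to_matrix(q), v)


# 120 degrees about (1, 1, 1)/sqrt(3) gives a permutation matrix.
print(quat_to_matrix((0.5, 0.5, 0.5, 0.5)))  # ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(rotate_via_matrix((0.5, 0.5, 0.5, 0.5), (1.0, 0.0, 0.0)))  # (0.0, 1.0, 0.0)

h = math.sqrt(0.5)  # 90 degrees about z
print(rotate_via_matrix((h, 0.0, 0.0, h), (1.0, 0.0, 0.0)))  # close to (0.0, 1.0, 0.0)
```

The conversion only pays off when its cost is spread over many `mat_vec` calls with the same matrix.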

Edit: be sure to check the cost of converting from a matrix to a quaternion in the PDF below. It could get pretty expensive, especially since you have to do it on the CPU.
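To give a feel for why the matrix-to-quaternion direction costs more, here is one commonly used way to do it, sketched in Python: branch on the trace or the largest diagonal element, then take a square root and a few divisions. This is an assumed standard variant and may not be the exact method whose cost the PDF lists; the function name is illustrative. If the application already keeps orientations as quaternions on the CPU, this conversion isn't needed at all.

```python
import math


def matrix_to_quat(m):
    """Unit quaternion (w, x, y, z) from a 3x3 rotation matrix m (tuple of rows, v' = m v).

    Branches so the square-root argument is the largest available, which keeps the
    divisions well conditioned. May return -q instead of q; both give the same rotation.
    """
    m00, m01, m02 = m[0]
    m10, m11, m12 = m[1]
    m20, m21, m22 = m[2]
    trace = m00 + m11 + m22
    if trace > 0.0:
        s = 2.0 * math.sqrt(trace + 1.0)  # s = 4w
        return (0.25 * s, (m21 - m12) / s, (m02 - m20) / s, (m10 - m01) / s)
    if m00 > m11 and m00 > m22:
        s = 2.0 * math.sqrt(1.0 + m00 - m11 - m22)  # s = 4x
        return ((m21 - m12) / s, 0.25 * s, (m01 + m10) / s, (m02 + m20) / s)
    if m11 > m22:
        s = 2.0 * math.sqrt(1.0 + m11 - m00 - m22)  # s = 4y
        return ((m02 - m20) / s, (m01 + m10) / s, 0.25 * s, (m12 + m21) / s)
    s = 2.0 * math.sqrt(1.0 + m22 - m00 - m11)  # s = 4z
    return ((m10 - m01) / s, (m02 + m20) / s, (m12 + m21) / s, 0.25 * s)


# 120 degrees about (1, 1, 1)/sqrt(3), written as a matrix.
print(matrix_to_quat(((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))))  # (0.5, 0.5, 0.5, 0.5)
# 180 degrees about x: the trace is negative, so the x branch is taken.
print(matrix_to_quat(((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))))  # (0.0, 1.0, 0.0, 0.0)
```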

Source: Geometric Tools, "Rotation Representations and Performance Issues"

[Edited by - frostburn on July 11, 2006 7:12:20 AM]

Posted 11 July 2006 - 01:55 AM

He wants to send quaternions to reduce the number of constants sent to the GPU, so converting to a matrix on the CPU won't save him anything. Converting to a matrix on the GPU would be more expensive than just doing the quaternion*vector*inv(quaternion) product required for the rotation. I think the cheapest way would be to do it like this:

let q = rotation quaternion = (w,x) where x is a 3d-vector of the axis components

let v = 3d-vector to rotate

let a = cross(x,v) + w*v

v' = cross(a,-x) + dot(x,v)*x + w*a

So you've got 2 cross-products, 1 dot-product, 3 vector-scalar multiplies and 3 vector additions.

EDIT: I *think* I did the working correctly for the q*v*inv(q), but no guarantees :)

EDIT2: dot(q,q) should be 1 for a rotation quaternion, so there's no need to divide by it.

[Edited by - joanusdmentia on July 11, 2006 8:55:00 AM]
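joanusdmentia's formula, transcribed into a small Python sketch so it can be checked numerically. Here q is passed as (w, x) with x a 3-tuple, matching the post's notation, and it is assumed to be a unit quaternion, so there is no division by dot(q,q). It works because a is the vector part of q*(0,v) and -dot(x,v) is its scalar part; multiplying that by conj(q) = (w,-x) leaves exactly the three terms of v'. The helper names are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rotate_expanded(q, v):
    """Rotate v by unit quaternion q = (w, x), x a 3-tuple, using the expanded q*v*inv(q)."""
    w, x = q
    # a = cross(x, v) + w*v: the vector part of q * (0, v); its scalar part is -dot(x, v).
    a = tuple(c + w * vi for c, vi in zip(cross(x, v), v))
    # v' = cross(a, -x) + dot(x, v)*x + w*a; no division, since dot(q, q) == 1.
    c = cross(a, tuple(-xi for xi in x))
    d = dot(x, v)
    return tuple(ci + d * xi + w * ai for ci, xi, ai in zip(c, x, a))


print(rotate_expanded((0.5, (0.5, 0.5, 0.5)), (1.0, 0.0, 0.0)))  # (0.0, 1.0, 0.0)

h = math.sqrt(0.5)  # 90 degrees about x: w = cos 45, x = (sin 45, 0, 0)
print(rotate_expanded((h, (h, 0.0, 0.0)), (0.0, 1.0, 0.0)))  # close to (0.0, 0.0, 1.0)
```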

Posted 11 July 2006 - 02:00 AM

AFAIK a single quaternion-vector product can be done sequentially on a CPU with 16 ADDs and 15 MULs, using an algorithm with 2 cross products, a scalar addition, a vector scale, and 3 vector additions. On a GPU the vector scale and vector additions are single ops, and I assume the GPU provides the cross product as well. So a total of 8 ops may be enough on a GPU (but I'm not sure).
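A two-cross-product algorithm can be sketched in Python using the well-known form t = 2*cross(x, v), v' = v + w*t + cross(x, t) for a unit quaternion (w, x, y, z). I can't be sure this is exactly the variant the post has in mind; computing 2*cross(x, v) with an addition, as below, comes to 15 multiplies and 15 adds by my count (subtractions counted as adds). The function names are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors: 6 mul, 3 add."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_two_cross(q, v):
    """Rotate v by unit quaternion q = (w, x, y, z): v' = v + w*t + cross(xyz, t), t = 2*cross(xyz, v)."""
    w, qx, qy, qz = q
    xyz = (qx, qy, qz)
    c = cross(xyz, v)
    t = tuple(ci + ci for ci in c)  # 2*cross(xyz, v), done as an addition
    u = cross(xyz, t)
    return tuple(vi + w * ti + ui for vi, ti, ui in zip(v, t, u))


print(rotate_two_cross((0.5, 0.5, 0.5, 0.5), (1.0, 0.0, 0.0)))  # (0.0, 1.0, 0.0)

h = math.sqrt(0.5)  # 90 degrees about z
print(rotate_two_cross((h, 0.0, 0.0, h), (1.0, 0.0, 0.0)))  # close to (0.0, 1.0, 0.0)
```

Expanding it gives v + 2w(x×v) + 2x×(x×v), the same rotation as q*v*inv(q).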

However, converting the quaternion to a matrix only pays off if it is done once and the same matrix is then applied to all vertices. AFAIK this rules out doing the conversion on the GPU, since a shader can't pass results from one run to the next.

So, if memory consumption and bandwidth weren't a limitation, this would really come down to where the break-even point lies with respect to the number of vertices to be transformed.
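The break-even question can be put in rough numbers using the operation counts quoted in this thread: 12 add + 12 mul for the conversion, 6 add + 9 mul per matrix transform, and 16 add + 15 mul per direct quaternion rotation. This Python sketch weights adds and multiplies equally and ignores how a GPU packs them into vector instructions, so treat it as back-of-the-envelope arithmetic only; the names are hypothetical. Under those assumptions, converting once already wins from 2 vertices on, while converting per vertex (which is what doing it in a vertex shader amounts to) costs more than the direct rotation.

```python
# Operation counts quoted earlier in the thread; adds and muls weighted equally (assumption).
CONVERT_COST = 12 + 12     # quaternion -> matrix, done once
MATRIX_PER_VERTEX = 6 + 9  # 3x3 matrix times vector
QUAT_PER_VERTEX = 16 + 15  # direct quaternion rotation


def matrix_total(n):
    """Convert once, then n matrix transforms."""
    return CONVERT_COST + MATRIX_PER_VERTEX * n


def quat_total(n):
    """n direct quaternion rotations, no conversion."""
    return QUAT_PER_VERTEX * n


def break_even_vertices():
    """Smallest vertex count at which converting once is no more expensive than rotating directly."""
    n = 1
    while matrix_total(n) > quat_total(n):
        n += 1
    return n


print(break_even_vertices())             # 2
print(matrix_total(24), quat_total(24))  # 384 744
print(CONVERT_COST + MATRIX_PER_VERTEX)  # 39 ops per vertex if the conversion can't be shared
```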

EDIT: joanusdmentia came to the same conclusion but a bit faster :)


Posted 11 July 2006 - 03:29 AM

Thanks for the help. So in the end, doing the rotation with a quaternion isn't that expensive (a little more than twice the cost). It's for objects of at most 24 vertices, so this shouldn't be the bottleneck.

Gonna try it out later!

Regards,

Kenzo
