rotation using quaternions in a shader
Hello,
I want to use quaternions to represent the rotation of an object that is rendered using instancing, because a quaternion only takes 4 floats, while a rotation matrix uses 9. But what's the fastest way of multiplying a float3 (position) with a quaternion that represents the model rotation?
Regards,
Kenzo
Whatever it is it'll probably be slower than doing the vector-matrix multiply, GPUs are *really* good at that [smile]
Yep, but for the moment I am setting my per-instance data as constant values for the vertex shader, and copying up to 21 floats per instance seems rather slow to me. (And I think I only have 1024 constants, so when I use 21 floats I can only render 48 instances with 1 draw call, and if I use quaternions I "only have to use" 16 floats, which gives me 64 instances.)
Regards,
Kenzo
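For reference, the instance counts in the post above are just floor divisions of the constant budget. A minimal sketch, assuming the budget really is 1024 individual floats as stated in the post (the helper name `max_instances` is made up for illustration):

```python
# Assumption: the constant budget is counted in individual floats,
# as described in the post (1024 floats available per draw call).
CONSTANT_FLOATS = 1024


def max_instances(floats_per_instance, budget=CONSTANT_FLOATS):
    """Number of whole instances whose data fits in the constant budget."""
    return budget // floats_per_instance


print(max_instances(21))  # 48 (rotation stored as a matrix)
print(max_instances(16))  # 64 (rotation stored as a quaternion)
```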
Rotating using Matrices is 6 additions and 9 multiplications. Rotating using quaternions has the following costs:
Generic quaternion multiplies: 24 add, 32 mul
specialized quaternion multiplies (best case): 17 add, 24 mul
convert to matrix, then transform: 18 add, 21 mul (including conversion cost of 12 add and 12 mul)
As you can see, the best solution is probably to convert the quaternion to a matrix and use that to rotate the vector. It's obviously slower than using a matrix directly, but as you say 4 floats is less than 9 (or 16 as the case may be)
Edit: Note: be sure to check out the cost of converting from matrix to quaternion in the PDF below. It could get pretty expensive, especially since you have to do it on the CPU.
source: site: Geometric tools. link: Rotation Representations and Performance Issues
[Edited by - frostburn on July 11, 2006 7:12:20 AM]
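To make the "convert to matrix, then transform" option above concrete, here is a rough sketch in plain Python. It assumes a unit quaternion stored as (w, x, y, z) and uses the standard quaternion-to-rotation-matrix formula; the function names are made up for illustration, and this is not taken from the linked document, so the operation counts may not match the figures quoted above.

```python
import math


def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    # Doubled components are computed once and reused in the products below.
    x2, y2, z2 = x + x, y + y, z + z
    xx, yy, zz = x * x2, y * y2, z * z2
    xy, xz, yz = x * y2, x * z2, y * z2
    wx, wy, wz = w * x2, w * y2, w * z2
    return (
        (1.0 - (yy + zz), xy - wz, xz + wy),
        (xy + wz, 1.0 - (xx + zz), yz - wx),
        (xz - wy, yz + wx, 1.0 - (xx + yy)),
    )


def mat_mul_vec(m, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(r[0] * v[0] + r[1] * v[1] + r[2] * v[2] for r in m)


# Usage: rotate the x axis by 90 degrees about the z axis.
half = math.pi / 4.0
q = (math.cos(half), 0.0, 0.0, math.sin(half))
rotated = mat_mul_vec(quat_to_matrix(q), (1.0, 0.0, 0.0))
```

The conversion only pays off if the resulting matrix is reused for many vectors.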
He's wanting to send the quaternions to reduce the number of constants sent to the GPU, so converting to a matrix on the CPU won't save him anything. Converting to a matrix on the GPU would be more expensive than just doing the quaternion*vector*inv(quaternion) required for the rotation. I think the cheapest way would be to do it like this:
let q = rotation quaternion = (w, x), where x is a 3D vector of the axis components
let v = 3D vector to rotate
let a = cross(x, v) + w*v
v' = (cross(a, -x) + dot(x, v)*x + w*a) / dot(q, q)
So you've got 2 cross-products, 2 dot-products, 3 vector-scalar multiplies, 3 vector additions and 1 vector-scalar division.
EDIT: I *think* I did the working correctly for the q*v*inv(q), but no guarantees [smile]
EDIT2: dot(q,q) should be 1 for a normalized rotation quaternion, so the division can be dropped
[Edited by - joanusdmentia on July 11, 2006 8:55:00 AM]
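The formula above can be checked outside a shader with a few lines of plain Python. This is only a sketch for verifying the math, assuming q is stored as (w, x, y, z); the helper names are made up, and in a shader the cross and dot products would be the built-in intrinsics.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def scale(s, a):
    """Multiply every component of a by the scalar s."""
    return tuple(s * c for c in a)


def add(a, b):
    """Component-wise sum of two vectors."""
    return tuple(p + q for p, q in zip(a, b))


def rotate(q, v):
    """Rotate v by q = (w, x, y, z) using q * v * inv(q), as in the post."""
    w, x = q[0], q[1:]
    a = add(cross(x, v), scale(w, v))
    numerator = add(add(cross(a, scale(-1.0, x)), scale(dot(x, v), x)),
                    scale(w, a))
    # Dividing by dot(q, q) makes this work for non-normalized q as well.
    return scale(1.0 / dot(q, q), numerator)


# Usage: rotate the x axis by 90 degrees about the z axis.
half = math.pi / 4.0
rotated = rotate((math.cos(half), 0.0, 0.0, math.sin(half)), (1.0, 0.0, 0.0))
```

Working this through by hand for a 90 degree rotation about z gives (0, 1, 0) for the x axis, which suggests the derivation in the post is right.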
AFAIK a single quaternion-vector product could be done on a CPU sequentially w/ 16 ADDs and 15 MULs, due to an algorithm using 2 cross products, a scalar addition, a vector scale, and 3 vector additions. On a GPU the vector scale and vector additions are single OPs, and I assume the cross product is provided by the GPU as well. So it may happen that a total of 8 OPs are sufficient on a GPU (but I'm not sure).
However, a quaternion-to-matrix conversion only needs to be done once if the same matrix can be applied to all vertices. AFAIK this rules out doing the conversion on the GPU, since it isn't possible to pass results from one shader run to the next.
So, if memory consumption and bandwidth weren't a limitation, this would in fact become a question of where the break-even point is w.r.t. the number of vertices to be transformed.
EDIT: joanusdmentia came to the same conclusion but a bit faster :)
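One commonly used formulation with two cross products is t = 2*cross(x, v), v' = v + w*t + cross(x, t), valid for a unit quaternion q = (w, x). It may not be exactly the algorithm meant in the post above (the operation counts there could differ), so take this as a sketch under that assumption, with made-up helper names:

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(s, a):
    """Multiply every component of a by the scalar s."""
    return tuple(s * c for c in a)


def add(a, b):
    """Component-wise sum of two vectors."""
    return tuple(p + q for p, q in zip(a, b))


def rotate_unit(q, v):
    """Rotate v by the unit quaternion q = (w, x, y, z) with two cross products."""
    w, x = q[0], q[1:]
    t = scale(2.0, cross(x, v))
    return add(add(v, scale(w, t)), cross(x, t))


# Usage: rotate the x axis by 90 degrees about the z axis.
half = math.pi / 4.0
rotated = rotate_unit((math.cos(half), 0.0, 0.0, math.sin(half)),
                      (1.0, 0.0, 0.0))
```

Unlike the q*v*inv(q) form with the division, this version assumes q is already normalized.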