HLSL: sometimes float4, sometimes float3

Started by
7 comments, last by reltham 18 years, 2 months ago
kovacsp
Author
306
August 02, 2005 08:10 AM
Hi, I'm having a silly HLSL question again :) Some people use float4 types for everything (normals, light and view directions, etc.), while others use only float3. I can't see any reason for storing a fourth float in many of these places; a direction vector is fine with only 3 components as far as I'm concerned. But some intrinsic functions force me to use float4 data. Why? What should I load into the fourth component if I only have 3? Casting from float3 to float4 and back always looks ugly. And why do I have to output a float4 position at the end of my vertex shader? kp
------------------------------------------------------------
Neo, the Matrix should be 16-byte aligned for better performance!
Illco
August 02, 2005 08:35 AM
Homogeneous coordinates. I could go out of my way explaining them, but here it's explained just well enough in 2D; you can work out the 3D consequences yourself, I suppose.

Greetz,

Illco
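[For reference, the usual convention with homogeneous coordinates in HLSL is w = 1 for points and w = 0 for directions, so the translation part of a float4x4 affects positions but not normals. A minimal sketch (the matrix and function names here are made up, not from the thread):

```hlsl
float4x4 WorldViewProj;   // hypothetical combined transform

float4 TransformPoint(float3 p)
{
    // w = 1: the translation part of the matrix is applied
    return mul(float4(p, 1.0f), WorldViewProj);
}

float3 TransformDirection(float3 d)
{
    // w = 0: translation drops out; only rotation/scale remain
    return mul(float4(d, 0.0f), WorldViewProj).xyz;
}
```
]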
krum
255
August 02, 2005 08:45 AM
I use float4 pretty much everywhere. This is primarily because my vertex buffers all use float4 for position, normal, and tangent. I do this because I can use aligned 128-bit SIMD instructions to copy data from the geometry buffers in my scene graph into the vertex buffers as fast as possible. It also lets me use the SSE min/max instructions to compute bounding volumes of dynamic geometry extremely quickly. Also, it makes software-based SIMD skinning extremely fast because I can stream vertices straight into the transform code. There's no way I could do these things if I were storing the data in 3-float vectors.

I have thought of packing the values into float3s for static geometry simply to conserve memory, but so far haven't had the need to do this.
DrGUI
August 02, 2005 09:02 AM
Doesn't the GPU do all operations on float4s in one cycle anyway?
kovacsp
Author
306
August 02, 2005 09:24 AM
hi all,

I know homogeneous coordinates more or less. What confuses me is: why do I need them for simple direction vectors? I can visualize my 3-component vectors, but with one more coordinate, over which I don't really have control, I have the feeling that something will go wrong...
e.g. if I bind a float3 stream of positions and normals in RenderMonkey but use float4 in my input structure, the fourth component gets filled in with "something". Then I carry this something all over my shader.
I know the reason for float4x4 matrices, that's ok. But I could happily live with float3 vectors everywhere, and that's not possible in most cases (or I have to do many casts).

anyway.. yes, afaik the GPU will do it in one cycle either way.

kp
------------------------------------------------------------
Neo, the Matrix should be 16-byte aligned for better performance!
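[The casts kovacsp is referring to typically look like this; whether you put 0 or 1 in w depends on whether the value is a direction or a point. A sketch with hypothetical input names:

```hlsl
float3 n  = IN.Normal.xyz;            // narrow: drop the padded fourth component
float4 n4 = float4(n, 0.0f);          // widen a direction: w = 0
float4 p4 = float4(IN.Position, 1.0f); // widen a position: w = 1
```
]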
reltham
August 02, 2005 12:15 PM
It's generally best to use the smallest type that holds your data.

Using float4 for everything makes it harder for the compiler/driver to dual-issue instructions.

Modern GPUs can co-issue a float3 op and a scalar op, or two float2 ops. If everything in your shader is float4 even when you only use 2 or 3 of the components, you may be losing out on some of the optimizations the compiler/driver can do.

You should also specify the swizzles in every case. (e.g. v3Result.x = v3Pos.x * fXOffset instead of v3Result = v3Pos.x * fXOffset)
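[reltham's two suggestions might look like this in practice; a sketch with hypothetical names, and note that whether ops actually co-issue depends on the compiler and GPU:

```hlsl
// Explicit swizzles tell the compiler exactly which components matter.
float3 OffsetX(float3 v3Pos, float fXOffset)
{
    float3 v3Result = v3Pos;
    v3Result.x = v3Pos.x * fXOffset;  // writes only .x, not a full float3 write
    return v3Result;
}

// Independent float3 and scalar ops that a dual-issue GPU could pair:
float3 reflDir = reflect(viewDir, normal);  // 3-component op
float  atten   = 1.0f / (distSq + 1.0f);    // scalar op, no dependency on reflDir
```
]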
circlesoft
August 02, 2005 01:15 PM
Quote:Original post by reltham
It's generally best to use the smallest type that holds your data.

Sometimes, it is actually helpful to increase data size, though. One such case is krum's implementation. Another similar situation is padding vertex structures to increase GPU performance. The optimal size is 32 or 64 bytes.
Dustin Franklin ( circlesoft :: KBase :: Mystic GD :: ApolloNL )
sirob
1,181
August 02, 2005 05:43 PM
So, making your vertex size LARGER will make the app go FASTER? Or just the same speed? Sounds weird :).
Sirob Yes.» - status: Work-O-Rama.
reltham
August 02, 2005 10:26 PM
My post above was in reference to HLSL code (shaders on the GPU).

The vertex size issue is a CPU->bus->GPU thing, and it really depends on how you access the vertices and whether the buffer is static (on the card) or dynamic (in AGP/system memory).

From my experience, if you access your vertex data sequentially (or mostly so) then the size isn't going to matter as much (other than smaller is generally better). Also, if the data is static (which is usually in card memory) then, again, the size is less important.

If you are using dynamic vertex buffers with non-sequential reads then generally you want your vertex size to be a multiple of 32 bytes and be 32 byte aligned (which I believe D3D ensures). This is because of the cache line size and the AGP read/write size.

So, yes, if your vertex is smaller than 32 bytes, then you might see a performance gain if you pad it out to 32 bytes.

This topic is closed to new replies.
