kovacsp

HLSL: sometimes float4, sometimes float3


Hi, I'm having a silly HLSL question again :) Some people use float4 types for everything: normals, light and view directions, and so on. Others use only float3. I can't see any reason to store a fourth float in many of these places; for a direction vector, three components are all I need. But some intrinsic functions force me to use float4 data. Why? What should I load into the fourth component when I only have 3? Casting from float3 to float4 and back always looks ugly. And why do I have to output a float4 position at the end of my vertex shader? kp
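(For concreteness, here's a minimal sketch of the situation being described; the struct, matrix, and variable names are made up for illustration. Direction math stays in float3, the casts happen at the matrix boundaries, and only the clip-space position goes out as a float4.)

// Hypothetical names for illustration (World, WorldViewProj, LightPosWS).
float4x4 World;
float4x4 WorldViewProj;
float3   LightPosWS;

struct VS_IN
{
    float3 Pos    : POSITION;
    float3 Normal : NORMAL;
};

struct VS_OUT
{
    float4 Pos     : POSITION;   // the rasterizer needs all of x, y, z, w
    float3 Diffuse : TEXCOORD0;
};

VS_OUT main(VS_IN input)
{
    VS_OUT output;

    // The "ugly casts": widen to float4 for the matrix multiply, then
    // narrow back down with .xyz.
    float3 posWS    = mul(float4(input.Pos, 1.0f), World).xyz;
    float3 normalWS = normalize(mul(float4(input.Normal, 0.0f), World).xyz);

    // Plain float3 math for the lighting itself.
    float3 toLight = normalize(LightPosWS - posWS);
    output.Diffuse = saturate(dot(normalWS, toLight)).xxx;

    // The output position must be float4: the hardware divides x, y, z
    // by w after the vertex shader to do the perspective projection.
    output.Pos = mul(float4(input.Pos, 1.0f), WorldViewProj);
    return output;
}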

I use float4 pretty much everywhere. This is primarily because my vertex buffers all use float4 for position, normal, and tangent. I do this so I can use 16-byte-aligned SIMD instructions to copy data from the geometry buffers in my scene graph into the vertex buffers as fast as possible. It also lets me use the SSE min/max instructions to compute bounding volumes of dynamic geometry extremely quickly. And it makes software SIMD skinning extremely fast, because I can stream vertices straight through the transform code. There's no way I could do these things if I were storing the data in 3-float vectors.

I have thought of packing the values into float3s for static geometry simply to conserve memory, but so far haven't had the need to do this.
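(On the HLSL side, a float4-everywhere input like the one described might look like this; the semantics and w-value conventions here are my assumptions, not something stated above.)

// Each attribute fills one 16-byte SIMD register on the CPU side.
struct VS_IN
{
    float4 Pos     : POSITION;   // w = 1 (point)
    float4 Normal  : NORMAL;     // w = 0 (direction); w is effectively padding
    float4 Tangent : TANGENT0;   // w sometimes stores handedness, else padding
};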

hi all,

I know homogeneous coordinates more or less, but here's what confuses me: why do I need them for simple direction vectors? I can visualize my 3-component vectors, but with one extra coordinate that I don't really control, I get the feeling something will go wrong...
E.g. if I feed a float3 stream of positions and normals in RenderMonkey but use float4 in my input structure, the fourth component gets filled in with "something", and then I carry this something all over my shader.
I understand the reason for float4x4 matrices, that's ok. But I could happily live with float3 vectors everywhere, and in most cases that's not possible (or I have to do many, many casts).
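(A note on the convention being asked about here, which is general practice rather than something stated in this thread: w = 1 marks a point and w = 0 marks a direction, so the translation row of a 4x4 matrix affects points but drops out for directions.)

// w = 1: a point; it picks up the matrix's translation.
float3 posWS    = mul(float4(inPos, 1.0f), World).xyz;

// w = 0: a direction; the translation row is multiplied by 0 and vanishes.
float3 normalWS = mul(float4(inNormal, 0.0f), World).xyz;

// So if the runtime fills an unused w with "something", setting it
// explicitly to 1 or 0 keeps the result predictable.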

anyway.. yes, afaik the GPU will do it in one cycle either way.

kp

It's generally best to use the smallest type that holds your data.

Using float4 for everything makes it harder for the compiler/driver to dual-issue instructions.

Modern GPUs can co-issue a float3 op with a scalar op, or two float2 ops. If everything is float4 even when you only use 2 or 3 of the components, you may be losing out on optimizations the compiler/driver could otherwise make.

You should also write the swizzles out explicitly in every case (e.g. v3Result.x = v3Pos.x * fXOffset instead of v3Result = v3Pos.x * fXOffset), as in the sketch below.
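(A small sketch of the idea; the variable names are invented. With an explicit destination swizzle the compiler knows only one lane is written, so the scalar multiply can potentially pair with independent float3 work.)

float4 main(float3 v3Pos    : TEXCOORD0,
            float3 v3Normal : TEXCOORD1,
            uniform float fXOffset) : COLOR
{
    float3 v3Result;

    // Explicit swizzle: only .x is written, leaving the other lanes free.
    v3Result.x  = v3Pos.x * fXOffset;
    v3Result.yz = v3Pos.yz;

    // Independent float3 work that could co-issue with the scalar op above.
    float3 halfN = v3Normal * 0.5f;

    return float4(v3Result + halfN, 1.0f);
}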

Quote:
Original post by reltham
It's generally best to use the smallest type that holds your data.

Sometimes it is actually helpful to increase the data size, though. One such case is krum's implementation above. Another, similar situation is padding vertex structures to increase GPU performance; the optimal size is 32 or 64 bytes.

So, making your vertex size LARGER will make the app go FASTER? Or just the same speed? Sounds weird :).

My post above was in reference to HLSL code (shaders on the GPU).

The vertex size question is a CPU -> bus -> GPU issue, and it really depends on how you access the vertices and whether the buffer is static (on the card) or dynamic (in AGP/system memory).

In my experience, if you access your vertex data sequentially (or mostly so), the size isn't going to matter as much (other than that smaller is generally better). Likewise, if the data is static (and therefore usually in card memory), the size is less important.

If you are using dynamic vertex buffers with non-sequential reads, then generally you want your vertex size to be a multiple of 32 bytes and 32-byte aligned (which I believe D3D ensures). This is because of the cache-line size and the AGP read/write granularity.

So yes, if your vertex is smaller than 32 bytes, you might see a performance gain by padding it out to 32 bytes, as in the sketch below.
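(For illustration, here's what padding a 24-byte vertex out to a 32-byte stride could look like on the HLSL side; the field choices are assumptions for the sketch, and in practice the padding lives in the vertex declaration/stride rather than the shader.)

// 24 bytes of real data:
struct Vertex24
{
    float3 Pos    : POSITION;   // 12 bytes
    float3 Normal : NORMAL;     // 12 bytes
};

// Padded to a 32-byte stride by widening each attribute to float4:
struct Vertex32
{
    float4 Pos    : POSITION;   // 16 bytes (w = 1, effectively padding)
    float4 Normal : NORMAL;     // 16 bytes (w = 0, effectively padding)
};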

