Object.Load equivalent in DX9

#1 Tispe   Members   -  Reputation: 1403


Posted 13 March 2012 - 07:01 AM


I am trying to implement Vertex Texture Fetching on SM3.0. My motivation is that I don't want to split up my 120-bone humanoid to make room for skinning matrices in the constant registers, and I don't want to complicate my vertex format. I want to copy all 120 bone matrices into a texture and sample it in the vertex shader.

A DX10 sample uses this method:
// Read a matrix(3 texture reads) from a texture containing animation data
float4x4 loadBoneMatrix(uint3 animationData,float bone)
{
    // Calculate a UV for the bone for this vertex
    float4x4 rval = g_Identity;
    // animationData.x and .y are linear offsets.  Combine into a single linear offset.
    uint baseIndex = animationData.x + animationData.y;
    baseIndex += (4*bone);    // 4*bone is since each bone is 4 texels to form a float4x4
    // Now turn linear offset into 2D coords
    uint baseU = baseIndex%g_InstanceMatricesWidth;
    uint baseV = baseIndex/g_InstanceMatricesWidth;
    // Note that we assume the width of the texture(and just add texels) is an even multiple of the # of texels per bone,
    //   otherwise we'd have to recalculate the V component per lookup
    float4 mat1 = g_txAnimations.Load( uint3(baseU,baseV,0));
    float4 mat2 = g_txAnimations.Load( uint3(baseU+1,baseV,0));
    float4 mat3 = g_txAnimations.Load( uint3(baseU+2,baseV,0));
    // only load 3 of the 4 values, and decode the matrix from them.
    rval = decodeMatrix(float3x4(mat1,mat2,mat3));
    return rval;
}

How would this be implemented in SM3.0?

I have looked around, and tex2D or tex2Dlod seem to do what I want, but I am not sure how I get Bone[59] when the UVs are between 0.0f and 1.0f. Would I have to make the U value 59/120? Would this sample the correct vector (with POINT filtering, of course)?


#2 Hodgman   Moderators   -  Reputation: 42176


Posted 13 March 2012 - 07:19 AM

Yes, tex2D/tex2Dlod are what you're looking for.

In UV-space, a u value of 0 is the left hand side of the left-most texel. A u value of 1 is the right hand side of the right-most texel.
If your texture is 120 pixels wide, and you want to fetch the 60th texel (texel number 59), the u coordinate for the centre of that texel is:
(59.0 + 0.5) / 120.0.

If you used 59.0/120.0, you would be exactly between texel #58 and #59. This is probably defined to round-up to #59 (when using point filtering), but I never trust floating-point math that much, so I always make sure that I specify the centre of each texel as above.
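
The texel-centre rule above can be checked on the CPU before it goes into a shader. The following is a minimal sketch, not part of the original posts: `texelCenter` and `nearestTexel` are hypothetical helper names, and `nearestTexel` assumes point filtering picks texel floor(u * size), which is an assumption about the hardware rather than something the thread confirms.

```cpp
#include <cassert>

// Hypothetical helper: normalized coordinate of the centre of texel `index`
// along an axis that is `size` texels long, i.e. (index + 0.5) / size.
inline float texelCenter(int index, int size)
{
    assert(size > 0 && index >= 0 && index < size);
    return (static_cast<float>(index) + 0.5f) / static_cast<float>(size);
}

// Hypothetical helper: the texel a point-filtered lookup would select,
// ASSUMING the sampler resolves u to floor(u * size) and clamps at the edge.
inline int nearestTexel(float u, int size)
{
    const int i = static_cast<int>(u * static_cast<float>(size));
    return i < size ? i : size - 1;
}
```

For a 120-texel-wide texture, `texelCenter(59, 120)` is 59.5 / 120, which sits half a texel away from either neighbouring boundary, so small floating-point errors cannot push the lookup into texel 58 or 60.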

#3 Tispe   Members   -  Reputation: 1403


Posted 13 March 2012 - 12:19 PM

Is it possible to make a 16 byte texel? How would I fetch that one?

#4 Hodgman   Moderators   -  Reputation: 42176


Posted 13 March 2012 - 07:01 PM

D3DFMT_A32B32G32R32F / DXGI_FORMAT_R32G32B32A32_FLOAT are 16-bytes per texel (i.e. float4).

You fetch every type of texture format in the same way.

#5 Tispe   Members   -  Reputation: 1403


Posted 14 March 2012 - 01:35 AM

Oh gosh, I meant 64 bytes :P

#6 Hodgman   Moderators   -  Reputation: 42176


Posted 14 March 2012 - 01:58 AM

Yeah, I thought you meant for a 16-float matrix, but wasn't sure.
If you need more than a float4, then use multiple texels per data-element. E.g. in your DX10 sample above, they're fetching their matrix from 3 texels.

One option is to use one row of pixels for each row of the matrix, so for a 4x4 matrix, you'd have to sample 4 pixels.
float u = (bone+0.5)*oneOverNumBones;
float4 row0 = tex2D( buffer, float2( u, 0.5/4.0 ) );
float4 row1 = tex2D( buffer, float2( u, 1.5/4.0 ) );
float4 row2 = tex2D( buffer, float2( u, 2.5/4.0 ) );
float4 row3 = tex2D( buffer, float2( u, 3.5/4.0 ) );
N.B. usually skinning matrices are reduced to 4x3 (or 3x4) sized matrices though to save space, or to a different representation such as an axis+angle float4 and a position+scale float4, or dual-quaternions, etc...
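
On the CPU side, the one-row-of-texels-per-matrix-row layout described above could be filled roughly as follows. This is a sketch under assumptions: `Float4`, `Matrix4` and `packBones` are hypothetical names, the texture is taken to be numBones texels wide and 4 texels tall, and each texel is one float4 (e.g. D3DFMT_A32B32G32R32F).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float4  { float x, y, z, w; };   // one 16-byte texel
struct Matrix4 { Float4 row[4]; };      // one bone matrix, stored row by row

// Hypothetical helper: lay the bones out as a (numBones x 4) image where
// column b holds bone b and image row r holds matrix row r of every bone.
inline std::vector<Float4> packBones(const std::vector<Matrix4>& bones)
{
    const std::size_t width = bones.size();
    std::vector<Float4> texels(width * 4);
    for (std::size_t b = 0; b < width; ++b)
        for (std::size_t r = 0; r < 4; ++r)
            texels[r * width + b] = bones[b].row[r];
    return texels;
}
```

The buffer here is tightly packed. When copying it into a locked D3D9 texture, each image row has to be written at the row pitch reported by the lock rather than at `width * sizeof(Float4)`, since the two are not guaranteed to match.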

#7 Tispe   Members   -  Reputation: 1403


Posted 02 April 2012 - 01:34 PM

OK, I need some help. The CompileShader errorlog returns: error X4532: cannot map expression to vertex shader instruction set

If I comment out my LoadBoneMatrix() function calls in vs_main the shader works. Using SM3.0

// Global variables
float4x4 World;
float4x4 ViewProj;
sampler2D Tex1 : register(s1);
float2 OneOverVertTexWidthHeight;
float EntityID;
// Vertex shader input structure
struct VS_INPUT
{
    float4 Position0 : POSITION0;
    float4 Position1 : POSITION1;
    float2 Texture0  : TEXCOORD0; //UVs
    float2 BoneIndex : TEXCOORD1; //Bone Indices for Vertex Texture
};

// Vertex shader output structure
struct VS_OUTPUT
{
    float4 Position   : POSITION;
    float2 Texture    : TEXCOORD0;
};

float4x4 LoadBoneMatrix(float bone)
{
    float TexRowV = (EntityID + 0.5) * OneOverVertTexWidthHeight.y;  //Row# = EntityID, +0.5 to land in the middle of pixel
    float TexPixU0 = ((bone*4) + 0.5) * OneOverVertTexWidthHeight.x; //Each Bone/Matrix = 4 pixels
    float TexPixU1 = ((bone*4) + 1.5) * OneOverVertTexWidthHeight.x;
    float TexPixU2 = ((bone*4) + 2.5) * OneOverVertTexWidthHeight.x;
    float TexPixU3 = ((bone*4) + 3.5) * OneOverVertTexWidthHeight.x;
    float4x4 rval = float4x4(tex2D(Tex1,float2(TexPixU0, TexRowV)),
                             tex2D(Tex1,float2(TexPixU1, TexRowV)),
                             tex2D(Tex1,float2(TexPixU2, TexRowV)),
                             tex2D(Tex1,float2(TexPixU3, TexRowV)));
    return rval;
}

// Name: Simple Vertex Shader
// Type: Vertex shader
// Desc: Vertex transformation and texture coord pass-through
VS_OUTPUT vs_main( in VS_INPUT In )
{
    VS_OUTPUT Out;                      //create an output vertex
    float4 WeightedPosition = 0;
    float4x4 Bone1 = 0;
    float4x4 Bone2 = 0;
    float4x4 WorldViewProj = mul(World, ViewProj);

    if(In.BoneIndex.x == In.BoneIndex.y)   //1 Bone
    {
        Bone1 = LoadBoneMatrix(In.BoneIndex.x);  //Fetch the first Bone from Vertex Texture
        WeightedPosition = mul(In.Position0, Bone1);
    }
    else                                   //2 Bones
    {
        Bone1 = LoadBoneMatrix(In.BoneIndex.x);  //Fetch the first Bone from Vertex Texture
        Bone2 = LoadBoneMatrix(In.BoneIndex.y);  //Fetch the second Bone from Vertex Texture
        WeightedPosition = mul(In.Position0, Bone1) + mul(In.Position1, Bone2);
    }
    Out.Position = mul(WeightedPosition, WorldViewProj); //apply vertex transformation
    Out.Texture  = In.Texture0;         //copy original texcoords
    return Out;                         //return output vertex
}
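
Independently of the compile error, the addressing in LoadBoneMatrix can be sanity-checked on the CPU. The sketch below mirrors the same arithmetic (one texture row per entity, four consecutive texels per bone). `BoneTexelUV` and `boneTexelUVs` are hypothetical names, and the texture dimensions used with it are assumptions, not values from the post.

```cpp
#include <cassert>

struct BoneTexelUV
{
    float u[4];  // U of the four texels that hold the matrix rows
    float v;     // V of the entity's row
};

// Hypothetical helper mirroring the shader's LoadBoneMatrix addressing:
// row = entity, texels [bone*4, bone*4 + 3], each sampled at its centre.
inline BoneTexelUV boneTexelUVs(int bone, int entity, int texWidth, int texHeight)
{
    assert(texWidth > 0 && texHeight > 0);
    assert(bone >= 0 && bone * 4 + 3 < texWidth);
    assert(entity >= 0 && entity < texHeight);

    const float invW = 1.0f / static_cast<float>(texWidth);
    const float invH = 1.0f / static_cast<float>(texHeight);

    BoneTexelUV out;
    for (int i = 0; i < 4; ++i)
        out.u[i] = (static_cast<float>(bone * 4 + i) + 0.5f) * invW;
    out.v = (static_cast<float>(entity) + 0.5f) * invH;
    return out;
}
```

With 120 bones the texture needs at least 480 texels per row. Multiplying each returned U by the width and truncating should give back `bone * 4 + i`, which is a quick way to confirm that OneOverVertTexWidthHeight was set from the real texture size.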

#8 MJP   Moderators   -  Reputation: 14499


Posted 02 April 2012 - 03:11 PM

You have to use tex2Dlod or tex2Dgrad in a vertex shader to sample textures. tex2D, like Sample, automatically determines the mip level by calculating the gradients of the texture coordinate in screen space using the neighboring pixels in the same 2x2 quad. Since you don't have 2x2 quads in a vertex shader, the gradients can't be calculated this way, so you have to supply either the gradients or the mip level manually.
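
To illustrate why the gradients matter, here is a rough CPU-side approximation of how a mip level can be derived from them. It is only a sketch: `approxMipLevel` is a hypothetical name, the formula is the usual isotropic approximation (log2 of the larger screen-space footprint in texels), and real hardware may compute it differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };

// Hypothetical helper: approximate isotropic mip selection.
// duvdx / duvdy are the changes in UV per screen pixel in x and y;
// width / height are the texture dimensions in texels.
inline float approxMipLevel(Float2 duvdx, Float2 duvdy, float width, float height)
{
    const float lenX = std::hypot(duvdx.x * width, duvdx.y * height);
    const float lenY = std::hypot(duvdy.x * width, duvdy.y * height);
    const float rho  = std::max(lenX, lenY);   // texels covered per pixel
    return std::max(0.0f, std::log2(rho));     // clamp to the top mip level
}
```

A pixel shader gets `duvdx` and `duvdy` for free from its 2x2 quad. A vertex shader has no such neighbours, which is why the level (or the gradients) has to be passed in explicitly.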

#9 Tispe   Members   -  Reputation: 1403


Posted 03 April 2012 - 12:41 AM

Ok, follow up.

tex2D(s, t), where t is a float2
tex2Dlod(s, t), where t is a float4 (the mipmap LOD is specified in t.w)

What is t.z for, and if there are no mipmaps should t.w be 0 or 1?

#10 MJP   Moderators   -  Reputation: 14499


Posted 03 April 2012 - 02:33 AM

As far as I know z is unused. It doesn't matter what you pass if there are no mipmaps, but 0 corresponds to the top mip level so you can use that if you'd like.

#11 Tispe   Members   -  Reputation: 1403


Posted 05 April 2012 - 12:35 PM

As expected, nothing is showing at the moment, which means something in the calculations is probably wrong.

I opened it in PIX, and somehow I think the vertex shader is not being used.

// Generated by Microsoft (R) HLSL Shader Compiler 9.27.952.3022
// Parameters:
//   float4x4 ViewProj;
//   float4x4 World;
// Registers:
//   Name         Reg   Size
//   ------------ ----- ----
//   World        c0       4
//   ViewProj     c4       4
    dcl_position v0
    dcl_texcoord v1
    dcl_position o0
    dcl_texcoord o1.xy
    mov r0, c1
    mul r1, r0, c4.y
    mov r2, c0
    mad r1, r2, c4.x, r1
    mov r3, c2
    mad r1, r3, c4.z, r1
    mov r4, c3
    mad r1, r4, c4.w, r1
    dp4 o0.x, v0, r1
    mul r1, r0, c5.y
    mad r1, r2, c5.x, r1
    mad r1, r3, c5.z, r1
    mad r1, r4, c5.w, r1
    dp4 o0.y, v0, r1
    mul r1, r0, c6.y
    mad r1, r2, c6.x, r1
    mad r1, r3, c6.z, r1
    mad r1, r4, c6.w, r1
    dp4 o0.z, v0, r1
    mul r0, r0, c7.y
    mad r0, r2, c7.x, r0
    mad r0, r3, c7.z, r0
    mad r0, r4, c7.w, r0
    dp4 o0.w, v0, r0
    mov o1.xy, v1
// approximately 25 instruction slots used

A lot of my constant table is not shown here. I don't know how to get PIX to show the HLSL instead of only the disassembly. I did a Clean Solution and Rebuild All, but nothing changed. What could be wrong?

#12 MJP   Moderators   -  Reputation: 14499


Posted 05 April 2012 - 02:54 PM

You need to compile the shader with the DEBUG flag, and you'll probably also want to disable optimizations.

That shader isn't sampling a texture, so you must not be using the result of your texture fetch since it got optimized away.

