deathkrush

Members
  • Content count

    626

Community Reputation

350 Neutral

About deathkrush

  • Rank
    Advanced Member
  1. changing screen color

    Are you familiar with alpha blending? You can do a lot of cool effects by rendering a white full-screen quad and playing with the blend equations. For example, to invert colors you can set the following render states:

        AlphaBlendEnable = True
        BlendOp          = Add
        SrcBlend         = InvDestColor
        DestBlend        = Zero
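    A minimal C++ sketch of those states, assuming device is your IDirect3DDevice9 pointer and the white full-screen quad is drawn elsewhere:

        // final = white * (1 - dest) + dest * 0 = 1 - dest
        device->SetRenderState( D3DRS_ALPHABLENDENABLE, TRUE );
        device->SetRenderState( D3DRS_BLENDOP, D3DBLENDOP_ADD );
        device->SetRenderState( D3DRS_SRCBLEND, D3DBLEND_INVDESTCOLOR );
        device->SetRenderState( D3DRS_DESTBLEND, D3DBLEND_ZERO );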
  2. Maybe that card doesn't even support 3D textures in hardware. That would explain why CPU usage goes up, since OpenGL will have to emulate them in software. acp693, try implementing your effect using 2D textures. Instead of a 16x16x16 3D texture you could pack the same data into a 256x16 2D texture and then just play with the UVs to get the values you want.
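    As a rough sketch of the remapping (my own layout: sixteen 16x16 slices packed side by side; note you lose hardware filtering between slices):

        // Map a 3D coordinate (u, v, w), each in [0,1], into a 256x16 atlas.
        void VolumeToAtlasUV( float u, float v, float w, float& u2d, float& v2d )
        {
            int slice = (int)( w * 15.0f + 0.5f );  // snap w to one of 16 slices
            u2d = ( slice + u ) / 16.0f;            // offset into the packed row
            v2d = v;
        }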
  3. Try this:

        return float4( saturate(
            ( texel.xyz * ( AmbientLightColor + DiffuseColor * LightDiffuseColor * diffuseLighting ) * 0.6 ) +  // Use light diffuse vector as intensity multiplier
            ( SpecularColor * LightSpecularColor * specLighting * 0.5 )                                         // Use light specular vector as intensity multiplier
            ), texel.w );
  4. Detecting hardware

    Checking for ATI or NVIDIA is a bad way to go, IMHO. Newer ATI cards support the NVIDIA shadow mapping method, and it's preferred over the older fetch4 method. Also, there is a possibility that future cards will drop the extension in favor of something else, which already happened with the NVIDIA RAWZ extension. This site documents all the various hacks that vendors exposed in Direct3D 9: http://aras-p.info/texts/D3D9GPUHacks.html. Before using an extension, first check whether it's supported with CheckDeviceFormat. There are some code snippets that show how to do that; check out the links at the bottom of the page.
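    For example, the usual probe for NVIDIA-style hardware shadow maps looks something like this sketch (assuming d3d is your IDirect3D9 interface and an X8R8G8B8 display mode):

        // Hardware shadow mapping is advertised by letting a depth format
        // be created as a texture, so this call succeeds only if supported.
        HRESULT hr = d3d->CheckDeviceFormat(
            D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, D3DFMT_X8R8G8B8,
            D3DUSAGE_DEPTHSTENCIL, D3DRTYPE_TEXTURE, D3DFMT_D24X8 );
        bool hardwareShadowMaps = SUCCEEDED( hr );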
  5. How would I display only the edges of a mesh?

    For wireframe rendering you can set the D3DRS_FILLMODE render state to D3DFILL_WIREFRAME. Is this what you meant by "only the edges"?
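    In C++ that's a single call (assuming device is your IDirect3DDevice9 pointer):

        // All primitives drawn after this render as wireframe;
        // set D3DFILL_SOLID to switch back.
        device->SetRenderState( D3DRS_FILLMODE, D3DFILL_WIREFRAME );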
  6. Proper UINT32 colour arithmetic

    Whoops, I misinterpreted the funky notation in the book. This should work:

        unsigned int MultibyteAdd( unsigned int x, unsigned int y )
        {
            unsigned int s = ( x & 0x7f7f7f7f ) + ( y & 0x7f7f7f7f );
            s = ( ( x ^ y ) & 0x80808080 ) ^ s;
            return s;
        }
  7. Proper UINT32 colour arithmetic

    Here is multi-byte addition (from the Hacker's Delight book):

        unsigned int MultibyteAdd( unsigned int x, unsigned int y )
        {
            unsigned int s = ( x & 0x7f7f7f7f ) + ( y & 0x7f7f7f7f );
            s = ( x + y ) & 0x80808080 + s;
            return s;
        }

    Division and multiplication are more complicated, but for the special cases of dividing or multiplying by a power of two you can use shift operators. The problem is that when bits overflow from one element into another you have a bug. The safest thing is to convert each component to float, do your math and convert it back to a packed integer, which works safely for all arithmetic operations.
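    A sketch of that float round trip (helper names are my own):

        // Unpack the four bytes of a packed color into floats in [0,1].
        void ColorToFloats( unsigned int c, float f[4] )
        {
            for ( int i = 0; i < 4; ++i )
                f[i] = (float)( ( c >> ( i * 8 ) ) & 0xff ) / 255.0f;
        }

        // Clamp and repack; overflow is handled by the clamp instead of
        // bits bleeding into the neighboring component.
        unsigned int FloatsToColor( const float f[4] )
        {
            unsigned int c = 0;
            for ( int i = 0; i < 4; ++i )
            {
                float v = f[i] < 0.0f ? 0.0f : ( f[i] > 1.0f ? 1.0f : f[i] );
                c |= (unsigned int)( v * 255.0f + 0.5f ) << ( i * 8 );
            }
            return c;
        }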
  8. Here is the complete shader written in HLSL; it compiles into assembly that looks close to the one you posted. Follow the math that reconstructs the local-space position from the input UBYTE4 position and tangent.

        #pragma pack_matrix( row_major )

        struct VertexOut
        {
            float4 Position : POSITION;
            float4 W : TEXCOORD;
        };

        struct VertexIn
        {
            float4 Position : POSITION;
            float4 Tangent : TANGENT;
            int4 Blendindices : BLENDINDICES;
            float4 Blendweight : BLENDWEIGHT;
        };

        float3x4 Bones[64] : register( c60 );
        float4 c56 : register( c56 );
        float4 c5 : register( c5 );
        float4 c9 : register( c9 );
        float4x4 WorldViewProjection : register( c0 );

        VertexOut main( VertexIn IN )
        {
            VertexOut OUT = (VertexOut)0;

            float4 r0, r1;
            r0.x = IN.Tangent.y >= 128 ? 1 : 0;   // sge r0.x, v1.y, c4.x
            r0.x = IN.Tangent.y - r0.x * 128;     // mad r0.x, r0.x, -c4.x, v1.y
            r0.yzw = float3(128, 1, 256);         // mov r0.yzw, c4.xyzy
            r1 = r0.yzyz * c56.w;                 // mul r1, r0.yzyz, c56.w
            r0.x *= r1.w;                         // mul r0.x, r0.x, r1.w
            r0.z = IN.Tangent.x * r1.z + r0.x;    // mad r0.z, v1.x, r1.z, r0.x
            r1 = r1 * IN.Position;                // mul r1, r1, v0
            r0.xy = r1.yw + r1.xz;                // add r0.xy, r1.ywzw, r1.xzzw
            r0.xyz += c56.xyz;                    // add r0.xyz, r0, c56
            float4 pos0 = r0;

            int4 indices = IN.Blendindices;
            float3x4 blendMatrix = IN.Blendweight.x * Bones[indices.x];
            blendMatrix += IN.Blendweight.y * Bones[indices.y];
            blendMatrix += IN.Blendweight.z * Bones[indices.z];
            blendMatrix += IN.Blendweight.w * Bones[indices.w];

            float4 pos1;
            pos1.xyz = mul( blendMatrix, pos0 );
            pos1.xyz -= c9.xyz;
            pos1.w = 1;

            OUT.Position = mul( WorldViewProjection, pos1 );
            OUT.W.xyzw = OUT.Position.w * c5.w;
            return OUT;
        }
  9. OK, let's see if I can write pieces of this shader in HLSL. Might not be totally correct but you should get the gist:

        vs_3_0
        def c4, 128, 1, 256, 3
        dcl_position v0
        dcl_tangent v1
        dcl_blendindices v2
        dcl_blendweight v3
        dcl_position o0
        dcl_texcoord o1
        sge r0.x, v1.y, c4.x           // float offset = input.tangent.y >= 128 ? 1.0f : 0.0f;
        mad r0.x, r0.x, -c4.x, v1.y    // offset = input.tangent.y - offset * 128;
        mov r0.yzw, c4.xyzy            // float2 scale0 = float2(128, 1);
        mul r1, r0.yzyz, c56.w         // float4 scale1 = scale0.xyxy * c56.w;
        mul r0.x, r0.x, r1.w           // offset *= scale1.w;
        mad r0.z, v1.x, r1.z, r0.x     // float posZ = input.tangent.x * scale1.z + offset;
        mul r1, r1, v0                 // float4 pos0 = input.position * scale1;
        add r0.xy, r1.ywzw, r1.xzzw    // pos0.xy = pos0.yw + pos0.xz; pos0.z = posZ; pos0.w = 256;
        add r0.xyz, r0, c56            // pos0.xyz += c56.xyz;
        mul r1, c4.w, v2               // float4 indices0 = 3 * input.blendindices;
        mova a0, r1
        mul r1, v3.y, c60[a0.y]        // float4 row0 = input.blendweights.y * c60[indices.y];
        mad r1, v3.x, c60[a0.x], r1    // row0 += input.blendweights.x * c60[indices.x];
        mad r1, v3.z, c60[a0.z], r1    // row0 += input.blendweights.z * c60[indices.z];
        mad r1, v3.w, c60[a0.w], r1    // row0 += input.blendweights.w * c60[indices.w];
        dp4 r1.x, r1, r0               // float4 pos1; pos1.x = dot( row0, pos0 );
        mul r2, v3.y, c61[a0.y]        // float4 row1 = input.blendweights.y * c61[indices.y];
        mad r2, v3.x, c61[a0.x], r2    // row1 += input.blendweights.x * c61[indices.x];
        mad r2, v3.z, c61[a0.z], r2    // row1 += input.blendweights.z * c61[indices.z];
        mad r2, v3.w, c61[a0.w], r2    // row1 += input.blendweights.w * c61[indices.w];
        dp4 r1.y, r2, r0               // pos1.y = dot( row1, pos0 );
        mul r2, v3.y, c62[a0.y]        // float4 row2 = input.blendweights.y * c62[indices.y];
        mad r2, v3.x, c62[a0.x], r2    // row2 += input.blendweights.x * c62[indices.x];
        mad r2, v3.z, c62[a0.z], r2    // row2 += input.blendweights.z * c62[indices.z];
        mad r2, v3.w, c62[a0.w], r2    // row2 += input.blendweights.w * c62[indices.w];
        dp4 r1.z, r2, r0               // pos1.z = dot( row2, pos0 );
        add r0.xyz, r1, -c9            // pos1.xyz -= c9.xyz;
        mov r0.w, c4.y                 // pos1.w = 1;
        dp4 o0.x, c0, r0               // output.position.x = dot( c0, pos1 );
        dp4 o0.y, c1, r0               // output.position.y = dot( c1, pos1 );
        dp4 o0.z, c2, r0               // output.position.z = dot( c2, pos1 );
        dp4 r0.x, c3, r0               // output.position.w = dot( c3, pos1 );
        mul o1, r0.x, c5.w
        mov o0.w, r0.x
  10. Yes, UBYTE4 will be converted to a float4 in the shader. The conversion is a simple cast to float, which is what you are doing in your unpacking code. The shader you posted does some funky math to convert this encoded value to a model-space position and eventually to screen space. Try stuffing an identity matrix into c0-c3, capture a frame with PIX and look at how positions get converted to model space. Also, you can try the PIX shader debugger.
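    Setting the identity into c0-c3 from the app side would look something like this sketch (assuming device is your IDirect3DDevice9 pointer):

        // With identity rows in c0-c3 the final dp4s become a no-op,
        // so PIX shows the reconstructed model-space position directly.
        const float identity[16] =
        {
            1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            0, 0, 0, 1,
        };
        device->SetVertexShaderConstantF( 0, identity, 4 );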
  11. Looks like positions are encoded in some kind of fixed-point format. W may be a scale or an offset, or both. Take a look at the vertex shader code to see how positions get reconstructed from UBYTE4.
  12. OpenGL Emulating Tex Gen

    DirectX doesn't force your vertex data to be packed into a single stream. Are you using vertex declarations or FVF? Vertex declarations allow you to split your data into multiple streams, just like OpenGL.
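    A sketch of a two-stream declaration, positions in stream 0 and texture coordinates in stream 1 (the layout is illustrative):

        // Each stream is a separate vertex buffer, bound with
        // SetStreamSource( 0, ... ) and SetStreamSource( 1, ... ).
        const D3DVERTEXELEMENT9 elements[] =
        {
            { 0, 0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
            { 1, 0, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0 },
            D3DDECL_END()
        };
        IDirect3DVertexDeclaration9* decl = NULL;
        device->CreateVertexDeclaration( elements, &decl );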
  13. If you want to do it in place, without a temporary render target, you can use the stencil buffer. Just do two passes: one to mark the pixels in the stencil buffer and another to overwrite those pixels. Like others said, there isn't a way to read pixels directly from the current render target in DirectX. It's possible on some embedded platforms that aren't restricted by high-level APIs, but even then it usually doesn't work because you get random flickering.
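    A sketch of the two passes in Direct3D 9 (assuming a stencil-capable depth buffer and device as your IDirect3DDevice9 pointer):

        // Pass 1: mark the pixels of interest with stencil ref 1,
        // without touching color.
        device->SetRenderState( D3DRS_STENCILENABLE, TRUE );
        device->SetRenderState( D3DRS_STENCILFUNC, D3DCMP_ALWAYS );
        device->SetRenderState( D3DRS_STENCILREF, 1 );
        device->SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_REPLACE );
        device->SetRenderState( D3DRS_COLORWRITEENABLE, 0 );
        // ... draw the marking geometry ...

        // Pass 2: overwrite only the marked pixels.
        device->SetRenderState( D3DRS_STENCILFUNC, D3DCMP_EQUAL );
        device->SetRenderState( D3DRS_STENCILPASS, D3DSTENCILOP_KEEP );
        device->SetRenderState( D3DRS_COLORWRITEENABLE, 0xF );
        // ... draw the overwriting geometry ...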
  14. Is SSE slower than normal code?

    Quote: Original post by cdoty
    Switching between floating point and MMX/SSE code could potentially cause slowdowns, as the MMX/SSE registers are mapped on top of the normal floating point registers.

    MMX yes, but SSE was designed to overcome the inefficiencies of MMX and doesn't have this problem: SSE has its own set of XMM registers, while MMX registers alias the x87 floating point registers.
  15. Just noticed that SetMatrixArray sets 4x4 matrices (4 vector constants each), but the SetVertexShaderConstantF snippet sets 4x3 matrices (3 vector constants each). That's your problem: you have to set 3 constants at a time for each matrix, not 4.
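    In other words, something like this sketch (boneMatrices, boneCount and startRegister are my own names; boneMatrices holds each 4x3 matrix as 12 contiguous floats):

        // A 4x3 matrix occupies 3 float4 registers, so step by 3 per matrix.
        for ( UINT i = 0; i < boneCount; ++i )
        {
            device->SetVertexShaderConstantF( startRegister + i * 3,
                                              &boneMatrices[i * 12], 3 );
        }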