Geometry shader-generated camera-aligned particles seemingly lacking Z writing

Eric Nevala · 2014-09-30T21:09:58

I had this problem last week and was struggling to figure out what exactly was going on. Then I found this excellent article by Shawn Hargreaves: http://blogs.msdn.com/b/shawnhar/archive/2009/02/18/depth-sorting-alpha-blended-objects.aspx He explains in detail what the problem is and provides some solutions. If you are using alpha blending, the "best" solution is to sort your objects by their distance from the camera and draw them from back to front.

I too decided to implement my billboards in HLSL. Unfortunately, since I'm using XNA, I only have the vertex shader and pixel shader at my disposal, so I can't add extra vertices as you would be able to do in DX10+. This means I have to include the vertex positions in the quad that gets rendered instead of inferring the vertices from a point position. My implementation takes advantage of instancing: I send a large batch of per-instance vertex data to the GPU once, and the shader then positions and animates every quad from that data, using just one draw call per primitive type.
Here is the HLSL code I'm using for DX9:

float4x4 World;
float4x4 View;
float4x4 Projection;
float3 CameraPosition;
float3 CameraUp;
float CurrentTime;                   //current time in seconds
float4 Tint : COLOR0 = (float4)1;
bool UseWorldTransforms;
float4 Ambient : COLOR = (float4)1;  //the ambient light color in the scene
float3 LightDirection;               //a vector3 for the directional light
float4 LightColor : COLOR;           //the directional light color
const float Tau = 6.283185307179586476925286766559;

//------- Texture Samplers --------
Texture g_Texture;
sampler TextureSampler = sampler_state
{
    texture = <g_Texture>;
    magfilter = LINEAR;
    minfilter = LINEAR;
    mipfilter = LINEAR;
    AddressU = wrap;
    AddressV = wrap;
};

struct QuadTemplate
{
    float3 Position : POSITION0;
    float2 TexCoord : TEXCOORD0;
};

struct QuadInstance
{
    float3 Position    : POSITION1;
    float3 Velocity    : POSITION2;
    float3 Normal      : NORMAL0;
    float3 Up          : NORMAL1;
    float3 ScaleRot    : POSITION3;
    float3 EndScaleRot : POSITION4;
    float2 Time        : POSITION5;
    float4 Color       : COLOR0;
    float4 EndColor    : COLOR1;
};

struct LineSegmentInput
{
    float3 Position   : POSITION0;
    float4 StartColor : COLOR0;
    float4 EndColor   : COLOR1;
    float3 Velocity   : TEXCOORD0;
    float2 Time       : TEXCOORD1;
    float2 UV         : TEXCOORD2;
};

struct VSOUT
{
    float4 Position : POSITION0;  //screen space coordinate of pixel
    float4 Color    : COLOR;      //vertex color
    float2 TexCoord : TEXCOORD0;  //texture coordinate
};

float3x3 CreateRotation(float myAngle, float3 rotAxis)
{
    float c = cos(myAngle);
    float s = sin(myAngle);
    float3 u = rotAxis;

    return float3x3(
        c + (u.x*u.x)*(1-c),      u.x*u.y*(1-c) - u.z*s,    u.x*u.z * (1-c) + u.y*s,
        u.y*u.x*(1-c) + u.z * s,  c + u.y*u.y*(1-c),        u.y*u.z*(1-c) - u.x*s,
        u.z*u.x*(1-c) - u.y * s,  u.z*u.y*(1-c)+u.x*s,      c + u.z*u.z*(1-c)
    );
}

//3D TEXTURED//////////////////////////////////////////////////////////////////////////////////
VSOUT VS_3DTex(QuadTemplate input)
{
    VSOUT output = (VSOUT)0;

    //float4 worldPosition = mul(input.Position, World);
    //float4 viewPosition = mul(worldPosition, View);
    //output.Position = mul(viewPosition, Projection);
    output.Position = (float4)0;
    output.Color = (float4)1;
    //output.Color.r = input.TextureCoord.x;
    //output.Color.g = input.TextureCoord.y;
    //output.TexCoord = input.TextureCoord;
    output.TexCoord = (float2)0;

    return output;
}

//3D Textured thick lines/////////////////////////////////////////////////////////////////
VSOUT VS_3DTexturedLine(LineSegmentInput input)
{
    VSOUT output = (VSOUT)0;

    float age = CurrentTime - input.Time.x;
    float lifeAmt = 0;

    //the life amount is a percentage between birth and death, if we're not -1
    if(input.Time.y != -1.0f)
    {
        lifeAmt = saturate(age / input.Time.y);
    }

    float4 pos = (float4)1;
    pos.xyz = input.Position + (input.Velocity * age);

    if(UseWorldTransforms == false)
        pos.xyz += CameraPosition;

    output.Position = mul(mul(pos, View), Projection);
    output.Color = lerp(input.StartColor, input.EndColor, lifeAmt);
    output.TexCoord = input.UV;

    return output;
}

//3D colored 0px lines////////////////////////////////////////////////////////////////////
VSOUT VS_3DLineSegment(LineSegmentInput input)
{
    VSOUT output = (VSOUT)0;

    float age = CurrentTime - input.Time.x;
    float lifeAmt = 0;

    //the life amount is a percentage between birth and death, if we're not -1
    if(input.Time.y != -1.0f)
    {
        lifeAmt = saturate(age / input.Time.y);
    }

    float4 pos = (float4)1;
    pos.xyz = input.Position + (input.Velocity * age);

    if(UseWorldTransforms == false)
        pos.xyz += CameraPosition;

    output.Position = mul(mul(pos, View), Projection);
    output.Color = lerp(input.StartColor, input.EndColor, lifeAmt);

    return output;
}

//3D Textured Quads///////////////////////////////////////////////////////////////////////
VSOUT VS_3DQuadTex(QuadTemplate input, QuadInstance instance)
{
    float age = CurrentTime - instance.Time.x;
    float lifeAmt = 0;

    //the life amount is a percentage between birth and death, if we're not -1
    if(instance.Time.y != -1.0f)
    {
        lifeAmt = saturate(age / instance.Time.y);
    }

    //linear interpolate the scale values to get current scale
    float3 m_scale = lerp(instance.ScaleRot, instance.EndScaleRot, lifeAmt);

    //current rotation is initial rotation + sum of rotational speed over time
    float m_rotation = instance.ScaleRot.z + (instance.EndScaleRot.z * age);

    //this is the transformed center position for the quad.
    float3 m_center = instance.Position;
    m_center += (instance.Velocity * age);

    //TODO: Handle the case where the normal is set to (0,1,0) or (0,-1,0)
    //Note: this is done in the application, not the shader.
    float3 m_normal = instance.Normal;  //the normal is going to be given to us and is fixed.

    //float3 m_up = float3(0,1,0);  //the up vector is simply a cross of the left vector and normal vector
    float3 m_up = instance.Up;
    float3 m_left = cross(m_normal, m_up);  //the left vector can be derived from the camera orientation and quad normal
    m_up = cross(m_left, m_normal);

    //Create a rotation matrix around the object space normal axis by the given radian amount.
    //This rotation matrix must then be applied to the left and up vectors.
    float3x3 m_rot = CreateRotation(-m_rotation, m_normal);
    m_left = mul(m_left, m_rot) * m_scale.x;  //apply rotation and scale to the left vector
    m_up = mul(m_up, m_rot) * m_scale.y;      //apply rotation and scale to the up vector

    //Since we have to orient our quad to always face the camera, we have to change the input position values based on the left and up vectors.
    //the left and up vectors are in untranslated space. We know the translation, so we just set the vertex position to be the translation added to
    //the rotated and scaled left/up vectors.
    float3 pos = (float)0;
    if(input.Position.x == -1 && input.Position.y == -1)      //bottom left corner
    {
        pos = m_center + (m_left - m_up);
    }
    else if(input.Position.x == -1 && input.Position.y == 1)  //top left corner
    {
        pos = m_center + (m_left + m_up);
    }
    else if(input.Position.x == 1 && input.Position.y == 1)   //top right corner
    {
        pos = m_center - (m_left - m_up);
    }
    else                                                      //bottom right corner
    {
        pos = m_center - (m_left + m_up);
    }

    //Since we've already manually applied our world transformations, we can skip that matrix multiplication.
    //note that we HAVE to use a Vector4 for the world position because our view & projection matrices are 4x4.
    //the matrix multiplication function isn't smart enough to use a vector3. The "w" value must be 1.
    float4 worldPosition = 1.0f;
    worldPosition.xyz = pos;

    if(UseWorldTransforms == false)
        worldPosition.xyz += CameraPosition;

    VSOUT output;
    output.Position = mul(mul(worldPosition, View), Projection);
    output.Color = lerp(instance.Color, instance.EndColor, lifeAmt);
    output.TexCoord = input.TexCoord;

    return output;
}

//3D Textured point sprites///////////////////////////////////////////////////////////////////////
VSOUT VS_3DPointSpriteTex(QuadTemplate input, QuadInstance instance)
{
    /*
    SUMMARY: A point sprite is a special type of quad which will always face the camera. The point sprite can be scaled
    and rotated around the camera-sprite axis (normal) by any arbitrary angle. Because of these special behaviors,
    we have to apply some special instructions beyond just multiplying a point by the world matrix.
    */

    float age = CurrentTime - instance.Time.x;
    float lifeAmt = 0;

    //the life amount is a percentage between birth and death, if we're not -1
    if(instance.Time.y != -1.0f)
    {
        lifeAmt = saturate(age / instance.Time.y);
    }

    //linear interpolate the scale values to get current scale
    float3 m_scale = lerp(instance.ScaleRot, instance.EndScaleRot, lifeAmt);

    //current rotation is initial rotation + sum of rotational speed over time
    float m_rotation = (instance.ScaleRot.z * Tau) + (instance.EndScaleRot.z * Tau * age);

    //this is the transformed center position for the quad.
    float3 m_center = instance.Position;
    m_center += (instance.Velocity * age);

    float3 m_normal = normalize(CameraPosition - m_center);  //the normal is going to be dependent on the camera position and the center position
    float3 m_left = cross(m_normal, CameraUp);               //the left vector can be derived from the camera orientation and quad normal
    float3 m_up = cross(m_left, m_normal);                   //the up vector is simply a cross of the left vector and normal vector

    //Create a rotation matrix around the object space normal axis by the given radian amount.
    //This rotation matrix must then be applied to the left and up vectors.
    float3x3 m_rot = CreateRotation(m_rotation, m_normal);
    m_left = mul(m_left, m_rot) * m_scale.x;  //apply rotation and scale to the left vector
    m_up = mul(m_up, m_rot) * m_scale.y;      //apply rotation and scale to the up vector

    //Since we have to orient our quad to always face the camera, we have to change the input position values based on the left and up vectors.
    //the left and up vectors are in untranslated space. We know the translation, so we just set the vertex position to be the translation added to
    //the rotated and scaled left/up vectors.
    float3 pos = (float)0;
    if(input.Position.x == -1 && input.Position.y == -1)      //bottom left corner
    {
        pos = m_center + (m_left - m_up);
    }
    else if(input.Position.x == -1 && input.Position.y == 1)  //top left corner
    {
        pos = m_center + (m_left + m_up);
    }
    else if(input.Position.x == 1 && input.Position.y == 1)   //top right corner
    {
        pos = m_center - (m_left - m_up);
    }
    else                                                      //bottom right corner
    {
        pos = m_center - (m_left + m_up);
    }

    //Since we've already manually applied our world transformations, we can skip that matrix multiplication.
    //note that we HAVE to use a Vector4 for the world position because our view & projection matrices are 4x4.
    //the matrix multiplication function isn't smart enough to use a vector3. The "w" value must be 1.
    float4 worldPosition = 1.0f;
    worldPosition.xyz = pos;

    if(UseWorldTransforms == false)
        worldPosition.xyz += CameraPosition;

    VSOUT output;
    output.Position = mul(mul(worldPosition, View), Projection);
    output.Color = lerp(instance.Color, instance.EndColor, lifeAmt);
    output.TexCoord = input.TexCoord;

    return output;
}

//3D Textured Billboard///////////////////////////////////////////////////////////////////////
VSOUT VS_3DBillboardTex(QuadTemplate input, QuadInstance instance)
{
    /*
    SUMMARY: A billboard is a special type of quad which will always face the camera, but is constrained along the y-axis.
    The billboard can be scaled and rotated around the camera-sprite axis (normal) by any arbitrary angle. Because of these
    special behaviors, we have to apply some special instructions beyond just multiplying a point by the world matrix.
    */

    float age = CurrentTime - instance.Time.x;  //total elapsed time since birth
    float lifeAmt = 0;

    //the life amount is a percentage between birth and death, if we're not -1
    if(instance.Time.y != -1.0f)
    {
        lifeAmt = saturate(age / instance.Time.y);
    }

    //linear interpolate the scale values to get current scale
    float3 m_scale = lerp(instance.ScaleRot, instance.EndScaleRot, lifeAmt);

    //current rotation is initial rotation + sum of rotational speed over time
    float m_rotation = (instance.ScaleRot.z * Tau) + (instance.EndScaleRot.z * Tau * age);

    //this is the transformed center position for the quad.
    float3 m_center = instance.Position;
    m_center += (instance.Velocity * age);

    //the normal is going to be dependent on the camera position and the center position
    float3 m_normal = CameraPosition - m_center;
    m_normal.y = 0;
    m_normal = normalize(m_normal);

    float3 m_up = float3(0,1,0);            //the up vector is simply the unit Y value
    float3 m_left = cross(m_normal, m_up);  //the left vector can be derived from the camera orientation and quad normal

    //Create a rotation matrix around the object space normal axis by the given radian amount.
    //This rotation matrix must then be applied to the left and up vectors.
    float3x3 m_rot = CreateRotation(m_rotation, m_normal);
    m_left = mul(m_left, m_rot) * m_scale.x;  //apply rotation and scale to the left vector
    m_up = mul(m_up, m_rot) * m_scale.y;      //apply rotation and scale to the up vector

    //Since we have to orient our quad to always face the camera, we have to change the input position values based on the left and up vectors.
    //the left and up vectors are in untranslated space. We know the translation, so we just set the vertex position to be the translation added to
    //the rotated and scaled left/up vectors.
    float3 pos = (float)0;
    if(input.Position.x == -1 && input.Position.y == -1)      //bottom left corner
    {
        pos = m_center + (m_left - m_up);
    }
    else if(input.Position.x == -1 && input.Position.y == 1)  //top left corner
    {
        pos = m_center + (m_left + m_up);
    }
    else if(input.Position.x == 1 && input.Position.y == 1)   //top right corner
    {
        pos = m_center - (m_left - m_up);
    }
    else                                                      //bottom right corner
    {
        pos = m_center - (m_left + m_up);
    }

    //Since we've already manually applied our world transformations, we can skip that matrix multiplication.
    //note that we HAVE to use a Vector4 for the world position because our view & projection matrices are 4x4.
    //the matrix multiplication function isn't smart enough to use a vector3. The "w" value must be 1.
    float4 worldPosition = 1.0f;
    worldPosition.xyz = pos;

    if(UseWorldTransforms == false)
        worldPosition.xyz += CameraPosition;

    VSOUT output;
    output.Position = mul(mul(worldPosition, View), Projection);
    output.Color = lerp(instance.Color, instance.EndColor, lifeAmt);
    output.TexCoord = input.TexCoord;

    return output;
}

//3D Vertex colors only///////////////////////////////////////////////////////////////////////////
VSOUT VS_3D(float4 inPosition : POSITION, float4 inColor : COLOR)
{
    VSOUT output;

    float4 worldPosition = mul(inPosition, World);
    float4 viewPosition = mul(worldPosition, View);
    output.Position = mul(viewPosition, Projection);
    output.Color = inColor;
    output.TexCoord = 0;

    return output;
}

VSOUT VS_2D(float4 inPos : POSITION, float4 inColor : COLOR)
{
    VSOUT Output = (VSOUT)0;

    Output.Position = inPos;
    Output.Color = inColor;

    return Output;
}

//PIXEL SHADERS///////////////////////////////////////////////////////////////////////////
float4 PS_3D(VSOUT output) : COLOR0
{
    //Output.Color = tex2D(TextureSampler, vs_output.TextureCoord);
    //Output.Color.rgb *= saturate(PSIn.LightingFactor) + xAmbient;
    //return tex2D(TextureSampler, output.TexCoord);
    return output.Color * Tint * Ambient;
}

float4 PS_2D(VSOUT vs_output) : COLOR0
{
    return vs_output.Color * Tint;
}

float4 PS_3DTex(VSOUT input) : COLOR0
{
    float4 c = tex2D(TextureSampler, input.TexCoord);

    if(c.a <= 0.1)
        discard;

    //return c * (input.Color + Ambient);
    float3 Up = (float3)0;
    Up.y = 1;
    float LightFactor = dot(Up, -LightDirection);

    return (c * input.Color * Tint) * saturate(LightColor + Ambient);
    //return ((c * input.Color ) * saturate((LightColor * LightFactor) + Ambient)) ;
}

//TECHNIQUES///////////////////////////////////////////////////////////////////////////
technique Technique2D
{
    pass Pass0
    {
        VertexShader = compile vs_2_0 VS_2D();
        PixelShader = compile ps_2_0 PS_2D();
    }
}

technique VertexColor3D
{
    pass Pass0
    {
        VertexShader = compile vs_2_0 VS_3D();
        PixelShader = compile ps_2_0 PS_3D();
    }
}

technique Textured3D
{
    pass Pass0
    {
        VertexShader = compile vs_2_0 VS_3DTex();
        PixelShader = compile ps_2_0 PS_3DTex();
    }
}

technique TexturedQuad3D
{
    pass Pass0
    {
        VertexShader = compile vs_3_0 VS_3DQuadTex();
        PixelShader = compile ps_3_0 PS_3DTex();
    }
}

technique TexturedPointSprite3D
{
    pass Pass0
    {
        VertexShader = compile vs_3_0 VS_3DPointSpriteTex();
        PixelShader = compile ps_3_0 PS_3DTex();
    }
}

technique TexturedBillboard3D
{
    pass Pass0
    {
        VertexShader = compile vs_3_0 VS_3DBillboardTex();
        PixelShader = compile ps_3_0 PS_3DTex();
    }
}

technique LineSegment3D
{
    pass Pass0
    {
        VertexShader = compile vs_3_0 VS_3DLineSegment();
        PixelShader = compile ps_3_0 PS_3D();
    }
}

technique TexturedLine3D
{
    pass Pass0
    {
        VertexShader = compile vs_3_0 VS_3DTexturedLine();
        PixelShader = compile ps_3_0 PS_3DTex();
    }
}

And here is my complete rendering code for drawing quads, billboards, and point sprites using the above HLSL:

public void Render(Camera3D camera, coreTime worldTime)
{
    //...snipped irrelevant code...

    if (m_settings.PainterSort)
    {
        PainterSort(camera.Position);
    }

    //rebuild the vertex and index buffers if the collection has been changed.
    if (m_dirtyBuffers > 0)
        RebuildBuffers();

    //activate our buffers
    //m_settings.GraphicsDevice.SetVertexBuffer(m_psVB);
    RasterizerState rs = m_settings.GraphicsDevice.RasterizerState;
    if (m_settings.DoubleSided == true)
    {
        RasterizerState rs2 = new RasterizerState();
        rs2.CullMode = CullMode.None;
        m_settings.GraphicsDevice.RasterizerState = rs2;
    }

    m_settings.GraphicsDevice.DepthStencilState = DepthStencilState.DepthRead;
    m_settings.GraphicsDevice.Indices = m_IB;
    m_settings.GraphicsDevice.BlendState = m_settings.BlendState;

    m_effect.Parameters["g_Texture"].SetValue(m_settings.Texture);
    m_effect.Parameters["UseWorldTransforms"].SetValue(m_settings.UseWorldTransforms);
    m_effect.Parameters["View"].SetValue(camera.View);
    m_effect.Parameters["Projection"].SetValue(camera.Projection);
    m_effect.Parameters["CameraPosition"].SetValue(camera.Position);
    m_effect.Parameters["CameraUp"].SetValue(camera.Up);
    m_effect.Parameters["CurrentTime"].SetValue((float)worldTime.TotalWorldTime.TotalSeconds);
    m_effect.Parameters["Tint"].SetValue(m_settings.Tinting.ToVector4());

    if (m_settings.UseWorldLights)
    {
        m_effect.Parameters["Ambient"].SetValue(BaseSettings.AmbientLight.ToVector4());
        m_effect.Parameters["LightColor"].SetValue(BaseSettings.AllDirLights[0].Color.ToVector4());
        m_effect.Parameters["LightDirection"].SetValue(BaseSettings.AllDirLights[0].Direction);
    }

    #region Draw Quads
    if (m_quadVB != null && m_quadVB.VertexCount > 0)
    {
        m_effect.CurrentTechnique = m_effect.Techniques["TexturedQuad3D"];
        m_settings.GraphicsDevice.SetVertexBuffers(
            new VertexBufferBinding(m_VB, 0, 0),
            new VertexBufferBinding(m_quadVB, 0, 1));

        foreach (EffectPass pass in m_effect.CurrentTechnique.Passes)
        {
            pass.Apply();
            //m_settings.GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.TriangleList, 0, 0, m_psList.Count * 4, 0, m_psList.Count * 2);
            m_settings.GraphicsDevice.DrawInstancedPrimitives(PrimitiveType.TriangleList,
                0,                    //base vertex
                0,                    //min vertex index
                4,                    //vertex count
                0,                    //start index
                2,                    //primitive count
                m_quadVB.VertexCount  //instance count
                );
        }
    }
    #endregion

    #region Draw Point sprites
    if (m_psVB != null && m_psVB.VertexCount > 0)
    {
        m_effect.CurrentTechnique = m_effect.Techniques["TexturedPointSprite3D"];
        m_settings.GraphicsDevice.SetVertexBuffers(
            new VertexBufferBinding(m_VB, 0, 0),
            new VertexBufferBinding(m_psVB, 0, 1));

        foreach (EffectPass pass in m_effect.CurrentTechnique.Passes)
        {
            pass.Apply();
            //m_settings.GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.TriangleList, 0, 0, m_psList.Count * 4, 0, m_psList.Count * 2);
            m_settings.GraphicsDevice.DrawInstancedPrimitives(PrimitiveType.TriangleList,
                0,
                0,
                4,                  //vertex count
                0,                  //start index
                2,                  //primitive count
                m_psVB.VertexCount  //instance count
                );
        }
    }
    #endregion

    #region Draw billboards
    if (m_bbVB != null && m_bbVB.VertexCount > 0)
    {
        m_effect.CurrentTechnique = m_effect.Techniques["TexturedBillboard3D"];
        m_settings.GraphicsDevice.SetVertexBuffers(
            new VertexBufferBinding(m_VB, 0, 0),
            new VertexBufferBinding(m_bbVB, 0, 1));

        foreach (EffectPass pass in m_effect.CurrentTechnique.Passes)
        {
            pass.Apply();
            //m_settings.GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.TriangleList, 0, 0, m_psList.Count * 4, 0, m_psList.Count * 2);
            m_settings.GraphicsDevice.DrawInstancedPrimitives(PrimitiveType.TriangleList,
                0,
                0,
                4,                  //vertex count
                0,                  //start index
                2,                  //primitive count
                m_bbVB.VertexCount  //instance count
                );
        }
    }
    #endregion

    m_settings.GraphicsDevice.DepthStencilState = DepthStencilState.Default;
    m_settings.GraphicsDevice.RasterizerState = rs;
    m_settings.GraphicsDevice.BlendState = BlendState.Opaque;
}

Here are both custom vertex definitions I came up with, which are necessary for drawing the instanced primitives using HLSL.

QuadInstanceVertex:

public struct QuadInstanceVertex
{
    //Optimize only if you have been able to profile a problem with the vertex byte size. Don't prematurely optimize and over-engineer.

    /// <summary>
    /// Offset from origin
    /// </summary>
    public Vector3 Position;

    /// <summary>
    /// Particle velocity
    /// </summary>
    public Vector3 Velocity;

    /// <summary>
    /// Normal direction for the quad face
    /// Quad: Set value; PointSprite: (0,0,0); BillBoard: (0,0,0)
    /// </summary>
    public Vector3 Normal;

    /// <summary>
    /// We actually have to include the up vector for quads because we just can't derive it within the shader.
    /// </summary>
    /// <remarks>
    /// The problem with trying to derive an up direction within the shader is that the shader compiler will UNROLL
    /// all of your branching logic. If you try to write any code to avoid dividing by zero, one of the branches will
    /// take that path anyways and divide by zero, causing *visual studio* to crash. So, rather than trying to run
    /// logic in the shader, we have to run it in the application.
    /// </remarks>
    public Vector3 Up;

    /// <summary>
    /// The starting width, length, and normal-axis rotation
    /// </summary>
    public Vector3 Scale;

    /// <summary>
    /// the end width, length and normal-axis rotational speed.
    /// length and width will be lerp'd, rotational speed will be added to initial value
    /// </summary>
    public Vector3 EndScale;

    /// <summary>
    /// Crucial timing values for the vertex shader.
    /// X = Spawn time of the particle/quad
    /// Y = lifespan of the quad;
    ///     -1: always alive
    ///      0: dead
    ///     0+: alive
    /// </summary>
    /// <remarks>Your quad manager is responsible for removing a quad/particle when the lifespan reaches zero.</remarks>
    public Vector2 Time;

    /// <summary>
    /// The starting color.
    /// </summary>
    public Color Color;

    /// <summary>
    /// The ending color. Current value will be lerp'd between this and start color.
    /// </summary>
    public Color EndColor;

    /// <summary>
    /// Creates a particle with the following properties.
    /// </summary>
    /// <param name="pos">The position offset from the origin</param>
    /// <param name="norm">the normal of the face</param>
    /// <param name="scaleRot">initial length and width</param>
    /// <param name="color">initial color</param>
    /// <param name="zrot">initial rotation around the z-axis</param>
    /// <param name="endScaleRot">end length and width</param>
    /// <param name="t">x = spawn time; y = lifespan (-1: infinite, 0: dead; gt 0: alive)</param>
    /// <param name="dz">rotational speed</param>
    /// <param name="vel">velocity of the particle (if you want it to move)</param>
    /// <param name="endColor">end color of the particle</param>
    public QuadInstanceVertex(Vector3 pos, Vector3 vel, Vector3 norm, Vector3 up, Vector3 scaleRot, Vector3 endScaleRot, Color color, Color endColor, Vector2 t)
    {
        Position = pos;
        Velocity = vel;
        Normal = norm;
        Up = up;
        Scale = scaleRot;
        EndScale = endScaleRot;
        Color = color;
        EndColor = endColor;
        Time = t;
    }

    /// <summary>
    /// Creates a quad instance with the given properties.
    /// </summary>
    /// <param name="pos"></param>
    /// <param name="norm"></param>
    /// <param name="scaleRot"></param>
    /// <param name="color"></param>
    /// <param name="zrot"></param>
    public QuadInstanceVertex(Vector3 pos, Vector3 norm, Vector3 up, Vector3 scaleRot, Color color)
    {
        Position = pos;
        Velocity = Vector3.Zero;
        Normal = norm;
        Up = up;
        Scale = scaleRot;
        EndScale = scaleRot;
        Color = color;
        EndColor = color;
        Time = new Vector2(0,-1);
    }

    /*Note: The semantic usage index must be unique across ALL vertex buffers.
      The geometry vertex buffer already uses Position0 and TexCoord0.*/
    public static readonly VertexDeclaration VertexDeclaration = new VertexDeclaration(
        new VertexElement( 0, VertexElementFormat.Vector3, VertexElementUsage.Position, 1),  //12 (pos)
        new VertexElement(12, VertexElementFormat.Vector3, VertexElementUsage.Position, 2),  //12 (velocity)
        new VertexElement(24, VertexElementFormat.Vector3, VertexElementUsage.Normal, 0),    //12 (norm)
        new VertexElement(36, VertexElementFormat.Vector3, VertexElementUsage.Normal, 1),    //12 (up)
        new VertexElement(48, VertexElementFormat.Vector3, VertexElementUsage.Position, 3),  //12 (scale/rot)
        new VertexElement(60, VertexElementFormat.Vector3, VertexElementUsage.Position, 4),  //12 (end scale/rot)
        new VertexElement(72, VertexElementFormat.Vector2, VertexElementUsage.Position, 5),  //8 (time)
        new VertexElement(80, VertexElementFormat.Color, VertexElementUsage.Color, 0),       //4 (color)
        new VertexElement(84, VertexElementFormat.Color, VertexElementUsage.Color, 1)        //4 (end color)
    );

    //six Vector3 (72) + one Vector2 (8) + two Color (8)
    public const int SizeInBytes = 88;
}

QuadVertex:

/// <summary>
/// A vertex structure with Position and Texture coordinate data
/// </summary>
public struct QuadVertex : IVertexType
{
    /*So, we're gonna get funky here. The R,G,B components of the color denote any color TINT for the quad.
      Since we also have an alpha channel, we're going to store the CornerID of the vertex within it!*/
    public Vector3 Position;
    public Vector2 UV;

    /// <summary>
    /// Creates a vertex which contains position, normal, color, and texture UV info
    /// </summary>
    /// <param name="position">The position of the vertex, relative to the center</param>
    /// <param name="uv">The UV texture coordinates</param>
    /// <param name="rotation">A radian value indicating rotation around the normal axis</param>
    public QuadVertex(Vector3 position, Vector2 uv)
    {
        Position = position;
        UV = uv;
    }

    public static readonly VertexDeclaration VertexDeclaration = new VertexDeclaration(
        new VertexElement(0, VertexElementFormat.Vector3, VertexElementUsage.Position, 0),           //12 bytes
        new VertexElement(12, VertexElementFormat.Vector2, VertexElementUsage.TextureCoordinate, 0)  //8 bytes
    );

    public const int SizeInBytes = 20;

    VertexDeclaration IVertexType.VertexDeclaration
    {
        get { return VertexDeclaration; }
    }
}

Quad Class:

public class Quad
{
    static QuadVertex[] m_verts;
    static int[] m_indices;

    public int Key = -1;

    //This is a vertex which contains all of our instance info. You can use it as either a data container
    //or as a vertex to be used by a vertex shader.
    public QuadInstanceVertex Info;

    private void Init(Vector3 position, Vector3 velocity, Vector3 normal, Vector3 up, Vector3 startSize, Vector3 endSize, Color startColor, Color endColor, Vector2 time)
    {
        BuildVerts();
        BuildIndices();
        Info = new QuadInstanceVertex(position, velocity, normal, up, startSize, endSize, startColor, endColor, time);
    }

    public Quad()
    {
        BuildVerts();
        BuildIndices();
    }

    /// <summary>
    /// Creates a quad based on the given values
    /// </summary>
    /// <param name="position">The center position of the quad</param>
    /// <param name="normal">The normal direction indicates the facing direction for the quad</param>
    /// <param name="orientation">This is the orientation of the quad around the normal axis</param>
    /// <param name="size">This is the scaled size of the quad</param>
    public Quad(Vector3 position, Vector3 normal, Vector3 up, float size, float orientation = 0)
    {
        Init(position, Vector3.Zero, normal, up, new Vector3(size,size, orientation), new Vector3(size,size, orientation), Color.White, Color.White, new Vector2(0,-1));
    }

    /// <summary>
    /// Creates a POINT SPRITE at the given location. Use HLSL code for the rest.
    /// </summary>
    /// <param name="center">the center position of the point sprite</param>
    /// <param name="size">the size of the point sprite</param>
    /// <param name="orientation">the rotation around the normal axis for the point sprite</param>
    public Quad(Vector3 position, float size, float orientation = 0)
    {
        Init(position, Vector3.Zero, Vector3.Zero, Vector3.Up, new Vector3(size, size, orientation), new Vector3(size, size, orientation), Color.White, Color.White, new Vector2(0, -1));
    }

    /// <summary>
    /// Creates a generalized quad for use with hardware instancing.
    /// </summary>
    /// <param name="position">This is the position in the game world</param>
    /// <param name="velocity">This is how much the quad moves each frame</param>
    /// <param name="normal">QUAD Only: This is the facing direction of the quad. Point sprites and billboards will derive this value based on camera position.</param>
    /// <param name="startSize">Starting scale and rotation: X = width, Y = height, Z = initial radian rotation</param>
    /// <param name="endSize">Ending scale and rotation: X = width, Y = height, Z = change in rotation over time</param>
    /// <param name="startColor">The starting color values for tinting. Use Color.White if you don't want tinting</param>
    /// <param name="endColor">The ending color values for tinting. Use Color.White if you don't want tinting</param>
    /// <param name="time">X = Birth time in gametime seconds. Y = lifespan in seconds. Set lifespan to -1 if the quad is static. Default: (0, -1)</param>
    public Quad(Vector3 position, Vector3 velocity, Vector3 normal, Vector3 up, Vector3 startSize, Vector3 endSize, Color startColor, Color endColor, Vector2 time)
    {
        Init(position, velocity, normal, up, startSize, endSize, startColor, endColor, time);
    }

    static void BuildIndices()
    {
        if (m_indices == null)
        {
            m_indices = new int[6];

            //create the indices for this quad. Note: winding order is in clockwise order.
            m_indices[0] = 0;
            m_indices[1] = 1;
            m_indices[2] = 2;
            m_indices[3] = 0;
            m_indices[4] = 2;
            m_indices[5] = 3;
        }
    }

    /// <summary>
    /// This gets six indices for this quad.
    /// The indices can then be inserted into an index buffer.
    /// </summary>
    /// <returns>Six indices for drawing a triangle list</returns>
    public static int[] Indicies
    {
        get
        {
            if (m_indices == null)
                BuildIndices();
            return m_indices;
        }
    }

    static void BuildVerts()
    {
        if (m_verts == null)
        {
            m_verts = new QuadVertex[4];
            m_verts[0] = new QuadVertex(new Vector3(-1, -1, 0), new Vector2(0, 1));  //bottom left corner
            m_verts[1] = new QuadVertex(new Vector3(-1, 1, 0), new Vector2(0, 0));   //top left corner
            m_verts[2] = new QuadVertex(new Vector3(1, 1, 0), new Vector2(1, 0));    //top right corner
            m_verts[3] = new QuadVertex(new Vector3(1, -1, 0), new Vector2(1, 1));   //bottom right corner
        }
    }

    public static QuadVertex[] Verts
    {
        get
        {
            if (m_verts == null)
                BuildVerts();
            return m_verts;
        }
    }
}
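The Render method calls PainterSort(camera.Position), but its body is not shown in the post. A minimal sketch of what such a back-to-front sort could look like follows. The class and method names are hypothetical, and it uses System.Numerics.Vector3 so it runs outside XNA (XNA's own Vector3 offers an equivalent DistanceSquared); the actual implementation may differ, for example by sorting the instance vertices directly and marking the buffers dirty.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class PainterSortSketch
{
    // Hypothetical helper: orders particle centers so that the one farthest
    // from the camera comes first (back to front), which is the draw order
    // alpha blending needs. Squared distance is enough for ordering and
    // avoids a square root per comparison.
    public static void SortBackToFront(List<Vector3> centers, Vector3 cameraPosition)
    {
        centers.Sort((a, b) =>
            Vector3.DistanceSquared(b, cameraPosition)
                .CompareTo(Vector3.DistanceSquared(a, cameraPosition)));
    }
}
```

Because the sort depends on the camera position, it would need to be redone whenever the camera or the particles move, followed by a rebuild of the instance buffer.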

Started by Husbj September 06, 2014 11:10 PM

Husbj

September 06, 2014 11:10 PM

I find myself having another relatively baffling issue when playing around with billboard GPGPU-based particle rendering.

This time the render is correct except for excessive z-fighting, which seems to occur because the order in which the billboards (particles) are drawn varies between frames. I tried to record a video of the issue, but for whatever reason Fraps decided to record only a black screen tonight. I can try to get a properly recorded video up later if needed, but I thought I would post this before bed anyway.

If I disable my alpha testing and go purely with alpha blending, the completely transparent pixels of closer particles do indeed sometimes overwrite opaque pixels of particles drawn behind them, which suggests that even completely transparent pixels write to the depth buffer. The billboard quads aren't particularly close to each other at all, so this cannot be a normal z-fighting problem as far as I can tell.

Could it be that I'm forgetting some render state I ought to set? Or is this a common "problem" that has to be solved by ensuring that my individual quads are created back-to-front by my geometry shader?

kauna

2,925

September 07, 2014 01:22 AM

Dare to show some shader code? Is your projection matrix properly set with good znear and zfar values? Is the problem affecting only particles, and other things render correctly?

Cheers!

Hodgman

52,718

September 07, 2014 01:59 AM

Sounds like plain old alpha blending issues - they have to be sorted/drawn back to front. You'd have to sort your particles (using a compute shader, etc) before this billboard pass.
Alternatively you can disable depth-writes, and instead of z-test artifacts, deal with blend order artifacts instead.
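
To make the back-to-front requirement concrete, here is a minimal CPU-side sketch in C++ of the ordering a GPU sort would have to produce before the billboard pass. The names `Vec3`, `distanceSquared` and `backToFrontOrder` are made up for illustration and are not from anyone's code in this thread; it also assumes distance to the eye is an acceptable sort key (view-space depth would work similarly).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical minimal vector type

// Squared distance is enough for ordering and avoids a sqrt per particle.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns particle indices ordered farthest-first (back-to-front) relative to the eye.
std::vector<std::size_t> backToFrontOrder(const std::vector<Vec3>& centres, const Vec3& eye) {
    std::vector<std::size_t> order(centres.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // stable_sort keeps equal-distance particles in a fixed relative order between frames
    std::stable_sort(order.begin(), order.end(),
        [&](std::size_t a, std::size_t b) {
            return distanceSquared(centres[a], eye) > distanceSquared(centres[b], eye);
        });
    assert(order.size() == centres.size());
    return order;
}
```

Sorting indices rather than the particles themselves mirrors the common GPU approach of sorting key/index pairs and leaving the particle state buffer untouched.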


Husbj

658

Author

September 07, 2014 08:23 AM

Dare to show some shader code? Is your projection matrix properly set with good znear and zfar values? Is the problem affecting only particles, and other things render correctly?

The code is rather messy at the moment but basically it goes like

Using a point-list topology, get each vertex's ID in the GS and use it to index into a StructuredBuffer previously built by two compute shader programs, Update and Emit.
Create a quad as an array of four vertices. The quad is made to align with the camera by calculating its right vector as the cross product of the up vector (static (0, 1, 0)) and (quadCenterPos - EyePos).
Project the vertex positions using a world-view-projection matrix. There is nothing wrong with this one; it renders other things just fine, and if you move the viewpoint around you can see that there is indeed space between the individual particles, as there should be.
Append the four vertices (bottom left, top left, bottom right, top right) to an output TriangleStream from the GS.
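
The quad construction described in these steps can be sketched on the CPU roughly as follows. This is C++ rather than the actual geometry shader, the helper names (`Vec3`, `cross`, `normalize`, `billboardCorners`) are invented for the example, and it assumes half-extents are passed in and that the projection happens afterwards.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical minimal vector type

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

inline Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);  // degenerate when the view direction is parallel to the up vector
    return v * (1.0f / len);
}

// World-space corners in triangle-strip order: bottom left, top left, bottom right, top right.
std::array<Vec3, 4> billboardCorners(Vec3 centre, Vec3 eye, float halfWidth, float halfHeight) {
    const Vec3 up = {0.0f, 1.0f, 0.0f};                     // static up vector, as in the post
    const Vec3 right = normalize(cross(up, centre - eye));  // right = up x (quadCenterPos - EyePos)
    const Vec3 r = right * halfWidth;
    const Vec3 u = up * halfHeight;
    return {centre - r - u, centre - r + u, centre + r - u, centre + r + u};
}
```

Because the up vector is fixed, the quad only rotates about the world Y axis, and the cross product degenerates when the camera looks straight up or down at a particle.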

I reached Hodgman's conclusion that they will indeed have to be independently sorted; after all I do sort my individual transparent meshes by view depth already.

Then the next problem will be finding an efficient sorting algorithm that can be parallelized. I'm sure that's just a google search away though.

Disabling depth writing would probably cause similarly obvious artifacts since the order of the particles can currently change from one frame to another depending on how the update threads finish copying data over from the old to the new state buffers, or am I wrong to make that assumption?

Thanks,

Husbjörn

Husbj

658

Author

September 08, 2014 05:15 PM

Hmm... so I did a quick test of implementing a recursive mergesort knockoff in a compute shader, performing a separate dispatch call for each split.

Unfortunately this seems to be very inefficient (on average it seems my implementation sorts one million integers in slightly over half a second).

The following is a simple, dirty HLSL program for doing the sorting:


cbuffer SortCountData : register(b0) {
	uint ArraySize;
	uint ElemCount;
};

struct sElemData {
	int id;
};

StructuredBuffer<sElemData>	In  : register(t0);
RWStructuredBuffer<sElemData>	Out : register(u0);

[numthreads(64, 1, 1)]
void MergeSort(uint3 threadId : SV_DispatchThreadId) {
	int leftOffset	= threadId.x * ElemCount * 2;
	int rightOffset	= leftOffset + ElemCount;
	int leftSize	= ElemCount;
	int rightSize	= (rightOffset + ElemCount >= ArraySize) ? ArraySize - rightOffset : ElemCount;
	int subSize	= ElemCount * 2;
	int leftId	= 0;
	int rightId	= 0;

	if((uint)leftOffset >= ArraySize)
		return;
	for(int n = 0; n < subSize; n++) {
		if(leftId >= leftSize) {
			// Add all remaining elements from the (sorted) right list
			while(n < subSize)
				Out[leftOffset + n++].id = In[rightOffset + rightId++].id;
			return;
		} else if(rightId >= rightSize) {
			// Add all remaining elements from the (sorted) left list
			while(n < subSize)
				Out[leftOffset + n++].id = In[leftOffset + leftId++].id;
			return;
		}
		if(In[leftOffset + leftId].id <= In[rightOffset + rightId].id)
			Out[leftOffset + n].id = In[leftOffset + leftId++].id;
		else
			Out[leftOffset + n].id = In[rightOffset + rightId++].id;
	}

}

I'm swapping the In and Out buffers between dispatches so that the shader always works with merging two individually sorted sub-lists.

The number of thread groups for each dispatch is determined as ceil((totalBufferElementCount / (subBufferElementCount * 2)) / 2.0f) and subBufferElementCount = pow(2, pass) where pass goes from zero to the rounded-up log2() of the total buffer element count.
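
As a rough illustration of that schedule, the per-pass values could be computed on the CPU like this. It is a sketch in C++ under two assumptions that are not from the post: one thread merges one pair of sub-lists, and the group count is divided by the 64 threads declared in `numthreads` (the post's formula divides by 2.0f instead, which should only dispatch extra groups that return early). `SortPass` and `buildMergePasses` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SortPass {
    uint32_t elemCount;     // length of each already-sorted run going into this pass (ElemCount)
    uint32_t threadGroups;  // X argument for the dispatch call
};

// One entry per dispatch: run length doubles each pass until a single run covers the array.
std::vector<SortPass> buildMergePasses(uint32_t arraySize, uint32_t threadsPerGroup = 64) {
    assert(threadsPerGroup > 0);
    std::vector<SortPass> passes;
    for (uint64_t elemCount = 1; elemCount < arraySize; elemCount *= 2) {
        // one thread per pair of runs, rounded up so a trailing partial pair is covered
        const uint64_t pairs = (arraySize + 2 * elemCount - 1) / (2 * elemCount);
        const uint64_t groups = (pairs + threadsPerGroup - 1) / threadsPerGroup;
        passes.push_back({static_cast<uint32_t>(elemCount), static_cast<uint32_t>(groups)});
    }
    return passes;
}
```

For each entry, `ArraySize` stays constant while `ElemCount` is updated in the constant buffer before the dispatch, and the In and Out buffers are swapped afterwards.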

I tried removing the first passes by doing an initial simple O(n^2) sort on the items into 64-element sub buffers so that the compute shader wouldn't have to start with single element buffers, but that didn't seem to increase the efficiency in any noticeable way, which indicates that the majority of the slowdown would come from the last passes, where finally a single thread has to go through the entire buffer. However I can think of no other, more parallelized way of sorting an entire list; it cannot be entirely done in separate passes (threads)?

I suppose there might be other algorithms that lend themselves better to this type of use, however I haven't been able to find any adequate descriptions of things like bitonic and radix sorts which are mentioned in various papers but never really defined.

Since this must doubtlessly be a rather common problem to solve, I was wondering if anybody might point out something obvious I've overlooked, a better way to parallelize mergesort (or some other type of sorting) or perhaps provide an (informative, not "buy the whole paper with ambiguous content by clicking here") source on the aforementioned networked sorting algorithms?
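
For reference, a bitonic sorting network can be written down quite compactly. The following is a CPU sketch in C++ of the textbook formulation, not anything posted in this thread; `bitonicSort` is a made-up name and the sketch assumes the element count is a power of two (a shorter array would have to be padded, for example with the largest possible key).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts ascending in place. Every (k, j) combination is one fully parallel step:
// on the GPU the innermost loop would become one dispatch with one thread per i.
void bitonicSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // power-of-two size only
    for (std::size_t k = 2; k <= n; k *= 2) {          // size of the bitonic sequences being merged
        for (std::size_t j = k / 2; j > 0; j /= 2) {   // distance between compared elements
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {                     // handle each pair once
                    const bool ascending = (i & k) == 0;
                    if ((a[i] > a[partner]) == ascending)
                        std::swap(a[i], a[partner]);
                }
            }
        }
    }
}
```

Unlike the merge approach, no step ever leaves a single thread walking the whole buffer: each step does n/2 independent compare-and-swap operations. The price is more steps, log2(n) * (log2(n) + 1) / 2 in total, so 210 for 2^20 elements.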

ankhd

2,304

September 10, 2014 07:50 AM

Hi.

You should be able to fix the problem with blend states. I'm using GPU particles like you are and mine all work fine.

Maybe you need to show us an image.

I also noticed some strange z behaviour when I was messing with the blend state once.

Husbj

658

Author

September 10, 2014 11:42 AM

Are you sure about that; doesn't blending work just by blending with the current backbuffer value at each pixel, so if you don't draw things back-to-front one of your frontal particles may end up blending with the render target clear colour and thus draw that on top of other particles that should appear behind it?

Still I would be interested in hearing your blend state settings if you believe that might be good enough :)
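
The order dependence in question can be checked with a few lines of arithmetic. This C++ sketch assumes the usual source-alpha / inverse-source-alpha blend factors and a black clear colour; `Rgb`, `blendOver` and `composite` are names invented for the example.

```cpp
#include <cassert>

struct Rgb { float r, g, b; };  // hypothetical colour type

// Standard alpha blend: result = src * srcAlpha + dst * (1 - srcAlpha)
inline Rgb blendOver(Rgb dst, Rgb src, float srcAlpha) {
    assert(srcAlpha >= 0.0f && srcAlpha <= 1.0f);
    const float inv = 1.0f - srcAlpha;
    return {src.r * srcAlpha + dst.r * inv,
            src.g * srcAlpha + dst.g * inv,
            src.b * srcAlpha + dst.b * inv};
}

// Blends a half-transparent red and a half-transparent green particle over black,
// in one of the two possible draw orders.
inline Rgb composite(bool redFirst) {
    const Rgb red = {1.0f, 0.0f, 0.0f};
    const Rgb green = {0.0f, 1.0f, 0.0f};
    Rgb pixel = {0.0f, 0.0f, 0.0f};  // render target clear colour
    pixel = blendOver(pixel, redFirst ? red : green, 0.5f);
    pixel = blendOver(pixel, redFirst ? green : red, 0.5f);
    return pixel;
}
```

Drawing red first gives (0.25, 0.5, 0) while drawing green first gives (0.5, 0.25, 0), so whichever particle is drawn last dominates the pixel.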

I rewrote my sorting algorithm to this, which performs quite a bit better (although still not at a desirable rate, but it should be "good enough" for a reasonable particle count I guess):


cbuffer GlobalData : register(b0) {
	uint BufferSize;
};

cbuffer PassData : register(b1) {
	uint SubSize;
};

Buffer<int>	In  : register(t0);
RWBuffer<int>	Out : register(u0);



// This program sorts 2-element subarrays by comparing and swapping their elements; can be used as a first pass
[numthreads(64, 1, 1)]
void Sort2(uint3 threadId : SV_DispatchThreadId) {
	uint offset = threadId.x * 2;
	if(offset + 1 < BufferSize) {	// written this way so an empty buffer cannot underflow the unsigned subtraction
		if(Out[offset] > Out[offset + 1]) {
			int tmp = Out[offset + 1];
			Out[offset + 1] = Out[offset];
			Out[offset] = tmp;
		}
	}
}


// A mergesort implementation; works in steps of SubSize * 2 per thread
[numthreads(64, 1, 1)]
void MergeSort(uint3 threadId : SV_DispatchThreadId) {
	uint offset = threadId.x * SubSize * 2;
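	uint pLeft  = offset;
	uint pRight = min(offset + SubSize, BufferSize);
	uint lLeft  = pRight;					// index one past the end of the left run
	uint lRight = min(offset + SubSize * 2, BufferSize);	// index one past the end of the right run (an index, not a length)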

	if(offset < BufferSize) {
		// Elements left in both lists?
		while(pLeft < lLeft && pRight < lRight) {
			if(In[pLeft] <= In[pRight]) {
				Out[offset++] = In[pLeft++];
			} else {
				Out[offset++] = In[pRight++];
			}
		}
		// When we get here one list has been exhausted; add the remaining elements in the other one (which is already sorted) to the output
		while(pLeft < lLeft) {
			Out[offset++] = In[pLeft++];
		}
		while(pRight < lRight) {
			Out[offset++] = In[pRight++];
		}
	}
}

However, I just discovered that the only way I can dispatch the MergeSort shader for the appropriate number of passes (log2(BufferSize)) is to read the append buffer's element count back to the CPU, which I was hoping I wouldn't have to do. Is there any way around this?
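
One possible angle on the pass count, offered as an untested idea rather than a known fix: derive the number of passes from the buffer's maximum capacity, which the CPU already knows, instead of the live element count. A pass whose run length already covers all live elements just copies them, so extra passes cost time but do not break the result. The C++ sketch below models this on the CPU; `mergePass` and `sortWithFixedPasses` are hypothetical names, and getting the live count into the constant buffer without a readback (perhaps via `ID3D11DeviceContext::CopyStructureCount`) is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU stand-in for one MergeSort dispatch: each loop iteration plays the role of one
// GPU thread merging two adjacent sorted runs of length subSize from in into out.
void mergePass(const std::vector<int>& in, std::vector<int>& out, std::size_t subSize) {
    assert(in.size() == out.size() && subSize > 0);
    const std::size_t n = in.size();
    for (std::size_t offset = 0; offset < n; offset += subSize * 2) {
        std::size_t pLeft = offset;
        std::size_t pRight = std::min(offset + subSize, n);
        const std::size_t leftEnd = pRight;                              // one past the left run
        const std::size_t rightEnd = std::min(offset + subSize * 2, n);  // one past the right run
        std::size_t o = offset;
        while (pLeft < leftEnd && pRight < rightEnd)
            out[o++] = (in[pLeft] <= in[pRight]) ? in[pLeft++] : in[pRight++];
        while (pLeft < leftEnd) out[o++] = in[pLeft++];
        while (pRight < rightEnd) out[o++] = in[pRight++];
    }
}

// Runs a pass count that depends only on capacity, never on the live element count.
std::vector<int> sortWithFixedPasses(std::vector<int> data, std::size_t capacity) {
    assert(data.size() <= capacity);
    std::vector<int> scratch(data.size());
    for (std::size_t subSize = 1; subSize < capacity; subSize *= 2) {
        mergePass(data, scratch, subSize);
        std::swap(data, scratch);  // ping-pong, like swapping the In and Out buffers
    }
    return data;
}
```

A side effect of the fixed pass count is that the buffer holding the final result is the same every frame, which keeps the ping-pong bookkeeping simple.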

ankhd

2,304

September 10, 2014 12:57 PM

Maybe it's depth buffer writes.

I have this set in my shader:

DepthStencilState DepthWrites
{
DepthEnable = TRUE;
DepthWriteMask = ZERO;
};

I only have dust, a flame thrower and an explosion type, and they look fine on the terrain; they don't render through the terrain when there is a hill.

The back particles may indeed blend wrong, but I can't see it in an explosion. Not yet, anyway.
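
What that state does can be modelled with a single pixel of a software depth buffer. This is a C++ sketch assuming a LESS depth comparison (the D3D11 default); `DepthPixel` and `drawFragment` are invented names.

```cpp
#include <cassert>

struct DepthPixel {
    float depth = 1.0f;  // cleared to the far plane
    int lastDrawn = -1;  // id of the last fragment that passed the test
};

// Depth test is always on; the depth value is only stored when depthWrite is true.
// Returns true when the fragment passes the LESS test and would be shaded/blended.
inline bool drawFragment(DepthPixel& px, int id, float z, bool depthWrite) {
    assert(z >= 0.0f && z <= 1.0f);
    if (!(z < px.depth))
        return false;     // occluded by something already in the depth buffer
    px.lastDrawn = id;
    if (depthWrite)
        px.depth = z;     // skipped when DepthWriteMask = ZERO
    return true;
}
```

With terrain written at depth 0.5, a particle at 0.7 is still rejected, while particles at 0.3 and 0.4 both pass in either order because neither updates the depth buffer. With writes enabled, drawing the 0.3 particle first would reject the 0.4 one, even where the nearer particle's pixels are fully transparent.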

Here's some fire:

link

Can we see an image?

Husbj

658

Author

September 10, 2014 05:12 PM

True, doing that gets rid of the clear colour forming a rectangle around the individual particles and everything looks fine on a per-frame basis.

Because of that, showing an image doesn't help much; the individual frame captures look just fine. However, because of the way my particles are updated, their draw order varies from frame to frame, and this is what causes issues: in one frame particle A is drawn before particle B, and in the next frame particle B gets drawn before particle A. This causes quite noticeable flickering when particles overlap. The problem wouldn't be very apparent if the particles used the same single colour, but as this is just a test to ensure I'll get proper results with multiple colours, all of my individual particles are blended with a random colour.

Fraps is still refusing to record anything besides a black screen with its FPS watermark on top so unfortunately I cannot produce a video of the issue either. I guess I could upload an executable if you like?

Edit: my blend states are

SrcBlend = D3D11_BLEND_SRC_ALPHA

DstBlend = D3D11_BLEND_INV_SRC_ALPHA

SrcAlphaBlend = D3D11_BLEND_ONE

DstAlphaBlend = D3D11_BLEND_ZERO

BlendOp = D3D11_BLEND_OP_ADD

AlphaBlendOp = D3D11_BLEND_OP_ADD

ColorWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL

by the way, in case that would affect anything.

neroziros

234

September 22, 2014 03:14 AM

Hi man! I'm having the same problem that you had. How did you calculate the SortCountData values to pass each frame? Thanks in advance for your time.
