Member Since 01 Jan 2009
Offline Last Active Aug 02 2014 05:24 AM

Topics I've Started

FBO Questions

04 July 2014 - 07:09 AM



I'm doing shadow mapping with exponential shadow maps (ESM) in OpenGL 4.3.

This requires blurring the shadow map, which I do with a standard two-pass Gaussian blur.

What I want to do is the following:


Shadow map -> (Vertical Blur Shader) -> Intermediate Texture -> (Horizontal Blur Shader) -> Shadow Map


The shadow map is a DepthComponent32f texture and the intermediate texture uses R32f.

The first pass works fine. For the second pass, where I want to write back to the shadow map, I can't seem to attach the shadow map to an FBO as a color attachment, so I'm unable to write back to it.

I've also noticed, completely by accident, that I can sample from a texture that I am currently writing to, without any ill results.
For example I can do:
Texture -> (Vertical Blur Shader) -> Texture
To recap:
  1. Is there a way to use a DepthComponent texture as a color attachment in an FBO?
  2. Why can I sample a texture that I'm currently writing to? Is this legal in OpenGL 4.3, or is the behavior undefined? What happens behind the scenes? Does it internally create a new texture to write to, and then discard the old one when the draw call finishes?
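
For reference, the two-pass layout described in this post can be mimicked on the CPU. The sketch below is only an analogy under stated assumptions: `Image`, `blur_pass` and `blur_shadow_map` are made-up names, the 1-4-6-4-1 weights stand in for whatever Gaussian kernel the real shaders use, and no OpenGL calls are involved. It shows the ping-pong structure (vertical pass into an intermediate buffer, horizontal pass back into the shadow map), where every pass reads from one buffer and writes to a different one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a single-channel float texture.
struct Image {
    int width;
    int height;
    std::vector<float> texels;  // row-major, width * height entries

    Image(int w, int h, float fill = 0.0f)
        : width(w), height(h), texels(static_cast<std::size_t>(w) * h, fill) {}

    // Clamp-to-edge lookup, similar in spirit to GL_CLAMP_TO_EDGE.
    float sample(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// One 5-tap blur pass along the direction (dx, dy).
// Source and destination must be different images.
void blur_pass(const Image& src, Image& dst, int dx, int dy) {
    assert(&src != &dst);
    assert(src.width == dst.width && src.height == dst.height);
    static const float weights[5] = {1.0f / 16, 4.0f / 16, 6.0f / 16, 4.0f / 16, 1.0f / 16};
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            float sum = 0.0f;
            for (int i = -2; i <= 2; ++i) {
                sum += weights[i + 2] * src.sample(x + i * dx, y + i * dy);
            }
            dst.texels[static_cast<std::size_t>(y) * dst.width + x] = sum;
        }
    }
}

// Shadow map -> vertical blur -> intermediate -> horizontal blur -> shadow map.
void blur_shadow_map(Image& shadow_map, Image& intermediate) {
    blur_pass(shadow_map, intermediate, 0, 1);  // vertical
    blur_pass(intermediate, shadow_map, 1, 0);  // horizontal
}
```

Calling `blur_shadow_map(shadow, scratch)` leaves the blurred result in `shadow`. The assertion in `blur_pass` rejects the in-place case: on the CPU an in-place pass would read texels that were already overwritten, and in OpenGL sampling a texture that is attached to the current draw framebuffer forms a feedback loop whose sampled values the specification leaves undefined, so a result that looks fine on one driver may not hold on another.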


glDrawArraysInstanced performance on Intel hd2000/3000

31 March 2014 - 01:32 PM



I'm currently working on a game project which uses OpenTK targeting OpenGL 3.1.

After implementing instanced rendering (instanced vertex attributes using glDrawArraysInstanced) I've noticed a horrible performance drop on our Intel test machine (Intel HD3000).

Frame time went from 2 ms to over 2000 ms.

All other machines (using AMD and Nvidia cards) perform better with instancing.

I've checked the Intel site to make sure that the HD3000 supports OpenGL 3.1 and that it has the latest drivers.


Do you have any ideas what could be causing this issue?


Buffer Structures:

[StructLayout( LayoutKind.Sequential )]
struct VertexData
{
    public Vector3 Position;
    public Vector3 Normal;
    public Vector2 TexCoord;

    public static readonly int SizeInBytes = Marshal.SizeOf( new VertexData() );

    public VertexData( Vector3 position, Vector3 normal, Vector2 texcoord )
    {
        Position = position;
        Normal = normal;
        TexCoord = texcoord;
    }
}

[StructLayout( LayoutKind.Sequential )]
public struct InstanceData
{
    public Vector4 SpriteRect;
    public Vector4 DestinationRect;
    public Color4 Color;
    public Vector4 Scissors;

    public static readonly int SizeInBytes = Marshal.SizeOf( new InstanceData() );
}

VAO Creation:

float v1 = -1f;
float v2 = 1f;

VertexData[] Vertices = new VertexData[]
{
    new VertexData(
        new Vector3(v1, v2, 0),
        new Vector3(0, 0, 1),
        new Vector2(0, 0)),
    new VertexData(
        new Vector3(v2, v2, 0),
        new Vector3(0, 0, 1),
        new Vector2(1, 0)),
    new VertexData(
        new Vector3(v1, v1, 0),
        new Vector3(0, 0, 1),
        new Vector2(0, 1)),
    new VertexData(
        new Vector3(v1, v1, 0),
        new Vector3(0, 0, 1),
        new Vector2(0, 1)),
    new VertexData(
        new Vector3(v2, v2, 0),
        new Vector3(0, 0, 1),
        new Vector2(1, 0)),
    new VertexData(
        new Vector3(v2, v1, 0),
        new Vector3(0, 0, 1),
        new Vector2(1, 1))
};

Buffer vertexBuffer = Buffer.CreateVertexBuffer( Vertices, VertexData.SizeInBytes );
InstanceBuffer = Buffer.CreateInstanceBuffer( InstanceData.SizeInBytes, 4096 );

GL.GenVertexArrays( 1, out VAOHandle );
GL.BindVertexArray( VAOHandle );

// Vertex Buffer
GL.VertexAttribPointer( 0, 3, VertexAttribPointerType.Float, false, VertexData.SizeInBytes, 0 );
GL.EnableVertexAttribArray( 0 );

GL.VertexAttribPointer( 1, 3, VertexAttribPointerType.Float, false, VertexData.SizeInBytes, Vector3.SizeInBytes );
GL.EnableVertexAttribArray( 1 );

GL.VertexAttribPointer( 2, 2, VertexAttribPointerType.Float, false, VertexData.SizeInBytes, Vector3.SizeInBytes * 2 );
GL.EnableVertexAttribArray( 2 );

// Instance Buffer
GL.VertexAttribPointer( 3, 4, VertexAttribPointerType.Float, false, InstanceData.SizeInBytes, 0 );
GL.EnableVertexAttribArray( 3 );
GL.VertexAttribDivisor( 3, 1 );

GL.VertexAttribPointer( 4, 4, VertexAttribPointerType.Float, false, InstanceData.SizeInBytes, Vector4.SizeInBytes );
GL.EnableVertexAttribArray( 4 );
GL.VertexAttribDivisor( 4, 1 );

GL.VertexAttribPointer( 5, 4, VertexAttribPointerType.Float, false, InstanceData.SizeInBytes, Vector4.SizeInBytes * 2 );
GL.EnableVertexAttribArray( 5 );
GL.VertexAttribDivisor( 5, 1 );

GL.VertexAttribPointer( 6, 4, VertexAttribPointerType.Float, false, InstanceData.SizeInBytes, Vector4.SizeInBytes * 3 );
GL.EnableVertexAttribArray( 6 );
GL.VertexAttribDivisor( 6, 1 );

GL.BindVertexArray( 0 );
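
As a cross-check of the strides and offsets passed to `GL.VertexAttribPointer` above, here is a small C++ mirror of the two structures. It is a sketch under assumptions: `Vec2`, `Vec3`, `Vec4` and `instance_offset` are stand-ins invented for the example, `Color4` is taken to be four floats, and the `4096` passed to `CreateInstanceBuffer` is assumed to be an instance count.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the OpenTK vector types (tightly packed floats).
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

struct VertexData {
    Vec3 position;  // attribute 0
    Vec3 normal;    // attribute 1
    Vec2 texcoord;  // attribute 2
};

struct InstanceData {
    Vec4 sprite_rect;       // attribute 3
    Vec4 destination_rect;  // attribute 4
    Vec4 color;             // attribute 5 (Color4 assumed to be 4 floats)
    Vec4 scissors;          // attribute 6
};

// Per-vertex stream: stride 32, offsets 0 / 12 / 24.
static_assert(sizeof(VertexData) == 32, "vertex stride");
static_assert(offsetof(VertexData, normal) == sizeof(Vec3), "normal offset");
static_assert(offsetof(VertexData, texcoord) == 2 * sizeof(Vec3), "texcoord offset");

// Per-instance stream: stride 64, offsets 0 / 16 / 32 / 48.
static_assert(sizeof(InstanceData) == 64, "instance stride");
static_assert(offsetof(InstanceData, destination_rect) == sizeof(Vec4), "dest offset");
static_assert(offsetof(InstanceData, color) == 2 * sizeof(Vec4), "color offset");
static_assert(offsetof(InstanceData, scissors) == 3 * sizeof(Vec4), "scissors offset");

// Assumed capacity of the instance buffer, in instances.
constexpr std::size_t kMaxInstances = 4096;

// Byte offset of one instance inside the instance buffer.
std::size_t instance_offset(std::size_t index) {
    assert(index < kMaxInstances);
    return index * sizeof(InstanceData);
}
```

With these sizes a full instance buffer is 4096 * 64 = 262144 bytes (256 KiB), so the layout itself matches the pointer setup. The check only covers the numbers; it says nothing about which buffer object is bound when each `VertexAttribPointer` call is made.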

Vertex Shader:

#version 140

// Vertex Data
in vec3 in_position;
in vec3 in_normal;
in vec2 in_texcoord;
// Instance Data
in vec4 in_spriteRect;
in vec4 in_destinationRect;
in vec4 in_color;
in vec4 in_scissors;

// Output
out vec3 vs_normal;
out vec2 vs_texcoord;
out vec4 vs_color;
out vec4 vs_scissors;

void main()
{
	// Texture Coordinates
	vs_texcoord = in_texcoord;
	vs_texcoord *= in_spriteRect.zw;
	vs_texcoord += in_spriteRect.xy;

	// Position
	vec4 Position = vec4( in_position, 1.0f );

	// Normalize to [0, 1]
	Position.xy = Position.xy * 0.5f + 0.5f;

	// Apply Destination Transform
	Position.xy *= in_destinationRect.zw;
	Position.xy += in_destinationRect.xy;

	// Normalize to [-1, 1]
	Position.xy = Position.xy * 2.0f - 1.0f;

	// In OpenGL -1,-1 is the bottom left screen corner
	// In DirectX -1,-1 is the top left screen corner
	Position.y += 2.0f - in_destinationRect.w * 2.0f;

	vs_normal = in_normal;
	vs_color = in_color;
	vs_scissors = in_scissors;

	gl_Position = Position;
}

Fragment Shader:

#version 140

uniform sampler2D Tex;

// Input
in vec3 vs_normal;
in vec2 vs_texcoord;
in vec4 vs_color;
in vec4 vs_scissors;

// Output
out vec4 out_frag_color;

bool ScissorTest()
{
	return gl_FragCoord.x > vs_scissors.x &&
			gl_FragCoord.y > vs_scissors.y &&
			gl_FragCoord.x < vs_scissors.x + vs_scissors.z &&
			gl_FragCoord.y < vs_scissors.y + vs_scissors.w;
}

void main()
{
	out_frag_color = vec4(0, 0, 0, 0);

	out_frag_color = texture( Tex, vs_texcoord ) * vs_color;
}


InstanceBuffer.Write( InstanceDataCPU, InstanceData.SizeInBytes, InstanceCount );
GL.BindVertexArray( VAOHandle );
BindTexture( TextureHandle, 0, texture );
GL.UseProgram( ProgramHandle );
GL.DrawArraysInstanced( PrimitiveType.Triangles, 0, 6, InstanceCount );

GPU normal vector generation for high precision planetary terrain

27 September 2012 - 01:37 PM

I generate procedural planets pretty much entirely using compute shaders. (CPU manages a modified quad tree for LOD calculations)
The compute shader outputs vertex data on a per terrain patch basis, which is stored in buffers.
Normal vectors are calculated during this stage by applying a Sobel operator to the generated position data:

[source lang="cpp"]
// Only operate on non-padded threads
if((GroupThreadID.x > 0) && (GroupThreadID.x < PaddedX - 1) && (GroupThreadID.y > 0) && (GroupThreadID.y < PaddedY - 1))
{
    // Generate normal vectors
    float3 C = VertexPosition;
    float3 T = GetSharedPosition(GroupThreadID.x, GroupThreadID.y + 1);
    float3 TR = GetSharedPosition(GroupThreadID.x + 1, GroupThreadID.y + 1);
    float3 R = GetSharedPosition(GroupThreadID.x + 1, GroupThreadID.y);
    float3 BR = GetSharedPosition(GroupThreadID.x + 1, GroupThreadID.y - 1);
    float3 B = GetSharedPosition(GroupThreadID.x, GroupThreadID.y - 1);
    float3 BL = GetSharedPosition(GroupThreadID.x - 1, GroupThreadID.y - 1);
    float3 L = GetSharedPosition(GroupThreadID.x - 1, GroupThreadID.y);
    float3 TL = GetSharedPosition(GroupThreadID.x - 1, GroupThreadID.y + 1);

    float3 v1 = normalize((TR + 2.0*R + BR) * 0.25 - C);
    float3 v2 = normalize((TL + 2.0*T + TR) * 0.25 - C);
    float3 v3 = normalize((TL + 2.0*L + BL) * 0.25 - C);
    float3 v4 = normalize((BL + 2.0*B + BR) * 0.25 - C);

    float3 N1 = cross(v1, v2);
    float3 N2 = cross(v3, v4);
    Normal = (N1 + N2) * 0.5;

    // Write Normal to Shared Memory
    SharedMemory[GroupIndex].Normal = Normal;
}
[/source]

This works very well in most situations.
Unfortunately, once I get to very high LOD levels, floating point precision causes quite a few issues.

In order to illustrate the problem I make the compute shader generate a sphere of radius 1.
I then use the following code to display the error rate of the generated normal vectors:
[source lang="cpp"]float3 NormalError = abs(Normal - normalize(PositionWS)) * 10.0;[/source]

LOD 16 - First signs of errors, no visual artifacts
Posted Image

LOD 20 - First visual artifacts. Can be masked with normal mapping or some Perlin noise.
Posted Image

LOD 24 (highest LOD) - Visual artifacts are visible all over the terrain.

Posted Image

At this LOD, vertices are only 0.0000000596 units apart from each other, hence the problem with my current method for generating normal vectors.
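
The quoted spacing of 0.0000000596 is 2^-24, which is the gap between neighbouring 32-bit floats just below 1.0. On a sphere of radius 1, adjacent vertices therefore differ by roughly one representable step in their largest coordinates, and the differences fed into the Sobel operator are dominated by rounding. A small CPU-side C++ sketch to check those numbers (the helper names are invented for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x down to the next representable float (x must be positive).
float gap_below(float x) {
    assert(x > 0.0f);
    return x - std::nextafter(x, 0.0f);
}

// Distance from x up to the next representable float.
float gap_above(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// How many representable float steps fit into one vertex spacing
// for coordinates of the given magnitude.
float steps_per_spacing(float spacing, float magnitude) {
    return spacing / gap_above(magnitude);
}
```

For example, `gap_below(1.0f)` is 2^-24 and `gap_above(1.0f)` is 2^-23, so `steps_per_spacing(std::ldexp(1.0f, -24), 1.0f)` evaluates to 0.5: at magnitude 1 the vertex spacing is half of one float step. Coordinates closer to zero have finer steps, which is why schemes that store positions relative to a patch-local origin are often suggested for this kind of problem; whether that fits this generator is an open question.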

I understand that I'm pushing the limits of floating point precision here, and not having that high of a terrain resolution isn't that big of an issue, but I was wondering if anyone had any ideas on how to squeeze out a little more detail?


Constant Buffer usage

16 September 2012 - 07:35 AM

I went over some of my code with a colleague yesterday and he was quite surprised by how I manage my constant buffers.
Except for a few rare and very specific situations, I only use 2 "global" constant buffers.

A per-frame buffer, which contains data which only needs to be updated once per frame.
[source lang="cpp"]
cbuffer PerFrameCB : register (b0)
{
    float4x4 CameraView       : packoffset( c0.x);
    float4x4 CameraProjection : packoffset( c4.x);
    float4   CameraPosition   : packoffset( c8.x);
    float4   SunDirection     : packoffset( c9.x);
    float2   ViewportSize     : packoffset(c10.x);
}
[/source]
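
Each `c` register in a constant buffer is 16 bytes, so the `packoffset` values above translate directly into byte offsets. The following C++ struct is a hypothetical CPU-side mirror of `PerFrameCB` (field names, the `padding` member and `register_to_byte_offset` are made up for this sketch) that checks those offsets at compile time:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRegisterBytes = 16;  // one float4 register

// Hypothetical CPU-side mirror of PerFrameCB.
struct PerFrameCB {
    float camera_view[16];        // c0..c3
    float camera_projection[16];  // c4..c7
    float camera_position[4];     // c8
    float sun_direction[4];       // c9
    float viewport_size[2];       // c10.xy
    float padding[2];             // c10.zw, keeps the size a multiple of 16 bytes
};

static_assert(offsetof(PerFrameCB, camera_view) == 0 * kRegisterBytes, "c0");
static_assert(offsetof(PerFrameCB, camera_projection) == 4 * kRegisterBytes, "c4");
static_assert(offsetof(PerFrameCB, camera_position) == 8 * kRegisterBytes, "c8");
static_assert(offsetof(PerFrameCB, sun_direction) == 9 * kRegisterBytes, "c9");
static_assert(offsetof(PerFrameCB, viewport_size) == 10 * kRegisterBytes, "c10");
static_assert(sizeof(PerFrameCB) == 11 * kRegisterBytes, "11 registers in total");

// Byte offset of packoffset(c<reg>.<component>), with component 0..3 for x..w.
std::size_t register_to_byte_offset(std::size_t reg, std::size_t component) {
    assert(component < 4);
    return reg * kRegisterBytes + component * sizeof(float);
}
```

The struct comes to 176 bytes, well under the 1 KB of the second buffer. The sketch covers byte offsets only; the element order inside the matrices still has to match the packing convention the shaders are compiled with.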

And another buffer used for everything else.
This buffer is 1 KB in size and is updated whenever new data is needed, which happens multiple times per frame.
Both of these buffers are always bound to the registers b0 and b1 for all shader stages.

I've been told that this is the "wrong" way to do it. I'm supposed to split this up into individual constant buffers.

However I don't understand why that is the case.
If I split my current b1 constant buffer into X buffers, not only do I still need to update these buffers, but I'll also need to bind a new constant buffer whenever new data is needed.

I don't see how my method is wrong, but I'm a little paranoid when I hear such claims because I am self-taught.
So I figured it's better to ask than potentially doing something wrong.


Structured buffer float compression

05 June 2012 - 04:12 AM

I have a compute shader that generates procedural planetary terrain and stores vertices in a structured buffer which has the following layout.
struct PlanetVertex
{
  float3 Position;
  float3 Normal;
  float Temperature;
  float Humidity;
};

That's 10 floats with 4 bytes per float -> 40 bytes per vertex.
A terrain node or patch contains 33x33 vertices, which is 43560 bytes.
At the highest quality setting, the compute shader will output up to 5000 nodes,
so the buffer needs to be 5000 * 43560 bytes, which is
217800000 bytes or ~207 MB.

Due to the way I handle load balancing between rendering and terrain generation, I need to have this buffer in memory twice, so I use
~415 MB only for vertex data.
This is okay I guess, since a planet is sort of the primary object, but I want to reduce this buffer size if possible.
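
The arithmetic above can be reproduced with a few lines of C++. This is a sketch that takes the 40 bytes per vertex stated in the post as given; the constant and function names are invented, and the 24-byte figure mentioned afterwards is a purely hypothetical packed layout, not something measured.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kVerticesPerNode = 33 * 33;  // 1089 vertices per patch
constexpr std::uint64_t kMaxNodes = 5000;            // highest quality setting
constexpr std::uint64_t kBufferCopies = 2;           // buffer is kept in memory twice

// Size in bytes of one copy of the vertex buffer.
std::uint64_t buffer_bytes(std::uint64_t bytes_per_vertex) {
    assert(bytes_per_vertex > 0);
    return bytes_per_vertex * kVerticesPerNode * kMaxNodes;
}

// Size in bytes of all copies together.
std::uint64_t total_bytes(std::uint64_t bytes_per_vertex) {
    return buffer_bytes(bytes_per_vertex) * kBufferCopies;
}

// Bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

`buffer_bytes(40)` gives 217800000 bytes (about 207.7 MiB) and `total_bytes(40)` about 415.4 MiB, matching the numbers above. If a packed vertex came to, say, 24 bytes, `buffer_bytes(24)` would be 130680000 bytes, or roughly 124.6 MiB per copy.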

Take the normal vector, for example: it doesn't need 32-bit precision per channel, 16 bits would be more than enough.
As for temperature and humidity, they could even fit in an 8-bit unorm, but I doubt that's available here.

I found that there are f32tof16 and f16tof32 functions in HLSL, which I assume are what I need, but I cannot quite figure out how they're supposed to work:

It says here that f32tof16 returns a uint, but isn't that 32 bits as well?
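
For context, the HLSL documentation describes `f32tof16` as returning the half-precision value in the low 16 bits of the uint, so two halves can share one 32-bit element via shift and OR. The C++ below is a CPU-side illustration of that packing, not the HLSL intrinsic itself: the function names are invented, the conversion truncates instead of rounding, flushes values too small for a normal half to zero, and maps out-of-range inputs (including NaN) to infinity, so it is not guaranteed to be bit-exact with what the GPU produces.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified float -> half conversion; result sits in the low 16 bits.
std::uint32_t float_to_half_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::int32_t exponent =
        static_cast<std::int32_t>((bits >> 23) & 0xFFu) - 127 + 15;
    const std::uint32_t mantissa = (bits >> 13) & 0x3FFu;  // truncated
    if (exponent <= 0) return sign;             // too small: signed zero
    if (exponent >= 31) return sign | 0x7C00u;  // too large: infinity
    return sign | (static_cast<std::uint32_t>(exponent) << 10) | mantissa;
}

// Simplified half -> float conversion; reads only the low 16 bits.
float half_bits_to_float(std::uint32_t half) {
    const std::uint32_t sign = (half & 0x8000u) << 16;
    const std::uint32_t exponent = (half >> 10) & 0x1Fu;
    const std::uint32_t mantissa = half & 0x3FFu;
    std::uint32_t bits;
    if (exponent == 0) {
        bits = sign;  // zero (half subnormals are flushed)
    } else if (exponent == 31) {
        bits = sign | 0x7F800000u | (mantissa << 13);  // infinity or NaN
    } else {
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    }
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Two halves in one 32-bit word: `low` in bits 0..15, `high` in bits 16..31.
std::uint32_t pack_half2(float low, float high) {
    return float_to_half_bits(low) | (float_to_half_bits(high) << 16);
}

float unpack_low(std::uint32_t packed) {
    assert(sizeof(float) == sizeof(std::uint32_t));
    return half_bits_to_float(packed & 0xFFFFu);
}

float unpack_high(std::uint32_t packed) {
    return half_bits_to_float(packed >> 16);
}
```

With this scheme `pack_half2(1.0f, -2.0f)` yields `0xC0003C00`, and `unpack_low` / `unpack_high` return the two values again. In the shader the same idea would presumably read `f32tof16(a) | (f32tof16(b) << 16)`, which would let a three-component normal fit in two uints instead of three floats.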