

Member Since 04 Feb 2011
Offline Last Active Apr 19 2016 01:15 AM

#5278270 "check before flight" list - for OpenGL

Posted by Yours3!f on 26 February 2016 - 06:25 AM

hi there,


today I had the idea to write a list of things for newbies to check when nothing is rendered on screen.

I think this would help them tremendously, especially if there were a piece of code that checks each of these.


If you have any additions to this list, please comment below.


-is there any opengl context?

-is any kind of swapbuffer called?

-is it called after everything has been rendered?

-what kind of frame buffer object are you rendering into?

-if you are using shaders, do you bind the correct one?

-are the shader uniforms bound?

-are shaders compiled and linked properly?

-is there any vertex buffer object, vertex array object bound?

-is there any index buffer object bound?

-if you are using textures, are the textures bound?

-are you using the depth state that you want to use?

-is blending enabled?

-are you using the blending function you wanted to use?

-is scissoring enabled?

-is backface culling enabled?




please contribute to this list :)
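
A rough sketch of what such a checking helper could look like. Everything here (GlStateSnapshot, diagnose, the field names) is made up for illustration; the idea is that the application fills the snapshot from the GL queries named in the comments and then gets back a list of things to look at. It only covers part of the list above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical snapshot of the state the checklist asks about. In a real
// program the fields would be filled from GL queries (noted per field).
struct GlStateSnapshot {
  bool has_context = false;        // did context creation succeed?
  unsigned current_program = 0;    // glGetIntegerv(GL_CURRENT_PROGRAM)
  bool program_linked = false;     // glGetProgramiv(program, GL_LINK_STATUS)
  unsigned vertex_array = 0;       // glGetIntegerv(GL_VERTEX_ARRAY_BINDING)
  unsigned element_buffer = 0;     // glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING)
  bool indexed_draw = false;       // true when drawing with glDrawElements
  bool scissor_enabled = false;    // glIsEnabled(GL_SCISSOR_TEST)
  bool cull_face_enabled = false;  // glIsEnabled(GL_CULL_FACE)
};

// Returns one message per suspicious item; an empty list means nothing obvious.
std::vector<std::string> diagnose(const GlStateSnapshot& s) {
  std::vector<std::string> notes;
  if (!s.has_context) {
    notes.push_back("no OpenGL context");
    return notes;  // nothing else can be checked without a context
  }
  if (s.current_program == 0) {
    notes.push_back("no shader program bound");
  } else if (!s.program_linked) {
    notes.push_back("bound program failed to link");
  }
  if (s.vertex_array == 0) notes.push_back("no vertex array object bound");
  if (s.indexed_draw && s.element_buffer == 0)
    notes.push_back("indexed draw without an index buffer");
  if (s.scissor_enabled)
    notes.push_back("scissor test is on: check the scissor rectangle");
  if (s.cull_face_enabled)
    notes.push_back("face culling is on: check the winding order");
  return notes;
}
```

Keeping the queries separate from the checks means the checks can be exercised without a live context.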

#5192158 want to create an fps

Posted by Yours3!f on 10 November 2014 - 04:01 PM

just to be clear about the complexity: I want to do something like this, except the asset complexity (ie. one weapon, one enemy, simple assets, no jumping or anything remotely complex), a map this simple would do fine.

#5185747 Texture Storage + Texture Views AND mipmapping

Posted by Yours3!f on 08 October 2014 - 08:16 AM

okay, found the error, see the code for the fix :)

//create immutable texture storage
glGenTextures( 1, &orig_tex );
//full mip chain: floor(log2(largest side)) + 1 levels (glTexStorage2D expects an integer count)
GLsizei num_mips = GLsizei( std::floor( std::log2( float( std::max( orig_tex_width, orig_tex_height ) ) ) ) ) + 1;
glBindTexture( GL_TEXTURE_2D, orig_tex );

//allocate texture storage (w/ mipmaps)
glTexStorage2D( GL_TEXTURE_2D, num_mips, GL_RGBA8, orig_tex_width, orig_tex_height );

//upload base texture
glTexSubImage2D( GL_TEXTURE_2D, 0, 0, 0, im.getSize().x, im.getSize().y, GL_RGBA, GL_UNSIGNED_BYTE, im.getPixelsPtr() );

//alternatively use manual mipmap generation here
glGenerateMipmap( GL_TEXTURE_2D );


//create texture views for the immutable storage
glGenTextures( 1, &tex );
glTextureView( tex, GL_TEXTURE_2D, orig_tex, GL_RGBA8, 0, num_mips, 0, 1 );

//FIX: need to bind the texview to set the sampler state... DSA pls?
glBindTexture( GL_TEXTURE_2D, tex );
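
The mip level count used above can also be computed with integers only, which avoids any float rounding. This is just a sketch; full_mip_count and mip_dimension are names I made up for it.

```cpp
#include <algorithm>
#include <cassert>

// Number of levels in a full mip chain: floor(log2(max(w, h))) + 1,
// computed by halving the largest side until it reaches 1.
int full_mip_count(unsigned width, unsigned height) {
  assert(width > 0 && height > 0);
  unsigned largest = std::max(width, height);
  int levels = 1;
  while (largest > 1) {
    largest >>= 1;
    ++levels;
  }
  return levels;
}

// Size of one dimension at a given mip level; it never drops below 1.
unsigned mip_dimension(unsigned base, int level) {
  assert(level >= 0 && level < 32);
  return std::max(1u, base >> level);
}
```

For example, a 1280x720 texture gets 11 levels, and the height is already clamped to 1 at level 10.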


I also wrote a small app for manual mipmap generation using point, bilinear, bicubic (with various interpolation) in the process. (got some help for bicubic from here: http://www.codeproject.com/Articles/236394/Bi-Cubic-and-Bi-Linear-Interpolation-with-GLSL)


CPU side:

void set_workgroup_size( vec2& gws, vec2& lws, vec2& dispatch_size, const uvec2& screen )
{
  //set up work group sizes
  unsigned local_ws[2] = {16, 16};
  unsigned global_ws[2];
  unsigned gw = 0, gh = 0, count = 1;

  //round the size up to the next multiple of the local work group size
  while( gw < screen.x )
  {
    gw = local_ws[0] * count;
    ++count;
  }

  count = 1;

  while( gh < screen.y )
  {
    gh = local_ws[1] * count;
    ++count;
  }

  global_ws[0] = gw;
  global_ws[1] = gh;

  gws = vec2( global_ws[0], global_ws[1] );
  lws = vec2( local_ws[0], local_ws[1] );
  dispatch_size = gws / lws;
}

void gen_mipmaps( GLuint shader, GLuint texture, GLenum internal_format, uvec2 size, unsigned miplevels )
{
  glUseProgram( shader );

  size.x /= 2;
  size.y /= 2;

  for( unsigned d = 1; d < miplevels; ++d )
  {
    glBindImageTexture( 0, texture, d - 1, GL_FALSE, 0, GL_READ_ONLY, internal_format );
    glBindImageTexture( 1, texture, d, GL_FALSE, 0, GL_WRITE_ONLY, internal_format );

    vec2 dispatch_size, gws, lws;
    set_workgroup_size( gws, lws, dispatch_size, size );

    glDispatchCompute( dispatch_size.x, dispatch_size.y, 1 );

    //make the writes to level d visible before the next pass reads it
    glMemoryBarrier( GL_SHADER_IMAGE_ACCESS_BARRIER_BIT );

    size.x /= 2;
    size.y /= 2;
  }
}
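
Side note: rounding the image size up to a multiple of the local work group size and then dividing by the local size, as set_workgroup_size does, gives the same result as a ceiling division. A minimal sketch (group_count is a made-up name):

```cpp
#include <cassert>

// Number of work groups needed to cover `size` items when each group
// handles `local` items (ceiling division).
unsigned group_count(unsigned size, unsigned local) {
  assert(local > 0);
  return (size + local - 1) / local;
}
```

With a 16x16 local size, a 1280x720 image needs group_count(1280, 16) by group_count(720, 16) groups, that is 80 by 45.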

GPU side:

#version 430 core

layout(binding=0, rgba8) readonly uniform image2D src_tex;
layout(binding=1) writeonly uniform image2D dst_tex;

layout(local_size_x = 16, local_size_y = 16) in; //local workgroup size

vec4 sample_point( vec2 coord, vec2 size )
{
  return imageLoad( src_tex, ivec2(coord * size) );
}

vec4 sample_bilinear( vec2 coord, vec2 size )
{
  ivec2 final_coord = ivec2(coord * size);
  vec4 s00 = imageLoad( src_tex, final_coord + ivec2(0, 0) );
  vec4 s01 = imageLoad( src_tex, final_coord + ivec2(1, 0) ); //x + 1
  vec4 s10 = imageLoad( src_tex, final_coord + ivec2(0, 1) ); //y + 1
  vec4 s11 = imageLoad( src_tex, final_coord + ivec2(1, 1) );
  float xval = fract( coord.x * size.x );
  float yval = fract( coord.y * size.y );
  //interpolate along x within each row first, then along y between the rows
  return mix( mix( s00, s01, xval ), mix( s10, s11, xval ), yval );
}

float triangular( float f )
{
  f *= 0.5;
  return f < 0 ? f + 1 : 1 - f;
}

float bell( float f )
{
  f = f * (0.5 * 1.5); //rescale [-2...2] to [-1.5...1.5]
  if( f >= -1.5 && f < -0.5 )
  {
    f += 1.5;
    return 0.5 * f * f;
  }
  else if( f >= -0.5 && f < 0.5 )
  {
    return 0.75 - ( f * f );
  }
  else if( f >= 0.5 && f < 1.5 )
  {
    f -= 1.5;
    return 0.5 * f * f;
  }
  return 0;
}

float bspline( float f )
{
  f = abs(f);
  if( f >= 0 && f <= 1 )
  {
    return (2/3.0) + 0.5 * ( f * f * f ) - ( f * f );
  }
  else if( f > 1 && f <= 2 )
  {
    f = 2 - f;
    return (1/6.0) * f * f * f;
  }
  return 1;
}

vec4 sample_bicubic( vec2 coord, vec2 size )
{
  vec4 sum = vec4(0);
  vec4 weight_sum = vec4(0);
  ivec2 final_coord = ivec2(coord * size);
  float xval = fract( coord.x * size.x );
  float yval = fract( coord.y * size.y );
  for( int y = -1; y <= 2; ++y )
  {
    for( int x = -1; x <= 2; ++x )
    {
      vec4 s = imageLoad( src_tex, final_coord + ivec2(x, y) );
      float fx = bspline( x - xval ); //can use bell or triangular as functions
      float fy = bspline( -(y - yval) );
      float weight = fx * fy;
      sum += s * weight;
      weight_sum += weight;
    }
  }
  return sum / weight_sum;
}

void main()
{
  ivec2 global_id = ivec2( gl_GlobalInvocationID.xy );
  ivec2 global_size = imageSize( dst_tex ).xy;
  ivec2 src_size = imageSize( src_tex ).xy;
  vec2 texcoord = vec2(global_id) / vec2(global_size);

  //skip invocations that fall outside the destination image
  if( global_id.x < global_size.x && global_id.y < global_size.y )
  {
    //point filtering
    //imageStore( dst_tex, global_id, sample_point( texcoord, src_size ) );
    //bilinear filtering
    //imageStore( dst_tex, global_id, sample_bilinear( texcoord, src_size ) );
    //bicubic filtering
    imageStore( dst_tex, global_id, sample_bicubic( texcoord, src_size ) );
  }
}
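
To sanity check the bicubic weights on the CPU, here is a small C++ copy of the bspline kernel. It is only a sketch: tap_weight_sum is a made-up helper, and unlike the shader this copy returns 0 outside [-2, 2] (the shader never gets there, since x - xval always stays inside that range for x in -1...2). For the B-spline the four taps along one axis should add up to 1 for any fractional offset, so dividing by weight_sum mostly matters for the other kernels.

```cpp
#include <cassert>
#include <cmath>

// CPU copy of the shader's cubic B-spline kernel.
float bspline(float f) {
  f = std::fabs(f);
  if (f <= 1.0f) {
    return 2.0f / 3.0f + 0.5f * f * f * f - f * f;
  }
  if (f <= 2.0f) {
    f = 2.0f - f;
    return f * f * f / 6.0f;
  }
  return 0.0f;  // assumption: zero outside the kernel's support
}

// Sum of the four tap weights along one axis for a fractional offset t in [0, 1).
float tap_weight_sum(float t) {
  float sum = 0.0f;
  for (int x = -1; x <= 2; ++x) {
    sum += bspline(float(x) - t);
  }
  return sum;
}
```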

#5171276 Instanced font rendering (continued)

Posted by Yours3!f on 03 August 2014 - 11:02 AM

hi there,


I extended my font rendering lib so that you can color your characters any way you want them, in any size, highlight each character separately, add underlining, overlining, strikethrough, draw them at any location with correct line heights, gaps, and kerning (some of this is based on freetype-gl: https://github.com/rougier/freetype-gl).
All of this in one draw call, one vao, one texture, one shader, everything instanced, SUPER FAST :D
Licence is MIT, download here, feel free to use: https://github.com/Yours3lf/instanced_font_rendering

It should be super easy to integrate.

Requires GL3.x


best regards,


#5156474 Immediate mode text alternative.

Posted by Yours3!f on 28 May 2014 - 06:31 AM

look at my signature ;)

#5150070 Screenshot of your biggest success/ tech demo

Posted by Yours3!f on 28 April 2014 - 03:31 AM

not in the industry yet, but I've done some stuff in my homebrew engine :)


#5138686 octree view frustum culling

Posted by Yours3!f on 13 March 2014 - 07:49 AM

turns out I messed up somewhere else, and my original implementation worked without having to modify the culling.

I tried it out with these settings:
6.25 million cubes placed in a grid on the x-z plane

camera with 25 degrees vertical fov

near plane = 1

far plane = 100

1000+ fps when using octree + brute force culling, no popping, nothing at all.

the octree culling usually left 150-200 objects visible, and the brute force culling (that I ran on the remaining visible objects) left 50-100, so I'd say it turned out really well.

I'll need to optimize the amount of memory the octree takes, as the application took 700MB, most of which was the octree...

but that's future work.

#5135615 octree view frustum culling

Posted by Yours3!f on 01 March 2014 - 05:08 AM

hi there,


I'm trying to implement view frustum culling using an octree. I'm using aabb vs frustum testing to determine if the given octant is inside the view frustum or not. I used this tutorial to implement it:


it works well for small objects; however, as the tutorial says, there may be cases where none of the aabb's vertices are inside the view frustum and none of the view frustum's vertices are inside the aabb, yet they DO intersect. I ran into exactly this problem: some of my octants are getting culled despite being clearly visible.


is there a common solution to this problem?
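
For reference, one commonly used approach is to test the box against each frustum plane instead of testing vertices against volumes: for every plane, take the box corner that lies farthest along the plane normal, and reject the box only if even that corner is behind the plane. The sketch below assumes planes whose normals point into the frustum; all type and function names are made up. The test is conservative, so near frustum corners it can keep a box that is actually outside, but it should not cull a visible one.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };  // inside half-space: dot(n, p) + d >= 0
struct Aabb { Vec3 min, max; };

// True if the whole box lies behind the plane.
bool aabb_outside_plane(const Aabb& b, const Plane& p) {
  // corner of the box that is farthest along the plane normal
  Vec3 v{p.n.x >= 0.0f ? b.max.x : b.min.x,
         p.n.y >= 0.0f ? b.max.y : b.min.y,
         p.n.z >= 0.0f ? b.max.z : b.min.z};
  return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0.0f;
}

// False only when the box is completely behind at least one plane.
bool aabb_maybe_visible(const Aabb& b, const std::array<Plane, 6>& frustum) {
  for (const Plane& p : frustum) {
    if (aabb_outside_plane(b, p)) return false;
  }
  return true;
}
```

A box much larger than the frustum passes this test even though none of its corners are inside the frustum, which is the case described above.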


best regards,


#5116875 Instanced font rendering lib

Posted by Yours3!f on 14 December 2013 - 09:42 AM



I made a font rendering library originally based on Shikoba.


It uses one texture, one vao, 5 vbos, and one draw call to draw text.
A screen full of text renders in around 1ms on my PC (A8-4500m apu, 64 bit Xubuntu 12.04, composition disabled). Same performance on Win7 64 bit.


get it here (MIT licence):



easy usage:

//load in the shaders with your method, get_shader() gives you a ref to the shader program
load_shader( font::get().get_shader(), GL_VERTEX_SHADER, "../shaders/font/font.vs" );
load_shader( font::get().get_shader(), GL_FRAGMENT_SHADER, "../shaders/font/font.ps" );

uvec2 screen = uvec2( 1280, 720 );

font_inst instance; //this holds your font type and the corresponding sizes
font::get().resize( screen ); //set screen size
font::get().load_font( "../resources/font.ttf", //where your font is
                       instance, //font will load your font into this instance
                       22 ); //the font size

vec3 color = vec3( 0.5, 0.8, 0.5 ); //rgb [0...1]
uvec2 pos = uvec2( 10, 20 ); //in pixels

std::wstring text = L"hello world\n"; //what to display

while(true) //your ordinary rendering loop
{
  //optionally bind fbo here to render to texture
  font::get().add_to_text( instance, text + L"_" ); //feed the font
  font::get().render( instance, color, pos );
}


#5082442 low sample count screen space reflections

Posted by Yours3!f on 02 August 2013 - 05:24 AM


I defined the distance to be 50 units. This is again empirical value, and is probably highly dependent on the scene. This is probably not 5 meters, as I don't really know how much that would be in the real world, but something like that. According to blender the length of the blue curtain is 28 meters, and the length of the vase is 5 meters... I have downloaded the original file from cryengine, and it is like 10x bigger. So based on real world photos I scaled it down, so that the columns are like 2.2m high in Blender. This way 5 units (or meters now?) seemed to be fine.

I'm looking forward to your view space implementation!


I gave it a try to implement it but the view-space version is even worse than the screen-space version what goes for the broad-phase. This I did expect since I looked for screen-space to counter exactly this problem. The narrow-phase though I had to adjust and this one works better. Coverage calculation though is totally horrible and results in punctured geometry worse than before. The marked areas show this problem well.



So I guess the best solution is screen-space but with a modified narrow-phase calculation. Let's see if this works out.


Astonishing is that with the modified narrow-phase in the view-space version coverage actually fades out if samples turn not included in the image. If just the punctured pattern would go away it would be near optimal given the instable nature of the SSR algorithm to begin with.


okay, so you were right saying that they may look really similar. I think that these artifacts could be filtered out by 'checking' some things:
-check if the resulting ss vector is on the screen (fade out by applying some math)
-check if the resulting ray is within search distance (again fade)
-check if the original vs normal and view direction are good (too big angles are baaad, again fade)
-check if the resulting vs normal and reflection vector are good (again big angles are bad, fade)
-also there's one more thing: you should check if the raycast even succeeded. I consider it successful if the binary search is launched.
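
The "fade out by applying some math" parts of the checks above could look roughly like this. It is a CPU-side sketch in C++ rather than shader code, and the function names, the border width and the cosine thresholds are all placeholders that would need tuning per scene.

```cpp
#include <algorithm>
#include <cassert>

float saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// 1 in the middle of the screen, falling to 0 at (or beyond) the screen edge.
// u, v are screen coordinates in [0, 1]; border is the width of the fade band.
float screen_edge_fade(float u, float v, float border) {
  float du = std::min(u, 1.0f - u);
  float dv = std::min(v, 1.0f - v);
  return saturate(std::min(du, dv) / border);
}

// 1 for hits right at the ray origin, 0 at or beyond the search distance.
float distance_fade(float hit_distance, float search_distance) {
  return saturate(1.0f - hit_distance / search_distance);
}

// 0 when the cosine of the angle is at or below cos_min, 1 at or above cos_max.
float angle_fade(float cos_angle, float cos_min, float cos_max) {
  return saturate((cos_angle - cos_min) / (cos_max - cos_min));
}

// Combined weight; a failed raycast contributes nothing.
float reflection_weight(bool raycast_hit, float edge, float dist, float angle) {
  return raycast_hit ? edge * dist * angle : 0.0f;
}
```

Multiplying the factors means any single failed check removes the reflection, while borderline cases blend out smoothly.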

#5067138 Instance culling theory

Posted by Yours3!f on 03 June 2013 - 11:51 AM



I've implemented instance culling using instance cloud reduction described here:

This technique works by storing the instance positions in a VBO, then using a vertex shader to cull them (basically rendering them as points), then using a geometry shader and the transform feedback mechanism to store only those that passed the visibility test in an output VBO. This way the geometry shader outputs GL_POINTS (vec4), so only the position is considered.

Now in my app I'd like to use rotation and scaling as well. As far as I know, this can be best described by a matrix. My question is: how can I output these matrices with the geometry shader? Should I output more than one vertex (more EmitVertex(); calls)? How do they get stored in the output vbo?

Plus how would the culling be done? should I just transform the bounding box by this matrix? then check that?
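
On the "transform the bounding box by this matrix" idea: one way to do it is to compute the axis-aligned box that encloses the transformed box, without transforming all eight corners. This is a sketch under the assumption of a column-major affine 4x4 matrix (the usual OpenGL layout); Aabb and transform_aabb are made-up names. The resulting box can then go through whatever visibility test is already in use.

```cpp
#include <algorithm>
#include <cassert>

struct Aabb { float min[3]; float max[3]; };

// Encloses the transformed box in a new axis-aligned box.
// m is a column-major affine matrix: element (row r, column c) is m[c * 4 + r].
Aabb transform_aabb(const Aabb& b, const float m[16]) {
  Aabb out;
  for (int r = 0; r < 3; ++r) {
    out.min[r] = out.max[r] = m[12 + r];  // start from the translation
    for (int c = 0; c < 3; ++c) {
      float a = m[c * 4 + r] * b.min[c];
      float e = m[c * 4 + r] * b.max[c];
      out.min[r] += std::min(a, e);  // smallest contribution of this axis
      out.max[r] += std::max(a, e);  // largest contribution of this axis
    }
  }
  return out;
}
```

With rotations the enclosing box grows, so the culling becomes more conservative but stays correct.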


Best regards,


#5018224 need help with shadow mapping

Posted by Yours3!f on 06 January 2013 - 10:17 AM

ok, I think I got it.

I just needed to transform the clip space shadow coords to ndc, then to tex coords.


#version 330 core
#extension GL_ARB_shading_language_420pack : require //needed for layout(binding=...) below GLSL 4.20

layout(binding=0) uniform sampler2D texture0;
uniform vec3 light_pos, spot_dir;
uniform float radius, spot_exponent, spot_cos_cutoff;

in vec3 normal;
in vec3 vs_pos;
in vec4 ls_pos;

out vec4 color;

void main()
{
	color = vec4(0); //outputs start undefined, so clear before accumulating
	vec3 n = normal;
	vec3 l = light_pos - vs_pos;
	float d = length(l);
	l = normalize(l);
	float att = (1 + d / radius);
	att = 1.0 / ( att * att );
	float n_dot_l = dot(n, l);
	if(n_dot_l > 0.0)
	{
		vec3 h = normalize(l + normalize(-vs_pos)); //half vector, has to be unit length
		float n_dot_h = max(dot(n, h), 0.0);
		float spot_effect = dot(-l, spot_dir);
		if(spot_effect > spot_cos_cutoff)
		{
			spot_effect = pow( spot_effect, spot_exponent );
			att = spot_effect / att + 1;
		}
		att -= 1;
		vec4 shadow_coord = ls_pos / ls_pos.w; //convert to ndc
		shadow_coord = shadow_coord * 0.5 + 0.5; //scale, bias to [0...1]
		float depth = texture( texture0, shadow_coord.xy ).x;
		att *= depth < shadow_coord.z - 0.005 ? 0 : 1; //add bias to get rid of shadow acne
		color += vec4(vec3((0.1 + n_dot_l + pow(n_dot_h, 20)) * att), 1);
	}
	color.xyz = pow(color.xyz, vec3(1/2.2));
	//color = vec4(1);
}
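
The clip space -> ndc -> tex coord step from the shader, written out on the CPU so it can be checked with a few numbers. Just a sketch: the struct and function names are made up, and the bias would be the 0.005 used in the shader.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Clip-space position (as seen from the light) to shadow map coordinates.
Vec3 clip_to_shadow_coord(const Vec4& clip) {
  assert(clip.w != 0.0f);
  // perspective divide: clip space -> normalized device coordinates in [-1, 1]
  Vec3 ndc{clip.x / clip.w, clip.y / clip.w, clip.z / clip.w};
  // scale and bias: [-1, 1] -> [0, 1], usable as texture coordinates and depth
  return Vec3{ndc.x * 0.5f + 0.5f, ndc.y * 0.5f + 0.5f, ndc.z * 0.5f + 0.5f};
}

// The fragment is shadowed if the shadow map stores something closer to the
// light than the fragment itself, minus a small bias against shadow acne.
bool in_shadow(float stored_depth, float fragment_depth, float bias) {
  return stored_depth < fragment_depth - bias;
}
```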

#5017557 I need help understanding ARB_instanced_arrays.

Posted by Yours3!f on 04 January 2013 - 05:27 PM



this allows you to draw 4 models, each with different transformation matrix

glBindVertexArray( vao ); //bind vertex array object that contains all the vbos etc... as usual

glDrawArraysInstanced(GL_TRIANGLES, 0, num_of_polygons * 3, 4);


again, I didn't test it.

#5017548 I need help understanding ARB_instanced_arrays.

Posted by Yours3!f on 04 January 2013 - 05:07 PM

GLuint vbo; //vertex buffer object containing the matrices
glGenBuffers( 1, &vbo ); //create the vbo

// I get the feeling that this line is the key. It is all important and it has the least amount of information available.
glBindBuffer(GL_ARRAY_BUFFER, vbo); //bind it

mat4 matrices[4];

//fill matrices here...

//fill the vbo with data
glBufferData( GL_ARRAY_BUFFER, sizeof( float ) * 16 * 4, &matrices[0][0], GL_STATIC_DRAW );

//now that the matrices are on the GPU, lets go ahead and use them

//get the position of the vertex attribute to access the vbo
int pos = glGetAttribLocation(shader_instancedarrays.program, "transformmatrix");
int pos1 = pos + 0; // <- According to mhagain, this gets the first row of the matrix specified in "transformmatrix".
int pos2 = pos + 1; // ... second row...
int pos3 = pos + 2; // ... third row...
int pos4 = pos + 3; // ... fourth row
// Now I am really confused. If I understand what mhagain wrote, each of these specify one row of a matrix.
// BUT...
// Each is setting the stride to be 16 floats! Is this a typo? What is happening here? 

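//a mat4 attribute takes up 4 consecutive locations, one vec4 column each.
//the stride is the size of a whole matrix (16 floats), so each instance steps to the next matrix,
//and the offset selects the column inside the matrix
//ie. pos1 --> column 0 of matrices[i], pos2 --> column 1 of matrices[i] etc...
glEnableVertexAttribArray(pos1);
glEnableVertexAttribArray(pos2);
glEnableVertexAttribArray(pos3);
glEnableVertexAttribArray(pos4);
glVertexAttribPointer(pos1, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 4 * 4, (void*)(0));
glVertexAttribPointer(pos2, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 4 * 4, (void*)(sizeof(float) * 4));
glVertexAttribPointer(pos3, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 4 * 4, (void*)(sizeof(float) * 8));
glVertexAttribPointer(pos4, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 4 * 4, (void*)(sizeof(float) * 12));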

// This part I get...
glVertexAttribDivisor(pos1, 1);
glVertexAttribDivisor(pos2, 1);
glVertexAttribDivisor(pos3, 1);
glVertexAttribDivisor(pos4, 1);

I think this should work... haven't tested it though.

Specifies the byte offset between consecutive generic vertex attributes. If stride is 0, the generic vertex attributes are understood to be tightly packed in the array. The initial value is 0.
this tells the api how big each data pack (each matrix) is, ie. how far to step to get from one instance's data to the next

Specifies an offset of the first component of the first generic vertex attribute in the array in the data store of the buffer currently bound to the GL_ARRAY_BUFFER target. The initial value is 0.
ie. this tells the api where the data of each attribute (each column of the first matrix) starts in the vbo
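
To make the stride/offset relationship concrete, here is the address arithmetic for a buffer of tightly packed 4x4 float matrices with a divisor of 1, as a small sketch (the names are made up): the attribute for a given column of a given instance is read at offset + stride * instance.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kColumnBytes = 4 * sizeof(float);  // one vec4 column
constexpr std::size_t kMatrixBytes = 4 * kColumnBytes;   // one mat4 = the stride

// Offset passed to glVertexAttribPointer for a given matrix column.
std::size_t column_offset(std::size_t column) {
  assert(column < 4);
  return column * kColumnBytes;
}

// Byte position in the buffer that an attribute reads for a given instance.
std::size_t attrib_address(std::size_t offset, std::size_t stride, std::size_t instance) {
  return offset + stride * instance;
}
```

So column 2 of the fourth matrix (instance 3) starts 3 whole matrices plus 2 columns into the buffer.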

#5017312 I need help understanding ARB_instanced_arrays.

Posted by Yours3!f on 03 January 2013 - 06:08 PM

From the lack of views, posts and Google results, I get the feeling this is a taboo topic... Like talking about God in the Lounge...

I'm pretty sure it's not a taboo topic. The issue might be that this extension is designed to deliver good rendering performance when rendering the same object something like a million times with different transformation matrices. Therefore it is only used in huge projects where it is needed (ie. games, and there aren't many OpenGL games).

I guess people who needed it got it right over time, and nobody made a decent tutorial.

Despite doing advanced stuff (mostly post-processing) I still haven't done this. Plus usually, when you get there to use this extension you already have a scene graph set up, etc.


But here's one of the extensions you may need:



a tutorial:



plus you have the docs and the specs. Read all these, and I think you'll get it right :)