

Member Since 04 Feb 2011
Offline Last Active Sep 21 2014 05:26 AM

#5171276 Instanced font rendering (continued)

Posted by Yours3!f on 03 August 2014 - 11:02 AM

hi there,


I extended my font rendering lib so that you can color your characters any way you want, at any size, highlight each character separately, add underlining, overlining, and strikethrough, and draw text at any location with correct line heights, gaps, and kerning (some of this is based on freetype-gl: https://github.com/rougier/freetype-gl).
All of this in one draw call, one vao, one texture, one shader, everything instanced, SUPER FAST :D
Licence is MIT, download here, feel free to use: https://github.com/Yours3lf/instanced_font_rendering

It should be super easy to integrate.

Requires GL3.x


best regards,


#5156474 Immediate mode text alternative.

Posted by Yours3!f on 28 May 2014 - 06:31 AM

look at my signature ;)

#5150070 Screenshot of your biggest success/ tech demo

Posted by Yours3!f on 28 April 2014 - 03:31 AM

not in the industry yet, but I've done some stuff in my homebrew engine :)


#5138686 octree view frustum culling

Posted by Yours3!f on 13 March 2014 - 07:49 AM

turns out I messed up somewhere else, and my original implementation worked without having to modify the culling.

I tried it out with these settings:
6.25 million cubes placed in a grid on the x-z plane

camera with 25 degrees vertical fov

near plane = 1

far plane = 100

1000+ fps when using octree + brute force culling, no popping, nothing at all.

the octree culling usually left 150-200 objects visible, and the brute force culling (that I ran on the remaining visible objects) left 50-100, so I'd say it turned out really well.

I'll need to optimize the amount of memory the octree takes, as the application took 700MB, most of which was the octree...

but that's future work.

#5135615 octree view frustum culling

Posted by Yours3!f on 01 March 2014 - 05:08 AM

hi there,


I'm trying to implement view frustum culling using an octree. I'm using aabb vs frustum testing to determine if the given octant is inside the view frustum or not. I used this tutorial to implement it:


it works well for small objects, however as the tutorial says there may be cases where the aabb's vertices are not inside the view frustum, and the view frustum's vertices are not inside the aabb, yet they DO intersect. And I ran into exactly this problem: some of my octants are getting culled despite being clearly visible.
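For reference, one commonly used way around the corner-in-frustum problem is to test the box against the six frustum planes instead. The sketch below is only an illustration, not the tutorial's code: the types `Vec3`, `Plane`, `Aabb` and the function `aabb_maybe_visible` are hypothetical names, and the plane normals are assumed to point into the frustum.

```cpp
#include <array>

// Hypothetical minimal types for illustration; not taken from the tutorial or the engine.
struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with the normal assumed to point into the frustum.
struct Plane { Vec3 n; float d; };

struct Aabb { Vec3 min, max; };

// Returns false only if the box lies completely behind at least one plane.
// The test is conservative: it may keep a few boxes that sit just outside a frustum corner,
// but it never rejects a box that actually intersects the frustum.
inline bool aabb_maybe_visible(const Aabb& box, const std::array<Plane, 6>& frustum)
{
    for (const Plane& p : frustum)
    {
        // Pick the box corner that lies farthest along the plane normal.
        const Vec3 v{
            p.n.x >= 0.0f ? box.max.x : box.min.x,
            p.n.y >= 0.0f ? box.max.y : box.min.y,
            p.n.z >= 0.0f ? box.max.z : box.min.z };

        // If even that corner is behind the plane, the whole box is outside.
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0.0f)
            return false;
    }
    return true;
}
```

Because only one corner per plane is evaluated, a large octant whose corners all lie outside the frustum is still kept as long as it is not entirely behind any single plane.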


is there a common solution to this problem?


best regards,


#5116875 Instanced font rendering lib

Posted by Yours3!f on 14 December 2013 - 09:42 AM



I made a font rendering library originally based on Shikoba.


It uses one texture, one vao, 5 vbos, and one draw call to draw text.
A screen full of text renders in around 1 ms on my PC (A8-4500m apu, 64 bit Xubuntu 12.04, compositing disabled). Same performance on Win7 64 bit.


get it here (MIT licence):



easy usage:

//load in the shaders with your method, get_shader() gives you a ref to the shader program
load_shader( font::get().get_shader(), GL_VERTEX_SHADER, "../shaders/font/font.vs" );
load_shader( font::get().get_shader(), GL_FRAGMENT_SHADER, "../shaders/font/font.ps" );

uvec2 screen = uvec2( 1280, 720 );

font_inst instance; //this holds your font type and the corresponding sizes
font::get().resize( screen ); //set screen size
font::get().load_font( "../resources/font.ttf", //where your font is
                       instance, //font will load your font into this instance
                       22 ); //the font size

vec3 color = vec3( 0.5, 0.8, 0.5 ); //rgb [0...1]
uvec2 pos = uvec2( 10, 20 ); //in pixels

std::wstring text = L"hello world\n"; //what to display

while(true) //your ordinary rendering loop
{
  //optionally bind fbo here to render to texture
  font::get().add_to_text( instance, text + L"_" ); //feed the font
  font::get().render( instance, color, pos );
}


#5082442 low sample count screen space reflections

Posted by Yours3!f on 02 August 2013 - 05:24 AM


I defined the distance to be 50 units. This is again empirical value, and is probably highly dependent on the scene. This is probably not 5 meters, as I don't really know how much that would be in the real world, but something like that. According to blender the length of the blue curtain is 28 meters, and the length of the vase is 5 meters... I have downloaded the original file from cryengine, and it is like 10x bigger. So based on real world photos I scaled it down, so that the columns are like 2.2m high in Blender. This way 5 units (or meters now?) seemed to be fine.

I'm looking forward to your view space implementation!


I tried implementing it, but the view-space version is even worse than the screen-space version as far as the broad-phase goes. I did expect this, since I turned to screen-space to counter exactly this problem. I had to adjust the narrow-phase though, and this one works better. The coverage calculation, however, is totally horrible and results in punctured geometry worse than before. The marked areas show this problem well.



So I guess the best solution is screen-space but with a modified narrow-phase calculation. Let's see if this works out.


What is astonishing is that with the modified narrow-phase in the view-space version, coverage actually fades out if samples are not included in the image. If only the punctured pattern went away, it would be near optimal given the unstable nature of the SSR algorithm to begin with.


okay, so you were right saying that they may look really similar. I think that these artifacts could be filtered out by 'checking' some things:
-check if the resulting ss vector is on the screen (fade out by applying some math)
-check if the resulting ray is within search distance (again fade)
-check if the original vs normal and view direction are good (too big angles are baaad, again fade)
-check if the resulting vs normal and reflection vector are good (again big angles are bad, fade)
-also there's one more thing, you should check if the raycast even succeeded. I consider it successful if the binary search is launched.
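A rough sketch of how such fade checks could be combined into a single weight is shown below. This is not the code from either implementation: every function name, the `border` parameter, and the `min_cos` threshold are assumptions made up for illustration, and the actual "math" for each fade would need tuning per scene.

```cpp
#include <algorithm>

// Clamp to [0, 1].
inline float saturate(float x)
{
    return std::min(std::max(x, 0.0f), 1.0f);
}

// 1 while 'value' is below 'start', falling linearly to 0 at 'end'.
inline float fade_out(float value, float start, float end)
{
    return 1.0f - saturate((value - start) / (end - start));
}

// (u, v): screen-space hit position in [0, 1]^2.
// 'border' is a hypothetical fade width; the result is 0 at or beyond the screen edge.
inline float screen_edge_fade(float u, float v, float border)
{
    // Distance to the nearest screen edge, negative when off-screen.
    const float edge_dist = std::min(std::min(u, 1.0f - u), std::min(v, 1.0f - v));
    return saturate(edge_dist / border);
}

// Combines the checks from the list above into one multiplier for the reflection.
// cos_angle: cosine between the two vectors being compared; 'min_cos' is an assumed
// threshold below which the reflection is dropped completely.
inline float ssr_weight(float u, float v, float ray_length, float search_distance,
                        float cos_angle, float min_cos, bool raycast_succeeded)
{
    if (!raycast_succeeded)
        return 0.0f;

    const float edge = screen_edge_fade(u, v, 0.1f);
    const float dist = fade_out(ray_length, 0.8f * search_distance, search_distance);
    const float angle = saturate((cos_angle - min_cos) / (1.0f - min_cos));
    return edge * dist * angle;
}
```

Multiplying the factors means any single failed check fades the reflection out, which matches the idea of filtering the artifacts instead of cutting them off abruptly.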

#5067138 Instance culling theory

Posted by Yours3!f on 03 June 2013 - 11:51 AM



I've implemented instance culling using instance cloud reduction described here:

This technique works by storing the instance positions in a VBO, then using a vertex shader to cull them (basically render them), then using transform feedback and a geometry shader to store only those that passed the visibility test in an output vbo. This way the geometry shader outputs GL_POINTS (vec4), so only the position is considered.

Now in my app I'd like to use rotation and scaling as well. As far as I know, this can be best described by a matrix. My question is: how can I output these matrices with the geometry shader? Should I output more than one vertex (more EmitVertex();)? How do they get stored in the output vbo?

Plus how would the culling be done? should I just transform the bounding box by this matrix? then check that?
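As a sketch of the "transform the bounding box by this matrix" idea: an axis-aligned box can be pushed through an affine matrix and re-fitted without transforming all eight corners. The code below assumes a column-major 4x4 matrix (OpenGL convention); the `Aabb` struct and `transform_aabb` are hypothetical names, and the same arithmetic would have to be ported to the culling shader.

```cpp
#include <cmath>

// Hypothetical box type for illustration.
struct Aabb { float min[3]; float max[3]; };

// m: 4x4 affine matrix in column-major order, element (row r, column c) at m[c * 4 + r].
// Returns the axis-aligned box that encloses the transformed input box.
inline Aabb transform_aabb(const float m[16], const Aabb& box)
{
    Aabb out;
    for (int r = 0; r < 3; ++r)
    {
        // Start from the translation part of the matrix.
        out.min[r] = out.max[r] = m[12 + r];
        for (int c = 0; c < 3; ++c)
        {
            // Each matrix element contributes its smaller product to the minimum
            // and its larger product to the maximum.
            const float a = m[c * 4 + r] * box.min[c];
            const float b = m[c * 4 + r] * box.max[c];
            out.min[r] += std::fmin(a, b);
            out.max[r] += std::fmax(a, b);
        }
    }
    return out;
}
```

The resulting box is generally larger than the rotated object, so the visibility test stays conservative: it may keep an instance that is slightly off-screen, but it should not drop a visible one.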


Best regards,


#5018224 need help with shadow mapping

Posted by Yours3!f on 06 January 2013 - 10:17 AM

ok, I think I got it.

I just needed to transform the clip space shadow coords to ndc, then to tex coords.


#version 330 core

layout(binding=0) uniform sampler2D texture0;
uniform vec3 light_pos, spot_dir;
uniform float radius, spot_exponent, spot_cos_cutoff;

in vec3 normal;
in vec3 vs_pos;
in vec4 ls_pos;

out vec4 color;

void main()
{
	color = vec4(0); //start from a defined value, the lighting below is accumulated with +=
	vec3 n = normal;
	vec3 l = light_pos - vs_pos;
	float d = length(l);
	l = normalize(l);
	float att = (1 + d / radius);
	att = 1.0 / ( att * att );
	float n_dot_l = dot(n, l);
	if(n_dot_l > 0)
	{
		vec3 h = 0.5 * (l + normalize(-vs_pos));
		float n_dot_h = max(dot(n, h), 0);
		float spot_effect = dot(-l, spot_dir);
		if(spot_effect > spot_cos_cutoff)
		{
			spot_effect = pow( spot_effect, spot_exponent );
			att = spot_effect / att + 1;
		}
		att -= 1;
		vec4 shadow_coord = ls_pos / ls_pos.w; //convert to ndc
		shadow_coord = shadow_coord * 0.5 + 0.5; //scale, bias to [0...1]
		float depth = texture( texture0, shadow_coord.xy ).x;
		att *= depth < shadow_coord.z - 0.005 ? 0 : 1; //add bias to get rid of shadow acne
		color += vec4(vec3((0.1 + n_dot_l + pow(n_dot_h, 20)) * att), 1);
	}
	color.xyz = pow(color.xyz, vec3(1/2.2));
    //color = vec4(1);
}

#5017557 I need help understanding ARB_instanced_arrays.

Posted by Yours3!f on 04 January 2013 - 05:27 PM



this allows you to draw 4 models, each with different transformation matrix

glBindVertexArray( vao ); //bind vertex array object that contains all the vbos etc... as usual

glDrawArraysInstanced(GL_TRIANGLES, 0, num_of_polygons * 3, 4);


again I didnt test it.

#5017548 I need help understanding ARB_instanced_arrays.

Posted by Yours3!f on 04 January 2013 - 05:07 PM

GLuint vbo; //vertex buffer object containing the matrices
glGenBuffers( 1, &vbo ); //create the vbo

// I get the feeling that this line is the key. It is all important and it has the least amount of information available.
glBindBuffer(GL_ARRAY_BUFFER, vbo); //bind it

mat4 matrices[4];

//fill matrices here...

//fill the vbo with data
glBufferData( GL_ARRAY_BUFFER, sizeof( float ) * 16 * 4, &matrices[0][0], GL_STATIC_DRAW );

//now that the matrices are on the GPU, lets go ahead and use them

//get the position of the vertex attribute to access the vbo
int pos = glGetAttribLocation(shader_instancedarrays.program, "transformmatrix");
int pos1 = pos + 0; // <- According to mhagain, this gets the first row of the matrix specified in "transformmatrix".
int pos2 = pos + 1; // ... second row...
int pos3 = pos + 2; // ... third row...
int pos4 = pos + 3; // ... fourth row
// Now I am really confused. If I understand what mhagain wrote, each of these specify one row of a matrix.
// BUT...
// Each is setting the stride to be 16 floats! Is this a typo? What is happening here? 

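//set up the pointers so that each vertex attribute location points to one 4-float slice of the matrix
//(a column, if the matrices are stored column-major).
//ie. pos1 --> slice 0 of every matrix
//pos2 --> slice 1 of every matrix etc...
//the stride is the size of a whole matrix (16 floats), because that is the distance
//between the same slice in two consecutive matrices; the offset selects the slice.
glEnableVertexAttribArray(pos1);
glEnableVertexAttribArray(pos2);
glEnableVertexAttribArray(pos3);
glEnableVertexAttribArray(pos4);
glVertexAttribPointer(pos1, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 4 * 4, (void*)(0));
glVertexAttribPointer(pos2, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 4 * 4, (void*)(sizeof(float) * 4));
glVertexAttribPointer(pos3, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 4 * 4, (void*)(sizeof(float) * 8));
glVertexAttribPointer(pos4, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 4 * 4, (void*)(sizeof(float) * 12));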

// This part I get...
glVertexAttribDivisor(pos1, 1);
glVertexAttribDivisor(pos2, 1);
glVertexAttribDivisor(pos3, 1);
glVertexAttribDivisor(pos4, 1);

I think this should work... haven't tested it though.

Specifies the byte offset between consecutive generic vertex attributes. If stride is 0, the generic vertex attributes are understood to be tightly packed in the array. The initial value is 0.
i.e. this tells the API how far apart consecutive elements of the attribute are, which here is the size of one whole matrix

Specifies an offset of the first component of the first generic vertex attribute in the array in the data store of the buffer currently bound to the GL_ARRAY_BUFFER target. The initial value is 0.
i.e. this tells the API where the first element of the attribute starts in the vbo (here: where the given 4-float slice of the first matrix is)
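To make the stride/offset arithmetic concrete, here is a small CPU-side sketch of which byte a given attribute is read from for a given instance, assuming a divisor of 1 and tightly packed 4x4 float matrices. The helper names are made up for illustration and are not part of OpenGL.

```cpp
#include <cstddef>

// Byte position in the buffer from which an attribute is fetched for a given instance,
// assuming the attribute advances once per instance (divisor of 1).
inline std::size_t attrib_byte_address(std::size_t offset, std::size_t stride, std::size_t instance)
{
    return offset + instance * stride;
}

constexpr std::size_t kSliceBytes  = sizeof(float) * 4;   // one vec4 slice of a matrix
constexpr std::size_t kMatrixBytes = sizeof(float) * 16;  // one whole mat4, used as the stride

// Where slice 'slice' (0..3) of the matrix belonging to instance 'instance' starts.
inline std::size_t matrix_slice_address(std::size_t slice, std::size_t instance)
{
    return attrib_byte_address(slice * kSliceBytes, kMatrixBytes, instance);
}
```

With 4-byte floats, slice 1 of the first matrix starts at byte 16 and slice 0 of the second matrix at byte 64, which is why all four attribute pointers share a 16-float stride while only their offsets differ.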

#5017312 I need help understanding ARB_instanced_arrays.

Posted by Yours3!f on 03 January 2013 - 06:08 PM

From the lack of views, posts and Google results, I get the feeling this is a taboo topic... Like talking about God in the Lounge...

I'm pretty sure it's not a taboo topic. The issue might be that this extension is designed to deliver good rendering performance when rendering the same object something like a million times with different transformation matrices. Therefore it is only used in huge projects where it is needed (i.e. games, and there aren't many OpenGL games).

I guess people who needed it got it right over time, and nobody made a decent tutorial.

Despite doing advanced stuff (mostly post-processing) I still haven't done this. Plus usually, when you get there to use this extension you already have a scene graph set up, etc.


But here's one of the extensions you may need:



a tutorial:



plus you have the docs and the specs. Read all these, and I think you'll get it right :)

#5014678 Porting to OpenGL

Posted by Yours3!f on 27 December 2012 - 06:16 AM

Hello people.


I have a Direct3D 9.0 engine and I want to add an OpenGL renderer.

The engine uses deferred lighting with Shader model 3.0 and unfortunately isn't written with abstraction in mind. I mean, it uses DirectX types, objects and D3DX calls all over the code.

My questions are :

1. What OpenGL version I should implement ? OpenGL 2.x , 3.x, 4.0 ? I need a smooth transition and easy port, that matches Direct3D 9.0 functionality well.

 2. Should I use glew, glut, SDL , other stuff like that ? I will most probably want to port it to other platforms like Linux, Mac ? I don't want to mess with extensions so probably glew will be of help.

3. Should I use nVidia CG shader language in order to use my existing HLSL shader code directly, without rewriting them to GLSL ?

4. Are there good sample sources, tutorials, or articles that show basic deferred rendering the modern OpenGL way (version 3+)?

5. I'm using D3DX library for common math stuff etc, is there a library for OpenGL that is to it, like D3DX is to Direct3D ?


I have a limited OpenGL experience from the past.


Thank you.

More questions to come.




opengl 2.1 matches dx9 functionality.

yes you should use glew and sdl (or sfml, if you prefer c++-ish solutions), but these are not mandatory, you can write your own.

well cg is another language right? so you would have to port to cg then... the question is, do you want to learn glsl or not?

see: http://ogldev.atspace.co.uk/

but there are tons of others. google is your friend.

well for mesh loading and stuff related to that, you have assimp

for texture loading, you can do this with sdl / sfml. Just load in an image using sdl / sfml, acquire the pointer to the raw data, and fill a texture using that data.

for maths, you have tons of choices. The most popular ones are CML (http://cmldev.net) and GLM (http://glm.g-truc.net/).

I have also developed a math library: https://github.com/Yours3lf/libmymath


if you need anything else, just go to the topics (http://www.gamedev.net/forum/25-opengl/) look at the right side of the page -->

and there you have it, tons of links.

#5011208 Bottom Row of 4x4 Matrix

Posted by Yours3!f on 16 December 2012 - 03:03 AM

well first take a look at this:
(the other pages are very useful too!!! so look around)

I'm not really sure if A is used at all, I did this a long time ago, but B is supposed to be the perspective divide element.
Or at least B and the projection matrix modify the 4th element of the position, so that in clip space if you divide by it, you get the ndc coordinates.
like when you do the transformations:
-Model (world) space (apply T here):
model_space_pos = model_mat * vertex
-View (camera) space (apply R here):
view_space_pos = view_mat * model_space_pos
-Clip space (apply projection matrix here):
clip_space_pos = projection_mat * view_space_pos
-normalized device coordinates (do the perspective divide, apply B here):
ndc_pos = clip_space_pos.xyz / clip_space_pos.w
-viewport coordinates (scale & bias to get texture coordinates, then scale by the window size)
viewport_pos = (ndc_pos.xy * 0.5 + 0.5) * vec2( screen_width, screen_height )
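The last two steps above can be sketched on the CPU as follows. This is only an illustration with made-up helper names (`mul`, `clip_to_viewport`, `Pixel`), assuming column-major matrices as in OpenGL; it shows how the bottom row of the projection matrix ends up in w and is then divided out.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>; // column-major: element (row, col) at [col * 4 + row]

// Matrix * column vector.
inline Vec4 mul(const Mat4& m, const Vec4& v)
{
    Vec4 r{}; // zero-initialised
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}

struct Pixel { float x, y; };

// Perspective divide followed by scale & bias to window coordinates.
inline Pixel clip_to_viewport(const Vec4& clip, float screen_width, float screen_height)
{
    const float ndc_x = clip[0] / clip[3];
    const float ndc_y = clip[1] / clip[3];
    return { (ndc_x * 0.5f + 0.5f) * screen_width,
             (ndc_y * 0.5f + 0.5f) * screen_height };
}
```

For example, a matrix whose bottom row is (0, 0, -1, 0) copies -z into w, so a view-space point at z = -2 gets its x and y halved by the divide before the viewport scale is applied.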

and to the other questions:
you would apply the 4x4 matrix to a 4 component vector (ie vec4)
and you would invert A and B by inverting the 4x4 matrix.

#5007062 is gdebugger wrong or do I really have 300gb in renderbuffers?

Posted by Yours3!f on 04 December 2012 - 07:18 AM


I get false values too, I use a 1024x1024 RGBA8 cubemap texture. This should take:
1024 (width) x 1024 (height) x 6 (sides) x 4 (bytes per pixel) ~ 25 MB
instead it says 148 MB... So it is definitely wrong.
As far as I've seen, it reports correct values for 2D textures, but false ones for pretty much everything else.
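The expected size can be double-checked with a few lines of code. The helper below is a hypothetical sketch, not something gDEBugger provides; it assumes an uncompressed format with a fixed number of bytes per pixel and optionally adds a full mipmap chain.

```cpp
#include <cstdint>

// Memory taken by the pixel data of a square cubemap.
// size: edge length in pixels of the base level; bytes_per_pixel: 4 for RGBA8.
inline std::uint64_t cubemap_bytes(std::uint64_t size, std::uint64_t bytes_per_pixel, bool with_mipmaps)
{
    std::uint64_t total = 0;
    for (std::uint64_t s = size; s >= 1; s /= 2)
    {
        total += s * s * 6 * bytes_per_pixel; // six faces per mip level
        if (!with_mipmaps)
            break;
    }
    return total;
}
```

For a 1024x1024 RGBA8 cubemap this gives 25,165,824 bytes for the base level and 33,554,424 bytes with a full mipmap chain, so even with mipmaps the data stays far below the reported 148 MB.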