Member Since 05 Apr 2006
Offline Last Active Jul 23 2014 02:54 AM

Topics I've Started

Efficient instancing in OpenGL

30 June 2014 - 06:32 AM

The game I'm working on should be able to render dense forests with many trees and detailed foliage. I have been using instancing for drawing pretty much everything, but even so, I have lately hit some performance issues.


My implementation is based on storing instance data in uniforms. I restrict the object transformations so that only translation, uniform scale and rotation about one axis are allowed. For the rotation part, I pass sin(angle) and cos(angle) as uniforms, so 6 floats are passed per instance. This way, I can easily draw 256 instances at once by invoking glUniform4fv, glUniform2fv and glDrawElementsInstancedBaseVertex per batch. I use this particular draw command because my large VBOs each store multiple meshes.
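To make the 6-float layout concrete, here is a minimal CPU-side sketch of the transform that the vertex shader would apply per instance. The struct and function names are hypothetical, and the choice of the Y axis as the rotation axis and the scale, rotate, translate order are assumptions, not something stated above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-instance record: 6 floats, as described in the post.
struct InstanceData {
    float tx, ty, tz;  // translation
    float scale;       // uniform scale
    float sinA, cosA;  // sine and cosine of the rotation angle
};
static_assert(sizeof(InstanceData) == 6 * sizeof(float), "unexpected padding");

struct Vec3 { float x, y, z; };

// Builds an instance record from a rotation angle given in radians.
inline InstanceData makeInstance(float tx, float ty, float tz, float scale, float angle) {
    assert(scale > 0.0f);
    return InstanceData{tx, ty, tz, scale, std::sin(angle), std::cos(angle)};
}

// Assumption: rotation about the Y axis, applied as scale -> rotate -> translate.
inline Vec3 transformVertex(const InstanceData& inst, const Vec3& v) {
    const float sx = v.x * inst.scale;
    const float sy = v.y * inst.scale;
    const float sz = v.z * inst.scale;
    return Vec3{
        inst.cosA * sx + inst.sinA * sz + inst.tx,
        sy + inst.ty,
        -inst.sinA * sx + inst.cosA * sz + inst.tz
    };
}
```

Passing the sine and cosine instead of the angle keeps trigonometry out of the shader, at the cost of one extra float per instance.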


Lately I have noticed that the performance is too low for my purposes. I used gDebugger in an attempt to find the bottleneck. The frame rate was initially roughly 40 FPS. Lowering the texture resolution had no effect. Disabling raster commands had a negligible effect. Disabling draw commands boosted the frame rate to over 100 FPS. My tentative conclusion is that the execution is bound by neither the CPU nor raster operations, but by vertex processing.


I'm also using impostors for the trees and level of detail for the meshes, but I have the feeling that I should be able to draw more instances of the meshed trees than I currently can. I actually had a quite acceptable 80 FPS with just the trees in place, but adding the foliage (many instances of small low-poly meshes) dropped it to 40. Disabling either the trees or the foliage increases the FPS significantly. Disabling the terrain, which uses a lot of polygons, has no effect, so I don't think the issue is simply the polygon count.


Could it be that uploading the uniform data is the limiting factor?


For some of the instanced object types, such as the trees, the transformation data is static and is stored in the leaf nodes of a bounding volume hierarchy (BVH) in contiguous arrays, so that glUniform* can be called without further assembly of data. It would then make sense to store these arrays in video memory. What is the best way to do this these days? I think that VBOs are used in conjunction with glVertexAttribDivisor. To me this does not seem a very neat approach, as "vertex attributes" are used for something that is clearly of a "uniform" nature. But anyway, I could then make one huge VBO for the entire BVH and store a base instance value and the number of instances for each leaf node. To render a leaf node, I would then use glDrawElementsInstancedBaseVertexBaseInstance. This requires the GL 4.2 core spec, which might be a bit too high. Are there better options? I also have objects (the foliage) for which the transformation data is dynamic (updated occasionally), as they are only placed around the camera. What would be the best way to store and transfer the transformation data in this case?
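The "one huge VBO for the entire BVH" idea can be sketched on the CPU side as follows. This is only an illustration under assumptions: the Leaf struct, packLeaves and leafByteOffset are hypothetical names, and the BVH is reduced to a flat list of leaves. No GL calls are made here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same 6-float record as described above (hypothetical name).
struct InstanceData { float tx, ty, tz, scale, sinA, cosA; };

// Hypothetical BVH leaf holding its static instances before packing.
struct Leaf {
    std::vector<InstanceData> instances;
    unsigned baseInstance = 0;   // first slot of this leaf in the shared array
    unsigned instanceCount = 0;  // number of instances to draw for this leaf
};

// Concatenates all leaf instances into one array that could be uploaded
// to a single buffer, and records each leaf's range in that array.
std::vector<InstanceData> packLeaves(std::vector<Leaf>& leaves) {
    std::vector<InstanceData> packed;
    for (Leaf& leaf : leaves) {
        leaf.baseInstance = static_cast<unsigned>(packed.size());
        leaf.instanceCount = static_cast<unsigned>(leaf.instances.size());
        packed.insert(packed.end(), leaf.instances.begin(), leaf.instances.end());
    }
    return packed;
}

// Byte offset of a leaf's first instance inside the packed buffer.
std::size_t leafByteOffset(const Leaf& leaf) {
    return static_cast<std::size_t>(leaf.baseInstance) * sizeof(InstanceData);
}
```

With base-instance support, baseInstance and instanceCount would map directly onto the arguments of glDrawElementsInstancedBaseVertexBaseInstance. Without it, one possible fallback is to re-specify the instanced attribute pointers with leafByteOffset before each leaf's draw call, at the price of extra state changes.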


Thank you in advance.

Gravity: Old school arcade game

08 March 2014 - 08:44 AM

I do a lot of 3D graphics and game programming and have been working on a rather large project for a few years now. About a week ago, however, I started wondering whether I could make a simple yet entertaining game in a few evenings. I fondly remember an old game called Lunar Lander, which I liked a lot. I wanted to make something similar, but with a twist or two.


The result is a game called Gravity, where you fly a little spaceship over the surface of a planet, trying to collect valuable diamonds, emeralds and the like. But it is not made easy: you have to fight gravity and various enemy ships, whose number increases as you proceed. Collect all the stars in a level to earn extra score. How high a score can you get?


Below is a picture. The graphics are simple on purpose, so try the game itself to see if it's any good!





Download the game from here. At the moment, binaries are built for Windows and Linux (64-bit only).


Gamepads are also supported, but as of yet cannot be configured.


I highly appreciate your comments and suggestions.

Compressed texture array

25 August 2013 - 06:16 AM

I'm trying to create a texture array of DXT1/DXT3/DXT5 compressed images loaded from separate DDS files (compressed, with mipmaps). I load each DDS file with the function

unsigned char *load_dds(const char *filename, int &w, int &h, int &nmipmaps, GLenum &format, int &size);

where format is GL_COMPRESSED_RGBA_S3TC_DXT*_EXT and size is the total size of the data (including mipmaps). I have tested this for basic 2D textures and it seems to work.


I then create the texture and allocate data for all layers by calling glCompressedTexImage3D with no input data:

GLenum target = GL_TEXTURE_2D_ARRAY;
glCompressedTexImage3D(target, 0, format, w, h, nlayers, 0, size, 0);

This gives an "invalid value" error. I'm not sure what size should reflect here. Should it include mipmaps and all layers? Is it allowed to pass a NULL data pointer here?


After allocating the storage, I run the following for each image i:

int mipWidth = w;
int mipHeight = h;
int blockSize = (format == GL_COMPRESSED_RGBA_S3TC_DXT1_EXT) ? 8 : 16;
int offset = 0;

for(int mip = 0; mip < numMipmaps; ++mip){
   int mipSize = ((mipWidth + 3) / 4) * ((mipHeight + 3) / 4) * blockSize;

  glCompressedTexSubImage3D(target, mip, 0, 0, i, mipWidth, mipHeight, 1, format,
                                    mipSize, data + offset);

  mipWidth = max (mipWidth >> 1, 1);
  mipHeight = max (mipHeight >> 1, 1);

  offset += mipSize;
}

where data is the pointer returned by load_dds. This generates an "invalid operation" error. Is it necessary to call glCompressedTexImage3D prior to glCompressedTexSubImage3D to allocate the memory? What else could be wrong here? I would appreciate any hints.
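One thing worth checking is the size bookkeeping. The sketch below computes the byte size of each mip level for one layer and for a whole level of the array, using the same 4x4 block formula as the loop above. It rests on the assumption that the imageSize argument of glCompressedTexImage3D has to match a single mip level across all layers, rather than the whole file including mipmaps. MipLevel, mipChain and levelAllocationSize are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Dimensions and compressed size of one mip level of a single layer.
struct MipLevel {
    int width;
    int height;
    std::size_t bytesPerLayer;
};

// blockSize is 8 for DXT1 and 16 for DXT3/DXT5, as in the loop above.
std::vector<MipLevel> mipChain(int w, int h, int numMipmaps, int blockSize) {
    assert(w > 0 && h > 0 && numMipmaps > 0 && blockSize > 0);
    std::vector<MipLevel> levels;
    for (int mip = 0; mip < numMipmaps; ++mip) {
        const std::size_t blocksWide = static_cast<std::size_t>((w + 3) / 4);
        const std::size_t blocksHigh = static_cast<std::size_t>((h + 3) / 4);
        levels.push_back({w, h, blocksWide * blocksHigh * static_cast<std::size_t>(blockSize)});
        w = std::max(w >> 1, 1);
        h = std::max(h >> 1, 1);
    }
    return levels;
}

// Size of one mip level for the whole array (all layers).
std::size_t levelAllocationSize(const MipLevel& level, int nlayers) {
    return level.bytesPerLayer * static_cast<std::size_t>(nlayers);
}
```

Under that assumption, the allocation step would loop over the levels and call glCompressedTexImage3D(target, mip, format, level.width, level.height, nlayers, 0, levelAllocationSize(level, nlayers), 0) once per mip level, so that every level exists before glCompressedTexSubImage3D writes into it.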

Summary of best VBO practices

20 January 2013 - 04:09 AM

I have read several articles about using VBOs and VAOs on today's hardware, but because the information is mixed, I'm not sure what the bottom line is. For example, http://www.opengl.org/wiki/VBO_-_more is instructive, but I suspect that others may hold different opinions on these matters.


In my game, I usually have models with approximately 300-1000 polygons and 1-3 textures. Each model also has a few lower detail versions of them, which don't use the same vertex data, but usually do use the same textures. Additionally, some models (e.g. player) can have 20k polygons and maybe 5 textures.


In my current implementation, a mesh is divided into groups by texture. Each group has separate VBOs for vertex positions, normals, texture coordinates etc., plus an index buffer. After culling, I have a bunch of mesh groups, which I sort first by texture and then by VBO, and draw with glDrawElements. The vertex data is always static and all animation is done in shaders.


A few questions:



If a group has, say, 4000 polygons (which could happen for the player models), is it advisable to call glDrawElements once for the whole chunk, or should I cut it into pieces? I read about some "cache pressure" kicking in with large chunks, but I don't understand what it means. I have experienced some hiccups with 20k-polygon models, but just dividing the draw calls didn't seem to help.



Should I put all vertex data into a single VBO? For each group or for the whole mesh? I read that a 1-4 MiB buffer is preferred on some hardware, so should I go further and implement some sort of general VBO allocator, so that the data of several meshes is pushed there? For a 4 MiB VBO and 1000-polygon meshes, I can see how this would reduce bindings, if all meshes use the same types of attributes.
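A "general VBO allocator" of the kind mentioned above could start out as a simple bump allocator that hands out ranges in one shared vertex buffer and one shared index buffer. The sketch below only does the bookkeeping and makes no GL calls. All names are hypothetical, 16-bit indices are an assumption, and freeing or defragmenting ranges is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Range of one mesh inside the shared vertex and index buffers.
struct MeshRange {
    std::uint32_t baseVertex;   // added to every index at draw time
    std::uint32_t firstIndex;   // position in the shared index buffer, in indices
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
    bool valid;                 // false if the buffers were too full
};

// Bump allocator: ranges are handed out back to back and never freed.
class SharedBufferAllocator {
public:
    SharedBufferAllocator(std::uint32_t vertexCapacity, std::uint32_t indexCapacity)
        : vertexCapacity_(vertexCapacity), indexCapacity_(indexCapacity) {}

    MeshRange allocate(std::uint32_t vertexCount, std::uint32_t indexCount) {
        if (vertexCount > vertexCapacity_ - usedVertices_ ||
            indexCount > indexCapacity_ - usedIndices_) {
            return MeshRange{0, 0, 0, 0, false};
        }
        const MeshRange range{usedVertices_, usedIndices_, vertexCount, indexCount, true};
        usedVertices_ += vertexCount;
        usedIndices_ += indexCount;
        return range;
    }

private:
    std::uint32_t vertexCapacity_;
    std::uint32_t indexCapacity_;
    std::uint32_t usedVertices_ = 0;
    std::uint32_t usedIndices_ = 0;
};

// Byte offset into the index buffer, assuming 16-bit (unsigned short) indices.
inline std::size_t indexByteOffset(const MeshRange& r) {
    return static_cast<std::size_t>(r.firstIndex) * sizeof(std::uint16_t);
}
```

Each mesh keeps its indices relative to its own first vertex. At draw time, baseVertex and indexByteOffset would feed a call such as glDrawElementsBaseVertex, so meshes sharing a vertex format can be drawn without rebinding buffers.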



Should I use interleaved arrays? I remember that there has been some controversy with this issue. Probably depends on local coherence of vertex / index data.
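For reference, an interleaved layout for the three attributes mentioned earlier (position, normal, texture coordinate) might be described as below. This is a sketch with hypothetical names; it only derives the stride and offsets that an attribute pointer setup would need, and says nothing about which layout is faster.

```cpp
#include <cassert>
#include <cstddef>

// One interleaved vertex: all attributes of a vertex are adjacent in memory.
struct Vertex {
    float position[3];
    float normal[3];
    float texCoord[2];
};
static_assert(sizeof(Vertex) == 8 * sizeof(float), "unexpected padding");

// What an attribute pointer call needs to know about one attribute.
struct AttribLayout {
    int components;      // number of floats in the attribute
    std::size_t stride;  // bytes from one vertex to the next
    std::size_t offset;  // bytes from the start of a vertex to this attribute
};

constexpr AttribLayout kPosition{3, sizeof(Vertex), offsetof(Vertex, position)};
constexpr AttribLayout kNormal{3, sizeof(Vertex), offsetof(Vertex, normal)};
constexpr AttribLayout kTexCoord{2, sizeof(Vertex), offsetof(Vertex, texCoord)};
```

With separate VBOs per attribute, the stride would instead be the size of that attribute alone and the offset zero. In the interleaved case each entry would map onto the size, stride and pointer arguments of glVertexAttribPointer.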



Any other considerations?

Layering animations: interpolation

30 December 2012 - 11:04 AM

I wish to animate a human character by layering e.g. walking and shooting animations. At this point, I have implemented a simple MD5 mesh animation system.


I suppose I will add masks for each animation to select the bones that it should affect. I would like the upper-body animation (say, shooting) to be reusable for different character stances, which would be described by lower-body animations. The lower body should control the root bone to move the whole character, and the upper body should always follow.


I then see a problem in interpolation: thus far, I have first constructed two keyframes and then interpolated the final position vectors and orientation quaternions of the bones. If I apply this to the two animations separately and then combine them, the upper body can appear in the wrong place, as the root bone is only affected by the lower-body animation.


One solution might be to interpolate the bones in local spaces for the two animations and then combine the animations. The parent multiplication would be performed during the combination procedure. Then the upper body would be transformed by a correctly transformed root bone. These operations probably don't commute, so it is not clear that this will lead to good results. How is this usually done?
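The local-space approach described above can be sketched as three steps: interpolate each animation between its keyframes in parent-relative space, pick each bone from one of the two results using a mask, and only then multiply down the hierarchy. This is a minimal illustration under assumptions, not a statement of how engines usually do it: all names are hypothetical, quaternions are unit length and blended with normalized lerp, and parents are assumed to precede their children in the bone array.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };      // assumed to be unit length
struct Joint { Vec3 pos; Quat rot; };   // one bone transform
using Pose = std::vector<Joint>;

Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 lerpVec(Vec3 a, Vec3 b, float t) {
    return {a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z)};
}
// Hamilton product: the combined rotation applies b first, then a.
Quat mul(Quat a, Quat b) {
    return {a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
            a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w};
}
// Rotates v by the unit quaternion q.
Vec3 rotate(Quat q, Vec3 v) {
    const Vec3 u{q.x, q.y, q.z};
    const Vec3 c = cross(u, v);
    const Vec3 t{2.0f * c.x, 2.0f * c.y, 2.0f * c.z};
    const Vec3 ut = cross(u, t);
    return {v.x + q.w * t.x + ut.x, v.y + q.w * t.y + ut.y, v.z + q.w * t.z + ut.z};
}
// Normalized lerp along the shorter arc.
Quat nlerp(Quat a, Quat b, float t) {
    const float dot = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
    const float s = dot < 0.0f ? -1.0f : 1.0f;
    const Quat q{a.w + t * (s * b.w - a.w), a.x + t * (s * b.x - a.x),
                 a.y + t * (s * b.y - a.y), a.z + t * (s * b.z - a.z)};
    const float len = std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
    return {q.w / len, q.x / len, q.y / len, q.z / len};
}

// Step 1: interpolate two keyframes of one animation in local (parent-relative) space.
Pose interpolateLocal(const Pose& a, const Pose& b, float t) {
    assert(a.size() == b.size());
    Pose out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = {lerpVec(a[i].pos, b[i].pos, t), nlerp(a[i].rot, b[i].rot, t)};
    return out;
}

// Step 2: take masked bones from the upper-body pose, the rest from the lower-body pose.
Pose layer(const Pose& lower, const Pose& upper, const std::vector<bool>& upperMask) {
    assert(lower.size() == upper.size() && lower.size() == upperMask.size());
    Pose out(lower.size());
    for (std::size_t i = 0; i < lower.size(); ++i)
        out[i] = upperMask[i] ? upper[i] : lower[i];
    return out;
}

// Step 3: parent multiplication. parent[i] is -1 for the root and < i otherwise.
Pose toModelSpace(const Pose& local, const std::vector<int>& parent) {
    assert(local.size() == parent.size());
    Pose model(local.size());
    for (std::size_t i = 0; i < local.size(); ++i) {
        if (parent[i] < 0) { model[i] = local[i]; continue; }
        const Joint& p = model[static_cast<std::size_t>(parent[i])];
        model[i] = {add(p.pos, rotate(p.rot, local[i].pos)), mul(p.rot, local[i].rot)};
    }
    return model;
}
```

Because the layering happens before the hierarchy is resolved, upper-body bones inherit whatever root motion the lower-body animation produces. Whether the result looks right at the boundary bone (for example the spine) depends on how compatible the two animations are there.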