wolfscaptain

OpenGL Reduce memory output of 2D tile map


Some time ago I wrote a basic 2D tile map renderer in WebGL, with 6 vertices per tile, each consisting of a position and a texture coordinate.

The texture coordinates pointed to a specific texture in a dynamically generated texture atlas.

So every tile took 96 bytes (12 2D vectors of two 32-bit floats each).
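The 96-byte figure can be checked with a small C++ sketch. The `Vec2` and `TileVertex` types and the helper names are hypothetical, and the layout assumes 32-bit floats and two unindexed triangles per tile, which is what the numbers above imply.

```cpp
#include <cstddef>

// Hypothetical layout for the original approach: one 2D position and one
// 2D texture coordinate per vertex, both stored as 32-bit floats.
struct Vec2 {
    float x, y;
};

struct TileVertex {
    Vec2 position;
    Vec2 texCoord;
};

// Two triangles per tile, drawn without an index buffer.
constexpr std::size_t kVerticesPerTile = 6;

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
static_assert(sizeof(TileVertex) == 16, "two 2D float vectors per vertex");

// Bytes of vertex data needed for a single tile.
constexpr std::size_t bytesPerTile() {
    return kVerticesPerTile * sizeof(TileVertex);
}

// Bytes of vertex data needed for a whole map of width x height tiles.
constexpr std::size_t bytesForMap(std::size_t width, std::size_t height) {
    return width * height * bytesPerTile();
}
```

Under these assumptions a 100 x 100 map needs 960,000 bytes of vertex data.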

This weekend I finally tried instanced rendering in C++. Together with a texture buffer and a texture array, I could render any number of tiles using merely one byte (the index of the texture) per tile, which made the code quite fast and its memory footprint very small.

So, I am now thinking of ways to also reduce bandwidth for pre-OpenGL 3 code.

What I have in mind is having a constant matrix in the vertex shader that will hold the four vertex positions that make a unit quad.

Each vertex sent to the shaders will then carry the number of the tile it belongs to (a 32-bit uint), the vertex number within the tile (0, 1, 2 or 3), and the texture number.

The correct vertex position will be selected from the constant matrix:
vec2 position = positionMatrix[vertexNumber];
With this position two things happen: first, the texture coordinate is set to it, with the y value inverted (1 - position.y), and then the tile number is used to move the vertex to its correct place, assuming each tile is one unit in size, and using a uniform to tell how many tiles there are per row.
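To make the per-vertex steps concrete, here is a minimal CPU-side sketch in C++ of what such a vertex shader would compute. It is not shader code. The corner order in `kUnitQuad`, the row-major tile numbering, and the names `expandVertex` and `VertexOutput` are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Vec2 {
    float x, y;
};

// Assumed corner order of the unit quad; the post does not specify one.
constexpr std::array<Vec2, 4> kUnitQuad = {{
    {0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f}
}};

struct VertexOutput {
    Vec2 position;  // position in tile units
    Vec2 texCoord;  // coordinate inside the tile's own texture, y inverted
};

// Mirrors the described vertex shader: pick the corner, derive the texture
// coordinate from it, then offset the corner by the tile's column and row.
inline VertexOutput expandVertex(std::uint32_t tileNumber,
                                 std::uint8_t vertexNumber,
                                 std::uint32_t tilesPerRow) {
    assert(vertexNumber < 4);
    assert(tilesPerRow > 0);

    const Vec2 corner = kUnitQuad[vertexNumber];
    const Vec2 texCoord{corner.x, 1.0f - corner.y};

    // Assumes tiles are numbered row by row, one unit per tile.
    const float column = static_cast<float>(tileNumber % tilesPerRow);
    const float row = static_cast<float>(tileNumber / tilesPerRow);

    return {{corner.x + column, corner.y + row}, texCoord};
}
```

For example, with 5 tiles per row, vertex 2 of tile 7 lands at position (3, 2) with texture coordinate (1, 0).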

The texture index and texture coordinate will then move on to the fragment shader.

Here, the correct texture coordinate will be generated using uniforms that tell the size of each texture in the atlas, and then it's a simple texture lookup.
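The fragment-shader step can be sketched the same way on the CPU. This assumes the atlas is a regular grid of equally sized textures, which the post does not state. `AtlasLayout`, `cellSize` and `atlasCoord` are hypothetical names standing in for the uniforms and the lookup calculation.

```cpp
#include <cassert>
#include <cstdint>

struct Vec2 {
    float x, y;
};

// Stand-in for the uniforms: how the atlas is divided up.
struct AtlasLayout {
    std::uint32_t texturesPerRow;  // number of textures across the atlas
    Vec2 cellSize;                 // size of one texture in normalized [0, 1] atlas units
};

// Maps a coordinate local to one texture (each component in [0, 1]) to the
// matching coordinate in the whole atlas, given the texture's index.
inline Vec2 atlasCoord(const AtlasLayout& atlas,
                       std::uint32_t textureIndex,
                       Vec2 localCoord) {
    assert(atlas.texturesPerRow > 0);

    const float column = static_cast<float>(textureIndex % atlas.texturesPerRow);
    const float row = static_cast<float>(textureIndex / atlas.texturesPerRow);

    return {(column + localCoord.x) * atlas.cellSize.x,
            (row + localCoord.y) * atlas.cellSize.y};
}
```

With a 4 x 4 atlas (cell size 0.25), the centre of texture 5 maps to (0.375, 0.375).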

While this should work, each tile is still quite big: 36 bytes (6 vertices at 6 bytes each).

I am wondering if someone has better ideas to reduce the memory usage.
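The 6 bytes per vertex only hold if the three fields are tightly packed. A rough C++ sketch of such a layout follows. The struct name and the use of `#pragma pack` are assumptions, and without packing most compilers would pad the struct to 8 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tightly packed vertex for the proposed scheme.
#pragma pack(push, 1)
struct PackedTileVertex {
    std::uint32_t tileNumber;    // which tile the vertex belongs to
    std::uint8_t vertexNumber;   // corner of the unit quad: 0, 1, 2 or 3
    std::uint8_t textureNumber;  // index of the texture in the atlas
};
#pragma pack(pop)

static_assert(sizeof(PackedTileVertex) == 6, "fields must be tightly packed");

// Six unindexed vertices per tile, as in the original layout.
constexpr std::size_t kPackedBytesPerTile = 6 * sizeof(PackedTileVertex);
```

That gives 36 bytes per tile, compared with 96 bytes for the float-based layout and 1 byte for the instanced version.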

Thanks :)

You're going to quickly run out of uniforms if you take that approach (plus possible lack of uint support on older hardware will kill you).

The easiest solution is just to check for the instancing extension and use it if present. NVIDIA, for example, will make more recent functionality available on older hardware via extensions if the hardware is capable of using it, and continue to support parts back to the GeForce 6 line. So even if you've just got GL2.1 you may very well have instancing available - it was originally a SM3 feature in D3D9 (where the feature originally derived from) so it should be available on a relatively wide range of pre-GL3 hardware. Not sure how this applies to WebGL vs desktop GL though, or even if there is a difference.
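A minimal sketch of that check in C++, assuming a GL 2.x context where `glGetString(GL_EXTENSIONS)` returns a space-separated list. The helper names are hypothetical. The extension names tested are the desktop ones (`GL_ARB_draw_instanced`, `GL_EXT_draw_instanced`, `GL_ARB_instanced_arrays`), and which of them a given renderer actually needs depends on how it feeds per-instance data.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in a space-separated
// extension list. A plain substring search could match a longer extension
// that merely starts with the same name.
inline bool hasExtension(const std::string& extensionList,
                         const std::string& name) {
    std::istringstream stream(extensionList);
    std::string token;
    while (stream >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}

// Reports whether any of the common desktop instancing extensions is listed.
inline bool supportsInstancing(const std::string& extensionList) {
    return hasExtension(extensionList, "GL_ARB_draw_instanced") ||
           hasExtension(extensionList, "GL_EXT_draw_instanced") ||
           hasExtension(extensionList, "GL_ARB_instanced_arrays");
}
```

In a real program the list would come from something like `reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS))`, after checking the pointer for null. WebGL 1 exposes instancing through a separate extension, `ANGLE_instanced_arrays`, which is requested from JavaScript instead.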

In cases where it's not, you can abuse immediate mode calls to do some pseudo-instancing. Just put a few glVertexAttrib4f calls before you draw and use those for data that's going to be common to everything. In general though that's more suited to more complex geometry, and I've read bad things about it on AMD hardware (e.g. http://sol.gfxile.net/instancing.html).

The other option is to just accept the increased memory usage. You could end up doing an awful lot of work to reduce memory usage that turns out to be counterproductive: the overhead of the extra work could more than outweigh any gains, which are going to be quite small anyway, since memory usage is typically not a bottleneck any more and hasn't been for quite some time. The real gain from instancing comes from reduced draw calls, not reduced memory usage, so make sure that your profiling is giving you the right info (and that you're interpreting it correctly) before proceeding with any drastic surgery.

I wasn't sure if I should mention it before, but this isn't really about practicality (in which case I would rather work on a game than waste time on optimizations), but rather an experiment.

I did get about a 30% FPS boost when testing one draw call at 96 bytes per tile against instancing at one byte per tile, but we all know how much actual speed in milliseconds that usually turns into (close to nothing).

Ah, well, experimentation is good. There's always something positive about an opportunity to learn with the freedom to make mistakes while doing so. :)

My own opinion is that even a sub-millisecond speed boost can be worthwhile if it makes the difference between hitting or missing a vsync interval, so it's definitely worth shooting for.
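The vsync point can be illustrated with a small C++ sketch. It uses a deliberately simplified model that assumes plain double-buffered vsync, where a frame is only shown on the first refresh after it finishes rendering. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Simplified model: with vsync on, a frame that takes `renderMs` to produce
// occupies a whole number of refresh intervals on screen.
inline double presentedFrameMs(double renderMs, double refreshHz) {
    assert(renderMs > 0.0);
    assert(refreshHz > 0.0);

    const double intervalMs = 1000.0 / refreshHz;
    return std::ceil(renderMs / intervalMs) * intervalMs;
}
```

At 60 Hz the interval is about 16.67 ms. In this model a frame rendered in 16.0 ms is shown every interval, while one that takes 17.0 ms is held for two intervals (about 33.3 ms), so a saving of well under a millisecond near the boundary doubles the displayed frame rate.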
