WebGL: Is It Possible To Emulate BaseInstance?


I'd like to put all the instance data with the same vertex format into the same buffers. Mainly to save on vertexAttribPointer calls (because they are very slow). However, I can't render all of those instances in one draw call (because of transparency...) so I'd need a way to start drawing with a certain instance offset, so when I draw 1 instance (because I can't batch more) from the middle of the buffer, it pulls (e.g.) the correct matrix from the accompanying vertex buffer that has them. Desktop GL has the BaseInstance versions of all instanced draw calls, but they don't exist in WebGL. Is it possible to emulate them somehow?
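For context, the usual workaround in WebGL 1 (with the ANGLE_instanced_arrays extension) is to rebind the instanced attribute pointers with a byte offset of baseInstance * stride right before the draw. A minimal sketch; the attribute layout, names, and sizes below are hypothetical, not from the thread:

```javascript
// Hypothetical per-instance layout: a 4x4 float transform (64 bytes)
// followed by an RGBA8 color (4 bytes).
const INSTANCE_STRIDE = 16 * 4 + 4; // 68 bytes per instance

// The byte offset an instanced attribute must point at so that the draw
// call's instance 0 maps to `baseInstance` in the shared buffer.
function instanceAttribOffset(attribByteOffset, baseInstance, stride) {
  return attribByteOffset + baseInstance * stride;
}

// With ANGLE_instanced_arrays (ext = gl.getExtension('ANGLE_instanced_arrays')),
// before each draw that needs a different base instance:
//   gl.bindBuffer(gl.ARRAY_BUFFER, instanceBuffer);
//   for (let col = 0; col < 4; col++) {
//     gl.vertexAttribPointer(matrixLoc + col, 4, gl.FLOAT, false, INSTANCE_STRIDE,
//       instanceAttribOffset(col * 16, baseInstance, INSTANCE_STRIDE));
//     ext.vertexAttribDivisorANGLE(matrixLoc + col, 1);
//   }
//   ext.drawElementsInstancedANGLE(gl.TRIANGLES, 6, gl.UNSIGNED_SHORT, 0, count);

// Drawing one instance starting at instance 100 of the shared buffer:
console.log(instanceAttribOffset(0, 100, INSTANCE_STRIDE)); // 6800
```

Note that this still pays per-draw vertexAttribPointer calls, which is exactly the cost the question is trying to avoid, so it only helps if rebinds are cheaper than the alternative copying.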


Optional details:


I'm optimizing a 2D renderer and I'm mainly looking to optimize how quads are drawn (since they are almost 100% of our rendering load... quads for text, quads for sprites, quads for tiles, etc.).


I already render everything from the same buffer, so that goal is already reached. But I would like to reduce the work that is necessary to prepare all of those buffers, since it turned out to be a bottleneck (JavaScript is slow).


I batch it by duplicating everything that would be per-instance data (e.g. transform data, colors, etc.). Of course, because I'm not instancing yet, I also have to duplicate the actual quad vertex data for each sprite, even though it's the same (they all use a basic 1x1 quad set of vertices, and all stretching and sizing is done using the transformation matrix).


Using instancing would not only allow me to use less memory in total, it would also allow me to skip a whole lot of copying around of memory each frame. When the order of quads changes, right now I need to copy 4x (4 byte color, 8 byte UVs, 16 byte transform) for each rendered quad (112 bytes). I need to transfer positions too, of course, but that can be done once because they are all the same and never change. 


With instancing, all instances would use the same position stream (just 6 elements), and to simulate instancing for UVs (which are different for each quad) I would add another 1-byte 'vertex ID stream' (i.e. also one buffer with 6 elements), upload the 8 bytes of UVs as instance data in a matrix or something, and use the vertex ID stream to index that UV data. So per quad I would only have to handle 1x (16 byte transform, 8 byte UVs, 4 byte color) each frame (= 28 bytes, vs. 112 previously). What I currently don't do is separate all of this stuff out into different streams so that I only need to update the ones that actually change (i.e. only upload matrices each frame), for several reasons I don't need to get into... but if I did that as well, the ratio of improvement should be the same.
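To make the savings concrete, the arithmetic above can be written out as a small sketch (the constants mirror the byte counts quoted in the post):

```javascript
// Per-quad CPU upload cost: flat (current) vs instanced (proposed).
const COLOR_BYTES = 4, UV_BYTES = 8, TRANSFORM_BYTES = 16;

// Flat: each of the 4 quad vertices carries its own copy of the data.
const flatPerQuad = 4 * (COLOR_BYTES + UV_BYTES + TRANSFORM_BYTES);

// Instanced: one copy per quad; positions and vertex IDs are shared streams.
const instancedPerQuad = 1 * (COLOR_BYTES + UV_BYTES + TRANSFORM_BYTES);

console.log(flatPerQuad, instancedPerQuad); // 112 28
```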


(I think; I'm not sure, because I haven't implemented it.) I could achieve all of the above using emulated software instancing, but then I would have to go through various hassles because the WebGL API is quite limited (I would have to use 'data textures', a custom instance ID stream, etc.), so it would be neater if there were a way to just emulate base instance somehow.


You might be interested in:


You'll need to send uniforms more often (each time the location changes), but that will give you exactly the same result (though probably slower than if instancing could be used).


There's no WebGL equivalent available, unfortunately. However, I'm not sure this would solve my problem anyway, because the shared per-vertex data would always be at base vertex 0 (the position + vertex ID streams I mentioned), while only the per-instance data in the other buffers would be at a different base. That's why desktop GL has baseInstance: the buffers that don't have a vertexAttribDivisor set still read 'from the start', and only the divisor-enabled ones are offset by the given baseInstance. Am I incorrect?


Then you'll need to completely emulate the thing. The idea is that you loop over all your instances, incrementing an instance_id, send it to the shaders, then call drawElements.
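The loop described above can be sketched as a pure "draw plan"; the gl calls it would drive are shown in the comment, and instanceIdLoc and the quad geometry are hypothetical:

```javascript
// One emulated-instancing pass: one draw per instance, with the instance id
// sent to the shader as a uniform before each draw.
function emulatedInstancePlan(firstInstance, instanceCount) {
  const plan = [];
  for (let i = 0; i < instanceCount; i++) {
    plan.push({ instanceId: firstInstance + i });
  }
  return plan;
}

// For each entry in the plan:
//   gl.uniform1f(instanceIdLoc, entry.instanceId);
//   gl.drawElements(gl.TRIANGLES, 6, gl.UNSIGNED_SHORT, 0);

console.log(emulatedInstancePlan(10, 3).map(e => e.instanceId)); // [ 10, 11, 12 ]
```

This makes the cost explicit: one uniform upload plus one draw call per instance, which is the overhead the reply goes on to warn about.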


You can see this for example:


The thing is that you'll need to upload the instance_id to the shader for each draw call, so it might be more efficient to do it a different way.


Also, if you have VAOs available, then I highly suggest you use them. Sending attribute pointers to GL will be far more efficient, since you'll do it only once (at VAO creation). It might also be possible to combine VAOs with this kind of BaseInstance emulation by using a dynamic VBO and only updating the required part of the buffer (the attrib arrays that vary).


Hope that helps.


After some investigation I realized that vertex shader texture fetch isn't guaranteed to be available in WebGL (the spec allows zero vertex texture units, probably because it's slow on mobile), so emulating instancing would be pretty painful (since I'd have to use the limited number of uniform slots for all the instance data that I have...).


I can't wait to get back to desktop graphics and proper APIs again. 


BTW just to be clear, I'm already simulating instancing by just flattening/duplicating a lot of data and rendering everything in one draw call. This way I can skip ahead to the 'first instance' just by using the offset parameters in drawArrays and drawElements. It's the CPU side copying around of data in typed arrays that I'm trying to optimize (the actual transfer via bufferSubData is fast, I don't need to worry about that for now). Doing it this way is a ton faster than trying to use VAOs and state sorting or what not to try and make draw calls faster (profiled this a lot).
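Skipping to the "first instance" of a flattened quad buffer via the offset parameter of drawElements is just index arithmetic; a sketch assuming 6 indices per quad and UNSIGNED_SHORT indices:

```javascript
const INDICES_PER_QUAD = 6;
const BYTES_PER_INDEX = 2; // gl.UNSIGNED_SHORT

// Count and byte offset to pass to gl.drawElements for a sub-range of quads.
// Note: drawElements takes the offset in BYTES, not in indices.
function quadDrawRange(firstQuad, quadCount) {
  return {
    count: quadCount * INDICES_PER_QUAD,
    byteOffset: firstQuad * INDICES_PER_QUAD * BYTES_PER_INDEX,
  };
}

// gl.drawElements(gl.TRIANGLES, r.count, gl.UNSIGNED_SHORT, r.byteOffset);
const r = quadDrawRange(100, 1); // draw 1 quad starting at quad 100
console.log(r.count, r.byteOffset); // 6 1200
```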


I think I might try keeping all of the vertex data for meshes in fixed array locations (so I don't need to update the CPU side buffers at all unless an object is 'dirty') and instead generate only the index buffer every frame... but I remember someone heavily discouraging this for some reason. 
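The "regenerate only the index buffer" idea could look like this sketch (my assumption: each quad's 4 vertices sit at a fixed slot q*4; note that Uint16Array indices cap a buffer at 16384 quads):

```javascript
// Rebuild the index buffer for the current draw order, leaving all vertex
// data untouched in its fixed slots. Quad q owns vertices [q*4 .. q*4+3].
function buildQuadIndices(drawOrder) {
  const idx = new Uint16Array(drawOrder.length * 6);
  let o = 0;
  for (const q of drawOrder) {
    const v = q * 4;
    idx[o++] = v;     idx[o++] = v + 1; idx[o++] = v + 2; // first triangle
    idx[o++] = v + 2; idx[o++] = v + 1; idx[o++] = v + 3; // second triangle
  }
  return idx;
}

// Draw quad 2 first, then quad 0:
console.log(Array.from(buildQuadIndices([2, 0])));
// [ 8, 9, 10, 10, 9, 11, 0, 1, 2, 2, 1, 3 ]
```

The per-frame upload then becomes 12 bytes of indices per quad instead of full vertex data, at the cost of losing vertex-cache-friendly ordering; that may be what the discouragement was about.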

Edited by agleed

