
WebGL: Is It Possible To Emulate BaseInstance?



#1   Members

Posted 14 July 2016 - 07:07 AM

I'd like to put all the instance data with the same vertex format into the same buffers, mainly to save on vertexAttribPointer calls (because they are very slow). However, I can't render all of those instances in one draw call (because of transparency...), so I need a way to start drawing at a certain instance offset: when I draw 1 instance from the middle of the buffer (because I can't batch more), it should pull (e.g.) the correct matrix from the accompanying vertex buffer that holds them. Desktop GL has BaseInstance versions of all instanced draw calls, but they don't exist in WebGL. Is it possible to emulate them somehow?

 

Optional details:

 

I'm optimizing a 2D renderer and I'm mainly looking to optimize how quads are drawn (since they are almost 100% of our rendering load... quads for text, quads for sprites, quads for tiles, etc.).

 

I already render everything from the same buffer, so that goal is already reached. But I would like to reduce the work that is necessary to prepare all of those buffers, since it turned out to be a bottleneck (JavaScript is slow).

 

I batch by duplicating everything that would be per-instance data (e.g. transform data, colors, etc.). Of course, since I'm not instancing yet, I also have to duplicate the actual quad vertex data for each sprite, even though it's the same (they all use a basic 1x1 quad set of vertices, and all stretching and sizing is done with the transformation matrix).

 

Using instancing would not only use less memory in total, it would also let me skip a whole lot of copying of memory each frame. When the order of quads changes, right now I need to copy 4x (4-byte color, 8-byte UVs, 16-byte transform) for each rendered quad, i.e. 112 bytes. I need to transfer positions too, of course, but that can be done once, because they are all the same and never change.

 

With instancing, all instances would share the same position stream (just 6 elements). To simulate instancing for UVs (which are different for each quad), I would add another 1-byte 'vertex ID stream' (i.e. also one buffer with 6 elements), upload the 8 bytes of UVs as instance data in a matrix or something, and use the vertex ID stream to index that UV data. So per quad I would only have to handle 1x (16-byte transform, 8-byte UVs, 4-byte color) each frame (= 28 bytes, vs. previously 112). What I currently don't do is separate all of this out into different streams so that I only need to update the ones that actually change (i.e. only upload matrices each frame), for several reasons I don't need to get into... but if I do that as well, the ratio of improvement should be the same.
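The byte accounting above can be checked in a few lines; the numbers are taken straight from this post:

```javascript
// Per-quad upload cost when the quad order changes, using the sizes
// from the post: 16-byte transform, 8-byte UVs, 4-byte color.
const TRANSFORM = 16, UVS = 8, COLOR = 4; // bytes

// Without instancing: per-instance data is duplicated across all 4 vertices.
const perQuadDuplicated = 4 * (TRANSFORM + UVS + COLOR);

// With instancing: each quad carries its data exactly once.
const perQuadInstanced = 1 * (TRANSFORM + UVS + COLOR);

console.log(perQuadDuplicated, perQuadInstanced); // 112 28
```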

 

I think (not sure, because I haven't implemented it) I could achieve all of the above with emulated software instancing, but then I would have to go through various hassles because the WebGL API is quite limited (I would need 'data textures', a custom instance ID stream, etc.), so it would be neater if there were a way to just emulate base instance somehow.



#2   Members

Posted 14 July 2016 - 07:26 AM

You might be interested in: https://www.khronos.org/registry/gles/extensions/OES/OES_draw_elements_base_vertex.txt

 

You'll need to send uniforms more often (each time the location changes), but that will give you exactly the same result (though maybe slower than if instancing could be used).


Edited by _Silence_, 14 July 2016 - 07:27 AM.


#3   Members

Posted 14 July 2016 - 07:49 AM

> You might be interested in: https://www.khronos.org/registry/gles/extensions/OES/OES_draw_elements_base_vertex.txt
>
> You'll need to send uniforms more often (each time the location changes), but that will give you exactly the same result (though maybe slower than if instancing could be used).

 

There's no WebGL equivalent available, unfortunately. However, I'm not sure this would solve my problem anyway, because the shared data would always need to stay at base vertex 0 (the position + vertex ID stream I mentioned), while only the per-instance data in the other buffers would be at a different baseVertex. That's why desktop GL has baseInstance: the attributes that don't have a vertexAttribDivisor set still read from the start of their buffers, and only the ones with a divisor are offset by the given baseInstance. Am I incorrect?
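For what it's worth, those semantics can be emulated by hand: before each draw, re-point only the divisor-1 attributes at byteOffset + baseInstance * stride, and leave the divisor-0 attributes (positions, vertex IDs) alone. A minimal sketch, with a mocked GL object so the offset math can be checked outside a browser; the attribute locations and interleaved layout are assumptions, not anything from the posts above:

```javascript
// Mock GL that records vertexAttribPointer calls for inspection.
const calls = [];
const gl = {
  FLOAT: 0x1406,
  UNSIGNED_BYTE: 0x1401,
  vertexAttribPointer(loc, size, type, normalized, stride, offset) {
    calls.push({ loc, offset });
  },
};

// One interleaved per-instance buffer, 28 bytes per instance (matching the
// numbers in the thread); locations and packing are hypothetical.
const INSTANCE_STRIDE = 28;
const instanceAttribs = [
  { loc: 1, size: 4, type: gl.FLOAT,         baseOffset: 0  }, // 16B transform
  { loc: 2, size: 2, type: gl.FLOAT,         baseOffset: 16 }, // 8B UVs
  { loc: 3, size: 4, type: gl.UNSIGNED_BYTE, baseOffset: 24 }, // 4B color
];

// Emulated baseInstance: shift every divisor-1 attribute by
// baseInstance * stride bytes; divisor-0 attributes are untouched,
// which mirrors what glDraw*InstancedBaseInstance does.
function bindWithBaseInstance(gl, attribs, stride, baseInstance) {
  for (const a of attribs) {
    gl.vertexAttribPointer(a.loc, a.size, a.type, false, stride,
                           a.baseOffset + baseInstance * stride);
  }
}

bindWithBaseInstance(gl, instanceAttribs, INSTANCE_STRIDE, 5);
console.log(calls.map(c => c.offset)); // [ 140, 156, 164 ]
```

The catch is that this costs extra vertexAttribPointer calls per draw (exactly what the original post wanted to avoid), though only for the per-instance attributes and only once per draw, not once per attribute per quad.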



#4   Members

Posted 21 July 2016 - 12:27 AM

Then you'll need to completely emulate the thing. The idea is that you loop over all your instances, incrementing an instance_id, send it to the shaders, then call glDrawElements.

 

You can see this for example: https://www.opengl.org/sdk/docs/man/html/glDrawElementsInstancedBaseInstance.xhtml

 

The thing is that you'll need to upload the instance_id to the shaders for each draw call. So it might be more efficient to do it in a different way.
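The loop described above might look like the following sketch. The GL object is mocked so the control flow can be checked outside a browser, and the names (drawInstancedEmulated, instanceIdLoc) are hypothetical:

```javascript
// Mock GL that logs uniform updates and draw calls.
function makeMockGL(log) {
  return {
    TRIANGLES: 4,
    UNSIGNED_SHORT: 0x1403,
    uniform1f(loc, v) { log.push(['uniform', v]); },
    drawElements(mode, count, type, offset) { log.push(['draw', count]); },
  };
}

// Fully software-emulated instancing: one draw per instance, with the
// instance id sent as a uniform. The vertex shader would read this
// uniform instead of gl_InstanceID and use it to index per-instance data.
function drawInstancedEmulated(gl, instanceIdLoc, instanceCount, indexCount) {
  for (let i = 0; i < instanceCount; i++) {
    gl.uniform1f(instanceIdLoc, i);
    gl.drawElements(gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0);
  }
}

const log = [];
drawInstancedEmulated(makeMockGL(log), null, 3, 6);
console.log(log.length); // 6: one uniform update + one draw per instance
```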

 

Also, if you have VAOs (https://www.khronos.org/registry/webgl/extensions/OES_vertex_array_object/), then I highly suggest you use them. Sending attribute pointers to GL will be far more efficient, since you'll do it only once (at VAO creation). It might also be possible to combine VAOs with this kind of BaseInstance emulation by using a dynamic VBO and only updating the required part of the buffer (the attrib arrays that can vary).
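To illustrate the payoff of that suggestion: with OES_vertex_array_object, the attribute pointers are recorded once into the VAO, and each subsequent draw is just a bind plus a draw call. A mocked sketch (the extension's function names are real; the counters and layout are made up for illustration):

```javascript
// Counters stand in for the work the driver would do.
const stats = { pointerCalls: 0, draws: 0 };
const gl = {
  FLOAT: 0x1406,
  TRIANGLES: 4,
  UNSIGNED_SHORT: 0x1403,
  vertexAttribPointer() { stats.pointerCalls++; },
  enableVertexAttribArray() {},
  drawElements() { stats.draws++; },
};
// Mocked OES_vertex_array_object extension object.
const ext = {
  createVertexArrayOES: () => ({}),
  bindVertexArrayOES(vao) {},
};

// Setup, once: attribute pointers are captured by the bound VAO.
const vao = ext.createVertexArrayOES();
ext.bindVertexArrayOES(vao);
gl.vertexAttribPointer(0, 4, gl.FLOAT, false, 0, 0); // hypothetical layout
gl.enableVertexAttribArray(0);
ext.bindVertexArrayOES(null);

// Per frame: no vertexAttribPointer calls, just bind + draw.
for (let i = 0; i < 100; i++) {
  ext.bindVertexArrayOES(vao);
  gl.drawElements(gl.TRIANGLES, 6, gl.UNSIGNED_SHORT, 0);
}
console.log(stats.pointerCalls, stats.draws); // 1 100
```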

 

Hope that helps.



#5   Members

Posted 23 July 2016 - 01:21 PM

After some investigation I realized that vertex shader texture fetch isn't guaranteed to be available in WebGL (probably because it's slow on mobile), so emulating instancing would be pretty painful (since I'd have to fit all of the instance data I have into the limited number of uniform slots...).
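That uniform budget is what caps the batch size when per-instance data lives in uniforms. A back-of-the-envelope sketch, under the assumption of the GLES2/WebGL guaranteed minimum of 128 vertex uniform vec4s (real hardware often has more) and a hypothetical packing of the thread's 28 bytes per quad into 2 vec4s:

```javascript
// Assumed limits -- query gl.getParameter(gl.MAX_VERTEX_UNIFORM_VECTORS)
// for the real value on a given device.
const MAX_VERTEX_UNIFORM_VECTORS = 128; // spec-guaranteed minimum
const RESERVED_VECTORS = 8;             // view/projection etc. (assumption)

// Hypothetical packing: 16B transform = 1 vec4, 8B UVs + 4B color = 1 vec4.
const VECTORS_PER_INSTANCE = 2;

const batchSize = Math.floor(
  (MAX_VERTEX_UNIFORM_VECTORS - RESERVED_VECTORS) / VECTORS_PER_INSTANCE);
console.log(batchSize); // 60 instances per draw under these assumptions
```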

 

I can't wait to get back to desktop graphics and proper APIs again. 

 

BTW, just to be clear: I'm already simulating instancing by flattening/duplicating a lot of data and rendering everything in one draw call. This way I can skip ahead to the 'first instance' just by using the offset parameters of drawArrays and drawElements. It's the CPU-side copying of data in typed arrays that I'm trying to optimize (the actual transfer via bufferSubData is fast, so I don't need to worry about that for now). Doing it this way is a ton faster than trying to use VAOs and state sorting or whatnot to make draw calls faster (I profiled this a lot).
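Using the offset parameter as a "first instance" selector, as described above, is just byte arithmetic on the index buffer. A sketch with a mocked GL so the math can be checked; the helper name is made up:

```javascript
// With flattened per-quad data in one buffer, the offset argument of
// drawElements selects the first quad: 6 indices per quad, 2 bytes per
// UNSIGNED_SHORT index.
function drawQuadRange(gl, firstQuad, quadCount) {
  const INDICES_PER_QUAD = 6;
  const BYTES_PER_INDEX = 2; // UNSIGNED_SHORT
  gl.drawElements(gl.TRIANGLES,
                  quadCount * INDICES_PER_QUAD,
                  gl.UNSIGNED_SHORT,
                  firstQuad * INDICES_PER_QUAD * BYTES_PER_INDEX);
}

// Mock GL records the call so the offset math is visible.
const recorded = [];
const gl = {
  TRIANGLES: 4,
  UNSIGNED_SHORT: 0x1403,
  drawElements(...args) { recorded.push(args); },
};
drawQuadRange(gl, 10, 1); // draw 1 quad starting at quad #10
console.log(recorded[0]); // [ 4, 6, 5123, 120 ]
```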

 

I think I might try keeping all of the vertex data for meshes in fixed array locations (so I don't need to update the CPU-side buffers at all unless an object is 'dirty') and instead generate only the index buffer every frame... but I remember someone heavily discouraging this for some reason.
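The "fixed vertex slots, rebuilt index buffer" idea sketched above would boil down to something like this (the helper name is hypothetical): vertex data stays put, and the per-frame draw order is expressed purely through indices.

```javascript
// Each quad permanently occupies 4 vertex slots; drawOrder is the
// per-frame sort (e.g. back-to-front for transparency).
function buildQuadIndices(drawOrder) {
  const indices = new Uint16Array(drawOrder.length * 6);
  drawOrder.forEach((quad, i) => {
    const v = quad * 4; // first vertex slot of this quad
    // Two triangles per quad: (v, v+1, v+2) and (v, v+2, v+3).
    indices.set([v, v + 1, v + 2, v, v + 2, v + 3], i * 6);
  });
  return indices;
}

// Draw quad #2 first, then quad #0; quad #1 is culled this frame.
console.log(Array.from(buildQuadIndices([2, 0])));
// [ 8, 9, 10, 8, 10, 11, 0, 1, 2, 0, 2, 3 ]
```

Note this only moves the per-frame upload from vertex data (28+ bytes per quad) to indices (12 bytes per quad with UNSIGNED_SHORT), and Uint16Array indices cap the buffer at 65536 vertex slots unless OES_element_index_uint is available.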


Edited by agleed, 23 July 2016 - 01:22 PM.




