Hi everyone!
I'm looking forward to implementing the technique described in this article, which looks fine by itself, but I have some questions about the implementation details.
First of all, the general idea seems to be that you have one big vertex buffer and one big index buffer to work with. You then put every mesh you want rendered in there and store the offsets and index counts in another data structure, which goes together with the instance data into another buffer.
Then all you need to do is issue a call to something like DrawInstanced with the maximum index count of any mesh in the buffer, and walk the instance-data buffer to fetch the actual vertex data from the buffers.
If a mesh uses fewer indices than we told the draw call, the article says one should just use degenerate triangles and keep an eye on the vertex counts.
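To make sure I understand the buffer layout, here's a quick CPU-side sketch (all names are mine, not from the article) of merging the index lists and padding each mesh up to a common index count with degenerate triangles:

```cpp
#include <cstdint>
#include <vector>

// Per-mesh record that goes alongside the instance data
// (hypothetical names, not from the article).
struct MeshEntry {
    uint32_t firstIndex; // offset into the shared index buffer
    uint32_t indexCount; // real index count, before padding
};

// Merge several index lists into one buffer, padding each mesh up to
// 'paddedCount' indices by repeating its last index. The repeats form
// zero-area (degenerate) triangles, which the rasterizer rejects.
std::vector<uint32_t> mergeIndexBuffers(
    const std::vector<std::vector<uint32_t>>& meshes,
    uint32_t paddedCount,
    std::vector<MeshEntry>& table)
{
    std::vector<uint32_t> merged;
    for (const auto& indices : meshes) {
        table.push_back({ (uint32_t)merged.size(), (uint32_t)indices.size() });
        merged.insert(merged.end(), indices.begin(), indices.end());
        // pad with the last index -> degenerate triangles
        merged.resize(merged.size() + (paddedCount - indices.size()), indices.back());
    }
    return merged;
}
```

With that layout, a single DrawInstanced(paddedCount, instanceCount, ...) should cover every mesh, and the shader uses MeshEntry::firstIndex per instance to fetch the right data.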
Now, the article gives us a scenario about rendering a forest with different tree types and LOD levels.
- #1: Why even bother with LODs when we draw everything with the same vertex/index count anyway?
- Idea: Use multiple instance buffers covering different ranges of vertex/index counts and spend more draw calls, instead of wasting time drawing overhead vertices on simple LOD levels.
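My bucketing idea would look roughly like this (just a sketch of the bookkeeping, bucket scheme and names are my own): round each mesh's index count up to a power of two and group meshes per bucket, so each bucket gets its own instance buffer and its padding overhead stays bounded.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Round an index count up to the next power of two; that power of two
// is the padded draw size for the bucket the mesh lands in.
uint32_t bucketFor(uint32_t indexCount) {
    uint32_t b = 1;
    while (b < indexCount) b <<= 1;
    return b;
}

// Group mesh ids by bucket size. One instance buffer and one
// DrawInstanced(bucketSize, ...) call per bucket instead of padding
// everything up to the largest mesh.
std::map<uint32_t, std::vector<uint32_t>> groupByBucket(
    const std::vector<uint32_t>& indexCounts)
{
    std::map<uint32_t, std::vector<uint32_t>> buckets; // bucket size -> mesh ids
    for (uint32_t i = 0; i < indexCounts.size(); ++i)
        buckets[bucketFor(indexCounts[i])].push_back(i);
    return buckets;
}
```

That way a low-poly LOD of a tree never gets padded up to the index count of the full-detail mesh.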
The next problem is updating the instance buffer. Since we obviously want frustum culling or moving objects if we are drawing a huge forest, we would need to do that every frame. The article suggests keeping a CPU copy of the buffer data and, if something changes, just copying everything over again.
- #2: Wouldn't that have a huge performance impact if we have to copy thousands of matrices to the GPU every frame? Also, I'm pretty sure you would hit a GPU sync point when doing this the naive way.
- Idea: I haven't looked too deeply into them yet, but couldn't you update a single portion of the buffer with a compute shader, or just do the full frustum culling on the GPU? If not, are the map modes other than WRITE_DISCARD, where the old data stays valid, worth a shot for updating only single objects? Or do I just throw this onto another thread, use double-buffering to help with the sync points, and forget about it?
The last question is regarding textures. I assume that in the article the textures are all the same size, which makes it easy to put them all into a texture array, which is at least what the author is doing.
- #3: But I don't know much about the textures I have to work with, other than that they are all power-of-two sized. I'm using D3D11 at the moment, so texture arrays are as far as I can get. The next problem is that my textures can be dynamically streamed in and out.
- Idea: Make texture arrays of different sizes and estimate how many slots we would need for each size. For example, pre-allocate a texture array with 100 slots of size 1024² and, if we ever break that boundary or a texture gets evicted, allocate a bigger/smaller array and copy the old one over. Slow, but it would work. Then use the shader registers for the different arrays to access them.
- The other thing I could do is allow this kind of rendering technique only for static level geometry and try to keep its textures in memory the whole time.
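For the size-bucketed array idea, the bookkeeping I have in mind looks like this (all names hypothetical; the grow step stands in for allocating a bigger Texture2DArray and copying the old slices over on the GPU):

```cpp
#include <cstdint>
#include <map>
#include <utility>

// One pool of array slots per power-of-two texture size.
struct ArrayPool {
    uint32_t slotCount = 0; // slices in the current Texture2DArray
    uint32_t used = 0;      // slices handed out so far
};

class TexturePools {
public:
    // Returns (size, slot). When a pool is full, doubles its slot count,
    // which on the GPU would mean reallocating the array and copying the
    // existing slices into it.
    std::pair<uint32_t, uint32_t> allocate(uint32_t size) {
        ArrayPool& pool = pools_[size];
        if (pool.used == pool.slotCount)
            pool.slotCount = pool.slotCount ? pool.slotCount * 2 : 16;
        return { size, pool.used++ };
    }

private:
    std::map<uint32_t, ArrayPool> pools_; // texture size -> pool
};
```

Each pool would be bound to its own shader register, and the instance data carries the (size, slot) pair to pick the right array and slice.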
Does anyone have better solutions or ideas for these problems, or can you give me some other useful input about this technique?
Thanks in advance!