Instancing worth it for rendering text in a 2D game?

17 comments, last by Eric F. 7 years, 4 months ago

Hey guys,

I have a DX11 text rendering implementation where I have a bitmap with all the character glyphs, which I use to build a list of quads with the position and UV of each glyph corresponding to the string I need to print. I then use one call to draw the text into a texture and can draw that texture wherever I need. Simple functions, simple shaders. I can also easily parse for special characters, do linefeeds, limit the width for text boxes, change color, etc.
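For readers following along, here is a minimal CPU-side sketch of the kind of quad-list building described above. It is not the poster's code: the `Glyph` and `TextVertex` structs, the `BuildTextQuads` name, and the pixel-space layout (top-left origin, four vertices per glyph) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical glyph record: where the glyph sits in the atlas (normalized UVs)
// and how large it is on screen, in pixels.
struct Glyph {
    float u0, v0, u1, v1; // atlas rectangle
    float width, height;  // glyph size in pixels
    float advance;        // horizontal pen advance in pixels
};

// Hypothetical vertex layout: position in pixels plus atlas coordinates.
struct TextVertex {
    float x, y;
    float u, v;
};

// Appends four vertices per character found in the atlas.
// '\n' starts a new line; characters missing from the atlas are skipped.
std::vector<TextVertex> BuildTextQuads(const std::string& text,
                                       const std::unordered_map<char, Glyph>& atlas,
                                       float lineHeight)
{
    assert(lineHeight > 0.0f);
    std::vector<TextVertex> vertices;
    vertices.reserve(text.size() * 4);
    float penX = 0.0f;
    float penY = 0.0f;
    for (char c : text) {
        if (c == '\n') {
            penX = 0.0f;
            penY += lineHeight;
            continue;
        }
        auto it = atlas.find(c);
        if (it == atlas.end()) continue;
        const Glyph& g = it->second;
        // Order: top-left, top-right, bottom-left, bottom-right.
        vertices.push_back({penX,           penY,            g.u0, g.v0});
        vertices.push_back({penX + g.width, penY,            g.u1, g.v0});
        vertices.push_back({penX,           penY + g.height, g.u0, g.v1});
        vertices.push_back({penX + g.width, penY + g.height, g.u1, g.v1});
        penX += g.advance;
    }
    return vertices;
}
```

Because each glyph carries its own width and advance, variable-width characters need no special handling here; the whole vector can be uploaded to one vertex buffer and drawn in a single call.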

Today I sat down and decided to tackle the long overdue task of converting the rendering to use instancing, but it just hit me that it might not provide the great gains I previously thought it would. Most characters are of different sizes, so I would either have to handle resizing the geometry, or maybe update the bitmap, draw a fixed-size quad, and then reposition it based on the real width of the character. Or something else. This seems to break the K.I.S.S. principle.

Anyways, in the case where I do not issue more than 1 draw call to render the text, would there be any real benefits of using instancing to render text?

Thanks!

A quick test to check if any optimizations are needed - just fill the whole screen with text, and see if there is a significant slowdown.

(Just to answer your question, instancing shouldn't be needed).

Instancing does not perform well for meshes with small polygon counts -- such as a single quad. You're better off not using instancing for rendering a list of quads.

You can minimize data by only sending the quad center position X/Y coordinate and the width/height (instead of four x/y coordinates), along with a special VS that manually reads the vertex attributes from an SRV (instead of using the IA to read them automatically). Alternatively, you can use the GS to convert a single vertex into four.
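To make the "one record per quad" idea concrete, here is a small C++ sketch that mirrors on the CPU what such a vertex shader might compute. The `QuadRecord` layout, the `ExpandCorner` name, and the corner numbering are assumptions for illustration; in an actual HLSL shader the ID would come from `SV_VertexID` and the records from a buffer bound as an SRV.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-quad record: center and size only, no per-corner data.
struct QuadRecord {
    float centerX, centerY;
    float width, height;
};

struct Float2 {
    float x, y;
};

// Mirrors what a vertex shader could derive from its vertex ID:
// vertexId / 4 selects the quad record, vertexId % 4 selects the corner.
Float2 ExpandCorner(const std::vector<QuadRecord>& quads, std::uint32_t vertexId)
{
    assert(vertexId / 4 < quads.size());
    const QuadRecord& q = quads[vertexId / 4];
    const std::uint32_t corner = vertexId % 4;
    // Bit 0 picks left/right, bit 1 picks top/bottom.
    const float offsetX = (corner & 1u) ? 0.5f : -0.5f;
    const float offsetY = (corner & 2u) ? 0.5f : -0.5f;
    return {q.centerX + offsetX * q.width, q.centerY + offsetY * q.height};
}
```

With this scheme the buffer holds four floats per quad instead of four full vertices, and the corner positions are reconstructed per vertex at draw time.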

There was a wonderful presentation on just that (vertex shader tricks, instancing, etc.); the slides can be found here: http://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau

There's actually a much easier way to do it. When you have your final scene ready to present, you can get the Win32 device context - apologies, but it's been a while since I used DX 11 and I can't remember the specifics. A little perusal of the various interfaces and you'll find it; there's a GetDC method. You can get the device context (DC) from DirectX, pass that to any Windows GDI calls you like to use the standard font drawing stuff - sizing, colors, you name it - then just present the scene as normal. It saves a lot of problems.

You're much better off using D2D/DirectWrite for D3D text rendering than GDI via GetDC.

In addition to the above advice, you're almost certainly not drawing enough text for it to be a bottleneck worth addressing.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

A quick test to check if any optimizations are needed - just fill the whole screen with text, and see if there is a significant slowdown.

Great idea, I'll definitely do that.

In addition to the above advice, you're almost certainly not drawing enough text for it to be a bottleneck worth addressing.

Yeah, you're probably right. Besides, rendering the text into a texture means the cost of parsing and drawing that text is paid only once. It's probably cheaper to then just render the text texture until it's not needed anymore. I'll check that too.
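A rough sketch of that "render once, reuse the texture" idea, assuming a cache keyed by the string itself. `TextureHandle`, `TextTextureCache`, and the render callback are hypothetical stand-ins; a real version would hold a D3D11 texture or shader resource view instead of an `int` and would release the GPU resource on eviction.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using TextureHandle = int; // hypothetical stand-in for a GPU texture

class TextTextureCache {
public:
    // 'render' is whatever builds the quads and draws them into a texture.
    explicit TextTextureCache(std::function<TextureHandle(const std::string&)> render)
        : render_(std::move(render))
    {
        assert(render_); // the callback must not be empty
    }

    // Returns the cached texture, rendering the text only on a cache miss.
    TextureHandle Get(const std::string& text)
    {
        auto it = cache_.find(text);
        if (it != cache_.end()) return it->second;
        const TextureHandle handle = render_(text);
        cache_.emplace(text, handle);
        return handle;
    }

    // Forgets a string once it is no longer displayed.
    void Release(const std::string& text) { cache_.erase(text); }

private:
    std::function<TextureHandle(const std::string&)> render_;
    std::unordered_map<std::string, TextureHandle> cache_;
};
```

Text that changes every frame (a timer, say) would miss the cache each time, so this only pays off for strings that stay on screen for a while.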

@Ryan_001, thanks a lot for the link to that presentation. I found the video of that presentation on the GDC website: http://www.gdcvault.com/play/1020624/Advanced-Visual-Effects-with-DirectX along with many other great ones too! Very informative.

Thanks guys.

Instancing does not perform well for meshes with small polygon counts -- such as a single quad. You're better off not using instancing for rendering a list of quads.

It performs identically on the GPU and orders of magnitude faster on the CPU. How is that not better?

Instancing does not perform well for meshes with small polygon counts -- such as a single quad. You're better off not using instancing for rendering a list of quads.

It performs identically on the GPU and orders of magnitude faster on the CPU. How is that not better?

If you're comparing one instanced draw-call for all quads vs one draw-call for each quad, then sure there's a massive difference in CPU perf...
But you shouldn't use one draw-call per quad, you should use a single indexed draw-call for all quads, in which case the CPU performance is the same.
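As an illustration of the single indexed draw-call approach, here is a minimal sketch that builds one index list covering every quad. The function name and the corner order (top-left, top-right, bottom-left, bottom-right, four vertices per quad) are assumptions; adjust the triangle order to match your own vertex layout and cull mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds six indices (two triangles) per quad, assuming each quad owns four
// consecutive vertices in the vertex buffer.
std::vector<std::uint32_t> BuildQuadIndices(std::uint32_t quadCount)
{
    // Guard against overflow of base = q * 4 in 32-bit indices.
    assert(quadCount <= 0xFFFFFFFFu / 4u);
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quadCount) * 6);
    for (std::uint32_t q = 0; q < quadCount; ++q) {
        const std::uint32_t base = q * 4;
        // Triangles (0,1,2) and (2,1,3) share the same winding.
        indices.insert(indices.end(),
                       {base, base + 1, base + 2, base + 2, base + 1, base + 3});
    }
    return indices;
}
```

The index buffer only depends on the maximum quad count, so it can be created once and reused; each frame you would update the vertex data and issue one `DrawIndexed` call with `quadCount * 6` indices.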

When I said 'does not perform well' I was referring to the GPU side -- instancing does incur a cost on the GPU side, especially for meshes with a small number of vertices. The alternative that I mentioned will be faster in terms of GPU time and equal in CPU time (one draw call, one buffer of per-quad attributes).

Vertex Shader Tricks by Bill Bilodeau (linked above by Ryan) has the gist of it -- Drawing quads as an indexed draw-call is much faster than an instanced draw-call in terms of GPU time:

[Images: two charts from the presentation slides (4u4NSeq.png, awOM9hk.png)]

I've seen this in practice too -- we saved a measurable number of milliseconds by converting our impostor rendering system (for drawing a crowd of 100k characters) from using instanced quads to a large index list of quads -- and we didn't even do it the ideal way of having one vertex per quad (we still used the simple method of four verts per quad in the buffer and a standard VS and IA config).

Side notes from the above graphs: NV GPUs seem especially sensitive to this "small-mesh instancing overhead" (this penalty goes away for meshes with ~500 verts IIRC), and NV GPUs are great at using the GS stage.
