Instancing worth it for rendering text in a 2D game?

Started by
17 comments, last by Eric F. 7 years, 3 months ago

Obviously the highly specific technique for that use case is gonna be faster than the generic instancing one that doesn't allow further optimization and you should use it whenever possible, but that's not a valid comparison to show instancing overhead.

I think the flaw in the thinking here is only measuring at the front-end. I know that I certainly used to fall into that trap years ago.

The way it looks in this case is: (a) you measure the number of vertices used for an indexed or non-indexed draw, and (b) you measure the number of vertices used for a GS or instanced draw. You see that (b) is significantly lower than (a), and therefore you assume that (b) must be faster than (a).

The reality is that vertex counts are only part of what contributes to performance. There are other factors, and depending on one's use case vertex counts may not even be relevant.

This can be counter-intuitive; there's a whole "anti-bloatware" culture based on the premise that using more memory is bad, using less must be good, and this kind of metric just flies completely in the face of it.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Obviously the highly specific technique for that use case is gonna be faster than the generic instancing one that doesn't allow further optimization and you should use it whenever possible, but that's not a valid comparison to show instancing overhead.

The statement you wanted me to clarify was: Instancing does not perform well for meshes with small polygon counts. You're better off not using instancing for rendering a list of quads. :wink:

I assumed we were both talking about the performance in this specific situation of text rendering, and that specific bit of advice, not the general case :P


Rendering a list of quads (e.g. text or billboards) is the extreme case, but the same performance pitfall applies to any low-poly model. E.g. if instancing a-few-hundred-poly models, you may find that old-school pre-HW-instancing techniques (or modern techniques that appeared after the IA stage disappeared from HW) are actually still faster than using HW instancing. At around 1k+ polys you'll likely see no real performance overhead from instancing, making it useful.

Instancing does not perform well for meshes with small polygon counts -- such as a single quad. You're better off not using instancing for rendering a list of quads.
You can minimize data by only sending the quad's center X/Y position and its width/height (instead of four x/y coordinates) along with a special VS that manually reads the vertex attributes from an SRV (instead of using the IA to read them automatically), or alternatively you can use the GS to convert a single vertex into four.
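To make the center-plus-size idea concrete, here is a minimal CPU-side sketch in C++ of the arithmetic a vertex shader could do with a vertex ID. The names `QuadRecord`, `Float2` and `expandCorner`, and the corner ordering, are made up for illustration and not taken from any API; the array stands in for the buffer the shader would read.

```cpp
#include <cassert>
#include <cstdint>

// Compact per-character record: centre and size instead of four corners.
struct QuadRecord {
    float cx, cy;   // centre position
    float w, h;     // full width and height
};

struct Float2 { float x, y; };

// Mimics what a vertex shader could do with a vertex ID:
// quad index = vertexId / 4, corner index = vertexId % 4.
// Assumed corner order: 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right, with y growing downwards.
inline Float2 expandCorner(const QuadRecord* quads, std::uint32_t vertexId) {
    const QuadRecord& q = quads[vertexId / 4];
    const std::uint32_t corner = vertexId % 4;
    const float sx = (corner & 1u) ? 0.5f : -0.5f;  // right or left half
    const float sy = (corner & 2u) ? 0.5f : -0.5f;  // bottom or top half
    return Float2{ q.cx + sx * q.w, q.cy + sy * q.h };
}
```

With a record of `{10, 20, 4, 8}`, vertex 0 lands on the top-left corner and vertex 3 on the bottom-right, so only four floats per character need to be uploaded instead of eight.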


I have to agree here, because I've done exactly what the OP originally did, and I have no compulsion to change to instancing, simply because I don't gain anything from the move.

1. I actually calculate the quads in screen space and store all the geometry on the CPU. This calculation only occurs when my string is updated.
2. I don't even bother passing just the point to the GS to expand it. The difference in data is, yes, an order of magnitude, but it's still just bytes.
3. Instancing does give me some options like rotating quads etc., and adding instancing should be trivial for this anyway.

So in the end, I would use instancing not actually for instancing but for things like animating the quad. I found no performance gain with such small data sets.
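A rough C++ sketch of the "only recalculate when the string changes" approach from point 1. Everything here is an assumption for illustration: the `TextMesh` class, the fixed-size glyph cell, and the 16x16 atlas grid are hypothetical, and a real font would need per-glyph metrics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Vertex { float x, y, u, v; };

class TextMesh {
public:
    // Hypothetical fixed glyph cell in pixels.
    TextMesh(float glyphW, float glyphH) : glyphW_(glyphW), glyphH_(glyphH) {}

    void setText(std::string text) {
        if (text == text_) return;   // unchanged: keep the cached geometry
        text_ = std::move(text);
        dirty_ = true;
    }

    // Returns cached vertices, rebuilding on the CPU only if the text changed.
    const std::vector<Vertex>& vertices() {
        if (dirty_) { rebuild(); dirty_ = false; ++rebuildCount_; }
        return vertices_;
    }

    std::size_t rebuildCount() const { return rebuildCount_; }

private:
    void rebuild() {
        vertices_.clear();
        vertices_.reserve(text_.size() * 4);
        float penX = 0.0f;
        for (char c : text_) {
            // Assumed atlas: 16x16 grid of glyphs indexed by character code.
            const unsigned code = static_cast<unsigned char>(c);
            const float u0 = (code % 16) / 16.0f, v0 = (code / 16) / 16.0f;
            const float u1 = u0 + 1.0f / 16.0f,   v1 = v0 + 1.0f / 16.0f;
            vertices_.push_back({penX,           0.0f,    u0, v0});  // top-left
            vertices_.push_back({penX + glyphW_, 0.0f,    u1, v0});  // top-right
            vertices_.push_back({penX,           glyphH_, u0, v1});  // bottom-left
            vertices_.push_back({penX + glyphW_, glyphH_, u1, v1});  // bottom-right
            penX += glyphW_;
        }
    }

    float glyphW_, glyphH_;
    std::string text_;
    std::vector<Vertex> vertices_;
    bool dirty_ = false;
    std::size_t rebuildCount_ = 0;
};
```

Calling `setText` every frame with the same string costs one string comparison; the vertex data (and any GPU buffer upload you hang off the rebuild) is only touched when the text actually changes.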

Anyways, in the case where I do not issue more than 1 draw call to render the text, would there be any real benefits of using instancing to render text?

In my renderer I have 1 draw call for text rendering per pass: the score pass is 1 draw call, and the stat pass is a second one.

I use the GS for quad generation, the texture holds several fonts, and each letter can have a different color.

The VS passes through 1 point per character: {uv, screen coord, color}.
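For a feel of the data sizes involved, here is a small C++ sketch comparing one point per character against four expanded vertices plus indices. The field layouts of `GlyphPoint` and `GlyphVertex` are my own guesses for illustration, not the poster's actual format, and the byte counts assume 4-byte floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One point per character, as fed to a geometry shader (layout is an assumption).
struct GlyphPoint {
    float u0, v0, u1, v1;   // glyph rectangle in the font texture
    float x, y;             // screen coordinate of the glyph
    std::uint32_t rgba;     // packed per-letter color
};

// Conventional pre-expanded vertex for comparison (also hypothetical).
struct GlyphVertex {
    float x, y;
    float u, v;
    std::uint32_t rgba;
};

// Bytes uploaded when sending one point per character.
constexpr std::size_t pointBytes(std::size_t chars) {
    return chars * sizeof(GlyphPoint);
}

// Bytes uploaded when sending 4 vertices and 6 16-bit indices per character.
constexpr std::size_t quadBytes(std::size_t chars) {
    return chars * (4 * sizeof(GlyphVertex) + 6 * sizeof(std::uint16_t));
}
```

With these particular layouts, 300 characters come to 8,400 bytes as points versus 27,600 bytes as indexed quads. Either way it is a tiny upload, which fits the "it's still just bytes" remark earlier in the thread.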

I do not use rotations or other transforms yet.

I can use mono fonts and normal fonts and combine them in one draw call.

Just measured timings in NSight for stat pass:

For 38 characters: 36 microseconds

For 122 characters: 52 microseconds

For 300 characters: 66 microseconds
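A quick way to read those three numbers is to look at the extra time per extra character. The C++ sketch below does that arithmetic; the `Sample` struct and helper names are made up, and treating the cost as a straight line between two samples is an assumption that three data points can't really confirm.

```cpp
#include <cassert>

struct Sample { int chars; double microseconds; };

// Extra microseconds per extra character between two measurements.
constexpr double marginalCost(Sample a, Sample b) {
    return (b.microseconds - a.microseconds) / (b.chars - a.chars);
}

// Straight-line extrapolation back to zero characters from two samples.
constexpr double fixedCost(Sample a, Sample b) {
    return a.microseconds - marginalCost(a, b) * a.chars;
}
```

Plugging in the timings above, going from 38 to 122 characters costs about 0.19 microseconds per added character, and going from 122 to 300 about 0.08. Extrapolating the first pair back to zero characters leaves roughly 29 microseconds, so at these sizes most of the measured time looks like fixed per-pass overhead rather than per-character work.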

The other benefit for me is that I need to update the instance buffer only once per draw call, and only if my text has changed.

One more thought: probably rendering more than 1000 characters per frame is not very common, so examples with 500k sprites are not so relevant for this topic.

I just did some tests while I rewrote my text box parsing and rendering. This is on a Lenovo Ideapad Y560 laptop, which has a Radeon HD5730 video card.

Rendering a screen full of text at 1980x1080 (external monitor),

15750 total characters for 63000 vertices,

using a single DrawIndexed call,

with a simple pixel shader that does transparency.

The time to render the text varies between 2 and 4 microseconds. That is with no instancing and sending all those vertices to the GPU. I only update the buffer if the text changes in some way.
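For reference, the index data for a single indexed draw over a list of quads can be generated once up front. This is a minimal C++ sketch, not the poster's code: `buildQuadIndices` is a hypothetical helper, and it assumes each quad's four vertices are stored as top-left, top-right, bottom-left, bottom-right.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Builds 6 indices (two triangles) per quad for 4-vertices-per-quad geometry.
// 16-bit indices can only address 65,536 vertices, hence the precondition.
inline std::vector<std::uint16_t> buildQuadIndices(std::size_t quadCount) {
    assert(quadCount * 4 <= 65536);
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;
        // Triangles (0,1,2) and (2,1,3) share the same winding.
        for (std::size_t offset : {0u, 1u, 2u, 2u, 1u, 3u}) {
            indices.push_back(static_cast<std::uint16_t>(base + offset));
        }
    }
    return indices;
}
```

For the 15750 characters mentioned above this gives 94,500 indices over 63,000 vertices, which still just fits in 16-bit indices. The index pattern never changes, so it only needs rebuilding if the maximum character count grows.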

Strangely enough, if I render 200 characters, I get the same timing, 2 to 4 microseconds. This might be an inaccuracy of the high-performance counter, but I'm not sure. What I know is that it's fast enough for my needs.

Probably you are measuring CPU time (time to prepare your commands).

GPU timings are usually measured with GPU profilers or tools such as Nsight, DX GPU queries, GPUView, etc.
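To illustrate why a CPU timer around the draw call shows the same number for 200 and 15750 characters, here is a toy C++ model of a deferred command queue. It is not a real graphics API; `ToyCommandQueue` and its methods are invented purely to show that submitting a draw only records work that gets executed later.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a deferred command queue; not a real graphics API.
class ToyCommandQueue {
public:
    // "Draw call": only records the command, so its CPU cost
    // does not depend on how many glyphs the draw covers.
    void submitDraw(std::size_t glyphCount) {
        pending_.push_back(glyphCount);
    }

    // Stand-in for the GPU consuming the queue some time later.
    void execute() {
        for (std::size_t n : pending_) glyphsRendered_ += n;
        pending_.clear();
    }

    std::size_t glyphsRendered() const { return glyphsRendered_; }
    std::size_t pendingCommands() const { return pending_.size(); }

private:
    std::vector<std::size_t> pending_;
    std::size_t glyphsRendered_ = 0;
};
```

Right after `submitDraw(15750)` returns, nothing has been rendered yet in this model; the work only happens in `execute()`. A CPU timer stopped at that point would be timing the recording step, which is why a GPU-side measurement is needed to see the real rendering cost.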

I think you're probably better off using a batching method for doing text. The only real difference is the geometry, which should be perfectly fine for creating a massive index for rendering. Even better is that you can store this index data, and then toss out the string if you're not going to change it.

Also, if you're rendering 200 characters in 2 microseconds, I think you're doing well. Remember that a microsecond is a thousandth of a millisecond.

Especially when you consider the fact that if you're doing a GUI with scroll bars or something, more than likely you have a scissor over the window to get rid of the junk you don't care about.

Strangely enough, if I render 200 characters, I get the same timing, 2 to 4 microseconds. This might be an inaccuracy of the high-performance counter, but I'm not sure. What I know is that it's fast enough for my needs.

Probably you are measuring CPU time (time to prepare your commands).

GPU timings are usually measured with GPU profilers or tools such as Nsight, DX GPU queries, GPUView, etc.

Doh, very true! I'll give it a looksee.

Thanks guys. I got things working pretty fast now. If this becomes a problem, then I'll investigate.
