Quick question on rendering!

Started by
16 comments, last by Kitasia 16 years, 10 months ago
I was wondering: I'm drawing about 70 quads (140 triangles), each one's position is set through matrix multiplication, and each one has a separate texture. Should I be able to handle 1,120 triangles without much slowdown? I'm using XNA, btw!

Intel Pentium 4 1.7 GHz
Nvidia GeForce FX 5550, 256 MB
1.5 GB RAM

I plan on upgrading soon, but I'm pretty particular about not making things too expensive.
1120 triangles is pretty tame. I'd be far more worried about the 70 SetTexture calls per frame. If you're fitting that many textures into 256MB of VRAM, they can't be that big. Is texture atlasing not an option?
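
For what it's worth, atlasing would mean packing the small textures into one large texture and giving each quad a UV sub-rectangle, so a single texture bind covers every quad. Here is a minimal sketch in Python (the arithmetic is language-neutral), assuming equally sized tiles laid out in a regular grid. `atlas_uv` and its parameters are hypothetical names for illustration, not part of XNA.

```python
def atlas_uv(tile_index, tiles_per_row, tiles_per_col):
    """Return (u0, v0, u1, v1) for one tile of a regular grid atlas.

    Assumes every tile has the same size and the atlas is filled row by row.
    """
    if not 0 <= tile_index < tiles_per_row * tiles_per_col:
        raise ValueError("tile index outside the atlas")
    col = tile_index % tiles_per_row
    row = tile_index // tiles_per_row
    tile_w = 1.0 / tiles_per_row   # tile width in UV space
    tile_h = 1.0 / tiles_per_col   # tile height in UV space
    u0 = col * tile_w
    v0 = row * tile_h
    return (u0, v0, u0 + tile_w, v0 + tile_h)
```

A real atlas usually also needs a little padding around each tile so that texture filtering doesn't bleed neighbouring tiles into each other; the sketch ignores that.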

Anyway, the only way to find out is to test it - it wouldn't take a minute to throw together a prototype. The worst-case scenario is that each triangle fills the screen with maximal overdraw (which is very unrealistic); the best-case is that everything is culled before rendering so that the only performance hits are the render state changes and batch submission. The performance difference here is huge, and so we really couldn't tell you how things will pan out without knowing more.

Admiral
Ring3 Circus - Diary of a programmer, journal of a hacker.
Hmm, so the texture calls do have a lot to do with it. Oh, and I'm not exactly using separate textures, just textures from a stored list. Heh, I thought things like that were pretty normal (especially when thinking about levels). Ah well, guess I'll figure something out.

Thanks!
The fewer changes to the pipeline you need to make, the better.

If you think about it, each time you make changes you're telling the device "you need to change all this stuff before you can process the next batch of work".

If the batches are small (like yours) then the GPU/driver is probably spending as much time on configuration as on actual rendering, if not more. A better-balanced system keeps the GPU loaded so it's doing more actual rendering and less configuring - that is, using as much time as possible to do useful stuff.

The other factor is that each (re-)configuration requires interaction from the CPU, which is a bad thing if you want optimal performance. Treat your CPU and GPU as independent co-processors, so you want them working in parallel not in sync...
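
One cheap way to cut down on reconfiguration is to order the draws so that quads sharing a texture are submitted back to back. A minimal sketch of the idea in Python, assuming each draw is identified only by a texture name; the names and the example frame are made up for illustration.

```python
def count_texture_switches(draw_list):
    """Count how often the bound texture changes while walking draw_list in order."""
    switches = 0
    current = None
    for texture_id in draw_list:
        if texture_id != current:
            switches += 1
            current = texture_id
    return switches

# Hypothetical frame: 8 quads cycling through 3 textures.
quads = ["grass", "rock", "grass", "water", "rock", "grass", "water", "rock"]
unsorted_switches = count_texture_switches(quads)        # one switch per quad
sorted_switches = count_texture_switches(sorted(quads))  # one switch per texture
```

Bear in mind that sorting by texture changes the draw order, which matters for alpha-blended sprites that rely on back-to-front rendering.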

hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Whoa, if that's the case, might I ask what the best method for drawing a bunch of quads would be? SpriteBatch came to mind, but it's way too limited.

Also I'm pretty sure it's not the texture thing seeing as I can't draw a bunch of non-textured quads and get a fair speed.
Quote: Original post by AntiGuy
Also I'm pretty sure it's not the texture thing seeing as I can't draw a bunch of non-textured quads and get a fair speed.

I hope you're not drawing sprites using DrawPrimitiveUP. This plays right into the hands of the wasteful 'preparation' that Jack just described.

If you have a set of sprites that will be drawn many times without any changes being made (such as level geometry) then you'll see tremendous performance increases by compiling them into a single vertex buffer. A batch submission that follows a render-state change (this includes DrawPrimitiveUP) will stall the pipeline on current hardware, so you can render vertices at a vastly improved rate by batching them all together into a single submission (DrawPrimitive). This applies equally to the indexed variants of the draw calls.
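
To make the batching idea concrete, here is a rough sketch in Python of flattening many quads into one vertex list and one index list. It assumes 2D sprites, a 2x3 affine matrix per quad and CPU-side pre-transformation; `build_batch`, `transform` and the data layout are hypothetical, not any particular API. The resulting arrays are what you would upload once to a vertex/index buffer and draw with a single indexed call.

```python
# Unit quad corners shared by every sprite (counter-clockwise from bottom-left).
UNIT_QUAD = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]

def transform(m, p):
    """Apply a 2x3 affine matrix ((a, b, tx), (c, d, ty)) to point p."""
    (a, b, tx), (c, d, ty) = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)

def build_batch(quads):
    """Flatten (matrix, color) quads into one vertex list and one index list.

    Vertices are pre-transformed on the CPU, so the whole batch can be drawn
    with a single call and an identity world matrix.
    """
    vertices = []  # (x, y, color) tuples
    indices = []   # two triangles (six indices) per quad
    for matrix, color in quads:
        base = len(vertices)
        for corner in UNIT_QUAD:
            x, y = transform(matrix, corner)
            vertices.append((x, y, color))
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices
```

For static level geometry this would run once at load time; for quads that move every frame the same flattening can feed a dynamic buffer instead.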

To give you an idea of the scope of this problem, my GeForce 7800GT is cited as being able to process 1.1 billion vertices per second under optimal conditions, and that's only a fraction of what the newest cards can manage. Under more realistic circumstances, this scales down to a few million quads per frame at 60fps. If you render each quad individually, the VPU will spend virtually all of its time waiting for the next set of vertices to be lined up. To maintain the 60fps you couldn't expect more than a couple of hundred quads to be rendered*. Things get more complicated when rasterisation and texture-lookups come into play, but if vertex-throughput is your bottleneck, you're doing something wrong.
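
As a back-of-envelope check on those orders of magnitude, the quoted peak rate can be turned into a per-frame quad budget. This sketch assumes four vertices per quad (indexed quads), which is my assumption rather than a figure from the post.

```python
PEAK_VERTICES_PER_SECOND = 1.1e9  # figure quoted above for the 7800GT
TARGET_FPS = 60
VERTICES_PER_QUAD = 4             # assumption: indexed quads, 4 unique corners

def quads_per_frame(vertices_per_second, fps, vertices_per_quad):
    """Upper bound on quads per frame if vertex throughput were the only limit."""
    return vertices_per_second / fps / vertices_per_quad

# Theoretical ceiling: roughly 4.6 million quads per frame.
peak = quads_per_frame(PEAK_VERTICES_PER_SECOND, TARGET_FPS, VERTICES_PER_QUAD)
```

Even allowing for far-from-optimal conditions, a few hundred quads is several orders of magnitude below that ceiling, which is why per-draw overhead rather than vertex count is the thing to look at.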

Give us some details on what the quads contain and how they will behave. Chances are that there is an existing tried-and-tested design pattern that optimises rendering performance.

Admiral

* Don't lynch me over the accuracy of the figures - they're artificial estimates. The orders of magnitude being discussed, however, are very real.
Thanks for the reply!

Yes, I'm using DrawPrimitiveUP (if that means what I think it means, which is DrawUserPrimitives). I'm using quads to create entities: each entity is made up of textured quads, with each quad carrying its own color, vertex coordinates, and matrix position. One entity contains about 70 quads, and currently I can only do about 4 of 'em at 60fps [dead].

If there really is a way I could do all this and gain speed, I'd really be indebted! Oh, and eh, sorry about the question not being as quick as implied : )!

[Edited by - AntiGuy on May 23, 2007 5:58:11 PM]
I would say you are burning fill rate (or texture bandwidth) rather than hitting any other limitation. While you're drawing in a very inefficient manner, 280 draw calls, despite coming from system memory, being transformed and changing state each time, should be trivial at 60fps on the specs you mentioned.
Okay so I should...

1. Use a vertex buffer and...
2. Fix this fill rate problem. Not sure what texture bandwidth is exactly.

I've always thought that when you use a vertex buffer, the vertices couldn't be manipulated afterwards (though I strongly suspect I'm wrong about that). Anyhow, would that about do it? [wink]
By texture bandwidth I mean the memory bandwidth consumed by reading texels. You can test if this is the bottleneck by simply reducing your texture size. Fill rate can be tested by reducing your sprite size. If you significantly reduce both of these and your frame rate doesn't improve then your bottleneck is elsewhere.
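
The test procedure above boils down to comparing three measured frame times. A small sketch of that decision logic in Python, assuming you have timed a baseline run, a run with shrunken textures and a run with shrunken sprites; `diagnose` and the 0.8 threshold are arbitrary choices for illustration.

```python
def diagnose(baseline_ms, small_textures_ms, small_sprites_ms, threshold=0.8):
    """Guess the bottleneck from three measured frame times (milliseconds).

    A run counts as 'improved' if its frame time drops below
    threshold * baseline_ms. The 0.8 default is an arbitrary assumption.
    """
    suspects = []
    if small_textures_ms < threshold * baseline_ms:
        suspects.append("texture bandwidth")
    if small_sprites_ms < threshold * baseline_ms:
        suspects.append("fill rate")
    # Neither change helped: the limit is somewhere else in the pipeline.
    return suspects or ["elsewhere (e.g. draw-call or CPU overhead)"]
```

For example, `diagnose(40, 39, 20)` points at fill rate, while `diagnose(40, 39, 39)` says to look elsewhere.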

Vertices in a dynamic (D3DUSAGE_DYNAMIC) buffer can be modified regularly, but you should be careful to create and lock the buffer correctly (D3DUSAGE_WRITEONLY on creation, D3DLOCK_DISCARD or D3DLOCK_NOOVERWRITE when locking, etc.) to avoid stalling the pipeline.
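
A common way to use those lock modes is to append each batch to unused space in the buffer and only start over from the beginning once it is full. The sketch below is pure bookkeeping in Python with no graphics calls; `DynamicBufferCursor` is a hypothetical helper, and its "nooverwrite" and "discard" strings merely stand in for locking with D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD.

```python
class DynamicBufferCursor:
    """Tracks where the next write goes in a fixed-size dynamic vertex buffer."""

    def __init__(self, capacity):
        self.capacity = capacity  # buffer size in vertices
        self.offset = 0           # first vertex not yet written this pass

    def reserve(self, count):
        """Return (start, mode) for writing `count` vertices.

        mode is "nooverwrite" when appending to untouched space and
        "discard" when the buffer is full and we restart from zero.
        """
        if count > self.capacity:
            raise ValueError("batch larger than the buffer")
        if self.offset + count > self.capacity:
            self.offset = 0
            mode = "discard"
        else:
            mode = "nooverwrite"
        start = self.offset
        self.offset += count
        return start, mode
```

The point of the pattern is that the application never writes to a region the GPU might still be reading, so neither side has to wait for the other.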

This topic is closed to new replies.
