AntiGuy

Quick question on rendering!

I was wondering: I'm drawing about 70 quads (140 triangles), each one's position is set through matrix multiplication, and each one has a separate texture. Should I be able to handle 1,120 triangles without that much slowdown? I'm using XNA, btw! My specs: Intel Pentium 4 1.7 GHz, Nvidia GeForce FX 5550 256 MB, 1.5 GB RAM. I plan on upgrading soon, but I'm pretty particular about not making things too expensive.

1120 triangles is pretty tame. I'd be far more worried about the 70 SetTexture calls per frame. If you're fitting that many textures into 256MB of VRAM, they can't be that big. Is texture atlasing not an option?
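
To make the atlasing suggestion concrete, here is a minimal sketch of how a sprite's rectangle inside one large atlas texture maps to texture coordinates. The names `AtlasDemo`, `UvRect` and `ToUv`, and the 1024x1024 layout, are made up for illustration and are not part of XNA; the code only uses plain C#.

```csharp
using System;

public static class AtlasDemo
{
    // Texture coordinates of one sprite inside the atlas, in the 0..1 range.
    public struct UvRect
    {
        public float U0, V0, U1, V1;
    }

    // Converts a pixel rectangle inside the atlas into texture coordinates.
    public static UvRect ToUv(int x, int y, int width, int height,
                              int atlasWidth, int atlasHeight)
    {
        UvRect uv = new UvRect();
        uv.U0 = (float)x / atlasWidth;
        uv.V0 = (float)y / atlasHeight;
        uv.U1 = (float)(x + width) / atlasWidth;
        uv.V1 = (float)(y + height) / atlasHeight;
        return uv;
    }

    public static void Main()
    {
        // Assumed layout: a 128x128 sprite stored at pixel (256, 128)
        // of a 1024x1024 atlas.
        UvRect uv = ToUv(256, 128, 128, 128, 1024, 1024);
        Console.WriteLine("u: {0}..{1}, v: {2}..{3}", uv.U0, uv.U1, uv.V0, uv.V1);
    }
}
```

With every sprite addressed this way, all quads can share one texture, so the texture only needs to be set once per frame instead of once per quad.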

Anyway, the only way to find out is to test it - it wouldn't take a minute to throw together a prototype. The worst-case scenario is that each triangle fills the screen with maximal overdraw (which is very unrealistic); the best-case is that everything is culled before rendering so that the only performance hits are the render state changes and batch submission. The performance difference here is huge, and so we really couldn't tell you how things will pan out without knowing more.

Admiral

Hmm, so the texture calls do have a lot to do with it. Oh, and I'm not exactly using separate textures, just textures from a stored list. Heh, I thought things like that were pretty normal (especially when thinking about levels). Ah well, guess I'll figure something out.

Thanks!

The fewer changes you need to make to the pipeline, the better.

If you think about it, each time you make changes you're telling the device "you need to change all this stuff before you can process the next batch of work".

If the batches are small (like yours) then the GPU/driver is probably spending as much, if not more, time on configuration than actual rendering. A more optimal system will try to load the system so it's doing more actual rendering and less configuring - that is, using as much time as possible to do useful stuff.

The other factor is that each (re-)configuration requires interaction from the CPU, which is a bad thing if you want optimal performance. Treat your CPU and GPU as independent co-processors, so you want them working in parallel not in sync...
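
One common way to cut down on reconfiguration is to group draws that share the same state. The sketch below is only an illustration under assumptions: textures are represented by non-negative integer ids, and `StateSortDemo` and `CountTextureSwitches` are hypothetical names. It counts how many texture binds a given draw order would need, before and after sorting by texture.

```csharp
using System;
using System.Collections.Generic;

public static class StateSortDemo
{
    // Counts how many times the bound texture would change if quads were
    // drawn in the given order (the very first bind counts as a change).
    public static int CountTextureSwitches(IList<int> textureIds)
    {
        int switches = 0;
        int current = -1; // -1 means "nothing bound yet"; ids are assumed >= 0
        foreach (int id in textureIds)
        {
            if (id != current)
            {
                switches++;
                current = id;
            }
        }
        return switches;
    }

    public static void Main()
    {
        // Texture id used by each quad, in the original draw order.
        var drawOrder = new List<int> { 2, 0, 1, 2, 0, 1, 2, 0 };
        int before = CountTextureSwitches(drawOrder);

        // Same quads, grouped by texture.
        var sorted = new List<int>(drawOrder);
        sorted.Sort();
        int after = CountTextureSwitches(sorted);

        Console.WriteLine("switches unsorted: {0}, sorted: {1}", before, after);
    }
}
```

Sorting like this is only safe when the draw order does not affect the result. Alpha-blended or deliberately layered quads usually have to keep their order, in which case an atlas is the more practical way to avoid texture switches.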

hth
Jack

Whoa, if that's the case, might I ask what the best method would be for drawing a bunch of quads? SpriteBatch came to mind, but it's way too limited.

Also I'm pretty sure it's not the texture thing seeing as I can't draw a bunch of non-textured quads and get a fair speed.

Quote:
Original post by AntiGuy
Also I'm pretty sure it's not the texture thing seeing as I can't draw a bunch of non-textured quads and get a fair speed.

I hope you're not drawing sprites using DrawPrimitiveUP. This plays right into the hands of the wasteful 'preparation' that Jack just described.

If you have a set of sprites that will be drawn many times without any changes being made (such as level geometry), then you'll see tremendous performance increases by compiling them into a single vertex buffer. On current hardware, a batch submission following a render-state change (this includes DrawPrimitiveUP) will stall the pipeline, so you could render vertices at a vastly improved rate by batching them all together into a single submission (DrawPrimitive). This applies equally to the indexed variants of the draw calls.
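
As a rough sketch of what "batching them all together" can look like when each quad has its own matrix: transform the corners on the CPU and append them to one vertex list. `System.Numerics` is used here only as a stand-in for XNA's `Matrix` and `Vector3` types, and `BatchDemo`, `UnitQuad` and `BuildBatch` are hypothetical names. The quads are emitted as a triangle list (six vertices each), because separate quads cannot share one triangle strip without extra degenerate triangles.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class BatchDemo
{
    // Corners of a unit quad in local space, in triangle-strip order.
    static readonly Vector3[] UnitQuad =
    {
        new Vector3(-0.5f, -0.5f, 0f),
        new Vector3(-0.5f,  0.5f, 0f),
        new Vector3( 0.5f, -0.5f, 0f),
        new Vector3( 0.5f,  0.5f, 0f),
    };

    // Transforms every quad on the CPU and appends it as two triangles
    // (six vertices) to a single list meant for one draw call.
    public static List<Vector3> BuildBatch(IEnumerable<Matrix4x4> quadTransforms)
    {
        var vertices = new List<Vector3>();
        // Strip corners 0,1,2,3 rewritten as the triangles (0,1,2) and (2,1,3).
        int[] triangleListOrder = { 0, 1, 2, 2, 1, 3 };
        foreach (Matrix4x4 world in quadTransforms)
        {
            foreach (int corner in triangleListOrder)
            {
                vertices.Add(Vector3.Transform(UnitQuad[corner], world));
            }
        }
        return vertices;
    }

    public static void Main()
    {
        // 70 quads, each with its own (here: translation-only) matrix.
        var transforms = new List<Matrix4x4>();
        for (int i = 0; i < 70; i++)
        {
            transforms.Add(Matrix4x4.CreateTranslation(i * 2f, 0f, 0f));
        }
        List<Vector3> batch = BuildBatch(transforms);
        Console.WriteLine("{0} vertices, {1} triangles, 1 draw call",
                          batch.Count, batch.Count / 3);
    }
}
```

Here 70 quads become 420 vertices (140 triangles) in one list. A real vertex would also carry the per-quad color and texture coordinates, and a single submission presumes that all quads in the batch share one texture.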

To give you an idea of the scope of this problem, my GeForce 7800GT is cited as being able to process 1.1 billion vertices per second under optimal conditions, and that's only a fraction of what the newest cards can manage. Under more realistic circumstances, this scales down to a few million quads per frame at 60fps. If you render each quad individually, the VPU will spend virtually all of its time waiting for the next set of vertices to be lined up. To maintain the 60fps you couldn't expect more than a couple of hundred quads to be rendered*. Things get more complicated when rasterisation and texture-lookups come into play, but if vertex-throughput is your bottleneck, you're doing something wrong.
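
For a feel of the numbers, the back-of-the-envelope calculation can be written out as below. This is only the theoretical peak derived from the quoted 1.1 billion vertices per second; the assumption of four vertices per quad (indexed rendering) and the names `ThroughputDemo` and `QuadsPerFrame` are illustrative, not measured figures.

```csharp
using System;

public static class ThroughputDemo
{
    // Upper bound on quads per frame if vertex processing were the only cost.
    public static double QuadsPerFrame(double verticesPerSecond,
                                       double framesPerSecond,
                                       int verticesPerQuad)
    {
        double verticesPerFrame = verticesPerSecond / framesPerSecond;
        return verticesPerFrame / verticesPerQuad;
    }

    public static void Main()
    {
        // 1.1e9 vertices/s at 60 frames/s, assuming 4 vertices per quad.
        double quads = QuadsPerFrame(1.1e9, 60.0, 4);
        Console.WriteLine("about {0:F0} quads per frame at peak vertex rate", quads);
    }
}
```

That works out to roughly 18.3 million vertices, or about 4.6 million quads, per frame, which is several orders of magnitude above the few hundred quads under discussion.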

Give us some details on what the quads contain and how they will behave. Chances are that there is an existing tried-and-tested design pattern that optimises rendering performance.

Admiral

* Don't lynch me over the accuracy of the figures - they're artificial estimates. The orders of magnitude being discussed, however, are very real.

Thanks for the reply!

Yes, I'm using DrawPrimitiveUP (if that means what I think it means, which is DrawUserPrimitives). I'm using quads to create entities: each entity is made up of textured quads, with each quad carrying its own color, vertex coordinates, and matrix position. One entity contains about 70 quads, and currently I can only do about 4 of 'em at 60fps [dead].

If there really is a way I could do all this and gain speed, I'd really be indebted! Oh, and eh, sorry about the question not being as quick as implied : )!

[Edited by - AntiGuy on May 23, 2007 5:58:11 PM]

I would say you are burning fill rate (or texture bandwidth) rather than hitting any other limitation. While you're drawing in a very inefficient manner, 280 draw calls, despite coming from system memory, being transformed, and changing state each time, should be trivial at 60fps on the specs you mentioned.

Okay so I should...

1. Use a vertex buffer and...
2. Fix this fill rate problem. Not sure what texture bandwidth is exactly.

I've always thought that when you use a vertex buffer, the vertices couldn't be manipulated afterwards (I strongly suspect I'm wrong about that). Anyhow, would that about do it? [wink]

By texture bandwidth I mean the memory bandwidth consumed by reading texels. You can test if this is the bottleneck by simply reducing your texture size. Fill rate can be tested by reducing your sprite size. If you significantly reduce both of these and your frame rate doesn't improve then your bottleneck is elsewhere.
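
The two experiments can be turned into a simple decision rule, sketched below. Everything here is an assumption for illustration: the `BottleneckDemo.Classify` helper, the idea of comparing average frame times in milliseconds, and the 10% threshold for what counts as a significant improvement. Note that shrinking sprites also reduces texture reads, so the two tests are not fully independent.

```csharp
using System;

public static class BottleneckDemo
{
    // Makes a rough guess from three measured frame times (milliseconds):
    // the baseline, a run with smaller textures, and a run with smaller sprites.
    public static string Classify(double baselineMs, double smallTexturesMs,
                                  double smallSpritesMs, double threshold)
    {
        // Fraction of the frame time saved by each experiment.
        double textureGain = (baselineMs - smallTexturesMs) / baselineMs;
        double fillGain = (baselineMs - smallSpritesMs) / baselineMs;

        if (textureGain < threshold && fillGain < threshold)
        {
            return "elsewhere";
        }
        return textureGain >= fillGain ? "texture bandwidth" : "fill rate";
    }

    public static void Main()
    {
        // Neither experiment helped much, so the bottleneck is probably
        // not on the pixel side of the pipeline.
        Console.WriteLine(Classify(33.0, 32.5, 32.8, 0.10));
    }
}
```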

Vertices in a dynamic (D3DUSAGE_DYNAMIC) buffer can be modified regularly, but you should be careful to lock the buffer correctly (write-only, no-overwrite, etc.) to avoid stalling the pipeline.
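
The usual locking pattern for a dynamic buffer is to append new vertices with the no-overwrite flag while there is room, and to request a fresh buffer with the discard flag once it is full (D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD in native Direct3D 9). The sketch below only simulates that bookkeeping and touches no graphics API; `LockMode`, `DynamicBufferCursor` and `Reserve` are hypothetical names, and the XNA equivalents of the flags should be checked in its documentation.

```csharp
using System;

public enum LockMode { NoOverwrite, Discard }

public sealed class DynamicBufferCursor
{
    private readonly int capacity; // size of the buffer, in vertices
    private int next;              // first vertex slot not yet written

    public DynamicBufferCursor(int capacityInVertices)
    {
        capacity = capacityInVertices;
    }

    // Reserves 'count' vertices and reports where to write them and
    // which lock mode to request for that write.
    public LockMode Reserve(int count, out int offset)
    {
        if (count <= 0 || count > capacity)
        {
            throw new ArgumentOutOfRangeException("count");
        }
        LockMode mode = LockMode.NoOverwrite;
        if (next + count > capacity)
        {
            // No room left: start over at slot 0 and ask for fresh memory,
            // so the GPU can keep reading the old contents undisturbed.
            next = 0;
            mode = LockMode.Discard;
        }
        offset = next;
        next += count;
        return mode;
    }
}

public static class DynamicBufferDemo
{
    public static void Main()
    {
        var cursor = new DynamicBufferCursor(8);
        for (int i = 0; i < 4; i++)
        {
            int offset;
            LockMode mode = cursor.Reserve(4, out offset);
            Console.WriteLine("write 4 vertices at {0} using {1}", offset, mode);
        }
    }
}
```

The point of the split is that a no-overwrite write promises not to touch data the GPU may still be reading, and a discard write hands back a new region instead of waiting, so neither case should force the CPU to stall.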

This seems like it'll work! Thanks! I'll see how it goes!

Update: Well I reduced both and no go. It's almost as if I didn't change anything. I'd like to say it's all the vertex buffer issue but if I simply skip drawing I still get hit hard. I'm stumped. It seems things only go right if I don't commit any changes (this includes matrix changes, material changes to an extent, and most importantly texture changes). Should I be using HLSL, or maybe (XNA related) call effect.Begin and effect.End more often, instead of calling them once, piling everything I draw in between, and making a bunch of changes in between?

(Just tried that effect part and everything was a lot worse! Not sure what I was thinking)

[Edited by - AntiGuy on May 24, 2007 9:37:07 AM]

Quote:
Original post by AntiGuy
Update: Well I reduced both and no go. It's almost as if I didn't change anything. I'd like to say it's all the vertex buffer issue but if I simply skip drawing I still get hit hard. I'm stumped.

You're rendering all the geometry in one batch using a single vertex buffer, and have taken all your textures down to a reasonable size (no bigger than 512x512) with mip-mapping enabled (right?) and it's still running slowly? Something is amiss.

I don't follow that second sentence in my quoted text. Do you mean that it runs slowly even if you don't submit the geometry to the video card? Changing state very frequently is highly detrimental to otherwise fast performance, but it shouldn't have too much effect when nothing is being drawn. Maybe the problem is more fundamental. I trust you have the debug runtimes enabled, with maximum debug spew, and are looking for performance warnings. If not, this comes before anything else on the to-do list [smile].

If things are no better after all that, then we'll need to see some code. And I don't mean to patronise, but are you sure the problem is in your render process and not something external clogging up your CPU?

By the way - sorry about the terminology. You translated correctly. For some reason, I always tend to assume people are using C++ [rolleyes].

Admiral

XD I have no idea what any of those debug settings are. But I'm using Visual Studio Express and running in debug configuration.

As for the rest of the information: I haven't set up any vertex buffer yet, so I just commented out the DrawUserPrimitives line to isolate other errors first. All my textures are no greater than 200x200 and mip-mapping is enabled.

Here's my rendering code!



public static void DrawLimb(Limb limb, SortedTextureList textureList)
{
    PushMatrix();

    MultiplyMatrix(limb.matrix);

    //Draw
    effect.Texture = limb.GetTexture(textureList);
    effect.SpecularColor = limb.shadowColor.ToVector3();
    effect.Alpha = limb.Alpha;
    effect.CommitChanges();

    device.DrawUserPrimitives<Vertex>(PrimitiveType.TriangleStrip, limb.vert, 0, limb.vert.Length - 2);

    PopMatrix();
}

public static void DrawCharacter(Character.Character character, SortedTextureList texList)
{
    //Updates all Matrix Positions
    character.UpdateMatrix();
    foreach (Limb part in character.drawOrder)
    {
        if (part.IsVisible == false) { continue; }
        DrawLimb(part, texList);
        device.RenderState.SlopeScaleDepthBias -= 0.1f;
    }
    device.RenderState.SlopeScaleDepthBias = 0;
}

I know it's not perfect but should it really kill me this much? I used the same technique in OpenGL and it wasn't half as bad as this.

Ooh, I forgot to add! I can handle about 8 of them at 60fps when I'm not in debug mode and I'm not using DrawUserPrimitives.

[Edited by - AntiGuy on May 24, 2007 12:03:25 PM]

Your code looks fine to me.

You can maximise debug output by opening 'DirectX' in the control panel and putting the corresponding slider up to maximum. Now, when you run a DirectX program that's compiled with the debug libraries from VS++EE, the API will report anything it thinks you should know in the 'Output' window. You should keep an eye on this every time you test-run. It can save you a lot of time and a lot of headaches.

I must ask. If you comment out the calls to DrawLimb and DrawCharacter (and further, any other graphic-related calls) do you get the performance back? It's looking more and more like you're trying to fix the wrong problem [rolleyes].

But now that we've reached the point where the bottleneck isn't obvious, it's time to bring out the most powerful optimisation tool you'll ever need - the profiler. I don't have much experience with C#, but I've heard some good things about Ants. Language-independent alternatives would be VTune if you have an Intel processor or CodeAnalyst for AMD. Profile a few seconds of execution and look to see where most of the CPU time is being spent. If it's in your program then use the profiler to find out where, and fix it. On the other hand, if the time is being wasted in a device driver or a DirectX DLL then you'll need a different, graphics-oriented, profiler. Luckily for you, the DirectX SDK comes with PIX, with which you can isolate the bottleneck in no time.
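
If no profiler is at hand, a crude substitute is to time sections of the frame by hand. The sketch below is not a replacement for Ants, VTune, CodeAnalyst or PIX; it is a minimal stand-in built on `System.Diagnostics.Stopwatch`, and `SectionTimer`, `Measure` and `Milliseconds` are hypothetical names. It only sees CPU time spent inside the measured call, so work queued for the GPU will not show up.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class SectionTimer
{
    // Accumulated stopwatch ticks per named section.
    private static readonly Dictionary<string, long> ticks =
        new Dictionary<string, long>();

    // Runs 'work' and adds its elapsed time to the named section.
    public static void Measure(string name, Action work)
    {
        Stopwatch watch = Stopwatch.StartNew();
        work();
        watch.Stop();

        long total;
        ticks.TryGetValue(name, out total); // total stays 0 for a new section
        ticks[name] = total + watch.ElapsedTicks;
    }

    // Total time recorded for a section, in milliseconds.
    public static double Milliseconds(string name)
    {
        long total;
        ticks.TryGetValue(name, out total);
        return total * 1000.0 / Stopwatch.Frequency;
    }

    public static void Main()
    {
        // Thread.Sleep stands in for real update and draw work.
        Measure("update", () => Thread.Sleep(5));
        Measure("draw", () => Thread.Sleep(20));
        Console.WriteLine("update: {0:F1} ms", Milliseconds("update"));
        Console.WriteLine("draw:   {0:F1} ms", Milliseconds("draw"));
    }
}
```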

So they are your orders. Report back with your findings [wink].

Admiral

[smile] Okay! Sorry for the wait!

Before I begin, I'd like to note that everything works swimmingly when I skip DrawLimb altogether.

My slowest method is apparently getting the texture, which comes as a complete shocker seeing as all I do is return a list entry. The next slowest involves setting the matrix, which MultiplyMatrix, PushMatrix, and PopMatrix do. I'll comment out some things and see what I find and.....

Not too much of a difference. It's maddening! It's as if everything in that function is working as one to slow me down. The only way to get a decent speed is to not draw anything, not make any texture changes, and not change the material. Pretty much everything DirectX is in control of! [razz]

I'm beginning to wonder if my video card is responsible somehow.

Oh yeah, I couldn't set maximum debug output because it was grayed out for some mysterious reason.

Quote:
All my textures are no greater than 200x200 and mip-mapping is enabled.


Now, you do realize your card likely isn't a fan of non-power-of-2 textures, right? I took a quick look online at your card and couldn't find whether it supports them or not. But make sure all your textures are sized like 32x32, 64x64, 128x128, 128x256, 256x128, 256x256, etc. Any combo of powers of 2.

I know it doesn't seem to be the bottleneck, but when I saw your 200x200 quote I worried :)
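
Checking texture sizes against the power-of-2 rule is easy to automate. The helpers below are a small sketch in plain C#; `TextureSizeDemo`, `IsPowerOfTwo` and `NextPowerOfTwo` are hypothetical names, not XNA functions.

```csharp
using System;

public static class TextureSizeDemo
{
    // True for 1, 2, 4, 8, ...: a power of two has exactly one bit set.
    public static bool IsPowerOfTwo(int n)
    {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Smallest power of two that is greater than or equal to n.
    public static int NextPowerOfTwo(int n)
    {
        if (n > (1 << 30))
        {
            throw new ArgumentOutOfRangeException("n");
        }
        int p = 1;
        while (p < n)
        {
            p <<= 1;
        }
        return p;
    }

    public static void Main()
    {
        int size = 200;
        Console.WriteLine("{0} is a power of two: {1}", size, IsPowerOfTwo(size));
        Console.WriteLine("pad or rescale to: {0}", NextPowerOfTwo(size));
    }
}
```

For the 200x200 textures mentioned in the thread, the next valid size up is 256x256.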

Thanks! I'm familiar with that rule but I don't think it'll hurt that bad. I use textures in powers of 2 for things like scenery and such.

More news: I downloaded an XNA game I found here and it runs INCREDIBLY SLOW!!! Even the menu is slow! So I'm beginning to think this is related to a driver of some sort. I'll look into it some more.

I'm guessing you can test it for yourself, but I'll note it's kinda gory for my tastes. [dead]

Oy, never imagined I'd be sorting this out this long 9_9.

Well I downloaded the latest DirectX SDK hoping to achieve a conclusion but, alas, nothing. I was able to increase debug output however! My findings using the Ants profiler reported effect.CommitChanges() and DrawUserPrimitives to be the biggest thorns in my side. Third runner up would be EffectParameter.SetValue.

I was about to purchase a new computer being that mine is 7 years old. May as well now, just may solve the issue. I have a feeling my code doesn't have much to do with this.

I'll note I found a post with a nearly identical problem.
http://forums.nvidia.com/index.php?showtopic=36278&mode=linear

[Edited by - AntiGuy on May 25, 2007 6:06:52 PM]
