Unexpected fps and screen-aligned quad

5 comments, last by neneboricua19 20 years, 5 months ago
Hi everyone,

I've been trying to run some performance tests with DX9 (using the latest summer update) and I'm getting what I think are unreasonable results. I'm trying to render 1000 screen-aligned quads using one large triangle list. The quads are placed one in front of the other, and I'm rendering them in back-to-front order so that the Z-buffer doesn't cull any of them. I'm using some very simple shaders, but I'm outputting a depth value from my pixel shader. The quads each have a solid color and texture coordinates (which I'm not using right now, but I have them there because I will need them later on).

When rendering 1000 screen-aligned quads, I get about 4.2 frames per second. This seems extremely slow to me. I'm running on an ATI 9600 Pro. The app is running in a window at a resolution of 500x375 (the default that the SDK framework sets up). I get the same framerate whether I use the Debug or Retail runtimes.

Since I'm using the same shaders, vertex buffers, and index buffers every frame, I thought I could just set them once in the RestoreDeviceObjects method from the SDK framework, but this doesn't seem to work. So I'm setting all the shaders and buffers every frame inside my Render method.

Is there something extremely wrong that I'm doing? I would think that rendering 1000 quads like this would be pretty fast, considering that the shaders are just passing along the info that comes from the vertex structure.

Thanks for any help,
neneboricu
How big are the quads? Is alpha blending enabled? Multisampling? You've created one list, so I'm assuming you're rendering all 1000 at once.

Calculate the pixel area used (i.e. 1000*32*32 = 1,024,000, or about 1 MPixel). How much throughput does your card have? A GeForce3 can do about 700 MPixel/sec... peak... i.e. no textures, no lights, nothing special at all.

That will give you ~700 FPS, ignoring the time it takes for clearing, and z buffering.

If you have multisampling, drop that number. If you have textures, drop it a bit. If you have trilinear filtering, drop it in half. If you use more than 2 textures, drop it in half. If you use alpha blending, drop it by 10%. For every 2 instructions in your pixel shader, divide again:

(1-2 instructions = free.
3-4 instructions = 1/2 speed
5-6 instructions = 1/3 speed
7-8 instructions = 1/4 speed)
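
As a rough illustration of the estimate above, here is a minimal C++ sketch. The function names `estimateFpsFromFillRate` and `shaderCostDivisor` are made up for this example, and the model only covers raw fill rate and the instruction-count rule of thumb listed above; it ignores clears, Z-buffering, and the other penalties.

```cpp
#include <cassert>

// Upper bound on frame rate if pixel fill is the only cost.
// pixelsPerFrame: total pixels written per frame, overdraw included.
// peakPixelsPerSecond: the card's peak fill rate (an assumed spec-sheet figure).
double estimateFpsFromFillRate(double pixelsPerFrame, double peakPixelsPerSecond) {
    assert(pixelsPerFrame > 0.0);
    return peakPixelsPerSecond / pixelsPerFrame;
}

// Rule-of-thumb divisor from the list above:
// 1-2 instructions -> 1, 3-4 -> 2, 5-6 -> 3, 7-8 -> 4.
int shaderCostDivisor(int instructionCount) {
    assert(instructionCount >= 1);
    return (instructionCount + 1) / 2;
}
```

For 1000 quads of 32x32 pixels and a 700 MPixel/sec peak, `estimateFpsFromFillRate(1000.0 * 32 * 32, 700e6)` comes out at roughly 683 FPS, which is where the ~700 FPS figure comes from. Dividing that by `shaderCostDivisor(n)` applies the pixel shader penalty for an n-instruction shader.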

1) Turn up the debug output level in the control panel and take a look at whether D3D is telling you anything. If it's outputting 1000 error/warning messages for each of those quads, then you'd get what you describe!

2) Make sure that you're not using a device set up for SOFTWARE VERTEX PROCESSING with a buffer (VB or IB) set up for HARDWARE VERTEX PROCESSING, or vice versa or similar. The same goes for the pools you're specifying. Doing that would give terrible performance.

3) When you say you're rendering that with one large triangle list, do you mean a single Draw*Primitive() call? If so, that's good. If not, that might be bad (for example, if each quad was rendered with its own Draw*Primitive() call).

4) Make sure your code hasn't accidentally selected the REF device! As obvious as it sounds, double check! That'd give you that kind of performance too.

5) What do you mean by "So I'm setting all the shaders, and buffers every frame inside my Render method"? If you simply mean calling Set* calls, then that's fine. But if you mean you're re-creating the buffers each frame, then that's very bad.
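
To make point 3 concrete, here is a minimal sketch of how index data for many quads could be laid out so that all of them go through one indexed triangle-list draw. This is not the original poster's code: the helper name `buildQuadIndices` and the vertex order (top-left, top-right, bottom-left, bottom-right) are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds 16-bit indices for quadCount quads, each stored as 4 consecutive
// vertices in the assumed order: 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right.
std::vector<std::uint16_t> buildQuadIndices(std::size_t quadCount) {
    // 16-bit indices can address at most 65536 vertices.
    assert(quadCount * 4 <= 65536);

    // Two clockwise triangles per quad: (0,1,2) and (2,1,3).
    const std::uint16_t offsets[6] = {0, 1, 2, 2, 1, 3};

    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;
        for (std::uint16_t offset : offsets) {
            indices.push_back(static_cast<std::uint16_t>(base + offset));
        }
    }
    return indices;
}
```

With 1000 quads this yields 6000 indices over 4000 vertices, so a single DrawIndexedPrimitive() call with D3DPT_TRIANGLELIST and a primitive count of 2000 would cover everything. The buffers holding this data would be created once at init time and only bound with Set* calls each frame, as point 5 suggests.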


There could be other things, but the above are the most obvious, so check those first. If none of them help locate the problem, post again and I'll mention some of the more obscure/bizarre problems I can think of.

--
Simon O'Connor
3D Game Programmer &
Microsoft DirectX MVP


Thanks for the replies.

Namethatnobodyelsetook: The quads are really just 1x1 if you go by the actual coordinates, but I use scaling and an orthogonal projection so that they come out to be the exact same size as my window. I believe my card is a slimmed-down version of a Radeon 9800 Pro. It's the same except that it has half the rendering pipelines (4 instead of the 9800's 8) and half the geometry pipelines (2 instead of the 9800's 4). I'm not sure what its throughput is (I can't seem to find that spec either on the box or on ATI's website), but I would imagine it's got to be more than a GeForce 3. Also, I'm not using any textures, lighting, alpha blending, etc. Each quad is a solid color that's taken from the vertex info.

S1CA:
1. Debug is at max. I checked and I don't get any errors or warnings.
2. I double checked and I'm using pure device with hardware vertex processing.
3. I'm doing a single DrawIndexedPrimitive call.
4. Just checked and yes, I'm using HAL and not the REF device.
5. I mean I call Set* every frame. I create all those shaders and such at device init time and Set* them every frame. I thought that I could just set them once at init and it would work fine if I never changed them but that doesn't seem to be the case.

I really appreciate any help. Thanks in advance,
neneboricua

[edited by - neneboricua19 on October 20, 2003 11:50:13 PM]
Uhm, I'm pretty much a newbie, but filling your entire screen with pixels 1000 times: doesn't that compute to something like 500 x 375 x 1000 x 4.2 = roughly 790 million pixels per second, i.e. you are fillrate limited?
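
The arithmetic in that question can be checked with a small C++ sketch. The helper name `pixelsPerSecond` is made up for this example, and it assumes every quad covers the whole window and every pixel of every quad is actually written.

```cpp
#include <cassert>

// Pixels written per second when `layers` full-window quads are drawn
// each frame at the measured frame rate `fps`.
double pixelsPerSecond(int width, int height, int layers, double fps) {
    assert(width > 0 && height > 0 && layers > 0);
    return static_cast<double>(width) * height * layers * fps;
}
```

With the numbers from the thread, `pixelsPerSecond(500, 375, 1000, 4.2)` is about 787.5 million pixels per second. That is already more than the 700 MPixel/sec peak quoted earlier for a GeForce3, which is consistent with a fill-rate limit rather than a geometry or API problem.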
Have you solved it?
Sorry to bring this up, but I'm really curious...
Sorry about that, I should have posted some kind of conclusion. It turns out that fenghus was right: I was fillrate limited. Normally you wouldn't think that rendering 1000 quads would be so slow.

I downloaded the free version of 3DMark2003 and ran some fillrate tests, and it turns out that I was trying to push more pixels through the card than it's capable of handling.

You would normally be able to render 1000 primitives without much of a problem. However, this rests on the unwritten assumption that each of these 1000 primitives covers only a relatively small number of pixels.

A few years ago, when video cards used to advertise how many polygons per second they could process, it was assumed that most polygons would not take up more than 6 pixels on the screen. This assumption totally breaks down for a screen-aligned quad and thus we become fillrate limited.

neneboricua

