
Fill rate question


I have a question about the pixel fill rate I'm getting. I have an NVIDIA GeForce 6800 card with a theoretical maximum pixel fill rate of 3900 Mpix/s. I have written a demo program that renders a single quad with a vertex shader that only transforms the quad by a world-view-projection matrix, and a pixel shader that only returns red for every pixel.

I know my program is pixel limited: when I look at the quad I get 320 fps, and when I look away the frame rate goes above 500 fps, at which point it is probably CPU bound. I'm running at 1024×1024, which gives me about 1 Mpixel to fill per frame. So in theory I shouldn't be limited by the pixel pipes at any frame rate below roughly 3700 fps (3900M / 1,048,576 pixels), right? I'm using a 16-bit Z-buffer.

I know 320 fps is a fairly good frame rate, but my problem is that with a couple of layers of overdraw (still with a pixel shader that does almost nothing) the frame rate can easily drop to 40 fps. So my question is: why am I so far from my card's theoretical limit?

I'm using NVPerfHUD to make sure I'm not CPU bound. I can't be vertex bound, because I get exactly the same frame rate if I build the quad out of a 30,000-polygon mesh. It can't really be my pixel shader either, because it does nothing, and if I make it do something like sampling a single texel from a texture I get the same fps.

I hope someone out there has a good answer; I would be most grateful. If you need any other specifications, just ask and I'll post them. I'm using Direct3D 9.
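A quick sanity check on that arithmetic, as a minimal Python sketch. It assumes fill rate is the only cost, one shaded pixel per screen pixel per layer, and that the quoted 3900 Mpix/s peak is actually reachable; the function name `fill_limited_fps` is just for illustration.

```python
# Theoretical frame-rate ceiling if pixel fill were the only cost.
# Assumptions: one shaded pixel per screen pixel per layer, and the
# quoted marketing peak fill rate is actually reachable.

PEAK_FILL_RATE = 3900e6      # pixels per second (quoted peak)
WIDTH, HEIGHT = 1024, 1024   # render-target resolution


def fill_limited_fps(peak_rate: float, pixels_per_frame: int, layers: int = 1) -> float:
    """Frames per second the pixel pipes could sustain at the peak rate."""
    return peak_rate / (pixels_per_frame * layers)


pixels = WIDTH * HEIGHT
print(pixels)                                                      # 1048576
print(round(fill_limited_fps(PEAK_FILL_RATE, pixels)))             # 3719
print(round(fill_limited_fps(PEAK_FILL_RATE, pixels, layers=4)))   # 930
```

So on paper even four full-screen layers of overdraw should still allow over 900 fps, which is why the measured numbers look so surprising.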

Taking numbers straight from the marketing material is often misleading. They're usually quoted as theoretical maximums, not necessarily what you get under a real-world load.

When running pixel shaders you've got the arithmetic cost as well as any texture look-ups and possible filtering costs, and then you've got the output merger as well. Arithmetic and bandwidth are related and can affect each other's perceived performance.

I remember reading that one of the 3DMark tests used only a 2x2 texture with no filtering, because it would fit into the GPU's local cache and therefore not incur any cache<->VRAM latency or bandwidth cost.

Various driver settings might also affect throughput: forced multisampling, forced higher-quality modes, forced VSYNC, etc.

An even simpler one, which I would hazard is your real problem, is overhead. There is little guarantee that the pixel pipes are actually 100% utilized in such a simple test case. The DrawPrimitive()/DrawIndexedPrimitive() call overhead, combined with a deep pipeline, could mean that your measurements include a lot of latency rather than actual work.
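To put a rough number on the overhead point, here is a simple Python model, assuming a frame's time is just a fixed per-frame cost plus pixels divided by the peak fill rate. The 2.8 ms overhead is a hypothetical value chosen for illustration, not a measurement. Even with the pixel pipes running at the full quoted peak, dividing pixels by total frame time would report under 10% of that peak.

```python
# Apparent fill rate when the measured frame time includes a fixed
# per-frame cost (driver/Draw call overhead, pipeline latency, Present)
# in addition to the actual pixel work.

def apparent_fill_rate(pixels: int, overhead_s: float, peak_rate: float) -> float:
    """Pixels per second you would get from pixels / measured frame time."""
    frame_time_s = overhead_s + pixels / peak_rate
    return pixels / frame_time_s


pixels = 1024 * 1024
overhead_s = 2.8e-3   # HYPOTHETICAL fixed cost per frame, for illustration only
peak = 3900e6         # quoted peak fill rate, pixels per second

rate = apparent_fill_rate(pixels, overhead_s, peak)
print(f"{rate / 1e6:.0f} Mpix/s")   # 342 Mpix/s
```

Setting `overhead_s` to zero gives back the peak rate, which is the only case where the naive pixels-per-frame-time division is accurate.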

In my recent vertex processing demo I deliberately used the most expensive vertex shader I could (Oren-Nayar, at 91 arithmetic instructions) so as to be sure I was (over)loading the vertex units.

It's unlikely that all of these apply in your case, and it's difficult to tell for certain which ones do. But hopefully this gives you an idea of the sort of issues you're going to have to consider.

hth
Jack

Thanks for your quick reply!

I know that I can't get the full 3900 Mpix/s, but right now I only get about 1/10 of that, and I find that a little too low.

In a scene with 4 quads, I draw all the quads every frame, but I have a camera that I can move so I can look at the quads from different positions. Here are some numbers I get:

0 pixels drawn = 620 fps
1 quad = 230 fps
2 quads = 170 fps
3 quads = 135 fps
4 quads = 112 fps

(When I say 2 quads or more, I mean that only one is visible, but the others are behind it and therefore still generate fragments.)

Those frame rates correspond to these frame times (ms/frame):

0 = 1.61 ms
1 = 4.34 ms
2 = 5.88 ms
3 = 7.40 ms
4 = 8.93 ms
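The conversion is just 1000 / fps. Frame times add up linearly as work is added, which is why differences are only meaningful in milliseconds, not in fps. A small Python sketch using the numbers above:

```python
# Convert measured frame rates into frame times.
# Frame time in ms = 1000 / fps. Frame times add up as work is added;
# frame rates do not, so differences are only meaningful in ms.

measured_fps = {0: 620, 1: 230, 2: 170, 3: 135, 4: 112}   # overlapping quads in view -> fps

frame_ms = {quads: 1000.0 / fps for quads, fps in measured_fps.items()}

for quads, ms in frame_ms.items():
    print(f"{quads} = {ms:.2f} ms")
# 0 = 1.61 ms
# 1 = 4.35 ms
# 2 = 5.88 ms
# 3 = 7.41 ms
# 4 = 8.93 ms
```

Rounding rather than truncating gives 4.35 and 7.41 ms for 1 and 3 quads; the 0.01 ms difference doesn't change the conclusion.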

As you can see, 5.88 - 4.34 = 1.54, 7.40 - 5.88 = 1.52 and 8.93 - 7.40 = 1.53.

I think it's safe to say that the time for my card to draw 1M pixels under the current settings is ~1.53 ms (by "draw" I don't mean drawing to the screen, but processing the fragments).

That gives a fill rate of about 653 Mpix/s (about 685 Mpix/s if 1M is taken as the full 1024 × 1024 = 1,048,576 pixels). Is this acceptable when my card has a theoretical limit of 3900 Mpix/s?
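An alternative to taking successive differences (a swapped-in technique, not the method used above) is an ordinary least-squares fit of frame time against the number of overlapping quads. All four measurements then contribute to the per-layer cost, and the fixed per-frame cost ends up in the intercept. A minimal Python sketch, assuming each extra quad adds exactly 1024 × 1024 shaded pixels:

```python
# Per-layer fill cost as the slope of frame time vs. number of overlapping
# full-screen quads (ordinary least squares). The fixed per-frame cost
# lands in the intercept instead of polluting the slope.

def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


layers = [1, 2, 3, 4]
frame_ms = [1000.0 / fps for fps in (230, 170, 135, 112)]

slope_ms, intercept_ms = linear_fit(layers, frame_ms)
pixels_per_layer = 1024 * 1024
fill_rate = pixels_per_layer / (slope_ms / 1000.0)   # pixels per second

print(f"per-layer cost: {slope_ms:.2f} ms, fixed cost: {intercept_ms:.2f} ms")
print(f"fill rate: {fill_rate / 1e6:.0f} Mpix/s ({fill_rate / 3900e6:.0%} of 3900 Mpix/s)")
# per-layer cost: 1.53 ms, fixed cost: 2.82 ms
# fill rate: 687 Mpix/s (18% of 3900 Mpix/s)
```

The fit agrees with the ~1.53 ms estimate and makes the fixed cost explicit: roughly 2.8 ms of every frame here is not fill work at all.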

When those theoretical numbers are published, a lot of things are turned off, like the depth buffer. If you turn off the depth buffer (not just the test, but the buffer itself) you'll probably do better. I'm assuming you're already using a well-performing back buffer format; try both a 16-bit and a 32-bit format (for example X1R5G5B5 and X8R8G8B8) and see if there's a difference.
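One way to see why the depth buffer and back buffer format matter is to count bytes. This Python sketch uses deliberately simplified assumptions (every layer passes the depth test, each shaded pixel does one depth read, one depth write and one color write, and there is no blending, compression, early-Z rejection or caching), so treat the results as rough estimates, not what the hardware actually does.

```python
# Rough framebuffer memory traffic per frame for overlapping opaque quads.
# Simplifying assumptions (real hardware differs):
#   - every layer passes the depth test (e.g. drawn back to front),
#   - each shaded pixel reads and writes depth, and writes color once,
#   - no blending, compression, early-Z rejection or caching.

def bytes_per_frame(pixels: int, layers: int, color_bytes: int, depth_bytes: int) -> int:
    per_pixel = color_bytes + 2 * depth_bytes   # color write + depth read + depth write
    return pixels * layers * per_pixel


pixels = 1024 * 1024
layers = 4
for name, color_bytes in (("16-bit color", 2), ("32-bit color", 4)):
    with_z = bytes_per_frame(pixels, layers, color_bytes, depth_bytes=2)   # 16-bit Z
    no_z = bytes_per_frame(pixels, layers, color_bytes, depth_bytes=0)     # no depth buffer
    print(f"{name}: {with_z / 2**20:.0f} MiB with Z, {no_z / 2**20:.0f} MiB without")
# 16-bit color: 24 MiB with Z, 8 MiB without
# 32-bit color: 32 MiB with Z, 16 MiB without
```

Under these assumptions, dropping the depth buffer cuts framebuffer traffic by half or more.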

Also: are you clearing the screen for each frame? That eats into fill rate.

In fact, presenting frames eats into fill rate too, because it can stall the GPU. Benchmarks run at very low frame rates (say, 1 fps) to reduce that overhead, and instead measure performance with massive overdraw.
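Massive overdraw helps because it amortizes the fixed per-frame cost. Assuming frame time = fixed cost O + n layers × per-layer cost c, the share of the frame spent on fill work is n·c / (O + n·c), and solving for n gives the Python sketch below. The example values are estimates taken from the measurements earlier in the thread: 1.53 ms per layer, and about 2.8 ms of fixed cost (the one-quad frame time of roughly 4.35 ms minus one layer). The function name is illustrative.

```python
import math

# If frame time = fixed_ms + layers * per_layer_ms, the share of the frame
# spent on fill work is layers*c / (fixed + layers*c). Solving
# share >= target for layers gives layers >= target*fixed / ((1-target)*c).

def layers_needed(fixed_ms: float, per_layer_ms: float, target_share: float) -> int:
    """Smallest layer count for which fill work is at least target_share of the frame."""
    return math.ceil(target_share * fixed_ms / ((1.0 - target_share) * per_layer_ms))


print(layers_needed(2.8, 1.53, 0.95))   # 35
print(layers_needed(2.8, 1.53, 0.99))   # 182
```

At 35 layers a frame takes roughly 2.8 + 35 × 1.53 ≈ 56 ms, i.e. under 20 fps, which fits with benchmarks deliberately running at low frame rates.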

You're not going to get to the theoretical number (which is derived from "X pixel pipes times Y clock frequency") but you can get closer.
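As a sketch of where that number comes from: assuming the card is the standard GeForce 6800 with 12 pixel pipelines at a 325 MHz core clock (a configuration consistent with the 3900 Mpix/s figure), the quoted peak is just pipes times clock. The comparison at the end uses the ~1.53 ms per full-screen layer measured earlier in the thread.

```python
# Peak fill rate as usually quoted: one pixel per pipe per clock.

def theoretical_fill_rate(pixel_pipes: int, core_clock_hz: float) -> float:
    """Peak pixels per second, ignoring every other bottleneck."""
    return pixel_pipes * core_clock_hz


# Assumed configuration: standard GeForce 6800, 12 pixel pipes at 325 MHz.
peak = theoretical_fill_rate(12, 325e6)
print(f"{peak / 1e6:.0f} Mpix/s")            # 3900 Mpix/s

measured = 1024 * 1024 / 1.53e-3             # ~1.53 ms per full-screen layer, from the thread
print(f"{measured / peak:.0%} of peak")      # 18% of peak
```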

I take it from the answers that my ~600 Mpix/s is fairly normal, then.
I'm not really aiming for a speed record here; I'm trying to build a real-life scene, and I find myself fill-rate limited all the time. So I wanted to start by making sure I'm not making any fundamental errors.

Thanks for the answers.

I think your only "fundamental error" is in interpreting the data (both the real-world measurements and the marketing figures). Even timing it is non-trivial (have you read the 'Accurately Profiling Direct3D API Calls' article in the documentation?).

Not that it'd be your fault as such; it's just not a simple thing to understand or profile accurately :)

I would personally approach it the other way around: write the effects I NEED, test them where possible, and then see about improving them if they're too slow (there are lots of tricks you can use here)...

There's a lot to be said for "elegant degradation" of shader effects. There are no strict rules to the game, but it's where many of the thousands of shader permutations you hear about come from. Juggle things around and change which calculations you do where, which shader gets applied to which surface, etc...

hth
Jack
