ilian

fill rate - memory bound vs shader bound.

I'm trying to understand fill rate better and was wondering if someone could point me towards a resource or help me understand it better.

The situation I've come across is that my game seems to be fill rate bound. I have done extensive profiling using numerous tools and gathered this data: my actual fill rate for most objects is abysmally low, under 30M/sec in most cases and as low as 2M/sec. My profilers, looking at my pixel shaders, claim that their ideal fill rate is around 300M/sec.

From my understanding of this, my shaders are fine and the number of texture stages I'm using is fine, since those are the things that cap my shader throughput, which still has a decent ideal fill rate.

But my actual fill rate still being so low means that I must be memory bandwidth bound. This kind of makes sense: if I am doing lots of passes on a single pixel, each pass requires a certain number of bytes to read/write the z-buffer, read/write/blend with the backbuffer, etc.

But that is where I get a little confused... Does this mean I can use the most complicated pixel shader possible right now without it affecting my performance, because I am getting such a low actual fill rate?

Does the size and use of the textures have anything to do with being memory bound on fill rate? From all the docs and info I can find, you solve this problem by occluding more pixels, doing fewer passes, and throwing out more pixels early at small alpha values. But this doesn't make any sense to me: shouldn't the size of a texture in memory dwarf the amount of memory it takes to actually read/write a pixel to the screen?

How can it be that my pixel shaders appear to be doing so well, but my actual fill rate is so bad?

I just don't have a good grasp of the problem and how being memory bandwidth bound on fill rate relates to shaders and textures.
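For a rough sense of scale on the framebuffer side of this question, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: 32-bit color, 32-bit depth, no framebuffer compression, no caching, and no early depth rejection. Real hardware will differ, and the function names and the example resolution, overdraw, and frame rate are hypothetical, not taken from the game being discussed.

```python
def framebuffer_bytes_per_pixel(color_bytes=4, depth_bytes=4,
                                depth_test=True, depth_write=True, blend=False):
    """Rough framebuffer traffic, in bytes, for one shaded pixel in one pass.

    Assumes no compression or caching (an idealized worst case).
    """
    total = color_bytes          # writing the resulting color
    if blend:
        total += color_bytes     # blending must read the destination color first
    if depth_test:
        total += depth_bytes     # reading the stored depth for the test
    if depth_write:
        total += depth_bytes     # writing the new depth
    return total


def framebuffer_bandwidth(width, height, overdraw, fps, bytes_per_pixel):
    """Bytes per second of framebuffer traffic for a full-screen workload."""
    return width * height * overdraw * fps * bytes_per_pixel


opaque = framebuffer_bytes_per_pixel()             # color write + depth read + depth write
blended = framebuffer_bytes_per_pixel(blend=True)  # adds a destination color read

# Hypothetical workload: 1024x768, every pixel touched 4 times, 60 frames per second.
traffic = framebuffer_bandwidth(1024, 768, overdraw=4, fps=60, bytes_per_pixel=opaque)
print(f"{opaque} B/pixel opaque, {blended} B/pixel blended, "
      f"{traffic / 1e9:.2f} GB/s of framebuffer traffic")
```

The point of the sketch is that framebuffer traffic scales with the number of pixels touched per second, not with the size of the textures in memory, which is why reducing overdraw and passes is the usual advice.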

Are you using swizzles? The number of memory transfers within a shader can slow down your fill rate.
Depending on your card, the size of the textures going through the pipeline can have a negative effect (on my Radeon 9000, a 512x512 texture runs slower through the pipeline than four 256x256 textures when both are wrapped around the same irregular polymesh).
The shape of your textures (non-power-of-2, non-square, etc.) could affect it, as could the number of samples... the list is endless. Can you show us your shader code? Is it in HLSL?

How many texture reads do you have?
How many of those are dependent texture reads?
How long is your pixel shader? How many cycles should it be taking up?
What's your backbuffer resolution (assuming you're trying to fill the whole backbuffer)?
What graphics card do you have?

There are a lot of factors that influence overall fill rate. It helps to understand a little about how a graphics card gets its speed.

Most modern video cards have several hundred pixels in flight at any point in time. They are not all being processed at once. In an oversimplified explanation, a pixel shader executes until it hits a texture load, at which point it is swapped out, since a memory access can take many cycles, and the hardware begins executing another pixel. In this way, the 'cost' of a texture load can be completely hidden, assuming there are enough pixels in flight to swap between.
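To make the latency-hiding idea concrete, here is a toy model in Python. It is a sketch, not a description of any real GPU: it assumes every pixel alternates between a fixed number of ALU cycles and a fixed texture latency, and that switching between pixels is free. The function names and the cycle counts in the example are hypothetical.

```python
def pixels_needed(alu_cycles, load_latency):
    """Pixels in flight needed to keep the ALU busy in the toy model.

    Each pixel does `alu_cycles` of math, then waits `load_latency` cycles
    for a texture load. While one pixel waits, the others must supply
    enough math to cover the wait.
    """
    period = alu_cycles + load_latency
    return -(-period // alu_cycles)  # ceiling division on integers


def alu_utilization(pixels_in_flight, alu_cycles, load_latency):
    """Fraction of cycles the ALU is busy, capped at 1.0."""
    period = alu_cycles + load_latency
    return min(1.0, pixels_in_flight * alu_cycles / period)


# Hypothetical shader: 10 ALU cycles between loads, 200-cycle memory latency.
print(pixels_needed(10, 200))         # pixels required to fully hide the load
print(alu_utilization(8, 10, 200))    # too few pixels: the ALU idles most of the time
print(alu_utilization(300, 10, 200))  # several hundred in flight: fully hidden
```

In this model, a shader with very little math between texture loads needs many more pixels in flight to hide the same latency, which is one way a load-heavy shader can end up waiting on memory.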

Many (most, I'd say) shaders are actually not bound by texture loads; they are bound instead by the ALU. That is, the shaders are long and most of the time is spent waiting for the ALU cores to execute. The equation for what is slowing you down is often complex and requires a lot of experimentation.

Here is a list of things I would try (and consider) to figure out where your bottleneck is:

1) Replace your shader with a simple shader which performs the same texture loads but does as little as possible with them (multiply them together or add them). Compare against the reference speed. This will tell you how ALU bound you are.

2) Replace your textures with the same small 1x1 texture. Compare. This will tell you if you are cache/memory bandwidth bound.

3) Replace your shaders with a trivial shader that just writes out a constant color. Compare. This will tell you how fill rate bound you are.

4) Consider the number of batches you are processing. If you have more than 500 draw primitive calls, you are most likely batch (or CPU) limited. You can also try increasing the resolution until you see a slowdown.

5) Consider the amount of geometry you are processing. Try reducing it (without changing the number of draw primitive calls). Compare.

6) Consider the amount of microgeometry you are rendering. Small polygons burn a lot of fill rate. Since gradients must be computed, (most) hardware renders at least 2x2 clusters of pixels, even if it throws away 3 of them (hence why it's sometimes called a fragment shader). This means a 1-pixel polygon is actually burning 4 pixels.
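The 2x2 cluster cost in point 6 can be sketched with a little arithmetic in Python. This assumes quads are aligned to even pixel coordinates and that every touched quad is shaded in full, which is a simplification; the helper names are hypothetical.

```python
def quads_touched(covered_pixels):
    """Number of aligned 2x2 pixel quads that contain at least one covered pixel."""
    return len({(x // 2, y // 2) for x, y in covered_pixels})


def shading_efficiency(covered_pixels):
    """Useful pixels divided by pixels actually shaded (4 per touched quad)."""
    covered = set(covered_pixels)
    shaded = 4 * quads_touched(covered)
    return len(covered) / shaded


# A polygon covering a single pixel still shades a whole quad.
one_pixel = [(5, 5)]
print(quads_touched(one_pixel), shading_efficiency(one_pixel))   # 1 quad, 0.25

# A 2x2 patch of pixels that straddles quad boundaries touches four quads.
straddling = [(1, 1), (2, 1), (1, 2), (2, 2)]
print(quads_touched(straddling), shading_efficiency(straddling))  # 4 quads, 0.25

# The same 2x2 patch aligned to a quad is shaded with no waste.
aligned = [(2, 2), (3, 2), (2, 3), (3, 3)]
print(quads_touched(aligned), shading_efficiency(aligned))        # 1 quad, 1.0
```

Under these assumptions, a scene made of many tiny polygons can shade several times more pixels than it displays, which shows up as a low measured fill rate even when the shader itself is cheap.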

Thanks for the info. I really appreciate you writing up a good list like this. I have actually done many of these tests already, as I tried to explain in the original post; it's just that I am having trouble interpreting the results.
I have a profiling system that sets all my textures to 1x1 opaque textures, and another one that forces the viewport to a size of 0 to help identify these issues. I will try some more of the tests you described as well, and this definitely gives me more to chew on. I feel like I am still missing some *glue* that would help my overall understanding, but you have definitely helped me make some progress.
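One way to organize the results of substitution tests like these is to rank each experiment by how much frame time it saves. The Python sketch below does only that. The experiment names are labels chosen for illustration, and the timings are made-up placeholder numbers, not measurements from this game or from any real profiler.

```python
def rank_experiments(baseline_ms, experiment_ms):
    """Sort experiments by the fraction of baseline frame time each one saves."""
    savings = {
        name: (baseline_ms - ms) / baseline_ms
        for name, ms in experiment_ms.items()
    }
    return sorted(savings.items(), key=lambda item: item[1], reverse=True)


# Placeholder numbers purely for illustration.
baseline = 40.0
results = {
    "minimal-ALU shader": 39.0,      # same texture loads, almost no math
    "1x1 textures": 38.0,            # removes texture bandwidth and cache misses
    "constant-color shader": 36.0,   # removes all shader and texture work
    "zero-size viewport": 12.0,      # removes all pixel work, including framebuffer traffic
}

for name, saved in rank_experiments(baseline, results):
    print(f"{name:24s} saves {saved:.1%} of the frame")
```

With placeholder numbers shaped like these, removing all pixel work saves most of the frame while simplifying the shader or the textures saves little. That pattern would point at the work left over once shading and texturing are removed, such as depth and color buffer traffic from overdraw, blending, or many tiny polygons, rather than at the shaders themselves.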



Quote:
Original post by EvilDecl81
1) Replace your shader with a simple shader which performs the same texture loads but does as little as possible with them (multiply them together or add them). Compare against the reference speed. This will tell you how ALU bound you are.

2) Replace your textures with the same small 1x1 texture. Compare. This will tell you if you are cache/memory bandwidth bound.

3) Replace your shaders with a trivial shader that just writes out a constant color. Compare. This will tell you how fill rate bound you are.

4) Consider the number of batches you are processing. If you have more than 500 draw primitive calls, you are most likely batch (or CPU) limited. You can also try increasing the resolution until you see a slowdown.

5) Consider the amount of geometry you are processing. Try reducing it (without changing the number of draw primitive calls). Compare.

6) Consider the amount of microgeometry you are rendering. Small polygons burn a lot of fill rate. Since gradients must be computed, (most) hardware renders at least 2x2 clusters of pixels, even if it throws away 3 of them (hence why it's sometimes called a fragment shader). This means a 1-pixel polygon is actually burning 4 pixels.
