Reducing fill rate in 2D graphics

OrangyTang

I have some 2D lights which produce a nice visual effect, yet when there are several on screen at once (6+) the fps drops waaaay down. Reducing the resolution bumps it back up, so I guess fill rate is limiting my app. At the moment there's a pass per light, requiring a screen clear, then numerous layers blended onto the framebuffer. I'm not sure how to reduce the fill rate this uses, though; the only thing that springs to mind is using the scissor test to restrict drawing to each light's bounds. Would that actually help? Are there any other optimisations I can make to stop being fill rate limited?

Ingenu
How do you render your lights?
Limiting the updated screen section sounds like a basic optimisation.

Do you use the 3D rendering pipeline?
(That would boost your performance and let you use many new features, including standard lights.)

-* So many things to do, so little time to spend. *-

OrangyTang
quote:
Original post by Ingenu
How do you render your lights?
Limiting the updated screen section sounds like a basic optimisation.

Lights are made by mucking around with the alpha buffer. I'm already culling the geometry used down to what's within the view, but now it seems that the physical pixel pushing is limiting me.


quote:
Do you use the 3D rendering pipeline ?
(That would boost your perfs and let you use many new features, and also standard lights.)


Yup, all done through many shaded triangles in OpenGL. Regular GL lights aren't an option (they're per vertex, not per pixel, and for a whole bunch of other reasons).

S1CA
Have you determined that the problem is actually fill rate rather than memory bandwidth? (Each blend incurs a hell of a lot of read-read-write.) Follow the flowchart in the presentation on the nVidia site to find out...

When you clear the surface, are you clearing the stencil and depth buffers at the same time? If not, you should be (if at all possible); it matters on many chips.
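Roughly like this (assuming your GL context actually has depth and stencil planes):

    // One combined clear lets the driver hit its fast-clear path
    // instead of making separate passes over interleaved buffers.
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClearDepth(1.0);
    glClearStencil(0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);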

Scissoring to just the region where the light is might help, depending on whether the pixels outside that scissor region would otherwise be touched by blending (a full-screen quad, etc.).
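Something like this per light, say (the lightX/lightY/lightW/lightH bounds are hypothetical; you'd compute them from the light's radius in window coordinates):

    // Restrict all rasterisation, including clears, to the light's
    // screen-space rectangle; pixels outside it are never touched.
    glEnable(GL_SCISSOR_TEST);
    glScissor(lightX, lightY, lightW, lightH);   // origin is bottom-left
    // ... reset alpha and render this light's layers ...
    glDisable(GL_SCISSOR_TEST);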


Maybe you could do the lighting at a lower resolution/frequency into a smaller render target surface and stretch that.
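Without render-target extensions, a sketch of the copy-and-stretch version (the 256x256 size, lightTex, and an ortho projection in pixel coordinates are all assumptions; lightTex needs defining once with glTexImage2D):

    // Render the light accumulation into a small corner of the back buffer.
    glViewport(0, 0, 256, 256);
    // ... per-light passes here, at quarter resolution or less ...

    // Grab that region into a texture.
    glBindTexture(GL_TEXTURE_2D, lightTex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 256, 256);

    // Stretch it back over the full screen; bilinear filtering
    // hides a lot of the lost resolution.
    glViewport(0, 0, screenWidth, screenHeight);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(0.0f, 0.0f);
        glTexCoord2f(1.0f, 0.0f); glVertex2f((float)screenWidth, 0.0f);
        glTexCoord2f(1.0f, 1.0f); glVertex2f((float)screenWidth, (float)screenHeight);
        glTexCoord2f(0.0f, 1.0f); glVertex2f(0.0f, (float)screenHeight);
    glEnd();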


Having said that, limits are just that: if you're [truly, not as the result of bad code] fill rate limited but not CPU or vertex throughput limited, you really should use some of those other resources to balance things out.

Living with the limits and making it look like you're not is what sets a good graphics programmer apart from a normal one.

Guest Anonymous Poster
quote:
Original post by S1CA
Maybe you could do the lighting at a lower resolution/frequency into a smaller render target surface and stretch that.



I thought stretching caused more fill rate problems than it solved, i.e. resampling.

What is the size of the light map/texture? Do you use a mip-mapped texture surface?

I remember using alpha-enabled quads in a DirectX 7 program, and stretching the quad across the screen severely dropped the frame rate. But if I rendered the texture at its actual size on screen, the frame rate increased.

Try creating a 256x256, 256-color lightmap and mip-mapping it.
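Setting that up might look like this (skipping the 256-colour palette part; lightmapTex and lightmapPixels are made-up names):

    // Build the full mip chain so minified lightmap draws sample a
    // small level instead of thrashing the whole 256x256 surface.
    glBindTexture(GL_TEXTURE_2D, lightmapTex);
    gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGBA, 256, 256,
                      GL_RGBA, GL_UNSIGNED_BYTE, lightmapPixels);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);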

OrangyTang
quote:
Original post by S1CA
Have you determined that the problem is actually fill rate rather than memory bandwidth? (Each blend incurs a hell of a lot of read-read-write.) Follow the flowchart in the presentation on the nVidia site to find out...

Good question; it may well be memory bandwidth, with all the blending going on. I assume you mean this document? I'm looking through it now...

quote:
When you clear the surface, are you clearing the stencil and depth buffers at the same time? If not, you should be (if at all possible); it matters on many chips.


Clear is probably the wrong word; I'm resetting the framebuffer alpha with a full-screen quad. At the very least the scissor test should speed this up somewhat.

quote:
Scissoring to just the region where the light is might help, depending on whether the pixels outside that scissor region would otherwise be touched by blending (a full-screen quad, etc.).


It's difficult to see how much drawing outside of the light bounds is actually happening. I'm sure that for small lights the scissor should help (but with small lights there's only a small amount of geometry to re-render).


quote:
Maybe you could do the lighting at a lower resolution/frequency into a smaller render target surface and stretch that.


Having said that, limits are just that: if you're [truly, not as the result of bad code] fill rate limited but not CPU or vertex throughput limited, you really should use some of those other resources to balance things out.

Living with the limits and making it look like you're not is what sets a good graphics programmer apart from a normal one.


Using an RTT for the shadows sounds promising, but I'm not sure how much accuracy I'd lose. I'll keep it in mind; it'd be interesting to see how much smaller the surface would have to be before it starts making a performance difference. (It would make a nice speed vs. accuracy detail setting for slower systems as well.)

Other than that, I think I may have to come up with some way to fake the lighting: keep the high-quality lighting as it is for dynamic lights, but use some sort of pre-calculated lightmap for static lights.
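Sketching that idea (everything here is hypothetical: a 512x512 staticLightTex defined at load with glTexImage2D, plus helpers standing in for the existing light passes):

    // At load time: pay the multi-pass fill cost once for static lights.
    renderStaticLights();                          // existing per-light passes
    glBindTexture(GL_TEXTURE_2D, staticLightTex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 512, 512);

    // Per frame: one quad replaces every static light pass, and only
    // dynamic lights still burn fill rate.
    drawFullscreenQuad(staticLightTex);
    renderDynamicLights();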

OrangyTang
quote:
Original post by Anonymous Poster
What is the size of the light map/texture? Do you use a mip-mapped texture surface?

I remember using alpha-enabled quads in a DirectX 7 program, and stretching the quad across the screen severely dropped the frame rate. But if I rendered the texture at its actual size on screen, the frame rate increased.

Try creating a 256x256, 256-color lightmap and mip-mapping it.


There are no lightmap textures used (not in a conventional way, at any rate); the whole scene is composed in the framebuffer, with various geometry layers masking out the lights' influence.

Guest Anonymous Poster
quote:
Original post by OrangyTang
Clear is probably the wrong word; I'm resetting the framebuffer alpha with a full-screen quad. At the very least the scissor test should speed this up somewhat.



Most graphics cards have optimized methods for clearing the frame buffer which are faster than rendering a full-screen quad (fewer state changes, too).

OrangyTang
Yes, but I only want to clear the alpha component of the framebuffer and leave the RGB channels alone. AFAIK that's not possible with a regular glClear, since the colour masking for the individual channels is ignored.

S1CA
D3D has "write mask" render states to determine which channels the chip writes to. No doubt GL has the same (though maybe as an extension). Letting the driver know what you're attempting to do in as explicit a way as possible gives it many more chances to take advantage of things like fast clears.
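In GL that's glColorMask, and assuming it behaves the way the spec reads (clears are affected by the colour mask, though whether a partially masked clear stays on the fast path is driver-dependent), an alpha-only reset would be:

    // Touch only the alpha channel; RGB is left alone.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE);
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);   // only the alpha value matters here
    glClear(GL_COLOR_BUFFER_BIT);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);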

Many chips interleave the Z (& stencil) buffer and the ARGB frame buffer in the same memory (i.e. 64-bit pixels consisting of ARGBSZZZ). That's the reason for always clearing the depth buffer at the same time as the colour buffer where possible.

At the moment the blends you're doing to clear the surface are probably incurring read-read-write overhead. Then the actual work that ends up in the rendered result incurs more overhead again.

Without thinking it through in any great detail, I'd say: use an 8-bit alpha-only (or luminance-only) render target texture. Clear that the official way, do your stuff, then use blending to preserve colour in the final composite.
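The final composite might be no more than this (lightTex is an assumed GL_ALPHA texture holding the accumulated light intensity):

    // Scale the scene already sitting in the framebuffer by the light
    // intensity in the texture's alpha: dst = dst * srcAlpha.
    glBindTexture(GL_TEXTURE_2D, lightTex);
    glEnable(GL_TEXTURE_2D);
    glEnable(GL_BLEND);
    glBlendFunc(GL_ZERO, GL_SRC_ALPHA);   // source colour is discarded entirely
    glColor4f(1.0f, 1.0f, 1.0f, 1.0f);    // don't scale the texture's alpha
    // ... draw a full-screen textured quad ...
    glDisable(GL_BLEND);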
