Performance difference between scissor and viewport?

12 comments, last by Infinisearch 8 years, 6 months ago

I would like to make certain objects render only in certain parts of the screen: basically exactly what the scissorRect can do. However, the same thing can also be achieved by setting the viewport to a smaller region and compensating the matrices for the offset. Does this give better performance? If I understand the documentation correctly, the scissor test is done per fragment while the viewport is applied per vertex. This suggests that using the viewport instead of the scissorRect could potentially save a significant amount of fillrate. Is this true, or am I misunderstanding it? The scissorRect is a lot easier to implement, so I only want to use the viewport for this if I actually gain something that way.

(Note that this is for our 2D game, in which fillrate is currently the main performance bottleneck. The fragment shaders are extremely simple, so basically most performance goes into large amounts of overdraw of partially transparent objects.)
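For concreteness, a minimal sketch of the two options being compared, assuming D3D11 (the API isn't named above) and an existing device context; the rectangle values are placeholders. Option A also requires a rasterizer state created with ScissorEnable set to TRUE.

```cpp
#include <d3d11.h>

// Option A: keep the full-screen viewport and clip with a scissor rect.
// Requires a rasterizer state with ScissorEnable = TRUE.
void ClipWithScissorRect(ID3D11DeviceContext* context)
{
    D3D11_RECT rect = { 100, 100, 500, 400 };   // left, top, right, bottom
    context->RSSetScissorRects(1, &rect);
    // ... draw the objects that should only appear inside this rect ...
}

// Option B: shrink the viewport to the same region. The projection (or view)
// matrix then has to be offset/scaled so the content still lines up.
void ClipWithViewport(ID3D11DeviceContext* context)
{
    D3D11_VIEWPORT viewport = {};
    viewport.TopLeftX = 100.0f;
    viewport.TopLeftY = 100.0f;
    viewport.Width    = 400.0f;
    viewport.Height   = 300.0f;
    viewport.MinDepth = 0.0f;
    viewport.MaxDepth = 1.0f;
    context->RSSetViewports(1, &viewport);
    // ... compensate the matrices for the offset, then draw ...
}
```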

My dev blog
Ronimo Games (my game dev company)
Awesomenauts (2D MOBA for Steam/PS4/PS3/360)
Swords & Soldiers (2D RTS for Wii/PS3/Steam/mobile)

Swords & Soldiers 2 (WiiU)
Proun (abstract racing game for PC/iOS/3DS)
Cello Fortress (live performance game controlled by cello)


I've been told the scissor rect doesn't eat fill.

read this short thread: http://www.gamedev.net/topic/669714-update-a-specific-region-of-the-render-target/

-potential energy is easily made kinetic-

Scissors would probably be better. The scissor test prevents the renderer from filling anything out of bounds.

I think the viewport still fills.

Okay, I'll just happily use scissors then. Thanks! :)

Scissors would probably be better. The scissor test prevents the renderer from filling anything out of bounds.

I think the viewport still fills.

From how I understand it, the viewport might sometimes fill outside its bounds, for example when doing a wireframe render with thick lines, because viewports operate at the vertex level. It's not that viewports fill all of an object's pixels outside their bounds: that only happens in rare edge cases like that thick wireframe line.

Almost every API forces the scissor rect to always be within the viewport bounds, and if you "disable" the scissor rect, it actually just sets it to the viewport bounds!
So, if you're clipping via a viewport, you're actually still using a scissor rect.


(Note that this is for our 2D game, in which fillrate is currently the main performance bottleneck. The fragment shaders are extremely simple, so basically most performance goes into large amounts of overdraw of partially transparent objects.)

Have you tried screen space tiling such that all Frame Buffer reads and writes result in a FB/DB cache hit?

What is "screen space tiling"? If I Google for this I get hits for tile based renderers but as far as I know that is a type of graphics hardware, so I suppose you mean something else?

Sorry, I might be using my own terminology, not sure. The basic idea for 2D is (see the sketch after the list):

1. Figure out how big a tile can fit in the GPU ROP caches (I think this is the proper term).

2. Divide the screen into tiles of that size or smaller. Create a bin per tile.

3. For all your alpha-blended quads (say), find out which tiles are affected and add them to those tiles' bins.

4. Set the scissor rect to the tile. Draw all geometry for the tile from its bin.

5. Repeat for all tiles.
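
A minimal CPU-side sketch of that binning, assuming a hypothetical Quad struct with a screen-space bounding box and an API-specific drawBinned callback that sets the scissor rect and issues the draw calls; the 128x128 tile size is a placeholder, since the right size depends on the hardware's ROP caches.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

struct Quad { float minX, minY, maxX, maxY; /* plus whatever you need to draw it */ };
struct Rect { int left, top, right, bottom; };

// Bins alpha-blended quads into screen tiles, then draws each tile's bin with
// the scissor rect set to that tile. Quads keep their original order inside a
// bin, which matters for blending.
void drawTiled(const std::vector<Quad>& quads, int screenW, int screenH,
               const std::function<void(const Rect&,
                                        const std::vector<const Quad*>&)>& drawBinned)
{
    const int tileSize = 128;  // placeholder; tune to the ROP/color cache size
    const int tilesX = (screenW + tileSize - 1) / tileSize;
    const int tilesY = (screenH + tileSize - 1) / tileSize;

    // Steps 2/3: one bin per tile, filled with the quads that overlap it.
    std::vector<std::vector<const Quad*>> bins(tilesX * tilesY);
    for (const Quad& q : quads)
    {
        const int x0 = std::max(0, static_cast<int>(q.minX) / tileSize);
        const int y0 = std::max(0, static_cast<int>(q.minY) / tileSize);
        const int x1 = std::min(tilesX - 1, static_cast<int>(q.maxX) / tileSize);
        const int y1 = std::min(tilesY - 1, static_cast<int>(q.maxY) / tileSize);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                bins[ty * tilesX + tx].push_back(&q);
    }

    // Steps 4/5: per tile, set the scissor rect and draw that tile's bin.
    for (int ty = 0; ty < tilesY; ++ty)
    {
        for (int tx = 0; tx < tilesX; ++tx)
        {
            const std::vector<const Quad*>& bin = bins[ty * tilesX + tx];
            if (bin.empty())
                continue;
            const Rect scissor = { tx * tileSize, ty * tileSize,
                                   std::min((tx + 1) * tileSize, screenW),
                                   std::min((ty + 1) * tileSize, screenH) };
            drawBinned(scissor, bin);  // e.g. RSSetScissorRects + draw calls
        }
    }
}
```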

That explodes the number of render calls, unless the GPU ROP caches are quite big. Is this actually a good idea when rendering 500 objects per frame?


That explodes the number of render calls, unless the GPU ROP caches are quite big. Is this actually a good idea when rendering 500 objects per frame?

Yes, it increases the number of draw calls, but depending on what type of 2D game you are making this might not be an issue. Then of course there is always instancing with texture atlases, or indirect draw calls with texture atlases. ROP cache sizes vary depending on the hardware, but you can disable the technique based on detected hardware or a benchmark, or force it with a user option. I hear DX11 can hit 10,000 calls a frame normally... but like I said, there's always instancing or indirect draw calls. How's your performance looking now? On what hardware?
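
As a sketch of the instancing idea (hypothetical struct; the exact vertex declaration and draw call depend on the API and shaders):

```cpp
// Hypothetical per-instance data for drawing many atlas-textured quads in one
// instanced draw call.
struct QuadInstance
{
    float posX, posY;        // screen-space position of the quad
    float sizeX, sizeY;      // quad size in pixels
    float atlasU, atlasV;    // top-left of the sprite inside the atlas (0..1)
    float atlasW, atlasH;    // sprite size inside the atlas (0..1)
    unsigned int colorRGBA;  // packed tint/alpha
};

// Fill one QuadInstance per sprite, upload the array to an instance buffer,
// and issue a single draw, e.g. in D3D11:
//   context->DrawIndexedInstanced(6, instanceCount, 0, 0, 0);
```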
