Stencil buffer speed?

Started by
10 comments, last by Yann L 17 years, 1 month ago
At this point I have a *nice* non-Euclidean portal system in place. The idea is that each "room" has several portals which can link to other rooms' portals. The part that makes this more interesting is that my portal drawing uses stencil buffer magic to make sure that geometry on the other side of a portal does not leak past the portal. Imagine you make a portal in the middle of a room: one has to make sure that the stuff behind the portal does not get drawn into the room on the other side. I use the stencil buffer, together with clipping of bounding boxes, to stop this (the bounding box clipping just saves me from sending a mesh through the pipeline; the stencil buffer is the real doer here).

Now for the issue: speed. I do several state changes for this trickery: enabling/disabling the color and depth masks, changing the depth test, and changing the stencil func and op. When I disable the stencil magic I see a significant speed-up (but lots of graphical anomalies as geometry leaks everywhere). How expensive is it to change the stencil op and func? Each room needs about 4 or so stencil func and op changes, so when I do halls of mirrors I get lots of rooms and lots of changes.

In one stress test (one room with 4 mirrors, 2 facing each other, depth=7), the number of non-clipped rooms rendered is 309, for a total of 1571 non-clipped meshes. With stencil: 11-12 FPS. Without stencil (but it looks bad): 20 FPS. Vsync is on, so the real FPS might be more like 15 vs 20, but you get the idea.

I don't think fill rate is the issue, since the meshes are rendered as key-frame meshes with per-pixel lighting, so without stencil more fragments would get rendered. And I thought stencil drawing was cheap. So are the stencil changes killing my FPS? Or is it glDepthMask and glColorMask that are hurting me, or both?

Edit: Specs of the machine are GeForce 6600GT (128MB), AthlonXP 3000+ (real clock = 2.1GHz), 1024MB RAM, Fedora Core 4. [Edited by - kRogue on February 26, 2007 12:45:55 AM]
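To make the masking idea concrete, here is a toy software model of one common way to do stencil-masked portals. It is only a sketch and is not the poster's code: the buffer size, the rectangles, and the helper names (`mark_portal`, `draw_room`, `render_demo`) are all made up for illustration, and depth testing is left out entirely. Each portal raises the stencil value by one inside its visible area, and the room behind it is drawn only where the stencil equals that level.

```python
# Toy software model of stencil-masked portal drawing; this is not OpenGL.
W, H = 8, 4  # tiny "screen" so the result is easy to inspect


def make_buffer(value):
    """Return an H x W buffer filled with value."""
    return [[value] * W for _ in range(H)]


def mark_portal(stencil, rect, level):
    """Raise stencil to level + 1 inside rect wherever it equals level.

    Roughly models glStencilFunc(GL_EQUAL, level, ~0) together with
    glStencilOp(GL_KEEP, GL_KEEP, GL_INCR) while color and depth writes
    are masked off. Returns the number of pixels marked.
    """
    x0, y0, x1, y1 = rect
    marked = 0
    for y in range(max(y0, 0), min(y1, H)):
        for x in range(max(x0, 0), min(x1, W)):
            if stencil[y][x] == level:
                stencil[y][x] = level + 1
                marked += 1
    return marked


def draw_room(color, stencil, rect, level, room_id):
    """Write room_id into color inside rect only where stencil equals level.

    Returns the number of pixels written.
    """
    x0, y0, x1, y1 = rect
    written = 0
    for y in range(max(y0, 0), min(y1, H)):
        for x in range(max(x0, 0), min(x1, W)):
            if stencil[y][x] == level:
                color[y][x] = room_id
                written += 1
    return written


def render_demo():
    """Draw room 1, a portal into room 2, and a nested portal into room 3."""
    color = make_buffer(0)
    stencil = make_buffer(0)
    full_screen = (0, 0, W, H)  # pretend every room's geometry covers the screen
    counts = [draw_room(color, stencil, full_screen, 0, 1)]
    mark_portal(stencil, (2, 1, 5, 3), 0)   # portal seen from room 1
    counts.append(draw_room(color, stencil, full_screen, 1, 2))
    mark_portal(stencil, (4, 0, 7, 4), 1)   # portal seen through the first one
    counts.append(draw_room(color, stencil, full_screen, 2, 3))
    return color, counts


if __name__ == "__main__":
    color, counts = render_demo()
    for row in color:
        print("".join(str(v) for v in row))
    print("pixels written per room:", counts)
```

Even though each room's geometry covers the whole toy screen, room 2 only lands inside the first portal, and room 3 only lands where the second portal overlaps the first. In real GL each of those steps costs a handful of state changes, which is where the "about 4 per room" figure would come from.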
Close this Gamedev account, I have outgrown Gamedev.
Do you use glBegin/glEnd for vertices? How complex are your scenes (I mean the number of triangles)?
All geometry data is stored in VBOs, so each mesh is exactly one glDrawStuff command, but there are several meshes in each room. I would say each room clocks in at about 3000 triangles at most (i.e. all meshes of a room together come to 3000), maybe 3500, but more than 2000 tris, so there is a triangle cost. Still, when I kill the stencil business the framerate goes up by a decent amount. So 300 rooms at 3000 tris each gives 900,000 triangles in total. That is quite a few, but it's not like the card is a Voodoo1; it's a GeForce 6600GT, so it should easily handle that many tris...
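As a rough sanity check on those numbers, the per-second triangle rate can be worked out from the figures quoted in this thread (300 rooms, 3000 tris per room, 11-12 FPS with stencil, 20 FPS without). This is only back-of-the-envelope arithmetic; the function name is made up, and it assumes every room is submitted in full every frame.

```python
def triangles_per_second(rooms, tris_per_room, fps):
    """Triangles submitted per second, assuming every room is drawn each frame."""
    return rooms * tris_per_room * fps


ROOMS = 300           # approximate room count from the stress test
TRIS_PER_ROOM = 3000  # rough per-room estimate from the post

tris_per_frame = ROOMS * TRIS_PER_ROOM

if __name__ == "__main__":
    print(f"triangles per frame: {tris_per_frame}")
    for label, fps in (("with stencil", 11), ("with stencil", 12), ("without stencil", 20)):
        rate = triangles_per_second(ROOMS, TRIS_PER_ROOM, fps)
        print(f"{label} at {fps} FPS: {rate / 1e6:.1f} Mtri/s")
```

That works out to roughly 10 Mtri/s with stencil and 18 Mtri/s without, keeping in mind that vsync is on and may be rounding both framerates down.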

[Edited by - kRogue on February 26, 2007 11:00:33 PM]
Damn, I'm doing the same thing with portals. That makes me extremely sad. Anyway, I read somewhere that if you have depth testing on, stenciling is basically free. I don't remember where I read it, though.
I confirm the above. Z and stencil are usually implemented close to each other.
My performance on a similar system (but on D3D) clocks in at about 280 FPS and 15 Mtri/s, but even in this case I am pixel shader limited. As a reference, NFSMW clocks in at 11 Mtri/s, and most games I've benchmarked were also around 10-12 Mtri/s.
"Vsync is there?" Disable it. There are many good reasons to turn off vsync, and easier-to-predict debugging is one of them (you don't want your rendering thread to stop just because it's waiting on vsync).

I don't think changing the stencil state could hurt much (since it is used by the id engines, it is likely to be an optimized path).

300 draw calls with 3k vertices each shouldn't be a problem in GL, but just to try: what happens if you tessellate twice as finely and use 6k batches (assuming you're using one batch per room)?

Previously "Krohm"

kRogue: Are you rendering directly to the window or to an FBO/p-buffer? I remember having performance issues when trying to use stencil for masking while rendering to an off-screen buffer. In my simple experiment (nothing to do with portals), using the stencil to mask out unaffected pixels didn't help with FBOs. It seemed that early stencil rejection didn't work. Stencil testing itself worked fine, but not before the fragment shader, which made the whole thing slower than rendering directly to the window, because I had to update the stencil (more triangles rasterized, even if they were rendered at double speed).

Just a thought.

HellRaiZer
It's hard to tell what's causing the low framerate in your case. I think it's mostly because your accelerator is not optimized for heavy stencil buffer use. But most important: if you clear the stencil buffer too often, that will certainly be the source of the slow FPS. Instead of clearing the whole buffer, just redraw the portal you used with zero written to the stencil. Hope that helps.
I recall being told that clearing is a fast operation.
Quote:Original post by Centurion
because your accelerator is not optimized for heavy stencil buffer use.
Recall it's a 6600GT; if this isn't efficient, I don't know what is.
Quote:Original post by Centurion
if you clear the stencil buffer too often, that will certainly be the source of the slow FPS. Instead of clearing the whole buffer, just redraw the portal you used with zero written to the stencil. Hope that helps.
I confirm what DaBookshah said, with a little caveat: clearing Z and stencil together is almost free. Clearing Z only or stencil only is definitely more expensive, but I guess the driver still manages to be faster than issuing a polygon.

Previously "Krohm"

I don't know where you got that clearing the whole buffer is fast. Every tutorial I have read about the stencil buffer stresses this: it's much faster to draw your clip triangles to the stencil twice (with +1, and with -1 after drawing) than to draw them once and then clear the whole buffer. Especially when your portal clip triangles are so far away that they cover only 20-50 pixels, while the whole buffer is one million pixels.
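The pixel-count argument above can be put into numbers. This is just a sketch of that argument using the figures from the post (20-50 portal pixels, a one-million-pixel buffer); the function names are made up. It counts stencil pixels touched and nothing else, so it deliberately ignores any hardware fast-clear path of the kind mentioned earlier in the thread, which may make a full clear far cheaper than its raw pixel count suggests.

```python
def pixels_touched_incr_decr(portal_pixels):
    """Portal drawn into the stencil twice: once with +1, once with -1."""
    return 2 * portal_pixels


def pixels_touched_full_clear(buffer_pixels):
    """A naive full clear touches every pixel in the buffer once."""
    return buffer_pixels


def clear_to_redraw_ratio(portal_pixels, buffer_pixels):
    """How many times more pixels a naive clear touches than the +1/-1 redraw."""
    return pixels_touched_full_clear(buffer_pixels) // pixels_touched_incr_decr(portal_pixels)


if __name__ == "__main__":
    BUFFER_PIXELS = 1_000_000
    for portal_pixels in (20, 50):
        ratio = clear_to_redraw_ratio(portal_pixels, BUFFER_PIXELS)
        print(f"{portal_pixels}-pixel portal: naive clear touches {ratio}x more pixels")
```

Whether that ratio translates into real frame time depends on how the driver implements clears, so the only way to settle it for a given card is to measure both approaches.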

This topic is closed to new replies.
