
Questions to Yann: cache for shadow maps



#1 Ysaneya   Members   -  Reputation: 1235

Posted 11 July 2004 - 10:27 AM

I'm currently implementing a cache system for shadow maps, massive models and per-pixel lighting. I vaguely remember something you wrote about shadow maps in your engine some months (years?) ago, and after digging into the archives, I found this: Yann about shadow mapping. And indeed it's extremely similar to what I'm currently doing. Your last post was left unanswered, so if you don't mind, here are a few questions for you:

1. In your cathedral screenshot, how much video memory was allocated to the shadow map cache? To get good quality, I found I need _at least_ 50 MB of shadow maps. In my engine I have a scale factor for shadow maps which I can adjust to lower the memory requirements, but then the quality of my shadow maps starts to decrease, and it becomes noticeable. 50 MB of shadow maps is quite a lot, and you can't compress them.

2. Again in your cathedral screen, how many lights are visible on screen? You mention 39 lights for the whole cathedral, but they're not all visible at once. Also, how many lights do you have in each category (how many point lights, spot lights, and directional lights)? Do you have some lights not casting shadows? What are the typical resolutions of your shadow maps in that scene?

3. How are you handling the soft edges? I am antialiasing my spot/directional shadows with 4 to 8 samples, but the speed hit is quite high. It's even worse for omni lights, since I must access a cube map many times. Basically, for a single scene, I am shader limited to 50 fps on a Radeon 9700 Pro with a single light (filling the whole screen). I'm guessing that's not how you antialias your shadows, since you get 60 fps on a GF4. Are you rendering your shadows to a texture, blurring it, and modulating it into your lighting equation?

4. Speaking of the lighting equation: which per-pixel effects did you handle in this cathedral screenshot? It only looks like dot3 bump mapping - I cannot see any specular/gloss contribution - but I might be wrong?

5. I am going to implement LOD in the next months, but I am afraid of seeing geometry pop in the shadows, as the shadow maps are cached and not updated every frame. Did you have that problem?

6. How do you quickly render your shadow maps to a texture? I am using a Radeon 9700, and I must admit that even if rendering a 1024x1024 shadow map is quite fast, I hit a bottleneck uploading it to the graphics card. I'm not using the traditional render-to-texture approach, since given the number of lights, it would require one PBuffer per light. Instead I'm using a constant number of PBuffers: I generate my shadow map in the oldest one, and copy its contents into a texture using glCopyTexSubImage2D. How were you doing it?

That's a few questions, but I hope you have the time to answer them :)

Thanks,
Y.
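
For reference, the fixed pool approach from question 6 might look roughly like this in C++/OpenGL; PBuffer, makeCurrent() and renderLightDepthPass() are hypothetical stand-ins, and the platform-specific pbuffer creation is omitted:

#include <GL/gl.h>

struct PBuffer { void makeCurrent(); /* platform specific handle */ };
struct Light;
void renderLightDepthPass(const Light& light, int size);

void updateShadowMap(PBuffer& oldest, GLuint shadowTex, int size, const Light& light)
{
    oldest.makeCurrent();                  // recycle the oldest pbuffer in the pool
    renderLightDepthPass(light, size);     // draw occluders from the light's view

    // Copy the result into the cached texture; the data stays on the card.
    glBindTexture(GL_TEXTURE_2D, shadowTex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, size, size);
}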

#2 Yann L   Moderators   -  Reputation: 1794

Posted 11 July 2004 - 11:32 AM

OK, let me find the jan-2003 source tree and docs. The shadowing system used in our engine has changed a lot since then - I tried so many different approaches, I really don't remember off the top of my head which one was used at that time.

#3 Ysaneya   Members   -  Reputation: 1235

Posted 12 July 2004 - 04:28 AM

Thanks Yann, I'm looking forward to reading your answers :)

Btw, if you or one of our fellow visitors are curious, here is a screenshot of my current system in action (one light only):

[screenshot]

(Note that it's the same engine I'm developing for the Minas Tirith model).

Y.


#4 evolutional   Moderators   -  Reputation: 1044

Posted 12 July 2004 - 04:30 AM

Quote:
Original post by Ysaneya
(Note that it's the same engine I'm developing for the Minas Tirith model).


This is going to be very special :)

#5 Ysaneya   Members   -  Reputation: 1235

Posted 12 July 2004 - 04:34 AM

It's the same engine, not the same application :) The one above is for an old-school 3D maze game (à la Dungeon Master, Eye of the Beholder, etc.).

Y.


#6 cheese   Members   -  Reputation: 162

Posted 12 July 2004 - 05:50 AM

How can you write so many great engines so quickly? Nice detail in the walls - is that using the same techniques as the Unreal Tournament 3 engine?

For my own curiosity, how is your lightspeed hovercraft game going?

#7 quasar3d   Members   -  Reputation: 683

Posted 12 July 2004 - 06:45 AM

Very nice.

Btw, if there's only one light, how does that wall straight in front of you get shaded? You would think it would be totally black, or some uniform colour from ambient lighting, but it looks brighter near the left wall.

#8 Ysaneya   Members   -  Reputation: 1235

Posted 12 July 2004 - 10:15 AM

Quote:

How can you write so many great engines so quickly?


I don't.. all my projects use the same engine, which I'm improving over time :)

Quote:

Nice detail in the walls - is that using the same techniques as the Unreal Tournament 3 engine?


More or less. UT3 also has static lightmaps and stencil shadows, which I don't, but that's planned for later. Other than that, I'm guessing my lighting equation is similar to UT3's, yeah. It does pretty much everything that is state-of-the-art nowadays: 1x to 4x shadow map antialiasing (spot, directional or omni lights with cube maps; 16x AA maybe in the future), dot3 normal mapping, parallax/offset bump mapping, per-pixel lighting with attenuation, specular/gloss mapping with per-pixel exponent, light diffuse & specular color, and probably a few things I'm forgetting.

However, due to lack of time I haven't implemented a non-ARB_fragment_program code path, so basically it only runs on Radeon 9700+ / GeForce FX+. Most features are simply disabled if you don't have these.

Quote:

For my own curiosity, how is your lightspeed hovercraft game going?


Not very well, I'm afraid. My artist was busy at university and/or on other projects, so there hasn't been any real progress on the art side for a few months, and I don't feel like investing more time in a project that doesn't have an active artist. I'm improving the engine and working on other projects until he can resume his work. If he's still busy in September I'll be looking for somebody else, as the game is almost playable.. it would be a real shame not to finish it. Hell, it even has a fully functional MFC editor :)

Quote:

Very nice.

Btw, if there's only one light, how does that wall straight in front of you get shaded? You would think it would be totally black, or some uniform colour from ambient lighting, but it looks brighter near the left wall.


Thanks. You are right about the shading of the wall - it was caused by a bug I had already fixed, which incorrectly applied a specular contribution in shadowed areas. Didn't think anybody would notice it :)

Y.


#9 Yann L   Moderators   -  Reputation: 1794

Posted 12 July 2004 - 03:07 PM

OK, I couldn't find the source snapshots for January 2003 (they're on the company server, and I just realized I don't even have a local copy of the '03 source tree. Hmmm). I'll try to answer the questions from memory, and add information about how it has been done in later engine revisions (a lot of things were still broken and suboptimal at the time of the thread you linked to).

Quote:
Original post by Ysaneya
1. In your cathedral screenshot, how much video memory was allocated to the shadow map cache? To get good quality, I found I need _at least_ 50 MB of shadow maps. In my engine I have a scale factor for shadow maps which I can adjust to lower the memory requirements, but then the quality of my shadow maps starts to decrease, and it becomes noticeable. 50 MB of shadow maps is quite a lot, and you can't compress them.

The cache for the shadow maps is around 32 MB in the current engine, but it's a user setting and can be modified at will.

Generally, the caching is handled using a priority scheme. Each light has an importance factor associated with it, a visual priority. The higher the priority, the more important the shadow is deemed by the system, and the more resolution it gets. Lower priority maps get gradually less resolution. Direct sunlight always has maximum priority, and is guaranteed to get at least a 2048 map. In night scenes, the moon takes the role of the sun, but with reduced map resolution. Maps assigned to the sun or moon are never cached, as they are view dependent and updated every frame.

All other light sources are then assigned maps from the pool, using a modified LRU scheme (modified to take the priorities into account). If a light source has moved, or geometry changed within its visual range, its associated shadowmap is sent to an update manager. The manager tries to balance shadowmap regeneration over several frames, in order to avoid updating hundreds of maps in a single frame. Again, the visual importance factor helps a lot: lights with lower priority can be updated less often, and their update can be deferred to a later frame. This can lead to the shadow of a low priority light lagging behind the object (especially if the object moves very fast), but it is generally unnoticeable if the system is well balanced.

The priority is assigned based on several different visual metrics: distance from the viewer and occlusion based temporal coherence are the two most important ones. You can add several others, but this depends on your engine and the type of realism you're looking for. The visual metrics are also the perfect spot to include a user adjustable shadow quality setting.
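
A minimal sketch of how such a priority-driven update manager could look, under loose assumptions; the metric, the field names and the per-frame budget are invented for illustration, not taken from the engine described above:

#include <algorithm>
#include <vector>

struct Light {
    float distance;     // distance from the viewer
    float visibility;   // 0 = fully occluded, 1 = fully visible
    float priority;     // computed visual priority
    bool  dirty;        // light moved, or geometry changed in its range
};

// One possible visual metric: visible, nearby lights matter most.
float visualPriority(const Light& l, float userQuality)
{
    return userQuality * l.visibility / (1.0f + l.distance * l.distance);
}

// Regenerate at most 'budget' shadowmaps per frame, most important first;
// the rest stay dirty and are deferred to later frames.
void updateManagerTick(std::vector<Light*>& lights, float userQuality, int budget)
{
    std::vector<Light*> dirty;
    for (Light* l : lights) {
        l->priority = visualPriority(*l, userQuality);
        if (l->dirty)
            dirty.push_back(l);
    }
    std::sort(dirty.begin(), dirty.end(),
              [](const Light* a, const Light* b) { return a->priority > b->priority; });

    for (int i = 0; i < (int)dirty.size() && i < budget; ++i) {
        // regenerateShadowMap(*dirty[i]);   // engine specific
        dirty[i]->dirty = false;
    }
}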

Quote:
Original post by Ysaneya
2. Again in your cathedral screen, how many lights are visible on screen? You mention 39 lights for the whole cathedral, but they're not all visible at once. Also, how many lights do you have in each category (how many point lights, spot lights, and directional lights)? Do you have some lights not casting shadows? What are the typical resolutions of your shadow maps in that scene?

I can't give you any detailed information about this particular scene, because we don't use it anymore. From memory, all light sources in the cathedral were point lights, except for the sun, which is directional (and used PSM back then, now replaced by TSM). No spots in that scene, AFAIR. Many lights didn't cast shadows; those were what we called 'virtual lights' back then. They were used to give the scene its radiosity ambience, but still keep the lighting dynamic. We used a lot of those virtual light sources, but their number was mostly important in preprocessing. We later replaced that system with dynamically controlled GI lighting, because we needed more flexibility and realtime HDR support.

Quote:
Original post by Ysaneya
3. How are you handling the soft edges? I am antialiasing my spot/directional shadows with 4 to 8 samples, but the speed hit is quite high. It's even worse for omni lights, since I must access a cube map many times. Basically, for a single scene, I am shader limited to 50 fps on a Radeon 9700 Pro with a single light (filling the whole screen). I'm guessing that's not how you antialias your shadows, since you get 60 fps on a GF4. Are you rendering your shadows to a texture, blurring it, and modulating it into your lighting equation?

The GeForce card series has dedicated hardware for bilinear filtered shadowmaps, a kind of poor man's PCF natively built into the chipset. All shadowmaps in the cathedral shot used that feature, which has been available from the GF3 upwards. ATI never supported it, because of some intellectual property issues they had with nvidia.

Quote:
Original post by Ysaneya
4. Speaking of the lighting equation: which per-pixel effects did you handle in this cathedral screenshot? It only looks like dot3 bump mapping - I cannot see any specular/gloss contribution - but I might be wrong?

No, you're perfectly right. There were not enough register combiner stages left on the GF4 to do nice looking speculars. I didn't manage to both normalize the half angle vector per pixel and exponentiate H dot N to an acceptable power. So instead of having sucky speculars, I just switched them off completely. I don't remember whether the speculars were actually switched off or simply invisible in that particular shot, but I think it was the former. The problem came from the way the engine handled multiple shadowmaps back then: there were no texture units left for cubemap normalization of H (keep in mind, the GF4 only had 4 units). It became possible later on, using the new material and shader system, by dynamically distributing the work load over multiple passes. Thank god for ARB_fragment_program, which solved all that ;)

Quote:
Original post by Ysaneya
5. I am going to implement LOD in the next months, but I am afraid of seeing geometry pop in the shadows, as the shadow maps are cached and not updated every frame. Did you have that problem?

Yep. That will happen, unless you update each map every frame (or as soon as anything within the light frustum changes). But by carefully choosing the priority metrics, it can be minimized. There are also a few tricks we learned by trial and error:

* For example, if you need space in the cache and want to downgrade an existing cached map to a lower resolution, you can simply resample the existing shadowmap without rerendering it (using a nearest point filter). If you do that in gradual steps, users are unlikely to notice the temporal blur. For example, you have a 1024 map cached, but much of the light got occluded, so your priority metrics tell you that a 128x128 map is enough. Now, don't downsample in a single step (1024->128), but do it gradually: 1024->512->256->128, distributed over several frames (see the sketch after this list).

* Another neat trick is gradual resolution increase. Imagine a high priority shadow suddenly coming into view and requesting a huge shadow map. But you don't have the space in the cache right now; you first need to downgrade a few other maps, and rerender the new light. You don't want to do all that in a single frame. What you can do is use a low resolution map even for the high priority light. The shadows will be blurry. The next frame, you render the light view at full resolution, but don't directly replace the map with the hires version. Instead, you quickly increase the resolution over the next frames, until it finally matches the target quality. This works especially well with very bright light sources. It looks like the eye trying to adapt to the bright light, being a little out of focus at first, and getting sharp after a short while.

* A very mean, but highly effective trick: if possible, change the shadow maps when they are out of view. For example: the player moves towards a light source, and the maps are still a bit too lowres. He turns around - the shadows are now out of view, the perfect opportunity to sneak in the new maps without the player noticing - then turns back to the previous view, and magically the previously blurry shadows are now perfectly sharp. No one is ever going to notice the change, unless he is specifically told to watch for the effect.

* When using point lights: from a certain distance on, you can safely switch from a cubemap to a dual paraboloid map. The latter will take considerably less memory, and require less work in case of an update. You can directly convert a hires cubemap to a lowres dual paraboloid map without rerendering the light view.
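
The gradual up/downsampling from the first two points might be scheduled roughly like this; resampleNearest() and rerenderLightView() are hypothetical engine hooks, and the one-step-per-frame policy is just one possible choice:

struct CachedMap {
    int currentRes;   // resolution currently held in the cache
    int targetRes;    // resolution requested by the priority metrics
};

void resampleNearest(CachedMap& m, int newRes);    // point-filtered resample, no rerender
void rerenderLightView(CachedMap& m, int newRes);  // full render of the light view

void stepTowardsTarget(CachedMap& m)   // called once per frame
{
    if (m.currentRes > m.targetRes) {
        resampleNearest(m, m.currentRes / 2);      // e.g. 1024 -> 512 -> 256 -> 128
        m.currentRes /= 2;
    } else if (m.currentRes < m.targetRes) {
        rerenderLightView(m, m.currentRes * 2);    // sharpen gradually over several frames
        m.currentRes *= 2;
    }
}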

Quote:
Original post by Ysaneya
6. How do you quickly render your shadow maps to a texture? I am using a Radeon 9700, and I must admit that even if rendering a 1024x1024 shadow map is quite fast, I hit a bottleneck uploading it to the graphics card. I'm not using the traditional render-to-texture approach, since given the number of lights, it would require one PBuffer per light. Instead I'm using a constant number of PBuffers: I generate my shadow map in the oldest one, and copy its contents into a texture using glCopyTexSubImage2D. How were you doing it?

I use RTT pbuffers only for the large map entries in the cache. I then use a single shared pbuffer for all other, lower resolution entries. Updating the hires textures is a simple render-to-texture operation. For each lowres light to update, I first render the light view into the shared pbuffer, and copy it to the target texture. While the copy operation does cost a little more than a direct RTT, everything stays on the card, nothing runs over the AGP bus. The shared pbuffer will also heavily reduce the number of context switches required.

Also, don't forget that you can tile several smaller 2D maps onto a single larger shadowmap. You just need to be careful with the clamping, and it won't work on cubemaps. It works very well for spot lights however, or for point lights using dual paraboloid maps.
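
As a rough illustration of the tiling idea, here is how per-tile texture coordinates could be remapped; the layout structure and the half-texel clamp margin are assumptions, not the engine's actual scheme:

struct AtlasTile { int x, y, size; };   // tile placement inside the atlas, in texels

// Computes a scale/bias that maps a light's [0,1] shadow UVs into its tile.
// The four values would typically be passed to the fragment program as a
// program parameter: uv_atlas = uv_light * scale + bias.
void tileScaleBias(const AtlasTile& t, int atlasSize, float scaleBias[4])
{
    scaleBias[0] = float(t.size) / float(atlasSize);  // u scale
    scaleBias[1] = float(t.size) / float(atlasSize);  // v scale
    scaleBias[2] = float(t.x)    / float(atlasSize);  // u bias
    scaleBias[3] = float(t.y)    / float(atlasSize);  // v bias
    // Clamp uv_light to [0.5/t.size, 1 - 0.5/t.size] before the remap, so
    // filtering never reads texels from a neighbouring tile.
}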

#10 Ysaneya   Members   -  Reputation: 1235

Posted 12 July 2004 - 09:26 PM

Thanks Yann, these answers have been helpful.

Quote:

Maps assigned to the sun or moon are never cached, as they are view dependent and updated every frame.


OK, so because you are using PSM/TSM, you need to update the sun's shadow map every frame.. that makes sense. But I have no idea how you were able to do it so quickly on a GF4 at 60 fps with scenes of 100 to 300k triangles in view. If you download the PSM demo on NVidia's site (which also has a fairly complex scene), the framerate is more like 20-30 fps, on a GeForce FX. And you're rendering to a 2048x2048 map. Is there a trick to get such speed?

Quote:

From memory, all light sources in the cathedral were point lights, except for the sun


Were all of them per-pixel, or some per-vertex?

Quote:

but their number was mostly important in preprocessing.


What were you preprocessing for your lights?

Quote:

The GeForce card series has dedicated hardware for bilinear filtered shadowmaps


I see, the simplest explanation was the right one. Did you leave the shadows sharp on ATI cards?

Quote:

For example, if you need space in the cache and want to downgrade an existing cached map to a lower resolution, you can simply resample the existing shadowmap without rerendering it


That sounds interesting, but it's not clear to me how it works. If your shadow maps contain depth information, how can you downsample it? The most immediate idea is to render-to-texture (say 512x512) a single quad in ortho mode using the original 1024x1024 texture. But how do you perform the depth-copy operation?

Quote:

if possible, change the shadow maps when they are out of view.


That's a neat idea, and I think I've done it the wrong way in my system so far. When my shadow maps are out of view I simply free them from the cache to make room for other lights. It obviously becomes a nightmare when you turn your head quickly. I need a bit of time to think about it and improve the priorities thing.

Y.


#11 Ingenu   Members   -  Reputation: 827

Posted 12 July 2004 - 10:41 PM

Quote:
Original post by Yann L
The shared pbuffer will also heavily reduce the number of context switches required.


So GL_EXT_render_target and ARB_super_buffer (both yet to be released) would solve that problem and let you set everything as a Render_Target, right?

-* So many things to do, so little time to spend. *-

#12 davidino79   Members   -  Reputation: 156

Posted 12 July 2004 - 11:34 PM

Quote:
Original post by Yann L


Generally, the caching is handled using a priority scheme. Each light has an importance factor associated with it, a visual priority. [...]


Is the priority used to create the shadow map at the correct resolution also used to determine the most influencing light for a geometry chunk?
Thanks, Davide

#13 Yann L   Moderators   -  Reputation: 1794

Posted 13 July 2004 - 06:24 AM

Quote:
Original post by Ysaneya
OK, so because you are using PSM/TSM, you need to update the sun's shadow map every frame.. that makes sense. But I have no idea how you were able to do it so quickly on a GF4 at 60 fps with scenes of 100 to 300k triangles in view. If you download the PSM demo on NVidia's site (which also has a fairly complex scene), the framerate is more like 20-30 fps, on a GeForce FX. And you're rendering to a 2048x2048 map. Is there a trick to get such speed?

The 2048 map is in the new engine; the resolution on the GF4 was probably lower (I can't say for sure, as the resolution selection by the cache manager is dynamic, and depends on the viewpoint. It is also influenced by the framerate - if it drops too much, the resolutions are reduced). The view in the cathedral screenshot is also the most optimal case for our modified PSM: orthogonal light coming from above the viewpoint, and from the side, almost at a 90° angle to the view direction. That's why the shadows still look crisp in that shot - although they aren't really that sharp, the complex geometry they are projected on gives that impression. Most of the lighting complexity actually comes from the simple N*L equation.

About the speed: well, the sun is coming from above and outside of the cathedral. The outside walls all act as occluders, with the upper windows as the only openings. From the light view, a lot of geometry is culled away by the HOM. Furthermore, the inside is rendered to the shadowmap using a lower LOD. There isn't that much geometry to render to the SM; it's more a fillrate issue. That again can be reduced by clever resolution selection, using RTTs on larger maps, and using a very simple fragment pipeline setup in order to reduce bandwidth requirements. Also keep in mind that the depth render pass won't access any textures (except if you want alpha masking), and texture access is often a fillrate bottleneck.

Quote:

Were all of them per-pixel, or some per-vertex?

That's dynamic. Switching from per-pixel to per-vertex lighting is part of the light LOD system.

Quote:

What were you preprocessing for your lights?

For standard dynamic lights, nothing except their influence bounding box. For virtual lights (a simple radiosity look-alike simulation), ambient shadows were approximated in a preprocess. It was then decided whether the light required a realtime shadow, or whether the light transfer could be prestored as directional incidence components per vertex (read: a small cubemap per vertex). That's similar to spherical harmonics, only that incident light is stored over only a few quantized directions. AFAIR, we used 8 directions per vertex, encoded in two vertex streams as RGBA - one direction per component. The decoding is done in a vertex program. This system gives you more or less dynamic ambient light, i.e. for simulating the diffuse indirect light from a moving sun.
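
As a purely illustrative reading of the scheme above, the decode could look like the following C++ (on the GPU it would be a vertex program). The eight cube-corner directions, the dot-product weighting and all names are assumptions:

#include <algorithm>

struct Vec3 { float x, y, z; };
inline float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Assumed quantization: the eight normalized cube corners.
const float k = 0.57735f;   // 1/sqrt(3)
const Vec3 kQuantDirs[8] = {
    { k, k, k}, { k, k,-k}, { k,-k, k}, { k,-k,-k},
    {-k, k, k}, {-k, k,-k}, {-k,-k, k}, {-k,-k,-k}
};

// incidence[i]: prestored transfer for direction i, unpacked from the two
// RGBA vertex streams. dirLight[i]: current dynamic light arriving from
// direction i, e.g. evaluated from the moving sun each frame.
float ambientTerm(const float incidence[8], const float dirLight[8], const Vec3& n)
{
    float result = 0.0f;
    for (int i = 0; i < 8; ++i)
        result += incidence[i] * dirLight[i] * std::max(dot(n, kQuantDirs[i]), 0.0f);
    return result;
}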

Back then, this system was used per vertex. Later on, we added directional indirect light incidence maps (DILIMs), which take the per vertex concept to the per-pixel level and allow much better precision. The system gives visual results similar to SH lighting, but doesn't have the low-frequency cutoff problem of spherical harmonics: it can store very sharp but still perfectly dynamic ambience light. Imagine each DILIM texel as a small cubemap storing the incident light over a few quantized directions.

Also, the concept of virtual lights was dropped, in favour of a special PR radiosity / photon mapper combo that creates the DILIMs directly (basically by storing directional incidence information per lightmap texel, instead of a simple RGB illumination value). The result of either the per-vertex directional information (as used in the cathedral) or the DILIMs is then combined with the light from the fully dynamic sources. This gives a very pleasing combo of fully dynamic light sources and a GI ambience.

Quote:

I see, the simplest explanation was the right one. Did you leave the shadows sharp on ATI cards?

Yep, in the non-ARB_FP path, the shadows are non-filtered on ATI cards.

Quote:

That sounds interesting, but it's not clear to me how it works. If your shadow maps contain depth information, how can you downsample it? The most immediate idea is to render-to-texture (say 512x512) a single quad in ortho mode using the original 1024x1024 texture. But how do you perform the depth-copy operation?

What do you mean by depth-copy operation? You simply render the texture values into the depth component (or another channel, depending on your shadow map format) using a fragment shader. The result is the same as you would get if you rendered normal geometry. We didn't use that system on the GF4, because it would be rather tricky to make it work with a texture shader setup (although not impossible). It really depends on the texture data format you use for your shadow maps. What format do you use? A real depth format, a packed RGBA, a floating point texture?

Quote:

That's a neat idea, and I think I've done it the wrong way in my system so far. When my shadow maps are out of view I simply free them from the cache to make room for other lights. It obviously becomes a nightmare when you turn your head quickly. I need a bit of time to think about it and improve the priorities thing.

Only throw out maps based on frustum culling information if absolutely required (i.e. if the alternative would be a cache overflow). In typical FPS games, the player will constantly turn the head around, so "it's behind the camera" is a bad metric. Occlusion is a much better and more reliable parameter.

Quote:

So GL_EXT_render_target and ARB_super_buffer (both yet to be released) would solve that problem and let you set everything as a Render_Target, right?

Yep.

Quote:

Is the priority used to create the shadow map at the correct resolution also used to determine the most influencing light for a geometry chunk?

Not directly. Finding the most influencing light per geometry chunk is an object space operation, based on distance, light intensity and attenuation. The priority metric, however, is a purely eye space and view dependent operation. While it also uses distance and intensity as factors, it can take additional view dependent factors such as screenspace coverage and occlusion into account.

#14 Ysaneya   Members   -  Reputation: 1235

Posted 13 July 2004 - 08:15 AM

Quote:

Furthermore, the inside is rendered to the shadowmap using a lower LOD.


Thanks, I think that's the main reason why I'm getting low framerates. I delayed the implementation of LOD until after the shadowing cache, but I can see now that I can't get a good idea of the final performance without it. It's not rare for me to have to render up to 500k polys when updating a single shadow map.

Quote:

Also, the concept of virtual lights was dropped, in favour of a special PR radiosity / photon mapper combo


That's what I had in mind too, as you get the benefits of dynamic lighting/shadowing, but can also have static lights with soft shadows (thanks to the lightmap filtering) and ambient lighting.

Quote:

Yep, in the non-ARB_FP path, the shadows are non-filtered on ATI cards.


And in the ARB_FP path? Are you antialiasing them in the fragment shader? If so, how many samples, and did you notice a large performance drop? I also tried dithering when using a low number of samples (1 or 4), but it looked quite ugly.

Quote:

What do you mean by depth-copy operation? You simply render the texture values into the depth component (or another channel, depending on your shadow map format) using a fragment shader.


But can you really do that? I read that accessing the depth component of a texture in a pixel shader is truncated to 8 bits on NVidia cards.

I'm using ARB_depth_texture for spotlights, and a cube map with depth encoded in RGBA for my point lights.

Another question before I forget: I am assuming you were using a cube map with depth encoded in RGBA for point lights (as depth cube maps aren't supported), but doesn't that mean you also had sharp edges (no hardware PCF like with depth 2D textures on NVidia cards)?

Y.

#15 Yann L   Moderators   -  Reputation: 1794

Posted 15 July 2004 - 09:45 AM

Quote:
Original post by Ysaneya
Thanks, I think that's the main reason why I'm getting low framerates. I delayed the implementation of LOD until after the shadowing cache, but I can see now that I can't get a good idea of the final performance without it. It's not rare for me to have to render up to 500k polys when updating a single shadow map.

LOD does a lot, obviously. You can also use image based rendering techniques during the depth map creation. Parallax errors tend to be much less visible in a shadow map than they would be in an actual rendering. Also, aggressive occlusion culling can help a lot.

The human brain mainly uses shadows as cues to evaluate the depth and relative positioning of objects in an environment. However, the brain is no raytracer - when the environment is complex enough, it can't determine the projective accuracy of each single shadow. It will directly spot artifacts such as holes or light bleeding, but it will have a very hard time spotting small perspective anomalies and parallax errors in the shadows. As long as the global shadowing is mostly correct, the scene will look perfectly natural. Take advantage of this.

Quote:

And in the ARB_FP path? Are you antialiasing them in the fragment shader? If so, how many samples, and did you notice a large performance drop?

We have different fragment programs with different numbers of samples, and the system selects one based on a user preference setting. Obviously there is a performance drop, and unfortunately a significant one when using many samples. That's the price you pay for quality. But 4 samples are acceptable in terms of speed, and are comparable to nvidia's hardware PCF from a quality point of view. It doesn't look very nice if the map resolution is low, but at higher resolutions the quality is generally OK in most scenes. If people have more powerful hardware, they can increase the sample count.

Quote:

I also tried dithering when using a low number of samples (1 or 4), but it looked quite ugly.

Hmm. How exactly did you do the sampling? I suspect something specific, but, hmm, can you post the relevant fragment program snippet?

Quote:

But can you really do that? I read that accessing the depth component of a texture in a pixel shader is truncated to 8 bits on NVidia cards.

Yep, z-depth maps will be treated as 8bit monochrome textures when bound as a colour texture. But there seems to be a little misunderstanding due to my sloppy terminology: when I said "depth map", I was referring to your texture containing depth values, I didn't mean the GL_DEPTH_COMPONENTxx format specifically. No, you can't directly render a GL_DEPTH texture to the depth buffer without going through an intermediate format (e.g. as a packed RGBA or floating point depth map, it can be rendered to the "real" zbuffer using a fragment program). If you keep everything in packed RGBA, then downsampling the map on the hardware is rather straightforward. Don't forget to turn off bilinear filtering, though.
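
One common way to pack a [0,1) depth value into the four channels of an RGBA8 texture, in the spirit of the packed RGBA format discussed above (not necessarily the exact encoding used here); written as C++ floats for clarity, although on the GPU the same arithmetic runs in the fragment program:

#include <cmath>

static float frac(float x) { return x - std::floor(x); }

void packDepth(float depth, float rgba[4])
{
    // Four base-256 "digits" of the depth value...
    rgba[0] = frac(depth);
    rgba[1] = frac(depth * 256.0f);
    rgba[2] = frac(depth * 65536.0f);
    rgba[3] = frac(depth * 16777216.0f);
    // ...minus the part already carried by the next channel.
    rgba[0] -= rgba[1] / 256.0f;
    rgba[1] -= rgba[2] / 256.0f;
    rgba[2] -= rgba[3] / 256.0f;
}

float unpackDepth(const float rgba[4])
{
    return rgba[0] + rgba[1] / 256.0f + rgba[2] / 65536.0f + rgba[3] / 16777216.0f;
}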

Quote:

I'm using ARB_depth_texture for spotlights, and a cube map with depth encoded in RGBA for my point lights.

OK. In that case, the hardware supported downsampling will be a little more involved, and might require a readback operation.

Quote:

Another question before I forget: I am assuming you were using a cube map with depth encoded in RGBA for point lights (as depth cube maps aren't supported),

On the ARB_FP path, yes. But not on the GF3/4 path, as this hardware can't unpack the texture as needed. At first, I simulated depth cubemaps manually, by projecting six separate 2D maps and recombining the results. That added a lot of complexity to the render engine, and was less than optimal (it was the system used on the cathedral shot, and a main reason why I ran out of texture units on the GF4). I later dropped cubemaps entirely in favour of dual paraboloid maps. Even later, when I added the final ARB_FP code path, I used cubemaps again, with optional DP maps as a fallback option and as part of the LOD system.

Quote:

but doesn't that mean you also had sharp edges (no hardware PCF like with depth 2D textures on NVidia cards)?

Well, on GeForce FX and above, the ARB_FP path is taken, just as with ATI cards - the filtering is then done in the fragment program, so the format doesn't really matter (although I use GL_DEPTH maps wherever possible on NV hardware, because the hardware PCF is faster - unless the user selects a high shadow quality setting).

On GF3/4, some variation of GL_DEPTH is used, projected in various ways depending on the light type. Antialiasing is then done by nvidia's hardware filter.


#16 Thorris   Members   -  Reputation: 122

Posted 15 July 2004 - 10:12 AM

Hello,

Quote:
Also, aggressive occlusion culling can help a lot.


Do you use occlusion culling for all shadow maps, or maybe just for the sun shadow map? I know you use a software rasterizer to create the occlusion maps, but even if you take advantage of CPU/GPU concurrency, this seems to be a big overhead.

Maybe when traversing the tree you cull against the bounding volume of the light. This could save a bit of work.

Ciao, Thorris.

#17 Yann L   Moderators   -  Reputation: 1794

Posted 15 July 2004 - 10:32 AM

Quote:
Original post by Thorris
Do you use occlusion culling for all shadow maps, or maybe just for the sun shadow map?

Always for the sun map. For the other lights, it depends. By default, the system will render an occlusion map per light. It then monitors the efficiency of the culling, i.e. it maintains a simple culled nodes to total nodes ratio over some frames. If the ratio drops below some threshold, then the occlusion culling is inefficient, and the system stops performing it on that light - until the light source moves, or some major geometry changes in its influence volume, at which point the system will try again.
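
A sketch of what such an efficiency monitor could amount to; the threshold value and the reset policy are invented:

struct LightCullingState {
    int  culledNodes;    // accumulated over the last few frames
    int  totalNodes;
    bool useOcclusion;
};

void evaluateCullingEfficiency(LightCullingState& s, float threshold)
{
    if (s.totalNodes > 0) {
        float ratio = float(s.culledNodes) / float(s.totalNodes);
        // Too little gets culled: stop occlusion culling for this light until
        // it moves or major geometry changes in its influence volume.
        if (ratio < threshold)
            s.useOcclusion = false;
    }
    s.culledNodes = 0;
    s.totalNodes = 0;
}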

Quote:
Original post by Thorris
Maybe when traversing the tree you cull against the bounding volume of the light. This could save a bit of work.

For point and spot lights, this influence distance clipping is implicitly done by the far plane of the light frustum.

#18 Ysaneya   Members   -  Reputation: 1235

Posted 15 July 2004 - 11:19 AM

Quote:
Original post by Yann L
the price you pay for quality. But 4 samples are acceptable in terms of speed, and are comparable to nvidia's hardware PCF from a quality point of view.


Comparable to hardware PCF? My experience has been the opposite, but I've only tested my code on ATI cards, so I don't know if the result on NVidia cards looks similar or not (note to self: take some time to test it). My logic tells me it should, as I'm only using the ARB_fp path. I only have 4 shades of brightness in my shadow edges (with 4 samples, that is), and from what I saw, NVidia's PCF is similar to bilinear filtering the results (it looks nice and smooth). Unless you are looking at the shadows from a distance - if you zoom in, even 4 samples look quite ugly. 8 samples is a bit better, but you can still notice the sampling. 16 samples is almost perfect but horribly slow. Do you have some screenshots with 4 samples so that I can compare with what I got?

Quote:
Original post by Yann L
Hmm. How exactly did you do the sampling? I suspect something specific, but, hmm, can you post the relevant fragment program snippet?


The sampling for dithering is done in eye space. That's one of the reasons why it looks ugly. I basically have a small noise texture (a pattern tiled many times) that is used to randomly offset the tex coords after the projection of the shadow map.

I'm only posting the relevant parts of the shader:


# dithering
TEMP texc;
TEMP screenPos;
TEMP dither;

# Variant 1: random offsets from the tiled noise texture in texture[1].
# texcoord 1 contains the tex coords for shadow projection.
TXP dither, fragment.texcoord[1], texture[1], 2D;
MAD dither, dither, 2.0, -1.0;    # remap noise from [0,1] to [-1,1]
MUL dither, dither, 0.0005;       # scale down to a small texcoord offset

# Project the shadow coords and derive a screen-space position from them.
RCP texc.w, fragment.texcoord[1].w;
MUL texc, fragment.texcoord[1], texc.w;
MUL screenPos.x, texc.x, 400.0;
MUL screenPos.y, texc.y, 300.0;
FRC screenPos, screenPos;         # fractional part gives a repeating pattern

# Variant 2: regular 2x2 checkerboard pattern (this overwrites variant 1).
SGE dither, screenPos, 0.5;
ADD dither.y, dither.y, dither.x;
SGE dither.z, dither.y, 1.1;
SUB dither.z, 1.0, dither.z;
MUL dither.y, dither.y, dither.z;
MUL dither, dither, 0.0005;       # scale down to a small texcoord offset


After this, "dither" contains an offset that is applied when sampling the shadow 4 times. I also tried a regular pattern (not so random), but the results are always ugly, and the performance drop is tremendous.

Quote:
Original post by Yann L
misunderstanding due to my sloppy terminology: when I said "depth map", I was referring to your texture containing depth values, I didn't mean the GL_DEPTH_COMPONENTxx format specifically. No, you can't directly render a GL_DEPTH texture to the depth buffer without going through an intermediate format (e.g. as a packed RGBA or floating point depth map, it can be rendered to the "real" zbuffer using a fragment program). If you keep everything in packed RGBA, then downsampling the map on the hardware is rather straightforward. Don't forget to turn off bilinear filtering, though.


True, but then you don't benefit from NVidia's hardware PCF. All of this is becoming a bit confusing :) So as I understand it, since in your cathedral you were using hardware PCF, you were not able to use that shadow map resizing trick - or am I even more confused than I thought :p ? Or is there a way to enable hardware PCF in a pixel shader (doubtful)?

Quote:
Original post by Yann L
OK. In that case, the hardware supported downsampling will be a little more involved, and might require a readback operation.


I think I'll just switch all my shadow maps to pixel shaders with depth encoded as RGBA. That way I'll have a single path for all cards. That leaves the question of compatibility with older cards like the GF3/GF4: can all of these shaders be implemented with NV pixel shaders?

Quote:
Original post by Yann L
hardware can't unpack the texture as needed. At first, I simulated depth cubemaps manually, by projecting six separate 2D maps and recombining the results.


I also tried that, but the performance drop is quite impressive too...

Quote:
Original post by Yann L
I later dropped cubemaps entirely in favour of dual paraboloid maps. Even later, when I added the final ARB_FP code path, I used cubemaps again, with optional DP maps as a fallback option and as part of the LOD system.


That makes me wonder.. I know the theory behind DP maps, but would you say they are really worth the effort? As I understand it, you need a pretty highly tessellated scene in order to avoid the artifacts due to the texture coordinate interpolation, which should no longer be linear.

Y.


#19 Yann L   Moderators   -  Reputation: 1794

Posted 15 July 2004 - 01:47 PM

Quote:
Original post by Ysaneya
Comparable to hardware PCF? My experience has been the opposite, but I've only tested my code on ATI cards, so I don't know if the result on NVidia cards looks similar or not (note to self: take some time to test it). My logic tells me it should, as I'm only using the ARB_fp path. I only have 4 shades of brightness in my shadow edges (with 4 samples, that is), and from what I saw, NVidia's PCF is similar to bilinear filtering the results (it looks nice and smooth).

Ah, that's exactly what I was suspecting in my post above :) OK, the problem is terminology again, or rather our loose use of the term "PCF". There is "real" PCF, and there is fake PCF. What you are (probably) doing is the real thing: at each fragment, take x samples and average the result. Of course, if you only take 4 samples, you'll get at most 4 shades of grey, which isn't enough. I agree that you need more samples for real PCF; that's what the "high quality" setting I mentioned in my post does - and that's why I'm disabling nvidia style PCF when using that setting.

Now, nVidia does something very different. It's not percentage closer filtering at all, it's just a cheap approximation. It also uses 4 samples, hence our little misunderstanding above :) In fact, nvidia is just doing bilinear filtering. They take four depth comparisons per fragment, at the four shadowmap texel corners. Then they compute the fractional position of the current fragment within the shadow texel (in both u and v directions), and linearly interpolate between the results of the four corner comparisons. Basically, it's just bilinear filtering of four 1-sample shadow comparisons. You can simulate that behaviour in your pixel shader, and you'll get results similar to the hardware fake-PCF nvidia uses. It's also pretty fast, as it only uses 4 depthmap samples. Compared to real PCF with more samples, the quality will be worse, obviously.
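
To make the described behaviour concrete, here is the same math as plain C++ (in a real renderer it would sit in the fragment program); sampleDepth() is a hypothetical point-sampled fetch from the shadowmap:

#include <cmath>

float sampleDepth(int x, int y);   // point-sampled shadowmap texel fetch

float fakePCF(float u, float v, float fragDepth, int mapSize)
{
    float tu = u * mapSize - 0.5f;   // position in texel units
    float tv = v * mapSize - 0.5f;
    int   x  = (int)std::floor(tu);
    int   y  = (int)std::floor(tv);
    float fx = tu - x;               // fractional position inside the texel
    float fy = tv - y;

    // One binary depth comparison at each of the four surrounding texels.
    float s00 = fragDepth <= sampleDepth(x,     y    ) ? 1.0f : 0.0f;
    float s10 = fragDepth <= sampleDepth(x + 1, y    ) ? 1.0f : 0.0f;
    float s01 = fragDepth <= sampleDepth(x,     y + 1) ? 1.0f : 0.0f;
    float s11 = fragDepth <= sampleDepth(x + 1, y + 1) ? 1.0f : 0.0f;

    // Bilinearly blend the four comparison results.
    float top    = s00 + (s10 - s00) * fx;
    float bottom = s01 + (s11 - s01) * fx;
    return top + (bottom - top) * fy;
}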

Quote:
Original post by Ysaneya
True, but then you don't benefit from NVidia's hardware PCF. All of this is becoming a bit confusing :) So as I understand it, since in your cathedral you were using hardware PCF, you were not able to use that shadow map resizing trick - or am I even more confused than I thought :p ?

That's correct. We use that trick in later engine revisions.

Quote:

Or is there a way to enable hardware PCF in a pixel shader (doubtful)?

Well, you can simulate it within the shader, but you can't control the hardware feature on a per-pixel basis.

Quote:

I think I'll just switch all my shadow maps to pixel shaders with depth encoded as RGBA. That way I'll have a single path for all cards.

That's a possibility.

Quote:

That leaves the question of compatibility with older cards like the GF3/GF4: can all of these shaders be implemented with NV pixel shaders?

Nope. GF3/4 don't do pixel shaders at all. They have "texture shaders", which are basically a set of predefined fragment programs encoded in the GPU. You can select and combine those programs, but this is rather limited. You can't unpack an RGBA encoded depthmap, and you can't do PCF in a shader. With a GF3/4, your only real option is to use GL_DEPTH maps and the built-in hardware fake-PCF. Native cubemaps are then no longer an option.

Quote:

That makes me wonder.. I know the theory behind DP maps, but would you say they are really worth the effort? As I understand it, you need a pretty highly tessellated scene in order to avoid the artifacts due to the texture coordinate interpolation, which should no longer be linear.

That depends on how you define high tessellation. I never encountered any major problems in our engine, but our scenes are in fact pretty well tessellated. There are also some tricks to optimize the technique and partially avoid the tessellation issue. Here are some interesting notes.

Whether they are worth the effort or not highly depends on your engine, and on what kind of scenes you apply the shadows to. In our case, it was worth it - but your mileage may vary. They're the only viable option to get point lights on GF3/4 type hardware (due to the lack of shadow cubemaps), so that's a plus point. If you restrict your codepath to DX9 type hardware, then cubemaps will most certainly be easier and more versatile. If you have to increase your scene resolution just to fit DP mapping, then forget it. But if it works without changing the scene data, then you should give them a try.

#20 Ysaneya   Members   -  Reputation: 1235

Posted 15 July 2004 - 11:03 PM

Quote:
Original post by Yann L
the problem is terminology again, or rather our loose use of the term "PCF". There is "real" PCF, and there is fake PCF. What you


I know :) When I mention "hardware PCF" for NVidia cards, I'm actually thinking of the bilinear filtering trick to make the shadow edges smoother. It's not real PCF, for sure.

Quote:
Original post by Yann L
In fact, nvidia is just doing bilinear filtering. They take four depth comparisons per fragment, at the four shadowmap texel corners. Then they compute the fractional position of the current fragment within the shadow texel (in both u and v directions), and linearly interpolate between the results of the four corner comparisons. Basically, it's just bilinear filtering of four 1-sample shadow comparisons. You can simulate that behaviour in your pixel shader, and you'll get results similar to the hardware fake-PCF nvidia uses. It's also pretty fast, as it only uses 4 depthmap samples. Compared to real PCF with more samples, the quality will be worse, obviously.


That does sound interesting, but I'm not sure I see how you can do that in a pixel shader.

Here's what I've currently got:

[screenshots]

Particularly in that last shot, you can see the shades of gray with 4 samples.. it looks pretty good from a distance (first screen), but quite ugly when you start to zoom. I'll eventually switch to TSMs for directional lights, but I'll still use antialiased shadow maps for spot or omni lights, so I'd like to fix that problem without having to use > 8 samples.

Speaking of TSMs, I've read the papers about them, but how well do they handle frustums with an infinite (or at least very far) far clipping plane?

Quote:
Original post by Yann L
That depends on how you define high tessellation. I never encountered any major problems in our engine, but our scenes are in fact pretty well tessellated. There are also some tricks to optimize the technique and partially avoid the tessellation issue. Here are some interesting notes.


I know - Tom is an old colleague of mine, so I'm pretty aware of his work on DPSMs :) But I remember him saying it was only useful when the scene is already pretty well tessellated. I'm not sure I can guarantee that, but when I have some time I'll just test it and see if it's good enough.

Y.




