About multisample

  1. multisample

    deferred shadows

    Doh, I have ShaderX5 (I used it as a reference for CSM, good article btw), but I guess I missed the shadow mask/collector terminology. I thought I remembered the book discussing it, but not actually naming it. My bad. BTW, thanks for the response. I probably owe you a beer by now.
  2. multisample

    deferred shadows

    > so we are talking about deferred shadows or deferred rendering systems? One is independent from the other...

    Sorry, I guess I wasn't clear enough. Yes, I was talking about deferred shadows and NOT transparent shadow maps and the like. And obviously you don't want translucent objects to cast shadows in standard shadow mapping. But I usually want them to receive shadows. In my forward rendering system, translucent objects receive shadows, with the exception of particles. So if I have a glass window lit by the sun, it will get proper shadows from opaque shadow casters. But what happens when I use a deferred shadow mask/collector? The mask can only contain the result for the opaque objects, as the z-pass depth has only opaque depths. So my question is, what do developers use in this case?

    1) No shadow received on translucent objects (ignore the shadow mask, shadowAtten = 1).
    2) Use the shadow mask that was calculated for the opaque object behind it.
    3) Mix of 1 and 2 (let the artist choose per material instance).
    4) Use the actual/original shadow map (defeats some of the purpose and benefits, and couples shadow sampling back into the shader for translucents).
    5) Other... ???

    I tested #2 and it's generally pretty good, depending on the type of translucent object: if it's just a layer above an opaque object, it works well. For other items, not so good. So I wonder if #3 is a reasonable option.

    wolf: I assume a shadow collector is the same as a shadow mask? As this terminology is not really documented, I would like to know for sure. As for CSM: yeah, it seems it's the way to go for large-range lights. I am using it for a directional light across a city. The only issue I had was not using an array index in the shader for calculating the right UV inside the atlas, because certain hardware turns indexing into gobs of cmp instructions.
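    To make options 1-4 above concrete, here is a small CPU-side sketch of the per-material choice (option 3). The names (ShadowMode, shadow_atten) are made up for illustration; in a real engine this would be a material flag read in the translucent pass.

```python
# Sketch of option 3: let each translucent material choose how it
# receives shadows when lighting is driven by a deferred shadow mask.
# All names here are illustrative, not from any particular engine.
from enum import Enum

class ShadowMode(Enum):
    UNSHADOWED = 0   # option 1: ignore the mask, atten = 1
    USE_MASK = 1     # option 2: reuse the opaque pixel's mask value
    SAMPLE_MAP = 2   # option 4: fall back to real shadow-map sampling

def shadow_atten(mode, mask_value, sample_shadow_map):
    if mode is ShadowMode.UNSHADOWED:
        return 1.0
    if mode is ShadowMode.USE_MASK:
        return mask_value
    # Expensive path: couples shadow sampling back into the shader.
    return sample_shadow_map()

# A decal-like layer sitting right on an opaque wall can safely
# reuse the mask value computed for the wall behind it:
print(shadow_atten(ShadowMode.USE_MASK, 0.25, lambda: 0.3))  # 0.25
```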
  3. multisample

    deferred shadows

    Yeah, I was guessing that they probably didn't shadow them. Most things I have read have either skipped the topic or just said it doesn't quite work. I am still interested in what different solutions those that use it attempt, so anyone else using it, please chime in. I just tested deferred CSM and found that in most cases using the computed opaque shadow value for the transparent objects works (particles not included; I don't usually shadow those directly anyway due to perf costs). Of course this depends on how you create your transparent objects. If they are just a layer above or in front of an opaque surface, they will probably shadow correctly. If not, it will be wrong. I guess you could specify per material whether to use the shadow value or not. The traditional way for blended objects is still an option, but I think it defeats the main reasons for the technique (simplified main shaders, independent shadow sampling, and reduced shadow buffer memory if buffer reuse is an option).
  4. Don't assume that passing in an interpolator is going to be any less expensive. In fact, if you talk about just shader performance (not counting shader patching cost), it's probably slower to use an interpolator. If you end up having to patch even one constant, you might as well patch them all. If you are worried about PS3 and shader patching, then I suggest you look into the latest tricks to get around the patching issues; a little bird told me it's on their devsite. If you're not on PS3, then there's not much you can do about it for those cards. Just let the driver writers deal with that.
  5. multisample

    Variance Shadow Maps

    Quote: Original post by krausest
    Another thing I noticed is that when using GL_RGBA16F and storing the moments in two components to enhance precision I see way more artefacts when using GL_LINEAR filtering (like "random" white pixels) than when doing linear interpolation in the shader and using GL_NEAREST. Sounds like a bug or are that expected precision problems? Thanks!

    If you're using NVIDIA cards (7 series or lower; I can't speak for the 8 series), then this is "normal" behaviour. AFAIK, NVIDIA cards interpolate using the bit depth that the texture is in, so an FP16 texture uses FP16 for interpolation, causing the artifacts. I sure hope that's changed for the 8 series. I don't believe recent ATI cards do this, but don't quote me on that. This issue is the main reason I had to dump VSM as a general solution (at least for now). I really want to use VSM, but it has serious problems with FP16-based formats, and that's all most cards (NVIDIA) can do respectably these days. Add a large light range to the equation (for city levels etc.) and it gets worse. If someone solves this it would be much appreciated (free beer? :) ).
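    As a rough illustration of the precision problem, here is a CPU-side sketch that quantizes the two VSM moments to half precision (via Python's struct 'e' format) and shows how little headroom is left for the variance term. This is only a numeric sketch of the rounding behaviour, not a claim about any specific GPU's filtering path.

```python
# Sketch: why storing depth^2 in fp16 hurts VSM. A half float has a
# 10-bit mantissa, so both moments get rounded, and the variance
# E[x^2] - E[x]^2 is a small difference of two large rounded numbers.
import struct

def to_fp16(x):
    # Round-trip a Python float through IEEE 754 half precision.
    return struct.unpack('e', struct.pack('e', x))[0]

d = 0.87654321
m1, m2 = to_fp16(d), to_fp16(d * d)   # the two stored moments
variance = m2 - m1 * m1               # noisy; can even go negative
print(m1, m2, variance)
# With full precision the variance of a single depth is exactly 0.
```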
  6. You need to declare your output structure as an "output", as in:

    // note I added "out" to tell the compiler this is an output
    void vs(a2v InVS, out v2p OutVP)

    Another option is to just return the structure:

    v2p vs(a2v InVS)
    {
        v2p OutVP;
        // ... paste old code
        return OutVP;
    }
  7. Quote: Original post by jamesw
    Hmm, that's strange. Like you said, the GPU programming guide says that D3DFMT_G16R16 is not renderable, but the caps viewer on my machine says it is. Anyone actually tried this out on a geforce 6/7?

    Sigh, I hate the caps stuff... I believe it can have the "usage" of render target (i.e. D3DUSAGE_RENDERTARGET), but it's not a render target "format". In the caps viewer for this laptop (NVIDIA 6800 Go), D3DFMT_G16R16 is not listed under "Render Target Formats", but it is listed as a texture format that can have RenderTarget usage. I believe all this means is that it can be used as a source or dest when doing a StretchRect from texture to texture. I could definitely be wrong about this, so you may want to ask NVIDIA. My hunch is to believe the table in the GPU Programming Guide though.
  8. Quote: Both D3DFMT_G16R16F and D3DFMT_G16R16 are renderable and filterable on my 6800GT.

    Actually, only fp16x2 is renderable on the 6/7 series (according to NVIDIA's GPU Programming Guide and my experience). Both formats are filterable. NVIDIA could be using fp16x4 behind our backs, but I don't think they are. AFAIK, filtering on NVIDIA's hardware happens at the same bit precision as the source, so fp16 is filtered using fp16, rgba8 using rgba8, and fp32 as fp32. Hence filtering results can be less than spectacular when you filter outside the range/precision of the source.

    Quote: Original post by AndyTX
    ATI has quite a few papers on this. Indeed the goal is to try and further hide the discrete nature of the shadow map - it works well with enough samples. On ATI you can use Fetch4 to do quite a few samples and get a pretty good result actually. Regarding bilinear weights, you should probably always use them. Otherwise you're going to be dealing with only N possible contributions... unless you're taking 256 samples or more (unlikely), this is quite undesirable.

    I haven't found many papers that use only around 4-6 samples, which is what I am finding acceptable for what I need (multiple lights, multiple shadows, current gen). If I had only one shadow, the 12-sample version might be doable. Anyone have a reference to any lower-sample rotated/jittered versions that look good? As for the bilinear weights, I totally agree. I messed around with this a bit and I personally think it's almost essential. Someone could probably work out the 4x4 dithered version to use bilinear weights for each pixel and get good results. At a decent distance the 4x4 dithered looks pretty good (due to the "noise" factor), so a bilinear-weighted version may work well.

    As for VSM and fp16, I was fighting the filtering problems (I could live with manual filtering if I can blur them), but I still had problems with medium to larger light ranges. I believe it's due to the "square" of the depth, since it requires much more precision than the depth itself. I was wondering if the Chebyshev inequality would work with depth and sqrt(depth), as that might give better results (in this case treating sqrt(depth) as the random variable, and depth as the "square" of the random variable). You would probably still have issues though. What were your results with light ranges that were acceptable (relative to your smallest shadowable range)? Maybe I should move this to another thread, as I think I may have hijacked it a bit :)
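    For anyone following along, the Chebyshev bound being discussed is the standard one-tailed form used in VSM: given the two stored moments, p = variance / (variance + (t - mean)^2) upper-bounds the unshadowed fraction of the filter region at receiver depth t. A minimal CPU-side sketch:

```python
# One-tailed Chebyshev upper bound as used in variance shadow maps.
# m1 = E[depth], m2 = E[depth^2] (the two moments stored in the map);
# t is the receiver depth. min_variance clamps fp rounding noise.
def chebyshev_upper_bound(m1, m2, t, min_variance=1e-5):
    if t <= m1:
        return 1.0                        # receiver in front of occluders
    variance = max(m2 - m1 * m1, min_variance)
    d = t - m1
    return variance / (variance + d * d)

# A tight depth distribution (near-zero variance) gives hard shadows:
print(chebyshev_upper_bound(0.3, 0.3 * 0.3 + 1e-5, 0.6))  # ~0.0001
```

    Swapping in sqrt(depth) as the random variable, as speculated above, would just mean storing sqrt(d) and d as the two moments and evaluating the same bound at sqrt(t).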
  9. I've found manual bilinear 3x3 PCF (weighted PCF, not simple averaging) gives very good results even with a large magnification of a small shadow map. It's quite expensive though, as it takes 9 samples. It looks much better than some limited VSM blurs, but is most likely much more expensive; better VSM blurs (which require more time) would probably be just as good, minus any light-bleeding artifacts. NOTE: my VSM experience is somewhat limited. I got it working, but spent most of the time dealing with 16-bit fp precision issues on NVIDIA hardware for moderate to long ranges (even storing -1..1, and trying fp16x4). I also had issues with NVIDIA's fp16 texture filtering causing artifacts compared to manual bilinear filtering with "large" light ranges, and haven't had time to get back to VSM due to other work. fp32x2 works quite well, but is sloooooowww on most current cards.

    You could probably get similar results with 3-4 weighted hardware bilinear PCF lookups (NVIDIA cards only), offset from the original sample points. A 4x4 dithered average PCF (only 4 samples per pixel; mentioned in one of the Graphics Gems books, I think) looks good for less work, but still has some visible blockiness in the worst situations due to the averaging. It does require the screen position to calculate the sample offsets, so it's still got a few more instructions than a standard 4-sample filter. I hear of people using rotated sample locations (4 samples or more) to good effect, but I haven't tried it much. If anyone (wolf?) wishes to speak about their usage/attempts at this, it would be welcome by me as well. Anything that is ordered (such as averaging PCF) will probably look "blockier" unless jittered or rotated. The bilinear weighting helps this to some extent compared to simple averaging, but it's still not as good as jittered/rotated. I'm no expert on this subject, so you can always just try things. There are a few papers out there discussing and comparing different PCF techniques, Poisson disc filters, and the like. All that said, I would also keep working on VSM. I think it has good potential, and anything that could reduce texture memory usage for shadows (i.e. store at lower resolutions for comparable quality) will help speed tremendously.
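    The difference between weighted PCF and simple averaging comes down to weighting each depth comparison by the bilinear footprint before summing. A CPU-side sketch of the 2x2 core of the idea (the 3x3 version mentioned above just extends the same weighting to a larger kernel):

```python
# Bilinear-weighted PCF sketch: each depth COMPARISON result is
# weighted by the bilinear footprint, like a hand-rolled bilinear
# filter over lit/shadowed results rather than over depths.
import math

def pcf_bilinear_2x2(shadow_map, u, v, receiver_depth):
    """shadow_map: 2D list of stored depths; (u, v) in texel space."""
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    fx, fy = u - x0, v - y0
    result = 0.0
    for dx, dy, w in [(0, 0, (1 - fx) * (1 - fy)),
                      (1, 0, fx * (1 - fy)),
                      (0, 1, (1 - fx) * fy),
                      (1, 1, fx * fy)]:
        lit = 1.0 if receiver_depth <= shadow_map[y0 + dy][x0 + dx] else 0.0
        result += w * lit
    return result

sm = [[0.5, 0.5], [0.9, 0.9]]
# Sample exactly between the two rows: half lit, half shadowed.
print(pcf_bilinear_2x2(sm, 0.0, 0.5, 0.7))  # 0.5
```

    Simple averaging would return only 0, 0.25, 0.5, 0.75, or 1.0 here; the weights give a continuous gradient as the sample point moves, which is why the result looks less blocky.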
  10. multisample

    Fast Box Filtering for Soft Shadows

    Might as well start a new discussion, as I am sure other people (including myself) would be interested. Would you be storing 2 depths and 2 depths squared, or just using the second depth as a reference depth? Or is it more complicated than that? (Not sure why you would need more than 2, but I'm no shadow expert... yet :) )
  11. 20-30 lights per pixel would be expensive on almost any rendering method (deferred or forward) at the moment. I don't want to get into a deferred/forward debate in this thread though; we can do that elsewhere :) So just remember that you can do multiple lights per pass (it's not that hard). Deferred will scale better for more lights, no doubt. One alternative is to do full lighting calcs on the closest N lights, and for the remaining TOTAL-N just approximate them with a few "fake" lights that are averages of their light dir/intensity etc. If you need shadowing, you're out of luck doing that, but you're probably in trouble anyway. This method is used in quite a few PC/PS2 games. We used to do 2-3 main lights plus ambient, then approximate the rest with one light. You could probably just segment the remaining lights around the object using a simple split (quad, octree), and put a light in each segment which represents the remainders. You don't have to be crazy accurate here as they are approximations, but you will probably want to either skip specular or make its contribution minor. There are many other ways to do this same thing (it's in a similar vein to using irradiance volumes and SH lighting), but you could probably get away with less.
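    One plausible way to build such a "fake" light, sketched below: take an intensity-weighted average of the remaining lights' directions and colors and sum their intensities. The exact weighting is a judgment call (the post above doesn't specify one), so treat this as one illustrative choice, not the method actually shipped.

```python
# Sketch: collapse the non-primary lights into one averaged light.
# Each light is (direction (x, y, z), color (r, g, b), intensity).
def average_lights(lights):
    total = sum(i for _, _, i in lights)
    if total == 0.0:
        return (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.0
    dir_sum = [0.0, 0.0, 0.0]
    col_sum = [0.0, 0.0, 0.0]
    for d, c, i in lights:
        for k in range(3):
            dir_sum[k] += d[k] * i   # intensity-weighted direction
            col_sum[k] += c[k] * i   # intensity-weighted color
    length = max(sum(x * x for x in dir_sum) ** 0.5, 1e-6)
    return (tuple(x / length for x in dir_sum),     # renormalized dir
            tuple(x / total for x in col_sum),      # average color
            total)                                  # combined intensity

lights = [((1, 0, 0), (1, 1, 1), 2.0),
          ((0, 1, 0), (1, 0, 0), 2.0)]
print(average_lights(lights))
```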
  12. multisample

    Crysis using Deferred Shading ?!

    1) Those large memory numbers are suspect. How are they calculated? Edit: looks as if Andy already beat me to that one.

    2) Not using fully deferred does not mean you don't get soft particles. We've been using a depth buffer that's laid down first with the zfill/HiZ pass, and we use it for all sorts of effects. This is similar to being "deferred", but we'd done it for a while before I knew about deferred, so I'd be hesitant to say it's based on it. I'm sure there are many variants that do some of what the deferred method does without being deferred.

    3) I don't think the loss of MSAA should be shaken off so lightly. Yes, there are other ways to do this, but supersampling is not really viable for most people in the present or near future. Of course it probably depends on the amount of supersampling; my guess is you need quite a bit. Other fullscreen pass methods exist, but I haven't been fully pleased with them. Anyone have any good filters for fullscreen AA they would like to share? MSAA is so nice because it's relatively inexpensive compared to the other options.

    4) To those who say deferred "breaks" alpha-blended items: it doesn't. You just don't render those items using the deferred method; you render them as if they weren't deferred. It does mean you have to code both methods, but it's really not a horrible issue. It would be nice if you didn't have to, as it makes things a little more complex, but you can basically do translucency the same way regardless of method.
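    The soft-particles point in 2) boils down to one comparison against the pre-pass depth buffer: fade the particle as it approaches the opaque surface behind it, rather than clipping hard. A minimal sketch (fade_range is an artist-tuned distance, not a value from the post):

```python
# Soft-particle fade enabled by a depth pre-pass: compare the scene
# depth behind the particle with the particle's own depth and fade
# out over fade_range instead of producing a hard intersection edge.
def soft_particle_alpha(scene_depth, particle_depth, fade_range):
    # 0 where the particle pokes through geometry, 1 when well in front.
    t = (scene_depth - particle_depth) / fade_range
    return min(max(t, 0.0), 1.0)

print(soft_particle_alpha(10.0, 9.75, 0.5))  # 0.5: near the wall, fading
print(soft_particle_alpha(10.0, 8.0, 0.5))   # 1.0: far enough in front
```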
  13. multisample

    Crysis using Deferred Shading ?!

    It was clear. We were talking about the same thing. I obviously didn't make this statement stand out correctly in context (i.e. poor writing style): "... sampling its textures each time (normal mapping etc)". I was referring to multi-pass lighting in a forward renderer. I was contrasting it with a single-pass forward renderer, which doesn't necessarily have this problem, and to some degree with deferred. AFAIK, deferred re-reads the BRDF parameters for each light, so this occurs to some degree.

    HUGE DISCLAIMER TO ANYONE JUST BROWSING THIS SUBJECT: the following statements are UTTERLY dependent on your hardware. Your results CAN and WILL vary. Please test yourself and post results. In my experience, the sampling of the large uncompressed textures during the lighting phase of deferred rendering, combined with large uncompressed shadow maps, can be a big texture bandwidth/cache hog. If you max out your texture cache/bandwidth in the lighting pass, it becomes almost impossible to make the scene run at 60fps (for us, 60fps is important) for a decent number of lights. END DISCLAIMER STATEMENTS

    Quote: If it was a question of trading off ALU vs Memory, I'd be the first to argue that using ALU is the right way to go in the long term, but it is not. It's a question of O(LG) vs. O(L)+O(G) where the second complexity has a slightly higher constant factor. i.e. as the lighting and geometric complexity increases, algorithm B will be increasingly more attractive...

    While the "O(L) + O(G)" argument is true, the actual cost of each algorithm can vary dramatically when memory access is brought into the equation (which can skew the results). In practice (using a single-pass forward renderer), I've found that the extra ALU cost of multiple lights per pass usually hides much of the texture latency and texture cache issues that can occur. While doing deferred, I was usually bound by the texture cache/lookups, so cycles could be wasted. In the future I can see this problem being resolved with larger texture caches and cards that can handle large uncompressed textures better; current hardware still seems optimized for the DXT(n) formats. Unfortunately for me, we are still bound to the current hardware, which ranges quite a bit in performance. Some of the newest cards can really do well in both deferred and non-deferred depending on your scene requirements (ALU power is going way up, but so is memory bandwidth). I am kind of rooting for deferred to become the clear winner, as it does make some things simpler; I would love to make use of it.

    Please note, I have no problem with anyone talking about the benefits of deferred lighting/shading. I agree with many of the statements, but my (our) results were not quite satisfactory at the hardware level it was running on. If I come across as biased against it, it's by accident. I just want others to be aware of the issues if they are going to attempt to implement it on current-gen hardware. I cannot stress this enough. As an aside, I am impressed with the work on VSM. I will probably be trying it on our shadows here soon (hopefully).
  14. multisample

    Crysis using Deferred Shading ?!

    Wolf is right on the money. We tried out deferred rendering for a while. Its main drawback was/is bandwidth; lack of MSAA support (which is big for us as well) is second, and a secondary alpha pass is a distant third (not that big of a deal). The bandwidth issue with GF6-series cards is a big problem: framebuffer bandwidth is a problem along with texture bandwidth (since G-buffers are usually large uncompressed textures). Personally, I love the relative simplicity of deferred rendering. I can live without MSAA and do it manually to some degree if it makes life much easier. Also, to be completely fair, the material params in G-buffers are still a limiting factor for some geometry. For our characters we ended up rendering them "forward-style" to be able to do more complex materials. We could have used up another render target, but that would have been even more expensive. Without deferred you are left doing some more light sorting. Realistically this isn't horrible, as we were already doing this sort of stuff for the previous gen. The problem is the extra PS ALU required if you are doing multiple lights per pass, since some lights may not hit all of an object; deferred does this much better. If you don't do multiple lights per pass, you can do some other optimizations (light stencil/scissor), but you end up rendering the geometry more often, sampling its textures each time (normal mapping etc.), and using framebuffer bandwidth for each pass. That said, I am always looking at deferred as a possible option because of its high points.
  15. multisample

    Color Space Precision

    > I think we are talking about the same platform :-) ... so yes exactly on this platform ..

    Yes, I am sure we are :) I guess we get the choice of one platform with tiling or another with no FP MSAA. I almost wish they would have waited one more year to get true FP w/MSAA support all around. Guess we never get everything we want.

    > but other than this my idea is using a 16:16:16:16 fp render target to catch everything and then work from there in 8:8:8:8 ... which should be perfectly suitable to this platform.

    I am still confused as to why you would use FP16 and then RGBA32 w/MSAA. I would guess it should be the other way around. Unless... are you doing deferred-style rendering? That would make more sense to me. You would still have to deal with blending, albeit additive isn't that difficult to work with. As far as losing information goes, I would surmise that keeping luminance data is much more important than the other data, as that's how most image compressors work (i.e. JPEG, DivX, Bink). IIRC the GameCube did a trick in its memory back buffer (not eDRAM) to store luminance per pixel, but the CrCb was split across two pixels to save space. I am sorry I am not able to contribute much more to your questions; I guess when I have time I will try and figure some of it out, but it might be a while. I'd be interested in hearing any of your issues and how they were resolved. I am still trying to fix the banding in the low range I get with FP10 surfaces and HDR. I am almost tempted to just do tonemapping in the main pixel shader and store out a luminance value (encoded maybe) in alpha or something like that for the later passes.
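    The chroma-sharing trick described above is essentially a 4:2:2-style layout: full-rate luma, with Cb stored on even pixels and Cr on odd ones. A CPU-side sketch using BT.601-style constants (the constants and packing here are illustrative, not the GameCube's exact format):

```python
# Sketch of split-chroma storage: two RGB pixels become two 2-channel
# pixels, (y0, cb) and (y1, cr), halving chroma storage. BT.601-style
# conversion constants; an illustration only.
def rgb_to_ycbcr(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 + (b - y) * 0.564
    cr = 0.5 + (r - y) * 0.713
    return y, cb, cr

def pack_pair(p0, p1):
    """Pack a horizontal pair of RGB pixels into shared-chroma form."""
    y0, cb0, _ = rgb_to_ycbcr(*p0)   # even pixel keeps Cb
    y1, _, cr1 = rgb_to_ycbcr(*p1)   # odd pixel keeps Cr
    return (y0, cb0), (y1, cr1)

# Reconstruction borrows the missing chroma channel from the neighbor.
print(pack_pair((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```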