|
||||||||||||||||||
| DEMO: Deferred Shading |
|
MickeyMouse | Member since: 3/5/2002 | From: Melbourne, Australia |
||||
|
|
||||
| Hi everyone, I recently decided to give deferred shading a try under Direct3D, and here is the result of my work: Demo Site. I'd be glad if you could give me some feedback on whether it runs at all and what rendering speed you get on your specs. Unfortunately it requires a GeForce 6 class card (or higher) or a Radeon 9500 (or higher). I'd expect it to work on GeForce FX as well, but switching to another rendering mode should fail (no support for 64-bit render targets). Note: By default there are 14 light sources in the demo. Disabling a few of them should make things smoother. Thanks! [edit: Now tested on Radeon cards as well] [Edited by - MickeyMouse on December 23, 2009 5:50:02 PM] |
||||
|
||||
Samsonite | Member since: 5/9/2005 | From: Oslo, Norway |
||||
|
|
||||
| Oh sorry, I've got a GeForce 4. Hope I was helpful. And thank you if you were! |
||||
|
||||
SimmerD GDNet+ | Member since: 1/5/2003 | From: Los Gatos, CA, United States |
||||
|
|
||||
| When I tried, it said it's missing d3dx9d_24.dll. I may even have this lying around, but I didn't feel like hunting for it.

Some responses to your NVIDIA-related questions. I just resigned from NVIDIA, but I can help you with some of your questions. Contact developer relations for more specific help.

From your readme.txt: Issues found when implementing deferred renderer (probably very GeForce cards specific):

> - an optimization with stencil masking pixels not being lit by the light was actually not optimization at all (but it's still necessary to correctly determine lit pixels); you can see stencil test in action by pressing <J> and disabling few lights (just to see more clearly)

GeForce 6 cards use a very fast, but somewhat picky, stencil culling algorithm. Unless you perform your stencil within certain parameters, the entire shader will be run before stencil is tested. It IS possible to get it fast; it just requires a few tweaks.

> - cube normalization texture didn't give any speed up

Not surprising, as the GeForce 6 has a very fast normalize (especially in half precision), and you are most likely bound by other things.

> - rendering to 4 render targets in pre-processing step is very expensive (but it's only once)

The sweet spot for GF6 cards is 3 MRTs. Most deferred shading algorithms can be squeezed into 3 MRTs, so this is not a big deal in practice.

> - fastest (and good quality) deferred renderer mode for me was: R32F for position (as depth; stored in clip space), A8R8G8B8 for normals (biased; stored in world space)
> - best quality deferred renderer mode was obviously: R16G16B16F for position (stored in world space), R16G16B16F for normal (stored in world space)
> - speed of rendering using deferred renderer was varying depending on when (yes when) were render target textures allocated; e.g. for me mode R16G16B16 (non-float) when switched on for the first time was usually about 2 times slower than when switched on for the second time (every time I switch, I recreate all required render targets); looks like card drivers are doing some unpredictable job when allocating / deallocating render target textures

The GF6 has some limited hw resources for non-power-of-2 render targets. If you fall outside of the limits, speed can suffer. Typically the current drivers rely on the order of allocation for who gets the resources. Future drivers will address this more automatically.

Game Development Journal |
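The "biased" A8R8G8B8 normal storage quoted from the readme can be illustrated on the CPU. This is only a minimal sketch, not the demo's code: `biasToByte`, `packNormal` and `unpackNormal` are hypothetical names, and the `n * 0.5 + 0.5` mapping is the usual bias convention, assumed here rather than taken from the demo's shaders.

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Map one normal component from [-1, 1] to an 8-bit channel value in [0, 255].
inline std::uint8_t biasToByte(float v) {
    float biased = v * 0.5f + 0.5f;      // [-1, 1] -> [0, 1]
    if (biased < 0.0f) biased = 0.0f;    // clamp out-of-range input
    if (biased > 1.0f) biased = 1.0f;
    return static_cast<std::uint8_t>(std::lround(biased * 255.0f));
}

// Pack a unit normal into a 32-bit value laid out as A8R8G8B8 (alpha unused, set to 255).
inline std::uint32_t packNormal(const Vec3& n) {
    return (0xFFu << 24)
         | (static_cast<std::uint32_t>(biasToByte(n.x)) << 16)
         | (static_cast<std::uint32_t>(biasToByte(n.y)) << 8)
         |  static_cast<std::uint32_t>(biasToByte(n.z));
}

// Undo the bias, then renormalize because 8-bit quantization changes the length slightly.
inline Vec3 unpackNormal(std::uint32_t c) {
    Vec3 n;
    n.x = static_cast<float>((c >> 16) & 0xFFu) / 255.0f * 2.0f - 1.0f;
    n.y = static_cast<float>((c >> 8)  & 0xFFu) / 255.0f * 2.0f - 1.0f;
    n.z = static_cast<float>( c        & 0xFFu) / 255.0f * 2.0f - 1.0f;
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) { n.x /= len; n.y /= len; n.z /= len; }
    return n;
}
```

With 8 bits per channel each component is only resolved to about 1/127, which is why this mode trades some quality for bandwidth compared with the 16-bit float targets.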
||||
|
||||
vEEcEE | Member since: 10/11/2002 | From: Los Angeles, CA, United States |
||||
|
|
||||
| I can't run it because I don't have d3dx9d_24.dll. I have the _25 and _26 DLLs from the April and June 2005 DX9.0c SDKs, but those won't help. I tried recompiling, and I got this error: deferredrenderer.cpp(369) : error C2552: 'quad' : non-aggregates cannot be initialized with initializer list 'D3DHelper::SimpleVertex' : Types with user defined constructors are not aggregate. I'm not sure how to fix it, but it looks like it has something to do with the D3DHelper class. If you can fix it (maybe by upgrading to the June 2005 SDK), I'll test it for you. I'm running an Athlon XP 2500+ (1.8GHz) and a GeForce 6600GT AGP. |
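The C2552 message says that `quad` is being brace-initialized even though its element type declares a constructor, which pre-C++11 compilers reject. The sketch below shows two possible ways around it. The vertex fields and values are guesses for illustration only, since the real `D3DHelper::SimpleVertex` is not shown in this thread.

```cpp
// Hypothetical stand-in for D3DHelper::SimpleVertex: it has a user-declared
// constructor, so it is not an aggregate.
struct VertexWithCtor {
    float x, y, z, u, v;
    VertexWithCtor(float x_, float y_, float z_, float u_, float v_)
        : x(x_), y(y_), z(z_), u(u_), v(v_) {}
};

// Option 1: a plain aggregate (no constructors) accepts a brace initializer list.
struct PlainVertex { float x, y, z, u, v; };

static const PlainVertex quadPlain[4] = {
    { -1.0f, -1.0f, 0.0f, 0.0f, 1.0f },
    { -1.0f,  1.0f, 0.0f, 0.0f, 0.0f },
    {  1.0f, -1.0f, 0.0f, 1.0f, 1.0f },
    {  1.0f,  1.0f, 0.0f, 1.0f, 0.0f },
};

// Option 2: keep the constructor and call it explicitly for every element.
static const VertexWithCtor quadCtor[4] = {
    VertexWithCtor(-1.0f, -1.0f, 0.0f, 0.0f, 1.0f),
    VertexWithCtor(-1.0f,  1.0f, 0.0f, 0.0f, 0.0f),
    VertexWithCtor( 1.0f, -1.0f, 0.0f, 1.0f, 1.0f),
    VertexWithCtor( 1.0f,  1.0f, 0.0f, 1.0f, 0.0f),
};
```

Both forms compile under C++03 and later. A C++11 compiler would also accept nested braces for the constructor type, but that was not available to the compilers of this thread's era.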
||||
|
||||
MickeyMouse | Member since: 3/5/2002 | From: Melbourne, Australia |
||||
|
|
||||
| Ok, I've just included the missing d3dx9d_24.dll. I'll try to recompile it with a newer D3D version soon too.

BTW, thanks for the feedback SimmerD,

> Geforce6 cards use a very fast, but somewhat picky stencil culling algorithm.
> Unless you perform your stencil within certain parameters, the entire shader
> will be run before stencil is tested. It IS possible to get it fast, just
> requires a few tweaks.

I admit I was a bit surprised to see that I didn't get a speed-up from the stencil test here. What I do in the demo is, for each light, render its bounding sphere 2 times:

1. stencil "tag" pixels lit by the light (disable depth and color writes; enable depth test)
2. if stencil test passes, light the pixel using the pixel shader; for each pixel set stencil value to 0 (just clear stencil)

Are you suggesting that I might somehow not be benefiting from stencil culling (that is, the pixel shader is executed for culled-away pixels too)? What parameters and tweaks do you mean?

Thanks :)

[edit: Changed "culled" to "culled away"] [Edited by - MickeyMouse on July 19, 2005 1:57:41 PM] |
||||
|
||||
NoodleizzeR | Member since: 7/16/2005 | From: Billericay, United Kingdom |
||||
|
|
||||
| I've downloaded the new version, but it looks like you've included the wrong file: the program wants "d3dx9d_24.dll", not "d3dx9_24.dll". It works if you just rename the DLL to the right name, but that's not a great idea, as the program will be expecting to open the debug DLL and will be getting the release DLL. edit: You might also want to change the way the FPS is displayed, as currently it's changing a bit too fast for me to read easily. |
||||
|
||||
MickeyMouse | Member since: 3/5/2002 | From: Melbourne, Australia |
||||
|
|
||||
| Oh yes, I'm sorry, it should be OK now. |
||||
|
||||
b34r | Member since: 8/8/2004 | From: Kawasaki, Japan |
||||
|
|
||||
I cannot try the demo, but judging from your screenshot the test scene is not going to benefit from the stencil test, since even with simple z-buffer-based rejection the number of false hits will stay fairly low. That's assuming you shade if greater than Z when outside the volume and shade if less than Z when inside it. |
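One possible reading of the z-buffer-based rejection described above, as a small sketch. The function name and the "smaller depth is closer" convention are assumptions, as is the interpretation that front faces of the light volume are drawn when the camera is outside it and back faces when it is inside.

```cpp
// Decide whether a pixel covered by the light volume should be shaded.
// Assumed convention: smaller depth values are closer to the camera.
inline bool passesDepthRejection(bool cameraInsideVolume,
                                 float sceneDepth,
                                 float volumeFaceDepth) {
    if (cameraInsideVolume) {
        // Back faces are drawn: shade scene pixels that lie in front of
        // the far side of the volume.
        return sceneDepth < volumeFaceDepth;
    }
    // Front faces are drawn: shade scene pixels that lie behind
    // the near side of the volume.
    return sceneDepth > volumeFaceDepth;
}
```

With the camera outside, a surface far behind the whole volume still passes this test. That is the kind of false hit a stencil pass would remove, so the benefit of stencil depends on how many such pixels the scene has.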
||||
|
||||
Konfusius | Member since: 3/14/2004 | From: Dortmund, Germany |
||||
|
|
||||
| Hi, I renamed the d3dx DLL and got ~85 FPS with default options. Specs: Sempron 2600+, 512 MB DDR2 memory at 333 MHz, Radeon 9800 Pro 128 MB. |
||||
|
||||
NoodleizzeR | Member since: 7/16/2005 | From: Billericay, United Kingdom |
||||
|
|
||||
| Well, I've tried it with the newest version and here are my results: AMD64 3200+, 1.5GB RAM, 6800GT. With the information off, the framerate changes very quickly and it's hard to get an exact reading; the lowest I can see is about 102fps, the highest is about 126fps. If I overclock my 6800GT to the speed of a 6800 Ultra, the highest is still 126fps, but it doesn't drop below 120fps. |
||||
|
||||
Anonymous Poster |
||||
|
||||
| Works great on my P4 2.4, X800XT. Framerate is somewhere around 130-200 fps, and thanks for sharing! |
||||
|
||||
MickeyMouse | Member since: 3/5/2002 | From: Melbourne, Australia |
|||||
|
|
|||||
Thank you all for the feedback!

@b34r: It "should" benefit even in my simple scene - really. If you enable stencil-clipping preview mode you can see how many pixels are not lit by each single light. Just take a look at this:

[ Scene rendered normally ]
[ Scene rendered in stencil-preview mode (<j> key in the demo) - lit pixels are marked with green color ]

Green lines enclose the light's bounding volume. As you can see, there are a lot fewer lit pixels than those which fit within the screen-space bounding shape of the light's volume. Of course it heavily depends on viewing angle, viewer position and surrounding geometry, but overall it should be a good optimization - however, on GeForce 6600 TD I even get slightly worse performance with stencil clipping enabled.

@NoodleizzeR: Just fixed the FPS calculation.

@Konfusius: Your frame rate is a bit of a surprise to me, because on GeForce 6600 I get between 30-40 (with default settings) and my card shouldn't be twice as slow.

[Edited by - MickeyMouse on July 19, 2005 12:02:17 PM] |
|||||
|
|||||
AndyTX | Member since: 7/9/2001 | From: Waterloo, Canada |
||||
|
|
||||
Make sure to ONLY profile in Release mode and with the RELEASE mode DirectX DLLs! From my understanding of his renaming, he's using the latter while you're using the DirectX Debug DLLs. Still, the 9800 is a pretty fast card and it wouldn't surprise me if it was able to run this type of rendering at a much faster speed. |
||||
|
||||
blue_knight | Member since: 3/6/2003 | From: USA |
||||
|
|
||||
| With all lights on and specular enabled I get slightly over 100fps on my GeForce6 Ultra. I get around 120 fps without specular. The deferred renderer also runs twice as fast as the forward renderer (due to the large number of lights). |
||||
|
||||
MickeyMouse | Member since: 3/5/2002 | From: Melbourne, Australia |
||||
|
|
||||
@blue_knight: Yes, the forward renderer is there more as a quality comparison for the deferred renderer. It's not optimized at all from a geometry point of view, such as finding only the closest objects in the scene, which is a must for any serious forward renderer. The only optimization it uses is the scissor test, which saves a huge amount of fill rate in my case. Regarding the debug and release D3D versions - the version that is currently available for download is compiled against the release version of D3D. The performance on my machine stays the same though. |
||||
|
||||
SimmerD GDNet+ | Member since: 1/5/2003 | From: Los Gatos, CA, United States |
||||
|
|
||||
| For what you are doing, the easiest way to get stencil culling to work is to do the following.

a) Don't change your stencil test, mask or reference value during the frame; you can enable and disable stencil though. In other words, do it like stencil shadow volumes.
b) Don't ever write stencil while testing stencil. For instance, it's faster to do a separate stencil clear than it is to clear a stencil value after testing it.
c) Be sure to clear stencil at least once each frame.

If these things don't fix it, let me know.

Game Development Journal |
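The ordering implied by these rules (tag, then test without writing, then a separate clear) can be modelled on the CPU. This is only an illustrative sketch under assumptions: the `Pixel` layout and all function names are hypothetical, it says nothing about how the hardware culls, and it does not use any Direct3D calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel data for one light (hypothetical layout; smaller depth is closer).
struct Pixel {
    float sceneDepth;   // depth already in the z-buffer
    float volumeFront;  // nearest depth of the light volume at this pixel
    float volumeBack;   // farthest depth of the light volume at this pixel
    bool  covered;      // does the light volume cover this pixel at all?
};

// Pass 1: tag pixels whose visible surface lies inside the light volume.
// Only stencil is written; nothing is tested against stencil here.
void tagLitPixels(const std::vector<Pixel>& pixels, std::vector<std::uint8_t>& stencil) {
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        const Pixel& p = pixels[i];
        if (p.covered && p.sceneDepth >= p.volumeFront && p.sceneDepth <= p.volumeBack)
            stencil[i] = 1;
    }
}

// Pass 2: count the pixels where the lighting shader would run (stencil == ref).
// Stencil is read but never written in this pass.
std::size_t shadeTaggedPixels(const std::vector<std::uint8_t>& stencil, std::uint8_t ref) {
    return static_cast<std::size_t>(std::count(stencil.begin(), stencil.end(), ref));
}

// Separate clear, instead of zeroing stencil while it is being tested.
void clearStencil(std::vector<std::uint8_t>& stencil) {
    std::fill(stencil.begin(), stencil.end(), std::uint8_t(0));
}

// One light: tag, shade with a fixed reference value, then clear.
std::size_t renderLight(const std::vector<Pixel>& pixels, std::vector<std::uint8_t>& stencil) {
    tagLitPixels(pixels, stencil);
    const std::size_t shaded = shadeTaggedPixels(stencil, 1);
    clearStencil(stencil);
    return shaded;
}
```

The reference value stays at 1 for every light, and the only stage that resets stencil is the dedicated clear, which mirrors rules a) and b) as far as a CPU model can.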
||||
|
||||
rept | Member since: 6/14/2004 | From: Hungary |
||||
|
|
||||
| First, thanks for sharing, and good work! I spent the last year working on a deferred renderer, which will be used in a few soon-to-be-released games, so I thought I could contribute to this thread. Regarding rendering speed varying with when the render targets are allocated: we're having the exact same problem. It affects the 6600 series and does not happen on 6800s! We submitted a test case to the NVIDIA test labs a few weeks ago; if you're interested, I can share the feedback when they get back to us. As far as I have observed, it has to do with the order of allocations (as you already mentioned), but also with the size of the target! A difference of a few pixels (like 2-4) can suddenly "click" it back to normal operation. I recommend using the NVPerfHUD tool - it's very obvious on the graphs when something's wrong. Another recommendation, based on my findings: don't rely on packing floats into r8g8b8a8 quads (or anything similar). It's not very precise, and it differs from vendor to vendor - I could never get it "right" (and I _have_ tried - it was just never robust enough). |
||||
|
||||
MickeyMouse | Member since: 3/5/2002 | From: Melbourne, Australia |
||||
|
|
||||
Thanks for your tips, rept. I used NVPerfHUD a bit while working on this demo, but the graphs didn't really help me much - I just see the graphs sometimes jump for no real reason. My app is too simple to be software bound and the bottleneck is obviously in the drivers. As I switch to a new rendering mode (and so release 4 old render targets and create 4 new ones) the FPS usually jumps up a few times (the more fill rate, the more jumpy it is), then after some time it's more or less stable - depending on the real fill rate. Here's quite a typical shot from the NVPerfHUD graphs after switching to a new rendering mode: [ NVPerfHUD graphs after switching rendering mode ] @SimmerD I'll try your hints soon. They look like some dirty hacks implemented by NVIDIA especially for John Carmack, don't they? ;-) |
||||
|
||||
et1337 | Member since: 11/11/2004 | From: Columbus, OH, United States |
||||
|
|
||||
| Worked great, 100+ fps with deferred shading at good quality, 50+ fps with forward shading. AMD Athlon XP 2800+, 1 GB PC 3200 memory, AGP 8X ATI Radeon 9800 Pro 128 MB |
||||
|
||||
rept | Member since: 6/14/2004 | From: Hungary |
||||
|
|
||||
| MickeyMouse, that "spiky" graph is the problem - it shouldn't be like that, and it's definitely not like that on 6800s. Switch to windowed mode and try resizing the window; sooner or later it will stabilize itself. Almost randomly, I'd say, and that's why we asked for NVIDIA's help with it. |
||||
|
||||
pbryant | Member since: 7/10/2005 | From: Portland, OR, United States |
||||
|
|
||||
| P4 2.66 HT, Radeon 9800 Pro, 1280x1024
Forward Rendering: ~45fps
Deferred Rendering:
- Mode 0: ~90fps
- Mode 1: ~65fps
- Mode 2: ~65fps
- Mode 3: ~90fps
- Mode 4: ~90fps

PM 1.6, GeForce 6800 Go, 1920x1200
Forward Rendering: ~75fps
Deferred Rendering:
- Mode 0: ~85fps
- Mode 1: ~90fps
- Mode 2: ~30fps!!!
- Mode 3: ~90fps
- Mode 4: ~70fps

Some very weird stretching going on here. Resolutions are the native ones for my screens. No idea what res the program was running at. |
||||
|
||||
vEEcEE | Member since: 10/11/2002 | From: Los Angeles, CA, United States |
||||
|
|
||||
| Athlon XP 2500+ (1.8GHz), GeForce 6600GT (AGP 8x, ForceWare 77.72 drivers, DX 9.0c)

I'll just give the lows and highs, since the fps was jumping around a bit. Also, this was with the default camera position and no other options changed. I just pressed "R" to change renderers and "M" to change between the modes. Video mode was the demo's 800x600 resolution.

Forward: 59 to 78 fps
Deferred:
- mode 0: 59 to 73 fps
- mode 1: 49 to 64 fps
- mode 2: 41 to 59 fps
- mode 3: 39 to 49 fps
- mode 4: 54 to 69 fps |
||||
|
||||
MickeyMouse | Member since: 3/5/2002 | From: Melbourne, Australia |
||||
|
|
||||
@pbryant: Does the 30 FPS thing happen every time you switch to Mode 2? Could you please try switching through all modes two times and then test once again? BTW, I've just put a small card efficiency comparison table on the demo site. |
||||
|
||||
SimmerD GDNet+ | Member since: 1/5/2003 | From: Los Gatos, CA, United States |
||||
|
|
||||
| BTW, you really should not work with world-space vectors. The best idea is to transform world-space values into view space in the vertex shader and pass them down that way. That way, you can use half precision in the shader, as well as when storing to the frame buffer, without limiting your geometry or how big your world can be. Game Development Journal |
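A CPU-side sketch of why view space helps: positions near the camera stay small however far the camera is from the world origin, and the eye vector falls out of the position directly. The types and function names are hypothetical, the row-vector matrix convention is assumed, and the view matrix omits rotation for brevity.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Row-major 4x4 matrix used with row vectors (v' = v * M); convention assumed.
struct Mat4 { float m[4][4]; };

// Transform a point (w = 1) by M.
Vec3 transformPoint(const Vec3& p, const Mat4& M) {
    Vec3 r;
    r.x = p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0];
    r.y = p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1];
    r.z = p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2];
    return r;
}

// View matrix for a camera at `eye` with axes aligned to the world axes
// (translation only, to keep the sketch short).
Mat4 makeAxisAlignedView(const Vec3& eye) {
    Mat4 V = {{
        { 1.0f,   0.0f,   0.0f,   0.0f },
        { 0.0f,   1.0f,   0.0f,   0.0f },
        { 0.0f,   0.0f,   1.0f,   0.0f },
        { -eye.x, -eye.y, -eye.z, 1.0f },
    }};
    return V;
}

// In view space the camera sits at the origin, so the eye vector is simply
// the negated, normalized position; no camera position constant is needed.
Vec3 eyeVectorViewSpace(const Vec3& viewPos) {
    const float len = std::sqrt(viewPos.x * viewPos.x + viewPos.y * viewPos.y + viewPos.z * viewPos.z);
    Vec3 e = { 0.0f, 0.0f, 0.0f };
    if (len > 0.0f) { e.x = -viewPos.x / len; e.y = -viewPos.y / len; e.z = -viewPos.z / len; }
    return e;
}
```

For example, a point at world position (1000, 0, 1005) seen from a camera at (1000, 0, 1000) becomes (0, 0, 5) in view space, a magnitude that low-precision formats represent far better than the world-space value.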
||||
|
||||
MickeyMouse | Member since: 3/5/2002 | From: Melbourne, Australia |
||||
|
|
||||
@SimmerD: Yep, that was definitely a good idea, and the pixel shaders have one instruction less now (eye vector calculation).

I was wondering how you imagine this should be done:

> don't change your stencil test, mask or reference value during the frame

How can I render the following 2 passes _without_ changing the stencil function between them?

1. increment (decrement) stencil values where needed <- stencil function has to be ALWAYS
2. render light normally where stencil equals some value <- stencil function has to be EQUAL

Am I missing something? Thanks. |
||||
|
||||
|
All times are ET (US)
|