
DEMO: Deferred Shading


Hi everyone, I recently decided to give deferred shading a try under Direct3D, and here is the result of my work: [url="http://www.uberengine.com/projects/pages/deferred_shading_demo.html"]Demo Site[/url].

[img]http://www.uberengine.com/projects/images/deferred_shading_1.jpg[/img]


I'd be glad if you could give me some feedback on whether it runs at all and what rendering speed you get on your hardware. Unfortunately it requires a GeForce 6 class card (or higher) or a Radeon 9500 (or higher). I'd expect it to work on a GeForce FX as well, but switching to another rendering mode should fail (no support for 64-bit render targets).

Note: By default there are 14 light sources in the demo. Disabling a few of them should make things smoother.
Thanks!

[edit: Now tested on Radeon cards as well]

When I tried, it said it's missing d3dx9d_24.dll. I may even have this lying around, but I didn't feel like hunting for it.

Some responses to your NVIDIA-related questions below. I just resigned from NVIDIA, but I can help with a few of these; contact developer relations for more specific help.

From your readme.txt:

Issues found when implementing the deferred renderer (probably
very GeForce-specific):

- an optimization using the stencil buffer to mask pixels not lit
by the light was actually no optimization at all (but it's
still necessary to correctly determine lit pixels); you can
see the stencil test in action by pressing <J> and disabling a
few lights (just to see more clearly)

GeForce 6 cards use a very fast but somewhat picky stencil culling algorithm. Unless you set up your stencil state within certain parameters, the entire shader will run before the stencil is tested. It IS possible to get it fast; it just requires a few tweaks.

- a cube normalization texture didn't give any speed-up

Not surprising, as the GeForce6 has very fast normalize (especially in half precision), and you are most likely bound by other things.

- rendering to 4 render targets in the pre-processing step is
very expensive (but it happens only once)

The sweet spot for GF6 cards is 3 MRTs. Most deferred shading algorithms can be squeezed into 3 MRTs, so this is not a big deal in practice.

- fastest (and good quality) deferred renderer mode for me was:
R32F for position (as depth; stored in clip space)
A8R8G8B8 for normals (biased; stored in world space)

- best quality deferred renderer mode was obviously:
R16G16B16F for position (stored in world space)
R16G16B16F for normal (stored in world space)

- rendering speed with the deferred renderer varied depending
on when (yes, when) the render target textures were allocated;
e.g. for me, the R16G16B16 (non-float) mode was usually about
2 times slower the first time it was switched on than the
second time (every time I switch, I recreate all required
render targets); it looks like the card drivers do some
unpredictable work when allocating / deallocating render
target textures

The GF6 has some limited hw resources for non-power-of-2 render targets. If you fall outside of the limits, speed can suffer. The current drivers typically rely on allocation order to decide who gets the resources. Future drivers will address this more automatically.
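For reference, here is a minimal sketch of allocating and binding such targets in D3D9 (my illustration, not the demo's code; the device and sizes are assumed). Note that D3D9 has no three-channel 16-bit float format, so the readme's "R16G16B16F" would be D3DFMT_A16B16G16R16F in practice:

[code]
#include <d3d9.h>

// Hypothetical helper (not from the demo): allocates the two targets of the
// "fastest" mode - R32F depth plus A8R8G8B8 biased normals - and binds them
// as MRT slots 0 and 1 for the geometry pass.
HRESULT CreateAndBindGBuffer(IDirect3DDevice9* device, UINT width, UINT height,
                             IDirect3DTexture9** positionTex,
                             IDirect3DTexture9** normalTex)
{
    HRESULT hr = device->CreateTexture(width, height, 1, D3DUSAGE_RENDERTARGET,
                                       D3DFMT_R32F, D3DPOOL_DEFAULT,
                                       positionTex, NULL);
    if (FAILED(hr)) return hr;
    hr = device->CreateTexture(width, height, 1, D3DUSAGE_RENDERTARGET,
                               D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT,
                               normalTex, NULL);
    if (FAILED(hr)) return hr;

    IDirect3DSurface9 *posSurf = NULL, *normSurf = NULL;
    (*positionTex)->GetSurfaceLevel(0, &posSurf);
    (*normalTex)->GetSurfaceLevel(0, &normSurf);
    device->SetRenderTarget(0, posSurf);   // MRT slot 0: position/depth
    device->SetRenderTarget(1, normSurf);  // MRT slot 1: normals
    // a third target (slot 2) would be bound the same way - the GF6
    // sweet spot mentioned above
    posSurf->Release();                    // device holds its own references
    normSurf->Release();
    return S_OK;
}
[/code]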

I can't run it because I don't have d3dx9d_24.dll... I have the _25 and _26 DLLs from the April and June 2005 DX9.0c SDKs, but that won't help. I tried recompiling, and I got this error:

deferredrenderer.cpp(369) : error C2552: 'quad' : non-aggregates cannot be initialized with initializer list
'D3DHelper::SimpleVertex' : Types with user defined constructors are not aggregate

I'm not sure how to fix it, but it looks like it's something to do with the D3DHelper class.
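(For what it's worth, C2552 fires because D3DHelper::SimpleVertex declares a user-defined constructor, so an array of them can't use a C-style brace initializer; the usual fix is to construct each element explicitly. A sketch - the constructor arguments here are assumptions, since the real class isn't shown:)

[code]
// Assumed layout (pos x/y/z plus texcoords) - the actual
// D3DHelper::SimpleVertex constructor may take different arguments.
D3DHelper::SimpleVertex quad[4] = {
    D3DHelper::SimpleVertex(-1.0f, -1.0f, 0.0f, 0.0f, 1.0f),
    D3DHelper::SimpleVertex( 1.0f, -1.0f, 0.0f, 1.0f, 1.0f),
    D3DHelper::SimpleVertex( 1.0f,  1.0f, 0.0f, 1.0f, 0.0f),
    D3DHelper::SimpleVertex(-1.0f,  1.0f, 0.0f, 0.0f, 0.0f),
};
[/code]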

If you can fix it (maybe by upgrading to the June 2005 SDK), I'll test it for you. I'm running an Athlon XP 2500+ (1.8GHz) and a GeForce 6600GT AGP.

OK, I've just included the missing d3dx9d_24.dll. I'll try to recompile it against a newer SDK version soon, too.

BTW, thanks for the feedback, SimmerD.

> GeForce 6 cards use a very fast but somewhat picky stencil culling algorithm.
> Unless you set up your stencil state within certain parameters, the entire
> shader will run before the stencil is tested. It IS possible to get it fast;
> it just requires a few tweaks.

I admit I was a bit surprised to see that I didn't get a speed-up from the stencil test here. What I do in the demo is, for each light, render its bounding sphere twice (see the sketch below):
1. stencil-"tag" the pixels lit by the light (disable depth and color writes; enable the depth test)
2. where the stencil test passes, light the pixel using the pixel shader; for each shaded pixel, set the stencil value back to 0 (just clearing the stencil)
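In D3D9 state terms, the two passes look roughly like this (my sketch of the states described above, assuming a `device` pointer; not the demo's actual code):

[code]
// Pass 1: "tag" lit pixels in stencil; no color or depth writes.
device->SetRenderState(D3DRS_STENCILENABLE, TRUE);
device->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
device->SetRenderState(D3DRS_ZENABLE, TRUE);               // depth test stays on
device->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_ALWAYS);
device->SetRenderState(D3DRS_STENCILREF, 1);
device->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_REPLACE);
// ... draw the light's bounding sphere ...

// Pass 2: shade only where stencil == 1, resetting the tag as we go.
device->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_EQUAL);
device->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_ZERO); // clear while testing
device->SetRenderState(D3DRS_COLORWRITEENABLE,
    D3DCOLORWRITEENABLE_RED | D3DCOLORWRITEENABLE_GREEN |
    D3DCOLORWRITEENABLE_BLUE | D3DCOLORWRITEENABLE_ALPHA);
// ... draw the bounding sphere again with the lighting pixel shader ...
[/code]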

Do you suggest I might somehow not be benefiting from stencil culling (that is, the pixel shader is executed for culled away pixels too)? What parameters and tweaks do you mean?

Thanks :)

[edit: Changed "culled" to "culled away"]

[Edited by - MickeyMouse on July 19, 2005 1:57:41 PM]

I've downloaded the new version, but it looks like you've included the wrong file; the program wants "d3dx9d_24.dll", not "d3dx9_24.dll". It works if you just rename the DLL to the right name, but that's not a great idea, as the program expects to load the debug DLL and is getting the release DLL instead.

edit: You might also want to change the way the FPS is displayed; currently it changes a bit too fast for me to read easily.
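(Something along these lines would steady it - a hypothetical sketch, not the demo's code - by refreshing the displayed value only twice a second:)

[code]
#include <windows.h>   // timeGetTime(); link with winmm.lib

// Call once per frame; the returned value only changes every 500 ms,
// averaged over that window, so it stays readable on screen.
float SmoothedFps()
{
    static DWORD lastUpdate = timeGetTime();
    static int   frames = 0;
    static float shown  = 0.0f;

    ++frames;
    DWORD now = timeGetTime();
    if (now - lastUpdate >= 500)
    {
        shown = frames * 1000.0f / (float)(now - lastUpdate);
        frames = 0;
        lastUpdate = now;
    }
    return shown;
}
[/code]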

Quote:
Original post by MickeyMouse
Do you suggest I might somehow not benefit from stencil culling (that is the pixel shader is executed for culled pixels too)? What parameters and tweaks do you mean?


I can't try the demo, but judging from your screenshot, the test scene is not going to benefit from the stencil test, since even with a simple z-buffer-based rejection the number of false hits will stay fairly low. That's assuming you shade if greater than Z when outside the volume and shade when less than Z when inside it.
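For reference, a minimal sketch of that kind of depth-only rejection, using the common face/depth-test pairing (the helper name and the exact sign convention are assumptions on my part):

[code]
// Eye outside the volume: draw front faces with the normal LESSEQUAL test,
// so geometry strictly in front of the volume is rejected.
// Eye inside the volume:  the front faces are clipped away, so draw the
// back faces with a GREATEREQUAL test instead.
bool eyeInside = IsEyeInsideLightVolume(light, eyePos);   // hypothetical helper

device->SetRenderState(D3DRS_ZENABLE, TRUE);
device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
device->SetRenderState(D3DRS_ZFUNC,
    eyeInside ? D3DCMP_GREATEREQUAL : D3DCMP_LESSEQUAL);
device->SetRenderState(D3DRS_CULLMODE,
    eyeInside ? D3DCULL_CW : D3DCULL_CCW);  // CW culls front faces
                                            // (default clockwise winding)
// ... draw the light's bounding sphere with the lighting shader ...
[/code]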

Well, I've tried it with the newest version and here are my results:

AMD64 3200+, 1.5GB RAM, 6800GT

With the information off, the framerate changes very quickly and it's hard to get an exact reading; the lowest I can see is about 102 fps, the highest about 126 fps.

If I overclock my 6800GT to the speed of a 6800 Ultra, the highest is still 126 fps, but it doesn't drop below 120 fps.

Guest Anonymous Poster
Works great on my P4 2.4, X800XT. The framerate is somewhere around 130-200 fps. Thanks for sharing!

Thank you all for the feedback!

Quote:
Original post by b34r
Quote:
Original post by MickeyMouse
Do you suggest I might somehow not benefit from stencil culling (that is the pixel shader is executed for culled pixels too)? What parameters and tweaks do you mean?

I can't try the demo, but judging from your screenshot, the test scene is not going to benefit from the stencil test, since even with a simple z-buffer-based rejection the number of false hits will stay fairly low. That's assuming you shade if greater than Z when outside the volume and shade when less than Z when inside it.


It "should" benefit even in my simple scene - really.
If you enable stencil-clipping preview mode you can see how many pixels are not lit by each single light. Just take a look at this:


[ Scene rendered normally ]


[ Scene rendered in stencil-preview mode (<j> key in the demo) - lit pixels are marked with green color ]

The green lines enclose the light's bounding volume. As you can see, far fewer pixels are actually lit than fit within the light volume's screen-space bounding shape. Of course it depends heavily on the viewing angle, viewer position and surrounding geometry, but overall it should be a good optimization - yet on a GeForce 6600 TD I even get slightly worse performance with stencil clipping enabled.

Quote:
Original post by NoodleizzeR
With the information off, the framerate changes very quickly and it's hard to get an exact reading

I've just fixed the FPS calculation.

Quote:
Original post by Konfusius
I renamed the d3dx dll and got ~85 Frames with default options.

Specs:
Sempron 2600+
512 MB DDR2 Memory/333 MHz
Radeon 9800 Pro 128 MB.

Your frame rate is a bit of a surprise to me, because on a GeForce 6600 I get between 30-40 fps (with default settings), and my card shouldn't be twice as slow.

[Edited by - MickeyMouse on July 19, 2005 12:02:17 PM]

Quote:
Original post by MickeyMouse
Your frame rate is a bit of a surprise to me, because on a GeForce 6600 I get between 30-40 fps (with default settings), and my card shouldn't be twice as slow.

Make sure to ONLY profile in Release mode and with the RELEASE mode DirectX DLLs! From my understanding of his renaming, he's using the latter while you're using the DirectX debug DLLs.

Still, the 9800 is a pretty fast card and it wouldn't surprise me if it was able to run this type of rendering at a much faster speed.

With all lights on and specular enabled I get slightly over 100fps on my GeForce6 Ultra. I get around 120 fps without specular. The deferred renderer also runs twice as fast as the forward renderer (due to the large number of lights).

Quote:
Original post by blue_knight
With all lights on and specular enabled I get slightly over 100fps on my GeForce6 Ultra. I get around 120 fps without specular. The deferred renderer also runs twice as fast as the forward renderer (due to the large number of lights).


Yes, the forward renderer is there more as a quality comparison for the deferred renderer. It's not optimized at all from the geometry point of view - e.g. finding only the closest objects in the scene, which is a must for any serious forward renderer. The only optimization it uses is the scissor test, which saves a huge amount of fill rate in my case (roughly like the sketch below).
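Per light, that boils down to something like this (a sketch; the bounds computation is a hypothetical helper here, not the demo's code):

[code]
// Clip all shading for this light to its projected screen-space rectangle.
RECT bounds = ComputeLightScreenBounds(light, viewProj,   // hypothetical:
                                       screenW, screenH); // projects the
                                                          // bounding sphere
device->SetRenderState(D3DRS_SCISSORTESTENABLE, TRUE);
device->SetScissorRect(&bounds);
// ... render everything lit by this light ...
device->SetRenderState(D3DRS_SCISSORTESTENABLE, FALSE);
[/code]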

Regarding the debug and release D3D versions - the build currently available for download is compiled against the release D3D runtime. The performance on my machine stays the same, though.

For what you are doing, the easiest way to get stencil culling to work is to do the following.

a) don't change your stencil test, mask or reference value during the frame, you can enable and disable stencil though. In other words, do it like stencil shadow volumes.

b) don't ever write stencil while testing stencil. For instance, it's faster to do a separate stencil clear than it is to clear a stencil value after testing it

c) be sure to clear stencil at least once each frame

If these things don't fix it, let me know.
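Tip (b) maps directly onto the two-pass scheme described earlier: in pass 2, test with KEEP instead of clearing via D3DSTENCILOP_ZERO, and reset the tags with a dedicated clear. A sketch of my reading of that advice (assumed `device`; not SimmerD's code):

[code]
// Pass 2 revised: test stencil but never write it...
device->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_EQUAL);
device->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);  // was ZERO
device->SetRenderState(D3DRS_STENCILWRITEMASK, 0);             // belt and braces
// ... draw the lighting pass ...

// ...then reset the tags with a dedicated stencil clear instead.
device->Clear(0, NULL, D3DCLEAR_STENCIL, 0, 1.0f, 0);
[/code]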

First, thanks for sharing, and good work!

I spent the last year working on a deferred renderer, which will be used in a few soon-to-be-released games, so I thought I could contribute to this thread.

Quote:
Original post by MickeyMouse
Your frame rate is a bit of a surprise to me, because on a GeForce 6600 I get between 30-40 fps (with default settings), and my card shouldn't be twice as slow.


We're having the exact same problem. It affects the 6600 series and does not happen on 6800s! We submitted a test case to the NVIDIA test labs a few weeks ago; if you're interested, I can share the feedback when they get back to us. From what I observed, it depends on the order of allocations (as you already mentioned), but on the size of the target, too! A difference of a few pixels (like 2-4) can suddenly "click" it back to normal operation. I recommend using the NVPerfHUD tool - it's very obvious from the graphs when something's wrong.

Another thing, a recommendation based on my findings: don't rely on packing floats into r8g8b8a8 quads (or anything similar). It's not very precise, and it differs from vendor to vendor - I could never get it "right" (and I _have_ tried - it was just never robust enough).
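For context, the packing in question is the base-256 frac() trick. Written out as plain C++ (a sketch, not rept's code) it round-trips fine on the CPU; in a pixel shader, though, the multiplies and frac() run at limited, vendor-dependent precision, which is where the robustness problems come from:

[code]
#include <cmath>

// Encode a float in [0,1) into four 8-bit channels, base-256 style -
// the CPU equivalent of the usual frac()-based shader packing.
void PackFloatToRGBA8(float v, unsigned char rgba[4])
{
    for (int i = 0; i < 4; ++i)
    {
        v *= 256.0f;
        float digit = std::floor(v);
        rgba[i] = (unsigned char)digit;
        v -= digit;                    // frac() in shader terms
    }
}

float UnpackRGBA8ToFloat(const unsigned char rgba[4])
{
    return rgba[0] / 256.0f
         + rgba[1] / 65536.0f
         + rgba[2] / 16777216.0f
         + rgba[3] / 4294967296.0f;
}
[/code]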

Thanks for your tips, rept.

Quote:
Original post by rept
I recommend using the NVPerfHUD tool - it's very obvious from the graphs when something's wrong.

I used it a bit while working on this demo, but the graphs didn't really help me much - I just see them jump sometimes for no apparent reason. My app is too simple to be software-bound, and the bottleneck is obviously in the drivers.

As I switch to a new rendering mode (and so release the 4 old render targets and create 4 new ones), the FPS usually jumps around for a while (the more fill rate, the jumpier it is); after some time it becomes more or less stable, depending on the actual fill rate. Here's a fairly typical shot of the NVPerfHUD graphs after switching to a new rendering mode:

[ NVPerfHUD graph after switching to a new rendering mode ]
@SimmerD

I'll try your hints soon. They look like dirty hacks implemented by NVIDIA especially for John Carmack, don't they? ;-)

MickeyMouse,

that "spiky" graph is the problem - it shouldnt be like that, and it's definitely not like that on 6800s. switch to windowed mode, and try resizing the window, sooner or later it will stabilize itself. almost randomly, i'd say, thats why we asked nvidia's help with it.

P4 2.66 HT
Radeon 9800 Pro
1280x1024

Forward Rendering: ~45fps

Deferred Rendering:
Mode 0: ~90fps
Mode 1: ~65fps
Mode 2: ~65fps
Mode 3: ~90fps
Mode 4: ~90fps

PM 1.6
Geforce 6800 GO
1920x1200

Forward Rendering: ~75fps

Deferred Rendering:
Mode 0: ~85fps
Mode 1: ~90fps
Mode 2: ~30fps!!!!!!!!!!!!!!!!!!!
Mode 3: ~90fps
Mode 4: ~70fps

Some very weird stretching going on here.

The resolutions are the native ones for my screens. No idea what resolution the program was running at.

Athlon XP 2500+ (1.8GHz)
GeForce 6600GT (AGP 8x, ForceWare 77.72 drivers, DX 9.0c)

I'll just give the lows and highs, since the FPS was jumping around a bit. Also, this was with the default camera position and no other options changed... I just pressed "R" to change renderers and "M" to change between the modes. Video mode was the demo's 800x600 resolution.

Forward: 59 to 78 fps
Deferred:
- mode 0: 59 to 73 fps
- mode 1: 49 to 64 fps
- mode 2: 41 to 59 fps
- mode 3: 39 to 49 fps
- mode 4: 54 to 69 fps

Quote:
Original post by pbryant
Deferred Rendering:
Mode 0: ~85fps
Mode 1: ~90fps
Mode 2: ~30fps!!!!!!!!!!!!!!!!!!!
Mode 3: ~90fps
Mode 4: ~70fps


Does the 30 fps thing happen every time you switch to Mode 2?
Could you please try switching through all the modes twice and then test once again?

BTW, I've just put up a little card efficiency comparison table on the demo site.

BTW, you really should not work with world-space vectors.

The best idea is to transform from world space into view space in the vertex shader, and pass the values down that way.

That way, you can use half precision in the shader, as well as for storage in the frame buffer, without limiting your geometry or how big your world can be.
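In D3D9/D3DX terms that amounts to feeding the vertex shader a combined world-view matrix - a minimal sketch, assuming `device`, the three D3DXMATRIX inputs and the constant-register layout (none of which are from the demo):

[code]
// Per object: upload world*view and world*view*proj, so the vertex shader can
// write view-space position and normal to the G-buffer, where the magnitudes
// stay small enough for half precision.
D3DXMATRIX worldView, worldViewProj;
D3DXMatrixMultiply(&worldView, &world, &view);
D3DXMatrixMultiply(&worldViewProj, &worldView, &proj);

// transpose first if your shader convention expects column-major constants
device->SetVertexShaderConstantF(0, (const float*)&worldViewProj, 4); // c0-c3
device->SetVertexShaderConstantF(4, (const float*)&worldView, 4);     // c4-c7
// in the shader: clip-space pos = mul(pos, worldViewProj);
// G-buffer pos/normal = mul(..., worldView)
[/code]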

Quote:
Original post by SimmerD
BTW, you really should not work with world-space vectors.

The best idea is to transform from world space into view space in the vertex shader, and pass the values down that way.

That way, you can use half precision in the shader, as well as for storage in the frame buffer, without limiting your geometry or how big your world can be.

Yep, that was definitely a good idea, and the pixel shaders are one instruction shorter now (the eye-vector calculation).

I was wondering how you think this part should be done:
Quote:
For what you are doing, the easiest way to get stencil culling to work is to do the following.

a) don't change your stencil test, mask or reference value during the frame, you can enable and disable stencil though. In other words, do it like stencil shadow volumes.

How can I render the following 2 passes _without_ changing the stencil function between them?
1. increment (decrement) stencil values where needed <- stencil function has to be ALWAYS
2. render the light normally where stencil equals some value <- stencil function has to be EQUAL

Am I missing something?

Thanks.

