# *Version with FPS and debug info* Deferred shading GLSL.


b34r    365
Grab it here... It seems to be working for most people now, so this is just an update for the curious ones. There are about 30000 polygons, and a somewhat larger number of triangles. At most two texture levels are used in this scene, but when I find some more time to work on the renderer I'll post a more impressive use of the method. There's no key press delay so it might get on your nerves easily :). Thanks a lot. [Edited by - b34r on June 9, 2005 7:12:40 AM]

##### Share on other sites
OrangyTang    1298
GL version doesn't work for me (black screen, yet eats about 60% CPU). Output:
C:\Documents and Settings\Desktop\DiscoPhysics>rb -gl
bfmk/naD by Emmanuel Julien (barr./niji).V.[2.1.5], internal version 2000~2005.
http://barr.ninomojo.com/bfmknad
Starting OpenGL2.0 renderer...
Loading default shader library (profile GLDS:HighEnd;)...
    Creating 'core::dsg'...GL2:    Done.
    Creating 'core::n'...GL2:    Done.
    Creating 'core::ds'...GL2:    Done.
    Creating 'core::dts'...GL2:    Done.
    Creating 'core::ambient'...GL2:    Done.
    Creating 'core::pointlight'...GL2:    Done.
    Creating 'core::linearlight'...GL2:    Done.
Done.
Vendor: NVIDIA Corporation
Renderer: GeForce FX 5200/AGP/SSE/3DNOW!
Version: 1.5.1
** Warning, failed to setup texture 'dsg_map'.
** Warning, failed to setup texture 'n_map'.
** Warning, failed to setup texture 'ds_map'.
Importing geometry '$data_path$/box.lwo'.
Removed 1 material(s) from geometry.
Import complete, 8 vertice for 6 polygon(s), 1 material(s).
Importing geometry '$data_path$/chair.lwo'.
Removed 1 material(s) from geometry.
Import complete, 64 vertice for 48 polygon(s), 1 material(s).
Importing geometry '$data_path$/book.lwo'.
Removed 1 material(s) from geometry.
Import complete, 8 vertice for 6 polygon(s), 1 material(s).
Importing geometry '$data_path$/lib.lwo'.
Removed 1 material(s) from geometry.
Import complete, 56 vertice for 42 polygon(s), 1 material(s).
Importing geometry '$data_path$/crate.lwo'.
Codec ('JPEG_Small') could read image datas.
Removed 1 material(s) from geometry.
Import complete, 8 vertice for 6 polygon(s), 1 material(s).
Tau alloc = 1572864 byte(s) (8192).
Using 'core::dts' for material 'crate', no perfect match.
Converting geometry to triangle list...Done, in 0ms.
Using 'core::dsg' for material 'book', no perfect match.
Converting geometry to triangle list...Done, in 0ms.
Using 'core::dsg' for material 'Lib', no perfect match.
Converting geometry to triangle list...Done, in 0ms.
Using 'core::dsg' for material 'Box', no perfect match.
Converting geometry to triangle list...Done, in 0ms.

DX version works but looks ugly, and it's so dark I can barely make anything out. I'm on a lousy FX5200 and I don't know when I last updated the drivers, so that might not be helping.

##### Share on other sites
b34r    365
Mmmh... it fails to create the attribute textures. Not sure why, since they are 1024x1024 rgba32 and should be available on an FX5200 (I suppose?).

That's the problematic part...
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, t->GetWidth(), t->GetHeight(), 0, GL_BGRA_EXT, GL_UNSIGNED_BYTE, NULL);

Anybody with more expertise able to spot the error?
Yes, DX is very dark as it only selects the 8 closest lights to an object for rendering.
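
For readers wondering what "select the 8 closest lights to an object" might look like, here is a minimal C++ sketch. It is not b34r's code: the `Vec3` and `Light` types and the `closestLights` function are hypothetical names made up for illustration, and it assumes lights are ranked purely by distance to the object's center.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal types, for illustration only.
struct Vec3 { float x, y, z; };
struct Light { Vec3 position; };

// Squared distance is enough for ranking and avoids a sqrt.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the indices of at most maxLights lights nearest to objectCenter,
// ordered from closest to farthest.
std::vector<std::size_t> closestLights(const std::vector<Light>& lights,
                                       const Vec3& objectCenter,
                                       std::size_t maxLights = 8) {
    std::vector<std::size_t> order(lights.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;

    const std::size_t keep = std::min(maxLights, order.size());
    // Only the first 'keep' entries need to be sorted.
    std::partial_sort(order.begin(), order.begin() + keep, order.end(),
        [&](std::size_t a, std::size_t b) {
            return distanceSquared(lights[a].position, objectCenter) <
                   distanceSquared(lights[b].position, objectCenter);
        });
    order.resize(keep);
    return order;
}
```

With a hard cap like this, any light beyond the eighth contributes nothing to that object, which would be consistent with the dark look reported for the DX path.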

##### Share on other sites
JSoftware    318
Works?
Yeah it works
Runs?
sometimes awful and okay
CPU/GPU
amd xp "barton" 2500+ / Radeon 9600 pro ultra
Glitches?
I don't suppose so

looks pretty good

##### Share on other sites
b2b3    602
Works fine for me. It's not slow and looks relatively good, although it's a little dark (OpenGL version).
Is there any limit on the number of objects (cubes, chairs) in the scene? I've tried holding q and s until the screen was half covered with chairs and cubes [smile]. Then it started slowing down.
My config: A64 3000+, 1 gb ram, Radeon 9600, WinXP.

##### Share on other sites
b34r    365
Thanks for the feedback [smile].
Somehow I have the feeling it just doesn't work with nVidia boards... [headshake]

Quote:
 Is there any limit on number of objects (cubes, chairs) in scene? I've tried holding q and s until screen was half covered with chairs and cubes . Then it started slowing down.

There is only a display limit: objects will still be added to the scene, everything will slow down even more, and the console will fill up with even more warnings.

As a side note: the crate's mass is 40kg and the chair is only 1kg, so dropping a lot of crates on one chair is not fair [rolleyes].

##### Share on other sites
System Info:
============
Processor: AMD Athlon(tm)
Speed: 895 MHz
Memory: 1024 MB
Operating System: Windows XP
Videocard:
==========
Vendor: ATI Technologies Inc.
GL Renderer: RADEON X800 XT Platinum Edition x86/MMX/3DNow!/SSE
GL Version: 2.0.4955 WinXP Release

GL version worked okayish, but slow. I'd guess 2 fps or so. DX was too dark to see anything, but seemed a lot slower.

##### Share on other sites
MickePicke    184
It runs pretty slow, 20-30 fps I guess, and when the physics is working hard it goes down to about 5 or so. No glitches.

When I drop some chairs I get a lot of "Warning, OBB/OBB invalid configuration {8:4} or {4:8}"

You should have a fullscreen option. My GL apps usually go a lot faster in fullscreen, though in this case it's probably the physics that slows it down.

AMD 3500+
ATI X800XL 256 MB
1 GB RAM

EDIT: rick_appleton you must have Cool & Quiet on right? No way you have a system like that and only a 900 MHz AMD :)

##### Share on other sites
b34r    365
Wow... 2 fps on a X800XT? That's baaaaad [smile]...
But I guess you were really CPU bound on an 800 MHz system. Did you try a smaller scene (d key)? The physics is the only reason I can see for the slowdown, especially given your graphics card.

##### Share on other sites
b34r    365
Quote:
 Original post by MickePicke
 It runs pretty slow, 20-30 fps I guess, and when the physics is working hard it goes down to about 5 or so. No glitches.
 When I drop some chairs I get alot of "Warning, OBB/OBB invalid configuration {8:4} or {4:8}"
 You should have a fullscreen option. My GL apps usually goes alot faster in fullscreen, though in this case it's probably the physics that slows it down.
 AMD 3500+
 ATI X800XL 256 MB
 1 GB RAM
 EDIT: rick_appleton you must have Cool & Quiet on right? No way you have a system like that and only a 900 MHz AMD :)

Good to know... It's strange that you don't get better results. I'm getting around 30fps on all scenes when the physics is sleeping (reported by ATI tray tools OSD) on a 9800XT with a P4@2.6, so I wonder where the bottleneck is. If I get time tomorrow I'll try to write a profiler for the various passes. I wonder if it might be the glCopyTexImage2D that kills the X800 series.

About fullscreen, I do have one but I usually don't force fullscreen when I ask people to try my apps... you never know if things go wrong [smile]. The OBB/OBB message is not important, just a reminder that I need to fix something in the contact determination.

Thanks

##### Share on other sites
I took another look. In scenes 2 and 3 the fps is a bit higher. I'd guess around 20 or so maybe.

And I seem to remember reading someplace else (or here, but another thread) that that function is indeed 'slow' on the X800 series.

##### Share on other sites
Quote:
 Original post by MickePicke
 EDIT: rick_appleton you must have Cool & Quiet on right? No way you have a system like that and only a 900 MHz AMD :)

Tbh. I have no idea what Cool & Quiet is, but I'm guessing it's some kind of stepping mechanism in the CPU? At any rate, I'm underclocking the CPU at the moment. It's fast enough for me, and I have a Shuttle barebone, so I'm trying to keep the temperatures down a bit. The videocard really heats up the system when it starts up, so I'm saving all I can.

##### Share on other sites
b34r    365
Quote:
 Original post by rick_appleton
 I took another look. In scene 2 and 3 the fps is a bit higher. I'd guess around the 20 or so maybe.
 And I seem to remember reading someplace else (or here, but another thread) that that function is indeed 'slow' on the X800 series.

Ok thanks.
I too remember reading that here [smile]. In fact glCopyTexImage2D() resizes the framebuffer to the texture dimensions (so 3 nice 1024x768->1024x1024 copies might be problematic depending on the hardware implementation).
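
As a side illustration of the 1024x768 versus 1024x1024 mismatch: one common way to deal with hardware that only accepts power-of-two textures is to allocate the texture at the next power of two, copy only the viewport-sized region into it, and scale the texture coordinates so the shader samples just the valid part. The sketch below only does the arithmetic. `nextPowerOfTwo`, `PaddedTarget` and `padToPowerOfTwo` are hypothetical names, and nothing here is taken from b34r's renderer.

```cpp
#include <cstdint>

// Smallest power of two >= v. Assumes v <= 2^31 so the shift cannot overflow.
std::uint32_t nextPowerOfTwo(std::uint32_t v) {
    std::uint32_t p = 1;
    while (p < v) p <<= 1;
    return p;
}

// Texture size to allocate, plus the UV scale that maps [0,1] onto the
// region actually filled by the viewport copy.
struct PaddedTarget {
    std::uint32_t texWidth;
    std::uint32_t texHeight;
    float uScale;
    float vScale;
};

PaddedTarget padToPowerOfTwo(std::uint32_t viewWidth, std::uint32_t viewHeight) {
    PaddedTarget t;
    t.texWidth  = nextPowerOfTwo(viewWidth);
    t.texHeight = nextPowerOfTwo(viewHeight);
    t.uScale = static_cast<float>(viewWidth)  / static_cast<float>(t.texWidth);
    t.vScale = static_cast<float>(viewHeight) / static_cast<float>(t.texHeight);
    return t;
}
```

For a 1024x768 viewport this gives a 1024x1024 texture with a V scale of 0.75, so a quarter of each attribute texture is simply never sampled.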

That's a first draft of the implementation; there are quite a lot of things to iron out, not to mention missing ones. I'm wondering whether deferred shading is going to stay or not. The light culling thing is very nice but the hardware cost might still be a bit too high right now, and nextgen boards have SM3.0 which should help on light limitations a lot...
Not to mention that AA does not work with deferred shading.

Time for rest [smile].

##### Share on other sites
superpig    1825
You /can/ do multisampling with deferred shading, though. Just render to a target larger than the screen and downsample with filtering as a final step.
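
What is described here is usually called supersampling: render at a higher resolution, then filter down. A minimal CPU-side sketch of the downsample step follows, assuming a 2x oversized single-channel image and a plain 2x2 box filter. The function name `downsample2x` is hypothetical, and a real renderer would more likely do this on the GPU with a filtered fullscreen pass.

```cpp
#include <cstddef>
#include <vector>

// Averages each 2x2 block of a row-major grayscale image.
// Assumes srcWidth and srcHeight are even and src.size() == srcWidth * srcHeight.
std::vector<float> downsample2x(const std::vector<float>& src,
                                std::size_t srcWidth,
                                std::size_t srcHeight) {
    const std::size_t dstWidth = srcWidth / 2;
    const std::size_t dstHeight = srcHeight / 2;
    std::vector<float> dst(dstWidth * dstHeight);

    for (std::size_t y = 0; y < dstHeight; ++y) {
        for (std::size_t x = 0; x < dstWidth; ++x) {
            const std::size_t sx = 2 * x;
            const std::size_t sy = 2 * y;
            const float sum = src[sy * srcWidth + sx] +
                              src[sy * srcWidth + sx + 1] +
                              src[(sy + 1) * srcWidth + sx] +
                              src[(sy + 1) * srcWidth + sx + 1];
            dst[y * dstWidth + x] = sum * 0.25f;
        }
    }
    return dst;
}
```

The cost is the catch: every attribute buffer and every light pass now covers four times as many pixels.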

How are you tackling transparent/translucent materials, btw?

##### Share on other sites
b34r    365
Quote:
 Original post by superpig
 You /can/ do multisampling with deferred shading, though. Just render to a target larger than the screen and downsample with filtering as a final step.
 How are you tackling transparent/translucent materials, btw?

Yes indeed... but that's a pretty nasty trick. Especially with all the nVidia superAA marketing hype [smile].

Well I'm not there yet with transparency but I already thought about it. The deferred shading pass will be used only for opaque and alpha mapped (binary alpha or almost so) materials.
Transparent materials will use a final pass using my current GLSL implementation, which grabs the closest 3 point lights plus a global linear light and obviously has some limitations on how you can mix things.
Obviously transparency is a problem. But I really don't see the >new< problems deferred shading brings with transparency. IMHO transparency has been a problem ever since we started batching everything. Deferred shading only makes the problem bigger because you can potentially have very complex opaque materials with complex lighting, but it does not add any basic restriction on transparency. It has been a pain to handle and still is, but it's not worse :).

I've also thought of a 2nd alternative that might well be very doable, where transparent materials would be deferred shaded offscreen and then composited to the framebuffer with whatever the alpha happens to be (texture, value, shader etc)... this is a straightforward extension of the method but might be very costly. I'm already breaking convex material triangle strips to keep them 'convex' so that transparency sorting stays coherent. So there's some batching going on already with transparent materials, and the compositing cost might still be acceptable...
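
To make the compositing idea a bit more concrete, here is a small sketch of sorting already-shaded transparent layers back to front and blending them with the standard "over" operator (`result = src * alpha + dst * (1 - alpha)`). It works on a single pixel for clarity. `Color`, `TransparentLayer` and `compositeBackToFront` are hypothetical names, and this is only one plausible reading of the scheme described above, not b34r's implementation.

```cpp
#include <algorithm>
#include <vector>

struct Color { float r, g, b; };

// One transparent surface covering the pixel, after its own shading pass.
struct TransparentLayer {
    float viewDepth;  // distance from the camera
    Color color;      // shaded color of the layer
    float alpha;      // opacity in [0, 1]
};

// Blends the layers over the opaque framebuffer color, farthest layer first.
// The vector is taken by value so the caller's ordering is left untouched.
Color compositeBackToFront(Color framebuffer, std::vector<TransparentLayer> layers) {
    std::sort(layers.begin(), layers.end(),
              [](const TransparentLayer& a, const TransparentLayer& b) {
                  return a.viewDepth > b.viewDepth;
              });
    for (const TransparentLayer& layer : layers) {
        const float inv = 1.0f - layer.alpha;
        framebuffer.r = layer.color.r * layer.alpha + framebuffer.r * inv;
        framebuffer.g = layer.color.g * layer.alpha + framebuffer.g * inv;
        framebuffer.b = layer.color.b * layer.alpha + framebuffer.b * inv;
    }
    return framebuffer;
}
```

The "over" operator is order dependent, which is why the sort (and keeping batches convex so they sort cleanly) matters.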

I've been writing some more shaders for the attribute passes and it's starting to look good. I'll try to post an update later on (I'm in the middle of breaking everything right now [smile]).

The really nice thing is that going crazy texturing all channels (color/diffuse/normal/spec/gloss/etc) is not only possible but also very fast, and it's not so much of a pain to maintain the shader set...

##### Share on other sites
YengaMatiC    172
Are you forced to use glCopyTexImage to do deferred shading? I thought you could link directly the G-Buffer render targets to a texture and pass them directly to the shader.

##### Share on other sites
b34r    365
New exe.

Should work on nVidia cards; the ATI Linux drivers just silently correct incorrect texture sizes... How useful...
This one has no physics and more substantial graphics, but still nothing that justifies the use of deferred shading.

New controls:

Q: Show/hide buffers.
Z: Add fancy light (8 meter range).
S: Remove fancy light.

About 30000 polygons, and a somewhat larger number of triangles.
At most two texture levels are used in this scene, but when I find some more time to work on the renderer I'll post a more impressive use of the method.

The base lighting, which is fixed but still recomputed every frame, is made of one 20m point light, one infinite point light and one linear light. The fog requires yet another fullscreen pass at the end of the light passes.

The light system is still quite naive, and lights fully in fog are still rendered... The far plane is also farther than the 16bit attribute zbuffer texture can handle, so very far lights might pop up out of nowhere.
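
The range problem with a 16-bit depth attribute can be illustrated with a small sketch. It assumes a linear encoding of view distance into 16 bits over a fixed range, which may well differ from what the demo actually stores. `encodeDepth16`, `decodeDepth16` and the `maxRange` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Quantizes a view-space distance into 16 bits over [0, maxRange].
// Anything beyond maxRange is clamped to the last representable value.
std::uint16_t encodeDepth16(float distance, float maxRange) {
    const float normalized = std::min(std::max(distance / maxRange, 0.0f), 1.0f);
    return static_cast<std::uint16_t>(normalized * 65535.0f + 0.5f);
}

// Recovers the (quantized) distance from the stored 16-bit value.
float decodeDepth16(std::uint16_t encoded, float maxRange) {
    return (static_cast<float>(encoded) / 65535.0f) * maxRange;
}
```

Under this assumption every surface past `maxRange` decodes to exactly `maxRange`, so a light pass reconstructs the wrong position for it, which is one way distant lights could appear or vanish abruptly.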

##### Share on other sites
b34r    365
Quote:
 Original post by YengaMatiC
 Are you forced to use glCopyTexImage to do deferred shading? I thought you could link directly the G-Buffer render targets to a texture and pass them directly to the shader.

No I'm not, but I'd like to keep Linux compatibility, which is the case so far, and there is no pbuffer available yet... Well in fact there is the GLX_ATI_render_texture extension which is documented: nowhere :).

I'm using glCopyTexSubImage now but I don't think it will make any difference for the X800 series.

##### Share on other sites
999999999    162
Just tried. GeForce 6800 Ultra. P4 3.2.

I haven't measured the FPS exactly but to me it seems more than 60. Works very well.

##### Share on other sites
b34r    365
Ah great [smile]
Good, looks like that bug is fixed then... Thanks a lot.

##### Share on other sites
Saruman    4339
I just tried running it on a test machine here at work that is pretty low spec, AMD 1.2Ghz, 1GB RAM, 5700LE.

It ran.. obviously < 5fps probably a bit less. Are you going to add performance information into your test?

##### Share on other sites
xsirxx    170
Runs absolutely great on my 6600gt not running in SLI mode. I don't know the framerate but it seems to be over 60fps.

##### Share on other sites
b34r    365
Thanks everybody.

Quote:
 Original post by Saruman
 I just tried running it on a test machine here at work that is pretty low spec, AMD 1.2Ghz, 1GB RAM, 5700LE.
 It ran.. obviously < 5fps probably a bit less. Are you going to add performance information into your test?

I'm working on a generic histogram class so I will add benchmark output for various parts of the renderer.
Thanks again.
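
For anyone curious what a histogram class for benchmark output could look like, here is a minimal sketch with fixed-width buckets over a value range. The `Histogram` name and its interface are made up for illustration and are not b34r's class; samples could be per-pass timings in milliseconds, for example.

```cpp
#include <cstddef>
#include <vector>

// Fixed-width bucket histogram over [minValue, maxValue).
// Assumes bucketCount > 0 and maxValue > minValue.
class Histogram {
public:
    Histogram(double minValue, double maxValue, std::size_t bucketCount)
        : min_(minValue), max_(maxValue), buckets_(bucketCount, 0) {}

    // Records one sample; out-of-range samples land in the first or last bucket.
    void add(double value) {
        const double t = (value - min_) / (max_ - min_);
        std::size_t index = 0;
        if (t > 0.0) {
            index = static_cast<std::size_t>(t * static_cast<double>(buckets_.size()));
            if (index >= buckets_.size()) index = buckets_.size() - 1;
        }
        ++buckets_[index];
        ++count_;
        sum_ += value;
    }

    std::size_t bucket(std::size_t index) const { return buckets_[index]; }
    std::size_t count() const { return count_; }
    double mean() const { return count_ ? sum_ / static_cast<double>(count_) : 0.0; }

private:
    double min_;
    double max_;
    std::vector<std::size_t> buckets_;
    std::size_t count_ = 0;
    double sum_ = 0.0;
};
```

Feeding it one sample per frame and printing the buckets at exit would show whether the slow machines are uniformly slow or suffer from occasional spikes.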

##### Share on other sites
Erik Rufelt    5901
Works great

Opteron 1,8ghz

It actually uses 0% cpu time according to the task manager.. which is nice =)
adding fancy lights gets it up to 50% when the whole view is filled with lights

##### Share on other sites
ZQJ    496
Geforce 5700LE
AMD 1700+ (1467 Mhz)

Just under 8fps with the 3 lights.