How does this SSAA look?

Full Name · 2014-04-03T14:51:46

I've been developing a brand new set of shaders that can accept both traditional and physically based parameters. I'm testing a lot without diffuse textures to see how the lighting looks, so I'm rendering with only a normal map, a gray texture for the diffuse map, and an AO map. But I've been noticing some pretty bad shader aliasing: http://dl.dropboxusercontent.com/u/45638513/rs05.png

So, as an experiment, I decided to implement a surface supersampling antialiasing (SSAA) solution for the full lighting shader. Obviously, since this is surface supersampling, post-processing AA like FXAA/SMAA is not needed, and neither is Toksvig AA or any other specular AA solution. But geometry borders are not affected by surface supersampling, so for best results traditional MSAA should be used.

Here is the same shot with ridiculously high SSAA: http://dl.dropboxusercontent.com/u/45638513/rs06.png

For real games something like this might work on dual Titans or whatnot, but the average consumer-level GPU will tank. Here is a more normal example:

Before: http://dl.dropboxusercontent.com/u/45638513/rs07.png
After, with SSAA 4x: http://dl.dropboxusercontent.com/u/45638513/rs08.png

For these screenshots I did not SSAA the AO map. I did implement this later but found that there is almost zero difference, so I wouldn't recommend it. As said before, SSAA does not need specular AA.

Before with diffuse: http://dl.dropboxusercontent.com/u/45638513/rs09.png
After: http://dl.dropboxusercontent.com/u/45638513/rs10.png
Closeup before: http://dl.dropboxusercontent.com/u/45638513/rs11.png
Closeup after: http://dl.dropboxusercontent.com/u/45638513/rs12.png
Closeup after with SMAA: http://dl.dropboxusercontent.com/u/45638513/rs13.png

I'll do some heavy duty tests with a batch count of about 2500 and no instancing, comparing no AA with only SSAA 4x, to see if this is doable in real time.
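For readers who want a concrete picture of surface supersampling, here is a minimal CPU-side sketch in Python. It is not the shader from this post: `shade` is a made-up stand-in for a lighting function with high-frequency detail, and `supersample`, `du`, `dv`, and `grid` are hypothetical names. The idea is to evaluate the lighting at several offsets inside the pixel's footprint and average the results; a 2 x 2 grid corresponds to 4 samples.

```python
import math


def shade(u, v):
    """Hypothetical lighting function with high-frequency detail (assumption)."""
    # A sharp, specular-like lobe driven by a rapidly varying term,
    # which is the kind of signal that aliases when sampled once per pixel.
    n = math.sin(40.0 * u) * math.cos(40.0 * v)
    return max(0.0, n) ** 16


def supersample(shade_fn, u, v, du, dv, grid=2):
    """Average shade_fn over a grid x grid pattern inside one pixel footprint.

    (du, dv) is the footprint size in texture space. In a real pixel shader
    this would typically be derived from screen-space derivatives of the UVs.
    """
    total = 0.0
    for j in range(grid):
        for i in range(grid):
            # Offsets are centered in each sub-cell and lie in (-0.5, 0.5).
            ox = (i + 0.5) / grid - 0.5
            oy = (j + 0.5) / grid - 0.5
            total += shade_fn(u + ox * du, v + oy * dv)
    return total / (grid * grid)


if __name__ == "__main__":
    single = shade(0.3, 0.4)
    averaged = supersample(shade, 0.3, 0.4, du=0.05, dv=0.05, grid=2)
    print("1 sample:", single, "4 samples:", averaged)
```

The cost scales with the number of samples, since the whole lighting function runs once per sample, which matches the observation that very high sample counts are only practical on high-end GPUs.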


Started by DwarvesH March 10, 2014 10:05 AM

18 comments, last by DwarvesH 10 years ago

KoldGames

222

March 24, 2014 11:48 PM

Hey! How is the benchmarker coming along?

DwarvesH

510

Author

March 26, 2014 03:32 PM

Hey! How is the benchmarker coming along?

Really really bad!

Not because of the SSAA, which seems to be pretty well behaved, but because of the other 10 billion tasks that take up my time.

I spent some time finalizing my permutations system with custom binary indexed shader blobs.

And this week, with the whole base in place, I started the heavy duty porting of my game from XNA to SharpDX. I ported almost 200 KiB of code this week and now I need to test everything thoroughly before moving on to the rest.

And I couldn't even do that, because I needed a new texture manager GUI so I've spent today writing this:

http://dl.dropboxusercontent.com/u/45638513/mat05.png

With the bulk of that time being spent on updating the ListBox control to support multi-column modes and custom item rendering.

My blegh: http://dwarvesh.blogspot.com

fir

-460

March 26, 2014 03:53 PM

As for the "ramset", it starts off with an average of about 120000 but eventually it goes up to 190000. It must be caching.

All right, that is pretty quick (I mean 0.12-0.19 milliseconds).

On my old Pentium 4 it was about 1.2 ms; on a Core 2 Duo it was about 0.26 ms (0.22-0.3).

fir

-460

March 26, 2014 03:58 PM

If you can help with testing, you could run my small ramset benchmark

https://www.dropbox.com/s/d0epr8d1drsa4bs/ramset.zip

and say what your result is... (no malware, it just sets 1 MB of RAM to zero)

As soon as I started it up, it was in the range of 190,000 - ~200,000. And after a little bit, this showed up:

You must have accidentally pressed the 'a' key. I had some timer under that key (I don't even remember what it measured), so that number is not relevant. 0.19-0.20 is the ramset number, which is not much better than my old Core 2 Duo here (0.22-0.3, mean about 0.26). This would be my 4 GB/s against your 5 GB/s and DwarvesH's 5-8 GB/s.
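The GB/s figures follow from dividing the 1 MB that ramset clears by the measured time. A small Python sketch of that arithmetic is below; it assumes "1mb" means 1 MiB (1,048,576 bytes) and that the quoted times are in milliseconds, and `throughput_gb_per_s` is a hypothetical helper name, not part of the benchmark.

```python
ONE_MB = 1024 * 1024  # assumption: "1mb" in the post means 1 MiB


def throughput_gb_per_s(num_bytes, milliseconds):
    """Bytes cleared per second, expressed in GB/s (10**9 bytes)."""
    seconds = milliseconds / 1000.0
    return num_bytes / seconds / 1e9


if __name__ == "__main__":
    # Times quoted in the thread, in milliseconds.
    for label, ms in [
        ("Pentium 4", 1.2),
        ("Core 2 Duo (mean)", 0.26),
        ("0.19 ms result", 0.19),
        ("0.12 ms result", 0.12),
    ]:
        print(f"{label}: {throughput_gb_per_s(ONE_MB, ms):.2f} GB/s")
```

Under those assumptions 0.26 ms works out to roughly 4 GB/s, 0.19 ms to roughly 5.5 GB/s, and 0.12 ms to roughly 8.7 GB/s, which is in line with the rounded numbers above.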

DwarvesH

510

Author

April 02, 2014 12:07 PM

OK, I am almost at the point where I can start the final benchmarking and assess whether the engine has adequate performance or not.

But first I had to write and finish the most powerful forward renderer I was able to.

I created this test scene:

http://dl.dropboxusercontent.com/u/45638513/l01.png

Since this is forward rendering, using multiple point lights is problematic if a lot of them affect a single object. So I'm rendering 100 floor tiles there, and 100 lights. The lights are chosen to be problematic: i.e. they are sized so that they affect a surprisingly large radius around them.

For simplicity, point lights are blended on top of the ambient + directional result, so if an object is affected by at least 1 point light, it gets drawn again. So the 100 floor tiles will result in at least 200 draw calls.

All the lights are 100% dynamic and so are the objects. Very powerful optimizations can be achieved for static scenes, but I don't care about those. So for dynamic scenes, a spatial partitioning scheme tells me every frame which lights affect which objects, and the engine takes care of batching and blending. The partitioning scheme is pretty fast and should handle thousands of lights spread over realistic levels. Since there is overlap in the lighting, the 200 draw calls become 267. Most floor tiles are affected by at least 9 point lights, sometimes more.

The initial version of my scheme used 1268 draw calls to render 100 objects using this lighting setup. 267 has a lot better performance.

One large compromise that was needed to allow this was introducing a clip radius for point lights. Beyond the clip radius no pixels are affected. This is not physically based, but is needed to optimize dynamic lights.
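To illustrate the clip radius idea, here is a small Python sketch. The falloff curve is an assumption chosen only for illustration (the post does not say which attenuation the engine uses), and `point_light_attenuation` is a hypothetical name. The only part taken from the post is the hard cutoff: at or beyond the clip radius the light contributes nothing.

```python
def point_light_attenuation(distance, radius, clip_radius):
    """Return a light intensity factor in [0, 1] for a given distance.

    radius      -- controls how quickly the light falls off (assumed curve)
    clip_radius -- at or beyond this distance the light contributes nothing
    """
    if distance >= clip_radius:
        # Not physically based: real light never drops to exactly zero,
        # but the cutoff gives every light a finite bounding sphere.
        return 0.0
    x = distance / radius
    # Assumed smooth inverse-square-like falloff, equal to 1 at the light.
    return 1.0 / (1.0 + x * x)


if __name__ == "__main__":
    for d in (0.0, 2.5, 5.0, 9.9, 10.0, 20.0):
        print(d, point_light_attenuation(d, radius=5.0, clip_radius=10.0))
```

Because the contribution is exactly zero past the clip radius, an object whose bounds lie entirely outside that sphere can skip the light without changing the image.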

Now to test with some real life scenes, some rooms and corridors.

So my question is: can one achieve a better result without exponentially more effort put into it? These results look pretty good to me. Forward rendering will never have such a batch count as deferred (o + l), but at least I'm not in o * l territory. o + l gives 200.


Hodgman

52,717

April 02, 2014 12:30 PM

These results look pretty good to me. Forward rendering will never have such a batch count as deferred (o + l)

Forward+ has one less (o) than tiled-deferred (o + 1), which has many less than deferred (o + l)

. 22 Racing Series .

DwarvesH

510

Author

April 03, 2014 12:20 PM

These results look pretty good to me. Forward rendering will never have such a batch count as deferred (o + l)
Forward+ has one less (o) than tiled-deferred (o + 1), which has many less than deferred (o + l)

Well I'm still on DirectX 9 so Forward+ is out of the question.

Anyway, that is far too complicated, and there is far too little documentation on the subject. I'll probably use something like that when it's as common and well documented as physically based rendering with optional material/BRDF layering.

And with the way I'm trying to render things, even deferred becomes far too complicated.

I'm just trying to create the best possible forward renderer under the circumstances that handles simple but fully dynamic and flexible scenes. I could push the render calls to "o" levels, as in rendering every single object once with ambient layers, directional and any number of point lights all using a single pixel shader, but that is much more work and I'm not sure it is worth it.


Mona2000

1,967

April 03, 2014 12:45 PM

Forward+ has one less (o) than tiled-deferred (o + 1), which has many less than deferred (o + l)

Isn't that notation misleading? Forward+ is actually o * 2 (depth pass + main pass).

Hodgman

52,717

April 03, 2014 01:22 PM

Isn't that notation misleading? Forward+ is actually o * 2 (depth pass + main pass).

On Dx9 you can't use a compute shader to build the per-tile light lists (which would use a depth pass as input), so I'd build them on the CPU (like here) and then just render the scene as usual with forward rendering. If you're doing a full 11 version, then yep, I misspoke

You could do a z-pass first to see if it helps reduce overdraw, but that's an optional optimization (you can do the same optimization for deferred if your g-buffer/attribute pass is expensive due to overdraw, e.g. if everything is parallax mapped).

Regular forward is the same though -- you build per object light lists and then draw every object once (or twice if you decide to do a z-pre-pass).

I could push the render calls to "o" levels, as in rendering every single object once with ambient layers, directional and any number of point lights all using a single pixel shader, but that is much more work and I'm not sure it is worth it.

What do you do at the moment - one pass per light per object? Or is there some amount of looping to do multiple lights per draw?


DwarvesH

510

Author

April 03, 2014 02:51 PM

DwarvesH, on 03 Apr 2014 - 3:20 PM, said:
I could push the render calls to "o" levels, as in rendering every single object once with ambient layers, directional and any number of point lights all using a single pixel shader, but that is much more work and I'm not sure it is worth it.
What do you do at the moment - one pass per light per object? Or is there some amount of looping to do multiple lights per draw?

Currently I draw every object at least once. The first pass has ambient and directional lights, the things that are constant.

Point lights are drawn in another pass. When lights move in the world, a spatial partitioning scheme is updated. Each object can then easily consult the spatial partitioning scheme to determine potential light sources. These potential light sources are then culled by testing the object's bounding box against the light's bounding sphere.

One pass has an arbitrary maximum number of point lights, currently 10. If an object is lit by more than 10 point lights the engine will use a third pass. Any number of lights is supported this way, but I'm hoping that in practice most objects will be lit by at most a handful of lights. There are basically no point lights outside except in special places, and on the inside each room is lit separately.
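The box-versus-sphere culling step can be sketched in Python as follows. This uses the standard closest-point test between an axis-aligned bounding box and a sphere; whether the engine uses exactly this test is not stated in the post. The light dictionary layout and the function names are hypothetical, and using the clip radius as the bounding sphere radius is an assumption.

```python
def sphere_intersects_aabb(center, radius, box_min, box_max):
    """True if a sphere overlaps an axis-aligned bounding box."""
    dist_sq = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        # Clamp the sphere center to the box to find the closest box point.
        nearest = min(max(c, lo), hi)
        dist_sq += (c - nearest) ** 2
    return dist_sq <= radius * radius


def cull_lights(box_min, box_max, candidates):
    """Keep only the candidate lights whose bounding sphere touches the box.

    `candidates` is what the spatial partitioning lookup would return; here
    each light is a dict with hypothetical keys "pos" and "clip_radius".
    """
    return [
        light
        for light in candidates
        if sphere_intersects_aabb(light["pos"], light["clip_radius"], box_min, box_max)
    ]


if __name__ == "__main__":
    tile_min, tile_max = (0.0, 0.0, 0.0), (1.0, 0.1, 1.0)
    lights = [
        {"pos": (0.5, 1.0, 0.5), "clip_radius": 2.0},  # directly above the tile
        {"pos": (10.0, 1.0, 10.0), "clip_radius": 2.0},  # far away
    ]
    print(len(cull_lights(tile_min, tile_max, lights)))
```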

There is no looping, each light setup has a loop-less pixel shader. For each render pass, two pseudo DirectX 9 pixel shader constant buffers are set, one light position and light radius packed float4 and one light color and light clip radius packed float4.

So basically one pass + one pass for every 10 point lights. All passes set the vertex shader and pixel shader constant buffers only once, except for the second pass, which sets two extra vectors.
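As a rough Python sketch of the pass count described above: one base pass for ambient and directional lighting, plus one extra pass for every group of up to 10 point lights. The function names are hypothetical and this only models the counting, not the actual rendering.

```python
import math

MAX_POINT_LIGHTS_PER_PASS = 10  # the current limit mentioned in the post


def passes_for_object(num_point_lights, max_per_pass=MAX_POINT_LIGHTS_PER_PASS):
    """Draw calls for one object: 1 base pass + 1 per group of point lights."""
    return 1 + math.ceil(num_point_lights / max_per_pass)


def split_into_passes(lights, max_per_pass=MAX_POINT_LIGHTS_PER_PASS):
    """Group an object's culled light list into per-pass batches."""
    return [lights[i:i + max_per_pass] for i in range(0, len(lights), max_per_pass)]


if __name__ == "__main__":
    for n in (0, 3, 10, 11, 25):
        print(n, "point lights ->", passes_for_object(n), "draw calls")
```

With this scheme an object lit by 1 to 10 point lights costs 2 draw calls and one lit by 11 to 20 costs 3, so the total stays close to the number of objects rather than growing with objects times lights.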

So basically pretty complicated but gives exponentially better results than things I tried before.

Potential future directions:

add a few extra permutations to handle common things like ambient + 1 directional + up to 3-5 point lights in one pass. Each lighting setup change requires a pixel shader change.
add all permutations and render all lights in one pass, with a maximum global point light count. Each lighting setup change requires a pixel shader change.
replace the permutations with loop and some sort of dynamic branching/break. One single pixel shader per material type.

