How does this SSAA look?

Started by
18 comments, last by DwarvesH 10 years ago

Hey! How is bench-marker coming along?

Really, really bad! :)

Not because of the SSAA, which seems to be pretty well behaved, but because of the other 10 billion tasks that take up my time.

I spent some time finalizing my permutations system with custom binary indexed shader blobs.

And this week, with the whole base in place, I started the heavy-duty porting of my game from XNA to SharpDX. I ported almost 200 KiB of code this week and now I need to test everything thoroughly before moving on to the rest.

And I couldn't even do that, because I needed a new texture manager GUI so I've spent today writing this:

http://dl.dropboxusercontent.com/u/45638513/mat05.png

With the bulk of that time being spent on updating the ListBox control to support multi-column modes and custom item rendering.

As for the "ramset", it starts off with an average of about 120000 but eventually it goes up to 190000. It must be caching.

Alright, that is pretty quick (I mean 0.12-0.19 milliseconds).

On my old Pentium 4 it was about 1.2 ms; on a Core 2 Duo it was about 0.26 ms (0.22-0.3).


If you can help with testing, you could run my small ramset benchmark

https://www.dropbox.com/s/d0epr8d1drsa4bs/ramset.zip

and tell me what result you get (no malware, it just sets 1 MB of RAM to zero).

As soon as I started it up, it was in the range of 190,000 - ~200,000. And after a little bit, this showed up:

4izb.png

You must have accidentally pressed the 'a' key. I had some timer under that key (I don't even remember what it measured), so that number is not relevant. 0.19-0.20 is the ramset number, which is not much better than my old Core 2 Duo here (0.22-0.3, mean about 0.26). That would be my 4 GB/s against your 5 GB/s and DwarvesH's 5-8 GB/s.
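As a rough check on those figures, the conversion from a ramset time to memory bandwidth can be sketched as below. This assumes the benchmark clears exactly 1 MB (taken here as 1,000,000 bytes) and that the quoted times are in milliseconds; `bandwidth_gb_per_s` is a name made up for this illustration and is not part of ramset.

```python
def bandwidth_gb_per_s(num_bytes: int, elapsed_ms: float) -> float:
    """Bytes written per second, expressed in GB/s (1 GB = 1e9 bytes)."""
    seconds = elapsed_ms / 1000.0
    return num_bytes / seconds / 1e9


ONE_MB = 1_000_000  # assumption: "1mb" is read as 10^6 bytes

# Times quoted in the thread, in milliseconds.
quoted_times_ms = [
    ("Pentium 4", 1.2),
    ("Core 2 Duo (mean)", 0.26),
    ("0.20 ms run", 0.20),
    ("0.12 ms run", 0.12),
]

for label, ms in quoted_times_ms:
    print(f"{label}: {bandwidth_gb_per_s(ONE_MB, ms):.1f} GB/s")
```

Under those assumptions, 0.26 ms works out to a bit under 4 GB/s, 0.20 ms to 5 GB/s and 0.12 ms to a bit over 8 GB/s, which lines up with the numbers above.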

OK, I am almost at the point where I can start the final bench-marking and the assessment of whether the engine has adequate performance or not.

But before that, I had to write and finish the most powerful forward renderer I was able to.

I created this test scene:

http://dl.dropboxusercontent.com/u/45638513/l01.png

Since this is forward rendering, using multiple point lights is problematic if a lot of them affect a single object. So I'm rendering 100 floor tiles there, and 100 lights. The lights are chosen to be problematic, i.e. they are sized so that they affect a surprisingly large radius around them.

For simplicity, point lights are blended on top of the ambient + directional result, so if an object is affected by at least 1 point light, it gets drawn again. So the 100 floor tiles will result in at least 200 draw calls.

All the lights are 100% dynamic and so are the objects. Very powerful optimizations can be achieved for static scenes, but I don't care about those. So for dynamic scenes, a spatial partitioning scheme tells me every frame which lights affect which objects, and the engine takes care of batching and blending. The partitioning scheme is pretty fast and should handle thousands of lights spread over realistic levels. Since there is overlap in the lighting, the 200 draw calls become 267. Most floor tiles are affected by at least 9 point lights, sometimes more.
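The post does not say which partitioning scheme is used, so the following is only a minimal sketch of one common option: a uniform grid over the ground plane. `LightGrid`, its method names and the cell size are all hypothetical, and the grid would simply be rebuilt (or lights re-inserted) whenever lights move.

```python
from collections import defaultdict
from math import floor


class LightGrid:
    """Uniform 2D grid over the XZ plane; each cell lists the lights touching it."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(set)

    def _cells_in_rect(self, min_x, min_z, max_x, max_z):
        # Yield every cell coordinate overlapped by the rectangle.
        cs = self.cell_size
        for cx in range(floor(min_x / cs), floor(max_x / cs) + 1):
            for cz in range(floor(min_z / cs), floor(max_z / cs) + 1):
                yield (cx, cz)

    def insert_light(self, light_id, x, z, clip_radius):
        # Register the light in every cell covered by its bounding square.
        rect = (x - clip_radius, z - clip_radius, x + clip_radius, z + clip_radius)
        for cell in self._cells_in_rect(*rect):
            self.cells[cell].add(light_id)

    def candidate_lights(self, min_x, min_z, max_x, max_z):
        # Union of the lights in all cells the object's footprint overlaps.
        found = set()
        for cell in self._cells_in_rect(min_x, min_z, max_x, max_z):
            found |= self.cells.get(cell, set())
        return found


grid = LightGrid(cell_size=4.0)
grid.insert_light("lamp", x=5.0, z=5.0, clip_radius=2.0)
grid.insert_light("torch", x=30.0, z=30.0, clip_radius=1.0)
print(grid.candidate_lights(0.0, 0.0, 3.0, 3.0))
```

The result is a conservative candidate set per object; an exact bounds test would still be needed afterwards to discard lights that share a cell but do not actually reach the object.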

The initial version of my scheme used 1268 draw calls to render 100 objects using this lighting setup. 267 performs a lot better.

One large compromise that was needed to allow this was introducing a clip radius for point lights. Beyond that radius no pixels are affected. This is not physically based, but is needed to optimize dynamic lights.
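To make the clip radius idea concrete, here is a small sketch of a falloff that is forced to zero at the clip radius. The actual falloff formula is not given in the post, so both the attenuation curve and the fade window below are assumptions chosen only for illustration, as is the function name.

```python
def point_light_intensity(distance: float, radius: float, clip_radius: float) -> float:
    """Scalar light contribution at a given distance from the light."""
    if distance >= clip_radius:
        return 0.0  # hard clip: pixels beyond the clip radius are never lit

    # Assumed falloff: inverse-square style curve scaled by the light radius.
    attenuation = 1.0 / (1.0 + (distance / radius) ** 2)

    # Assumed window: fades to zero at the clip radius so the cut is not visible.
    window = 1.0 - (distance / clip_radius) ** 2

    return attenuation * window


for d in (0.0, 5.0, 9.0, 10.0, 12.0):
    print(d, point_light_intensity(d, radius=5.0, clip_radius=10.0))
```

The point of the hard zero is that a light's influence becomes a bounded sphere, so objects outside it can skip that light entirely.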

Now to test with some real life scenes, some rooms and corridors.

So my question is: can one achieve a better result without exponentially more effort put into it? These results look pretty good to me. Forward rendering will never have such a batch count as deferred (o + l), but at least I'm not in o * l territory. o + l gives 200.


These results look pretty good to me. Forward rendering will never have such a batch count as deferred (o + l)
Forward+ has one less (o) than tiled-deferred (o + 1), which has many less than deferred (o + l) ;)


Well I'm still on DirectX 9 so Forward+ is out of the question.

Anyway, that is far too complicated, and there is far too little documentation on the subject. I'll probably use something like that when it's as common and well documented as physically based rendering with optional material/BRDF layering.

And with the way I'm trying to render things, even deferred becomes far too complicated.

I'm just trying to create the best possible forward renderer under the circumstances that handles simple but fully dynamic and flexible scenes. I could push the render calls to "o" levels, as in rendering every single object once with ambient layers, directional and any number of point lights all using a single pixel shader, but that is much more work and I'm not sure it is worth it.

Forward+ has one less (o) than tiled-deferred (o + 1), which has many less than deferred (o + l) ;)

Isn't that notation misleading? Forward+ is actually o * 2 (depth pass + main pass).

On Dx9 you can't use a compute shader to build the per-tile light lists (which would use a depth pass as input), so I'd build them on the CPU (like here) and then just render the scene as usual with forward rendering. If you're doing a full Dx11 version, then yep, I misspoke :)

You could do a z-pass first to see if it helps reduce overdraw, but that's an optional optimization (you can do the same optimization for deferred if your g-buffer/attribute pass is expensive due to overdraw, e.g. if everything is parallax mapped).

Regular forward is the same though -- you build per object light lists and then draw every object once (or twice if you decide to do a z-pre-pass).
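As a rough illustration of building per-tile light lists on the CPU, the sketch below bins lights into screen tiles. It assumes each light has already been projected to a screen-space circle in pixels, uses a hypothetical 16-pixel tile size, and bins conservatively by the circle's bounding square; none of these details come from the thread.

```python
def build_tile_light_lists(lights, screen_w, screen_h, tile_size=16):
    """lights: list of (center_x, center_y, radius) in pixels.

    Returns a dict mapping (tile_x, tile_y) to a list of light indices.
    """
    tiles_x = (screen_w + tile_size - 1) // tile_size
    tiles_y = (screen_h + tile_size - 1) // tile_size
    tile_lists = {}

    for index, (cx, cy, r) in enumerate(lights):
        # Tile range covered by the circle's bounding square, clamped to the screen.
        x0 = max(int((cx - r) // tile_size), 0)
        x1 = min(int((cx + r) // tile_size), tiles_x - 1)
        y0 = max(int((cy - r) // tile_size), 0)
        y1 = min(int((cy + r) // tile_size), tiles_y - 1)

        # Off-screen lights produce an empty range and are skipped naturally.
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tile_lists.setdefault((tx, ty), []).append(index)

    return tile_lists


lists = build_tile_light_lists([(8.0, 8.0, 4.0), (16.0, 16.0, 2.0)], 64, 64)
print(lists)
```

A real implementation would also need to get these lists to the pixel shader somehow; that part is left out here.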

I could push the render calls to "o" levels, as in rendering every single object once with ambient layers, directional and any number of point lights all using a single pixel shader, but that is much more work and I'm not sure it is worth it.

What do you do at the moment - one pass per light per object? Or is there some amount of looping to do multiple lights per draw?

DwarvesH, on 03 Apr 2014 - 3:20 PM, said:
I could push the render calls to "o" levels, as in rendering every single object once with ambient layers, directional and any number of point lights all using a single pixel shader, but that is much more work and I'm not sure it is worth it.
What do you do at the moment - one pass per light per object? Or is there some amount of looping to do multiple lights per draw?

Currently I draw every object at least once. The first pass has ambient and directional lights, the things that are constant.

Point lights are drawn in another pass. When lights move in the world, a spatial partitioning scheme is updated. Then each object can easily consult the spatial partitioning scheme to determine potential light sources. These potential light sources are then culled based on the intersection of the object's bounding box with the light's bounding sphere.
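The bounding box versus bounding sphere test mentioned here is usually done by clamping the sphere center to the box and comparing the squared distance with the squared radius. A minimal sketch, assuming axis-aligned boxes and with a function name invented for the example:

```python
def sphere_intersects_aabb(center, radius, box_min, box_max) -> bool:
    """True if a sphere overlaps an axis-aligned bounding box."""
    dist_sq = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        nearest = min(max(c, lo), hi)  # closest point on the box along this axis
        dist_sq += (c - nearest) ** 2
    return dist_sq <= radius * radius


box_min, box_max = (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)

# Light just off a corner: its bounding cube overlaps the box, the sphere does not.
print(sphere_intersects_aabb((3.0, 3.0, 3.0), 1.5, box_min, box_max))

# Light hovering next to a face: overlaps.
print(sphere_intersects_aabb((1.0, 3.0, 1.0), 1.5, box_min, box_max))
```

The corner case shows why this test rejects more lights than a plain box-versus-box check would.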

One pass has an arbitrary maximum number of point lights, currently 10. If an object is lit by more than 10 point lights, the engine will use a third pass. Any number of lights is supported this way, but I'm hoping that in practice most objects will be lit by at most a handful of lights. There are basically no point lights outdoors except in special places, and indoors each room is lit separately.

There is no looping; each light setup has a loop-less pixel shader. For each render pass, two pseudo DirectX 9 pixel shader constant buffers are set: one with light position and light radius packed into a float4, and one with light color and light clip radius packed into a float4.

So basically one pass + one pass for every 10 point lights. All passes set the vertex shader and pixel shader constant buffers only once, except for the second pass, which sets two extra vectors.
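The pass count described above can be written down as a small formula: one base pass plus one extra pass per started group of 10 point lights. A sketch, with names chosen for this example only:

```python
from math import ceil

MAX_POINT_LIGHTS_PER_PASS = 10  # the current limit mentioned in the post


def draw_calls_for_object(num_point_lights: int) -> int:
    """One ambient + directional pass, plus one pass per group of up to 10 point lights."""
    base_pass = 1
    point_light_passes = ceil(num_point_lights / MAX_POINT_LIGHTS_PER_PASS)
    return base_pass + point_light_passes


for n in (0, 1, 9, 10, 11, 25):
    print(n, "point lights ->", draw_calls_for_object(n), "draw calls")
```

So a tile lit by 9 lights costs 2 draw calls and one lit by 11 costs 3. With every tile lit by between 1 and 20 lights, 100 tiles land somewhere between 200 and 300 draw calls, which is consistent with the 267 reported for the test scene.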

So basically it's pretty complicated, but it gives far better results than the things I tried before.

Potential future directions:

  • add a few extra permutations to handle common things like ambient + 1 directional + up to 3-5 point lights in one pass. Each lighting setup change requires a pixel shader change.
  • add all permutations and render all lights in one pass, with a maximum global point light count. Each lighting setup change requires a pixel shader change.
  • replace the permutations with a loop and some sort of dynamic branching/break. One single pixel shader per material type.
