Better/faster transparency with software rasterization

Started by
7 comments, last by Krohm 13 years, 1 month ago
As I'm close to being done with what I'm doing "soon", I have to start thinking about what I'll do next. Coincidentally, I've seen a few threads about transparency recently and even managed to find the links I needed.
Let's start from the problem I'm having now: I (in the sense of my system) am bad at transparency (in the sense of blending). What I'm currently doing is the following: there's a limited set of blending operations available in the material definition; if a blending operation is specified, it is looked up to determine whether it's order-dependent.
If it is, the objects using the order-dependent material will be rendered back-to-front.
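A minimal sketch of that back-to-front pass, with illustrative types (`Renderable` and `sortForBlending` are hypothetical names, not from my engine): order-independent objects stay at the front of the queue, order-dependent ones are sorted farthest-first behind them.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal render-queue entry; names are illustrative.
struct Renderable {
    float viewDepth;       // distance from the camera along the view axis
    bool  orderDependent;  // the material's blend op requires sorting
};

// Keep opaque/order-independent objects up front (stable, so their
// relative order survives), then sort the order-dependent tail
// back-to-front (largest depth drawn first).
void sortForBlending(std::vector<Renderable>& objs) {
    auto firstBlend = std::stable_partition(objs.begin(), objs.end(),
        [](const Renderable& r) { return !r.orderDependent; });
    std::sort(firstBlend, objs.end(),
        [](const Renderable& a, const Renderable& b) {
            return a.viewDepth > b.viewDepth;  // farthest first
        });
}
```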

I already have a few problems up to now.
  1. The blending operations are essentially fixed-pipeline. This is a bit of an anticlimax to me, considering all the rest is largely shader-driven. I'd like to hear some opinions on how to make this look "more programmable" than it really is. The main problem then would be to find out whether an operation needs to preserve order or not.
  2. It is unclear how, for example, "rusted windows" might have to interact, as they truly force a full background render. I know some systems have a specific per-material blend-priority setting, but I cannot quite make it work in my head. It seems to me that a full sort would suffice even for some effects. The main trouble I see here is, for example, transparent decals on transparent glass windows, but in principle the decals would be closer to the camera and render last.
  3. Because the sort is currently per-object, some objects won't render correctly against themselves. Given back-face culling, the small number of demos I've produced so far and the complete lack of goblets, I've managed to get satisfactory results.

I can live with (1) and (2) for now, but I really feel the need to deal with (3), so I've already put in place some of the machinery required to sort object batches. I don't plan to resolve per-triangle sorting for the time being; it seems way too much work. Goblets would still not render correctly, but objects using multiple materials will, as long as they play nice.
This would potentially introduce the need for a lot of batches. That made me think about the possibility of evaluating overlap on a finer basis. I admit it's probably not entirely my idea; I think I got the inspiration from DICE's presentations about software occlusion culling in Frostbite.

So what I was thinking is this:
  1. Always generate order-dependent geometry as triangles (the user sets a special flag if a material is order-dependent). This might take some minor work, as some assets might be using strips.
  2. Rasterize z (or "some metric") for the resulting triangles (probably at 1/4 resolution); when batches overlap, put the overlapping batch "on a new layer".
  3. Merge batches of triangles in the same layer.
  4. Draw in order.
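Steps 2 and 3 could be sketched roughly like this. As a simplification, a batch's screen footprint is just a rectangle of coarse cells instead of rasterized triangles, and layers are assigned by screen overlap in submission order rather than by an actual z metric; all names are illustrative.

```cpp
#include <algorithm>
#include <vector>

// Coarse screen-space grid standing in for the low-res rasterizer.
// A batch covers a rectangle of cells here; a real implementation
// would rasterize the batch's triangles and compare z.
struct LayerGrid {
    int w, h;
    std::vector<int> cover;  // how many batches already touch each cell
    LayerGrid(int w_, int h_) : w(w_), h(h_), cover(w_ * h_, 0) {}

    // Assign the batch the first layer where it overlaps nothing already
    // placed, then mark its footprint. Batches sharing a layer can be
    // merged and drawn together (step 3), layers drawn in order (step 4).
    int assignLayer(int x0, int y0, int x1, int y1) {
        int layer = 0;
        for (int y = y0; y < y1; ++y)
            for (int x = x0; x < x1; ++x)
                layer = std::max(layer, cover[y * w + x]);
        for (int y = y0; y < y1; ++y)
            for (int x = x0; x < x1; ++x)
                cover[y * w + x] = layer + 1;
        return layer;
    }
};
```

Two non-overlapping batches land on layer 0 and get merged; a batch crossing either of them is pushed to layer 1.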

To me, this will result in a generally low batch count... perhaps too low. I see this being useful in the future, but it's a distant one, and it sounds more like a premature optimization than anything else. It is also unclear how this will fit into the system for dynamic geometry.
What's really troublesome is the way geometry would have to be transformed, as nobody says it'll be a standard MVP transform. Because of a previous design decision this is not a problem, but performance will be terrible (probably a very good candidate for multi-threading). As it stands now, the system won't consider opaque geometry, which would result in extra, unnecessary passes, but I don't see any possibility of transforming world geometry in the long run (I think it would be viable right now).

At this point, I might even try full OIT for what it takes!

But there's something else to the software rasterization when it comes to particle systems, which are another system I need to bump up quite a bit!
Since particle systems are good mostly for fuzzy objects, I could somehow instruct the z-resolver to issue "no more than N layers". This would result in more efficient use of blending operations for the interior part of the particle system, which could probably go unnoticed given sufficient particles. It's, ironically, a very "fuzzy" idea.
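The clamp itself is trivial: assuming the coarse resolver above hands back a layer index, everything past the budget is folded into the last layer, trading exact interior ordering for fewer batches (function name is mine, for illustration).

```cpp
#include <algorithm>

// "No more than N layers" for fuzzy particle systems: any layer the
// z-resolver assigns beyond maxLayers - 1 collapses into the last
// layer, so the particle interior blends in whatever order it arrives.
int clampParticleLayer(int resolvedLayer, int maxLayers) {
    return std::min(resolvedLayer, maxLayers - 1);
}
```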

Suggestions and further considerations are welcome.

Previously "Krohm"

I might be speaking nonsense, but would it be possible to do per-pixel sorting like the PowerVR2 did? It was a tile-based renderer which worked without a full z-buffer, and it could do pixel-perfect translucency sorting on the hardware (the reason why Dreamcast emulators always have translucency glitches). There's an old article on how it works here:

http://www.beyond3d.com/content/articles/38/2

It is possible using DX11. I recommend you read this: http://diaryofagraph...ansparency.html
Edit: I cannot access Wolfgang's website right now; search for "per-pixel linked list" and translucency.

I might be speaking nonsense, but would it be possible to do per-pixel sorting like the PowerVR2 did?
You're not talking nonsense... the software rasterizer above would do just that (but on "CPU fragments" as opposed to "real" fragments).

It is possible using DX11.
I already know that. The problem is, I'm very close to getting the green light for D3D10 (ten) with a low feature set. If memory serves, that implemented a full CS fragment sort using lists... or perhaps it was about stochastic sampling?
Seriously, I'll consider myself blessed if I'm able to get the green light for D3D10 by the end of the month.

Previously "Krohm"

AMD has a tech demo that solves these problems already:

http://developer.amd.com/samples/demos/pages/atiradeonhd5800seriesrealtimedemos.aspx

here is the paper that explains how it's done:

http://developer.amd.com/gpu_assets/OIT%20and%20Indirect%20Illumination%20using%20DX11%20Linked%20Lists_forweb.ppsx

Yes, a software rasterizer can solve that problem; I implemented one for exactly that purpose some time ago (it was for pre-visualization in content-creation software, so 5 fps was fine as long as it worked properly, even on quite high poly counts).

AMD has a techdemo that solved the problems already

http://developer.amd...ltimedemos.aspx

here is the paper that explains how it's done:

http://developer.amd...sts_forweb.ppsx
I totally forgot about that! The algorithm is very convincing and feels right to me. But... I'm not quite sure I can ask for PS5; that's a lot of interlocked operations. It would definitely be a "total" solution. I will have to think about this. Perhaps another temporary solution will be sufficient until this can be implemented on my target hardware.

Yes, software rasterizer can solve that problem, I've implemented one for exactly that purpose some time ago (but it was for pre-visualization in a content creation software, so 5fps was fine, as long as it works properly, on quite high poly counts, tho).
I take it you were rendering everything (not only Z), so I suppose this is good news for me!
Thank you very much for sharing the experience!

Previously "Krohm"

Here is the link I was looking for before: http://www.confettispecialfx.com/order-independent-transparency-ii

It is possible using DX11. I recommend you read this: http://diaryofagraph...ansparency.html
Edit: I cannot access Wolfgang's website right now; search for "per-pixel linked list" and translucency.


Wow, I really should read more about DX11. I didn't know that in SM5.0 you could have a pixel shader output to arbitrary positions in general-purpose (non-color, non-depth) buffers and perform interlocked operations on shared data! This is crazy, so many possibilities! I'm watching AMD's per-pixel linked list presentation and my mind is blown.
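For intuition, here's a CPU-side sketch of what the per-pixel linked-list approach computes: fragments arrive per pixel in arbitrary order, get sorted by depth at resolve time, and are blended back-to-front. On the GPU the list lives in a UAV with interlocked head-pointer updates; here a plain vector per pixel stands in for the linked list, and the single-channel "color" is just for brevity. Names are illustrative.

```cpp
#include <algorithm>
#include <vector>

// One entry of the per-pixel fragment list.
struct Fragment { float depth; float color; float alpha; };

struct OITBuffer {
    std::vector<std::vector<Fragment>> pixels;  // one list per pixel
    explicit OITBuffer(int pixelCount) : pixels(pixelCount) {}

    // Append in arbitrary submission order, like the GPU pass does.
    void append(int pixel, Fragment f) { pixels[pixel].push_back(f); }

    // Resolve one pixel: sort far-to-near, then standard "over" blending.
    float resolve(int pixel, float background) {
        auto& list = pixels[pixel];
        std::sort(list.begin(), list.end(),
                  [](const Fragment& a, const Fragment& b) {
                      return a.depth > b.depth;  // farthest first
                  });
        float out = background;
        for (const Fragment& f : list)
            out = f.color * f.alpha + out * (1.0f - f.alpha);
        return out;
    }
};
```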
Ok, I haven't got any feedback yet about the availability of Shader Profile 5 in the future, but I'm fairly sure it will be negative.

I also took some time to consider the ROI of the various systems here and the subtle problem of sorting things. Long story short: none of the above will be implemented. I will simply improve sorting to sub-mesh level with the "blend priorities" below and nothing else, as that seems to be the only thing with a ROI I can sustain.
My main problem was resolving conflicts in which the metrics would produce similar values. I have considered how to fix this with little success: the issue is not explicitly noted in the "batch sorting" article (it's clear that it can somewhat be used to solve the problem, however) and, as far as I understand, it is just completely ignored by the PS5 linked-list method. The only viable way seems to be having the assets/shaders explicitly provide that information. This is a serious source of uncertainty for me, as I cannot really trust anybody around here to understand the numbers involved, but it does not seem to be a problem for the short to medium term anyway...

So, here's what I'll do. Have a set of "shader blend types" such as
  1. Opaque: draws before everything else. Cannot really be specified; this is used for everything not having blend parameters.
  2. Commutative: the default for everything having blend parameters which are recognized to be additive/order-independent.
  3. Transparent: the geometry is to be rendered in order, back-to-front.
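The automatic inference might look like this: a sketch assuming fixed-function-style blend factors, where pure additive blending counts as commutative and anything else as order-dependent. The enum and function names are mine, not the engine's.

```cpp
// Fixed-function-style blend state, reduced to what the classifier needs.
enum class BlendFactor { One, Zero, SrcAlpha, InvSrcAlpha };
enum class BlendType { Opaque, Commutative, Transparent };

BlendType classify(bool blendEnabled, BlendFactor src, BlendFactor dst) {
    if (!blendEnabled)
        return BlendType::Opaque;       // no blend parameters at all
    // dst' = src*1 + dst*1 is pure addition, which commutes:
    // these batches need no sorting among themselves.
    if (src == BlendFactor::One && dst == BlendFactor::One)
        return BlendType::Commutative;
    // Anything else (e.g. classic src-alpha "over") is order-dependent.
    return BlendType::Transparent;
}
```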

These can be automatically inferred (good).
Commutative geometry sliced between two transparent geometry sets will render in undefined order (by bucket), but before the nearest transparent object.
Types 2 and 3 will have to be sorted (I don't think I'll have a batch-key sort for now, as I need to finalize various subsystems first). There's the problem of resolving conflicts, as the metric might eventually fail to produce clear cuts. I was thinking about something like a set of "blend order" values such as
  1. Base: the default value assigned. When two objects are considered so close to each other that they cannot be "reasonably" sorted, the base blend order is rendered first.
  2. Coplanar/decal: used for marks on walls (very likely with polygon offset); supposed to be used to model dirt or stuff that adheres to the "base" surface.
  3. Near: used to model objects that will likely end up very close to other surfaces while modelling a different object. I'm not quite sure I need this; it's meant to be some sort of "super-decal". Perhaps Coplanar and Near should be swapped in order. I see some uses for this mainly with certain special effects based on multipass techniques.
  4. Top: always guaranteed to render last in the hierarchy of "coplanar" geometry. I cannot quite figure out what this could be used for, but it seems reasonable to have at least one extra layer.

I wonder if I'm able to explain the problem I'm thinking about. Perhaps those could be floating-point values, allowing the asset to specify a blend order such as "2.2". The problem I'm trying to solve here is ordering stuff which is essentially coplanar by building an extra "virtual depth" to use (and just hoping people don't mess up the values).
In the end, each system should reorder everything as needed according to those rules. The order in which the various subsystems are rendered will, for the time being, be left implicitly defined by the current design.
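The "virtual depth" tiebreaker could be sketched like this: depth is quantized into epsilon-sized buckets, and batches landing in the same bucket (i.e. "too close to sort reasonably") fall back to the per-material blend order, with floats like 2.2 sorting between Coplanar and Near. Bucketing, rather than a raw epsilon comparison, keeps the comparator a strict weak ordering, which std::sort requires. Names and the epsilon value are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// A sortable batch of transparent geometry: camera-space depth plus the
// material's blend order (Base=1, Coplanar/decal=2, Near=3, Top=4,
// in-between float values allowed).
struct Batch { float viewDepth; float blendOrder; };

static long depthBucket(float depth, float eps) {
    return static_cast<long>(std::floor(depth / eps));
}

void sortTransparent(std::vector<Batch>& batches, float depthEpsilon) {
    std::sort(batches.begin(), batches.end(),
        [depthEpsilon](const Batch& a, const Batch& b) {
            long ba = depthBucket(a.viewDepth, depthEpsilon);
            long bb = depthBucket(b.viewDepth, depthEpsilon);
            if (ba != bb) return ba > bb;        // back-to-front by depth
            return a.blendOrder < b.blendOrder;  // lower blend order first
        });
}
```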

In the future, this will be referred to as the "fast" transparency system, as opposed to the future "nice" transparency system, which will be based on Pixel Shader 5 and/or the original software rasterization technique (which is really orthogonal to what's going on).

Can anybody see any drawbacks? I seriously hate this for some reason, and I'd like to make sure I don't have to touch it again for a while.

Previously "Krohm"

This topic is closed to new replies.
