Question about "SPU Usage" from "Deferred Rendering in Killzone 2" Presentation

Recommended Posts

Halifax2    295
I was reading through the Guerilla Games presentation: Deferred Rendering in Killzone 2. Most of it is pretty easy to understand, but I can't seem to comprehend what they are doing with their object rendering system. I'm just going to quote the specific slide (44):
Quote:
 * Everything is data driven - No "virtual void Draw()" calls on objects - Objects store a decision-tree with DrawParts - DrawParts link shader, geometry, and flags - Decision tree used for LODs, etc. * SPUs pull rendering data directly from objects - Traverse scenegraph to find objects - Process object's decision-tree to find DrawParts - Create displaylist from DrawParts
Little bits of this make sense to me, such as iterating over the list of registered objects, grabbing their data, evaluating it and finally adding it to a 'displaylist' to be processed later. But what exactly is the data that the decision-tree holds? Can anyone provide some pseudocode that implements what they speak of?

Share on other sites
Hodgman    51234
Maybe something like this:
class DecisionNode{public:  DecisionNode( const bool& c, RenderNode* t, RenderNode* f ) : pFalse(f), pTrue(t), condition(c) {}  virtual RenderNode* GetChildNode()  {    if( condition )      return pTrue;    else      return pFalse;  }private:  RenderNode* pFalse;  RenderNode* pTrue;  bool& condition;};class Player{  RenderNode lowDetail;  RenderNode highDetail;  bool lowLodMode;  DecisionNode node;  Player() : node(lowLodMode, &lowDetail, &highDetail) { lowLodMode = false;}  virtual DecisionNode& GetRenderDecisionTree() { return node; }};

You create some kind of data-structures, probably with virtual functions (like my GetChildNode func above) that can choose between different renderables. Decision nodes could be used to select the right LOD model, decide whether to render a type of special-effect, etc...

Instead of each object having it's own draw function where you make these decisions using code, you use these 'decision nodes' to describe the conditions for which a renderable should be drawn.

Share on other sites
Halifax2    295
Ah, okay, thanks for the input Hodgman. That definitely makes sense, and is certainly very data-driven. I guess I overcomplicated the situation because in my mind I was imagining something of or relating to the decision trees they use for AI. :D

Well, once again, thanks. I'm going to try to implement this in a little test tech-demo of sorts I guess.

Share on other sites
osmanb    2082
It's worth pointing out that they *probably* aren't making much (any?) use of virtuals, and it's highly doubtful that they'd do anything clever like storing references to external bools. All of the data needed to evaluate decisions is almost certainly in a tight block of data coupled to the object itself. This is meant to run on SPU, after all...

Share on other sites
Halifax2    295
Quote:
 Original post by osmanbIt's worth pointing out that they *probably* aren't making much (any?) use of virtuals, and it's highly doubtful that they'd do anything clever like storing references to external bools.

Yeah, I have to agree, they are probably bagging the vtable.

Share on other sites
Matt_D    247

although its worth pointing out that SPU's have very minimal scratchpad memory, and wont be doing things like large tree traversal etc within the memory they have (they are also not very suited to random access memory pattern tasks). its likely the PPU will do the actual traversal, or they have a nice DMA friendly tree structure that they can process linearly without needing to perform random memory accesses.

they wont be using virtual functions, or playing with vtables, the code will be compact, simple, and with as few branches as possible. think of SPU's as sort of like shaders with local memory, bidirectional DMA, and seriously kickass math performance.

cache, and DMA are critical to SPU's. they are very very powerful, but its tricky to keep them fed with data to process.

anyway, its mostly academic unless you actually have a ps3 to play with. the things to take away from the article are that they hand most of the processing off from the main CPU for render list generation (a good thing to copy, especially on multicore cpu's) and that they are doing a lot of processing, both in shaders, and on CPU. itd be interesting to see actual metrics on time and performance of such a solution. it sure does look good when its all said and done, but damn thats one complex renderer :)

Share on other sites
Quote:
 Original post by Matt_Dthey are also not very suited to random access memory pattern tasks)
True in general but DMA lists are available and do allow it for when its needed. I implemented a radix sort that works on large data sets with random access and can keep it going. I had to pad the individual DMA transfers to 16bytes though

Quote:
 anyway, its mostly academic unless you actually have a ps3 to play with. the things to take away from the article are that they hand most of the processing off from the main CPU for render list generation (a good thing to copy, especially on multicore cpu's) and that they are doing a lot of processing, both in shaders, and on CPU. itd be interesting to see actual metrics on time and performance of such a solution. it sure does look good when its all said and done, but damn thats one complex renderer :)

I think how the evolution of every PS3 engine is like this,

1) Implement renderer on PS3 really quickly to show producers that you got it working
2) Profile - RSX is too slow
3) Move subset of render pipeline to SPU
4) Profile - RSX is too slow
5) Move subset of render pipeline to SPU
..
..
..
9) PPU is too slow
10) Move subset of simlation to SPU
..
..

A few years later you have complicated ping pong of data structures between 5 spu's, a ppu, the RSX, and probably some random middle ware on that 6th SPU.

-= Dave

Share on other sites
Halifax2    295
Yeah, Matt_D, I'm actually working with a PS3 Linux setup and I was interested in trying to implement something similar to Guerilla Games for a little technical demonstration.

Share on other sites
Matt_D    247
Quote:
 Original post by Halifax2Yeah, Matt_D, I'm actually working with a PS3 Linux setup and I was interested in trying to implement something similar to Guerilla Games for a little technical demonstration.

except the PS3 linux side of things doesnt allow for shaders. so you can only do half the job.

Share on other sites
Matt_D    247
Quote:
 Original post by David NeubeltA few years later you have complicated ping pong of data structures between 5 spu's, a ppu, the RSX, and probably some random middle ware on that 6th SPU.-= Dave

i know exactly what you mean :) seen it happen a few times.

luckily ive also seen well designed stuff from the get go. it tends to be a lot stabler, has less bugs and is far easier to maintain than some overly complex spu ping pong :)

nice work on the radix sort :) what kind of throughput were you getting?

Share on other sites
The numbers are in microseconds

data size [      10], qsort:       3, radix:    81, speedup: 0.037037data size [     100], qsort:      50, radix:    85, speedup: 0.588235data size [    1000], qsort:     748, radix:   156, speedup: 4.794872data size [   10000], qsort:    9689, radix:   633, speedup: 15.306478data size [  100000], qsort:  113642, radix:  6943, speedup: 16.367853data size [  500000], qsort:  678878, radix: 40381, speedup: 16.811817data size [ 1000000], qsort: 1365621, radix: 82697, speedup: 16.513550

I have on my todo to optimize DMA transfers and do a couple more SIMD style optimizations, I'm sure I can still get it at least 3-4x faster. I'm also looking at for when the data fits completely into local store to use another algorithm to increase speed.

Also, the timing is for a synchronous sort so not only is this implementation faster then the standard C++ sort but it also has the ability to run parallel for even greater overall frame speed.

Maybe some day when its finished I'll post the algorithm up here.

-= Dave

Share on other sites
Halifax2    295
Well here is a little closure for anyone who is interested. It's basically as expected. This e-mail came from Michal:
Quote:
 I hope I'm going to be able to describe what we do there - it's really no magic involved - we just wanted something simple.The decision tree is quite simple structure - leaf nodes are individual drawparts (drawpart is a structure that contains vertex array, index array, shader, matrices, constants... - the minimum data to render something without any higher logic required) and non-leaf nodes pretty much contain rules/conditions determining what child node to pick - since we don't want to have any virtual calls there, we only have predefined set of conditions such as LOD node (based on distance from camera), "switch" node (explicitly pick certain child) and couple more (just imagine a giant switch statement with all possible node types). The whole tree is one linear block of memory so we can easily DMA it onto SPU, start in the root and go through the decision nodes and output list of the relevant leaf drawparts. We then merge all drawparts for the frame and just render the list.The reason why the tree contains only couple of different node types is because all higher level game logic is done at different point - therefore the representation layer of the game can stay simple.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account