olaolsson

  1. The way we implemented it is to allocate a 'reasonable' buffer and then to grow it when (if) needed; a minimal sketch of that idea is at the end of this post. I think Emil covered how they deal with this in our talk from SIGGRAPH this year: 'Practical Clustered Deferred and Forward Shading'. This talk should provide a few insights into both the general gist of the algorithm and the practical implementation at Avalanche. Hope it helps. .ola
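    To make the grow-when-needed idea concrete, here is a minimal sketch, assuming a CUDA implementation with a single global allocation counter; the buffer sizes, kernel and function names are made up for illustration and are not the Avalanche implementation.
[code]
// Minimal sketch of "allocate a 'reasonable' buffer, grow it if it overflows".
#include <cuda_runtime.h>
#include <cstdint>

__global__ void buildLightListsKernel(uint32_t *lightIndexList, uint32_t capacity,
                                      uint32_t *counter)
{
    // ... per-tile/cluster light culling would go here; when a light passes the
    // test, reserve a slot with an atomic and write only if it fits.
    uint32_t lightIndex = blockIdx.x * blockDim.x + threadIdx.x; // placeholder work
    uint32_t slot = atomicAdd(counter, 1u);
    if (slot < capacity)
        lightIndexList[slot] = lightIndex;
    // If slot >= capacity we still count, so the host learns how much space was needed.
}

void buildLightLists(uint32_t numWorkItems)
{
    static uint32_t capacity = 64 * 1024;            // a 'reasonable' initial size
    static uint32_t *d_list = 0, *d_counter = 0;
    if (!d_list)    cudaMalloc(&d_list, capacity * sizeof(uint32_t));
    if (!d_counter) cudaMalloc(&d_counter, sizeof(uint32_t));

    for (;;)
    {
        cudaMemset(d_counter, 0, sizeof(uint32_t));
        buildLightListsKernel<<<(numWorkItems + 255) / 256, 256>>>(d_list, capacity, d_counter);

        uint32_t needed = 0;
        cudaMemcpy(&needed, d_counter, sizeof(uint32_t), cudaMemcpyDeviceToHost);
        if (needed <= capacity)
            break;                                    // everything fit, we're done

        capacity = needed + needed / 4;               // grow with some slack and rebuild
        cudaFree(d_list);
        cudaMalloc(&d_list, capacity * sizeof(uint32_t));
    }
}
[/code]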
  2. Now, just because I'd hate for this to turn into another deferred lighting / shading terminology kerfuffle:
     • Tiled Forward <=> Forward+: these use 2D tiling (same as Tiled Deferred), with an (optional) pre-z pass plus a separate geometry pass for shading.
     • Light Indexed Deferred: builds the lists per pixel, which can be viewed as a 1x1 tile, and is then really the same as Tiled Forward. The practical difference is pretty big, though...
     • Clustered Forward: performs the tiling in 3D (or higher); otherwise as above.
     • Tiled/Clustered Deferred Shading: do the tiling as their forward counterparts, but start with a G-buffer pass and end with a deferred shading pass.
     Hope this clears up, and/or prevents, some confusion. See the sketch below for the practical difference between a 2D tile key and a 3D cluster key.
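    To make the 2D-versus-3D distinction concrete, here is a minimal sketch of how a tile key and a cluster key might be computed for a sample, assuming 32x32-pixel tiles and exponential depth slicing; the constants and names are purely illustrative, not from any particular implementation.
[code]
// Illustrative only: tile vs. cluster key for a sample at pixel (x, y) with view-space depth viewZ.
// Assumes 32x32-pixel tiles and exponential depth slices between nearZ and farZ.
__device__ uint32_t tileKey2D(uint32_t x, uint32_t y, uint32_t tilesX)
{
    // Tiled (2D): all depths within a screen tile share one light list.
    return (y / 32) * tilesX + (x / 32);
}

__device__ uint32_t clusterKey3D(uint32_t x, uint32_t y, float viewZ,
                                 uint32_t tilesX, uint32_t tilesY,
                                 float nearZ, float farZ, uint32_t numSlices)
{
    // Clustered (3D): additionally slice the view depth range exponentially.
    float sliceF = logf(viewZ / nearZ) / logf(farZ / nearZ) * (float)numSlices;
    uint32_t slice = min((uint32_t)fmaxf(sliceF, 0.0f), numSlices - 1u);
    return slice * tilesX * tilesY + (y / 32) * tilesX + (x / 32);
}
[/code]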
  3. Revival of Forward Rending?

    [quote name='MJP' timestamp='1333393678' post='4927595'] He didn't want to do a reduction because of the extra shared memory pressure that it would add (which makes sense, considering he was already using quite a bit of shared memory for the light list + list of MSAA pixels), but it might be worth it if you're just outputting a light list for forward rendering. [/quote]
    In my implementation I always build the grid in a separate pass. It costs a fairly trivial amount of extra bandwidth, removes the shared memory limitations, and is inherently more flexible. I implemented Lauritzen's single-kernel version too, more or less a straight port but with a parallel depth reduction (which was significant, at least on a GTX 280); it did not perform as well, though it was only marginally slower. A rough sketch of the two-pass structure is at the end of this post.
    [quote name='MJP' timestamp='1333393678' post='4927595'] I wouldn't expect very big gains since the light/tile intersection tends to be a small portion of the frame time, but it could definitely be an improvement. [/quote]
    Well, since you are brute forcing (lights vs tiles), you just need to ramp the lights up and, voila, it'll become an issue sooner or later. This is also highly (light) overdraw dependent, so I think the portion of frame time can vary quite a bit. Sorry to say, I can't run your demo, because I've only got access to a Windows XP machine at the moment, so I can't offer any comments based on how your setup looks.
    [quote name='MJP' timestamp='1333393678' post='4927595'] Everybody always just does point lights in their demos. [img]http://public.gamedev.net//public/style_emoticons/default/tongue.png[/img] [/quote]
    Yes, guilty as charged... damn those paper deadlines.
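    To illustrate what I mean by building the grid in a separate pass, here is a rough two-pass sketch, assuming one CUDA block per 16x16-pixel tile; the kernels, names and tile size are illustrative only, not my actual implementation.
[code]
// Pass 1: per-tile view-space depth bounds. Pass 2: brute-force light-vs-tile culling.
struct LightBounds { float minX, minY, maxX, maxY, minZ, maxZ; }; // 2D screen extents + view-space depth range

__global__ void tileDepthBoundsKernel(const float *viewZ, int width, int height,
                                      float2 *tileZBounds /* x = min, y = max */)
{
    __shared__ float sMin[256], sMax[256];
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    int x = blockIdx.x * 16 + threadIdx.x;
    int y = blockIdx.y * 16 + threadIdx.y;
    bool inside = (x < width && y < height);
    sMin[tid] = inside ? viewZ[y * width + x] :  1e30f;
    sMax[tid] = inside ? viewZ[y * width + x] : -1e30f;
    __syncthreads();
    for (int s = 128; s > 0; s >>= 1) // standard shared-memory min/max reduction
    {
        if (tid < s)
        {
            sMin[tid] = fminf(sMin[tid], sMin[tid + s]);
            sMax[tid] = fmaxf(sMax[tid], sMax[tid + s]);
        }
        __syncthreads();
    }
    if (tid == 0)
        tileZBounds[blockIdx.y * gridDim.x + blockIdx.x] = make_float2(sMin[0], sMax[0]);
}

__global__ void cullLightsPerTileKernel(const LightBounds *lights, int numLights,
                                        const float2 *tileZBounds, int tilesX,
                                        uint32_t *tileLightCount)
{
    int tile = blockIdx.y * tilesX + blockIdx.x;
    float2 zb = tileZBounds[tile];
    float tileMinX = blockIdx.x * 16.0f, tileMaxX = tileMinX + 16.0f;
    float tileMinY = blockIdx.y * 16.0f, tileMaxY = tileMinY + 16.0f;
    // Each thread in the block tests a subset of the lights against this tile (brute force).
    for (int i = threadIdx.x; i < numLights; i += blockDim.x)
    {
        LightBounds lb = lights[i];
        bool overlap = lb.maxX >= tileMinX && lb.minX <= tileMaxX &&
                       lb.maxY >= tileMinY && lb.minY <= tileMaxY &&
                       lb.maxZ >= zb.x     && lb.minZ <= zb.y;
        if (overlap)
            atomicAdd(&tileLightCount[tile], 1u); // a real grid would also store the light index
    }
}
[/code]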
  4. Revival of Forward Rending?

    [quote name='phantom' timestamp='1333356726' post='4927407'] It would be the same as a normal forward lighting system; render transparent objects back to front. You'd just get early rejection for objects which are behind the layed down z-pass. [/quote]
    Just note that the restriction applies to lights as well: when you build the grid you can only reject lights that are entirely behind the scene (i.e. only use the max depth). Obviously one could elaborate on this with a min depth buffer, but before you know it we'll have implemented depth peeling. A small sketch of the max-depth-only test is at the end of this post.
    Otherwise, I think the fact that you can reuse the entire pipeline, including the shader functions that access the grid, is one of the really strong features of the tiled deferred-forward combo. It is easy to do tiled deferred for opaque objects and then add tiled forward for transparent ones, if that is what works. It is very easy to move between tiled deferred and forward shading, and that has got to be good for compatibility/scaling/adapting to platforms.
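    As a tiny illustration of the max-depth-only rejection (view-space depth increasing away from the camera; names made up for the example):
[code]
// When the grid is also used for transparent geometry, only lights entirely behind the
// opaque scene can be rejected; the opaque min depth tells us nothing about transparent
// fragments that may lie in front of it.
__device__ bool lightMayAffectTile(float lightMinZ, float lightMaxZ,
                                   float tileMinZ, float tileMaxZ,
                                   bool gridUsedForTransparency)
{
    if (gridUsedForTransparency)
        return lightMinZ <= tileMaxZ;                       // reject only lights fully behind max depth
    return lightMinZ <= tileMaxZ && lightMaxZ >= tileMinZ;  // opaque-only: use the full depth range
}
[/code]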
  5. Revival of Forward Rending?

    [quote name='MJP' timestamp='1333335328' post='4927356'] ...If you didn't do this you could build a list of lights just using the near/far planes of the camera, but I would suspect that the larger light lists + lack of good early z cull would cause performance to go right down the drain. [/quote]
    I did look at that in my paper [url="http://www.cse.chalmers.se/~olaolss/jgt2011/"]'Tiled Shading'[/url] that someone posted a link to above. And the short answer is that, no, indeed it does not end too well. On the other hand, I imagine it would be a useful technique simply to manage lights in an environment without too many lights in any given location and with limited views (e.g. an RTS camera or so), where the limited depth span makes the depth range optimization less effective anyway. I've got an OpenGL demo too, which builds the grids entirely on the CPU (so it's not very high performance, it's just there to demo the techniques).
    Btw, one thing I noticed that may affect your results is that you use atomics to reduce the min/max depth. Shared memory atomics on NVIDIA hardware serialize on conflicts, so using them to perform a reduction this way is less efficient than just using a single thread in the CTA to do the work (at least then you don't have to run the conflict detection steps involved). So this step gets a lot faster with a SIMD parallel reduction, which is fairly straightforward. I don't have time to dig out a good link, sorry, so I'll just post a CUDA variant I've got handy. It is written for 32 threads (a warp), but scales up with appropriate barrier syncs; sdata is a pointer to a 32-element shared memory buffer (is that local memory in compute shader lingo? Anyway, the on-chip variety.).
[code]
// Warp-wide (32-thread) sum reduction in shared memory; swap '+=' for min/max to
// reduce depth bounds. Relies on warp-synchronous execution within the warp.
uint32_t warpReduce(uint32_t data, uint32_t index, volatile uint32_t *sdata)
{
    unsigned int tid = index;
    sdata[tid] = data;
    if (tid < 16)
    {
        sdata[tid] += sdata[tid + 16];
        sdata[tid] += sdata[tid + 8];
        sdata[tid] += sdata[tid + 4];
        sdata[tid] += sdata[tid + 2];
        sdata[tid] += sdata[tid + 1];
    }
    return sdata[0];
}
[/code]
    Same goes for the list building, where a prefix sum could be used; here it'd depend on the rate of collisions. Anyway, I'm thinking this might be a difference between NVIDIA and AMD (where I don't have a clue how atomics are implemented).
    As a side note, it's much more efficient to work out the screen-space bounds of each light before running the per-tile checks; it saves constructing identical planes for tens of tiles, etc.
    Anyway, fun to see some activity on this topic! And I'm surprised at the good results for tiled forward. Cheers .ola
  6. [quote name='Ardilla' timestamp='1325668370' post='4899522'] mmm, interesting, Im going to implement a light volume technique in a first moment (I understand it better), and then I will try to implement the tile-based to see the performance difference [img]http://public.gamedev.net//public/style_emoticons/default/smile.png[/img] . Thanks for the answers! [/quote]
     So, to underline the main difference: traditional deferred shading is typically memory bound, whereas tiled deferred shading completely eliminates this bottleneck and is squarely compute bound. Given this, you can get an idea of how much better it will perform on your platform, either by looking at performance numbers or by simple experimentation (e.g. vary the G-buffer bit depth); see also the back-of-the-envelope example at the end of this post. Both the Xbox 360 and the PS3 have a very high compute-to-bandwidth ratio, and this is true for modern GPUs as well, and increasingly so. As I found in my experiments, going from a GTX 280 to a GTX 480, shading performance doubles for tiled deferred, whereas my implementation of traditional deferred shading scales by the expected 30%, corresponding to the increase in memory bandwidth.
     Anyway, of course, if you have massively complex shaders you may not be memory bandwidth bound (yet), but it's a pretty safe bet you will be sooner or later, as memory bandwidth falls further and further behind. If the rumours about the GTX 680 are to be believed, we'll see this gap widen significantly again in the new generation. Cheers .ola
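    A back-of-the-envelope example of why traditional deferred tends to be bandwidth bound; all numbers are invented purely for illustration, so plug in your own.
[code]
// G-buffer traffic per frame, classic light-volume deferred vs. tiled deferred.
#include <stdio.h>

int main(void)
{
    const double pixels            = 1920.0 * 1080.0;
    const double gbufferBytes      = 16.0; // e.g. two RGBA8 targets; real G-buffers are often fatter
    const double avgLightsPerPixel = 8.0;  // average light overdraw, heavily scene dependent

    // Classic deferred: the G-buffer is re-read for every light volume covering a pixel.
    double classicGB = pixels * avgLightsPerPixel * gbufferBytes / 1e9;
    // Tiled deferred: the G-buffer is read once per pixel and kept in registers while
    // all lights in the tile's list are accumulated.
    double tiledGB = pixels * gbufferBytes / 1e9;

    printf("classic: %.2f GB/frame, tiled: %.2f GB/frame\n", classicGB, tiledGB);
    return 0;
}
[/code]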
  7. Hi, just thought I'd point you towards a paper about tiled shading, and an associated OpenGL demo, by, *ahem*, myself. The paper is sadly paywalled by JGT, but I've put up a preprint on my web site, which is not hugely different from the published paper (it contains some bonus listings that were removed due to space restrictions). You may be able to access the published paper from a uni library or similar. [url="http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading"]http://www.cse.chalm...d=tiled_shading[/url]
     The main takeaways are a much more thorough performance evaluation and analysis, and the introduction of tiled forward shading (which enables easy handling of transparent geometry).
     In relation to the discussion here: I go a different way to the others and do the tile intersection by first transforming the lights to screen space, and then testing the screen-space extents against each tile. On the CPU I do it scan-line fashion, which is as efficient as it gets, but somewhat hard to do in parallel. Therefore the GPU version does a brute-force tiles-test-all-lights approach, much like others have done, but with a much cheaper AABB/AABB test (2D extents + depth range). This saves constructing/testing identical planes all over the place. The demo only implements the CPU variety, and without the depth range (though I may update that). A rough sketch of the scan-line-style CPU build is at the end of this post.
     Hope you find this useful. Cheers .ola
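    To give an idea of the scan-line-style CPU build, here is a rough C++ sketch; it is illustrative only, not the code from the demo, and it assumes the screen-space bounds of each light have already been computed elsewhere.
[code]
// Insert each light into all tiles covered by its screen-space extents, row by row.
#include <vector>
#include <algorithm>
#include <cstdint>

struct ScreenBounds { int minX, minY, maxX, maxY; }; // in pixels, precomputed per light

void buildTileGrid(const std::vector<ScreenBounds> &lights,
                   int tileSize, int tilesX, int tilesY,
                   std::vector<std::vector<uint32_t>> &tileLightLists)
{
    tileLightLists.assign(tilesX * tilesY, {});
    for (uint32_t i = 0; i < lights.size(); ++i)
    {
        const ScreenBounds &b = lights[i];
        // Clamp the light's tile rectangle to the grid.
        int tx0 = std::max(b.minX / tileSize, 0);
        int ty0 = std::max(b.minY / tileSize, 0);
        int tx1 = std::min(b.maxX / tileSize, tilesX - 1);
        int ty1 = std::min(b.maxY / tileSize, tilesY - 1);
        // Walk the covered tiles scan-line fashion and append the light index.
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                tileLightLists[ty * tilesX + tx].push_back(i);
    }
}
[/code]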