Frenetic Pony

Member Since 30 Oct 2011
Offline Last Active Today, 06:01 PM

#5163386 Simulating lighting using volumetric meshes.

Posted by Frenetic Pony on 28 June 2014 - 01:44 AM

Sounds a lot like Crytek's Light Propagation Volumes, which never had enough precision for anything more than secondary illumination, and whose memory requirements grow with the square of the range (or worse) as you extend the illumination. Alternatively, it also sounds like the same hack used by Bungie in Destiny, Irrational in Bioshock Infinite, and Epic currently in UE4. All of them render lighting information to a volume texture, which is then used for deferred rendering of transparencies. The trouble there is that it's all too low a resolution to get very good shadow information or good specular.


Here are some of the examples: http://advances.realtimerendering.com/s2013/Tatarchuk-Destiny-SIGGRAPH2013.pdf   http://www.crytek.com/download/Light_Propagation_Volumes.pdf

#5158799 Screenspace Shadow Mapping Help!?

Posted by Frenetic Pony on 06 June 2014 - 04:13 PM


"Efficient virtual shadow maps for many point lights" It's

I know this paper, it is still more theorycrafting than practically useful (~15-20ms frametimes on an NVIDIA GTX Titan GPU + Intel Core i7-3930K CPU isn't awesome for games yet). I'd hoped for a more practically useful solution; we will see what useful changes happen once the new API approaches/consoles kick in.



Found that with shadow map caching it can work well for many dozens of lights. But indeed, while hundreds may be "possible", it's not practical on most systems. And you need all the overhead of the culling scheme, which is great if you're targeting hundreds of point lights to begin with. But we're still a long way off from having a city scene with hundreds of proper point lights, and I suspect that will just have to be brute forced one way or another.

#5158623 Screenspace Shadow Mapping Help!?

Posted by Frenetic Pony on 06 June 2014 - 12:33 AM

"Efficient virtual shadow maps for many point lights" It's basically an extension of clustered shading to also allow culling for shadow mapping, along with a few hacks (the "virtual" part) for the maps themselves.


So if you've already got clustered deferred/forward going on, you're halfway there, which is nice.

#5157987 Metal API .... whait what

Posted by Frenetic Pony on 03 June 2014 - 08:05 PM

We are now in a fun situation where 3 APIs look largely the same (D3D12, Mantle and Metal), and then there's OpenGL. While this won't "kill" OpenGL, the general feeling outside of those who have a vested interest in it is that the other 3 are going to murder it in CPU performance due to lower overhead, explicit control and the ability to set up work across multiple threads.

It'll be interesting to see what, if any, reply Khronos has to this direction of development because aside from the N API problem the shape of the thing is what devs have been asking for (and on consoles using) for a while now.


This is why I just want something like this from OpenGL, at least on the driver overhead front and, if possible (hardware guys make it so!), with memory control. One API to rule them, One API to run them, One API to render them all, and in the code bind them (or bindless if that's your thing).


But that's Khronos, at least I got a Lord of the Rings reference out of them.

#5157946 Ideas for rendering huge vegetation (foliage)

Posted by Frenetic Pony on 03 June 2014 - 03:41 PM

The Unigine guys had the great idea of building multiple billboard impostors for all their assets. They would, as far as I know, import any foliage asset and sample its image from multiple places across a sphere. Then just render the impostor closest to the viewing angle, and batch render as many as possible to keep CPU overhead low. Because they're all far away you can keep it low res and low memory, and because you've got a full sphere estimation they even inject them into shadow maps for shadow casting.


You don't notice much in the way of parallax error either, as it's only used for distant stuff. For closeup stuff take a look at Crytek's fancy grass management stuff: http://crytek.com/download/Sousa_Tiago_Rendering_Technologies_of_Crysis3.pptx
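To make the "render the impostor closest to the viewing angle" step concrete, here's a rough sketch. I have no idea how Unigine actually distributes its capture directions; the Fibonacci sphere and the names `fibonacci_sphere` and `pick_impostor` are just my assumptions for illustration.

```python
import math

def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors (assumed capture directions)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n          # height runs from near +1 to near -1
        r = math.sqrt(max(0.0, 1.0 - y * y))   # radius of the circle at that height
        theta = golden_angle * i
        dirs.append((r * math.cos(theta), y, r * math.sin(theta)))
    return dirs

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def pick_impostor(capture_dirs, object_pos, camera_pos):
    """Index of the capture direction best aligned with the object-to-camera direction."""
    to_camera = normalize(tuple(c - o for c, o in zip(camera_pos, object_pos)))
    # The largest dot product between unit vectors means the smallest angle.
    return max(range(len(capture_dirs)),
               key=lambda i: sum(a * b for a, b in zip(capture_dirs[i], to_camera)))
```

In a real renderer you'd do this lookup per instance (or per batch) and probably blend between the nearest few views to hide popping, but the selection itself is just a nearest-direction search like this.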

#5155931 Doing local fog (again)

Posted by Frenetic Pony on 25 May 2014 - 02:54 PM

For particles, "Weighted blended order independent transparency" should be helpful: http://jcgt.org/published/0002/02/09/ It gives you performant OIT for non-refractive stuff. As I saw on twitter concerning rendering: "It's all smoke and mirrors. Except smoke and mirrors, that's hard to render."
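For anyone who hasn't read the paper, the core of the weighted blended idea fits in a few lines. This is only a CPU-side sketch of the accumulate-and-resolve math as I understand it; the constant default `weight` is my simplification (the paper suggests depth-based weights), and `wboit_composite` is a made-up name.

```python
def wboit_composite(background, fragments, weight=lambda depth, alpha: 1.0):
    """Blend transparent fragments over a background without sorting them.

    fragments: iterable of (rgb, alpha, depth) tuples in any order.
    weight: assumed weighting function; the default of 1.0 ignores depth.
    """
    accum_rgb = [0.0, 0.0, 0.0]   # weighted sum of premultiplied colour
    accum_a = 0.0                 # weighted sum of alpha
    revealage = 1.0               # product of (1 - alpha): how much background shows
    for rgb, alpha, depth in fragments:
        w = weight(depth, alpha)
        for k in range(3):
            accum_rgb[k] += rgb[k] * alpha * w
        accum_a += alpha * w
        revealage *= (1.0 - alpha)
    if accum_a <= 1e-8:
        return tuple(background)  # nothing transparent covered this pixel
    average = [c / accum_a for c in accum_rgb]
    return tuple(average[k] * (1.0 - revealage) + background[k] * revealage
                 for k in range(3))
```

Sums and products don't care about order, which is the whole trick: no sorting and no per-pixel lists. With a single fragment it reduces to the usual "over" blend.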


And yeah that Lords of the Fallen paper is great. I can already see multiple games implementing something like it (Some people at Ubisoft did something fantastically similar for AC already) and artists just abusing the heck out of it. A million godrays blinding you in every level here we come.


Ninja edit to your edit- Yeah smoke should definitely be done differently, as you're doing two different phenomena. "Fog" represents particles smaller than the wavelength of the light, thus scattering the results but not absorbing. Smoke has particles bigger than the wavelength and causes direct absorption.


If you're going deferred the Lords of the Fallen guys have a neat per vertex deferred for small particles that they use for smoke. If you're going forward there are ways to make forward lit particles and Z-blurring work at the same time. Doing a lot of particles today should only be a problem depending on your targeted systems. There are nice ways to batch everything and avoid overdraw, so if you've got the performance then thousands of particles (and more) is doable with some work.

#5152424 Kinds of deferred rendering

Posted by Frenetic Pony on 08 May 2014 - 05:17 PM

Like ATEFred already said, there are various techniques in popular use depending on the platform. To decide which is best, you really need to have a solid idea of what hardware you're targeting and what you need from your renderer. Tiled deferred in a compute shader will generally give you the best peak performance for many lights, but you need hardware and APIs that support that sort of thing. Light prepass or tiled forward can be useful if there's a restriction on render target sizes, for instance on mobile TBDR GPUs or the Xbox 360 GPU.


For high end the popular choice is clustered forward/deferred. You can go deferred for opaque/generically shaded objects, while translucency/special lighting models can use forward. It's nice mostly because it explicitly handles both at once while handling a large, or even very large, number of lights better than anything, along with other fancy possibilities if you go for full cluster culling: http://www.humus.name/Articles/PracticalClusteredShading.pdf
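If the "cluster" part is unclear, the basic idea is that every shaded point maps to a 3D cell (screen tile plus depth slice), and lights are binned per cell. Here's a rough sketch of that mapping. The exponential depth slicing is one common choice and is my assumption here, not necessarily what the linked slides use; all names and parameters are hypothetical.

```python
import math

def cluster_coord(px, py, view_z, width, height, tiles_x, tiles_y, slices, near, far):
    """Map a pixel and its positive view-space depth to (tile_x, tile_y, depth_slice)."""
    tx = min(int(px * tiles_x / width), tiles_x - 1)
    ty = min(int(py * tiles_y / height), tiles_y - 1)
    z = min(max(view_z, near), far)  # clamp into the sliced range
    # Exponential slicing (assumed): slice boundaries grow geometrically with depth.
    s = min(int(slices * math.log(z / near) / math.log(far / near)), slices - 1)
    return tx, ty, s

def flat_index(coord, tiles_x, tiles_y):
    """Flatten a cluster coordinate into an index into a per-cluster light list table."""
    tx, ty, s = coord
    return tx + tiles_x * (ty + tiles_y * s)
```

Both the deferred pass and the forward pass can do the same lookup, which is why one light list structure serves opaque and translucent geometry alike.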


Like MJP said though, you need the hardware to support it. Light pre-pass/forward+ is more popular for mobile solutions.

#5149041 Global illumination techniques

Posted by Frenetic Pony on 23 April 2014 - 02:50 PM

Good stuff, thanks Agleed!


Speaking of which, Lionhead seems to have advanced Light Propagation Volumes along: http://www.lionhead.com/blog/2014/april/17/dynamic-global-illumination-in-fable-legends/


Unfortunately there are no details. But I guess that means it should be somewhere in UE4, though I didn't see it. Still, occlusion and skybox injection are nice. It still seems a fairly limited idea: you'd never get enough propagation steps to get long range bounces from, say, a large terrain. But at least it would seem more usable for anyone looking for a practical solution that they can get working relatively quickly. And hey, maybe you can use a handful of realtime cubemaps that only render long distance stuff, and just rely on the volumes for short distance.


Could go along nicely with using a sparse octree for propagation instead of a regular grid: https://webcache.googleusercontent.com/search?q=cache:http://fileadmin.cs.lth.se/graphics/research/papers/2013/olpv/ which trades off more predictable performance impact for less memory and further/faster propagation. Assuming they don't use something like that already.

#5128292 Volumetric lighting

Posted by Frenetic Pony on 02 February 2014 - 07:21 PM

Epipolar sampling is one of the main speedups. Basically, instead of raymarching naively, you raymarch in a regular fashion with samples radiating from the screenspace position of the lightsource out to the edges of the screen. Then take into account edge detection, which can again be done in screenspace, for high contrast variations, and you suddenly have a lot fewer samples to go through.
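To picture the sample layout, here's a rough sketch that generates the sample positions only (no raymarching and no edge detection). It assumes the light is on screen, works in [-1, 1] normalized device coordinates, and needs at least two samples per line; the function names are made up for illustration.

```python
def border_points(num_lines):
    """Evenly spaced points walking once around the border of the [-1, 1] screen square."""
    points = []
    for i in range(num_lines):
        t = 8.0 * i / num_lines       # the perimeter of the square has length 8
        side = int(t // 2.0)          # which of the four edges we are on
        u = (t % 2.0) - 1.0           # position along that edge, in [-1, 1)
        points.append([(u, -1.0), (1.0, u), (-u, 1.0), (-1.0, -u)][side])
    return points

def epipolar_samples(light_xy, num_lines, samples_per_line):
    """Sample positions along lines from the light's screen position to the border."""
    lines = []
    for exit_xy in border_points(num_lines):
        line = []
        for i in range(samples_per_line):
            s = i / (samples_per_line - 1)   # 0 at the light, 1 at the screen edge
            line.append((light_xy[0] * (1.0 - s) + exit_xy[0] * s,
                         light_xy[1] * (1.0 - s) + exit_xy[1] * s))
        lines.append(line)
    return lines
```

You'd raymarch only at (a subset of) these positions and then interpolate the scattering back to full resolution, marching extra samples where the depth buffer has strong discontinuities.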


1D min/max mipmaps take advantage of the above. Epipolar sampling gives you what looks sort of like a 1D heightmap, which is then used to speed things up again.
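Roughly, the min/max part works like this: along one epipolar slice the shadow map depths form a 1D array, and a min/max pyramid over it lets you classify whole stretches of a ray as fully lit or fully shadowed without stepping through them. A sketch under my own assumptions (the shadow map stores the nearest occluder depth as seen from the light, so smaller means closer to the light; names are hypothetical):

```python
def build_min_max_mips(depths):
    """Build a pyramid of (min, max) pairs over a 1D array of shadow map depths."""
    levels = [[(d, d) for d in depths]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = []
        for i in range(0, len(prev), 2):
            pair = prev[i:i + 2]      # the last node may have only one child
            nxt.append((min(p[0] for p in pair), max(p[1] for p in pair)))
        levels.append(nxt)
    return levels

def classify(levels, level, index, ray_min_depth, ray_max_depth):
    """Classify a ray segment against one pyramid node.

    ray_min_depth and ray_max_depth are the light-space depth range of the segment.
    """
    lo, hi = levels[level][index]
    if ray_max_depth <= lo:
        return "lit"        # in front of every occluder covered by this node
    if ray_min_depth > hi:
        return "shadowed"   # behind every occluder covered by this node
    return "mixed"          # descend to a finer level to resolve
```

Only the "mixed" nodes need the finer levels, which is where the speedup comes from.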


A gross simplification, but I hope I just wrote something coherent enough. The Intel paper ends up with only a little over 2ms on a GTX 680, at least with their lowest quality setting.

#5128239 Volumetric lighting

Posted by Frenetic Pony on 02 February 2014 - 02:51 PM

Pre-filtered single scattering: http://www.mpi-inf.mpg.de/~oklehm/publications/2014/i3d/prefiltered_single_scattering-i3DKlehm2014.pdf


Similarly, and with explicit point light support: http://software.intel.com/en-us/blogs/2013/03/18/gtd-light-scattering-sample-updated

#5126811 Is it normal for 4x MSAA (with custom resolve) to cost ~3ms vs no MSAA?

Posted by Frenetic Pony on 27 January 2014 - 03:58 PM

I have a little more data. Today, I set it up to run normal (non-explicit) MSAA that just blits the multisampled texture to a texture, and then renders that texture to the back buffer, so that I could do a performance comparison. With a scene of perhaps medium-low complexity, running simple shaders (forward-rendered), tone mapped, but no other post-processed effects, I have the following render times:


No MSAA: 1.89ms

4x MSAA (standard): 2.67ms

4x MSAA (explicit): 4.69ms


Again, this is at 1080p with a GTX 680.


So, about a 0.78ms difference with just plain MSAA and a 2.80ms difference for explicit MSAA, which means that the explicit MSAA is costing me an extra 2ms. Oddly, this number goes down slightly (1.79ms) if I reduce the scene complexity a bit. This is somewhat alarming, because I don't see how scene complexity can affect the MSAA resolve (I'm blending all four pixels regardless of whether they're on an edge or not; maybe the default resolve does something smarter).


So, I don't know. I don't have much of a choice, it seems. If I want to do tone mapping and gamma correction correctly, explicit multisampling seems to be the way to go. The best I can do is profile and make sure that I'm optimizing my app/shader code. *shrug*
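For anyone wondering why the order of tone mapping and resolving matters, here's a tiny numeric sketch. The Reinhard curve x / (1 + x) is just an assumed stand-in for whatever tone mapper is actually in use, and the sample values are made up.

```python
def reinhard(x):
    """Simple tone mapping curve (an assumption, used only for illustration)."""
    return x / (1.0 + x)

def resolve_then_tonemap(samples):
    """What a standard hardware resolve followed by tone mapping effectively does."""
    return reinhard(sum(samples) / len(samples))

def tonemap_then_resolve(samples):
    """What a custom (explicit) resolve can do: tone map each sample, then average."""
    return sum(reinhard(s) for s in samples) / len(samples)

# An edge pixel: two samples on a very bright HDR surface, two on a black one.
edge = [50.0, 50.0, 0.0, 0.0]
```

With these numbers `resolve_then_tonemap(edge)` comes out around 0.96, nearly as bright as the fully covered pixel, so the edge still looks aliased. `tonemap_then_resolve(edge)` gives about 0.49, a proper halfway blend in display space.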


Can always just ditch MSAA. Give alternate AA techniques a look, personally I'm a fan of Crytek's implementation of SMAA: http://www.crytek.com/download/Sousa_Graphics_Gems_CryENGINE3.pdf


Shouldn't take much more than your standard MSAA results on a 680, and looks about as good I'd say. Plus if you're using deferred you don't have to worry about fiddling with transparencies, as it's essentially all post.

#5113934 Improving cascade shadow

Posted by Frenetic Pony on 02 December 2013 - 09:54 PM

yes, 3x3 pcf/soft shadow. (blurring)  But, still seeing 'chain-saw' edges, especially in distant cascades .... What type of filtering can improve the situation? maybe, EVSM? 


Variance shadow mapping (Of one implementation or another) can indeed get a lot softer (without noise) compared to PCF. I'm pretty sure Crytek uses VSM on their most distant cascades both to hackily simulate large penumbras as well as avoid any glaring aliasing.
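In case the VSM test itself is unfamiliar, it boils down to Chebyshev's inequality on the filtered depth and depth-squared moments. A minimal sketch, assuming plain VSM (not EVSM) with depths in [0, 1]; the `min_variance` clamp value and the function names are my own choices:

```python
def filter_moments(depths):
    """Average depth and depth squared over a filter footprint (stand-in for the blur)."""
    n = len(depths)
    return sum(depths) / n, sum(d * d for d in depths) / n

def vsm_visibility(mean, mean_sq, receiver_depth, min_variance=1e-5):
    """Chebyshev upper bound on the fraction of the footprint that is unoccluded."""
    if receiver_depth <= mean:
        return 1.0                                   # receiver is in front: fully lit
    variance = max(mean_sq - mean * mean, min_variance)
    d = receiver_depth - mean
    return variance / (variance + d * d)
```

Because the moments can be blurred and mipmapped like any other texture, you get wide soft edges for the cost of a couple of fetches, which is why it suits distant cascades. The known downside is light bleeding where occluders overlap at very different depths.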


Still, blurring is only going to get you so far. The first time I saw temporal anti-aliasing was for shadows anyway, so you might want to try that. Crytek (again) had a reprojection technique to avoid ghosting for temporal anti-aliasing. It was for scene aliasing rather than shadow aliasing, but the same principles should apply. You can find it somewhere under Crytek's presentations page. http://www.crytek.com/cryengine/presentations

#5111107 Cascaded Shadows maps, texture atlas or texture array?

Posted by Frenetic Pony on 21 November 2013 - 04:04 PM

I've already done some experiments in my own engine and here are the results. Everything was tested on a complex scene with a large amount of small/medium scale vegetation objects (most of them instanced), about 3 million vertices casting shadows, 30% of them skinned, on an NVIDIA 680M and an i7:


1] draw everything once, use GS to replicate vertices and use 2048x2048 texture with 4 1024x1024 quarters - SLOOOW - you need to use 2 custom clip planes to clip to the quarter of the atlas - GS is main bottleneck (performance of whole frame around FPS = 21.1)


2] draw everything once, use GS to replicate vertices and use 1024x1024x4 texture array - SLOOW - but better than previous since no clipping planes are needed - GS is main bottleneck (FPS = 22.4)


3] draw everything once, into 2048x2048 texture with 4 1024x1024 quarters - this time for every drawcall multiply the instance count by 4, and in the VERTEX shader use (InstanceIndex&0x3) to output into a specific quarter of the atlas (again 2 custom clip planes used) - FAST, FPS = 42.7 !!! (twice as fast as with the GS path) - this time the bottleneck is in the vertex shader for all those skinned vertices.


4] use texture array, but for each cascade submit its own set of draw calls. THERE _IS_ an opportunity to cull them independently, so the win is the total number of vertices processed by the VS (for points 1, 2 the total was 3M, for 3 it was 12M!, for 4 it was 6M) but the loss is the total number of batches (for 1, 2, 3 it was 972, for 4 it was 1944)

FPS = 42.1 if everything is submitted to the base context, 44.1 if 4 deferred contexts are used, each created on a different scheduler task, and then all of them are submitted at once into the base context.


For 3 there is probably a chance to outperform 4 if some neat way of clipping is introduced, but for now I have no time for this and I'm sticking with it as-is, since I need to keep the batch count as low as possible because other parts of the engine demand them.
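For readers trying to picture option 3, here is roughly what the vertex shader would be doing, written as plain Python rather than shader code. The quadrant layout and the helper names are assumptions on my part; only the (InstanceIndex&0x3) trick comes from the description above.

```python
# Assumed layout of the four cascades in the atlas, as NDC offsets of each quadrant centre.
QUADRANT_OFFSETS = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]

def split_instance(expanded_index):
    """Recover (original_instance, cascade) after the instance count was multiplied by 4."""
    return expanded_index >> 2, expanded_index & 0x3

def to_atlas_clip(clip_pos, cascade):
    """Squeeze a cascade's clip-space position into its quarter of the atlas.

    Scaling x and y by 0.5 and offsetting by +/-0.5 * w maps NDC [-1, 1]
    onto one quadrant after the perspective divide.
    """
    x, y, z, w = clip_pos
    ox, oy = QUADRANT_OFFSETS[cascade]
    return (0.5 * x + ox * w, 0.5 * y + oy * w, z, w)
```

Geometry that falls outside a cascade's own frustum would land in a neighbouring quadrant after this remap, which is presumably why the two custom clip planes are needed for the inner edges; the outer two edges are handled by normal viewport clipping.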


That seems quite low in terms of performance, but hey, context and unoptimized. Thanks for sharing, a doubling in performance seems pretty clear.

#5110020 Cascaded Shadows maps, texture atlas or texture array?

Posted by Frenetic Pony on 17 November 2013 - 04:32 PM

If you want to do some profiling by all means share results!

#5108544 Voxel Cone Tracing Experiment - Part 2 Progress

Posted by Frenetic Pony on 11 November 2013 - 03:33 PM

That's a similar idea to what others already did, which is just downsample before tracing and then upsample the results (with some trickery for fine edges). The main problem with just doing cells is that an always present (and temporally stable) specular term is part of the thing that really sells GI to begin with. Still, it's an idea if you're really performance bound.


I think I mentioned a similar idea but just for particles, which are going to be diffuse only anyway for the most part and would be really helpful with layers of transparency. And now that I think about it, it would also work well for highly distant objects. While specular doesn't actually fall off of course, anything but primary specular (say from the sun) shouldn't be too noticeable really far away.


As for transparency, "inferred" or stippled transparency rendering would be really useful for cone tracing. I'm not sure you could also downsample the tracing simultaneously, but it would still prevent tracing from multiple layers of transparency.


As for using a directed acyclic graph, I've been thinking that you'd need to separately store albedo/position information, mipmap that, and then figure out a way to apply lighting to different portions dynamically and uniquely using the indirection table. If you're missing what I'm talking about, a Directed Acyclic Graph would converge identical copies of voxel areas into just one copy, and then use a table or "indirection table" to direct the tracing to where each copied block was in worldspace.
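To illustrate the "converge identical copies and use an indirection table" part, here's a flat, single-level sketch. A real DAG would merge identical subtrees at every level of the octree; this only deduplicates fixed-size blocks, and the data layout and names are made up for the example.

```python
def build_indirection(blocks):
    """Deduplicate voxel blocks.

    blocks: dict mapping a world-space block coordinate to a hashable payload
    (for example a tuple of occupancy bits).
    Returns (unique_blocks, indirection), where indirection maps each
    coordinate to an index into unique_blocks.
    """
    unique_blocks = []
    index_of = {}        # payload -> index of its single stored copy
    indirection = {}
    for coord, payload in blocks.items():
        if payload not in index_of:
            index_of[payload] = len(unique_blocks)
            unique_blocks.append(payload)
        indirection[coord] = index_of[payload]
    return unique_blocks, indirection

def lookup(unique_blocks, indirection, coord):
    """What the tracer would do: follow the table to the shared copy."""
    return unique_blocks[indirection[coord]]
```

This also shows where the lighting problem comes from: two world-space blocks that share one stored copy can't hold different lit colours in that copy, so anything unique per location has to be stored separately and keyed by the world-space coordinate.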