
Member Since 19 Sep 2007
Offline Last Active Today, 06:24 AM

#5193843 Inner edge detecting

Posted by ATEFred on 20 November 2014 - 12:48 PM

A pretty effective and simple way can be analysing your depth buffer. Using a sobel filter is pretty common for that kind of operation. http://homepages.inf.ed.ac.uk/rbf/HIPR2/sobel.htm
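As a rough illustration of the idea, here is a minimal CPU-side sketch of a Sobel pass over a depth buffer. In practice this would normally be a pixel or compute shader; the function name `sobelDepthEdges` and the row-major `float` buffer layout are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the Sobel gradient magnitude for each pixel of a row-major depth
// buffer. Border pixels are left at 0 to keep the sketch short.
std::vector<float> sobelDepthEdges(const std::vector<float>& depth,
                                   std::size_t width, std::size_t height)
{
    std::vector<float> edges(depth.size(), 0.0f);
    if (width < 3 || height < 3) return edges;

    auto at = [&](std::size_t x, std::size_t y) { return depth[y * width + x]; };

    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            // Horizontal kernel: [-1 0 1; -2 0 2; -1 0 1]
            const float gx = -at(x - 1, y - 1) + at(x + 1, y - 1)
                             - 2.0f * at(x - 1, y) + 2.0f * at(x + 1, y)
                             - at(x - 1, y + 1) + at(x + 1, y + 1);
            // Vertical kernel: [-1 -2 -1; 0 0 0; 1 2 1]
            const float gy = -at(x - 1, y - 1) - 2.0f * at(x, y - 1) - at(x + 1, y - 1)
                             + at(x - 1, y + 1) + 2.0f * at(x, y + 1) + at(x + 1, y + 1);
            edges[y * width + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return edges;
}
```

Pixels whose magnitude exceeds a threshold you pick would be treated as edges; the threshold usually needs tuning against the depth range of the scene.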

#5172828 Visual Studio 2013 graphics debugger vs Nvidia Nsight?

Posted by ATEFred on 11 August 2014 - 11:04 AM

My big problem with NSight is many of the really great features require you to remote in from another machine that is running your application.


Have you tried RenderDoc? It's free and an amazing graphics debugger.

If you are referring to stepping through shader code, this is no longer the case. You can now do everything on a single machine with an NV gpu.
I think NSight is far superior to the VS graphics debugger. If you have an NV gpu, it is really worth it. It also works with OGL if that is of any use to you.

#5152252 Kinds of deferred rendering

Posted by ATEFred on 08 May 2014 - 02:45 AM

There are commercial game engines that use all of these approaches. Frostbite 3 uses the tiled CS approach, the Stalker engine was the first to use the deferred shading approach that I know of, loads of games have used light prepass ( especially 360 games to get around edram size limitations / avoid tiling ), Volition came up with and used inferred in their engine, forward+ seems to be one of the trendier approaches for future games, not sure if anything released uses that already.

The main thing is for you to decide what your target platforms are and what kind of scenes you want to render. (visible entity counts, light counts, light types, whether a single lighting model is enough for all your surfaces, etc.) 

For learning purposes though, they are all similar enough that you can just pick a simpler one (deferred shading or light prepass maybe), get it working, and then adapt afterwards to a more complex approach if needed. 

As for docs / presentations, there are plenty around for all of these. I would recommend reading the GPU Pro books, and there are plenty of papers on this. Dice.se has presentations on their website you can freely access for the tiled approach they used on bf3. GDC vault is also a great place to look.

You can also find example implementations around, like here:
(authors are active on this forum btw)

#5152037 Kinds of deferred rendering

Posted by ATEFred on 07 May 2014 - 08:07 AM

The best way will depend on the type of scene you have and your target hw. They are mostly pretty similar, but here is an overview of a few of the popular ones:
- deferred shading, based on generating a gbuffer for opaque objects with all properties needed for both lighting and shading, followed by light pass typically done by rendering geometrical volumes or quad for each light and a fullscreen pass for sunlight, followed by composite pass where lighting and surface properties are combined to get back the final shaded buffer. This is followed by alpha passes, often with forward lighting, and post fx passes, which can use the content of the gbuffer if needed. Full or partial Z prepass is optional. Advantages include potentially rendering your scene only once and 

- light prepass/ deferred lighting involves the same kind of steps, only with a minimal gbuffer containing only what you need for the actual lighting ( often just depth buffer + one render target containing normals + spec power ), the same kind of light pass, but then another full scene rendering pass to get the final colour buffer. This means loads more draw calls, but much lighter gbuffers, which can be handy on HW with limited bandwidth, limited support for MRTs, or limited EDRam like the 360. Also gives more flexibility than the previous approach when it comes to object materials, since you are not limited to the information you can store in the gbuffer.

- inferred rendering, which is like light prepass, only with a downsampled gbuffer containing material IDs, a downsampled light pass, and a high res colour pass which uses the IDs to pick the correct values from the light buffer without edge artifacts. A neat way of doing the gbuffer and light passes much faster at the cost of resolution. It can also be used to store the alpha object properties in the gbuffer with a dithered pattern, and then exclude the samples you don't want (or that are not for that layer) during the colour pass. So no more need for forward lighting for alpha objects (up to a point).

- tiled deferred involves not rendering volumes or quads for your lights, which can be pretty expensive when you get a lot of light overdraw, especially if your light volumes are not super tight. Instead you divide your screen into smaller tiles, generate a frustum per tile, cull your lights on the GPU against each tile frustum, and then light only the fragments in the tile using the final list. Usually done in a CS, with no overdraw issues at all and overall much faster, but it requires modern HW and can also generate very large tile frustums when you have large depth discontinuities within a tile. The last part can be mitigated by adding a depth division to your tiles ( use 3d clusters instead of 2d tiles ).

- forward+ is similar, but involves a z prepass instead of gbuffer generation, then a pass to generate light lists per tile, same as above. Instead of lighting at that point, you render your scene again and light forward style using the list of lights intersecting the current tile. Allows for material flexibility and easy MSAA support at the cost of another full geo pass.

There are loads more variations of course, but these are maybe a good starting point.
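To make the per-tile light list step of the tiled and forward+ approaches a bit more concrete, here is a simplified CPU-side sketch. It is an assumption-heavy illustration: lights are treated as 2D screen-space circles (`ScreenLight` is a hypothetical type), whereas a real implementation would usually test light volumes against per-tile frustums in a compute shader and also use the tile's depth range.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical input: light bounds already projected to pixel coordinates.
struct ScreenLight { float x, y, radius; };

// Builds, for every screen tile, the list of light indices whose
// screen-space circle overlaps the tile rectangle.
std::vector<std::vector<std::size_t>> buildTileLightLists(
    const std::vector<ScreenLight>& lights,
    int screenWidth, int screenHeight, int tileSize)
{
    const int tilesX = (screenWidth + tileSize - 1) / tileSize;
    const int tilesY = (screenHeight + tileSize - 1) / tileSize;
    std::vector<std::vector<std::size_t>> lists(
        static_cast<std::size_t>(tilesX) * static_cast<std::size_t>(tilesY));

    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            const float minX = static_cast<float>(tx * tileSize);
            const float minY = static_cast<float>(ty * tileSize);
            const float maxX = minX + static_cast<float>(tileSize);
            const float maxY = minY + static_cast<float>(tileSize);
            auto& tileList = lists[static_cast<std::size_t>(ty) * tilesX + tx];

            for (std::size_t i = 0; i < lights.size(); ++i) {
                // Closest point on the tile rectangle to the light centre.
                const float cx = std::max(minX, std::min(lights[i].x, maxX));
                const float cy = std::max(minY, std::min(lights[i].y, maxY));
                const float dx = lights[i].x - cx;
                const float dy = lights[i].y - cy;
                if (dx * dx + dy * dy <= lights[i].radius * lights[i].radius)
                    tileList.push_back(i);
            }
        }
    }
    return lists;
}
```

Tiled deferred would then shade each pixel with the list of its tile straight away, while forward+ would keep the lists around and read them during the second geometry pass.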

#5124812 how to draw a moving object trace path?

Posted by ATEFred on 19 January 2014 - 04:22 AM

One approach is to have attractors on the start and end of your trace creating object ( such as the hilt and tip of a sword ), and create bands of geometry every frame from these attractor positions. The simplest form of that is one band being 2 verts, one for each attractor / end of the band. Have a dynamic vertex buffer which you fill up with all active band vertices, from newest to oldest ( or the other way around ). You can use vertex colours for fading it out. Then render it as a tri strip, and job done :)
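A minimal CPU-side sketch of that band bookkeeping might look like the following. The `Trail`, `Vec3` and `TrailVertex` names are made up for the example, and the linear alpha fade is just one possible choice; uploading the result to a dynamic vertex buffer is left out.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

struct Vec3 { float x, y, z; };
struct TrailVertex { Vec3 position; float alpha; };

class Trail {
public:
    explicit Trail(std::size_t maxBands) : maxBands_(maxBands) {}

    // Call once per frame with the two attractor positions (e.g. hilt and tip).
    void addBand(const Vec3& a, const Vec3& b) {
        bands_.push_front(Band{a, b});
        if (bands_.size() > maxBands_) bands_.pop_back();  // drop the oldest band
    }

    // Two vertices per band, ordered newest to oldest, suitable for drawing
    // as a triangle strip. Alpha fades linearly with band age.
    std::vector<TrailVertex> buildStrip() const {
        std::vector<TrailVertex> verts;
        verts.reserve(bands_.size() * 2);
        for (std::size_t i = 0; i < bands_.size(); ++i) {
            const float alpha =
                1.0f - static_cast<float>(i) / static_cast<float>(maxBands_);
            verts.push_back({bands_[i].a, alpha});
            verts.push_back({bands_[i].b, alpha});
        }
        return verts;
    }

private:
    struct Band { Vec3 a, b; };
    std::size_t maxBands_;
    std::deque<Band> bands_;
};
```

Each frame you would call `addBand`, copy the output of `buildStrip` into the dynamic vertex buffer, and draw it as a strip with alpha blending enabled.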

#5117780 Are GPU constants really constants?

Posted by ATEFred on 18 December 2013 - 03:22 AM

I don't believe this is a GPU limitation, but rather a limitation of apis like OpenGL ES 2 for example, which do not allow you to bind constants to registers, but rather assign them to "slots" associated with your shader program. So if you don't change shader, you don't need to reset them, but every time you bind a new shader you will need to reset all constants. D3d9/11/ogles3/ogl4/etc. do not have that limitation.

#5114054 Its all about DirectX and OpenGL?

Posted by ATEFred on 03 December 2013 - 09:20 AM

Thanks for your reply Zaoshi!!


So, it's a good idea to write a 'wrapper' layer around the graphics API if you want to write a cross platform game engine... right?

Can someone who uses the Playstation SDK and/or some 'Nintendo SDK' share some knowledge?

Thanks!!


Yeah, you want to have your own wrapper around the different graphics apis (libgcm for ps3, libgnm for ps4, dx11, dx for xbox, ogl, etc.)
Coming up with the right level of abstraction can take some time; you need to learn the differences between the APIs pretty well to get it right, but overall it is not too difficult.

As Zaoshi mentioned, the constructs are pretty similar across all major graphics apis. Some things are still a touch different, such as constant management (gles2 vs dx11 / ogl3+ vs consoles). Console graphics apis also usually expose a lot more than is typically available on pc through dx and ogl.
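For a rough idea of what such a wrapper layer can look like, here is a very small sketch. Everything in it is hypothetical (`GraphicsDevice`, `NullDevice`, `createDevice`); a real engine would have one backend class per API and a far wider interface, and the right granularity is exactly the part that takes time to get right.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical minimal abstraction that the rest of the engine talks to.
class GraphicsDevice {
public:
    virtual ~GraphicsDevice() = default;
    virtual std::string backendName() const = 0;
    // Constant management differs the most between APIs, so it is hidden here.
    virtual void setConstants(std::uint32_t slot, const void* data, std::size_t size) = 0;
    virtual void draw(std::uint32_t vertexCount) = 0;
};

// Stand-in backend that only counts draw calls. A D3D11 or OpenGL backend
// would implement the same interface with real API calls.
class NullDevice final : public GraphicsDevice {
public:
    std::string backendName() const override { return "null"; }
    void setConstants(std::uint32_t, const void*, std::size_t) override {}
    void draw(std::uint32_t) override { ++drawCalls_; }
    std::uint32_t drawCalls() const { return drawCalls_; }

private:
    std::uint32_t drawCalls_ = 0;
};

// The platform layer decides which backend to create; game code only ever
// sees the GraphicsDevice interface.
std::unique_ptr<GraphicsDevice> createDevice() {
    return std::make_unique<NullDevice>();
}
```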

#5101362 Many dispatch calls vs. higher ThreadGroupCount

Posted by ATEFred on 14 October 2013 - 01:35 PM

Thank you for the reply.


Yep I know about the thread group size. But for example if your shader is configured so that you have [numthreads(64, 1, 1)] as thread group size you could dispatch(1,1,1) (one) of that group and still have a full 64 threads running. As far as I know you should always run a multiple of 64.


So, if we take my first example again:

// context is an ID3D11DeviceContext*
context->Dispatch(10, 1, 1); // one call, 10 groups x 64 threads -> 640
for (int i = 0; i < 10; i++)
  context->Dispatch(1, 1, 1); // ten calls, 1 group x 64 threads each -> 640 in total

Lets say I am forced to use the second "solution" for some complicated reason. I wonder how much worse it will be. Or what other disadvantage I would have.

If you only have one thread group, whatever its size, you will probably run into issues when the GPU stalls its execution waiting for mem fetches / whatever. Typically it would then just start working on another warp/wavefront, and get back to the original one once the data was ready and its current group stalled for some reason.

If you dispatch many groups one by one, there is no guarantee that they will run in parallel and allow the GPU to jump in between. ( In fact, I am pretty sure that on current PC gpus you are guaranteed that they won't ).
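If the work can be expressed as a single dispatch, the group count is just the item count divided by the group size, rounded up. A small sketch, assuming the 64 threads per group from the quoted example (`groupCountFor` is a name made up here):

```cpp
#include <cstdint>

// Number of thread groups needed to cover itemCount items when each group
// runs threadsPerGroup threads. Rounds up so a partial group is still launched.
constexpr std::uint32_t groupCountFor(std::uint32_t itemCount,
                                      std::uint32_t threadsPerGroup)
{
    return (itemCount + threadsPerGroup - 1) / threadsPerGroup;
}
```

The result would be passed as the first argument of a single `Dispatch` call. When the item count is not a multiple of the group size, the shader needs a bounds check so the extra threads in the last group do nothing.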

#5093732 Stippled Deferred Translucency

Posted by ATEFred on 13 September 2013 - 02:15 AM

That's similar to clustered shading, but instead of storing a list of lights per cell/texel in the volume, you're storing the (approximate) radiance at that location. I was thinking of using something similar for things where approximate lighting is ok, like smoke particles. Does it work well for you in these cases?

BTW, if you stored the light in each cell as SH, you could extract the dominant light direction and colour from this representation, and use it for some fake specular highlights ;)


That's pretty much it. It works really well for particles and fog with the single directionless approximated value, and it's lightning fast, once it is generated. I'll have to get a video capture done at some point.


Atm I use HL2 basis rather than SH (simply because it was easier to prototype, and for alpha geo I only really care about camera facing stuff). Getting dominant direction from SH sounds like a good idea, not sure how computationally expensive it is? I'll need to look it up.

#5093512 Stippled Deferred Translucency

Posted by ATEFred on 12 September 2013 - 03:32 AM

For alpha lighting, I generate a volume texture locked to the camera with lighting information (warped to match the frustum). Atm I fill this in a CS, similar to the usual CS light culling pass. I store both single non directional approximated lighting value, and a separate set of directional values. This allows me to do either a simple texture fetch to get rough lighting info when applying it (for particles for example), or higher quality directional application with 3 texture fetches and a few ALU ops.

It's a pretty simple system atm, downsides are lower lighting resolution in the distance, and it's not exactly free to generate. (That might be possible to optimize by at least partially generating it on CPU though). Also, no specular atm...

Pros are cheap lighting, even for a huge number of particles, and semi cheap volumetric lighting / light shafts for any shadow casting light in the scene, as I also march through the volume when I apply my directional light volumetric shadows (simple raymarch through shadowmap).
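To illustrate what "warped to match the frustum" can mean in practice, here is one possible mapping from a view-space position to normalised volume texture coordinates. This is only a sketch under assumptions, not the scheme described in the post: it assumes a symmetric perspective projection, linear depth slicing between `nearZ` and `farZ`, and made-up names (`Vec3`, `viewToVolumeUVW`).

```cpp
struct Vec3 { float x, y, z; };

// Maps a view-space position (z is the distance in front of the camera) to
// [0,1]^3 coordinates inside a camera-locked, frustum-shaped volume.
// tanHalfFovX / tanHalfFovY describe the assumed symmetric projection.
Vec3 viewToVolumeUVW(const Vec3& p,
                     float tanHalfFovX, float tanHalfFovY,
                     float nearZ, float farZ)
{
    // Perspective divide: positions on the frustum sides map to -1 or +1.
    const float ndcX = p.x / (p.z * tanHalfFovX);
    const float ndcY = p.y / (p.z * tanHalfFovY);
    // Linear depth slicing; a non-linear distribution is a common alternative.
    const float w = (p.z - nearZ) / (farZ - nearZ);
    return {ndcX * 0.5f + 0.5f, ndcY * 0.5f + 0.5f, w};
}
```

Because every depth slice covers the full width of the frustum at that distance, far cells are much larger in world space than near ones, which matches the lower lighting resolution in the distance mentioned above.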

#5092719 ComputeShader Particle System DispatchIndirect

Posted by ATEFred on 09 September 2013 - 08:01 AM


if you plan to expand each vert into quads, it would be 10, 1, 0, 0.


I guess that is a typo and you mean 1,10,0,0?


But still it does not explain why it does not work if I set the initial value to 1,0,0,0 and use CopyStructureCount to update the count...


Hm... alright I am going to install vs 2012. (Since I also was not able to figure the buffer results out via NSight)


Could work both ways: if you wanted 10 verts which you would expand in GS, it would be 10,1,0,0 ( 10 verts -> 10 quads * 1 instance of the 10 ). Or you can use the HW instancing. I have noticed differences in performance when generating hundreds of thousands of quads, instancing being slightly slower than single instance with loads of expanded verts.

If you set initial value of 1,0,0,0, you are specifying 1 vertex, 0 instances. So as long as you update the second parameter with your structured count, it should work.

#5092705 ComputeShader Particle System DispatchIndirect

Posted by ATEFred on 09 September 2013 - 06:58 AM

Should be vertex count, instance count, 0,0 (startvertloc and start inst loc).

So let's imagine you have a quad and you want to instance it 10 times, your indirect args buffer should be 4, 10, 0, 0.

if you plan to expand each vert into quads, it would be 10, 1, 0, 0.

(At least that's what I remember off the top of my head, I can check tonight when I get home).
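Written out as a struct, the two layouts above look like this. The struct here is a plain stand-in that mirrors the four 32-bit arguments in the order given above; the helper names are made up for the example.

```cpp
#include <cstdint>

// Stand-in for the four UINT arguments read by DrawInstancedIndirect, in order.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};

static_assert(sizeof(DrawInstancedIndirectArgs) == 16, "four tightly packed 32-bit values");

// One 4-vertex quad drawn quadCount times through HW instancing.
inline DrawInstancedIndirectArgs instancedQuads(std::uint32_t quadCount) {
    return {4, quadCount, 0, 0};
}

// pointCount single vertices in one instance, each expanded to a quad in the GS.
inline DrawInstancedIndirectArgs expandedPoints(std::uint32_t pointCount) {
    return {pointCount, 1, 0, 0};
}
```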

Inspecting the results of the buffer is more annoying than it should be; NSight refuses to show it to me. However the VS2012 graphics debugger displays it no probs (finally something it does well :) ), or you can go the way of copying to a staging buffer and displaying it in your app.

#5083488 Multiple SV_POSITION's for different RT

Posted by ATEFred on 06 August 2013 - 03:06 AM

You can do that through geometry shaders, but not through VS afaik. (expect some performance hit from using the GS to instance your geo n times, once for each output).

(Gs allows you to set one SV_position per triangle stream output)

#5082493 ComputeShader Particle System DispatchIndirect

Posted by ATEFred on 02 August 2013 - 08:40 AM

DrawInstancedIndirect will do what you want to do. Copy the SB size into an indirect args buffer and pass that to the indirect draw method. (I mean copying the size to the specific location of the arguments in the indirect args buffer you want, to control number of verts vs number of instances, etc.)
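The "specific location" is a byte offset into the args buffer. A small sketch of how it could be computed, assuming the four arguments are tightly packed 32-bit values in the order vertex count, instance count, start vertex, start instance (`IndirectArg` and `argByteOffset` are names invented for the example):

```cpp
#include <cstdint>

// Position of each argument inside the indirect args buffer.
enum class IndirectArg : std::uint32_t {
    VertexCountPerInstance = 0,
    InstanceCount = 1,
    StartVertexLocation = 2,
    StartInstanceLocation = 3
};

// Byte offset at which the structured buffer count should be written.
constexpr std::uint32_t argByteOffset(IndirectArg arg) {
    return static_cast<std::uint32_t>(arg) *
           static_cast<std::uint32_t>(sizeof(std::uint32_t));
}
```

The returned value would be passed as the destination byte offset to `ID3D11DeviceContext::CopyStructureCount`, together with your own args buffer and the UAV of the append/counter buffer: offset 0 to drive the vertex count, offset 4 to drive the instance count.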

#5076358 C++ DX API, help me get it?

Posted by ATEFred on 09 July 2013 - 09:21 AM

None of this relates to performance at all


Exceptions have a noticeable performance impact, which is one of the reasons many game projects at least do not use them. From that perspective it makes sense from an API point of view to not rely on them (in addition to C legacy)


I don't know of any wrappers like the one you are describing, but you could take just the API abstraction layer of any openly available engine and start with that instead of interfacing directly with d3d if you really wanted to. I think it would be better to use d3d natively if you want to learn how it works though.