
Member Since 29 Mar 2007
Offline Last Active Yesterday, 07:00 PM

#5226782 How 3D engines manage so many textures without running out of VRAM?

Posted by MJP on 01 May 2015 - 07:21 PM

So let's say you have 100 models, and each model has a 4K texture. Each texture is roughly 67MB (4096 x 4096 x 32 bits). So if we render all 100 models with 4K textures, that would come to about 6.7GB (100 x 67MB). Most video cards have 1-2GB of VRAM. So how do engines deal with that amount of data?

The driver does it for you in DX11 and GL.
When you submit a draw call (glDraw*), the driver checks which inputs are needed by the program, and for each texture it checks whether it's resident (i.e. in GPU-accessible memory). If it isn't, the driver makes it resident.
If there is not enough memory, the driver will typically evict unused data from VRAM (using, for instance, a least-recently-used table), that is, copy it to main memory if it's not already there, and use the freed memory for the texture that needs it.
Any texture in main memory may be paged out. The algorithm used to decide which texture to replace obviously has a strong impact on performance.

I will note that on Windows there is still a finite amount of system memory that it will use for paging out GPU memory, and so you can still exhaust your resources if you have too much data. Also in general, you really want to avoid having the driver page things in and out mid-frame. It's a great way to kill your performance in unpredictable ways.
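
To make the eviction idea above concrete, here is a minimal sketch of a least-recently-used residency table. It is only an illustration of the general technique: the `ResidencyCache` class, its method names, and the byte budget are all hypothetical, and real drivers are far more involved than this.

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical model of LRU residency tracking. Not how any real driver works.
class ResidencyCache {
public:
    explicit ResidencyCache(std::uint64_t budget_bytes) : budget_(budget_bytes) {}

    // Marks texture `id` as used by a draw call and makes it resident.
    // Returns the ids that had to be evicted to make room.
    std::vector<int> touch(int id, std::uint64_t size_bytes) {
        std::vector<int> evicted;
        auto it = lookup_.find(id);
        if (it != lookup_.end()) {
            // Already resident: move it to the front (most recently used).
            lru_.splice(lru_.begin(), lru_, it->second);
            return evicted;
        }
        // Evict least recently used entries until the new texture fits.
        // If the texture alone exceeds the budget, it is still inserted.
        while (used_ + size_bytes > budget_ && !lru_.empty()) {
            const Entry& victim = lru_.back();
            used_ -= victim.size;
            evicted.push_back(victim.id);
            lookup_.erase(victim.id);
            lru_.pop_back();
        }
        lru_.push_front(Entry{id, size_bytes});
        lookup_[id] = lru_.begin();
        used_ += size_bytes;
        return evicted;
    }

    bool resident(int id) const { return lookup_.count(id) != 0; }
    std::uint64_t used_bytes() const { return used_; }

private:
    struct Entry { int id; std::uint64_t size; };
    std::uint64_t budget_;
    std::uint64_t used_ = 0;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<int, std::list<Entry>::iterator> lookup_;
};
```

If the working set of a single frame is larger than the budget, a table like this evicts and reloads the same textures repeatedly, which is the mid-frame paging that hurts performance.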

#5225757 Irradiance environment map and visibility

Posted by MJP on 26 April 2015 - 10:10 PM

It is possible to approximate the "proper" integral if you have both signals represented using a higher-order basis, such as spherical harmonics. With SH you normally work with "double products", where you're computing Integral(A * B). The double product is pretty simple, since it basically boils down to a dot product of the coefficients. However in your case we already have 2 terms without the visibility, which means that you need to evaluate an SH "triple product" if you want to add visibility into the mix. Triple products are doable, but more complicated and more expensive than a double product. There have been some papers that used triple products for the purpose of pre-computing light transport through a scene (PRT) and storing it as SH, so you may find some info on how to evaluate a triple product efficiently.
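
As a rough illustration of the double product, the sketch below assumes both signals have been projected onto the same orthonormal real SH basis with 9 coefficients (3 bands). The `SH9` alias and function name are made up for this example.

```cpp
#include <array>
#include <cstddef>

// Coefficients of a spherical function projected onto 3 SH bands (9 terms).
// Assumption: both inputs use the same orthonormal real SH basis.
using SH9 = std::array<float, 9>;

// Integral(A * B) over the sphere reduces to a dot product of coefficients.
float sh_double_product(const SH9& a, const SH9& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

A triple product has no such shortcut, since it needs the coupling coefficients between every triplet of basis functions. That is where the extra cost comes from.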

It's also possible to do this with spherical gaussians, which have an analytical form for a vector product. Basically you can do L (cross) V which yields a new spherical gaussian lobe, which can then be convolved with an SG approximation of the cosine term by using a scalar product. For more info on this, I would suggest looking through the paper and presentation from "All-Frequency Rendering of Dynamic, Spatially-Varying Reflectance", which you can find here. In their paper they store visibility using signed distance fields, but it would also be possible to represent the occlusion directly as a set of SG's.

#5225675 Irradiance environment map and visibility

Posted by MJP on 26 April 2015 - 11:49 AM

So basically you have two signals: incoming lighting (L) and visibility (V). What you'd really want to do with this is "Integral(L * V * cos(thetaN))", but currently all you have is "Integral(L * cos(thetaN))" and "V". Probably the simplest thing to do is to also pre-integrate your visibility mask with the same cosine term, so that you end up with "Integral(L * cos(thetaN)) * Integral(V * cos(thetaN))". This is essentially the exact definition of ambient occlusion. It's not correct, but it's a decent approximation to our first equation (which wasn't really "correct" in the first place, since it ignores bounce lighting off the occluding surfaces).
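
Written out, the approximation looks roughly like this. The 1/π factor is an assumption added here so that the visibility term lands in [0, 1] like a conventional ambient occlusion value; the post itself leaves normalization implicit.

```latex
\int_{\Omega} L(\omega)\, V(\omega)\, \cos\theta_N \, d\omega
\;\approx\;
\left( \int_{\Omega} L(\omega)\, \cos\theta_N \, d\omega \right)
\cdot
\underbrace{\frac{1}{\pi} \int_{\Omega} V(\omega)\, \cos\theta_N \, d\omega}_{\text{ambient occlusion}}
```

The two sides only agree exactly when L or V is constant over the hemisphere, which is why this is an approximation and not an identity.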

#5224260 Localizing image based reflections (issues / questions)

Posted by MJP on 18 April 2015 - 07:42 PM

One of the (many) problems of pre-filtered, parallax-corrected cubemaps is that applying the pre-filtering before the parallax correction will result in errors, even if the proxy geo is a perfect fit for the actual surfaces being captured in the cubemap. I believe the DICE presentation touched on this in the context of specular reflections, but the problem is much worse for diffuse/irradiance since the filter kernel is much wider compared to a typical specular BRDF.

Honestly though, a cubemap is really overkill for storing irradiance at a sample point. You can store a pretty good approximation of irradiance in a fairly small footprint by using something like spherical harmonics, SRBF's, or even something simple like Valve's ambient cube format (which is basically a 1x1 cubemap). If you do this, then you can store diffuse samples at a much higher density relative to your specular probes, which will mitigate issues caused by incorrect parallax. You can even use your dense irradiance samples to improve the quality of your specular probes by normalizing the specular intensity in a manner similar to what was presented by Activision at SIGGRAPH 2013.
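
To show how small such a representation can be, here is a sketch of evaluating an ambient cube. It uses the squared-normal weighting usually associated with Valve's format. The `Float3` and `AmbientCube` types and the +X, -X, +Y, -Y, +Z, -Z storage order are assumptions made for this example.

```cpp
#include <array>

struct Float3 { float x, y, z; };

// Six irradiance colors, assumed stored in the order +X, -X, +Y, -Y, +Z, -Z.
using AmbientCube = std::array<Float3, 6>;

// Blends the three faces the normal points toward, weighted by the squared
// normal components. `n` is assumed to be unit length, so the weights sum to 1.
Float3 eval_ambient_cube(const AmbientCube& cube, Float3 n) {
    const float wx = n.x * n.x;
    const float wy = n.y * n.y;
    const float wz = n.z * n.z;
    const Float3& cx = cube[n.x < 0.0f ? 1 : 0];
    const Float3& cy = cube[n.y < 0.0f ? 3 : 2];
    const Float3& cz = cube[n.z < 0.0f ? 5 : 4];
    return Float3{
        wx * cx.x + wy * cy.x + wz * cz.x,
        wx * cx.y + wy * cy.y + wz * cz.y,
        wx * cx.z + wy * cy.z + wz * cz.z,
    };
}
```

That's six RGB values per sample, which is why it's practical to place these much more densely than specular probes.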

#5223822 Choose texture slot to sample by number

Posted by MJP on 16 April 2015 - 07:45 PM

Texture arrays are the only way to dynamically index textures in D3D11. D3D12 will support dynamically indexing into arrays of separate texture SRV's, which is also supported by Nvidia's bindless texture extensions for GL (I imagine that Vulkan will support it as well). However, you should keep in mind that current hardware that supports this functionality will suffer from reduced performance for cases where the index diverges among the threads of a warp/wavefront.

#5223499 How to set HLSL constant table without D3DX in D3D9

Posted by MJP on 15 April 2015 - 01:07 PM

That is essentially what D3DXFont does under the hood.

As an alternative, you can pre-generate a "font texture" that contains all of the glyphs that you need. You can then render your strings by drawing one quad per character, with each quad having the appropriate texture coordinates for the character that it represents. There are even free tools available for generating these font textures.
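
Separately from any tooling, the per-character texture coordinate lookup could be sketched like this. It assumes a hypothetical atlas layout: fixed-size cells laid out row by row, starting at the space character in the top-left cell. Real font tools usually emit per-glyph rectangles instead, so treat `glyph_uv` as illustrative only.

```cpp
struct UvRect { float u0, v0, u1, v1; };

// Assumed atlas layout: `columns` x `rows` equal cells, filled row by row,
// with `first_char` in the top-left cell. Characters before `first_char`
// are not handled by this sketch.
UvRect glyph_uv(char c, int columns, int rows, char first_char = ' ') {
    const int index = static_cast<unsigned char>(c) -
                      static_cast<unsigned char>(first_char);
    const int col = index % columns;
    const int row = index / columns;
    const float cell_w = 1.0f / static_cast<float>(columns);
    const float cell_h = 1.0f / static_cast<float>(rows);
    return UvRect{col * cell_w, row * cell_h,
                  (col + 1) * cell_w, (row + 1) * cell_h};
}
```

Each character in the string then becomes one quad whose four corners take the corners of the returned rectangle as texture coordinates.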




#5223494 Constant buffer for shaders in SharpDX

Posted by MJP on 15 April 2015 - 12:56 PM

In HLSL, you declare a constant buffer and define the layout of data within that constant buffer. So if you were to have this in your shader:
cbuffer MyConstants
{
    float4x4 ViewTransform;
    float3 ObjectPosition;
}
Your shader would then expect a constant buffer to be bound that has the same layout. This means it would expect the buffer to start with 16 floats for the matrix, followed immediately by 3 floats for object position.

The idea here is that you specify your data layout in HLSL, and then in your application you would make sure to fill up your constant buffer resource with data using the same layout. There are many ways to do this, but probably the simplest is to do what Eric recommended in his post: make a matching C# struct with the same data layout, fill it with appropriate values, and then use either MapSubresource or UpdateSubresource to fill the buffer resource with the contents of your struct. For the simple constant buffer in my example, making a matching C# struct should be pretty simple:
public struct MyConstants
{
    public Matrix ViewTransform;
    public Vector3 ObjectPosition;
}
The one thing you really need to watch out for is that the packing rules for constant buffers can be a little weird. Most notably, if a vector type would straddle a 16-byte boundary, then the compiler will move the vector so that it starts at the next 16-byte boundary. As an example, let's add another float3 to our constant buffer layout:
cbuffer MyConstants
{
    float4x4 ViewTransform;
    float3 ObjectPosition;
    float3 LightPosition;
}
So the float4x4 is 64 bytes in size, and a float3 is 12 bytes. So you might think that "LightPosition" would be located at an offset of (64+12) = 76 bytes. However, this is not the case due to the alignment rule that I just mentioned. Since "LightPosition" would straddle a 16-byte boundary, it will get moved up 4 bytes so that it's located at an offset of 80 bytes. To make sure that our C# struct has the same data layout, LayoutKind.Explicit can be used:
[StructLayout(LayoutKind.Explicit)]
public struct MyConstants
{
    [FieldOffset(0)]
    public Matrix ViewTransform;

    [FieldOffset(64)]
    public Vector3 ObjectPosition;

    [FieldOffset(80)]
    public Vector3 LightPosition;
}
As an alternative, you can also just insert padding variables into your struct:
public struct MyConstants
{
    public Matrix ViewTransform;
    public Vector3 ObjectPosition;
    public uint Padding;
    public Vector3 LightPosition;
}
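
If you want to sanity-check offsets by hand, the straddling rule can be sketched as a small helper. This is a simplified model that only covers scalars and vectors of up to 16 bytes, plus members that already start on a 16-byte boundary. Arrays and structs have additional rules that it ignores, and the `place_member` name is made up for this example.

```cpp
#include <cstddef>

// Simplified model of HLSL constant buffer packing: a member that would
// cross a 16-byte boundary is pushed to the start of the next 16-byte slot.
// `cursor` is the running end of the previously placed members.
// Returns the byte offset chosen for the new member.
std::size_t place_member(std::size_t& cursor, std::size_t size_bytes) {
    const std::size_t first_slot = cursor / 16;
    const std::size_t last_slot = (cursor + size_bytes - 1) / 16;
    if (first_slot != last_slot) {
        cursor = (cursor + 15) / 16 * 16;  // round up to the next boundary
    }
    const std::size_t offset = cursor;
    cursor += size_bytes;
    return offset;
}
```

Running it over the three members from the example (64, 12, and 12 bytes) gives offsets 0, 64, and 80, which matches the layout described above.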

#5222476 What to do when forward vector equals up vector

Posted by MJP on 10 April 2015 - 12:51 PM

Using an "up" vector to generate a basis is only necessary when you don't have a basis to begin with. In other words, you should only need to do it when all you have is one "forward" direction and nothing else. This typically doesn't need to be the case for a camera: it's usually easy to hold onto a full basis for your camera, and then there's no need for generating perpendicular vectors. How you would implement this depends on how your camera works, but if you share some details about how you handle camera orientation then I can probably help you modify it so that you don't need an "up" vector.

#5221771 Difference between SDSM and PSSM?

Posted by MJP on 06 April 2015 - 10:05 PM

This only happens when your shadow casters are clipped, due to not lying within the Z extents of your shadow projection. The common way to fix this is to use "pancaking", where you force outlying triangles to get rasterized with depth = 0.0. It can be done easily in the vertex shader by forcing the output Z component to be >= 0, or in D3D11 you can do it by creating a rasterizer state with "DepthClipEnable" set to "false".
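
The vertex shader variant amounts to a single clamp. The sketch below shows the operation in C++ for illustration only. It assumes a D3D-style clip space where depth runs from 0 to 1 and an orthographic shadow projection (w = 1), and the `Float4` type and `pancake` function are hypothetical names.

```cpp
#include <algorithm>

struct Float4 { float x, y, z, w; };

// Clamps the clip-space depth so that casters in front of the near plane
// are flattened onto it instead of being clipped away.
// Assumes an orthographic projection, so z can be clamped independently of w.
Float4 pancake(Float4 clip_pos) {
    clip_pos.z = std::max(clip_pos.z, 0.0f);
    return clip_pos;
}
```

In a real shader this would be the last step before writing the output position in the shadow map vertex shader.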

#5221565 Questions about compute shader

Posted by MJP on 05 April 2015 - 11:38 PM

1. No. It's specifying that each thread group should have 20x1 threads in it, so it will always execute with at least 20 threads. The actual number of threads executed depends on the number of thread groups that are dispatched, which is a parameter of the Dispatch() function called by the CPU.


2. SV_DispatchThreadID gives you GroupID * ThreadGroupSize + GroupThreadID. In this case it's being used to assign every thread to a particular index of the input and output buffers. So for instance thread 7 in group #5 would read and write using index (5 * 20 + 7) = 107.
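
The arithmetic from both answers can be written out as follows for the 1D case. These helper names are made up for illustration. In a real shader the hardware supplies SV_DispatchThreadID for you.

```cpp
#include <cstdint>

// 1D version of SV_DispatchThreadID = GroupID * ThreadGroupSize + GroupThreadID.
std::uint32_t dispatch_thread_id(std::uint32_t group_id,
                                 std::uint32_t group_size,
                                 std::uint32_t group_thread_id) {
    return group_id * group_size + group_thread_id;
}

// Total threads launched by a 1D Dispatch() with `group_count` thread groups.
std::uint32_t total_threads(std::uint32_t group_count, std::uint32_t group_size) {
    return group_count * group_size;
}
```

For example, `dispatch_thread_id(5, 20, 7)` gives 107, and dispatching 8 groups of 20 threads runs 160 threads in total.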

#5221564 Difference between SDSM and PSSM?

Posted by MJP on 05 April 2015 - 11:32 PM

Is it really necessary to do all of the SDSM steps, or is just the depth reduction pass enough?
I actually just do the depth reduction pass.


It does improve quality, if that's what you're asking. Where it really helps is cases where the visible surfaces are very constrained along the view-space X and Y axes. For instance, consider looking down a narrow alleyway. I believe the original paper or presentation had some images demonstrating this case.

#5221040 BC6H Viewers?

Posted by MJP on 02 April 2015 - 06:35 PM

Are they DDS files? Visual Studio 2012 and 2013 can view DDS files, and they will display BC6H/BC7. However, they will only display the [0,1] range of BC6H textures and will clip the rest (there are no exposure or tone mapping controls).


RenderDoc can view DDS files if you just drag them onto the window. I think it has an exposure slider, which is nice.


HDRSee can open DDS files using FreeImage, but I don't know if the version of FreeImage that they use supports the newer DDS formats.

#5221007 Difference between SDSM and PSSM?

Posted by MJP on 02 April 2015 - 03:37 PM

If you use the GPU to produce the splits, do you have to read back the values on the CPU afterwards? So the CPU has to stall for the GPU to finish before continuing?


Yes, you need to read back the results if you want to do split setup, culling, and draw setup on the CPU. If you just wait immediately you'll get a stall on the CPU where it waits for the GPU to finish processing pending commands, and that will immediately be followed by a stall on the GPU (since it's run out of commands to execute). Adding latency can allow you to avoid the stall, but if you decide to do a full frame of latency your split setup will be out-of-date. This can result in artifacts if the splits are constrained too much for the visible pixels, which can happen due to either camera movement or object movement. It's possible to compute instantaneous velocity and use that to predict a particular pixel's position for the next frame, which can be used during the depth buffer analysis to compute bounds that (hopefully) work for the next frame. This can work pretty well for simple camera movement, but will still break with more complex movement or for camera teleportation.


With modern API's it's possible to actually do all of the setup and culling on the GPU, which eliminates the need for reading back the data. It can still be awkward and limiting depending on the API and extensions used, but it's definitely doable. In fact the demo that I made for this article has a GPU-only implementation that makes use of a compute shader batching system and DrawIndirect, so you can look at that if you want. It's not really an implementation that would scale up to a real-world scenario (unless your game is pretty simple), but it might give you some ideas. DX12, Vulkan, and some of the newer GL extensions have even more functionality available for GPU-driven rendering, which should allow for more flexible implementations.

#5220835 Direct3D 12 documentation is now public

Posted by MJP on 01 April 2015 - 06:32 PM

A 2D texture will have N subresources, where N = ArraySlices * NumMips. To get the subresource index for a given mip level of a particular array slice, you do Index = (NumMips * ArraySlice) + MipLevel. There's actually a D3D11 function that does this for you.
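
Here is the same calculation as standalone code. The D3D11 helper being referred to is presumably `D3D11CalcSubresource` from d3d11.h (that's an assumption on my part, since the post doesn't name it). The functions below are stand-ins with made-up names that don't require the Windows SDK.

```cpp
#include <cstdint>

// Subresources are ordered by array slice first, then by mip level within
// each slice: Index = (NumMips * ArraySlice) + MipLevel.
std::uint32_t subresource_index(std::uint32_t mip_level,
                                std::uint32_t array_slice,
                                std::uint32_t num_mips) {
    return mip_level + array_slice * num_mips;
}

// Total number of subresources in a 2D texture (array).
std::uint32_t subresource_count(std::uint32_t array_slices, std::uint32_t num_mips) {
    return array_slices * num_mips;
}
```

So for a texture array with 4 slices and 5 mips each, mip 2 of slice 3 is subresource 17 out of 20.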

#5220760 Difference between SDSM and PSSM?

Posted by MJP on 01 April 2015 - 12:19 PM

SDSM works by rasterizing to a depth buffer on the GPU, and then using a compute shader to analyze the depth samples in order to come up with optimal projections for the shadow map splits. This can be much more accurate than using object bounding boxes, especially when you consider that it will handle occlusion.

The SDSM paper demo proposes a few different techniques. The simplest one is to just compute the min and max Z value visible to the camera using the depth buffer, which you can then use to compute optimal split distances. It also proposes taking things a step further by transforming every depth buffer position into the local space of the directional light, and then fitting a tight AABB per split.
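
A CPU-side sketch of the simplest variant might look like the following. On the GPU the reduction would be a compute shader pass, and the logarithmic split scheme used here is just one common choice that I'm assuming for illustration. The `DepthRange`, `reduce_depth`, and `log_splits` names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct DepthRange { float min_z; float max_z; };

// Finds the nearest and farthest view-space depth that is actually visible.
// Samples at or beyond `far_z` are treated as empty (e.g. sky) and skipped.
// If every sample is skipped, the result is degenerate (min_z > max_z).
DepthRange reduce_depth(const std::vector<float>& view_z, float far_z) {
    DepthRange range{far_z, 0.0f};
    for (float z : view_z) {
        if (z >= far_z) continue;
        range.min_z = std::min(range.min_z, z);
        range.max_z = std::max(range.max_z, z);
    }
    return range;
}

// Far distance of each split, spaced logarithmically between min_z and max_z.
// Assumes 0 < min_z <= max_z and count > 0.
std::vector<float> log_splits(DepthRange range, int count) {
    std::vector<float> splits;
    for (int i = 1; i <= count; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(count);
        splits.push_back(range.min_z * std::pow(range.max_z / range.min_z, t));
    }
    return splits;
}
```

Since the range comes from visible samples only, occluded geometry no longer stretches the splits the way object bounding boxes would.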