
Member Since 29 Mar 2007

#5152143 Managing instancing

Posted by MJP on 07 May 2014 - 02:43 PM

We do everything at level build time based on the assets used in that level. The artists pick mesh assets from a database and place instances of them in the level. When we build the level we figure out which mesh assets are used and where the instances are located. Then we build dedicated data structures for each mesh that are optimized for submitting a list of currently visible instances at runtime.
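The bucketing step described above might be sketched like this (all names and types here are mine, just to illustrate the idea of grouping placed instances by mesh asset at build time):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Float3 { float x, y, z; };

// One artist-placed instance: which mesh, and where it sits in the level.
struct PlacedInstance {
    std::string meshAsset;
    Float3 position;
};

// One entry per unique mesh asset, holding all of its instance positions.
using InstanceTable = std::map<std::string, std::vector<Float3>>;

InstanceTable BuildInstanceTable(const std::vector<PlacedInstance>& placed) {
    InstanceTable table;
    for (const PlacedInstance& p : placed)
        table[p.meshAsset].push_back(p.position);
    return table;
}
```

At runtime each per-mesh list is then the natural unit for submitting visible instances in one batch.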

#5151435 400% Raytracing Speed-Up by Re-Projection (Image Warping)

Posted by MJP on 04 May 2014 - 10:46 AM

From your brief description this sounds very much like the temporal antialiasing techniques that are commonly used with rasterization. For reprojecting camera movement you really only need depth per pixel, since that's enough to reconstruct position with high precision. However it's better if you store per-pixel velocity so that you can handle object movement as well (although keep in mind you need to store multiple layers if you want to handle transparency).

Another major issue when doing this for antialiasing is that your reprojection will often fail for various reasons. The pixel you're looking for may have been "covered up" the previous frame, or the camera may have cut to a completely different scene, or there might be something rendered that you didn't track in your position/depth/velocity buffer. Those cases require careful filtering that excludes non-relevant samples, which generally means taking a drop in quality for those pixels for at least that one frame. In your case I would imagine that you have to do the same thing, since spiking to 5x the render time for even a single frame would be very bad.
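A minimal sketch of the depth-based reprojection for camera movement (my own toy setup, not from the post: a 90-degree FOV, aspect 1 camera that only translates between frames, so previous-frame view space is just current view space plus the camera delta):

```cpp
#include <cassert>
#include <cmath>

struct V3 { float x, y, z; };

// D3D-style depth terms for near = 1, far = 100.
const float kNear = 1.0f, kFar = 100.0f;
const float kC = kFar / (kFar - kNear);
const float kD = -kNear * kFar / (kFar - kNear);

// Project a view-space point. With 90-degree FOV and aspect 1 the x/y scale
// factors are both 1, so NDC x/y are just x/z and y/z.
V3 Project(V3 p) {
    return { p.x / p.z, p.y / p.z, (kC * p.z + kD) / p.z };
}

// Reconstruct view-space position from NDC x/y and the depth-buffer value,
// by inverting depth = (c*z + d) / z.
V3 Reconstruct(float ndcX, float ndcY, float depth) {
    float z = kD / (depth - kC);
    return { ndcX * z, ndcY * z, z };
}

// Where was this pixel last frame? Reconstruct, shift into the previous
// camera's space (camDelta = how far the camera moved), and re-project.
V3 ReprojectToPrev(float ndcX, float ndcY, float depth, V3 camDelta) {
    V3 p = Reconstruct(ndcX, ndcY, depth);
    V3 prev = { p.x + camDelta.x, p.y + camDelta.y, p.z + camDelta.z };
    return Project(prev);
}
```

A real implementation would use the full inverse view-projection and previous view-projection matrices, but the structure is the same: depth in, previous-frame screen position out.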

#5150671 why unreal engine wait the gpu to finish rendering after every Present?

Posted by MJP on 30 April 2014 - 07:57 PM

Yeah that's a common trick used by D3D games to keep the CPU from getting too far ahead of the GPU. Basically you just sync on a query from the previous frame, and it forces the driver to wait for the GPU to catch up.
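A schematic model of that throttling behavior (this is a simulation I wrote to show the effect, not actual D3D code; in practice you'd issue an `ID3D11Query` of type `D3D11_QUERY_EVENT` at the end of each frame and spin on `GetData` for the query from a couple of frames back):

```cpp
#include <algorithm>

// Simulate a CPU that, before starting frame N, blocks until the GPU has
// finished frame N - maxLatency. Returns the largest number of frames the
// CPU ever got ahead of the GPU.
int MaxFramesAhead(int numFrames, int maxLatency) {
    int gpuFrame = -1;  // last frame the simulated GPU has finished
    int worstGap = 0;
    for (int cpu = 0; cpu < numFrames; ++cpu) {
        // The "wait on the old query": block until the GPU catches up.
        if (cpu - maxLatency > gpuFrame)
            gpuFrame = cpu - maxLatency;
        worstGap = std::max(worstGap, cpu - gpuFrame);
        // ...record frame `cpu`, Present() queues it for the GPU...
    }
    return worstGap;
}
```

The point is just that the wait bounds how far the CPU can run ahead, which keeps input latency and queued-frame memory under control.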

#5150193 Reconstructing position from depth

Posted by MJP on 28 April 2014 - 04:42 PM

If A and B are matrices, and AB = A * B, then (AB)' = B' * A'.


Also you need to divide the result by w after transforming by your inverse view * proj matrix.
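Here's a minimal sketch of that divide-by-w step (my own example setup: a 90-degree FOV, aspect 1 projection with near 1 and far 100, inverse projection written in closed form, row-major with column vectors):

```cpp
#include <cassert>
#include <cmath>

struct V4 { float x, y, z, w; };

V4 Mul(const float m[4][4], V4 v) {
    return { m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3]*v.w,
             m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3]*v.w,
             m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3]*v.w,
             m[3][0]*v.x + m[3][1]*v.y + m[3][2]*v.z + m[3][3]*v.w };
}

// Transform an NDC point (x, y, depth, 1) by the inverse projection, then
// divide by the resulting w to recover view-space position.
V4 ReconstructViewPos(float ndcX, float ndcY, float depth) {
    const float n = 1.0f, f = 100.0f;
    const float c = f / (f - n), d = -n * f / (f - n);
    // Closed-form inverse of the perspective projection described above.
    const float invProj[4][4] = {
        { 1, 0, 0,        0      },
        { 0, 1, 0,        0      },
        { 0, 0, 0,        1      },
        { 0, 0, 1.0f / d, -c / d } };
    V4 p = Mul(invProj, { ndcX, ndcY, depth, 1.0f });
    return { p.x / p.w, p.y / p.w, p.z / p.w, 1.0f };  // the divide by w
}
```

The same pattern applies with a full inverse view * proj matrix; you'd just land in world space instead of view space.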

#5150190 Instance data in Vertex Buffer vs CBuffer+SV_InstanceID

Posted by MJP on 28 April 2014 - 04:31 PM


CBuffers are placed in special constant memory. When a thread from a thread group reads a value from cmem, the value is automatically *broadcast* to all threads within the thread group.



Obviously that only applies to certain Nvidia hardware. On recent (GCN) AMD hardware it works differently, where there's not really much difference between a constant buffer and any other kind of buffer. The only differences come from whether a load is shared across a wavefront (scalar loads) or is unique per-thread (vector loads). In the case of instancing it will depend on whether multiple instances get batched into a single wavefront, which is definitely the case on GCN. I don't know if this happens on Nvidia hardware, but if it does then I would assume that they also fall back to per-thread memory access when loading indexed data from a constant buffer.


Anyway the point is that if you're talking about performance then it really depends on the hardware. And not just AMD vs Nvidia, but also the specific architecture from that IHV.

#5149779 MipMaps as relevant as before?

Posted by MJP on 26 April 2014 - 11:19 PM


Even if this was done in shader code you're losing out on a whole lot.  Mipmapping is a time/space tradeoff.  You exchange 33.333% extra storage and in return get a nice precalculated lookup table that hardware has caches optimized for using.  So all those clock cycles you would have otherwise spent on implementing your own minification algorithm - you can now spend them on nice lighting or other effects instead.

Lanczos3 resampling seems implementable in real time to me. How much do you think it would ameliorate the aliasing? Assuming one is more interested in improving aliasing than in lighting or other specialized shaders.


You need to sample at Nyquist Rate in order to avoid aliasing, which in the case of a texture means 2x the texel footprint. This is true regardless of the reconstruction filter being used.
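This is exactly what mip selection gives you. A rough sketch of the idea (my own illustration, a simplified version of what the hardware computes from screen-space derivatives): pick the mip level whose texel spacing matches the pixel's footprint in texels, so the sampled signal stays near the Nyquist rate of the screen grid.

```cpp
#include <algorithm>
#include <cmath>

// texelsPerPixelX/Y: how many texels of the base level one screen pixel
// covers in each direction. level = log2(footprint) means each mip step
// halves the texel density, keeping roughly one texel per pixel.
float MipLevel(float texelsPerPixelX, float texelsPerPixelY, int numMips) {
    float footprint = std::max(texelsPerPixelX, texelsPerPixelY);
    float level = std::log2(std::max(footprint, 1.0f));  // magnification -> 0
    return std::min(level, float(numMips - 1));
}
```

No reconstruction filter applied at sample time, Lanczos or otherwise, can recover the information you lose by sampling a full-resolution texture below this rate.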

#5149711 MipMaps as relevant as before?

Posted by MJP on 26 April 2014 - 03:17 PM

Of course. Without mipmaps, your textures will have significant aliasing when they're minified. Anisotropic filtering is not a replacement for mipmaps; it works alongside them.

#5149521 HLSL postion only semantic, no color

Posted by MJP on 25 April 2014 - 06:18 PM

Back in D3D9 the COLOR semantic was special, and was treated as a low-precision value clamped to [0, 1]. It was suitable for RGBA colors, but not for general floating-point data. Use TEXCOORD0 instead.

#5149520 Understanding the “sampler array index must be a literal expression” error in...

Posted by MJP on 25 April 2014 - 06:11 PM

You can do it quite easily by setting an appropriate viewport. The D3D11_VIEWPORT structure specifies the width and height of the viewport, as well as the X and Y offset. So for instance, let's say you had a 256x256 render target and you wanted to render to the top-left corner. You would set TopLeftX = 0 and TopLeftY = 0, and then set Width and Height to 128. Then if you wanted to render to the top-right corner you would keep the same Width and Height, but set TopLeftX = 128. And so on, until you've rendered all 4 corners.
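Concretely, something like this (using a plain struct with the same fields as D3D11_VIEWPORT; the `Quadrant` helper and its corner numbering are my own):

```cpp
// Mirrors the fields of D3D11_VIEWPORT.
struct Viewport {
    float TopLeftX, TopLeftY, Width, Height, MinDepth, MaxDepth;
};

// Viewport for corner i of a square render target:
// 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
Viewport Quadrant(int i, float targetSize) {
    float half = targetSize / 2.0f;
    Viewport vp = {};
    vp.TopLeftX = float(i % 2) * half;  // right column for odd i
    vp.TopLeftY = float(i / 2) * half;  // bottom row for i >= 2
    vp.Width = half;
    vp.Height = half;
    vp.MinDepth = 0.0f;
    vp.MaxDepth = 1.0f;
    return vp;
}
```

In D3D11 you'd pass each result to `RSSetViewports` before drawing that corner.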

#5149251 IBL Problem with consistency using GGX / Anisotropy

Posted by MJP on 24 April 2014 - 06:46 PM

I just use a compute shader to do cubemap preconvolution. It's generally less of a hassle to set up compared to using a pixel shader, since you don't have to set up any rendering state.


You can certainly generate a diffuse irradiance map by directly convolving the cubemap, but it's a lot faster to project onto 3rd-order spherical harmonics. Projecting onto SH is essentially O(N), and you can then compute diffuse irradiance with an SH dot product. Cubemap convolution is essentially O(N^2). 
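A toy CPU illustration of why the SH route is O(N) (this is my own minimal version: only bands 0 and 1, i.e. 4 coefficients instead of the 9 of a 3rd-order projection, and a constant environment to keep the test exact): one pass over the environment samples accumulates the coefficients, and irradiance for any normal is then just a weighted dot product.

```cpp
#include <cmath>

const float kPi = 3.14159265358979f;

struct SH4 { float c[4]; };

// The first four real SH basis functions in direction (x, y, z).
void EvalBasis(float x, float y, float z, float out[4]) {
    out[0] = 0.282095f;      // Y_0^0
    out[1] = 0.488603f * y;  // Y_1^-1
    out[2] = 0.488603f * z;  // Y_1^0
    out[3] = 0.488603f * x;  // Y_1^1
}

// One O(N) pass over a lat-long grid of (constant) radiance samples.
SH4 ProjectConstantEnvironment(float radiance, int thetaSteps, int phiSteps) {
    SH4 sh = {};
    for (int t = 0; t < thetaSteps; ++t)
        for (int p = 0; p < phiSteps; ++p) {
            float theta = (t + 0.5f) * kPi / thetaSteps;
            float phi = (p + 0.5f) * 2.0f * kPi / phiSteps;
            float x = std::sin(theta) * std::cos(phi);
            float y = std::sin(theta) * std::sin(phi);
            float z = std::cos(theta);
            // Solid angle of this grid cell.
            float dOmega = std::sin(theta) * (kPi / thetaSteps)
                                           * (2.0f * kPi / phiSteps);
            float basis[4];
            EvalBasis(x, y, z, basis);
            for (int i = 0; i < 4; ++i)
                sh.c[i] += radiance * basis[i] * dOmega;
        }
    return sh;
}

// Diffuse irradiance for a normal: an SH dot product, with the standard
// cosine-convolution weights A0 = pi and A1 = 2*pi/3.
float Irradiance(const SH4& sh, float nx, float ny, float nz) {
    const float A0 = kPi, A1 = 2.0f * kPi / 3.0f;
    float b[4];
    EvalBasis(nx, ny, nz, b);
    return A0 * sh.c[0] * b[0]
         + A1 * (sh.c[1] * b[1] + sh.c[2] * b[2] + sh.c[3] * b[3]);
}
```

Direct convolution would instead integrate over all N environment samples per output direction, which is where the O(N^2) comes from.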

#5149237 Screenshot of your biggest success/ tech demo

Posted by MJP on 24 April 2014 - 04:45 PM

This pic from our old E3 trailer is pretty cool, and so is this one from a more recent demo.


Most of my tech demos are pretty boring to look at, but a long time ago I was working on an XNA game in my spare time that I never got close to finishing.

#5149205 what's the precondition of hdr postprocess

Posted by MJP on 24 April 2014 - 01:28 PM

The obvious disadvantage is that if you need destination alpha these formats are no good to you.  It's also the case that packing and unpacking a format such as RGBE costs some extra ALU instructions which need to be weighed against the extra bandwidth required by a full 64-bit FP format (which you can now safely assume is supported by all hardware).


I'll also add that hardware filtering is generally incorrect for these kinds of "packed" formats, although it may not be too noticeable depending on the format and the content.
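For reference, here's a sketch of the classic Radiance-style RGBE encoding mentioned above (written out by me as an illustration): three 8-bit mantissas share one 8-bit exponent, which is where both the ALU cost and the lack of an alpha channel come from.

```cpp
#include <cmath>
#include <cstdint>

struct RGBE { uint8_t r, g, b, e; };

RGBE Pack(float r, float g, float b) {
    float v = std::fmax(r, std::fmax(g, b));
    if (v < 1e-32f) return { 0, 0, 0, 0 };
    int e;
    // frexp gives v = m * 2^e with m in [0.5, 1); scale maps the largest
    // channel into [128, 256) so all three share the exponent e.
    float scale = std::frexp(v, &e) * 256.0f / v;
    return { uint8_t(r * scale), uint8_t(g * scale), uint8_t(b * scale),
             uint8_t(e + 128) };  // bias the exponent into a byte
}

void Unpack(RGBE p, float* r, float* g, float* b) {
    if (p.e == 0) { *r = *g = *b = 0.0f; return; }
    float f = std::ldexp(1.0f, int(p.e) - (128 + 8));  // undo bias and the *256
    *r = p.r * f; *g = p.g * f; *b = p.b * f;
}
```

Note that interpolating these four bytes directly (as hardware filtering would) is wrong across exponent boundaries, which is exactly the filtering problem described above.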

#5148652 Package File Format

Posted by MJP on 21 April 2014 - 11:04 PM

We call our packages "archives". The format is basically a table of contents containing a map of symbols (hashed string asset names) to a struct containing the offset + size of the actual asset data. All of our asset IDs are flat like in Hodgman's setup. The whole archive is compressed using Oodle (compression middleware by RAD Game Tools), and when we load an archive we stream it in chunk by chunk asynchronously and pipeline the decompression in parallel. Once that's done we have to do a quick initialization step, where we mostly just fix up pointers in the data structures (on Windows we also create D3D resources in this step, because you have to do this at runtime). Once this is done the users of the assets can load assets individually by asset ID, which basically just amounts to a binary search through the map and then returning a pointer once the asset is found.
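The table-of-contents lookup might look roughly like this (a hypothetical sketch: FNV-1a is my stand-in hash, since the post doesn't name one, and the entry layout is illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct TocEntry {
    uint64_t symbol;  // hashed asset name
    uint64_t offset;  // where the asset data starts in the archive
    uint64_t size;    // how many bytes it occupies
};

// FNV-1a 64-bit hash of the asset name string.
uint64_t HashName(const char* name) {
    uint64_t h = 14695981039346656037ull;
    for (; *name; ++name) { h ^= uint8_t(*name); h *= 1099511628211ull; }
    return h;
}

// Binary search through a TOC sorted by symbol; returns null if not found.
const TocEntry* FindAsset(const std::vector<TocEntry>& toc, const char* name) {
    uint64_t sym = HashName(name);
    auto it = std::lower_bound(toc.begin(), toc.end(), sym,
        [](const TocEntry& e, uint64_t s) { return e.symbol < s; });
    return (it != toc.end() && it->symbol == sym) ? &*it : nullptr;
}
```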


As for loose files vs. packages, we support both for development builds. Building a level always triggers packaging an archive, but when we load an archive we check the current status of the individual assets and load them off disk if we determine that the version on disk is newer. That way you get fast loads by default, but you can still iterate on individual assets if you want to do that.

#5148405 BRDF gone wrong

Posted by MJP on 20 April 2014 - 02:50 PM

The most common cause of NaN in a shader is division by 0. In your case you will get division by 0 whenever NdotL or NdotV is 0, since your denominator has those terms in it. To make that work with your current setup, you would need to wrap your specular calculations in an if statement that checks whether both N dot L and N dot V are greater than 0. However in many cases it's possible to write your code in such a way that there's no chance of division by 0. For instance, take your "implicit G" function. This is meant to cancel out the NdotL * NdotV in the denominator by putting the same terms in the numerator. So in that case, it would be better if you canceled it out in your code by removing the implicitG function and then also removing the N dot L and N dot V from the denominator.


Also I should point out another common mistake that you're making, which is that you need to multiply your entire BRDF by NdotL. If you look up the definition of the BRDF, you'll find that it's the ratio of lighting scattered towards the eye (which is the value you're computing in your fragment shader) relative to the irradiance incident to the surface. When you're dealing with point lights/spot lights/directional lights/etc. the irradiance is equal to LightIntensity * LightAttenuation * Shadowing * NdotL. In your case you don't have shadows and you don't seem to be using an attenuation factor (which is fine), so you'll want to multiply your specular by (uLightColor * NdotL). A lot of people tend to associate the NdotL with diffuse, but it's really not part of the diffuse BRDF. A Lambertian diffuse BRDF is actually just a constant value; the NdotL is part of the irradiance calculations.
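Putting both points together, a sketch of the cancellation (generic names and a stand-in Blinn-Phong-style distribution term, since the original shader isn't shown): with the implicit geometry term G = NdotL * NdotV, the NdotL * NdotV in the microfacet denominator cancels exactly, so neither appears in the code and there's nothing to divide by zero. The NdotL from the irradiance is then applied at the end.

```cpp
#include <cmath>

// Returns specular * NdotL for one light. `roughness` must be > 0.
float SpecularTimesNdotL(float NdotL, float NdotV, float NdotH,
                         float roughness, float fresnel) {
    // Back-facing or grazing: no contribution, and no divisions happen.
    if (NdotL <= 0.0f || NdotV <= 0.0f) return 0.0f;
    // Stand-in normalized Blinn-Phong distribution term.
    float specPower = 2.0f / (roughness * roughness) - 2.0f;
    float D = (specPower + 2.0f) / (2.0f * 3.14159265f)
            * std::pow(NdotH, specPower);
    // Original form: F * D * G / (4 * NdotL * NdotV) with G = NdotL * NdotV.
    // After cancelling, only F * D / 4 remains.
    float specular = fresnel * D / 4.0f;
    return specular * NdotL;  // NdotL belongs to the irradiance, applied last
}
```

(NdotV is still checked so back-facing view directions contribute nothing, but it never appears in a denominator.)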

#5147950 Speed - Texture Lookups and Structured Buffers

Posted by MJP on 18 April 2014 - 11:48 AM

Texture reads are expensive (relatively speaking) because the GPU has to fetch the data from off-chip memory and then wait for that memory to be available. Buffer reads have the same problem, so you're not going to avoid it by switching to buffers. When you're bottlenecked by memory access, performance will depend heavily on how your access patterns interact with the cache. In this regard textures have an advantage, because GPUs usually store textures in a "swizzled" pattern that preserves 2D locality in memory, which maps well to the cache when texels are fetched in a pixel shader. Buffers are typically stored linearly, which won't map as well to pixel shader access patterns.
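To illustrate the "swizzled" idea, here's a Morton (Z-order) encode, one common choice for this kind of layout (actual hardware layouts vary and are usually undocumented): interleaving the bits of x and y keeps texels that are close in 2D close in memory, so the neighboring fetches from a pixel-shader quad tend to land in the same cache lines.

```cpp
#include <cstdint>

// Interleave the low 16 bits of x and y: x bits land in the even bit
// positions of the result, y bits in the odd positions.
uint32_t MortonEncode(uint32_t x, uint32_t y) {
    uint32_t result = 0;
    for (int bit = 0; bit < 16; ++bit) {
        result |= ((x >> bit) & 1u) << (2 * bit);
        result |= ((y >> bit) & 1u) << (2 * bit + 1);
    }
    return result;
}
```

Note how the 2x2 block (0,0), (1,0), (0,1), (1,1) maps to the four consecutive addresses 0 through 3, whereas a linear layout would scatter it across two rows.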