
#5183208 Reading a texture inside if statement

Posted by MJP on 26 September 2014 - 03:36 PM

Most GPUs are really just SIMD machines. This means the GPU effectively executes the same instruction for multiple "lanes" simultaneously (usually referred to as "threads"), with each lane operating on different data. So if you write code that does "y = x + z", the GPU will execute an add instruction for multiple lanes simultaneously, where x and z may have different values in each lane.

This makes branching complicated. Take a simple branch like this:

if(x > 1.0f)
    x = x * 10.0f;

On a GPU, multiple nearby SIMD lanes will execute these instructions with different values of x. So for some lanes x might be greater than 1.0, and for others it might not. Since all of the lanes execute the same instructions, the GPU can't just multiply the value by 10.0. Instead, it evaluates the condition for all lanes and uses the result to set a per-lane mask. This mask then controls whether any executed instructions actually have an effect: the GPU will execute the multiply instruction, but it won't actually do anything for a lane unless that lane's mask bit is set. This ends up giving you the same result as if you had actually executed the branch for each lane individually. The only time the GPU doesn't have to bother with this masking business is when all lanes take the same path, resulting in a mask of all 0's or all 1's. In that case, if the mask is all 0's the GPU can actually skip the multiply instruction entirely.

So now we get to sampling and derivatives. In order to pick which mip level to use, GPUs look at nearby pixels in a 2x2 quad and calculate the difference in texture coordinates. This difference gives you the partial screen-space derivatives. If the difference is very small relative to the texture size, the GPU can use a higher-resolution mip level; if the difference is large, it uses a lower-resolution mip level. In practice, this "looking at nearby pixels" is done by packing the nearby pixels into nearby SIMD lanes and executing special instructions that allow one lane to read a value from another lane. Most of the time this is perfectly okay, since the GPU always executes the same instructions for all lanes. Where it goes awry is with the conditional masking that I explained above. If one lane is masked off but its neighbor isn't, the masked-off lane won't be able to give its neighbor the value of its texture coordinate. Without that value the neighbor can't compute the partial derivatives, and so it can't sample with automatic mip selection.

The typical workaround is to compute the derivatives manually outside of the branch, and then pass them into a version of the sample function that takes explicit derivatives (SampleGrad in HLSL). Or, if you don't need automatic mip selection, you can use a version that lets you pick the mip level yourself (SampleLevel).
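As a rough HLSL sketch (tex, samp, uv, and x are hypothetical names standing in for whatever your shader actually uses):

```hlsl
// Compute the derivatives unconditionally, outside the branch,
// so every lane in the quad participates.
float2 uvDX = ddx(uv);
float2 uvDY = ddy(uv);

float4 color = 0.0f;
if (x > 1.0f)
{
    // Explicit gradients: no implicit derivatives needed inside the branch
    color = tex.SampleGrad(samp, uv, uvDX, uvDY);

    // Or, if a fixed mip level is acceptable:
    // color = tex.SampleLevel(samp, uv, 0.0f);
}
```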

#5183063 Single Channel Blend Possible?

Posted by MJP on 26 September 2014 - 12:25 AM

D3D11 blend states support different blend operations for the alpha channel vs. the RGB channels, so it would be possible to do this if you store your translucency map in the alpha channel of your texture.

#5181748 DX12 - Documentation / Tutorials?

Posted by MJP on 20 September 2014 - 12:50 PM

Conservative rasterization should allow for some film-quality AA solutions to be developed. It would be possible to store, per pixel, a list of every triangle that in any way intersects with that pixel's bounds. You could then resolve that list in a post pass, sorting the triangles and clipping them against each other to get the actual area covered by each primitive.


Sure, although you'd have to sort the list of triangles by depth in order to get the correct result. You'd also have to forgo standard z-buffering.

#5181620 MSAA and offscreen render target issue

Posted by MJP on 19 September 2014 - 03:04 PM

You should still be able to get the debug messages by using DebugView to view the native debugger output stream.

#5181598 MSAA and offscreen render target issue

Posted by MJP on 19 September 2014 - 01:27 PM

You can't sample from an MSAA texture in a shader using a regular Texture2D object. If you do this, the debug runtimes will report an error when you draw using that shader. So your first step should be to turn on the debug runtimes if you don't currently have them enabled, since they're an invaluable tool for solving bugs like these. To turn them on, pass D3D11_CREATE_DEVICE_DEBUG as the Flags parameter of D3D11CreateDevice.

Now to make your setup work, the easiest thing to do is perform an MSAA resolve to produce an antialiased result from your MSAA texture. To do this, you'll want to create another 2D texture resource that's the same size and format as your MSAA texture resource, except with a sample count of 1. Then during your render loop, once you've finished rendering to your MSAA render target you can resolve the results to your non-MSAA texture using ID3D11DeviceContext::ResolveSubresource. Then you can have your shader sample normally from the non-MSAA texture, and it will work correctly. 

#5181476 HD remakes, what does it involve?

Posted by MJP on 19 September 2014 - 01:43 AM

I worked on the PS3 port of the God of War PSP games. For that port we didn't use a new engine or add any additional graphical features, we just straight-ported the PSP engine to PS3. The artists improved some of the models, up-rezed a bunch of textures (a lot of them automatically through Photoshop scripts), and switched out the Kratos gameplay model for the high-quality model used in cinematics. The bulk of the work involved was just programmers porting the engine bit by bit until it ran on PS3, and then coming up with ways to get all of the graphics functionality working on the PS3 GPU. A *lot* of looking at the PSP GPU docs to figure out what the hell it did when some obscure feature was enabled (texture doubling, argh!). The rest was just bug fixing (a lot of data bugs from endianness issues), and optimization. There was also a large amount of work put into re-rendering our old cinematics in HD, so that we weren't just playing upscaled 480x272 videos.

It really depends on the project, though...some remakes take things a lot further and add lots of new graphical features. But I would say most are similar to what we did, where it's mostly a straight port.

#5181130 OIT (Order Independent Transparency)

Posted by MJP on 17 September 2014 - 03:50 PM

There's no real "best" here. Just like any other engineering problem there's different solutions, with different trade-offs. The Intel technique has good results, but requires a recent Intel GPU which usually makes it a deal-breaker. Blended OIT can work on almost any hardware, but can produce sub-par results for certain scenes and generally relies on per-scene tweaking of the blending weights. What's better for you will depend on what platform and hardware you're targeting, as well as the kind of content you'll have in your game or app.

#5180866 GPU Perfstudio error

Posted by MJP on 16 September 2014 - 06:13 PM

You should check out Renderdoc, which is an open-source replacement for PIX. It's very good!

#5180835 LOD in modern games

Posted by MJP on 16 September 2014 - 03:35 PM

If possible could you tell me which titles? I'd like to see some videos.
Progressive meshes from Hugues Hoppe? I've seen it cited in a few places as well; it always seemed to me they actually used it.
Are Simplygon and other similar solutions that hit-and-miss, requiring artist intervention?
Out of curiosity, I seem to remember the 360 having some hardware tessellation. Was it ever used in any titles?
Thanks for your help.


MLB: The Show used it on PS3. I believe they released some material about it, although this presentation is all I can find at the moment. I think Lair was using something similar as well. Insomniac was using it for a while with Ratchet and Clank on PS3, but dropped it before release. They have some slides here.


The problem with LOD is that you want something that has lower polygon counts but "still looks good", and that last part is very subjective. This is why you usually want an artist involved: so that they can make sure that the LODs being generated are still visually pleasing. It's not always necessary, but ideally you'll want human feedback in the process as much as possible.


Xbox 360 did indeed have tessellation hardware. I don't believe it ever had widespread use outside of a few special cases. 

#5180538 LOD in modern games

Posted by MJP on 15 September 2014 - 03:13 PM

In the general case, hand-made LODs of models at three or four distances, maybe plus an impostor billboard if needed. Simple alpha fade between them. Nobody is bothering with continuous LOD or other automated systems, as they have several flaws:
* Expensive on CPU
* Require dynamic GPU buffers
* Pretty much ruin any chance of instancing/batching - this is a huge problem
* The vast majority of methods are not able to cope with tex coords, normals, tangent spaces, and all the other real-world stuff verts actually use. This becomes a huge n-dimensional optimization problem.
* In conjunction with the above, LOD changes create severe popping artifacts due to discontinuities in the various spaces
* Vertex processing isn't the limiting factor in nearly any system
The primary use of LOD right now is to prevent small triangles from being rasterized. Rasterization of tiny (less than 8x8 pixels in particular) triangles is catastrophically slow on modern hardware and negatively impacts overall fragment shading efficiency. There are a few special cases where more sophisticated LOD techniques are useful, notably (and almost exclusively) terrain.


Well it's not really true that nobody used continuous LOD systems (I know of a few games that did), but it definitely wasn't popular. In fact I've seen Progressive Meshes mentioned as one of the most frequently-cited but least-used techniques in graphics, which is pretty funny.

In general, discrete LOD levels are definitely still the norm. Simplygon is becoming pretty popular as a tool for automatically generating LODs. It's pretty good, although you still generally want artist intervention.

Tessellation used to be frequently mentioned as the solution to all LOD problems, but in reality it's found little use in big-budget games. The Call of Duty series seems to be the one notable exception.

#5180499 StructuredBuffer and matrix layout

Posted by MJP on 15 September 2014 - 11:46 AM

Yes, I've seen that behavior as well when using compute shaders. My workaround was to use a StructuredBuffer<float4> instead and use 4 loads. In terms of the compiled assembly this isn't really any less efficient, since a float4x4 load will get split into 4 loads when compiled to assembly anyway (you can only load 4 DWORDs at a time from structured buffers).
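A quick sketch of what I mean, assuming the matrices are stored as 4 consecutive float4 rows (buffer and function names here are just made up for illustration):

```hlsl
StructuredBuffer<float4> MatrixBuffer;

float4x4 LoadMatrix(uint matrixIndex)
{
    // Each matrix occupies 4 consecutive float4 elements
    uint base = matrixIndex * 4;
    return float4x4(MatrixBuffer[base + 0],
                    MatrixBuffer[base + 1],
                    MatrixBuffer[base + 2],
                    MatrixBuffer[base + 3]);
}
```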

#5180197 Using Texture2DMS in shader

Posted by MJP on 14 September 2014 - 12:59 AM

Like I said previously, Texture2DMS.Load takes integer pixel coordinates. You're passing the same [0,1] texture coordinates that you use for Texture2D.Sample, which isn't going to work. You can obtain the current pixel coordinate in a pixel shader by using the SV_Position semantic.


You're also only sampling one subsample from your Texture2DMS, and the index you're passing is too high. If you have 8 subsamples for your render target, then the valid indices that you can pass are 0-7. Try using a for loop that goes from 0-7, and averaging the result from all subsamples.
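Putting both fixes together, a minimal pixel shader sketch might look like this (the texture name and the sample count of 8 are assumptions based on your setup):

```hlsl
Texture2DMS<float4> MSAATexture;

float4 PSMain(float4 screenPos : SV_Position) : SV_Target
{
    // SV_Position gives pixel-center coordinates; truncate to integers for Load
    int2 pixelCoord = int2(screenPos.xy);

    float4 sum = 0.0f;
    [unroll]
    for (int i = 0; i < 8; ++i)
        sum += MSAATexture.Load(pixelCoord, i);

    return sum / 8.0f;
}
```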

#5179778 Using Texture2DMS in shader

Posted by MJP on 11 September 2014 - 11:59 PM

You can't use any of the "Sample" functions on a Texture2DMS, since MSAA textures don't support filtering or any of the "normal" texture operations. The only thing you can do with them is load a single subsample at a time, by specifying integer pixel coordinates as well as a subsample index. See the documentation for more info.


For your case, if you wanted to resolve the texture on-the-fly in the pixel shader you can just load all of your subsamples for a given pixel and then average the result. Also, note that for ps_5_0 you're not required to declare the Texture2DMS with the number of subsamples as a template parameter. You only had to do this for ps_4_0.

#5179741 Using Texture2DMS in shader

Posted by MJP on 11 September 2014 - 08:15 PM

You'll want to resolve your MSAA texture before you can sample from it using SpriteBatch. Just create another render target texture that has the same dimensions/format as your MSAA texture but with 0 multisamples, and then resolve to that before using it with SpriteBatch.

#5179419 Blur shader

Posted by MJP on 10 September 2014 - 03:12 PM

1. Samplers are generally only useful if you need filtering, or want special clamp/wrap/border addressing modes. For your case, not using a sampler is easier since you always want to load at exact texel locations.

2. Out-of-bounds addresses will return 0 when loading from textures or buffers. If that's not the behavior you want, you should clamp your coordinates manually to [0, textureSize - 1].
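For example, a single blur tap with manual clamping might look like this (InputTexture, textureSize, pixelCoord, and offset are hypothetical names for your texture, its dimensions in texels, the current pixel, and the kernel offset):

```hlsl
// Clamp the tap coordinate to the valid texel range before loading
int2 tapCoord = clamp(pixelCoord + offset, int2(0, 0), textureSize - int2(1, 1));
float4 tap = InputTexture.Load(int3(tapCoord, 0)); // mip 0
```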