
#5238487 Update a specific region of the render target?

Posted by MJP on 05 July 2015 - 04:51 PM

"fillrate" is an older term that typically refers to the maximum rate at which the hardware can output to a render target. It's not particularly useful for modern hardware, since you're almost never using a simple pass-through pixel shader anymore. Therefore it's commonly used to refer to bandwidth and ROP resources that can be bottlenecked by rendering lots of "dumb" pixels, especially if alpha blending is used.

As for the scissor test, all hardware that I'm familiar with applies the scissor during rasterization so that it can cull pixels before they're ever shaded, since there would be no point in executing the pixel shader for a pixel that would just be discarded anyway. In fact, in D3D10 the scissor test was even moved into the logical "Rasterizer Stage", although the logical pipeline still doesn't necessarily dictate when something happens on real hardware.
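
If it helps, here's a minimal D3D11-flavored sketch of restricting a full-screen pass to a sub-region with the scissor test. The device/context parameters and the DrawFullScreenPass helper are just assumptions for illustration, not anything from the original thread:

#include <d3d11.h>

void DrawFullScreenPass(ID3D11DeviceContext* context);   // hypothetical helper that draws the quad/triangle

void UpdateRenderTargetRegion(ID3D11Device* device, ID3D11DeviceContext* context)
{
    // Rasterizer state with the scissor test enabled (in practice, create this once at startup).
    D3D11_RASTERIZER_DESC rsDesc = {};
    rsDesc.FillMode = D3D11_FILL_SOLID;
    rsDesc.CullMode = D3D11_CULL_NONE;
    rsDesc.ScissorEnable = TRUE;
    ID3D11RasterizerState* scissorState = nullptr;
    device->CreateRasterizerState(&rsDesc, &scissorState);

    // Limit rasterization to the region of the render target that needs updating.
    D3D11_RECT region = { 128, 128, 384, 384 };   // left, top, right, bottom (example values)
    context->RSSetScissorRects(1, &region);
    context->RSSetState(scissorState);

    // Draw the full-screen pass as usual: pixels outside the scissor rect are culled
    // during rasterization, before the pixel shader ever runs.
    DrawFullScreenPass(context);

    scissorState->Release();
}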

#5238372 Update a specific region of the render target?

Posted by MJP on 04 July 2015 - 01:11 PM

Screen space quad sounds right to me. A full screen quad with a scissor rect would waste a lot of fillrate. I suppose you could use a different viewport as well.

It wouldn't waste any fillrate at all, since the scissor culls pixels before the pixel shader even executes.

#5236844 Selecting texture formats

Posted by MJP on 25 June 2015 - 06:44 PM

The texture-fetch hardware converts from the in-memory representation into floats, on demand as the shader executes sampling instructions.
No matter whether it's 8bit, 8bit+sRGB, DXT/BC, 16bit, half-float... they all end up the same in the shader.

Small correction here: the hardware will convert UNORM, SNORM, FLOAT, and SRGB formats to single-precision floats when sampled by a shader. UINT and SINT formats are always interpreted as integers, and will be read as a 32-bit int or uint in the shader. Since they're always interpreted as integers, POINT is the only valid filtering mode for the integer formats.

#5236626 Best technique for lighting

Posted by MJP on 24 June 2015 - 03:29 PM

Shadow-casting lights aren't incompatible with a tile-based deferred renderer; you just need to have all of your shadow maps available at once in your tiled lighting compute shader. The simplest way to do this is to make a texture array for all of your shadow maps, and render all of your shadow maps before running your tiled lighting compute shader. Then when you run the compute shader, you just use the light's shadow map index to look up the shadow map texture from your texture array. The downside of this approach is that you now need to allocate enough memory for all of your shadow maps simultaneously, as opposed to re-using the same shadow map texture for every light. For the last game I worked on we just capped our renderer to 16 shadow-casting spotlights per frame, and made a fixed-size texture array with 16 elements.

If you need to save memory or you really need lots of shadow maps, you can do a hybrid approach. For example you could make your texture array hold 8 shadow maps, and then you could render 8 shadows at a time. So it would go like this (there's a rough sketch of the loop after this list):

- Render shadow maps for lights 0 through 7
- Run tiled lighting compute shader for lights 0 through 7
- Render shadow maps for lights 8 through 15
- Run tiled lighting compute shader for lights 8 through 15
- etc.
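
As a rough sketch of that loop (not code from the game in question), with hypothetical RenderShadowMap and DispatchTiledLighting helpers and an 8-slice shadow map texture array:

#include <algorithm>
#include <cstdint>

struct Light
{
    // Position, direction, projection, shadow map array slice, etc. would live here.
    uint32_t shadowMapIndex;
};

void RenderShadowMap(const Light& light, uint32_t arraySlice);          // hypothetical helpers
void DispatchTiledLighting(const Light* lights, uint32_t numLights);

void RenderShadowedLights(const Light* lights, uint32_t numShadowLights)
{
    const uint32_t kBatchSize = 8;   // number of slices in the shadow map texture array

    for(uint32_t first = 0; first < numShadowLights; first += kBatchSize)
    {
        const uint32_t count = std::min(kBatchSize, numShadowLights - first);

        // Render this batch of shadow maps into array slices [0, count).
        for(uint32_t i = 0; i < count; ++i)
            RenderShadowMap(lights[first + i], i);

        // Run the tiled lighting compute shader for just these lights, with the
        // results accumulated into the lighting buffer across batches.
        DispatchTiledLighting(lights + first, count);
    }
}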

#5235983 What performance will AMD's HBM bring for graphics programmers?

Posted by MJP on 21 June 2015 - 01:32 AM

Unfortunately, the Windows kernel does have to patch your command buffers, depending on the memory hardware being used under the hood, for stability/virtualization/security purposes. Consoles don't have this problem (they can do user-mode submission), but only because a lot more trust is placed on the developers to not undermine the operating system.
Newer hardware with better memory and virtualization systems might be able to fix this issue.

With D3D12/WDDM 2.0 the address patching actually isn't necessary anymore, since GPUs can use a per-process virtual address space that's managed by the OS. However that still doesn't mean that user-mode code is allowed to poke at the low-level GPU registers for submitting command buffers. With D3D12/Mantle/Vulkan user-mode code can initiate the submission of a command buffer, but it's still ultimately going to be kernel-mode code that mucks with the registers. I don't see that changing anytime soon, for a lot of reasons. If you can't submit command buffers quickly, then I suppose you're left with trying to do something like JTS patching on PS3, where you'd initially submit a command buffer with stalls in it and then overwrite the stalls with new commands. That isn't really ideal, and would need API support to overcome all of the low-level hardware issues.

I have more thoughts on the idea of low-latency rendering, but I think that they will have to wait until tomorrow.

#5235982 Uses of curves in graphics?

Posted by MJP on 21 June 2015 - 01:17 AM

Our curves had arbitrary numbers of control points, so it was simpler for the particle simulation shader to just have the curves baked down into textures. The shader didn't have to care how many control points there were or even what kind of spline was used; it could just fetch and that was it.

As for performance, it's not necessarily so straightforward. When the curve is baked into textures you just end up with one texture fetch per particle attribute. With arbitrary control points you would need to perform N memory accesses inside of a dynamic loop, which can be slow for per-thread latency since you can't pipeline memory access as well in a dynamic loop. You also have the issue that every warp/wavefront will need to iterate for the worst case of all threads in that group, and so you may end up with some wasted work if you have particles from different systems packed into the same warp/wavefront. For us though, the performance didn't even matter in the end: we used async compute for the particle simulation and it just got totally absorbed by the normal rendering workload.
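
To make the "bake it into a texture" part concrete, here's a rough D3D11 sketch (not the actual pipeline from that game) that samples a curve at evenly spaced points and uploads the result as a small 1D lookup texture. Curve and EvaluateCurve are hypothetical stand-ins for whatever spline representation you use:

#include <d3d11.h>
#include <cstdint>
#include <vector>

struct Curve;                                          // hypothetical spline with arbitrary control points
float EvaluateCurve(const Curve& curve, float t);      // hypothetical evaluation at t in [0, 1]

ID3D11Texture1D* BakeCurveToTexture(ID3D11Device* device, const Curve& curve, uint32_t numSamples)
{
    // Evaluate the curve at evenly spaced points; the shader later does a single
    // filtered fetch instead of looping over control points.
    std::vector<float> samples(numSamples);
    for(uint32_t i = 0; i < numSamples; ++i)
        samples[i] = EvaluateCurve(curve, float(i) / float(numSamples - 1));

    D3D11_TEXTURE1D_DESC desc = {};
    desc.Width = numSamples;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R32_FLOAT;
    desc.Usage = D3D11_USAGE_IMMUTABLE;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;

    D3D11_SUBRESOURCE_DATA initData = {};
    initData.pSysMem = samples.data();

    ID3D11Texture1D* texture = nullptr;
    device->CreateTexture1D(&desc, &initData, &texture);
    return texture;
}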

#5235815 linking to a static library which references other static libraries?

Posted by MJP on 20 June 2015 - 01:09 AM

Static libs aren't actually linked; in fact, the linker (link.exe) isn't even used at all. Instead the librarian (lib.exe) is used, and all it does is take a bunch of .obj files and package them together into a .lib file. All of the linking and resolving of symbols isn't actually done until you link the static lib into an exe or DLL, which is why you need to list your engine's dependencies in the linker options of your game.

There are 3 ways that I know of to get around manually listing the dependencies:

1. Build your engine as a DLL instead of a static library. If you do this, then your static lib dependencies will be linked into the DLL and your game won't have to put them in the linker settings. One downside of this is that you can potentially end up with the same code being linked into both your exe and your DLL, due to both of them referencing the same static lib.

2. You can use the "Additional Dependencies" setting of the Librarian to tell lib.exe to include additional .lib or .obj files when creating your static library. The major downside to doing this is that you can end up with linker warnings if the same symbol is defined multiple times. This is bad, because you then have ambiguity as to which symbol gets linked into the exe/DLL.

3. Create a property sheet for your library, and add it to any projects that link to it. In that property sheet you can then specify any necessary linker inputs (including the lib itself).

#5235791 What performance will AMD's HBM bring for graphics programmers?

Posted by MJP on 19 June 2015 - 06:09 PM

Yes, that's exactly what I'm talking about.  The problem isn't memory transfers, whether or not they happen, and how fast they are if they do.  The problem is that the CPU and GPU are two separate processors that operate asynchronously.  If you have one frame of latency and you need to do a readback every frame, the fastest memory transfer in the world (or no memory transfer) won't help you; you'll still halve your framerate.

With current APIs, sure. With ones designed around shared memory not necessarily. On a shared memory system, there's no reason why after issuing a command it couldn't be running on the GPU nanoseconds (or at the very least microseconds) later (provided it wasn't busy of course). With that level of fine-grain control you could switch back and forth between CPU and GPU easily multiple times per frame.

Even with a shared memory architecture it doesn't mean that you can suddenly run the CPU and GPU in lockstep with no consequences. Or at least, certainly not in the general case of issuing arbitrary rendering commands. What happens when the CPU issues a draw command that takes 3 milliseconds on the GPU? Does the CPU now sit around for 3ms waiting for the GPU to finish? It also totally breaks the concurrency model exposed by D3D12/Mantle/Vulkan, which are all based around the idea of different threads writing commands to separate command buffers that are later submitted in batches. On top of that, the GPU hardware that I'm familiar with is very much built around this submission model, and requires kernel-level access to privileged registers in order to submit command buffers. So it's certainly not something you'd want to do after every draw or dispatch call.
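
For reference, the batched submission model those APIs expose looks roughly like this in D3D12 (just a sketch; the command lists are assumed to have been recorded and closed on worker threads):

#include <d3d12.h>

// User-mode code initiates the submission of a batch of command lists, but the
// actual kick-off to the hardware still goes through kernel-mode driver code.
void SubmitFrame(ID3D12CommandQueue* queue, ID3D12CommandList* const* commandLists, UINT count)
{
    queue->ExecuteCommandLists(count, commandLists);
}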

Obviously these problems aren't insurmountable, but I think at the very least you would need a much tighter level of integration between the CPU and GPU for the kind of generalized low-latency submission that you're talking about. Currently the only way to get anywhere close to that is to use async compute on AMD GPUs, which is specifically designed to let you submit small command buffers with compute jobs with minimal latency. With shared memory and careful cache management it is definitely possible to get your async compute results back pretty quickly, but that's only going to work well for a certain category of short-running tasks that don't need a lot of GPU resources to execute.

#5235779 [Solved]IBL(ggx) problem

Posted by MJP on 19 June 2015 - 04:29 PM

The split-sum approximation doesn't fix this issue. The error is actually caused by the assumption that N = V = R for pre-integrating the environment map, which causes incorrect weighting at glancing angles. If you look at Figure 4 in the course notes, you'll see that the results when using a cubemap are pretty similar to what you're getting.

#5235771 How does games manage different resolutions?

Posted by MJP on 19 June 2015 - 04:07 PM

1. Call IDXGIOutput::GetDisplayModeList. It will give you the list of possible resolutions and refresh rates that you can use for creating your fullscreen swapchain on that particular monitor.

2. Many games actually don't handle multi-GPU and multi-monitor very well: they'll just default to the first adapter and use the first output on that adapter. What I would probably do is have either a configuration dialog or command-line options that let advanced users pick which GPU to use, and then pick which monitor to use. If they use the default options, then you can do what other games do and use the first adapter and the first output from that adapter. This will always be the "primary" display for Windows, so it's usually not a bad choice. If the user picks an adapter that has no outputs, then you can either try to gracefully fall back to a different adapter that does have outputs, or you can just output an error message.

3. Like I said above, I would probably just default to the first output and then provide an advanced settings menu for letting users choose a different display. You could try and be smart by looking at all displays and picking the biggest, but I think that defaulting to the primary display is still a sensible choice.

4. For windowed mode you just find out the size of the window's client area, and then use that for your backbuffer size when creating your swap chain. Or alternatively, just specify 0 as the width and height when creating your swap chain and it will automatically use the client area size. To handle resizing, you just need to check for WM_SIZE messages from your window and then call IDXGISwapChain::ResizeBuffers to resize the back buffer. Once again you can either ask the window for its client area size, or just pass 0 to let DXGI do that for you. Also if it's not convenient to handle window messages, you can instead just ask the window for its client area during every tick of your update loop, and then call ResizeBuffers if the size is different from last frame.
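
For item 4, a minimal sketch of the WM_SIZE path might look like this; the two view-management helpers are hypothetical and just stand in for releasing and re-creating anything that references the back buffer:

#include <dxgi.h>

void ReleaseBackBufferViews();   // hypothetical: release RTVs/SRVs that reference the back buffer
void CreateBackBufferViews();    // hypothetical: re-create them after the resize

// Called from the window procedure in response to WM_SIZE, with the new client-area
// size taken from LOWORD(lParam) and HIWORD(lParam).
void OnWindowResized(IDXGISwapChain* swapChain, UINT width, UINT height)
{
    if(swapChain == nullptr || width == 0 || height == 0)
        return;

    ReleaseBackBufferViews();

    // Passing 0 for BufferCount and DXGI_FORMAT_UNKNOWN keeps the existing buffer
    // count and format; passing 0 for the width and height would instead let DXGI
    // query the window's client area itself.
    swapChain->ResizeBuffers(0, width, height, DXGI_FORMAT_UNKNOWN, 0);

    CreateBackBufferViews();
}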

#5235768 Backbuffer resolution scale filter

Posted by MJP on 19 June 2015 - 03:48 PM

Anything that happens after rendering to the backbuffer is out of your control. In fact you don't even know if it's the GPU that's upscaling to native resolution or the monitor itself (typically this is an option in the driver control panel).

If you want to maintain the "blocky" look, then I would suggest that you just always go to fullscreen at the monitor's native resolution. Typically this is the highest resolution mode given by IDXGIOutput::GetDisplayModeList, but you can also ask the output for the current desktop resolution using IDXGIOutput::GetDesc.
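
A quick sketch of both queries (DXGI only, error handling omitted), in case it helps:

#include <dxgi.h>
#include <vector>

// Queries the fullscreen modes available on an output, along with the output's
// current desktop resolution (which on most setups is the panel's native resolution).
void QueryOutputModes(IDXGIOutput* output)
{
    // Enumerate the modes available for a given back buffer format: the first call
    // gets the count, the second call fills the list.
    UINT numModes = 0;
    output->GetDisplayModeList(DXGI_FORMAT_R8G8B8A8_UNORM, 0, &numModes, nullptr);
    std::vector<DXGI_MODE_DESC> modes(numModes);
    output->GetDisplayModeList(DXGI_FORMAT_R8G8B8A8_UNORM, 0, &numModes, modes.data());

    // The current desktop resolution for this output.
    DXGI_OUTPUT_DESC outputDesc = {};
    output->GetDesc(&outputDesc);
    const UINT desktopWidth = outputDesc.DesktopCoordinates.right - outputDesc.DesktopCoordinates.left;
    const UINT desktopHeight = outputDesc.DesktopCoordinates.bottom - outputDesc.DesktopCoordinates.top;

    // 'modes', 'desktopWidth', and 'desktopHeight' can then be used to pick a
    // fullscreen mode, e.g. the one that matches the desktop resolution.
}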

#5235170 Uses of curves in graphics?

Posted by MJP on 16 June 2015 - 12:47 PM

We used tons of curves for driving the behavior of our GPU-simulated particles: how they spawned, how they moved, what color they were, etc. For the most part we didn't really evaluate curves directly on the GPU; we would instead quantize the curves into a big lookup texture that contained the quantized values for all curves being used by the active particle systems. We did similar things for having the artists specify distance-based and height-based falloffs for fog effects.

I'm not sure if this is exactly what you're looking for, but you should take a look into tessellation and subdivision surfaces. These algorithms generally work by treating the mesh positions as control points on a spline, and then evaluating the curve to generate intermediate vertices. The OpenSubdiv project has tons of info about doing this on a GPU using compute shaders and tessellation hardware.

#5235164 Can i resolve a multisampled depth buffer into a texture?

Posted by MJP on 16 June 2015 - 12:34 PM

The part you're missing here is that your shadow map isn't lined up 1:1 with your screen. Instead it's projected onto a frustum that's oriented with your light's position, with the resulting shadows being projected onto your screen. As a result, you can end up with projective aliasing artifacts, where the projection of the shadow-casting light causes the shadow map resolution to be less than the sampling rate of your back buffer. You'll see this as very obvious jaggies in your shadow, where the jagged edges are actually bigger than a pixel on your screen. Increasing the size of your shadow map will increase the shadow map resolution, which will in turn increase the relative sampling rate of your shadow map depth vs. your screen pixel sampling rate. The end result is that the shadows will look less jagged.

Since a shadow map projection isn't related to your screen projection, it's common to pick a resolution that's not at all tied to your screen resolution. Typically you'll just pick a size like 512x512 or 1024x1024.

#5234984 HLSL questions

Posted by MJP on 15 June 2015 - 06:30 PM

In case it's not clear from the assembly, what's happening in your version of the shader (the one with gOutput) is that the compiler is generating an "immediate" constant buffer for the shader to use. In HLSL assembly there's no stack and you don't have the ability to dynamically index into registers, so the compiler has to place your array into an automatically-generated constant buffer so that it can be indexed by a dynamic value (SV_VertexID). For cases where the index is known at compile time (say, when unrolling a loop with a fixed iteration count), the compiler can instead just directly embed the array values into the assembly instructions which avoids the need to load the values from memory.

#5234983 Can i resolve a multisampled depth buffer into a texture?

Posted by MJP on 15 June 2015 - 06:15 PM

If all you want to do is increase the resolution of the shadow map, then you really should just increase the resolution of the shadow map. Multisampling isn't really useful for your particular situation.