
MJP

Member Since 29 Mar 2007

#4942932 Problem with saving a render target view's content to a texture

Posted by MJP on 24 May 2012 - 09:58 AM

In your "ModifyDeviceSettings" callback, set the "CreateFlags" member of the D3D11 settings to D3D11_CREATE_DEVICE_DEBUG


#4942796 Problem with saving a render target view's content to a texture

Posted by MJP on 23 May 2012 - 11:25 PM

The CopyResource call is probably failing because the backbuffer isn't the same format as the texture you're creating. You should pass the DEBUG flag when creating the device, and then it will tell you when something goes wrong.


#4942733 template texture object as function parameter in sm4+

Posted by MJP on 23 May 2012 - 06:19 PM

You can't template functions in HLSL. The best that you could do is use macros, but that would be pretty ugly. So I'm afraid that you're out of luck on this one.
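
For what it's worth, here's a rough sketch of the macro approach, just so you can see how ugly it gets. DECLARE_SAMPLE_FUNC, SampleFloat4, and SampleFloat are made-up names; the "template parameter" has to be baked in by the preprocessor:

#define DECLARE_SAMPLE_FUNC(funcName, TexType, RetType) \
    RetType funcName(TexType tex, SamplerState samp, float2 uv) \
    { \
        return tex.Sample(samp, uv); \
    }

// Stamp out one version per texture type you need
DECLARE_SAMPLE_FUNC(SampleFloat4, Texture2D<float4>, float4)
DECLARE_SAMPLE_FUNC(SampleFloat, Texture2D<float>, float)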


#4941976 EVSM artifacts at depth discontinuities in the gbuffer

Posted by MJP on 21 May 2012 - 12:06 PM

In the past I've done something like this for decal projection:

// Texture resolution scaled by the distance from the eye to the surface
float lod = max(textureSize.x, textureSize.y) * length(eyePos - surfacePos);
// Divide by the average display dimension and by a view-angle term (grazing angles pick a higher mip)
lod /= (lerp(displaySize.x, displaySize.y, 0.5f) * saturate(dot(normalize(eyePos - surfacePos), normal)));
// Convert the ratio to a mip level, clamped so it never goes below mip 0
lod = max(log2(lod), 0.0f);

I think I based it on something from a DICE presentation...I can't quite remember.

EDIT: it was from "Destruction Masking in Frostbite 2"


#4941733 EVSM artifacts at depth discontinuities in the gbuffer

Posted by MJP on 20 May 2012 - 03:42 PM

Mipmaps are automatically selected based on the derivatives of the texture coordinates you use to sample the texture. This causes problems in a deferred rendering scenario, since you get incorrect derivatives at discontinuities in the depth buffer for texture lookups that are based on a depth value. Shadow map lookups fall under this category, as do projected gobo textures for spot lights. The simple solution is to manually calculate a mip level yourself based on the depth value itself. Or you can store the derivatives of the depth buffer in your G-Buffer, and use those to compute the mip level. Either way you may not want to just disable mipmaps for your shadow map, since being able to use them is one of the key advantages of VSM techniques.
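
A minimal sketch of the distance-based version (MipScale and NumShadowMips are made-up tuning constants, and ShadowMap/ShadowSampler/shadowUV stand in for your own resources):

// Pick the mip level from the view distance instead of relying on hardware derivatives,
// which are garbage at depth discontinuities
float viewDistance = length(eyePos - surfacePos); // or reconstruct from linear depth
float mipLevel = clamp(log2(viewDistance * MipScale), 0.0f, NumShadowMips - 1.0f);
float4 occluderMoments = ShadowMap.SampleLevel(ShadowSampler, shadowUV, mipLevel); // 2 or 4 moments for VSM/EVSM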


#4941586 hlsl texture Sample result?

Posted by MJP on 19 May 2012 - 11:10 PM

There's no way to specify the value that you get for channels that aren't present in the texture. In fact I don't think the documentation even specifies what the value should be, so I don't know if you could even rely on it being 0 on all hardware.
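
If you need particular values in the channels that the format doesn't have, the safe thing is to just set them yourself after sampling. A trivial sketch (ColorTex and LinearSampler are placeholder names):

float4 texel = ColorTex.Sample(LinearSampler, uv);
texel.b = 0.0f; // fill in the missing channels explicitly...
texel.a = 1.0f; // ...rather than trusting whatever Sample returns for them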


#4941496 Branching & picking lighting technique in a Deferred Renderer

Posted by MJP on 19 May 2012 - 01:17 PM

A thread group (using compute shader terminology) is made up of multiple warps/wavefronts. So a thread group with 256 threads will have 8 warps, or 4 wavefronts. The warp/wavefront thing is actually transparent to the programmer, it's just an implementation detail of the hardware. So for instance you can have a thread group with 60 threads, but the hardware will end up executing a full wavefront and just masking off the last 4 threads.

OpenGL has no true equivalent of a compute shader; instead it allows you to interop with OpenCL. OpenCL has the same capabilities as compute shaders; it's just not as tightly integrated with the rest of the graphics pipeline.

You can render quads and process them with a fragment shader. The scheduling should be coherent if you use a grid of equal-sized quads. The one thing that you can't do compared to a compute shader is make use of thread group shared memory, which is on-chip memory shared among the threads within a thread group. The cool thing you can do with that in deferred rendering is build up a shared list of light indices as you cull the lights per tile, which lets you cull the lights in parallel and then process the list per pixel. You can't do that with a fragment shader, so instead it would probably make sense to build the per-tile lists ahead of time on the CPU.

EDIT: see this for more info


#4941280 Branching & picking lighting technique in a Deferred Renderer

Posted by MJP on 18 May 2012 - 03:00 PM

GPUs work in terms of groups of threads, where every thread in the group runs on a SIMD hardware unit and shares the same instruction stream. If all of the threads in such a group go the same way in a branch then there's no problem, since they can all still execute the same instruction. But if some of the threads take the branch and some don't, then you have divergence and both sides of the branch must be executed. On Nvidia hardware these groups of threads are called "warps" and have 32 threads, and on AMD hardware they're called "wavefronts" and have 64 threads. GPUs always execute entire warps/wavefronts at a time, so they're basically the smallest level of granularity for the hardware.

Pixel/fragment shaders will (in general) assign the threads of a warp/wavefront to a group of contiguous pixels in screen space, based on the coarse rasterization performed by the hardware. This is why you'll see people say that you want your branching to be coherent in screen space: threads in a warp/wavefront will be next to each other in screen space.

When you see people talk about tile-based deferred rendering, they're generally going to be using compute shaders, where thread assignment is more explicit. With compute shaders/OpenCL/CUDA you explicitly split up your kernel into "thread groups", where a thread group is made up of several warps/wavefronts all executing on the same hardware unit (Streaming Multiprocessor in Nvidia terminology, Compute Unit in AMD terminology). With compute shaders it's up to you to decide how to assign threads to pixels or vertices or whatever it is you're processing. In the case of deferred rendering, the common way to do it is to have thread groups of around 16x16 threads working on a 16x16 square of pixels. Each thread group then performs culling to create a per-tile list of lights to process, and each thread runs through that list and applies the lights one by one. There's no divergence in this case, since every warp/wavefront uses the same per-tile list (all threads in a warp/wavefront always belong to the same thread group), so you don't need to worry about it.
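
Here's a bare-bones sketch of that kind of compute shader. The Light layout is made up, and IntersectsTile/ApplyLight are stubs standing in for the real tile frustum test and shading code:

#define TILE_SIZE 16
#define MAX_LIGHTS_PER_TILE 256

struct Light
{
    float3 Position;
    float Radius;
    float3 Color;
    float Padding;
};

cbuffer Constants : register(b0)
{
    uint NumLights;
};

StructuredBuffer<Light> Lights : register(t0);
RWTexture2D<float4> OutputTexture : register(u0);

groupshared uint TileLightCount;
groupshared uint TileLightIndices[MAX_LIGHTS_PER_TILE];

// Stubs: the real versions test the light volume against the tile's frustum
// and evaluate the BRDF for the pixel
bool IntersectsTile(Light light, uint2 tileID) { return true; }
float3 ApplyLight(Light light, uint2 pixelPos) { return light.Color; }

[numthreads(TILE_SIZE, TILE_SIZE, 1)]
void TiledLightingCS(uint3 groupID : SV_GroupID, uint3 dispatchID : SV_DispatchThreadID,
                     uint groupIndex : SV_GroupIndex)
{
    if (groupIndex == 0)
        TileLightCount = 0;
    GroupMemoryBarrierWithGroupSync();

    // Each thread culls a subset of the lights against this tile, in parallel
    for (uint lightIdx = groupIndex; lightIdx < NumLights; lightIdx += TILE_SIZE * TILE_SIZE)
    {
        if (IntersectsTile(Lights[lightIdx], groupID.xy))
        {
            uint listIdx;
            InterlockedAdd(TileLightCount, 1, listIdx);
            if (listIdx < MAX_LIGHTS_PER_TILE)
                TileLightIndices[listIdx] = lightIdx;
        }
    }
    GroupMemoryBarrierWithGroupSync();

    // Every thread in the group walks the same per-tile list, so there's no divergence
    float3 lighting = 0.0f;
    uint lightCount = min(TileLightCount, MAX_LIGHTS_PER_TILE);
    for (uint i = 0; i < lightCount; ++i)
        lighting += ApplyLight(Lights[TileLightIndices[i]], dispatchID.xy);

    OutputTexture[dispatchID.xy] = float4(lighting, 1.0f);
}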

If you're looking for a nice intro to how GPUs work in terms of threading and ALUs, then this presentation is a good read.


#4941031 Speed up shader compilation (HLSL)

Posted by MJP on 17 May 2012 - 02:25 PM

In later versions of the SDK the shader compiler is entirely hosted in D3DCompiler_xx.dll. That DLL is then used by fxc.exe and the D3DX functions, or it can be used directly.

You can specify the mipmap level explicitly with tex2Dlod. You can also use tex2Dgrad if you want to specify the UV gradients instead of a mip level.
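
For example, assuming a sampler called DiffuseSampler:

// tex2Dlod takes the mip level in the w component of the texture coordinate
float4 explicitMip = tex2Dlod(DiffuseSampler, float4(uv, 0.0f, mipLevel));
// tex2Dgrad takes explicit UV gradients and lets the hardware pick the mip from those
float4 explicitGrad = tex2Dgrad(DiffuseSampler, uv, ddx(uv), ddy(uv));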


#4940989 Map/unmap, CopyStructureCount and slow down

Posted by MJP on 17 May 2012 - 12:02 PM

So just to clarify, when an app *reads* a GPU buffer using Map/Unmap, will it *always* cause the CPU to wait for the GPU?


Yup. The data you need doesn't exist until the GPU actually writes it, which means that the command that writes the data, along with all previous dependent commands, has to be executed before the data is available for readback.

Compared to when an app *writes* to a dynamic buffer, which doesn't always cause the CPU to wait (I guess because under the hood D3D seems to maintain multiple buffers for dynamic writes).


Indeed, the driver can transparently swap through multiple buffers using a technique known as buffer renaming. This allows the CPU to write to one buffer while the GPU is currently reading from a different buffer.

Also, when you say that the CPU "sits around waiting for the GPU to execute all pending commands", does that truly mean that all dx commands queued up for that frame have to be executed before a buffer can be read, or does it mean that only commands involving the particular append buffer to be read have to be waited for?


That would depend on the driver I suppose. I couldn't answer that for sure.


#4940848 Map/unmap, CopyStructureCount and slow down

Posted by MJP on 17 May 2012 - 12:55 AM

Normally the CPU and GPU work asynchronously, with the CPU submitting commands way ahead of when the GPU actually executes them. When you read back a value on the CPU (which is what you're doing with the staging buffer), you force a sync point where the CPU flushes the command buffer and then sits around waiting for the GPU to execute all pending commands. The amount of time it has to wait depends on the number of pending commands and how long they take to execute, which means it could potentially get much worse as your frames get more complex. I'm not sure how you're determining that the GPU is "pausing", but I would doubt that is the actual case.

Swapping the order can potentially help, if you can keep the CPU busy enough to absorb some of the GPU latency.


#4940733 Materials Blending a'la UDK

Posted by MJP on 16 May 2012 - 12:58 PM

I'm not sure how Unreal implements it, but in our engine we do all of the material layer blending in a single pass. To make it nicer for the artists, the tool supports having a layer derive its properties from a separate material, and then we build a shader with all of the separate material properties specified for each layer. This obviously requires some pretty complex tools and build pipeline support.

IMO this is really the only way to do it, because rendering in multiple passes is expensive. Multiple passes also prevent you from blending together the material properties from different layers... for instance, a lot of the time you want to blend the normals from multiple layers and then apply lighting to the blended surface (imagine a layer of water running over some bricks). With deferred rendering you can blend G-Buffer properties, but of course only if your G-Buffer uses blending-friendly texture formats and packing. And of course you have to blend multiple render targets, which may not be very cheap.
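
As a rough sketch of what the generated single-pass shader ends up doing for two layers (every texture/function name here is a placeholder, including UnpackNormal and ComputeLighting):

// Blend the layer properties first, then light the blended surface once
float blendAmt = BlendMask.Sample(LinearSampler, uv).r;
float3 normal0 = UnpackNormal(Layer0NormalMap.Sample(AnisoSampler, uv).xyz);
float3 normal1 = UnpackNormal(Layer1NormalMap.Sample(AnisoSampler, uv).xyz);
float3 blendedNormal = normalize(lerp(normal0, normal1, blendAmt));
float3 blendedAlbedo = lerp(Layer0AlbedoMap.Sample(AnisoSampler, uv).rgb,
                            Layer1AlbedoMap.Sample(AnisoSampler, uv).rgb, blendAmt);
float3 color = ComputeLighting(blendedNormal, blendedAlbedo, lightDir, viewDir);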


#4940473 Adaptive V-Sync

Posted by MJP on 15 May 2012 - 11:50 AM

RAGE can do it because Nvidia specifically added it to the driver for them...the driver probably just detects that the game you're playing is in fact RAGE, and then turns it on (they do stuff like this all of the time).

Unfortunately D3D11 doesn't really provide you with the low-level control and timing information that you need to pull off a soft VSYNC. Presenting and syncing is actually pretty complicated, since the CPU and GPU are working asynchronously. DXGI does provide you with some timing information via IDXGISwapChain::GetFrameStatistics (which is only available in fullscreen mode, btw), but I haven't had much success in using that to implement a soft VSYNC.


#4939976 The state of Direct3D and OpenGL in 2012?

Posted by MJP on 14 May 2012 - 12:03 AM

I can't speak for every game developer, but we work with whatever API is available on the hardware. Windows is the only place we have a choice, and we happen to use D3D11. I'm sure other studios use OpenGL on Windows. Personally you'd have to pay me a *lot* of money to choose GL over D3D11 on Windows, but that's just me. Most people probably wouldn't care that much.


#4939700 Best lighting technique?

Posted by MJP on 12 May 2012 - 09:49 PM

Not to mention the trade-off between rendering many lights at once efficiently and supporting many of these physically accurate BRDFs, since methods like deferred rendering don't play nicely with lots of different material types. Light pre-pass rendering makes this less of a problem, but at the cost of having to render your geometry twice.


Light pre-pass really doesn't give you anything at all in the way of material variety, at least if you're going with physically-based lighting approaches. All of the interesting variety comes from things that you need as input to the lighting pass, so the whole "minimal G-Buffer" thing doesn't really pan out.

With GI I'd always advise going for a dynamic solution, since baked GI or PRT will always require additional pre-processing (IMO something you'll want to avoid) and will break immersion quite soon in more dynamic scenes in my experience. And since current techniques allow for decent real-time dynamic GI implementations (like Crytek's diffuse indirect illumination through LPVs for example) this becomes a viable option.


This discussion is wayyyyyy premature judging by the experience level of the OP. He needs to learn to walk before he tries to run.



