

#4989884 XNAMath vs D3DX10Math && *.fx files vs *.psh and *.vsh

Posted by MJP on 13 October 2012 - 02:55 PM

There are a few things to consider:

1. D3DX is essentially deprecated at this point. The library is totally functional and there's nothing "wrong" with it per se, but it is no longer being developed. Any future updates will be for DirectXMath (DirectXMath is the new name for XNAMath), and not for D3DX.

2. DirectXMath is designed to map well to modern SIMD instruction sets (SSE2-4 and ARM NEON), and allows for high performance math code if used right. D3DX math can use older SSE instructions, but it does so in a sub-optimal way. One result of this is that DirectXMath has a steeper learning curve, and generally requires you to write more code since you have to explicitly load and store SIMD values. However it's possible to write a wrapper that simplifies the usage, if you don't care very much about performance.

3. DirectXMath can be used in Windows 8 Metro apps, and like I mentioned earlier supports ARM instructions. So if you ever want to release on Windows Store, you can't use D3DX at all.

With all of that said, my recommendation would be to go with DirectXMath.

Now when you talk about .fx files, what you're really talking about is whether or not you should use the Effects Framework. The Effects Framework has a bit of a weird history. It started out as part of D3DX, and was essentially a support library that helped you manage shaders and setting constants, textures and device state. Then in D3D10 it became part of the core API, and was moved out of D3DX. Then for D3D11 they moved it out of both, stripped out some functionality, and provided it as source code that you can compile if you want to use it. Once again there are a few considerations:

1. Like I said before, the Effects Framework is a helper library that sits on top of D3D. It helps you manage shaders and states, but it doesn't do anything that you couldn't do yourself with plain shaders and the core APIs.

2. In D3D9 the Effects Framework provided a pretty good model for mapping to the shader constant setup used by SM2.0/SM3.0 shaders, as well as the render state API. For D3D10 and D3D11 it is no longer such a good fit for constant buffers and immutable state objects, at least in my opinion. Like I mentioned earlier certain functionality was stripped out for the Effects11 version, which also makes it less useful than it used to be.

3. Like D3DX, you can't use it for Metro applications. This is because it uses the D3DCompiler DLL to compile shaders and obtain reflection data, and this functionality isn't available to Metro apps.

Personally, I wouldn't recommend using Effects11. It's not really very convenient anymore, and I feel like you're better off just getting familiar with how shaders, states, constant buffers, and resources work in the core API.

#4989605 ShaderReflection; stripping information from an Effect

Posted by MJP on 12 October 2012 - 05:15 PM

When you declare a constant buffer in a shader, the shader doesn't really care about the actual D3D resources that you use to provide the data. So for instance if you have a shader with this constant buffer layout:

cbuffer Constants : register(b0)
{
    float4x4 World;
    float4x4 ViewProjection;
};

When you compile a shader with this code, there's no allocation of resources for that constant buffer or anything like that. All that code says is "when this shader runs, I expect a constant buffer of 128 bytes (32 floats * 4 bytes) to be bound to slot 0 of the appropriate shader stage". It's then your application code's responsibility to actually create a constant buffer using the Buffer class with the appropriate size and binding flags, fill that buffer with the data needed by the shader, and then bind that buffer to the appropriate slot using DeviceContext.<ShaderType>.SetConstantBuffer. If you do that correctly, your shader will pull the data from your Buffer and use it.

Now let's say you have two vertex shaders that you compile, and both use the same constant buffer layout. In this case there is no "duplication" or anything like that, since it's your responsibility to allocate and manage constant buffer resources. So if you wanted to, it's possible to share the same Buffer between draw calls using your two different shaders. You could bind the buffer, draw with shader A, and then draw with shader B, and both shaders will pull the same data from the buffer. Or if you wanted, you could set new data into the buffer after drawing with shader A, and then shader B will use the new contents of the buffer. Or if you wanted you could create two buffers of the same size, and bind one buffer for shader A and bind the other for shader B.

An interesting consequence of this setup is that you don't necessarily need the exact same constant buffer layout in two shaders in order to share a constant buffer. For instance shader B could just have this:

cbuffer Constants : register(b0)
{
    float4x4 World;
};

In that case it would be okay to still use the same constant buffer as shader A, since the size of the buffer expected by shader B is still less than or equal to the size of the constant buffer that was bound. But it's up to you to make sure that in all cases the right data gets to the right shader. In practice I wouldn't really recommend doing something like I just mentioned, since it can easily lead to bugs if you update a constant buffer layout in one shader but forget to do it in another. Instead I would recommend defining shared layouts in a header file, and then using #include to share it between different shaders.

#4988817 Unreal 4 voxels

Posted by MJP on 10 October 2012 - 01:25 PM

The indirect light does indeed get shadowed due to the cone tracing, although not perfectly due to the approximations introduced by the voxelizations and the tracing itself. At the SIGGRAPH presentation they mentioned that they were still using SSAO to add some small-scale AO from features that weren't adequately captured by the voxelization, but I think that's a judgement call that you'd have to make for yourself.

#4988815 Cascading shadow maps - best approach to learn

Posted by MJP on 10 October 2012 - 01:18 PM

The DX11 sample is complicated because it demonstrates a lot of (advanced) functionality. Cascaded shadow maps in general can be pretty complicated, and some of the examples you'll see will reflect that. I agree with you that DXUT is pretty bad, but that's just something we have to live with.

Anyway, I definitely think you should have a firm handle on using render targets and depth buffers if you haven't done so already. If you haven't done shadow mapping yet, you might want to try doing it for a single spot light first. A spot light is a *much* easier case than directional lights. After that, you can try getting a basic directional light shadow map without cascades. That will introduce you to some of the particular issues tackled by CSM, without having to implement the whole thing.

#4988814 How to clone a mesh in DX11?

Posted by MJP on 10 October 2012 - 01:13 PM

I assume that when you say "DX9", you're talking about the ID3DXMesh class? I'm not sure what you're using now for managing your meshes, but cloning a mesh is pretty simple:
  • Make sure that you have the original mesh data available in CPU memory. If you already loaded it into a vertex buffer, you can copy it to a staging buffer and Map that to get a pointer to the data.
  • Allocate enough CPU memory to fit however many vertices you have, using the new vertex layout
  • For each element in the new vertex layout, check if that element exists in the old layout. If it exists, store the byte offset of the element in the old layout. If it doesn't exist, store some value indicating that it doesn't exist (for instance, you could use -1). You can store this all in an array or std::vector containing one integer per new vertex element.
  • Loop over all vertices. For each vertex, loop over your table of offsets that you built in the previous step. If the element exists in the old layout, copy the data from the old vertex data to the new vertex data using the offset that you stored. When you're done with all elements, advance your pointers to the vertex data by the strides so that they point to the start of the next vertex
  • Create a vertex buffer with the new vertex data

#4988612 How many textures can be bound at once in DX11?

Posted by MJP on 09 October 2012 - 11:15 PM

The max shader resource slots is 128, and for feature level 11 the max texture array size is 2048.

#4988548 CryENGINE 3 Irradiance Volumes vs Frostbite 2 Tiled Lighting

Posted by MJP on 09 October 2012 - 06:40 PM

What I meant was that they aren't doing anything that's radically new at a fundamental level...radiosity has been around for a very long time and there has been a lot of research devoted to optimizing it. Most of what Enlighten offers is a framework for processing scenes and handling the data in a way that's optimized for their algorithms. I'm sure they've devoted a lot of time to optimizing the solving of the radiosity, but I don't think that's really necessary for understanding what they're doing at a higher level.

What they're doing isn't magic...their techniques only work on static geometry, so a lot of the heavy lifting can be performed in a pre-process. They also require you to work with proxy geometry with a limited number of patches, which limits quality. They also limit the influence of patches to zones, and only update a certain number of zones at a time (you can see this if you've ever seen a video of their algorithm where the lighting changes quickly).

I don't want to sound like I'm trivializing their tech or saying it's "bad" in any way (I'm actually a big fan of their work), my point was just that their techniques stem from an area of graphics that's been around for a long time and is well-documented.

#4988439 Are GPU drivers optimizing pow(x,2)?

Posted by MJP on 09 October 2012 - 01:10 PM

When I've cared enough to check the assembly in the past, the HLSL compiler has replaced pow(x, 2) with x * x. I just tried a simple test case and it also worked:

Texture2D MyTexture;
float PSMain(in float4 Position : SV_Position) : SV_Target0
{
    return pow(MyTexture[Position.xy].x, 2.0f);
}

dcl_globalFlags refactoringAllowed
dcl_resource_texture2d (float,float,float,float) t0
dcl_input_ps_siv linear noperspective v0.xy, position
dcl_output o0.x
dcl_temps 1
ftou r0.xy, v0.xyxx
mov r0.zw, l(0,0,0,0)
ld_indexable(texture2d)(float,float,float,float) r0.x, r0.xyzw, t0.xyzw
mul o0.x, r0.x, r0.x
// Approximately 5 instruction slots used

I wouldn't be surprised if the HLSL compiler got tripped up every once in a while, but there's also the JIT compiler in the driver too. So you'd have to check the actual microcode to know for sure, if you have access to that.

#4987440 Questions about Intel Sample

Posted by MJP on 06 October 2012 - 10:59 AM

Also the main function of the Pixel Shader(GBufferPS) returns a struct:

struct GBuffer
{
	float4 normal_specular : SV_Target0;
	float4 albedo : SV_Target1;
	float2 positionZGrad : SV_Target2;
};
Instead of a float4. How does the GPU even work with this? I mean a pixel shader can only return a color, right? Not a whole struct.

This feature is called "multiple render targets" (MRT), and like the name suggests it allows the GPU to output to up to 8 render targets simultaneously. Honestly it's a pretty basic GPU/D3D feature, and if you're not familiar with such things yet I would stick to simpler material before looking at the Intel sample (which is quite advanced!).

#4987181 Precompiled effect files and macros

Posted by MJP on 05 October 2012 - 11:13 AM

It's been a very long time, but I think you can assign a sampler state a value from a global int declared in your .fx file. Then at runtime you can set the int value with ID3DXEffect::SetInt. Otherwise you could always just call SetSamplerState yourself.

#4987026 ShaderResourceView w/ D3D11_TEX2D_SRV

Posted by MJP on 05 October 2012 - 12:24 AM

Texture arrays are intended for cases where the shader needs to select a single texture from an array at runtime, using an index. Usually this is for the purpose of batching. For instance, if you had 5 textured meshes and you wanted to draw them all in one draw call, you could use instancing and then select the right texture from an array using the index of the instance.

In your case for a tetris game, I don't think it would be necessary. You probably won't ever need to batch with instancing, in which case texture arrays won't give you any performance advantage. You should be fine with just creating a bunch of textures, and then switching textures between each draw call.

#4987016 Beginner Question: Why do we use ZeroMemory macro for Swap Chain object ?

Posted by MJP on 04 October 2012 - 11:25 PM

It's just a way of initializing the structure data, since the struct doesn't have a constructor. This has been considered the idiomatic way to initialize Win32 structures for as long as I can remember. You don't have to do it if you don't want to, you just need to make sure that you set all of the members of the struct.

#4985758 Can you use tessellation for gpu culling?

Posted by MJP on 01 October 2012 - 08:22 AM

Geometry shaders in general are typically not very fast, and stream out can make it worse because of all the memory traffic. IMO it's a dead end if you're interested in scene traversal/culling on the GPU. Instead I would recommend trying a compute shader that performs the culling, and then fills out a buffer with "DrawInstancedIndirect" or "DrawIndexedInstancedIndirect" arguments based on the culling results. I'd suspect that could actually be really efficient if you're already using a lot of instancing.

In general you don't want to just draw broad conclusions like "the CPU is better than the GPU for frustum culling", because it's actually a complex problem with a lot of variables. Whether or not it's worth it to try doing culling on the GPU will depend on things like:
  • Complexity of the scene in terms of number of meshes and materials
  • What kind of CPU/GPU you have
  • How much frame time is available on the CPU vs. GPU
  • What feature level you're targeting
  • How much instancing you use
  • Whether or not you use any spatial data structures that could possibly accelerate culling
  • How efficiently you implement the actual culling on the CPU or GPU
One thing that can really tip the scales here is that currently even with DrawInstancedIndirect there's no way to avoid the CPU overhead of draw calls and binding states/textures if you perform culling on the GPU. This is why I mentioned that it would probably be more efficient if you use a lot of instancing, since your CPU overhead will be minimal. Another factor that can play into this heavily is if you wanted to perform some calculations on the GPU that determine the parameters of a view or projection used for rendering, for instance something like Sample Distribution Shadow Maps. In that case performing culling on the GPU would avoid having to read back results from the GPU onto the CPU.

#4985385 Updating engine from dx9 to dx11

Posted by MJP on 30 September 2012 - 10:09 AM

The initial port probably won't be too hard for you. It's not too hard to spend a week or two and get a DX9 renderer working on DX11. What's harder is actually making it run fast (or faster), and integrating the new functionality that DX11 offers you. Constant buffers are usually the biggest performance problem for a quick and dirty port, since using them to emulate DX9 constant registers can actually be slower than doing the same thing in DX9. Past that you may need a lot of re-writing for things like handling structured buffers instead of just textures everywhere, having textures/buffers bound to all shader stages, changing shaders to use integer ops or more robust branching/indexing, and so on.

#4984654 What is the current trend for skinning?

Posted by MJP on 28 September 2012 - 01:59 AM

With DX11.1 you can output to a buffer from a vertex shader, which is even better than using a compute shader or stream out since you can just output the skinned verts while rasterizing your first pass.