
Member Since 29 Mar 2007
Offline Last Active Yesterday, 11:19 PM

#5009703 How are games compiled for multiple operating systems at once?

Posted by MJP on 11 December 2012 - 11:00 PM

Just to add to the above posts...while you can certainly author platform-specific code using the pre-processor, in practice that's really messy. It's not hard to imagine how convoluted a real window-creation function would look if you just put the code for multiple platforms all in the same places with #if's and #ifdef's thrown in everywhere. So it's generally better (IMO) to avoid that whenever possible by using other means to selectively compile code. For instance at my current company, we tag platform-specific cpp files with a suffix that tells our build system what platform it should be compiled for. That way each file can contain a whole bunch of implementation-specific code for a single class or a group of related functions.
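To make that concrete, here's a hypothetical sketch (file and function names invented) contrasting the #ifdef approach with the suffix-tagged per-platform files described above:

```cpp
#include <cassert>
#include <string>

// The messy approach: every platform's path interleaved in one function
// with preprocessor checks. Imagine this with real window-creation code.
std::string WindowBackendName() {
#if defined(_WIN32)
    return "win32";
#elif defined(__APPLE__)
    return "cocoa";
#else
    return "x11";
#endif
}

// The cleaner alternative: declare the function once in Window.h, then give
// each platform its own implementation file, e.g. Window_win32.cpp,
// Window_osx.cpp, Window_linux.cpp (hypothetical names). The build system
// selects the right file from the suffix, and no function body needs #ifdef's.
```

Each per-platform file can then freely use platform headers and APIs without polluting the shared code.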

#5008974 Constant Buffers

Posted by MJP on 09 December 2012 - 09:58 PM

You probably just need to transpose your matrices. By default shaders expect column-major matrices in constant buffers, which means transposing row-major matrices when setting them into a constant buffer. The effects framework does this for you, so a lot of people hit this bug when handling constant buffers themselves for the first time.

FYI, you can tell the compiler to expect row-major matrices using a compile flag. You can also mark the matrix with "row_major" in your HLSL code to do the same thing on a per-matrix basis.
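The transpose the effects framework performs for you can be sketched like this (a minimal stand-in, not the actual DirectXMath implementation):

```cpp
#include <cassert>

// Transpose a row-major 4x4 matrix into the column-major layout that
// shaders expect in constant buffers by default.
void Transpose4x4(const float in[4][4], float out[4][4]) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[c][r] = in[r][c];
}
```

With DirectXMath you'd typically call XMMatrixTranspose on the matrix right before copying it into the mapped constant buffer.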

#5008642 Disappearing Data in Structured Buffers

Posted by MJP on 08 December 2012 - 06:16 PM

This isn't your problem, but you don't need to set the "ElementOffset" or "ElementWidth" members of D3D11_BUFFER_SRV. The structure is defined like this:

typedef struct D3D11_BUFFER_SRV
	{
	union
		{
		UINT FirstElement;
		UINT ElementOffset;
		};
	union
		{
		UINT NumElements;
		UINT ElementWidth;
		};
	} D3D11_BUFFER_SRV;

Since they're in unions, you can only set one or the other in each union pair. For a structured buffer you want to use "FirstElement" and "NumElements".

Anyway, I suspect your problem is with your C++ struct. If you check sizeof(SInstance), you'll find that it's 80 bytes in size. This is because the alignment requirement of XMMATRIX (XMMATRIX is already 16-byte aligned, so you don't need to add the alignment manually) causes the struct to have 16-byte alignment, which causes the compiler to insert 12 bytes of padding after dTemplateType. However if you were to declare something like this in HLSL...

struct SInstance
{
	float4x4 matLocation;
	int dTemplateType;
};

...this struct will be 68 bytes in size. Structs for structured buffers don't have 16-byte alignment requirements, that was an incorrect assumption on your part. HLSL really only works in terms of 4-byte values, so structs used for structured buffers will pretty much always have 4-byte alignment. If you stick to using 4-byte types in your C++ struct, you should be fine. This means you should avoid the DirectXMath SIMD types like XMVECTOR and XMMATRIX, since they have 16-byte alignment. Try changing your struct to this:

struct SInstance
{
	XMFLOAT4X4 matLocation;
	int dTemplateType;
};

If you do this, the stride of your structured buffer should match the stride expected by your shader. If you have a mismatched stride, the runtime will transparently set your buffer to NULL which will cause you to get 0's when your shader attempts to access it. If you create the device with the DEBUG flag, the runtime will output an error message to tell you that this has occurred.
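You can verify both layouts without touching D3D at all. This sketch uses a 16-byte-aligned stand-in for XMMATRIX and a plain float array standing in for XMFLOAT4X4 (assumed stand-ins, but they match the alignments DirectXMath declares):

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for XMMATRIX: 4x4 floats with a 16-byte alignment requirement.
struct alignas(16) AlignedMatrix { float m[4][4]; };

// Stand-in for XMFLOAT4X4: the same 64 bytes of data, 4-byte alignment.
struct UnalignedMatrix { float m[4][4]; };

struct SInstanceAligned {
    AlignedMatrix matLocation;  // forces 16-byte alignment on the struct
    int dTemplateType;          // followed by 12 bytes of tail padding
};

struct SInstanceUnaligned {
    UnalignedMatrix matLocation;
    int dTemplateType;          // no padding needed
};

static_assert(sizeof(SInstanceAligned) == 80, "padding inflates the struct");
static_assert(sizeof(SInstanceUnaligned) == 68, "matches the HLSL struct");
```

The 68-byte version is the one whose stride will match what the shader expects.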

#5008581 Does stretchrect or UpdateRect take out alpha? Please help!

Posted by MJP on 08 December 2012 - 01:53 PM

StretchRect doesn't really "draw" anything, it basically just does a copy. It won't use alpha-blending even if you enable it with blend states.

I would suggest using ID3DXSprite for something like this. It supports blending, and more complex transformations such as rotations.

#5008578 SSAO and skybox artifact

Posted by MJP on 08 December 2012 - 01:49 PM

A warp consists of either 16 or 32 threads grouped together.

I think you mean "32 or 64"

I thought a wavefront on AMD's architecture consists of 16 execution units. Or am I wrong? (I just used warp as a general term, because I like it more.)

Nah there's 64 threads in a wavefront. In their latest architecture (GCN) the SIMDs are 16-wide, but they execute each instruction 4 times to complete it for the entire wavefront (so a single-cycle instruction actually takes 4 cycles to execute).

#5008337 SSAO and skybox artifact

Posted by MJP on 07 December 2012 - 08:45 PM

When dealing with shaders, ALL code is executed, including ALL branches, all function calls, etc. The ONLY exception for this is if something is known at compile time that will allow the compiler to remove a particular piece of code.

This is how all graphics cards work, AMD, NVIDIA, etc. So your additional cost is that of the if statement, and in your example you are adding one extra if instruction. This is zero cost on GPUs. If you want to read up on it, check out vector processors and data hazards.

If you somehow split your shader up and added an if statement in the middle thinking that it would speed up your code, you would get NO speedup, because ALL paths will be executed.

This is completely wrong, even for relatively old GPUs (even the first-gen DX9 GPUs supported branching on shader constants, although in certain cases it was implemented through driver-level shenanigans). I'm not sure how you could even come to such a conclusion, considering it's really easy to set up a test case that shows otherwise.

#5008261 [SharpDX] Questions about DirectX11

Posted by MJP on 07 December 2012 - 04:50 PM

If you set a constant buffer into slot 2, it gets bound to register 2. No exceptions.

What we were talking about was how the compiler assigns registers to resources declared in your shader code. In general the compiler will assign in order of declaration, but this isn't really something you can rely on. If you're not going to assign registers in your code, then you probably want to use the reflection interfaces to query the register based on the resource name and type.

#5008258 [DX11] InterlockedMin works wrong?

Posted by MJP on 07 December 2012 - 04:47 PM

InterlockedMin takes a uint parameter, hence it only works with unsigned integers. If you pass it an int, it will treat it as an unsigned integer which will cause negative numbers to be treated as greater than positive numbers.
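The effect is easy to demonstrate on the CPU side: reinterpret a negative int as a uint and it compares as a huge value. This is a minimal illustration of the unsigned-min behavior, not the actual GPU atomic:

```cpp
#include <algorithm>
#include <cstdint>

// What effectively happens if you feed signed values to an unsigned min:
// the bit pattern is reinterpreted as a uint before comparison.
uint32_t UnsignedMin(int32_t a, int32_t b) {
    return std::min(static_cast<uint32_t>(a), static_cast<uint32_t>(b));
}
// UnsignedMin(-1, 5) == 5: -1 becomes 0xFFFFFFFF, so 5 "wins" the min
// even though -1 is the smaller signed value.
```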

#5007985 Mapping a DXT1 compressed texture

Posted by MJP on 06 December 2012 - 10:05 PM

Instead of decompressing in software, you can just write a super-simple compute shader that reads each texel and writes it to an output texture with an R32G32B32A32_FLOAT format. Then you can copy that to a staging texture and read back the uncompressed data.
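Such a decompression pass can be sketched in HLSL like this (names and thread-group size are my own; the texel load relies on the texture unit decompressing the DXT1 block for you):

```
Texture2D<float4> CompressedTex : register(t0);  // SRV over the DXT1 texture
RWTexture2D<float4> Output : register(u0);       // R32G32B32A32_FLOAT UAV

[numthreads(8, 8, 1)]
void DecompressCS(uint3 id : SV_DispatchThreadID)
{
    // The load goes through the texture unit, which decompresses the block.
    Output[id.xy] = CompressedTex[id.xy];
}
```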

#5007898 float2 dx = ddx(uv.xy * size.x); // does this make sense?

Posted by MJP on 06 December 2012 - 03:59 PM

ddx is the partial derivative of the value with respect to screen-space X, and ddy is the partial derivative with respect to screen-space Y. In other words, ddx tells you how much the value changes if you move one pixel to the right, while ddy tells you how much it changes if you move one pixel down. That is why there are two different functions.

I can't tell you whether your code makes sense without knowing what you're actually trying to accomplish, and also knowing what "size" is.

#5007894 [SharpDX] Questions about DirectX11

Posted by MJP on 06 December 2012 - 03:50 PM

You don't have to specify registers in the shader. It can be convenient to do so if you want to be explicit, but it's not required. If you don't assign one yourself, the compiler will bind all used resources to a register based on the resource type. There are 16 "s" registers for samplers, 128 "t" registers for shader resource views, 16 "b" registers for constant buffers, and 8 "u" registers for unordered access views. These registers then correspond to the "slots" that you specify to API calls like PSSetShaderResources. So if a texture is bound to register t2, then in your app code you'll want to bind your shader resource view to slot 2.
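Explicit bindings look like this in HLSL (resource names are hypothetical):

```
cbuffer PerFrame : register(b0)       // slot 0 for *SetConstantBuffers
{
    float4x4 ViewProjection;
};

Texture2D DiffuseMap : register(t2);  // slot 2 for *SetShaderResources
SamplerState LinearSampler : register(s0);
```

With the t2 declaration above, the matching app-side call would be PSSetShaderResources with StartSlot = 2.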

If you're going to hard-code the slots in your app code, then you'll probably want to explicitly specify the register in your shader code. Otherwise you might run into cases where the compiler doesn't allocate the register that you think it will, often because a resource doesn't end up getting used in the shader and therefore gets optimized out entirely.

#5007318 Best practises for writing shaders

Posted by MJP on 05 December 2012 - 12:42 AM

I had a feeling that multi-pass shaders had fallen out of favour in recent years. I'm guessing the reason for multi-pass shaders historically was to get around the lower instruction counts for the shader models at the time?

Yes, and also as a workaround for hardware that didn't support multiple render targets.

#5006345 GPU runtime optimization by the graphics driver

Posted by MJP on 02 December 2012 - 01:16 PM

The problem with queries for performance measurements is that they don't always measure what you want them to measure. Looking at the difference in GPU timestamps gives you the amount of time that it takes the GPU to process all commands in the command buffer that were between the timestamps, which doesn't necessarily tell you the total time required to fully execute those commands. This is because a GPU is pipelined, and can be working on multiple commands (even draw commands) simultaneously. Therefore they're not so great for fine-grained measurements. They'll work better if you can place them around natural sync points for the GPU, such as render target switches.

#5006102 DeviceContext->map() using system and GPU memory?

Posted by MJP on 01 December 2012 - 03:46 PM

If IMMUTABLE suits your needs, then you should definitely use it. Doing this will allow the driver to properly optimize for this use case by placing the resource in the appropriate memory location (this usually means it will be placed directly in high-speed GPU memory). Just be aware that you can't use CPU_ACCESS_READ with IMMUTABLE resources, that's only valid for STAGING resources.

#5005909 Is there anyway to create curved 3D model?

Posted by MJP on 30 November 2012 - 07:26 PM

If we're talking about GPU hardware rasterization, then the only supported primitives are points, lines, and triangles. To approximate curved surfaces with actual geometry you would need to tessellate to triangles. If you don't want to do that, you can shade surfaces as if the surface geo was actually curved. This can be accomplished with normal maps, or even with really simple techniques like interpolating smooth vertex normals across a triangle. It's also possible to discard pixels and manually output depth values, which can give you the outline, shading, and resulting depth buffer of a curved surface. However doing this isn't really practical, since it doesn't play nice with GPU hardware optimizations.