
MJP

Member Since 29 Mar 2007

#5047375 Handling depth sorting key and hardware instancing

Posted by MJP on 27 March 2013 - 02:46 PM

> It does not work that way. You have no guarantee on the order of execution (much less on the order of completion) inside a single draw call.
>
> It's really simple. Multiple execution units --> race conditions. You see those GPU blocks in every article each time a new GPU is released.
>
> The only decent way to do order-independent transparency is using D3D11 linked lists, in my opinion.


The order that a primitive is rasterized and written to a render target is the same as the order in which you submit those primitives. This is part of the DX spec, and is guaranteed by the hardware. In fact the hardware has to jump through a lot of hoops to maintain this guarantee while still making use of multiple hardware units. This means that if you were able to perfectly sort all primitives in a mesh by depth, you would get perfect transparency. The same goes for multiple instances in a single draw call. The only case that's totally impossible to handle without OIT is the case of intersecting primitives.
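One practical consequence for the instancing case: since instances are drawn in the order they appear in the instance buffer, you can get correct back-to-front blending by sorting the instance data on the CPU each frame before uploading it. A minimal sketch, where InstanceData and camPos are just illustrative:

// Sort instances back-to-front by squared distance from the camera before
// filling the instance buffer, so blended instances composite correctly.
#include <algorithm>
#include <vector>
#include <DirectXMath.h>

struct InstanceData
{
    DirectX::XMFLOAT3 Position;
    // ...other per-instance data
};

void SortInstancesBackToFront(std::vector<InstanceData>& instances,
                              const DirectX::XMFLOAT3& camPos)
{
    std::sort(instances.begin(), instances.end(),
        [&](const InstanceData& a, const InstanceData& b)
        {
            auto distSq = [&](const DirectX::XMFLOAT3& p)
            {
                float dx = p.x - camPos.x, dy = p.y - camPos.y, dz = p.z - camPos.z;
                return dx * dx + dy * dy + dz * dz;
            };
            return distSq(a.Position) > distSq(b.Position); // farthest first
        });
}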




#5046034 How do you benchmark your game?

Posted by MJP on 23 March 2013 - 01:47 PM

Yeah, average framerate alone definitely isn't enough. You really want a graph of frame times (don't use FPS, since it's non-linear) so that you can clearly see whether there are any spikes, fluctuations, or stuttering. A highly variable frame time results in a very bad playing experience, even if the average frame rate is still "high". So a game that's locked at a constant 33.3ms (30fps) will often be perceived as "smoother" than a game that averages close to 16.6ms (60fps) but fluctuates from frame to frame.
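If you're rolling your own capture, something as simple as this does the job; the game-loop functions here are placeholders for whatever your engine actually does:

// Log per-frame CPU times in milliseconds so you can graph spikes and
// stuttering instead of relying on an average FPS number.
#include <chrono>
#include <vector>

bool GameIsRunning();   // placeholder: provided by the game
void UpdateAndRender(); // placeholder: one frame of work

std::vector<double> gFrameTimesMs;

void RunGameLoop()
{
    using Clock = std::chrono::high_resolution_clock;
    auto prev = Clock::now();
    while (GameIsRunning())
    {
        UpdateAndRender();
        auto now = Clock::now();
        std::chrono::duration<double, std::milli> dt = now - prev;
        prev = now;
        gFrameTimesMs.push_back(dt.count()); // dump and graph these afterwards
    }
}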




#5044715 [SharpDX][.Net] Rendertargets dont display correctly when used as textures

Posted by MJP on 19 March 2013 - 05:35 PM

It looks like you still have the render target textures bound as outputs when you bind those textures as inputs, which means that the runtime will set them to NULL in order to avoid a read/write conflict. The runtime will warn you about these kinds of things if you pass the DEBUG flag when creating your device. However, the messages go to the native debugging output, so you need to either enable unmanaged (native) code debugging for your project (which you can only do with paid versions of Visual Studio) or use a separate program like DebugView.
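In native D3D11 terms (SharpDX exposes the same calls on its DeviceContext), the fix is to unbind the render target before binding its texture as an input. A sketch with illustrative names:

#include <d3d11.h>

// Sketch: switch a texture from render target output to shader input.
// All pointers are assumed to be created elsewhere.
void BindTargetAsInput(ID3D11DeviceContext* context,
                       ID3D11ShaderResourceView* targetSRV)
{
    // Unbind the render target first; otherwise the runtime NULLs the SRV
    // to prevent a simultaneous read/write hazard.
    ID3D11RenderTargetView* nullRTV = nullptr;
    context->OMSetRenderTargets(1, &nullRTV, nullptr);

    context->PSSetShaderResources(0, 1, &targetSRV);
}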




#5044348 Correct compiler syntax for compiling shader using fxc

Posted by MJP on 18 March 2013 - 03:11 PM

Just run fxc /?, and it will print a list of all of the parameters that you can pass. All of the compile flags have direct mappings to command-line options:

D3D10_SHADER_DEBUG = /Zi
D3D10_SHADER_SKIP_OPTIMIZATION = /Od
D3D10_SHADER_WARNINGS_ARE_ERRORS = /WX
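So a debug build of a hypothetical shader might look like this (file names and entry point are made up):

fxc /T ps_4_0 /E PSMain /Zi /Od /WX /Fo MyShader.fxo MyShader.hlsl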




#5043164 DirectX sprite size units?

Posted by MJP on 14 March 2013 - 03:17 PM

ID3DXSprite works in pixel coordinates, with (0, 0) being the top left of the screen. Your issue is probably that your texture is being scaled up to the next power-of-2 dimensions (256x512 in your case). This is the default behavior of D3DXCreateTextureFromFile, and it's done in order to accommodate older hardware that has no or limited support for non-power-of-2 dimensions. If you don't care about that, you can call D3DXCreateTextureFromFileEx with parameters that indicate that you don't want it upscaled.
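For example, something like this keeps the original dimensions; the device pointer and file name are placeholders:

// Load a texture without power-of-2 rounding. D3DX_DEFAULT_NONPOW2 tells
// D3DX to keep the file's original width and height.
#include <d3dx9.h>

IDirect3DTexture9* LoadSpriteTexture(IDirect3DDevice9* device)
{
    IDirect3DTexture9* texture = NULL;
    D3DXCreateTextureFromFileEx(
        device, "sprite.png",                       // placeholder file name
        D3DX_DEFAULT_NONPOW2, D3DX_DEFAULT_NONPOW2, // keep width/height as-is
        D3DX_DEFAULT,                               // full mip chain
        0, D3DFMT_UNKNOWN, D3DPOOL_MANAGED,         // usage, format, pool
        D3DX_DEFAULT, D3DX_DEFAULT,                 // filter, mip filter
        0, NULL, NULL,                              // no color key/info/palette
        &texture);
    return texture;
}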




#5042933 Cuda / OpenCL vs Rendered Textures

Posted by MJP on 13 March 2013 - 07:26 PM

If you need interaction with a more traditional graphics pipeline, then I wouldn't use either. Both D3D and OpenGL support compute shaders, and they are generally better to work with if you need to use the results of your computation for rendering. I would say a particle simulation falls into this category, since after you simulate the particles you will probably want to render them as quads. Using compute shaders also leaves the door open for DrawIndirect, which can be really useful for particles. For a raytracer you probably won't need to interact with a D3D/GL context, so CUDA and CL might make more sense. I'm not really experienced with OpenCL, so I can't give you a good comparison against CUDA. In terms of CUDA vs. compute shaders, CUDA is much closer to writing standard C/C++ code.




#5042606 Create and fill texture array

Posted by MJP on 13 March 2013 - 12:06 AM

You only want to use DYNAMIC usage if the resource will actually be dynamic. In other words, that usage is designed for cases where the CPU will need to update the contents of the resource many times during its lifetime. If you just want a static read-only texture, then you should use IMMUTABLE usage and initialize the contents of the texture through the pInitialData parameter. For the case of a texture array, you'll want to pass a pointer to an array of N D3D10_SUBRESOURCE_DATA structures, where N is the number of array slices in your texture (times the number of mip levels, if there's more than one). Then for each D3D10_SUBRESOURCE_DATA you need to set pSysMem to a pointer to the texture data for a single array slice.
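A sketch of creating such a texture array with one mip level per slice; the names, format, and parameters are illustrative:

#include <d3d10.h>
#include <vector>

// Sketch: create an immutable Texture2D array with one subresource per
// slice. sliceData[i] points to the pixel data for slice i.
ID3D10Texture2D* CreateTextureArray(ID3D10Device* device,
                                    const std::vector<const void*>& sliceData,
                                    UINT width, UINT height, UINT rowPitch)
{
    D3D10_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;                       // one subresource per slice
    desc.ArraySize = static_cast<UINT>(sliceData.size());
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D10_USAGE_IMMUTABLE;
    desc.BindFlags = D3D10_BIND_SHADER_RESOURCE;

    std::vector<D3D10_SUBRESOURCE_DATA> initData(sliceData.size());
    for (size_t i = 0; i < sliceData.size(); ++i)
    {
        initData[i].pSysMem = sliceData[i];
        initData[i].SysMemPitch = rowPitch;   // bytes per row of one slice
        initData[i].SysMemSlicePitch = 0;     // only used for 3D textures
    }

    ID3D10Texture2D* texture = nullptr;
    device->CreateTexture2D(&desc, initData.data(), &texture);
    return texture;
}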




#5042497 ComputeShader ConsumeStructuredBuffer Problem

Posted by MJP on 12 March 2013 - 04:58 PM

That "NumElements" is the maximum number of elements in the buffer. Append/Consume buffers maintain a separate counter indicating how many elements out of the total number are in use at any given period of time. It's conceptually similar to the difference between capacity() and size() in std::vector: the former tell you how much memory was allocated for the internal array of elements, while the latter tells you how many elements you've actually added to the array. Like I said before you need to change the counter you either need to Append to the buffer, you or you need to specify the count when binding the UAV.




#5042461 ComputeShader ConsumeStructuredBuffer Problem

Posted by MJP on 12 March 2013 - 03:44 PM

The way that Append/Consume buffers work is that there's a "hidden" counter on the buffer that holds the number of items in it. This number of items is <= the number of elements you specified when you created the buffer. There are three ways to change this counter: by calling Append (increments the counter), by calling Consume (decrements the counter), or by manually specifying the count when calling CSSetUnorderedAccessViews. In your particular case, if you don't specify the count when binding the UAV it will be 0, and Consume won't give you back an element from the buffer.

In general, you should be careful to consider whether you actually need to use an Append or Consume buffer. In any case where you know the number of elements ahead of time you typically don't need to use them. For instance if you run 100 threads and only some of them may output a value, then you want to use an Append buffer since it's unknown how many you will end up with. However if you then run another compute shader that reads in the N elements that were output and does some processing on them, then you don't need to use a Consume buffer since you can just copy the hidden counter out of the buffer and into a constant buffer. Then you can just access the elements without calling Consume, which is actually faster since the hardware doesn't need to do a global atomic decrement.
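Copying the hidden counter out for that second pass looks like this; countCB is an illustrative constant buffer with at least 4 bytes of space:

#include <d3d11.h>

// Sketch: read the hidden counter of an Append buffer into a constant
// buffer so a later shader can loop over the N appended elements without
// calling Consume.
void CopyAppendCount(ID3D11DeviceContext* context,
                     ID3D11Buffer* countCB,
                     ID3D11UnorderedAccessView* appendUAV)
{
    // Writes the UAV's 32-bit counter value into countCB at byte offset 0.
    context->CopyStructureCount(countCB, 0, appendUAV);
}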




#5042389 3d Model importer; managing material, texture and normal information

Posted by MJP on 12 March 2013 - 12:09 PM

Yes, that is what I meant. As a simple example, take a cube. A cube only has 8 unique positions. However, if you want each face of the cube to have a flat normal, then you need 6 unique normals. To combine these so that you can use a single index buffer, you end up needing 24 interleaved vertices: 4 for each of the 6 faces.




#5042052 Artefacts using big model

Posted by MJP on 11 March 2013 - 03:49 PM

Yeah, those are wayyyyy too far apart: a huge far/near ratio throws away most of your depth buffer precision, which is what causes those artifacts. Bring in the far plane or push up the near plane.




#5042026 3d Model importer; managing material, texture and normal information

Posted by MJP on 11 March 2013 - 02:48 PM

1. Typically you'll assign materials to faces, not vertices. This is because vertex data is often shared between multiple faces. Usually what you'll do is sort the faces by material, and then keep track of the start and end index of each material. So Material A might be faces 0-499, and Material B faces 500-749. Then you can use the start index and number of indices as parameters for DrawIndexed (see the sketch after this list).

2. For most organic shapes you will use "smooth normals", where normals are shared between neighboring faces and are often different at each vertex. Then at runtime you interpolate the normal across the face during rasterization, and this lets you have the appearance of smooth geometry without the underlying polygons actually being "smooth". The following image shows the difference between "smooth normals" and "one normal per face":

 

[Image: SmoothNormals.jpg, illustrating smooth shared normals vs. one flat normal per face]

 

3. GPUs can only work with one index buffer. Typically you'll duplicate vertex data so that you end up with one interleaved vertex buffer containing all of your vertex data, and then you'll index into it using a single index buffer.
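Putting points 1 and 3 together, drawing the mesh ends up looking something like this sketch (the struct and names are made up for illustration):

#include <d3d11.h>
#include <vector>

// Sketch: after sorting faces by material, record an index range per
// material and issue one DrawIndexed per range.
struct MaterialRange
{
    UINT StartIndex;   // first index in the index buffer
    UINT IndexCount;   // 3 * number of faces using this material
};

void DrawMesh(ID3D11DeviceContext* context,
              const std::vector<MaterialRange>& ranges)
{
    for (const MaterialRange& range : ranges)
    {
        // Bind this material's textures/constants here...
        context->DrawIndexed(range.IndexCount, range.StartIndex, 0);
    }
}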




#5042016 Texture Mapping Units

Posted by MJP on 11 March 2013 - 02:38 PM

On modern hardware, the "texture unit" is a functional unit that's bolted onto one of the "cores" of the GPU (Streaming Multiprocessor in Nvidia lingo, Compute Unit for AMD). When triangles get rasterized, the hardware spawns many pixel shader threads, and those threads get assigned to different cores. All of the threads on a given core then share a texture unit for all of their texture fetches.

Why would you think that there would be a bottleneck if multiple texture units try to access the same texture? Modern GPUs have a shared L2 cache that texture fetches from all cores go through, so it's actually likely that you would get a lot of cache hits.




#5041654 Use of immediate context with multithreaded rendering

Posted by MJP on 10 March 2013 - 06:16 PM

You don't have to issue Draw commands on the immediate context if you don't want to; the diagram is just showing that it's possible.




#5041341 Vertex and index buffers in DirectX.

Posted by MJP on 09 March 2013 - 07:25 PM

Yeah, GPUs can only work with a single index buffer, so typically you will duplicate vertices so that you have enough for every necessary combination of position/normal/UV/tangent/etc. If you're using a vertex shader and DX10+, it's possible to implement more complex schemes for storing vertex data, since you can load arbitrary data from buffers. However, like I said before, the GPU is limited to one index buffer and your vertex shader will (usually) only run once per vertex specified in the index buffer, so it can get pretty complicated to do something like this.
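A sketch of that standard duplication step, assuming OBJ-style separate index lists per attribute (all types and names here are illustrative):

#include <array>
#include <cstdint>
#include <map>
#include <vector>

// Sketch: weld position/normal/UV index triplets into a single interleaved
// vertex buffer plus one index buffer. A vertex is duplicated only when a
// combination of indices appears for the first time.
struct Vertex
{
    std::array<float, 3> Position;
    std::array<float, 3> Normal;
    std::array<float, 2> UV;
};

using Triplet = std::array<uint32_t, 3>; // {posIdx, normIdx, uvIdx}

void BuildBuffers(const std::vector<Triplet>& corners,  // 3 per triangle
                  const std::vector<std::array<float, 3>>& positions,
                  const std::vector<std::array<float, 3>>& normals,
                  const std::vector<std::array<float, 2>>& uvs,
                  std::vector<Vertex>& outVertices,
                  std::vector<uint32_t>& outIndices)
{
    std::map<Triplet, uint32_t> remap;
    for (const Triplet& c : corners)
    {
        auto it = remap.find(c);
        if (it == remap.end())
        {
            // First time we've seen this combination: emit a new vertex.
            outVertices.push_back({ positions[c[0]], normals[c[1]], uvs[c[2]] });
            it = remap.emplace(c, uint32_t(outVertices.size() - 1)).first;
        }
        outIndices.push_back(it->second);
    }
}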





