
MJP

Member Since 29 Mar 2007

#5049450 How does the pipeline know which indices it needs when it draws a quad using...

Posted by MJP on 03 April 2013 - 12:01 AM

The triangles you output from the GS are not indexed at all, since there are no indices. Instead, you output one or more triangle strips: the vertices that you append to a TriangleStream form a strip according to the standard rules for triangle strip topology. To mark the end of a strip, you call RestartStrip. If you want to output a triangle list instead, you can do so by appending 3 vertices, calling RestartStrip, and then repeating.

So in the case of your quad output from the GS, if you append all 4 verts and then call RestartStrip your 2 triangles will be (v0, v1, v2) and (v1, v2, v3), which means the winding order will be clockwise.
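As a minimal sketch of the quad case (the struct, offsets, and semantics here are hypothetical, not from the original thread), a GS that emits 4 vertices as a single strip might look like:

```hlsl
struct GSOutput
{
    float4 Position : SV_Position;
    float2 TexCoord : TEXCOORD0;
};

[maxvertexcount(4)]
void QuadGS(point float4 center[1] : POSITION,
            inout TriangleStream<GSOutput> stream)
{
    // Emit 4 corners as one triangle strip; the standard strip rules
    // produce the two triangles (v0, v1, v2) and (v1, v2, v3).
    const float2 offsets[4] =
    {
        float2(-1.0f,  1.0f), float2(1.0f,  1.0f),
        float2(-1.0f, -1.0f), float2(1.0f, -1.0f),
    };

    for (int i = 0; i < 4; ++i)
    {
        GSOutput output;
        output.Position = center[0] + float4(offsets[i], 0.0f, 0.0f);
        output.TexCoord = offsets[i] * 0.5f + 0.5f;
        stream.Append(output);
    }

    stream.RestartStrip();  // end this strip before emitting another quad
}
```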




#5049030 Compute Shader - Reading and writing to the same RGBA16 texture?

Posted by MJP on 01 April 2013 - 06:04 PM

Yeah, it has to be an R32 format if you want to both read and write to it in the same shader. The only workaround is to use a StructuredBuffer instead of a Texture2D, since you can read from and write to one regardless of the struct size.
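A sketch of that workaround (buffer name, register assignments, and the half-scale operation are all hypothetical): store one float4 per pixel in a structured buffer, flatten the 2D coordinate yourself, and you can read and write in the same dispatch.

```hlsl
// Hypothetical sketch: emulate read/write access to RGBA-style pixel data
// with a structured buffer instead of an R32-format RWTexture2D.
RWStructuredBuffer<float4> PixelData : register(u0);

cbuffer Constants : register(b0)
{
    uint TextureWidth;
}

[numthreads(8, 8, 1)]
void ModifyCS(uint3 id : SV_DispatchThreadID)
{
    uint index = id.y * TextureWidth + id.x;  // flatten the 2D coordinate
    float4 value = PixelData[index];          // read...
    PixelData[index] = value * 0.5f;          // ...and write in the same shader
}
```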




#5048727 DirectX 11 Rigging in Shader

Posted by MJP on 31 March 2013 - 05:20 PM

You'll want to use a shader resource view (SRV). SRVs are read-only views of a resource, so you would use one in a case like this where the shader only reads from the resource and doesn't write to it. Unordered access views (UAVs) provide read and write access, so you only use those when you need to write to a resource.
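For the rigging case, a sketch might look like this (the buffer name, register, and helper function are hypothetical): the bone matrices are bound through an SRV because the vertex shader only ever reads them.

```hlsl
// Hypothetical sketch: bone matrices bound as a read-only SRV for skinning.
StructuredBuffer<float4x4> BoneMatrices : register(t0);

float4 SkinPosition(float4 position, uint4 boneIndices, float4 boneWeights)
{
    float4 skinned = 0.0f;

    // Blend the position by the four weighted bone transforms.
    [unroll]
    for (int i = 0; i < 4; ++i)
        skinned += boneWeights[i] * mul(position, BoneMatrices[boneIndices[i]]);

    return skinned;
}
```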




#5048126 Is my frustum culling slow ?

Posted by MJP on 29 March 2013 - 03:10 PM

I would suggest reading through this presentation.




#5047886 Compute Shader

Posted by MJP on 28 March 2013 - 11:31 PM

You can certainly use a compute shader to implement a technique that's normally performed in a pixel shader with a full-screen quad. Rendering a full-screen quad spawns a thread for each pixel, so it is also massively multithreaded. In fact, the exact same shader implemented as both a pixel shader and a compute shader will almost always run faster as the pixel shader version, since there is some overhead associated with using compute shaders. In general you have to make use of an optimization that's only possible with compute shaders (usually shared memory) in order for the compute shader version to be faster.

Choosing the optimal number of threads in a thread group is a balancing act. On one hand you need enough threads in a thread group to allow the hardware to hide the latency of memory accesses. On the other hand, having more thread groups can allow the shader to better saturate the many cores present on a GPU. The best balance depends on the shader, the hardware, and what else is currently executing on the GPU. You should also keep in mind that the hardware will always launch threads in groups known as warps (Nvidia) or wavefronts (AMD). A warp has 32 threads, while a wavefront has 64. If you pick a thread group size that isn't an even multiple of the warp/wavefront size, the hardware will round the number of threads up to the next multiple of the warp/wavefront size. Sticking with a multiple of 64 ensures that you won't waste threads when running on either architecture, but if you only run on Nvidia you can consider using a multiple of 32.
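A sketch tying both points together (the textures, registers, and pass-through operation are hypothetical): an 8x8 group is 64 threads, an exact multiple of both the warp size (32) and the wavefront size (64), and the group uses shared memory, the kind of compute-only optimization mentioned above.

```hlsl
// Hypothetical sketch: 8x8 = 64 threads per group, a multiple of both the
// Nvidia warp size (32) and the AMD wavefront size (64), so no threads
// are wasted on either architecture.
Texture2D<float4> Input : register(t0);
RWTexture2D<float4> Output : register(u0);

groupshared float4 Tile[8][8];  // shared memory, only available in compute shaders

[numthreads(8, 8, 1)]
void FilterCS(uint3 dispatchThreadID : SV_DispatchThreadID,
              uint3 groupThreadID : SV_GroupThreadID)
{
    // Stage the group's pixels in shared memory so neighboring threads
    // could reuse them without re-reading from the texture.
    Tile[groupThreadID.y][groupThreadID.x] = Input[dispatchThreadID.xy];
    GroupMemoryBarrierWithGroupSync();  // wait until the whole tile is loaded

    Output[dispatchThreadID.xy] = Tile[groupThreadID.y][groupThreadID.x];
}
```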

The book in my signature has a lot of material regarding compute shaders. You can also consider reading CUDA or OpenCL resources, since the overall concepts are very similar between the three platforms.




#5047713 Handling depth sorting key and hardware instancing

Posted by MJP on 28 March 2013 - 12:09 PM

For certain special cases of meshes you can actually pre-sort them to always render in the correct order regardless of viewing direction. For instance, say you have a transparent sphere that you want to be "double-sided" so that you can see both the front and the back of the sphere. A common way to do this is to duplicate all of the faces and flip their winding order so that they show up when back-facing. If you duplicate the faces and append them to the end of the index buffer you get the wrong sorting order, since the front will render before the back. But if you instead place the duplicated faces at the beginning of the index buffer, the back faces will render first and the front faces second, giving you the correct blending order.




#5047537 Handling depth sorting key and hardware instancing

Posted by MJP on 28 March 2013 - 12:15 AM

 

It does not work that way. You have no guarantee on the order of execution (much less on the order of completion) inside a single draw-call.

It's really simple. Multiple execution units --> race conditions. You see those GPU blocks on every article each time a new GPU is released.

The only decent way to do order-independent transparency is using D3D11 linked lists in my opinion.


The order that a primitive is rasterized and written to a render target is the same as the order in which you submit those primitives. This is part of the DX spec, and is guaranteed by the hardware. In fact the hardware has to jump through a lot of hoops to maintain this guarantee while still making use of multiple hardware units. This means that if you were able to perfectly sort all primitives in a mesh by depth, you would get perfect transparency. The same goes for multiple instances in a single draw call. The only case that's totally impossible to handle without OIT is the case of intersecting primitives.

Are you sure about this behavior?  How can this be assured when multiple primitives are being rasterized in parallel?  There is also some gray area regarding generated primitives too (via tessellation or the geometry shader) as they can be generated in parallel instances of the shaders...

 

I have always heard that the order is roughly equivalent to the order they are submitted in, but that they are explicitly not guaranteed to be processed in exact order.


Definitely. You're never guaranteed the order in which vertices/primitives/pixels are processed in the shader units, but the ROPs will guarantee that the final results written to the render target match the triangle submission order (which is often done by buffering and re-ordering pending writes from pixel shaders). This is even true for geometry shaders, which is a big part of what makes them so slow.




#5047455 Questions about batching static geometry

Posted by MJP on 27 March 2013 - 07:42 PM

To batch like this you either need to pre-transform the vertices, or in the vertex shader you need to be able to look up which mesh any given vertex belongs to so that you can retrieve the correct world matrix. However, you're talking about static meshes here, so you could just pre-transform once when building the level or at load time. That said, your approach may not scale well for scenes with high-polygon meshes, since you will have to move and process a whole lot of data on the CPU. There can also be GPU performance implications from accessing memory that's CPU-writable.

The more modern approach would be to use instancing, which is where you draw the same mesh many times in a single draw call. Of course, this relies on you having meshes that are duplicated many times.
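A sketch of the instancing lookup (the buffer name, registers, and constant buffer layout are hypothetical): each instance fetches its own world matrix using SV_InstanceID.

```hlsl
// Hypothetical sketch: per-instance world matrices fetched by instance ID,
// so one draw call can place the same mesh many times.
StructuredBuffer<float4x4> InstanceWorldMatrices : register(t0);

cbuffer Camera : register(b0)
{
    float4x4 ViewProjection;
}

float4 InstancedVS(float4 position : POSITION,
                   uint instanceID : SV_InstanceID) : SV_Position
{
    // SV_InstanceID selects this instance's transform from the buffer.
    float4x4 world = InstanceWorldMatrices[instanceID];
    return mul(mul(position, world), ViewProjection);
}
```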




#5047375 Handling depth sorting key and hardware instancing

Posted by MJP on 27 March 2013 - 02:46 PM

It does not work that way. You have no guarantee on the order of execution (much less on the order of completion) inside a single draw-call.

It's really simple. Multiple execution units --> race conditions. You see those GPU blocks on every article each time a new GPU is released.

The only decent way to do order-independent transparency is using D3D11 linked lists in my opinion.


The order that a primitive is rasterized and written to a render target is the same as the order in which you submit those primitives. This is part of the DX spec, and is guaranteed by the hardware. In fact the hardware has to jump through a lot of hoops to maintain this guarantee while still making use of multiple hardware units. This means that if you were able to perfectly sort all primitives in a mesh by depth, you would get perfect transparency. The same goes for multiple instances in a single draw call. The only case that's totally impossible to handle without OIT is the case of intersecting primitives.




#5046034 How do you benchmark your game?

Posted by MJP on 23 March 2013 - 01:47 PM

Yeah, average framerate alone definitely isn't enough. You really want a graph of frame times (don't use FPS, it's non-linear) so that you can clearly see if there are any spikes, fluctuations, or stuttering. A highly variable frame time results in a very bad playing experience, even if the average frame rate is still "high". So a game that's locked at a constant 33.3ms (30fps) will often be perceived as "smoother" than a game that averages close to 16.6ms (60fps) but fluctuates from frame to frame.




#5044715 [SharpDX][.Net] Rendertargets dont display correctly when used as textures

Posted by MJP on 19 March 2013 - 05:35 PM

It looks like you still have the render target textures bound as outputs when you bind those textures as inputs, which means that the runtime will set them to NULL in order to avoid a read/write conflict. The runtime will warn you about these kinds of things if you pass the DEBUG flag when creating your device. However, the messages go to the native debugging output, so you need to either enable unmanaged (native) debugging for your project (which you can only do with paid versions of Visual Studio) or use a separate program like DebugView.




#5044348 Correct compiler syntax for compiling shader using fxc

Posted by MJP on 18 March 2013 - 03:11 PM

Just run fxc /? and it will give you a list of all of the parameters that you can pass. All of the compile flags have direct mappings to command-line parameters:

 

D3D10_SHADER_DEBUG = /Zi
D3D10_SHADER_SKIP_OPTIMIZATION = /Od
D3D10_SHADER_WARNINGS_ARE_ERRORS = /WX




#5043164 DirectX sprite size units?

Posted by MJP on 14 March 2013 - 03:17 PM

ID3DXSprite works in pixel coordinates, with (0, 0) being the top left of the screen. Your issue is probably that your texture is being scaled up to the next power-of-2 dimensions (256x512 in your case). This is the default behavior of D3DXCreateTextureFromFile, and it's done in order to accommodate older hardware that has no or limited support for non-power-of-2 dimensions. If you don't care about that, you can call D3DXCreateTextureFromFileEx with parameters that indicate that you don't want it upscaled.




#5042933 Cuda / OpenCL vs Rendered Textures

Posted by MJP on 13 March 2013 - 07:26 PM

If you need interaction with a more traditional graphics pipeline, then I wouldn't use either. Both D3D and OpenGL support compute shaders, and generally they are better to work with if you need to use the results of your computation for rendering. I would say a particle simulation falls into this category, since after you simulate the particles you will probably want to render them as quads. Also, using compute shaders leaves the door open for DrawIndirect, which can be really useful for particles. For a raytracer you probably won't need to interact with a D3D/GL context, so CUDA and OpenCL might make more sense. I'm not really experienced with OpenCL, so I can't give you a good comparison against CUDA. In terms of CUDA vs. compute shaders, CUDA is much closer to writing standard C/C++ code.




#5042606 Create and fill texture array

Posted by MJP on 13 March 2013 - 12:06 AM

You only want to use DYNAMIC usage if the resource will actually be dynamic. In other words, that usage is designed for cases where the CPU will need to update the contents of the resource many times during the lifetime of that resource. If you just want a static read-only texture, then you should use IMMUTABLE usage and initialize the contents of the texture using the pInitialData parameter. For the case of a texture array, you'll want to pass a pointer to an array of N D3D10_SUBRESOURCE_DATA, where N is the number of array slices in your texture. Then for each D3D10_SUBRESOURCE_DATA, you need to set pSysMem to a pointer to the texture data for a single array slice.



