Quat

Member Since 15 Sep 2003
Offline Last Active Apr 12 2013 10:02 AM
-----

Topics I've Started

Building Shaders

11 April 2013 - 10:53 AM

How do pros compile their shaders as part of their build pipeline? 

 

Right now, I am using a custom build step in Visual Studio to call fxc.  I have to call fxc once for each permutation (specifying the various #defines), which is kind of annoying for shaders that have a lot of permutations.  On the other hand, the old "Effects" framework was not much better in that you had to define a technique for each permutation.
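
To make the per-permutation fxc calls concrete, here is a minimal sketch that generates one command line for every on/off combination of a set of #define flags.  The helper name BuildFxcCommands, the flag names, the file names, and the output naming scheme are all assumptions for illustration; only the fxc switches /nologo, /T, /E, /D and /Fo are taken from the real tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds one fxc command line per on/off combination
// of the given preprocessor flags. Each flag is passed as /D NAME=0 or NAME=1.
std::vector<std::string> BuildFxcCommands(const std::string& source,
                                          const std::string& entryPoint,
                                          const std::string& profile,
                                          const std::vector<std::string>& flags)
{
    // One bit of 'mask' per flag, so the flag count must fit in size_t.
    assert(flags.size() < sizeof(std::size_t) * 8);

    std::vector<std::string> commands;
    const std::size_t count = std::size_t{1} << flags.size();
    for (std::size_t mask = 0; mask < count; ++mask)
    {
        std::string cmd = "fxc /nologo /T " + profile + " /E " + entryPoint;
        std::string suffix; // e.g. "10" = first flag on, second flag off
        for (std::size_t i = 0; i < flags.size(); ++i)
        {
            const bool enabled = ((mask >> i) & 1u) != 0;
            cmd += " /D " + flags[i] + "=" + (enabled ? "1" : "0");
            suffix += enabled ? '1' : '0';
        }
        // Assumed output naming scheme: <entry>_<bits>.cso
        cmd += " /Fo " + entryPoint + "_" + suffix + ".cso " + source;
        commands.push_back(cmd);
    }
    return commands;
}
```

A build script could write these lines to a batch file or run them directly; the command count doubles with every flag, so this only stays manageable for a handful of flags.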


Organizing Shader Programs

01 April 2013 - 04:26 PM

I'm trying to rework how I organize my shader programs.  I precompile my shaders at build time.  I would like to loop through all the compiled shader files in some folder and create the corresponding d3d11 shader interface (e.g., ID3D11ComputeShader).  However, given just the compiled shader bytecode, I don't think I can figure out the shader type, so I don't know whether to call CreateComputeShader, CreateVertexShader, etc. 
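
One way to sidestep the question is to encode the stage in the file name at build time.  The sketch below assumes a naming convention that is not part of the post: every compiled file is called "<name>_<stage>.cso" (for example "ParticleUpdate_cs.cso").  The enum ShaderStage and the function StageFromFileName are hypothetical names.

```cpp
#include <cstddef>
#include <string>

enum class ShaderStage { Vertex, Hull, Domain, Geometry, Pixel, Compute, Unknown };

// Assumed convention: "<name>_<stage>.cso", where <stage> is one of
// vs, hs, ds, gs, ps, cs. Anything else is reported as Unknown.
ShaderStage StageFromFileName(const std::string& fileName)
{
    const std::string ext = ".cso";
    const std::size_t tagLength = 3; // "_vs", "_ps", ...

    if (fileName.size() < ext.size() + tagLength ||
        fileName.compare(fileName.size() - ext.size(), ext.size(), ext) != 0)
    {
        return ShaderStage::Unknown;
    }

    const std::string tag =
        fileName.substr(fileName.size() - ext.size() - tagLength, tagLength);

    if (tag == "_vs") return ShaderStage::Vertex;
    if (tag == "_hs") return ShaderStage::Hull;
    if (tag == "_ds") return ShaderStage::Domain;
    if (tag == "_gs") return ShaderStage::Geometry;
    if (tag == "_ps") return ShaderStage::Pixel;
    if (tag == "_cs") return ShaderStage::Compute;
    return ShaderStage::Unknown;
}
```

The loader would then switch on the returned stage to decide between CreateVertexShader, CreateComputeShader, and so on.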


ComputeShader Particle System DispatchIndirect

27 March 2013 - 01:06 PM

I finally got around to implementing my CS particle system.  I see that I can use CopyStructureCount to copy the number of "alive" particles into a constant buffer and a regular buffer (as the indirect argument buffer) for drawing.

 

However, when it comes to dispatching thread groups, I need to use a formula like: NumThreadGroups = (NumAliveParticles + 255) / 256, where 256 is my thread group size.  This way I only dispatch as many thread groups as I actually need.  
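
As a quick check of that formula, here is a small CPU-side sketch of the rounding-up integer division.  The function name ThreadGroupCount is hypothetical, and the code assumes the sum numAliveParticles + groupSize - 1 does not overflow 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Rounds up numAliveParticles / groupSize using integer math only.
// Assumes groupSize > 0 and that the addition below does not overflow.
inline std::uint32_t ThreadGroupCount(std::uint32_t numAliveParticles,
                                      std::uint32_t groupSize)
{
    assert(groupSize > 0);
    return (numAliveParticles + groupSize - 1) / groupSize;
}
```

With a group size of 256, 256 alive particles need one group, 257 need two, and zero alive particles need none.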

 

However, I don't really see a way to do this without CPU intervention.  There is DispatchIndirect, but I only have NumAliveParticles in some d3d11 buffer, not the result of the calculation (NumAliveParticles + 255) / 256.

 

I noticed that the Hieroglyph 3 ParticleStorm demo dispatches enough thread groups to handle the "maximum" particle count.  This will result in "empty" thread groups if the particle system is not near maximum capacity.  Is this a big deal or not?  I assume the GPU overhead is loading the thread group onto the multiprocessor and evaluating a conditional to see if any work needs to be done.  If the thread group is "empty," all threads take the same branch (no work to do), and the thread group finishes right away.  So the cost seems pretty negligible.  But I wanted a second opinion, and I would also like to know if there is a way to do a calculation like (NumAliveParticles + 255) / 256 without CPU intervention.


Multilayer Refraction

14 March 2013 - 05:50 PM

I have multiple overlapping water meshes that use a refraction map.  Right now my engine doesn't support this scenario.  Assuming the geometry does not intersect, one solution I thought of would be just to ping-pong the render target and refraction map, and render the overlapping water features in back-to-front order.  I've ping-ponged buffers before.  Would this be too expensive?  It seems kind of brute force; are there better solutions?

 


GPU Sort for Small List

08 March 2013 - 07:49 PM

I recently looked into GPU sorting algorithms as part of my goal of building a compute-shader GPU particle system.  I studied bitonic sort and the DXSDK implementation.  This looks good when I need to sort a lot of particles, but I have one case where I need sorted particles and the list is relatively small (fewer than 100).  It is for fire, where I use video textures stored in a volume map and therefore do not need a lot of particles to get good results.

 

I'm sure I can modify the bitonic sort to handle particle counts less than the thread group size.  But I'm wondering if I should just do a more brute-force sort like brick sort (http://en.wikipedia.org/wiki/Odd%E2%80%93even_sort).
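
For reference, a minimal CPU-side sketch of brick (odd-even transposition) sort is shown below.  It is not the DXSDK code and not a GPU implementation: the function name OddEvenSort and the use of plain float sort keys are assumptions.  The point is the structure: each pass only touches disjoint neighbor pairs, so on the GPU each pair could go to its own thread with a group barrier between passes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Odd-even transposition ("brick") sort, ascending.
// Pass 0 compares pairs (0,1), (2,3), ...; pass 1 compares (1,2), (3,4), ...
// n passes are enough to sort n elements.
void OddEvenSort(std::vector<float>& keys)
{
    const std::size_t n = keys.size();
    for (std::size_t pass = 0; pass < n; ++pass)
    {
        // The pairs within one pass are independent of each other.
        for (std::size_t i = pass % 2; i + 1 < n; i += 2)
        {
            if (keys[i] > keys[i + 1])
                std::swap(keys[i], keys[i + 1]);
        }
    }
}
```

For a list of under 100 elements this is at most about 100 passes of about 50 compare-and-swap steps each, which would fit in one thread group.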

 

In either case, I'm not expecting huge gains from a GPU implementation, since the sort will only run on one thread group in this particular case.  The goal is just to avoid CPU intervention.

