Quat

ComputeShader Particle System DispatchIndirect

14 posts in this topic

So I finally got to implementing my CS particle system.  So I see that I can use the CopyStructureCount to copy the number of "alive" particles into a constant buffer and regular buffer (as the indirect argument buffer) for drawing. 

 

However, when it comes to dispatching thread groups, I need to use a formula like: NumThreadGroups = (NumAliveParticles + 255) / 256, where 256 is my thread group size.  This way I only dispatch as many thread groups as I actually need.  
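The rounding in that formula can be checked on the CPU with a minimal sketch. The names `kThreadsPerGroup` and `threadGroupsFor` are made up for illustration, and the group size of 256 is taken from the post.

```cpp
#include <cassert>
#include <cstdint>

// Thread group size used in the post (256 threads per group).
constexpr std::uint32_t kThreadsPerGroup = 256;

// Rounds up so that a partially filled last group is still dispatched.
constexpr std::uint32_t threadGroupsFor(std::uint32_t aliveParticles) {
    return (aliveParticles + kThreadsPerGroup - 1) / kThreadsPerGroup;
}

// A few spot checks of the rounding behaviour.
inline void checkThreadGroupsFor() {
    assert(threadGroupsFor(0) == 0);    // nothing alive, nothing to dispatch
    assert(threadGroupsFor(1) == 1);    // one particle still needs a group
    assert(threadGroupsFor(256) == 1);  // exactly one full group
    assert(threadGroupsFor(257) == 2);  // one full group plus one partial
}
```

Adding 255 before the integer division is what turns truncation into rounding up; a plain `NumAliveParticles / 256` would drop the last partial group.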

 

However, I don't really see a way to do this without CPU intervention.  There is DispatchIndirect, but I only have NumAliveParticles in some d3d11 buffer, not the result of the calculation (NumAliveParticles + 255) / 256.

 

I noticed that in the Hieroglyph 3 ParticleStorm demo, he dispatches enough thread groups to handle the "maximum" particle count.  This will result in "empty" thread groups if the particle system is not near maximum capacity.  Is this a big deal or not?  I assume the GPU overhead is loading the thread group into the multiprocessor and doing a conditional statement to see if any work needs to be done.  If the thread group is "empty," all threads will have the same branch behavior in that no work needs to be done, and the thread group is done being processed.  So it seems pretty negligible.  But I wanted a second opinion, and also to know if there is a way to do a calculation like (NumAliveParticles + 255) / 256 without CPU intervention. 


Just run a very simple compute shader with 1 thread that reads the buffer count, calculates the number of thread groups needed to process that number of particles, and outputs it to a buffer. Then you can use that buffer as the args for a DispatchIndirect.
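As a rough CPU-side model of what that one-thread shader would write (not the shader itself), the sketch below fills the three 32-bit group counts that DispatchIndirect reads from its argument buffer. The struct and function names are hypothetical, and a one-dimensional dispatch is assumed, so Y and Z are fixed at 1.

```cpp
#include <cassert>
#include <cstdint>

// Layout DispatchIndirect reads: three consecutive 32-bit thread group counts.
struct DispatchArgs {
    std::uint32_t groupCountX;
    std::uint32_t groupCountY;
    std::uint32_t groupCountZ;
};

// CPU stand-in for the one-thread shader: takes the alive count and
// produces the arguments for a one-dimensional dispatch.
inline DispatchArgs buildDispatchArgs(std::uint32_t aliveCount,
                                      std::uint32_t threadsPerGroup) {
    assert(threadsPerGroup > 0);
    // Round up so a partially filled last group is included.
    const std::uint32_t groups =
        (aliveCount + threadsPerGroup - 1) / threadsPerGroup;
    return DispatchArgs{groups, 1u, 1u};
}
```

On the GPU the same three values would be written through a UAV into the buffer that is later passed to DispatchIndirect, so the count never has to travel back to the CPU.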


I had originally tried reading back to the CPU what the particle count was, and then using that number to dispatch an appropriate amount of thread groups.  However, that was predictably slow, and I ended up coming to the solution that you mentioned (sending a fixed number of thread groups regardless of how many particles are present).

 

This solution works in particular for this example, since the particles have a fixed lifespan and can be reliably counted as dead after a certain time period.  The number of particles that are created is specifically throttled to ensure that this is true.  So after a startup period, there is always going to be nearly a full set of particles and there won't be any wasted thread groups anymore.

 

If you are able to have similar control on your particle system (i.e. you can reliably model the number of particles on the CPU side) then I would recommend the method used in ParticleStorm.  The method that MJP mentioned sounds like a good solution if you can't easily model the particles, and it only has a very small performance penalty of a single dispatch.  I would be interested to hear your results once you get it up and running though, and hear how your experience turns out.
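For the "model the count on the CPU" route, one possible sketch is shown below. It assumes a constant, throttled spawn rate and a fixed lifespan, as described above. The `EmitterModel` struct and `estimateAliveCount` function are hypothetical and are not taken from ParticleStorm.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical emitter description; these names are illustrative only.
struct EmitterModel {
    double particlesPerSecond;   // throttled spawn rate
    double lifespanSeconds;      // fixed lifetime of every particle
    std::uint32_t maxParticles;  // capacity of the particle buffer
};

// Upper bound on the particles alive at a given time: everything spawned
// within the last `lifespanSeconds`, clamped to the buffer capacity.
inline std::uint32_t estimateAliveCount(const EmitterModel& e,
                                        double secondsSinceStart) {
    const double window =
        std::min(std::max(secondsSinceStart, 0.0), e.lifespanSeconds);
    const double alive = std::ceil(e.particlesPerSecond * window);
    return static_cast<std::uint32_t>(
        std::min(alive, static_cast<double>(e.maxParticles)));
}
```

The estimate could then feed the usual (count + 255) / 256 calculation on the CPU. It only holds as long as the simulation really does spawn and kill particles on that schedule.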


@MJP

 

I am having exactly the same problem. And I am trying to do what you suggested but I am still stuck since there is no example on the internet.

 

There are two things I do not understand yet.

 

1. What properties to set when creating the buffer for DispatchIndirect

2. I can imagine how to call the compute shader that calculates the thread groups, but what then? The thread group count is stored in that buffer, but how do I dispatch the actual compute shader with this information?

 

It would be incredibly helpful if you could provide an example.


I finally got it working. Thanks a lot!

 

Yet I am stuck at the next similar problem. My particles are stored in a StructuredBuffer and when I am going to actually draw them I bind a SRV to the VertexShader and use the deviceContext->Draw(?,0) call.

 

Here I have the same problem as above. I don't know how many particles to draw on the CPU side since they are spawned and destroyed purely in my ComputeShaders.

 

I thought about using DrawAuto(), but that requires the particles to be in a VertexBuffer. And I think I can't create UAVs of a VertexBuffer and manipulate it with the ComputeShaders.


DrawInstancedIndirect will do what you want to do. Copy the SB size into an indirect args buffer and pass that to the indirect draw method. (I mean copying the size to the specific location in the indirect args buffer that you want to control: number of verts vs. number of instances, etc.)


Sorry for the late response, had a lot of things going on lately and no time to work on this project. Anyways...

 

I tried to use DrawInstancedIndirect with half success. I am not 100% sure what data has to be stored in ID3D11Buffer *pBufferForArgs.

 

 

I have created the buffer with no specific initial data and tried to copy the structure count with: 

m_pdevicecontext->CopyStructureCount(pbDrawIndirectArgs, 0, puavSimulationSateNew);

This draws nothing at all!

 

 

After that I tried to play a bit with the initial data of pbDrawIndirectArgs.

 

IndirectArgs indirectArgs;
indirectArgs._one = 0;
indirectArgs._two = 10;
indirectArgs._three = 0;
indirectArgs._four = 0;


D3D11_SUBRESOURCE_DATA InitData;
InitData.pSysMem = &indirectArgs;
InitData.SysMemPitch = 0;
InitData.SysMemSlicePitch = 0;


HRESULT result = m_pdevice->CreateBuffer(&bufferDesc, &InitData, &pbDrawIndirectArgs);

Now the strange thing happens. As soon as I set indirectArgs._two to anything but 0 it actually draws my particles.

 

After that I removed the CopyStructureCount call. And again I had a different behavior. Now the particles are blinking as if only a few at a time are drawn.

 

In conclusion, I guess CopyStructureCount does actually work, but only if I set indirectArgs._two to anything but 0. 

 

This totally confuses me and I have no idea why...

Edited by me_12

The only resource that explains anything about that buffer structure is the book "Practical Rendering and Computation with Direct3D 11".

 

There it is something like:

 

Each of these entries is a 4-byte value:

0 = Aligned Byte Offset For Args (uint)
1 = Aligned Byte Offset For Args (uint)
2 = Aligned Byte Offset For Args (uint)
3 = Vertex Count Per Instance (uint)
4 = Instance Count (uint)
5 = Start Vertex Location (uint)
6 = Start Instance Location (uint)

0-2: Is this space available for whatever data I want? Can it be expanded arbitrarily? Is this the number of bytes I have to skip, which can be passed to DrawInstancedIndirect as the second parameter (AlignedByteOffsetForArgs)?

 

3: This must be 1 for me since I am drawing 1 vertex per particle and will create a billboard in the geometry shader.

 

4: I guess this is the actual number of particles that are drawn.

 

5, 6: Well no idea about these two. 

Edited by me_12

Should be vertex count, instance count, 0,0 (startvertloc and start inst loc).

So let's imagine you have a quad and you want to instance it 10 times: your indirect args buffer should be 4, 10, 0, 0.

If you plan to expand each vert into quads, it would be 10, 1, 0, 0.

(At least that's what I remember off the top of my head, I can check tonight when I get home).
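The layout described above can be sketched as a plain struct. The struct and helper names here are made up for illustration; what matters is the order of the four 32-bit values and the byte offset of each one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the four 32-bit values DrawInstancedIndirect reads, in order.
struct DrawInstancedArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};

static_assert(sizeof(DrawInstancedArgs) == 16,
              "four tightly packed 32-bit values");
static_assert(offsetof(DrawInstancedArgs, instanceCount) == 4,
              "instance count is the second value, at byte offset 4");

// One 4-vertex quad drawn as n instances.
inline DrawInstancedArgs instancedQuads(std::uint32_t n) {
    return {4u, n, 0u, 0u};
}

// n point vertices in a single instance, each expanded to a quad
// in the geometry shader.
inline DrawInstancedArgs expandedPoints(std::uint32_t n) {
    return {n, 1u, 0u, 0u};
}
```

With n = 10 these produce 4, 10, 0, 0 and 10, 1, 0, 0 respectively, matching the two cases in the post.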

Inspecting the results of the buffer is more annoying than it should be; NSight refuses to show it to me. However, the VS2012 graphics debugger displays it with no problems (finally something it does well :) ), or you can go the way of copying to a staging buffer and displaying it in your app.


if you plan to expand each vert into quads, it would be 10, 1, 0, 0.

 

I guess that is a typo and you mean 1,10,0,0?

 

But still it does not explain why it does not work if I set the initial value to 1,0,0,0 and use CopyStructureCount to update the count...

 

Hm... alright I am going to install vs 2012. (Since I also was not able to figure the buffer results out via NSight)


I guess that is a typo and you mean 1,10,0,0?

Both can work, depending on how you set up your input layout.

But still it does not explain why it does not work if I set the initial value to 1,0,0,0 and use CopyStructureCount to update the count

You're drawing 0 instances, so nothing gets drawn. If you want to change the second parameter with CopyStructureCount I think it should be:
m_pdevicecontext->CopyStructureCount(pbDrawIndirectArgs, 4, puavSimulationSateNew);
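To see why the byte offset matters, here is a small CPU-only model of the 16-byte args buffer. The function `writeCountAt` is a hypothetical stand-in for what CopyStructureCount does to the destination buffer, and the count of 500 used in the note afterwards is an arbitrary example value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sixteen bytes standing in for the GPU-side indirect args buffer.
using ArgsBytes = std::array<unsigned char, 16>;

// CPU stand-in for CopyStructureCount: writes one 32-bit count at a
// 4-byte-aligned offset and leaves every other byte untouched.
inline void writeCountAt(ArgsBytes& dst, std::size_t byteOffset,
                         std::uint32_t count) {
    assert(byteOffset % 4 == 0 && byteOffset + sizeof(count) <= dst.size());
    std::memcpy(dst.data() + byteOffset, &count, sizeof(count));
}

// Reads back the 32-bit value stored in slot `index` (0..3).
inline std::uint32_t readSlot(const ArgsBytes& src, std::size_t index) {
    std::uint32_t value = 0;
    std::memcpy(&value, src.data() + index * sizeof(value), sizeof(value));
    return value;
}

// Builds a buffer initialised to {1, 0, 0, 0}: one vertex, zero instances.
inline ArgsBytes makeInitialArgs() {
    ArgsBytes bytes{};
    writeCountAt(bytes, 0, 1u);
    return bytes;
}
```

Starting from 1, 0, 0, 0, writing a count of 500 at offset 0 gives 500, 0, 0, 0 (still zero instances, so nothing is drawn), while writing it at offset 4 gives 1, 500, 0, 0.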

 

if you plan to expand each vert into quads, it would be 10, 1, 0, 0.

 

I guess that is a typo and you mean 1,10,0,0?

 

But still it does not explain why it does not work if I set the initial value to 1,0,0,0 and use CopyStructureCount to update the count...

 

Hm... alright I am going to install vs 2012. (Since I also was not able to figure the buffer results out via NSight)

 

Could work both ways: if you wanted 10 verts which you would expand in the GS, it would be 10,1,0,0 (10 verts -> 10 quads * 1 instance of the 10). Or you can use HW instancing. I have noticed differences in performance when generating hundreds of thousands of quads, with instancing being slightly slower than a single instance with loads of expanded verts.

If you set an initial value of 1,0,0,0, you are specifying 1 vertex, 0 instances. So as long as you update the second parameter with your structure count, it should work.

