Jump to content

  • Log In with Google      Sign In   
  • Create Account

Million Particle System


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
25 replies to this topic

#1 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 14 March 2014 - 10:34 PM

Hi there!

 

My current particle system its based on the example: http://content.gpwiki.org/index.php/D3DBook:Dynamic_Particle_Systems

 

However I have one remaining doubt. During the development I noticed that the particles emission rate its capped at 128 per frame due the memory limits of the geometry stream output. However, I would like to implement a particle system able to handle millions of particles at the same time. This is possible with the current particle system, but I need to wait 130 seconds to reach the millions (using particles with 130 sec lifetime), which is not very optimal.

 

I was wondering if any of you knows a way to achieve this using the GPU.

 

Cheers,

José


Edited by neroziros, 15 March 2014 - 01:50 PM.


Sponsor:

#2 Jason Z   Crossbones+   -  Reputation: 5062

Like
2Likes
Like

Posted 15 March 2014 - 01:25 PM

I don't see any reference that you are pointing to, but can't you just execute the particle injection draw call more than once per frame?



#3 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 15 March 2014 - 01:53 PM

I don't see any reference that you are pointing to, but can't you just execute the particle injection draw call more than once per frame?

 

Sorry about that, just updated the post.

 

The problem its that the particle injection happens inside the geometry shader, so it has a limit of 1024b per emission (that its equal to 128 particles since each one have 8 components)

 

Cheers



#4 Jason Z   Crossbones+   -  Reputation: 5062

Like
6Likes
Like

Posted 15 March 2014 - 02:30 PM

Erm, I happen to be the author of that article :)

 

Like I mentioned above, just execute the injection shader more than once per frame.  Is there some limitation on doing that?



#5 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 15 March 2014 - 03:46 PM

Oh nice now I get what you mean. But in order to do that, wouldn't I need need to separate the Update logic of the particles from the Particle Emission? otherwise the particles will move too fast since I would be executing their update a lot of times per frame.

 

Thanks for the help and awesome article smile.png

 

PS: I tried to transfer the particle creation to the CPU using a 3rd buffer which appends the new particles to the streamOutput, however right now it doesn't work and I am not quite sure why. If anyone can check it out I would be more than grateful.

 

FX Code: https://github.com/neroziros/PSDX11/blob/master/ParticleEffect.fx

Class: https://github.com/neroziros/PSDX11/blob/master/Particleclass.cpp


Edited by neroziros, 15 March 2014 - 03:51 PM.


#6 neroziros   Members   -  Reputation: 218

Like
1Likes
Like

Posted 16 March 2014 - 05:03 PM

I believe the main problem its here, in the mapping procedure:

Class: https://github.com/neroziros/PSDX11/blob/master/Particleclass.cpp

 

I hope my code can also help anyone else interested in building a million particle system based on the GPU :) Cheers

    // CPU PARTICLE CREATION
    // Add new particles if they were created
    if (!newParticlesCreated)
    {
        // Map to the buffer and render the new particles to add them to the pipeline
        D3D11_MAPPED_SUBRESOURCE mappedResource;
        PARTICLE_VERTEX* dataPtr;


        m_D3D->GetDeviceContext()->Map(g_pNewParticles, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
        dataPtr = (PARTICLE_VERTEX*)mappedResource.pData;


        // Copy the data into the vertex buffer.
        memcpy(dataPtr, (void*)newParticlesArr, (sizeof(PARTICLE_VERTEX)* emissionRate));


        // Unmap the buffer
        m_D3D->GetDeviceContext()->Unmap(g_pNewParticles, 0);


        // Set the buffer as the input
        pBuffers[0] = g_pNewParticles;
        UINT stride[1] = { sizeof(PARTICLE_VERTEX) };
        UINT offset[1] = { 0 };
        m_D3D->GetDeviceContext()->IASetVertexBuffers(0, 1, pBuffers, stride, offset);
        m_D3D->GetDeviceContext()->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);


        // Point to the correct output buffer
        pBuffers[0] = g_pParticleStreamTo;
        m_D3D->GetDeviceContext()->SOSetTargets(1, pBuffers, offset);


        // Draw the buffers
        D3DX11_TECHNIQUE_DESC techDesc;
        g_pAdvanceParticles->GetDesc(&techDesc);
        for (UINT p = 0; p < techDesc.Passes; ++p)
        {
            g_pAdvanceParticles->GetPassByIndex(p)->Apply(0, m_D3D->GetDeviceContext());
            m_D3D->GetDeviceContext()->DrawAuto();
        }


        // Get back to normal
        pBuffers[0] = NULL;
        m_D3D->GetDeviceContext()->SOSetTargets(1, pBuffers, offset);


        // Free the particle array memory
        delete (newParticlesArr);


        // Reset variables
        newParticlesCreated = false;
    }


#7 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 16 March 2014 - 09:12 PM

Hi there, I managed to do as suggested and separated the emission logic from the update logic. Feel free to check the code here:

https://github.com/neroziros/PARTICLESYSTEM/

 

Now the problem its that I can only reach around 460800 particles before the fps start to drop too low (around 25-35 fps). I have a Phenom II X6 & NVIDIA 560GTX.

 

This is probably an optimization problem, any suggestion its welcomed.

 

Cheers!



#8 MJP   Moderators   -  Reputation: 11343

Like
1Likes
Like

Posted 17 March 2014 - 11:05 AM

Are you pursuing this technique because you're stuck on DX10-era hardware? Because this sort of thing can be done more efficiently using compute shaders.



#9 phantom   Moderators   -  Reputation: 7268

Like
1Likes
Like

Posted 17 March 2014 - 11:27 AM

And just to bring up an important point; I don't think you REALLY want to spawn 1,000,000 particles from one emitter in one frame... if nothing else it'll be bloody slow to do and you can do a lot with surprisingly few particles per emitter.

However if you DO want to do this, because you want an effect which has a lot of particles, then I suggest creating them at load time either on the CPU or via a one off GPU task instead of trying to spawn them like normal particles.

(As to why you don't want 1,000,000 particles consider this; a 1920*1200 screen only has 2,304,00 pixel on it so at 1million particles each one would be filling a little over 2 pixels on average. )

#10 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 17 March 2014 - 01:10 PM

@phantom: hmm I hadnt considered that. Though I am ok with just one million particles per simulation, no need for the 60.000.000 particles per second :P

@mjp: actually I must use DX11 ( I plan on using tesselation later for better particle lighting ). I thought that the compute shader performance was similar to the geometry shader's one. But if I am wrong I will gladly check that option

Thanka for the advice! Will read about compute shaders then :)

#11 phantom   Moderators   -  Reputation: 7268

Like
2Likes
Like

Posted 17 March 2014 - 01:39 PM

Geo-shaders are basically the shader stage best avoided if you can; by their nature they tend to serialise the hardware a bit.

For a particle system you'd be better off using compute shaders as you can pre-size your buffers and then have them read from one buffer and write to the other. You also only need one shader stage to be invoked; usage of the geo-shader implies a VS must be run first even as a 'pass-thru' due to how the logical software pipeline is arranged - a compute shader wouldn't have this.

That's not to say compute doesn't come with it's own set of potential pitfalls but it is better suited to this task smile.png

#12 JohnnyCode   Members   -  Reputation: 217

Like
0Likes
Like

Posted 17 March 2014 - 05:33 PM

Since you are unlikely to set 1 milion of particles positions explicitly, I gess you have a function for their positions upon a scalar or two. If this fact is true, you could get away with a 1 milion quads mesh, where each quad is over the other one with a certain distance - let's say in z direction. This would mean that each quad vertex equals in x,y but differs in z. Thus, z can be a value for your procedural funtion, along with some other factors (time, seed...) . The buffer for this 1 milion "quad pillar" would be static, not altered itself and you would process the verticies of it, upon the vertex z value and other factors. Imagine you could easily strip them like cards just by doing (x,y+z*10.0,z). This solution of course limits you to only procedural positioning. Usualy, there are fewer than 50 particles used.



#13 kalle_h   Members   -  Reputation: 1387

Like
1Likes
Like

Posted 17 March 2014 - 06:46 PM

There are actual use cases for million particle systems and they have been feasible many years already. 5 years old blog post about how to do it with dx9. http://directtovideo.wordpress.com/2009/10/06/a-thoroughly-modern-particle-system/

 

With modern api and clever coding it should not be any problem.



#14 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 17 March 2014 - 09:01 PM

@JohnnyCode: Thanks for the suggestion! Though I should have mentioned earlier that I am also looking forward making an interactive particle system (both with itself and the environment) so I cannot use procedural algorithms and must rely on State preserving PS

 

@Kalle_h: heh actually using textures to handle the particles was one of my first ideas to make the particle system, but the lack of examples and the fear that the dynamic particle creation and death would be too complex with that scheme made turn down the idea.  It is an excellent way to make a PS with a set amount of particles though, thanks!

 

I will keep looking into the directcompute idea, as before, I will post my results here so anyone else interested on making a highly interactive particle system with millions of particles can use it smile.png

 

Cheers


Edited by neroziros, 19 March 2014 - 10:21 AM.


#15 Jason Z   Crossbones+   -  Reputation: 5062

Like
0Likes
Like

Posted 18 March 2014 - 08:19 PM

In general, if you are going to be using DX11 then you really should consider the compute shader.  The article that you are referencing is actually from before DX11 was even released, so it wouldn't have considered compute shaders at all.

 

If you are interested in trying it out quickly, the Hieroglyph 3 framework has a ParticleStorm application in it that shows the basic pieces needed.



#16 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 31 March 2014 - 10:05 PM

Hi all!

 

I have been following the Hieroglyph example. And though I am suffering with the CPU side of the work, I think I am moving forward :D

 

I have a question: I have already created the append and consume buffers

 

BUFFER CREATION

// Create consume and append buffer
D3D11_BUFFER_DESC desc;
desc.ByteWidth = sizeof(PARTICLE_VERTEX) * maxParticles;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
desc.StructureByteStride = sizeof(PARTICLE_VERTEX);
desc.Usage = D3D11_USAGE_DEFAULT;
desc.CPUAccessFlags = 0;


result = device->CreateBuffer(&desc, NULL, &appendBuffer);
if (FAILED(result))return false;
result = device->CreateBuffer(&desc, NULL, &consumeBuffer);
if (FAILED(result))return false;

However I am not sure how to send them to the shader. I already append one during the Particle insertion step, but I am not sure how to append both of them at the same time during the Update step.

 

 

SPAWN PARTICLES FUNCTION

// Create new particles (through the compute shader)
void Particleclass::SpawnNewParticles(float elapsedSeconds)
{
// Control variable
HRESULT hr;


// Update timer
newParticlesTimer += elapsedSeconds;


// Check if spawn must happen
if (newParticlesTimer >= spawnParticlesInterval)
{
// Set Compute Shader
m_D3D->GetDeviceContext()->CSSetShader(cs_pInsertParticles, nullptr, 0);


// Create this frame data
SpawnConstantBuffer c;


// Set emmitter location
c.emmiterPos = position;


// Create random vector
static const float scale = 2.0f; // Random Variance
float fRandomX = ((float)rand() / (float)RAND_MAX * scale - scale / 2.0f);
float fRandomY = ((float)rand() / (float)RAND_MAX * scale - scale / 2.0f);
float fRandomZ = ((float)rand() / (float)RAND_MAX * scale - scale / 2.0f);
D3DXVECTOR3 normalized = D3DXVECTOR3(fRandomX, fRandomY, fRandomZ);
// Normalize the random vector
float magnitude = (float)sqrt(normalized.x * normalized.x + normalized.y * normalized.y + normalized.z * normalized.z);
if (magnitude == 0.0)magnitude = 0.000001f;
normalized = D3DXVECTOR3(normalized.x / magnitude, normalized.y / magnitude, normalized.z / magnitude);
// Set random vector
c.randomVector = D3DXVECTOR3(normalized.x, normalized.y, normalized.z);


// Copy the new vector and position in the due buffer 
D3D11_MAPPED_SUBRESOURCE mapped;
hr = m_D3D->GetDeviceContext()->Map(cs_pInsertCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
// Error Check
if (FAILED(hr)){
std::stringstream stream; stream << "Error Code::" << HRESULT_CODE(hr); MessageBox(NULL, stream.str().c_str(), "InsertCS Buffer Mapping Error", MB_OK); return;
}
memcpy_s(mapped.pData, sizeof(SpawnConstantBuffer), &c, sizeof(c));
m_D3D->GetDeviceContext()->Unmap(cs_pInsertCB, 0);

// Send the updated vector to the CS shader
m_D3D->GetDeviceContext()->CSSetConstantBuffers(0, 1,&cs_pInsertCB);

// Set append buffer
UINT counts[1]; counts[0] = -1;
m_D3D->GetDeviceContext()->CSSetUnorderedAccessViews(0, 1, &prevState, counts);

// Spawn New Particles
m_D3D->GetDeviceContext()->Dispatch(1,1,1);

// Reset timer
newParticlesTimer = 0;
}
}


#17 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 03 April 2014 - 09:03 PM

Problem solved! To add both buffer you must do:

UINT counters[2] = { mNumElements, 0 };
ID3D11UnorderedAccessView* uavs[2] = { prevState, currentState };
m_D3D->GetDeviceContext()->CSSetUnorderedAccessViews(0, 2, uavs, counters);

Tho I am not sure if  mNumElements is the current amount of particles in the system OR the max amount of particles that can be in the system.

 

Cheers


Edited by neroziros, 04 April 2014 - 10:25 AM.


#18 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 05 April 2014 - 11:42 AM

Ok, I am almost done with the Compute Shader step. However, I am having one last problem that I can't understand.

 

For some reason, the particle count stays at 8 if I dispatch the UpdateCS. The only explanation I can think of its that the compute shader is killing all the particles created in the InsertCS shader. ( If I don't execute the Update CS, the particle count goes up as it should)

 

Here is the Update function which calls the Update CS

// Update the particle system (advance the particle system)void Particleclass::Update(float elapsedSeconds)
{
// Control variable
HRESULT hr;


// Update total elapsed time
m_TotalTimeElapsed += elapsedSeconds*1000; // Miliseconds, so the RNG functions inside the GPU get diverse values


// Create new particles if needed
SpawnNewParticles(elapsedSeconds);


// Get current ammount of particles in the InputBuffer
RefreshCurrentParticleAmount();


// If there are zero particles, don't execute the updater
if (mNumElements <= 0) return;


// Update particles
// Set Compute Shader
m_D3D->GetDeviceContext()->CSSetShader(cs_pUpdateParticles, nullptr, 0);


// Create this frame data
SimulationParameters s;
s.EmitterLocation = position;
s.ParticlesLifeTime = lifeTime;
s.TimeFactors = 0;


// Pass the new frame data to the shader
D3D11_MAPPED_SUBRESOURCE UMapped;
hr = m_D3D->GetDeviceContext()->Map(cs_pUpdateCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &UMapped);
memcpy_s(UMapped.pData, sizeof(SpawnConstantBuffer), &s, sizeof(s));
m_D3D->GetDeviceContext()->Unmap(cs_pUpdateCB, 0);


// Send this frame particle amount to the Update CS
D3D11_MAPPED_SUBRESOURCE PCMapped;
hr = m_D3D->GetDeviceContext()->Map(cs_pParticleCount, 0, D3D11_MAP_WRITE_DISCARD, 0, &PCMapped);
memcpy_s(PCMapped.pData, sizeof(UINT), &mNumElements, sizeof(mNumElements));
m_D3D->GetDeviceContext()->Unmap(cs_pParticleCount, 0);


// Send the updated constant buffers to the CS shader
ID3D11Buffer *Buffers[2] = {cs_pUpdateCB, cs_pParticleCount};
m_D3D->GetDeviceContext()->CSSetConstantBuffers(0,2,Buffers);


// Set append and consume buffer
UINT counters[2] = {0, mNumElements };
ID3D11UnorderedAccessView* uavs[2] = { OutputState, InputState };
m_D3D->GetDeviceContext()->CSSetUnorderedAccessViews(0, 2, uavs, counters);


// Dispatch the particles's updater 
m_D3D->GetDeviceContext()->Dispatch(maxParticles / 512, 1, 1);


// Swap the two buffers in between frames to allow multithreaded access
// during the rendering phase for the particle buffers.
ID3D11UnorderedAccessView *TempState = InputState;
InputState = OutputState;
OutputState = InputState;
}
And this is the UpdateCS
//-----------------------------------------------------------------------------
// Compute shader (Hieroglyph based)
//-----------------------------------------------------------------------------
// Particle Structure (relevant for the simulation)
struct Particle 
{
float3 position;
float3 velocity;
float  time;
};


cbuffer SimulationParameters : register(b0)
{
float4 TimeFactors;
float4 EmitterLocation;
float ParticlesLifeTime;
};


cbuffer ParticleCount : register(b1)
{
uint4 NumParticles;
};




// Compute shaders buffers (Entry Buffer and Output buffer)
AppendStructuredBuffer<Particle> NewSimulationState : register(u0);
ConsumeStructuredBuffer<Particle>   CurrentSimulationState  : register(u1);


[numthreads(512, 1, 1)]
void CSMAIN(uint3 DispatchThreadID : SV_DispatchThreadID)
{
// Check for if this thread should run or not.
uint myID = DispatchThreadID.x + DispatchThreadID.y * 512 + DispatchThreadID.z * 512 * 512;


// The statement must check if there are no more particles than it should
if (myID < NumParticles.x)
{
// Get the current particle
Particle p = CurrentSimulationState.Consume();


// Calculate the new position, accounting for the new velocity value
// over the current time step.
p.position += p.velocity * TimeFactors.x;


// Update the life time left for the particle.
p.time = p.time + TimeFactors.x;


// Only keep the particle alive IF its life time has not expired
if (p.time < ParticlesLifeTime)
{
NewSimulationState.Append(p);
}
}
}

I suspect the problem is in either the  "cs_pParticleCount" -> "cbuffer ParticleCount" mapping or when I send the NumParticles's UAV reference to the Compute Shader but I am really not sure (if the NumParticles is not being received in the GPU, then it will stay at zero and it will kill all the particles) . Any help is really appreciated!


Edited by neroziros, 06 April 2014 - 08:33 PM.


#19 neroziros   Members   -  Reputation: 218

Like
0Likes
Like

Posted 07 April 2014 - 09:18 PM

Hmm it seems that the problem is in the buffer swapping. Since I tried not executing the Update compute shader and the particle amount still went up. (If the swap was correct, then the empty outputstate should have overwritten the inputstate, and the amount of particles should had stayed at 8)

// Swap the two buffers in between frames to allow multithreaded access
// during the rendering phase for the particle buffers.
ID3D11UnorderedAccessView *TempState = OutputState; 
OutputState = InputState;
InputState = TempState;

Any idea about what I am doing wrong?

 

Cheers.



#20 Jason Z   Crossbones+   -  Reputation: 5062

Like
0Likes
Like

Posted 08 April 2014 - 06:08 PM

So you didn't do an update, but the particle count still increased?  How could that be?  I couldn't see in your code above where the number of particles is calculated - are you reading it back from the buffers with CopyStructureCount somewhere?






Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS