# Million Particle System

## Recommended Posts

neroziros    234

Hi there!

My current particle system is based on this example: http://content.gpwiki.org/index.php/D3DBook:Dynamic_Particle_Systems

However, I have one remaining doubt. During development I noticed that the particle emission rate is capped at 128 per frame due to the memory limits of the geometry shader's stream output. I would like to implement a particle system able to handle millions of particles at the same time. This is possible with the current particle system, but I need to wait 130 seconds to reach one million (using particles with a 130-second lifetime), which is far from optimal.

I was wondering if any of you know a way to achieve this using the GPU.

Cheers,

José

Edited by neroziros

##### Share on other sites
Jason Z    6434

I don't see any reference that you are pointing to, but can't you just execute the particle injection draw call more than once per frame?

##### Share on other sites
neroziros    234

> I don't see any reference that you are pointing to, but can't you just execute the particle injection draw call more than once per frame?

Sorry about that, just updated the post.

The problem is that the particle injection happens inside the geometry shader, so it has a limit of 1024 output values per emission (which equals 128 particles, since each one has 8 components).
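To make the arithmetic behind this cap concrete, here is a small C++ sketch. It assumes the budget is 1024 scalar values per geometry shader invocation and a steady 60 frames per second (the frame rate is an assumption, not something stated in the post); the function names are made up for illustration.

```cpp
#include <cassert>

// Maximum particles one geometry shader invocation can emit, given the
// per-invocation output budget (in scalar values) and the number of
// components stored per particle.
inline unsigned maxParticlesPerEmission(unsigned outputBudget, unsigned componentsPerParticle)
{
    assert(componentsPerParticle > 0);
    return outputBudget / componentsPerParticle;
}

// Seconds needed to emit targetCount particles when perFrame particles are
// emitted every frame at a constant frame rate (assumed, not measured).
inline double secondsToReach(double targetCount, unsigned perFrame, double framesPerSecond)
{
    assert(perFrame > 0 && framesPerSecond > 0.0);
    return targetCount / (perFrame * framesPerSecond);
}
```

With a budget of 1024 and 8 components per particle this gives 128 particles per emission, and reaching one million particles at 128 per frame and 60 fps takes roughly 130 seconds, which matches the wait described in the first post.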

Cheers

##### Share on other sites
neroziros    234

Oh nice, now I get what you mean. But in order to do that, wouldn't I need to separate the update logic of the particles from the particle emission? Otherwise the particles will move too fast, since I would be executing their update many times per frame.
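For anyone following along, here is a minimal CPU-side sketch of that separation in C++: several emission passes per frame, followed by exactly one update. The `ParticleSystem` type and its methods are hypothetical stand-ins for the GPU passes, not the code from the linked article.

```cpp
#include <cassert>
#include <vector>

struct Particle { float position; float velocity; float age; };

struct ParticleSystem {
    std::vector<Particle> particles;

    // Emission pass: only adds particles, never advances the simulation.
    void emit(unsigned count) {
        for (unsigned i = 0; i < count; ++i)
            particles.push_back(Particle{0.0f, 1.0f, 0.0f});
    }

    // Update pass: advances every live particle by dt exactly once.
    void update(float dt) {
        assert(dt >= 0.0f);
        for (Particle& p : particles) {
            p.position += p.velocity * dt;
            p.age += dt;
        }
    }

    // One frame: several emission passes, then a single update.
    void frame(unsigned emissionPasses, unsigned perPass, float dt) {
        for (unsigned i = 0; i < emissionPasses; ++i)
            emit(perPass);
        update(dt);
    }
};
```

Calling `frame(4, 128, dt)` adds 512 particles in one frame while each particle still ages by `dt` only once, so the extra emission passes do not speed up the motion.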

Thanks for the help and awesome article

PS: I tried to transfer the particle creation to the CPU using a 3rd buffer which appends the new particles to the streamOutput, however right now it doesn't work and I am not quite sure why. If anyone can check it out I would be more than grateful.

Edited by neroziros

##### Share on other sites
neroziros    234

I believe the main problem is here, in the mapping procedure:

I hope my code can also help anyone else interested in building a million particle system based on the GPU :) Cheers

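// CPU PARTICLE CREATION
// Add new particles if they were created
if (newParticlesCreated)
{
// Map to the buffer and render the new particles to add them to the pipeline
D3D11_MAPPED_SUBRESOURCE mappedResource;
PARTICLE_VERTEX* dataPtr;

// Map the buffer for writing; mappedResource is only valid after Map succeeds
// (assumes g_pNewParticles was created with D3D11_USAGE_DYNAMIC and CPU write access)
m_D3D->GetDeviceContext()->Map(g_pNewParticles, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
dataPtr = (PARTICLE_VERTEX*)mappedResource.pData;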

// Copy the data into the vertex buffer.
memcpy(dataPtr, (void*)newParticlesArr, (sizeof(PARTICLE_VERTEX)* emissionRate));

// Unmap the buffer
m_D3D->GetDeviceContext()->Unmap(g_pNewParticles, 0);

// Set the buffer as the input
pBuffers[0] = g_pNewParticles;
UINT stride[1] = { sizeof(PARTICLE_VERTEX) };
UINT offset[1] = { 0 };
m_D3D->GetDeviceContext()->IASetVertexBuffers(0, 1, pBuffers, stride, offset);
m_D3D->GetDeviceContext()->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);

// Point to the correct output buffer
pBuffers[0] = g_pParticleStreamTo;
m_D3D->GetDeviceContext()->SOSetTargets(1, pBuffers, offset);

// Draw the buffers
D3DX11_TECHNIQUE_DESC techDesc;
for (UINT p = 0; p < techDesc.Passes; ++p)
{
m_D3D->GetDeviceContext()->DrawAuto();
}

// Get back to normal
pBuffers[0] = NULL;
m_D3D->GetDeviceContext()->SOSetTargets(1, pBuffers, offset);

// Free the particle array memory
delete[] newParticlesArr;

// Reset variables
newParticlesCreated = false;
}

##### Share on other sites
neroziros    234

Hi there, I managed to do as suggested and separated the emission logic from the update logic. Feel free to check the code here:

https://github.com/neroziros/PARTICLESYSTEM/

Now the problem is that I can only reach around 460800 particles before the fps starts to drop too low (around 25-35 fps). I have a Phenom II X6 & NVIDIA 560GTX.

This is probably an optimization problem; any suggestions are welcome.

Cheers!

##### Share on other sites
MJP    19753

Are you pursuing this technique because you're stuck on DX10-era hardware? Because this sort of thing can be done more efficiently using compute shaders.

##### Share on other sites
_the_phantom_    11250
And just to bring up an important point; I don't think you REALLY want to spawn 1,000,000 particles from one emitter in one frame... if nothing else it'll be bloody slow to do and you can do a lot with surprisingly few particles per emitter.

However if you DO want to do this, because you want an effect which has a lot of particles, then I suggest creating them at load time either on the CPU or via a one off GPU task instead of trying to spawn them like normal particles.

(As to why you don't want 1,000,000 particles, consider this: a 1920*1200 screen only has 2,304,000 pixels on it, so at 1 million particles each one would be filling a little over 2 pixels on average.)
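The pixel budget is easy to check with a few lines of C++. This is only a back-of-the-envelope sketch; `pixelsPerParticle` is a made-up helper name and it ignores overlap, particle size and overdraw.

```cpp
#include <cassert>

// Average number of screen pixels available per particle if the particles
// were spread evenly over the whole screen.
inline double pixelsPerParticle(unsigned width, unsigned height, unsigned particleCount)
{
    assert(particleCount > 0);
    return static_cast<double>(width) * height / particleCount;
}
```

For a 1920*1200 screen and one million particles this comes out at about 2.3 pixels per particle.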

##### Share on other sites
neroziros    234
@phantom: hmm, I hadn't considered that. Though I am OK with just one million particles per simulation, no need for the 60.000.000 particles per second :P

@mjp: actually I must use DX11 (I plan on using tessellation later for better particle lighting). I thought that compute shader performance was similar to the geometry shader's. But if I am wrong I will gladly check that option.

##### Share on other sites
_the_phantom_    11250
Geo-shaders are basically the shader stage best avoided if you can; by their nature they tend to serialise the hardware a bit.

For a particle system you'd be better off using compute shaders as you can pre-size your buffers and then have them read from one buffer and write to the other. You also only need one shader stage to be invoked; usage of the geo-shader implies a VS must be run first even as a 'pass-thru' due to how the logical software pipeline is arranged - a compute shader wouldn't have this.

That's not to say compute doesn't come with its own set of potential pitfalls, but it is better suited to this task.
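The "read from one buffer, write to the other" idea can be sketched on the CPU in C++ as follows. This is only an illustration under simple assumptions: the two `std::vector`s stand in for the pre-sized GPU buffers, and `PingPongParticles` and its members are hypothetical names, not part of any D3D11 API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Particle { float position; float velocity; float age; };

// CPU stand-in for two pre-sized GPU buffers used in ping-pong fashion.
struct PingPongParticles {
    std::vector<Particle> current;  // read during this step
    std::vector<Particle> next;     // written during this step

    explicit PingPongParticles(std::size_t capacity) {
        current.reserve(capacity);
        next.reserve(capacity);
    }

    // Advance every particle once. Expired particles are not copied over,
    // which is how they "die" without any explicit removal.
    void step(float dt, float maxAge) {
        assert(dt >= 0.0f);
        next.clear();
        for (const Particle& p : current) {
            Particle q = p;
            q.position += q.velocity * dt;
            q.age += dt;
            if (q.age < maxAge)
                next.push_back(q);
        }
        std::swap(current, next);  // the output becomes next step's input
    }
};
```

On the GPU the loop body would run as one compute shader thread per particle, and the swap would exchange the two buffer views between frames.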

##### Share on other sites
JohnnyCode    1046

Since you are unlikely to set the positions of 1 million particles explicitly, I guess you have a function that computes their positions from a scalar or two. If that is the case, you could get away with a mesh of 1 million quads, where each quad sits above the previous one at a certain distance, let's say in the z direction. This means that the quads' vertices share x and y but differ in z. Thus, z can be an input to your procedural function, along with some other factors (time, seed...). The buffer for this 1 million "quad pillar" would be static, never altered itself, and you would process its vertices based on the vertex z value and the other factors. Imagine you could easily spread them out like cards just by doing (x, y + z*10.0, z). This solution of course limits you to procedural positioning only. Usually, fewer than 50 particles are used.
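A rough C++ sketch of this stateless approach is shown below. The function name, the layer spacing parameter and the sine-based sway are illustrative assumptions; only the (x, y + z*10.0, z) spreading comes from the post above.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Stateless position: depends only on the particle's index in the static
// "quad pillar" (its z layer) and the current time, so nothing has to be
// stored or updated per frame.
inline Vec3 proceduralPosition(unsigned index, float layerSpacing, float time)
{
    assert(layerSpacing >= 0.0f);
    const float z = index * layerSpacing;
    // Spread the stacked quads like cards via y + z*10, plus a time-based sway in x.
    return Vec3{ std::sin(time + z), z * 10.0f, z };
}
```

In a real renderer this function would live in the vertex shader, with `index` derived from the static vertex data.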

##### Share on other sites
kalle_h    2464

There are actual use cases for million-particle systems, and they have been feasible for many years already. Here is a five-year-old blog post about how to do it with DX9: http://directtovideo.wordpress.com/2009/10/06/a-thoroughly-modern-particle-system/

With a modern API and clever coding it should not be a problem.

##### Share on other sites
neroziros    234

@JohnnyCode: Thanks for the suggestion! Though I should have mentioned earlier that I am also looking to make the particle system interactive (both with itself and with the environment), so I cannot use procedural algorithms and must rely on a state-preserving PS.

@Kalle_h: heh, actually using textures to handle the particles was one of my first ideas for the particle system, but the lack of examples and the fear that dynamic particle creation and death would be too complex with that scheme made me turn down the idea. It is an excellent way to make a PS with a fixed number of particles though, thanks!

I will keep looking into the DirectCompute idea. As before, I will post my results here so anyone else interested in making a highly interactive particle system with millions of particles can use them.

Cheers

Edited by neroziros

##### Share on other sites
Jason Z    6434

In general, if you are going to be using DX11 then you really should consider the compute shader.  The article that you are referencing is actually from before DX11 was even released, so it wouldn't have considered compute shaders at all.

If you are interested in trying it out quickly, the Hieroglyph 3 framework has a ParticleStorm application in it that shows the basic pieces needed.

##### Share on other sites
neroziros    234

Hi all!

I have been following the Hieroglyph example. And though I am suffering with the CPU side of the work, I think I am moving forward :D

I have a question: I have already created the append and consume buffers

BUFFER CREATION

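// Create consume and append buffer
D3D11_BUFFER_DESC desc;
ZeroMemory(&desc, sizeof(desc));
desc.ByteWidth = sizeof(PARTICLE_VERTEX) * maxParticles;
// The buffers are written through UAVs in the compute shaders and read as a StructuredBuffer when rendering
desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
desc.StructureByteStride = sizeof(PARTICLE_VERTEX);
desc.Usage = D3D11_USAGE_DEFAULT;
desc.CPUAccessFlags = 0;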

result = device->CreateBuffer(&desc, NULL, &appendBuffer);
if (FAILED(result))return false;
result = device->CreateBuffer(&desc, NULL, &consumeBuffer);
if (FAILED(result))return false;

However, I am not sure how to send them to the shader. I already bind one during the particle insertion step, but I am not sure how to bind both of them at the same time during the Update step.

SPAWN PARTICLES FUNCTION

// Create new particles (through the compute shader)
void Particleclass::SpawnNewParticles(float elapsedSeconds)
{
// Control variable
HRESULT hr;

// Update timer
newParticlesTimer += elapsedSeconds;

// Check if spawn must happen
if (newParticlesTimer >= spawnParticlesInterval)
{

// Create this frame data
SpawnConstantBuffer c;

// Set emmitter location
c.emmiterPos = position;

// Create random vector
static const float scale = 2.0f; // Random Variance
float fRandomX = ((float)rand() / (float)RAND_MAX * scale - scale / 2.0f);
float fRandomY = ((float)rand() / (float)RAND_MAX * scale - scale / 2.0f);
float fRandomZ = ((float)rand() / (float)RAND_MAX * scale - scale / 2.0f);
D3DXVECTOR3 normalized = D3DXVECTOR3(fRandomX, fRandomY, fRandomZ);
// Normalize the random vector
float magnitude = (float)sqrt(normalized.x * normalized.x + normalized.y * normalized.y + normalized.z * normalized.z);
if (magnitude == 0.0)magnitude = 0.000001f;
normalized = D3DXVECTOR3(normalized.x / magnitude, normalized.y / magnitude, normalized.z / magnitude);
// Set random vector
c.randomVector = D3DXVECTOR3(normalized.x, normalized.y, normalized.z);

// Copy the new vector and position in the due buffer
D3D11_MAPPED_SUBRESOURCE mapped;
hr = m_D3D->GetDeviceContext()->Map(cs_pInsertCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
// Error Check
if (FAILED(hr)){
std::stringstream stream; stream << "Error Code::" << HRESULT_CODE(hr); MessageBox(NULL, stream.str().c_str(), "InsertCS Buffer Mapping Error", MB_OK); return;
}
memcpy_s(mapped.pData, sizeof(SpawnConstantBuffer), &c, sizeof(c));
m_D3D->GetDeviceContext()->Unmap(cs_pInsertCB, 0);

// Send the updated vector to the CS shader
m_D3D->GetDeviceContext()->CSSetConstantBuffers(0, 1,&cs_pInsertCB);

// Set append buffer
UINT counts[1]; counts[0] = -1;
m_D3D->GetDeviceContext()->CSSetUnorderedAccessViews(0, 1, &prevState, counts);

// Spawn New Particles
m_D3D->GetDeviceContext()->Dispatch(1,1,1);

// Reset timer
newParticlesTimer = 0;
}
}

##### Share on other sites
neroziros    234

Problem solved! To bind both buffers you must do:

UINT counters[2] = { mNumElements, 0 };
ID3D11UnorderedAccessView* uavs[2] = { prevState, currentState };
m_D3D->GetDeviceContext()->CSSetUnorderedAccessViews(0, 2, uavs, counters);

Though I am not sure whether mNumElements is the current number of particles in the system OR the maximum number of particles that can be in the system.
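On that last question: as documented for `CSSetUnorderedAccessViews`, each initial count sets the buffer's hidden counter, meaning the number of elements it is treated as currently holding, and a value of -1 keeps the existing counter. So it is a current count rather than a capacity. The small C++ model below only imitates that behaviour on the CPU; `HiddenCounter` and `kKeepCurrentCount` are hypothetical names, not D3D11 types.

```cpp
#include <cassert>
#include <cstdint>

// Same bit pattern as passing -1 as a UINT initial count.
constexpr std::uint32_t kKeepCurrentCount = 0xFFFFFFFFu;

// CPU model of the hidden counter behind an append/consume buffer.
struct HiddenCounter {
    std::uint32_t count = 0;

    // Models binding the UAV with an initial count: any value other than
    // kKeepCurrentCount overwrites the counter, kKeepCurrentCount leaves it alone.
    void bind(std::uint32_t initialCount) {
        if (initialCount != kKeepCurrentCount)
            count = initialCount;
    }

    void append()  { ++count; }                     // models Append()
    void consume() { assert(count > 0); --count; }  // models Consume()
};
```

In this model, binding the output buffer with 0 empties it before the update, while binding the input buffer with `kKeepCurrentCount` preserves whatever the previous pass appended.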

Cheers

Edited by neroziros

##### Share on other sites
neroziros    234

Ok, I am almost done with the Compute Shader step. However, I am having one last problem that I can't understand.

For some reason, the particle count stays at 8 if I dispatch the UpdateCS. The only explanation I can think of is that the compute shader is killing all the particles created in the InsertCS shader. (If I don't execute the UpdateCS, the particle count goes up as it should.)

Here is the Update function which calls the Update CS

// Update the particle system (advance the particle system)
void Particleclass::Update(float elapsedSeconds)
{
// Control variable
HRESULT hr;

// Update total elapsed time
m_TotalTimeElapsed += elapsedSeconds*1000; // Milliseconds, so the RNG functions inside the GPU get diverse values

// Create new particles if needed
SpawnNewParticles(elapsedSeconds);

// Get current amount of particles in the InputBuffer
RefreshCurrentParticleAmount();

// If there are zero particles, don't execute the updater
if (mNumElements <= 0) return;

// Update particles

// Create this frame data
SimulationParameters s;
s.EmitterLocation = position;
s.TimeFactors = 0;

// Pass the new frame data to the shader
D3D11_MAPPED_SUBRESOURCE UMapped;
hr = m_D3D->GetDeviceContext()->Map(cs_pUpdateCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &UMapped);
memcpy_s(UMapped.pData, sizeof(SimulationParameters), &s, sizeof(s));
m_D3D->GetDeviceContext()->Unmap(cs_pUpdateCB, 0);

// Send this frame particle amount to the Update CS
D3D11_MAPPED_SUBRESOURCE PCMapped;
hr = m_D3D->GetDeviceContext()->Map(cs_pParticleCount, 0, D3D11_MAP_WRITE_DISCARD, 0, &PCMapped);
memcpy_s(PCMapped.pData, sizeof(UINT), &mNumElements, sizeof(mNumElements));
m_D3D->GetDeviceContext()->Unmap(cs_pParticleCount, 0);

// Send the updated constant buffers to the CS shader
ID3D11Buffer *Buffers[2] = {cs_pUpdateCB, cs_pParticleCount};
m_D3D->GetDeviceContext()->CSSetConstantBuffers(0,2,Buffers);

// Set append and consume buffer
UINT counters[2] = {0, mNumElements };
ID3D11UnorderedAccessView* uavs[2] = { OutputState, InputState };
m_D3D->GetDeviceContext()->CSSetUnorderedAccessViews(0, 2, uavs, counters);

// Dispatch the particles's updater
m_D3D->GetDeviceContext()->Dispatch(maxParticles / 512, 1, 1);

// Swap the two buffers in between frames to allow multithreaded access
// during the rendering phase for the particle buffers.
ID3D11UnorderedAccessView *TempState = InputState;
InputState = OutputState;
OutputState = TempState;
}

And this is the UpdateCS:
//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------
// Particle Structure (relevant for the simulation)
struct Particle
{
float3 position;
float3 velocity;
float  time;
};

cbuffer SimulationParameters : register(b0)
{
float4 TimeFactors;
float4 EmitterLocation;
};

cbuffer ParticleCount : register(b1)
{
uint4 NumParticles;
};

// Compute shaders buffers (Entry Buffer and Output buffer)
AppendStructuredBuffer<Particle> NewSimulationState : register(u0);
ConsumeStructuredBuffer<Particle>   CurrentSimulationState  : register(u1);

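// Entry point reconstructed: the name CSMAIN and the group size of 512 are
// assumptions, chosen to match Dispatch(maxParticles / 512, 1, 1)
[numthreads(512, 1, 1)]
void CSMAIN(uint3 DispatchThreadID : SV_DispatchThreadID)
{
// Check whether this thread should run or not.
uint myID = DispatchThreadID.x;

// Only threads whose ID is below the current particle count may consume a particle
if (myID < NumParticles.x)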
{
// Get the current particle
Particle p = CurrentSimulationState.Consume();

// Calculate the new position, accounting for the new velocity value
// over the current time step.
p.position += p.velocity * TimeFactors.x;

// Update the life time left for the particle.
p.time = p.time + TimeFactors.x;

// Only keep the particle alive IF its life time has not expired
{
NewSimulationState.Append(p);
}
}
}

I suspect the problem is in either the  "cs_pParticleCount" -> "cbuffer ParticleCount" mapping or when I send the NumParticles's UAV reference to the Compute Shader but I am really not sure (if the NumParticles is not being received in the GPU, then it will stay at zero and it will kill all the particles) . Any help is really appreciated!

Edited by neroziros

##### Share on other sites
neroziros    234

Hmm, it seems that the problem is in the buffer swapping, since I tried not executing the Update compute shader and the particle amount still went up. (If the swap were correct, then the empty OutputState should have overwritten the InputState, and the amount of particles should have stayed at 8.)

// Swap the two buffers in between frames to allow multithreaded access
// during the rendering phase for the particle buffers.
ID3D11UnorderedAccessView *TempState = OutputState;
OutputState = InputState;
InputState = TempState;

Any idea about what I am doing wrong?

Cheers.

##### Share on other sites
Jason Z    6434

So you didn't do an update, but the particle count still increased?  How could that be?  I couldn't see in your code above where the number of particles is calculated - are you reading it back from the buffers with CopyStructureCount somewhere?

##### Share on other sites
neroziros    234

Hi Jason,

Yes, I am reading it here:

// Get current amount of particles in the InputBuffer
RefreshCurrentParticleAmount();

And the function per se is

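// Refresh the current amount of particles in the system
void Particleclass::RefreshCurrentParticleAmount()
{
// Copy the hidden counter of the input buffer into a CPU-readable buffer
m_D3D->GetDeviceContext()->CopyStructureCount(ParticleCountSTBuffer, 0, InputState);
D3D11_MAPPED_SUBRESOURCE _subresource;
// Map the buffer for reading (assumes ParticleCountSTBuffer is a staging buffer with CPU read access)
HRESULT hr = m_D3D->GetDeviceContext()->Map(ParticleCountSTBuffer, 0, D3D11_MAP_READ, 0, &_subresource);
if (FAILED(hr)) return;
// Transfer the current amount of particles to a local variable
unsigned int* pCount = (unsigned int*)(_subresource.pData);
mNumElements = 0;
for (int i = 0; i < 8; i++)
mNumElements += pCount[i];
m_D3D->GetDeviceContext()->Unmap(ParticleCountSTBuffer, 0);
}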

##### Share on other sites
neroziros    234

Never mind, the swap is actually correct; I was mistaken in my assumption. If the UpdateCS isn't executed, then the particles aren't consumed, and what I am actually doing is splitting the particle insertion between the two buffers (I checked, and with the buffer swap the particle count seems to go up at half the speed it does without the swap).

On the other hand, I can finally pinpoint the real problem with my update! As initially guessed, the constant buffer is not being mapped properly. To test that, in the compute shader I tried replacing the line "if (myID < NumParticles)" with "if (myID < 8)" and the particles got properly consumed. This confirms my guess, because if NumParticles is not properly mapped, then its value is probably 0 inside the shader, so the threads never enter the conditional and the particles are never consumed.

//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------
// Particle Structure (relevant for the simulation)
struct Particle
{
float3 position;
float3 velocity;
float  time;
};

cbuffer SimulationParameters : register(b0)
{
float4 TimeFactors;
float4 EmitterLocation;
uint NumParticles;
};

// Compute shaders buffers (Entry Buffer and Output buffer)
AppendStructuredBuffer<Particle> NewSimulationState : register(u0);
ConsumeStructuredBuffer<Particle>   CurrentSimulationState  : register(u1);

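// Entry point reconstructed: the name CSMAIN and the group size of 512 are
// assumptions, chosen to match Dispatch(maxParticles / 512, 1, 1)
[numthreads(512, 1, 1)]
void CSMAIN(uint3 DispatchThreadID : SV_DispatchThreadID)
{
// Check whether this thread should run or not.
uint myID = DispatchThreadID.x;

// Only threads whose ID is below the current particle count may consume a particle
if (myID < NumParticles)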
{
// Get the current particle
Particle p = CurrentSimulationState.Consume();

// Calculate the new position, accounting for the new velocity value
// over the current time step.
p.position += p.velocity * TimeFactors.x;

// Update the life time left for the particle.
p.time = p.time + TimeFactors.x;

// Only keep the particle alive IF its life time has not expired
{
NewSimulationState.Append(p);
}
}
}


And the code related to the mapping process:

Map Buffer Structure

// For aligning to float4 boundaries
#define Float4Align __declspec(align(16))

struct SimulationParameters
{
Float4Align float TimeFactors;
Float4Align D3DXVECTOR3 EmitterLocation;
Float4Align UINT NumParticles;
};

Buffer Creation

// Constant buffers (UPDATE)
m_desc.ByteWidth = sizeof(SimulationParameters);
m_desc.Usage = D3D11_USAGE_DYNAMIC;
m_desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
m_desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
m_desc.MiscFlags = 0;
m_desc.StructureByteStride = 0;
result = device->CreateBuffer(&m_desc, NULL, &cs_pUpdateCB);
if (FAILED(result))return false;

Mapping

// Create this frame data
SimulationParameters s;
s.EmitterLocation = position;
s.TimeFactors = elapsedSeconds/1000;
s.NumParticles = currentParticles;

// Pass the new frame data to the shader
D3D11_MAPPED_SUBRESOURCE UMapped;
hr = m_D3D->GetDeviceContext()->Map(cs_pUpdateCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &UMapped);
memcpy_s(UMapped.pData, sizeof(SimulationParameters), &s, sizeof(s));
m_D3D->GetDeviceContext()->Unmap(cs_pUpdateCB, 0);

// Send the updated constant buffers to the CS shader
m_D3D->GetDeviceContext()->CSSetConstantBuffers(0, 1, &cs_pUpdateCB);

// Set append and consume buffer
UINT counters[2] = { (UINT)-1, (UINT)-1 }; // -1 keeps each buffer's current hidden counter
ID3D11UnorderedAccessView* uavs[2] = { OutputState, InputState };
m_D3D->GetDeviceContext()->CSSetUnorderedAccessViews(0, 2, uavs, counters);

// Dispatch the particles's updater
m_D3D->GetDeviceContext()->Dispatch(maxParticles / 512, 1, 1);
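One detail worth noting about `Dispatch(maxParticles / 512, 1, 1)`: integer division truncates, so if `maxParticles` is not a multiple of 512 the last partial group is never dispatched, and for fewer than 512 particles no group runs at all. A common workaround is to round the group count up, as in this C++ sketch. The helper name is made up, and it assumes the shader guards extra threads with a check like `myID < NumParticles`.

```cpp
#include <cassert>

// Number of thread groups needed to cover itemCount items with groups of
// groupSize threads, rounding up so a partial last group is still dispatched.
inline unsigned threadGroupCount(unsigned itemCount, unsigned groupSize)
{
    assert(groupSize > 0);
    return (itemCount + groupSize - 1) / groupSize;
}
```

The dispatch call would then use `threadGroupCount(maxParticles, 512)` as its first argument.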

As always, any help will be greatly appreciated!

Cheers

Edited by neroziros

##### Share on other sites
Jason Z    6434

When you map the buffer to get the number of particles, do you see the correct numbers if you step through the RefreshCurrentParticleAmount() method?  If so, then the issue is in how you are copying the value to the constant buffer for use.

One other thing to check - are you enabling the D3D11 debug device?  This would emit debug messages if you try doing things like mapping buffers that aren't accessible to the CPU...

##### Share on other sites
neroziros    234

Thanks! As you said, it was a value copy error. It seems that constant buffers must be 16-byte aligned in order to work properly. Here are my new buffer structures.

// SPAWN
struct SpawnConstantBuffer
{
D3DXVECTOR4 EmmiterPosAndLife;
D3DXVECTOR4 randomVector;
};
// UPDATE
struct SimulationParameters
{
D3DXVECTOR4 TimeFactors;
D3DXVECTOR3 EmitterLocation;
UINT NumParticles;
};


and their GPU counterpart:

CSINSERT

cbuffer ParticleInsertParameters : register(b0)
{
float4 EmmiterPosAndLife; // xyz -> pos ; w ->lifetime
float4 RandomVector;
};

CSUPDATE

cbuffer SimulationParameters : register(b0)
{
float4 TimeFactors;
float3 EmitterLocation;
uint NumParticles;
};
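A way to catch this kind of CPU/GPU layout mismatch at compile time is to assert the offsets on the C++ side. The sketch below uses plain stand-in structs (`Float4`, `Float3`) instead of the D3DX types so that it builds anywhere; the struct name `SimulationParametersCPU` is hypothetical. It relies on the HLSL packing rule that a `float3` followed by a `uint` shares one 16-byte register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain stand-ins for D3DXVECTOR4 / D3DXVECTOR3.
struct Float4 { float x, y, z, w; };
struct Float3 { float x, y, z; };

// CPU mirror of the SimulationParameters cbuffer.
struct SimulationParametersCPU {
    Float4        TimeFactors;      // first 16-byte register  (bytes 0..15)
    Float3        EmitterLocation;  // second register, xyz    (bytes 16..27)
    std::uint32_t NumParticles;     // second register, w      (bytes 28..31)
};

static_assert(offsetof(SimulationParametersCPU, TimeFactors) == 0,
              "TimeFactors must start the first register");
static_assert(offsetof(SimulationParametersCPU, EmitterLocation) == 16,
              "EmitterLocation must start the second register");
static_assert(offsetof(SimulationParametersCPU, NumParticles) == 28,
              "NumParticles must fill the w slot of the second register");
static_assert(sizeof(SimulationParametersCPU) % 16 == 0,
              "constant buffer size must be a multiple of 16 bytes");
```

If a member is later added or reordered so that the layouts drift apart, the build fails instead of the shader silently reading zeros.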


I am now looking for an efficient way to spawn more than 8 particles per frame.
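One possible direction, sketched in C++ under the assumption that each InsertCS thread group spawns 8 particles (which is what a single `Dispatch(1,1,1)` appears to produce in this thread): convert a spawn rate into a number of thread groups per frame and carry the remainder over. `SpawnAccumulator` is a hypothetical helper, not part of the posted code.

```cpp
#include <cassert>

// Converts a spawn rate (particles per second) into a whole number of
// thread groups per frame, carrying the leftover particles forward.
struct SpawnAccumulator {
    float pending = 0.0f;  // particles owed but not yet spawned

    unsigned groupsForFrame(float particlesPerSecond, float dt, unsigned particlesPerGroup) {
        assert(particlesPerGroup > 0);
        pending += particlesPerSecond * dt;
        const unsigned groups = static_cast<unsigned>(pending) / particlesPerGroup;
        pending -= static_cast<float>(groups * particlesPerGroup);
        return groups;
    }
};
```

The returned value would be used as the first argument of the spawn `Dispatch` call, skipping the dispatch when it is zero. Each group would also need its own random seed (for example derived from the group ID), otherwise every group would spawn identical particles.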

Cheers!

##### Share on other sites
neroziros    234

Hi all, it seems that I have hit another wall. I can't see the particles at all.

I think I have properly initialized all related buffers. But for some reason, even though the current particle count is behaving properly, I can't see any particles on screen.  As a side note, I tried using the Graphics Diagnostics tool (VS 2013) and RenderDoc (Crytek) to debug the shaders, but for some reason the app crashes if I try to capture the current frame info. They work properly if I don't execute the particle system though, so the problem is probably caused by the compute shaders.

As always, any help is greatly appreciated.

Related Structures

// Render
struct Transforms
{
D3DMATRIX WorldViewMatrix;
D3DMATRIX ProjMatrix;
};

Render Function

// Command the particle system calculations (update and render)
void Particleclass::Draw(float elapsedMiliSeconds,D3DXMATRIX worldMatrix, D3DXMATRIX viewMatrix,D3DXMATRIX projectionMatrix)
{
// Check if the PS has started
if (State == ParticleSystemState::UNSTARTED)
return;

// Timescale
float elapsedSeconds = elapsedMiliSeconds / 1000.0f;

///////////////////////////////////////////////////////////////////
// Simulation
///////////////////////////////////////////////////////////////////
if (State == ParticleSystemState::PLAYING)
Update(elapsedSeconds);

///////////////////////////////////////////////////////////////////
// Render
///////////////////////////////////////////////////////////////////
// Check if there are enough particles in the system
if (currentParticles <= 0) return;

// To render, we need to only select the particles that exist after the update.
m_D3D->GetDeviceContext()->CopyStructureCount(g_pParticlesToRender, 0, InputState);

// Set input layout
m_D3D->GetDeviceContext()->IASetInputLayout(pInputLayout);

// Set the type of primitive that should be rendered from this vertex buffer, in this case, the info is stored as point particles.
m_D3D->GetDeviceContext()->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_POINTLIST);

// Set the vertex and pixel shaders that will be used to render this triangle.

// Set blend and stencil model
// Bind blend state
float blendFactor[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
// Bind stencil depth
m_D3D->GetDeviceContext()->OMSetDepthStencilState(pDepthState, 0);

// Set this frame information
Transforms transforms;
transforms.ProjMatrix = projectionMatrix;
transforms.WorldViewMatrix = worldMatrix * viewMatrix;

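// Map this frame info (mappedResource is only valid after Map succeeds)
D3D11_MAPPED_SUBRESOURCE mappedResource;
HRESULT hr = m_D3D->GetDeviceContext()->Map(m_matrixBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(hr)) return;
memcpy_s(mappedResource.pData, sizeof(Transforms), &transforms, sizeof(transforms));
m_D3D->GetDeviceContext()->Unmap(m_matrixBuffer, 0);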

// Draw the particles
m_D3D->GetDeviceContext()->DrawInstancedIndirect(g_pParticlesToRender, 0); // Check if there are enough particles in the system
}

Render Program

//--------------------------------------------------------------------------------
// Resources
//--------------------------------------------------------------------------------

cbuffer Transforms
{
matrix WorldViewMatrix;
matrix ProjMatrix;
};

cbuffer ParticleRenderParameters
{
float4 EmitterLocation;
float4 ConsumerLocation;
};

static const float scale = 0.5f;

static const float4 g_positions[4] =
{
float4(-scale, scale, 0, 0),
float4(scale, scale, 0, 0),
float4(-scale, -scale, 0, 0),
float4(scale, -scale, 0, 0),
};

static const float2 g_texcoords[4] =
{
float2(0, 1),
float2(1, 1),
float2(0, 0),
float2(1, 0),
};

struct Particle
{
float3 position;
float3 direction;
float  time;
};

StructuredBuffer<Particle> SimulationState;
Texture2D       ParticleTexture : register(t0);
SamplerState    LinearSampler : register(s0);

//--------------------------------------------------------------------------------
// Inter-stage structures
//--------------------------------------------------------------------------------
struct VS_INPUT
{
uint vertexid : SV_VertexID;
};
//--------------------------------------------------------------------------------
struct GS_INPUT
{
float3 position : Position;
};
//--------------------------------------------------------------------------------
struct PS_INPUT
{
float4 position : SV_Position;
float2 texcoords : TEXCOORD0;
float4 color : Color;
};
//--------------------------------------------------------------------------------
GS_INPUT VSMAIN(in VS_INPUT input)
{
GS_INPUT output;

output.position.xyz = SimulationState[input.vertexid].position;

return output;
}
//--------------------------------------------------------------------------------
[maxvertexcount(4)]
void GSMAIN(point GS_INPUT input[1], inout TriangleStream<PS_INPUT> SpriteStream)
{
PS_INPUT output;

float4 color = float4(1.0f, 1.0f, 1.0f, 0.8f);

// Transform to view space
float4 viewposition = mul(float4(input[0].position, 1.0f), WorldViewMatrix);

// Emit two new triangles
for (int i = 0; i < 4; i++)
{
// Transform to clip space
output.position = mul(viewposition + g_positions[i], ProjMatrix);
output.texcoords = g_texcoords[i];
output.color = color;

SpriteStream.Append(output);
}

SpriteStream.RestartStrip();
}
//--------------------------------------------------------------------------------
float4 PSMAIN(in PS_INPUT input) : SV_Target
{
//float4 color = ParticleTexture.Sample(LinearSampler, input.texcoords);
//color = color * input.color;
float4 color = input.color;

return(color);
}
//--------------------------------------------------------------------------------
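For readers who want to check the quad expansion done in GSMAIN without a GPU debugger, here is a rough CPU-side C++ equivalent of the corner computation in view space. `Float4` and `billboardCorners` are illustrative names; the corner order mirrors the `g_positions` table above, and the projection step is left out.

```cpp
#include <array>
#include <cassert>

struct Float4 { float x, y, z, w; };

// Expands a view-space point into the four corners of a camera-facing quad.
// The offsets only touch x and y, so the center's z and w (w = 1) carry
// through unchanged, just like adding the g_positions entries with w = 0.
inline std::array<Float4, 4> billboardCorners(const Float4& viewCenter, float halfSize)
{
    assert(halfSize >= 0.0f);
    const float ox[4] = { -halfSize,  halfSize, -halfSize,  halfSize };
    const float oy[4] = {  halfSize,  halfSize, -halfSize, -halfSize };
    std::array<Float4, 4> corners{};
    for (int i = 0; i < 4; ++i)
        corners[i] = Float4{ viewCenter.x + ox[i], viewCenter.y + oy[i],
                             viewCenter.z, viewCenter.w };
    return corners;
}
```

Feeding a few particle positions through this and then through the projection matrix on the CPU is one way to verify that the quads should land inside the view frustum.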

Edited by neroziros