8800GTX and only 150 Particles before lagging? Help. =)

16 comments, last by Kurt-olsson 11 years, 5 months ago
My method of rendering a particle system is:

Have a list of particles with
position
velocity.

Loop over the list and create two triangles for each particle in a vertex buffer (at this stage the created triangles are already in the right position with the right rotation).
Copy the vertex buffer every frame.

This gives me really poor performance.

I have an 8800GTX card and can only render 150 particles... come on... 150 particles before it starts lagging. There must be some big problem with my code.

Here is my Particle class. It is simple, and I have tried to comment every function.
Please let me know if you see something bad.

Another thing: why does the movement get "slow" when the particles are visible? I scale everything by DeltaTime, so even if the frame rate drops, shouldn't the movement/velocity of my player stay the same no matter how much is drawn in the scene?

Here is my Particle class.

[source lang="cpp"]#pragma once
#include <d3d11.h>
#include <d3dx11.h>
#include <d3dx10.h>
#include <vector>

class Particle {
public:
D3DXVECTOR3 position;
D3DXVECTOR3 velocity;
float time;
};

class ParticleSystem
{
private:
struct VERTEX {FLOAT X, Y, Z; D3DXVECTOR3 Normal; FLOAT U, V; D3DXCOLOR Color;};

D3D11_MAPPED_SUBRESOURCE ms;
ID3D11Buffer *m_vertexBuffer, *m_indexBuffer;
int m_vertexCount;
int m_indexCount;
int number_of_particles;
VERTEX* model_vertices;
DWORD* model_indicies;
std::vector<Particle> lstParticles;
int CurrentParticle;

public:

//This is run just once to create all particles.
void AddParticles() {

float width = 1.0f;
float height = 1.0f;

for (int i = 0; i < 1150;i++) {

/*float rx = (float)rand()/((float)RAND_MAX/0.01f);
float ry = (float)rand()/((float)RAND_MAX/0.001f);
float rz = (float)rand()/((float)RAND_MAX/0.01f);*/
Particle p;
p.position = D3DXVECTOR3(0,0,0);
p.velocity = D3DXVECTOR3(0,0,0);
lstParticles.push_back(p);
}
}


//Set new position and new Velocity.
void Reset(D3DXVECTOR3 start, D3DXVECTOR3 velocity) {

lstParticles[CurrentParticle].position = start;
lstParticles[CurrentParticle].velocity = velocity;
CurrentParticle++;
if (CurrentParticle>=lstParticles.size())
CurrentParticle=0;

}

//This is run every frame. Here is where I set the position and create
//two triangles from the position of a particle.
//This makes it easy to maintain a list of particles with one position each instead of 6.

void UpdateParticles(D3DXVECTOR3 mPos,D3DXVECTOR3 mView) {
//float width = 1.0f;
//float height = 1.0f;

D3DXCOLOR particleColor(1.0f,1.0f,1.0f,0.5f);

for (int i=0;i<lstParticles.size();i++) {
int v_index = i*6;
D3DXVECTOR3 particlePos = lstParticles[i].position;

D3DXVECTOR3 look = mView - mPos;
D3DXVec3Normalize(&look,&look);

//This I could move outside the loop because it is the same for every particle
D3DXVECTOR3 camUp(0,1,0);
D3DXVec3Normalize(&camUp,&camUp);

D3DXVECTOR3 right;
D3DXVec3Cross(&right,&camUp,&look);
D3DXVec3Normalize(&right,&right);

D3DXVECTOR3 up;
D3DXVec3Cross(&up,&look,&right);
D3DXVec3Normalize(&up,&up);

//up = up * height;
//right = right * width;

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 0;
model_vertices[v_index].V = 0;
model_vertices[v_index].X = particlePos.x - right.x * 0.5f + up.x;
model_vertices[v_index].Y = particlePos.y - right.y * 0.5f + up.y;
model_vertices[v_index].Z = particlePos.z - right.z * 0.5f + up.z;
v_index++;

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 0;
model_vertices[v_index].V = 1;
model_vertices[v_index].X = particlePos.x + right.x * 0.5f + up.x;
model_vertices[v_index].Y = particlePos.y + right.y * 0.5f + up.y;
model_vertices[v_index].Z = particlePos.z + right.z * 0.5f + up.z;
v_index++;

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 1;
model_vertices[v_index].V = 0;
model_vertices[v_index].X = particlePos.x - right.x * 0.5f;
model_vertices[v_index].Y = particlePos.y - right.y * 0.5f;
model_vertices[v_index].Z = particlePos.z - right.z * 0.5f;
v_index++;

//Second Triangle

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 1;
model_vertices[v_index].V = 0;
model_vertices[v_index].X = particlePos.x - right.x * 0.5f;
model_vertices[v_index].Y = particlePos.y - right.y * 0.5f;
model_vertices[v_index].Z = particlePos.z - right.z * 0.5f;
v_index++;

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 0;
model_vertices[v_index].V = 1;
model_vertices[v_index].X = particlePos.x + right.x * 0.5f + up.x;
model_vertices[v_index].Y = particlePos.y + right.y * 0.5f + up.y;
model_vertices[v_index].Z = particlePos.z + right.z * 0.5f + up.z;
v_index++;


model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 1;
model_vertices[v_index].V = 1;
model_vertices[v_index].X = particlePos.x + right.x * 0.5f;
model_vertices[v_index].Y = particlePos.y + right.y * 0.5f;
model_vertices[v_index].Z = particlePos.z + right.z * 0.5f;
v_index++;

//update position with velocity
lstParticles[i].position += lstParticles[i].velocity;
}

}

//Just create the vertex buffer with as many vertices as there are particles * 6, because we render two triangles per quad.
//This is because I don't know how to draw a TRIANGLE_STRIP in different positions; something with RestartStrip, but I think
//it only works with shaders.
void Init(ID3D11Device* dev) {
CurrentParticle = 0;

number_of_particles = lstParticles.size();
m_vertexCount = (number_of_particles * 6);
m_indexCount = (number_of_particles * 6);

model_vertices = new VERTEX[m_vertexCount];
model_indicies = new DWORD[m_indexCount];

//This might be a problem? No index ever refers to the same vertex twice, so the index buffer is as big as the vertex buffer.
for (int i = 0; i<(number_of_particles * 6);i++) {
model_indicies[i] = i;
}

// create the vertex buffer
D3D11_BUFFER_DESC bd;
ZeroMemory(&bd, sizeof(bd));

bd.Usage = D3D11_USAGE_DYNAMIC;
bd.ByteWidth = sizeof(VERTEX) * m_vertexCount;
bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;

dev->CreateBuffer(&bd, NULL, &m_vertexBuffer);

// create the index buffer
bd.Usage = D3D11_USAGE_DYNAMIC;
bd.ByteWidth = sizeof(DWORD) * m_indexCount;
bd.BindFlags = D3D11_BIND_INDEX_BUFFER;
bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
bd.MiscFlags = 0;

dev->CreateBuffer(&bd, NULL, &m_indexBuffer);



}

int GetIndexCount() {
return m_indexCount;
}

//This method is run EVERY frame; it takes the updated vertex data and copies it into the GPU vertex buffer.
void CopyAndSetBuffers(ID3D11DeviceContext* devcon) {


// select which vertex buffer to display
UINT stride = sizeof(VERTEX);
UINT offset = 0;

// copy the vertices into the buffer
//THIS uses the D3D11_MAP_WRITE_DISCARD so it should be ok for updating every frame, right?
devcon->Map(m_vertexBuffer, NULL, D3D11_MAP_WRITE_DISCARD, NULL, &ms); // map the buffer
memcpy(ms.pData, model_vertices, sizeof(VERTEX) * m_vertexCount); // copy the data
devcon->Unmap(m_vertexBuffer, NULL);
//copy the indices into the index buffer
//THIS uses the D3D11_MAP_WRITE_DISCARD so it should be ok for updating every frame, right?
devcon->Map(m_indexBuffer, NULL, D3D11_MAP_WRITE_DISCARD, NULL, &ms); // map the buffer
memcpy(ms.pData, model_indicies, sizeof(DWORD) * m_indexCount); // copy the data
devcon->Unmap(m_indexBuffer, NULL);

devcon->IASetVertexBuffers(0, 1, &m_vertexBuffer, &stride, &offset);
devcon->IASetIndexBuffer(m_indexBuffer, DXGI_FORMAT_R32_UINT, 0);
}

void Clean() {
m_indexBuffer->Release();
m_vertexBuffer->Release();
}


};[/source]
Can you please define "lag"; do you mean that the time per frame increases?
Have you timed UpdateParticles to see how much CPU time it's consuming?
First, it seems like you have 1150 particles, not 150. Still, that shouldn't be all that slow. How much is it lagging?
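For anyone unsure how to time a function like this, here is a minimal sketch using std::chrono. The helper name TimeMilliseconds is made up for illustration and is not part of the posted code. It measures wall-clock CPU time only, so it says nothing about how long the GPU takes to draw.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not from the original post): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
double TimeMilliseconds(const std::function<void()>& work) {
    assert(work != nullptr);  // an empty std::function cannot be called
    using Clock = std::chrono::steady_clock;  // monotonic, safe for intervals
    const auto start = Clock::now();
    work();
    const auto end = Clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Wrapping the call in a lambda, for example `TimeMilliseconds([&] { particles.UpdateParticles(camPos, camTarget); })` with your own object and camera variables, should be enough to see whether the CPU side is the expensive part.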

Make sure you compile in Release, not Debug, and move those things you commented yourself outside the loop.
Then switch to only creating 4 vertices per quad instead of 6, but still use 6 indices. Indices can re-use vertices, so you only need 4 vertices and indices [0, 1, 2] and [0, 2, 3] for example, to make 2 triangles. This saves you some bandwidth.
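A rough sketch of the index pattern described above, assuming each quad owns 4 consecutive vertices. The function name BuildQuadIndices is an assumption for illustration, and the right winding order depends on how the 4 corners are laid out and on the cull mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds 6 indices per quad, where quad q owns
// vertices 4*q .. 4*q+3. Each quad becomes triangles [0,1,2] and [0,2,3].
std::vector<std::uint32_t> BuildQuadIndices(std::size_t quadCount) {
    assert(quadCount <= UINT32_MAX / 4);  // indices must fit in 32 bits
    std::vector<std::uint32_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::uint32_t base = static_cast<std::uint32_t>(q * 4);
        // First triangle
        indices.push_back(base + 0);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        // Second triangle reuses corners 0 and 2
        indices.push_back(base + 0);
        indices.push_back(base + 2);
        indices.push_back(base + 3);
    }
    return indices;
}
```

Since this pattern does not depend on particle positions, the index data only needs to be generated and uploaded once instead of every frame. Only the vertex buffer has to change.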

If it's still not good enough, look into using a geometry shader, which can save you a lot of CPU time.
HodgeMan:
Can you please define "lag"; do you mean that the time per frame increases?
Have you timed UpdateParticles to see how much CPU time it's consuming?

My lag is like this:
I move my camera with a velocity vector, let's say (0,0,0.001f*deltaTime).
Without particles it feels like I am moving "fast".
But with all the particles I am moving "slow", even though the velocity vector is still the same.
I have not timed my particles; I don't know how.

Erik Rufelt:
1150 particles is correct, my mistake.
I also forgot to mention that I do a render-to-texture and use that texture to map a cube.
So I render everything twice, which should cut my performance by 50%, but I still think it is too slow.
The only things I draw are a 1500-vertex model and my particles + cube.

I think the index buffer improvement is the next thing to look into, but I still think something is wrong.
My plan is to draw at least 10 more 1000-vertex models in my level.

Hm...
I will move the code outside the loop as noted in my comments and try in Release mode.
Try displaying deltaTime on the screen, and measure the difference in milliseconds. If you compare drawing 1000 particles to not drawing anything at all, then it should be much slower. Even something that is very fast is infinitely slower than something that takes zero time. Drawing nothing is close to zero.
If you aim for 60 frames per second, that gives you a max deltaTime of ~16.7 milliseconds, so compare the time taken to draw 1000 particles to that, and see what percentage of the target time is spent.
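The arithmetic above can be sketched as follows. The function names are made up for illustration, and the measured time is whatever your own timer reports.

```cpp
#include <cassert>

// Hypothetical helper: time available per frame, in milliseconds,
// for a given target frame rate (60 fps gives about 16.67 ms).
double FrameBudgetMs(double targetFps) {
    assert(targetFps > 0.0);
    return 1000.0 / targetFps;
}

// Hypothetical helper: share of the frame budget used by a measured
// cost in milliseconds, expressed as a percentage.
double PercentOfBudget(double measuredMs, double targetFps) {
    return 100.0 * measuredMs / FrameBudgetMs(targetFps);
}
```

The same conversion works the other way when you only have an FPS counter: frame time in milliseconds is 1000 divided by FPS, so comparing frame times is usually more informative than comparing FPS values directly.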

Without particles it feels like i am moving "fast".

We need some numbers. Get the free version of FRAPS to display the FPS at least, or better, incorporate some kind of time measurement in your code.

Do you send the particles to the GPU in a single batch, or are you using one batch per particle? The latter will most likely slow down your performance even for only 1150 particles. Another issue would be drawing 1150 large particles, which could result in a huge overdraw rate, another reason for a slowdown.

Best to provide some more data and a screenshot.
FRAPS was a very good idea!

When I have 1500 particles at the beginning, all at the same place (0,0,0), and the player is really close to them, my FPS is down to 14.
But when I shoot them away and they are away from the player I get around 250~400 FPS.
When the particles are far, far away I get as high as 550 FPS.

It feels like I can't draw my particles close up at the same place...
That's normal enough - you're getting heavy overdraw and bottlenecking on fillrate here. Probably covering a good-ish percentage of the entire screen area 1500 times, which will bring any GPU to its knees.
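To put a rough number on that, here is a back-of-the-envelope sketch. The function name is made up, and the resolution and coverage figures in the example are pure assumptions for illustration, not values from this thread.

```cpp
#include <cassert>

// Hypothetical estimate: pixels the GPU has to shade and blend per frame
// when every particle covers the same fraction of the screen.
// screenCoverage is a fraction in [0, 1].
double ShadedPixelsPerFrame(int particleCount, double screenCoverage,
                            int screenWidth, int screenHeight) {
    assert(particleCount >= 0);
    assert(screenCoverage >= 0.0 && screenCoverage <= 1.0);
    assert(screenWidth > 0 && screenHeight > 0);
    return particleCount * screenCoverage * screenWidth * screenHeight;
}
```

With the assumed values of 1500 particles each covering half of a 1280x720 screen, that works out to 691,200,000 blended pixels per frame, before counting the second pass for the render-to-texture. Once the particles spread out and shrink on screen, the coverage fraction drops sharply, which fits the jump in FPS reported above.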


Suddenly I have more respect for the game engines out there. It feels impossible to get the visuals they do from my hardware. =)

I will try to implement an indexed vertex buffer to reuse 2 of the 6 vertices of my two triangles, as Erik said.
Maybe that will lift the performance a little bit.

Also, how do you make the color black transparent?

If I have alpha blending on, the FPS drops even more...
I would recommend that you also make use of the geometry shader stage; that way you only have to use one vertex for each sprite. Here's a good article on how to do it:
http://takinginitiative.net/2011/01/12/directx10-tutorial-9-the-geometry-shader/
