8800GTX and only 150 Particles before lagging? Help. =)


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

17 replies to this topic

#1 KurtO   Members   -  Reputation: 210

Posted 04 November 2012 - 06:46 AM

My method of rendering a particle system is this:

Keep a list of particles, each with
position
velocity

Loop over the list and create two triangles for each particle in a vertex buffer (at this stage the triangles are already in the right position with the right rotation).
Copy the vertex buffer every frame.

This gives me very poor performance.

I have an 8800GTX card and can only render 150 particles... come on... 150 particles before it starts lagging. There must be some big problem with my code.

My Particle class is below. It is simple and I have tried to comment every function.
Please let me know if you see something bad.

Another thing: how come the movement is "slow" when particles are visible? I scale everything by DeltaTime, so even if it lags, shouldn't the movement/velocity of my player stay the same no matter how much is drawn in the scene?

Here is my Particle class.

[source lang="cpp"]
#pragma once
#include <d3d11.h>
#include <d3dx11.h>
#include <d3dx10.h>
#include <vector>

class Particle {
public:
    D3DXVECTOR3 position;
    D3DXVECTOR3 velocity;
    float time;
};

class ParticleSystem
{
private:
    struct VERTEX {FLOAT X, Y, Z; D3DXVECTOR3 Normal; FLOAT U, V; D3DXCOLOR Color;};
    D3D11_MAPPED_SUBRESOURCE ms;
    ID3D11Buffer *m_vertexBuffer, *m_indexBuffer;
    int m_vertexCount;
    int m_indexCount;
    int number_of_particles;
    VERTEX* model_vertices;
    DWORD* model_indicies;
    std::vector<Particle> lstParticles;
    int CurrentParticle;

public:
    //This is just run onced to create all particles.
    void AddParticles()
    {
        float width = 1.0f;
        float height = 1.0f;
        for (int i = 0; i < 1150;i++)
        {
            /*float rx = (float)rand()/((float)RAND_MAX/0.01f);
            float ry = (float)rand()/((float)RAND_MAX/0.001f);
            float rz = (float)rand()/((float)RAND_MAX/0.01f);*/
            Particle p;
            p.position = D3DXVECTOR3(0,0,0);
            p.velocity = D3DXVECTOR3(0,0,0);
            lstParticles.push_back(p);
        }
    }

    //Set new position and new Velocity.
    void Reset(D3DXVECTOR3 start, D3DXVECTOR3 velocity)
    {
        lstParticles[CurrentParticle].position = start;
        lstParticles[CurrentParticle].velocity = velocity;
        CurrentParticle++;
        if (CurrentParticle>=lstParticles.size())
            CurrentParticle=0;
    }

    //This is run every Frame, here is where i set the position and create
    //two triangles from a certain position of a particel.
    //this makes it easy to just maintain a list of particles with one position instead of 6.
    void UpdateParticles(D3DXVECTOR3 mPos,D3DXVECTOR3 mView)
    {
        //float width = 1.0f;
        //float height = 1.0f;
        D3DXCOLOR particleColor(1.0f,1.0f,1.0f,0.5f);
        for (int i=0;i<lstParticles.size();i++)
        {
            int v_index = i*6;
            D3DXVECTOR3 particlePos = lstParticles[i].position;

            D3DXVECTOR3 look = mView - mPos;
            D3DXVec3Normalize(&look,&look);
            //This i could move outside becuase it is the same every particle
            D3DXVECTOR3 camUp(0,1,0);
            D3DXVec3Normalize(&camUp,&camUp);
            D3DXVECTOR3 right;
            D3DXVec3Cross(&right,&camUp,&look);
            D3DXVec3Normalize(&right,&right);
            D3DXVECTOR3 up;
            D3DXVec3Cross(&up,&look,&right);
            D3DXVec3Normalize(&up,&up);
            //up = up * height;
            //right = right * width;

            model_vertices[v_index].Color = particleColor;
            model_vertices[v_index].U = 0;
            model_vertices[v_index].V = 0;
            model_vertices[v_index].X = particlePos.x - right.x * 0.5f + up.x;
            model_vertices[v_index].Y = particlePos.y - right.y * 0.5f + up.y;
            model_vertices[v_index].Z = particlePos.z - right.z * 0.5f + up.z;
            v_index++;

            model_vertices[v_index].Color = particleColor;
            model_vertices[v_index].U = 0;
            model_vertices[v_index].V = 1;
            model_vertices[v_index].X = particlePos.x + right.x * 0.5f + up.x;
            model_vertices[v_index].Y = particlePos.y + right.y * 0.5f + up.y;
            model_vertices[v_index].Z = particlePos.z + right.z * 0.5f + up.z;
            v_index++;

            model_vertices[v_index].Color = particleColor;
            model_vertices[v_index].U = 1;
            model_vertices[v_index].V = 0;
            model_vertices[v_index].X = particlePos.x - right.x * 0.5f;
            model_vertices[v_index].Y = particlePos.y - right.y * 0.5f;
            model_vertices[v_index].Z = particlePos.z - right.z * 0.5f;
            v_index++;

            //Second Triangle
            model_vertices[v_index].Color = particleColor;
            model_vertices[v_index].U = 1;
            model_vertices[v_index].V = 0;
            model_vertices[v_index].X = particlePos.x - right.x * 0.5f;
            model_vertices[v_index].Y = particlePos.y - right.y * 0.5f;
            model_vertices[v_index].Z = particlePos.z - right.z * 0.5f;
            v_index++;

            model_vertices[v_index].Color = PlaneVerticies[0].Color;
            model_vertices[v_index].U = 0;
            model_vertices[v_index].V = 1;
            model_vertices[v_index].X = particlePos.x + right.x * 0.5f + up.x;
            model_vertices[v_index].Y = particlePos.y + right.y * 0.5f + up.y;
            model_vertices[v_index].Z = particlePos.z + right.z * 0.5f + up.z;
            v_index++;

            model_vertices[v_index].Color = particleColor;
            model_vertices[v_index].U = 1;
            model_vertices[v_index].V = 1;
            model_vertices[v_index].X = particlePos.x + right.x * 0.5f;
            model_vertices[v_index].Y = particlePos.y + right.y * 0.5f;
            model_vertices[v_index].Z = particlePos.z + right.z * 0.5f;
            v_index++;

            //update position with velocity
            lstParticles[i].position+=lstParticles[i].velocity;
        }
    }

    //Just create the Vertex Buffer with as many Particles there is * 6 because we render two triangles for the Quad.
    //This is because i don't know how to draw TRIANGLE_STRIP in different position, something with ResetStrip, but i think
    //it only works with shaders.
    void Init(ID3D11Device* dev)
    {
        CurrentParticle = 0;
        number_of_particles = lstParticles.size();
        m_vertexCount = (number_of_particles * 6);
        m_indexCount = (number_of_particles * 6);
        model_vertices = new VERTEX[m_vertexCount];
        model_indicies = new DWORD[m_indexCount];

        //This might be a problem? The Indicies are never the same as one vertex, so it is a s big as VertexBuffer.
        for (int i = 0; i<(number_of_particles * 6);i++)
        {
            model_indicies[i] = i;
        }

        // create the vertex buffer
        D3D11_BUFFER_DESC bd;
        ZeroMemory(&bd, sizeof(bd));
        bd.Usage = D3D11_USAGE_DYNAMIC;
        bd.ByteWidth = sizeof(VERTEX) * m_vertexCount;
        bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
        bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
        dev->CreateBuffer(&bd, NULL, &m_vertexBuffer);

        // create the index buffer
        bd.Usage = D3D11_USAGE_DYNAMIC;
        bd.ByteWidth = sizeof(DWORD) * m_indexCount;
        bd.BindFlags = D3D11_BIND_INDEX_BUFFER;
        bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
        bd.MiscFlags = 0;
        dev->CreateBuffer(&bd, NULL, &m_indexBuffer);
    }

    int GetIndexCount()
    {
        return m_indexCount;
    }

    //This method is run EVERY Frame, it takes the Updated Vertex Buffer and then copies it to the RAM.
    void CopyAndSetBuffers(ID3D11DeviceContext* devcon)
    {
        // select which vertex buffer to display
        UINT stride = sizeof(VERTEX);
        UINT offset = 0;

        // copy the vertices into the buffer
        //THIS uses the D3D11_MAP_WRITE_DISCARD so it should be ok for updating every frame, right?
        devcon->Map(m_vertexBuffer, NULL, D3D11_MAP_WRITE_DISCARD, NULL, &ms); // map the buffer
        memcpy(ms.pData, model_vertices, sizeof(VERTEX) * m_vertexCount); // copy the data
        devcon->Unmap(m_vertexBuffer, NULL);

        //copy the index buffers i
        //THIS uses the D3D11_MAP_WRITE_DISCARD so it should be ok for updating every frame, right?
        devcon->Map(m_indexBuffer, NULL, D3D11_MAP_WRITE_DISCARD, NULL, &ms); // map the buffer
        memcpy(ms.pData, model_indicies, sizeof(DWORD) * m_indexCount); // copy the data
        devcon->Unmap(m_indexBuffer, NULL);

        devcon->IASetVertexBuffers(0, 1, &m_vertexBuffer, &stride, &offset);
        devcon->IASetIndexBuffer(m_indexBuffer, DXGI_FORMAT_R32_UINT, 0);
    }

    void Clean()
    {
        m_indexBuffer->Release();
        m_vertexBuffer->Release();
    }
};
[/source]
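
One detail in the posted UpdateParticles relates to the DeltaTime question above: the position is advanced by the raw velocity once per frame, with no time factor, so particle speed would depend on frame rate. Below is a minimal sketch of time-scaled integration, assuming velocity is expressed in units per second. `Vec3` and `Integrate` are hypothetical stand-ins so the example compiles without DirectX; they are not part of the posted class.

```cpp
#include <cassert>

// Hypothetical stand-in for D3DXVECTOR3 so the sketch builds without DirectX.
struct Vec3 { float x, y, z; };

// Advances a position by a velocity (assumed units per second) over dtSeconds.
// Scaling by the frame time keeps the travelled distance per second the same
// regardless of how many frames are rendered in that second.
Vec3 Integrate(Vec3 position, Vec3 velocity, float dtSeconds)
{
    assert(dtSeconds >= 0.0f);
    position.x += velocity.x * dtSeconds;
    position.y += velocity.y * dtSeconds;
    position.z += velocity.z * dtSeconds;
    return position;
}
```

With this form, two half-second steps and one full-second step end at the same position, which is the property the DeltaTime approach is meant to give.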

#2 Hodgman   Moderators   -  Reputation: 27702

Posted 04 November 2012 - 06:58 AM

Can you please define "lag"; do you mean that the time per frame increases?
Have you timed UpdateParticles to see how much CPU time it's consuming?
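
A minimal sketch of how such a timing could be taken, using only the standard `<chrono>` header. `MeasureMilliseconds` is a hypothetical helper name, not something from the posted code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs any callable once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Fn>
double MeasureMilliseconds(Fn&& fn)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    fn();
    const Clock::time_point end = Clock::now();
    const double ms =
        std::chrono::duration<double, std::milli>(end - start).count();
    assert(ms >= 0.0); // steady_clock is monotonic
    return ms;
}
```

It could be wrapped around the per-frame call, for example `MeasureMilliseconds([&] { particles.UpdateParticles(camPos, camTarget); })`, where `particles`, `camPos` and `camTarget` are placeholder names. This measures CPU time only; work the GPU does afterwards is not included.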

#3 Erik Rufelt   Crossbones+   -  Reputation: 3058

Posted 04 November 2012 - 07:02 AM

First, it seems like you have 1150 particles, not 150. Still, that shouldn't be all that slow. How much is it lagging?

Make sure you compile in Release, not Debug, and move those things you commented yourself outside the loop.
Then switch to only creating 4 vertices per quad instead of 6, but still use 6 indices. Indices can re-use vertices, so you only need 4 vertices and indices [0, 1, 2] and [0, 2, 3] for example, to make 2 triangles. This saves you some bandwidth.
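
The 4-vertex, 6-index layout described above can be sketched like this. `BuildQuadIndices` is a hypothetical helper, and the sketch assumes each quad's 4 vertices are stored consecutively and ordered around the quad's perimeter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the index list for quadCount quads that each own 4 consecutive
// vertices. Every quad contributes the triangles [0, 1, 2] and [0, 2, 3],
// offset by the index of the quad's first vertex.
std::vector<std::uint32_t> BuildQuadIndices(std::size_t quadCount)
{
    std::vector<std::uint32_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q)
    {
        const std::uint32_t base = static_cast<std::uint32_t>(q * 4);
        indices.push_back(base + 0);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        indices.push_back(base + 0);
        indices.push_back(base + 2);
        indices.push_back(base + 3);
    }
    assert(indices.size() == quadCount * 6);
    return indices;
}
```

Because these indices do not depend on particle positions, they would only need to be generated and uploaded once rather than every frame.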

If it's still not good enough, look into using a geometry shader, which can save you a lot of CPU time.

#4 KurtO   Members   -  Reputation: 210

Posted 04 November 2012 - 07:19 AM

Hodgman:
Can you please define "lag"; do you mean that the time per frame increases?
Have you timed UpdateParticles to see how much CPU time it's consuming?

My lag is like this:
I move my camera with a velocity vector of, say, (0, 0, 0.001f * deltaTime).
Without particles it feels like I am moving "fast".
But with all the particles I am moving "slow", even though the velocity vector is still the same.
I have not timed my particles; I don't know how.

Erik Rufelt:
1150 particles is correct, my mistake.
I also forgot to mention that I do a render-to-texture pass and use that texture to map a cube.
So I render everything twice, which should cut my performance by 50%, but I still think it is too slow.
The only things I draw are a 1500-vertex model and my particles + cube.

I think the index-buffer performance upgrade is the next thing to look into, but I still think something is wrong.
My plan is to draw at least 10 more 1000-vertex models in my level.

Hm...
I will move the code as suggested and try Release mode.

Edited by KurtO, 04 November 2012 - 07:24 AM.


#5 Erik Rufelt   Crossbones+   -  Reputation: 3058

Posted 04 November 2012 - 08:29 AM

Try displaying deltaTime on the screen, and measure the difference in milliseconds. If you compare drawing 1000 particles to not drawing anything at all, then it should be much slower. Even something that is very fast is infinitely slower than something that takes zero time. Drawing nothing is close to zero.
If you aim for 60 frames per second, that gives you a max deltaTime of ~16.7 milliseconds, so compare the time taken to draw 1000 particles to that, and see what percentage of the target time is spent.
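
The budget comparison can be sketched as follows. `FrameBudgetMs` and `PercentOfBudget` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>

// Frame budget in milliseconds for a target frame rate
// (60 FPS gives roughly 16.67 ms).
double FrameBudgetMs(double targetFps)
{
    assert(targetFps > 0.0);
    return 1000.0 / targetFps;
}

// Share of the frame budget, in percent, used by a measured cost in ms.
double PercentOfBudget(double costMs, double targetFps)
{
    return 100.0 * costMs / FrameBudgetMs(targetFps);
}
```

For instance, a measured particle cost of 4 ms against a 60 FPS target would come out at about 24% of the budget.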

Edited by Erik Rufelt, 04 November 2012 - 08:29 AM.


#6 Ashaman73   Crossbones+   -  Reputation: 6729

Posted 04 November 2012 - 09:44 AM

Without particles it feels like i am moving "fast".

We need some numbers. Get the free version of Fraps to at least display the FPS or, better, incorporate some kind of time measurement into your code.

Do you send the particles to the GPU in a single batch, or are you using a batch for each particle? The latter will most likely slow down your performance even for only 1150 particles. Another issue would be painting 1150 large particles, which could result in a huge overdraw rate, another reason for a slowdown.

Best to provide some more data and a screenshot.

#7 KurtO   Members   -  Reputation: 210

Posted 04 November 2012 - 10:27 AM

FRAPS was a very good idea!

When I have 1500 particles at the beginning, all in the same place (0,0,0) with the player really close to them, my frame rate is down to 14 FPS.
But when I shoot them away and they are away from the player, I get around 250~400 FPS.
When the particles are far, far away, I get as high as 550 FPS.

It feels like I can't draw my particles close together in the same place...

Edited by KurtO, 04 November 2012 - 10:28 AM.


#8 mhagain   Crossbones+   -  Reputation: 7446

Posted 04 November 2012 - 10:36 AM

That's normal enough - you're getting heavy overdraw and bottlenecking on fillrate here. You are probably covering a good-ish percentage of the entire screen area 1500 times, which will bring any GPU to its knees.
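
A rough back-of-the-envelope sketch of what that overdraw means in pixels. The resolution and the per-particle screen coverage used in the example are assumptions chosen for illustration; neither is stated in the thread.

```cpp
#include <cassert>

// Rough number of pixels shaded per frame when particleCount particles
// each cover a fraction `coverage` (0..1) of a width x height screen.
// Ignores clipping and early rejection; it is only an order-of-magnitude aid.
double ShadedPixelsPerFrame(int particleCount, double coverage,
                            int width, int height)
{
    assert(particleCount >= 0 && width >= 0 && height >= 0);
    assert(coverage >= 0.0 && coverage <= 1.0);
    return static_cast<double>(particleCount) * coverage * width * height;
}
```

With the assumed values of 1500 particles each covering 25% of a 1280x720 screen, this gives 345,600,000 blended pixels per frame, compared with 921,600 pixels for a single full-screen pass.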

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#9 KurtO   Members   -  Reputation: 210

Posted 05 November 2012 - 01:45 AM

Suddenly I have more respect for the game engines out there. It feels impossible to get the visuals they do from my hardware. =)

I will try to implement an indexed vertex buffer so that 2 of the 6 vertices of my two triangles are shared, as Erik said.
Maybe that will lift the performance a little bit.

Also, how do you make the color black transparent?

If I have alpha blending on, the FPS drops even more...

#10 papulko   Members   -  Reputation: 790

Posted 05 November 2012 - 04:19 AM

I would recommend that you also make use of the geometry shader stage; that way you only have to use one vertex for each sprite. Here's a good article on how to do it:
http://takinginitiative.net/2011/01/12/directx10-tutorial-9-the-geometry-shader/

#11 Erik Rufelt   Crossbones+   -  Reputation: 3058

Posted 05 November 2012 - 07:57 AM

In your pixel-shader, try something like:
if(color.a == 0)
discard;

Whether it's faster or not is hard to say. As your problem is clearly fillrate, and your card is a few years old, there might not be all too much that you can do, other than making the particles smaller on the screen.

One technique you can try to reduce fillrate is to draw polygons that aren't squares or quads, so that your particles take up as little area as possible on screen, as shown for example here: http://www.humus.name/index.php?page=Comments&ID=266

#12 KurtO   Members   -  Reputation: 210

Posted 05 November 2012 - 09:00 AM

papulko, using a geometry shader is clearly my next step. When the game is finished I might "upgrade" that part. It seems really nice to render all particles on the GPU.

Erik, the color.a == 0 check looks like a good way to sort this out.

I will definitely try to use only a triangle, with texture coordinates set so that my texture is in the middle. Because of my transparency I really don't need a quad if my texture fits inside my triangle! This is really smart!

Correct me if I am wrong, but if I render all my triangles at different positions, I won't gain any performance from an index buffer because all my vertices will be in separate places, right?

#13 KurtO   Members   -  Reputation: 210

Posted 05 November 2012 - 09:16 AM

By the way:
Is it better to have a vertex buffer that contains ALL particles and only update the positions,

OR

to create a new vertex buffer with only the particles that are alive and then SWAP that vertex buffer each frame?

#14 Erik Rufelt   Crossbones+   -  Reputation: 3058

Posted 05 November 2012 - 09:43 AM

Probably only update the alive ones.
However, in your case this is most likely irrelevant. As you get high FPS when particles are far away, your vertices are not limiting your performance. Because of this, both index buffers and geometry shaders will gain you very little.
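
For reference, updating only the alive particles could look roughly like the sketch below. It assumes a per-particle `alive` flag, which the posted Particle class does not have; `Vec3`, `LiveParticle` and `PackAlivePositions` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for D3DXVECTOR3 so the sketch builds without DirectX.
struct Vec3 { float x, y, z; };

// Hypothetical particle with an explicit alive flag.
struct LiveParticle
{
    Vec3 position;
    bool alive;
};

// Copies the positions of alive particles to the front of `out` and returns
// how many were written. `out` must have room for every particle.
std::size_t PackAlivePositions(const std::vector<LiveParticle>& particles,
                               std::vector<Vec3>& out)
{
    assert(out.size() >= particles.size());
    std::size_t count = 0;
    for (const LiveParticle& p : particles)
    {
        if (p.alive)
            out[count++] = p.position;
    }
    return count;
}
```

The returned count would then decide how many quads are built and drawn that frame, so the buffer keeps its full size and only the draw count changes.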

Using triangles instead of quads could be better or worse, and you probably want to use something like 8-corner polygons. Look again at the page I linked. The only thing that matters for you is how many pixels are covered on the screen. If you use 10 vertices to cover 80% as many pixels, then that is a win.
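
To put a number on the pixel savings, here is a small sketch that compares the area of a regular polygon with the quad it would replace. It assumes, purely for illustration, that the visible part of the particle texture is a circle touching the quad's edges; real textures would need their own fit. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Area of a regular polygon with `sides` edges that tightly encloses
// (circumscribes) a circle of the given radius: n * r^2 * tan(pi / n).
double EnclosingPolygonArea(int sides, double radius)
{
    assert(sides >= 3 && radius >= 0.0);
    const double pi = 3.14159265358979323846;
    return sides * radius * radius * std::tan(pi / sides);
}

// Fraction of the quad's area covered by a `sides`-gon around the same circle.
double AreaRelativeToQuad(int sides)
{
    return EnclosingPolygonArea(sides, 1.0) / EnclosingPolygonArea(4, 1.0);
}
```

Under that assumption an octagon covers about 83% of the pixels of the quad, which is in the same range as the 80% figure mentioned above.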

Your graphics card does two things for you:
1. Transform vertices
2. Fill pixels

As your performance is much worse when your particles are close, it means that step 1 is cheap for you and doesn't matter very much. Index-buffers and geometry shaders improve step 1 to be even better. If you get 500 FPS when particles are far away and 14 FPS when particles are close, that gives approximately:
Step 1: 2 milliseconds
Step 2: 70 milliseconds

That means if you make step 1 twice as fast, your FPS close will still be close to 14. So it does not matter much at all.
If you make Step 2 twice as fast, that makes a much larger difference, even if Step 1 gets slower by increasing the vertex count. So choose vertices so that you cover the least number of pixels, if you want many particles covering a large number of pixels on the screen.
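
The arithmetic behind these estimates can be written out as a short sketch. It follows the approximation used above, treating the far-away frame time as the vertex cost (step 1) and the extra time when close as the fill cost (step 2); the helper names are hypothetical.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double FrameMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate that results from a vertex cost and a fill cost, both in ms.
double FpsFromCosts(double vertexMs, double fillMs)
{
    assert(vertexMs + fillMs > 0.0);
    return 1000.0 / (vertexMs + fillMs);
}
```

With 500 FPS far away and 14 FPS close up, step 1 is `FrameMs(500)` = 2 ms and step 2 is `FrameMs(14) - 2`, roughly 69 ms. Halving step 1 gives `FpsFromCosts(1, 69.4)`, about 14.2 FPS, while halving step 2 gives `FpsFromCosts(2, 34.7)`, about 27.2 FPS.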

However, no matter what you do it is likely impossible to get 1000 particles covering a large part of the screen on your graphics card, it's simply too many pixels. You have to make your particles a bit smaller or draw fewer particles when they get close. If you have 1000 particles very close to the screen, most won't be visible, so you can maybe sort them and remove those behind others or similar.

#15 KurtO   Members   -  Reputation: 210

Posted 05 November 2012 - 09:53 AM

Erik, thank you so much for your explanation and for taking the time to write your answer.
Now I finally understand that screen pixel coverage is my problem.

My optimization will be smaller particles and drawing fewer of them when close; that should do the trick!

Again, thank you very much.

Edited by KurtO, 05 November 2012 - 09:54 AM.


#16 KurtO   Members   -  Reputation: 210

Posted 05 November 2012 - 11:38 AM

Holy shit!

You know what you are talking about!
Making the particles 0.05f in width/height instead of 1.0f makes the particles SUPERFAST!
The fill rate is down and the speed is UP!

5000 particles at the same position ~ 200 FPS
and all around the place = 450 FPS, hardly any drop at all!

COOOOL!

As you said, Erik, I have not optimized the indices or the quads etc.; just the size of the particles made it superfast!

thanks again.

#17 phil_t   Crossbones+   -  Reputation: 3204

Posted 05 November 2012 - 11:44 AM

Another fairly easy thing you can do is when particles get closer and take up large portions of the screen, you can automatically fade them out, until the point where you don't draw them anymore. Of course this decision has to be made in the vertex shader (or earlier) to avoid the pixel shading cost.
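
A minimal sketch of such a distance-based fade, done on the CPU before the vertices are built (one way of making the decision "earlier" than the vertex shader). The function name and the two distance thresholds are hypothetical and would need tuning per scene.

```cpp
#include <cassert>

// Returns an opacity multiplier in [0, 1] for a particle at `distance` from
// the camera: 0 at or inside fadeEnd (the particle can be skipped entirely),
// 1 at or beyond fadeStart, and a linear ramp in between.
float NearFadeAlpha(float distance, float fadeEnd, float fadeStart)
{
    assert(fadeStart > fadeEnd);
    if (distance <= fadeEnd)
        return 0.0f;
    if (distance >= fadeStart)
        return 1.0f;
    return (distance - fadeEnd) / (fadeStart - fadeEnd);
}
```

Particles for which this returns 0 would simply not be written to the vertex buffer, so they cost no fill rate at all.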

Another much more complicated optimization is to render the particles to a lower resolution render target and apply them to the scene afterward: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch23.html

#18 KurtO   Members   -  Reputation: 210

Posted 05 November 2012 - 02:14 PM

Thanks phil_t, after the engine is "done" I will look into this for optimization.



