
DX11 8800GTX and only 150 Particles before lagging? Help. =)


My method of rendering a particle system is:

Keep a list of particles, each with a
position
velocity.

Loop over the list and create two triangles for each particle in a vertex buffer (at this stage the created triangles are in the right position with the right rotation), then copy the vertex buffer every frame.

This gives me very poor performance.

I have an 8800GTX card and can only render 150 particles... come on... 150 particles before it starts lagging. There must be some big problem with my code.

My Particle class is simple and I have tried to comment every function.
Please let me know if you see something bad.

Another thing: how come the movement is "slow" when particles are visible? I scale everything by DeltaTime, so shouldn't the movement/velocity of my player stay the same even if there is too much drawn on the scene?
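Side note worth checking: in UpdateParticles below, the position is advanced by the raw velocity every frame (position += velocity) with no DeltaTime factor, so particle speed will vary with frame rate. A minimal sketch of a frame-rate-independent update (Vec3 is a stand-in for D3DXVECTOR3):

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Scale the displacement by the frame's elapsed time (in seconds) so that
// movement speed is independent of how many frames are rendered per second.
Vec3 Integrate(Vec3 position, const Vec3& velocity, float deltaTime)
{
    position.x += velocity.x * deltaTime;
    position.y += velocity.y * deltaTime;
    position.z += velocity.z * deltaTime;
    return position;
}
```

With this, one 0.5 s frame and two 0.25 s frames move a particle the same total distance.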

Here is my Particle class.

[source lang="cpp"]#pragma once
#include <d3d11.h>
#include <d3dx11.h>
#include <d3dx10.h>
#include <vector>

class Particle {
public:
D3DXVECTOR3 position;
D3DXVECTOR3 velocity;
float time;
};

class ParticleSystem
{
private:
struct VERTEX {FLOAT X, Y, Z; D3DXVECTOR3 Normal; FLOAT U, V; D3DXCOLOR Color;};

D3D11_MAPPED_SUBRESOURCE ms;
ID3D11Buffer *m_vertexBuffer, *m_indexBuffer;
int m_vertexCount;
int m_indexCount;
int number_of_particles;
VERTEX* model_vertices;
DWORD* model_indicies;
std::vector<Particle> lstParticles;
int CurrentParticle;

public:

//This is just run once to create all particles.
void AddParticles() {

float width = 1.0f;
float height = 1.0f;

for (int i = 0; i < 1150;i++) {

/*float rx = (float)rand()/((float)RAND_MAX/0.01f);
float ry = (float)rand()/((float)RAND_MAX/0.001f);
float rz = (float)rand()/((float)RAND_MAX/0.01f);*/
Particle p;
p.position = D3DXVECTOR3(0,0,0);
p.velocity = D3DXVECTOR3(0,0,0);
lstParticles.push_back(p);
}
}


//Set new position and new Velocity.
void Reset(D3DXVECTOR3 start, D3DXVECTOR3 velocity) {

lstParticles[CurrentParticle].position = start;
lstParticles[CurrentParticle].velocity = velocity;
CurrentParticle++;
if (CurrentParticle>=lstParticles.size())
CurrentParticle=0;

}

//This is run every frame. Here is where I set the position and create
//two triangles from a particle's position.
//This makes it easy to maintain a list of particles with one position instead of six.

void UpdateParticles(D3DXVECTOR3 mPos,D3DXVECTOR3 mView) {
//float width = 1.0f;
//float height = 1.0f;

D3DXCOLOR particleColor(1.0f,1.0f,1.0f,0.5f);

for (int i=0;i<lstParticles.size();i++) {
int v_index = i*6;
D3DXVECTOR3 particlePos = lstParticles[i].position;

D3DXVECTOR3 look = mView - mPos;
D3DXVec3Normalize(&look,&look);

//This could be moved outside the loop because it is the same for every particle
D3DXVECTOR3 camUp(0,1,0);
D3DXVec3Normalize(&camUp,&camUp);

D3DXVECTOR3 right;
D3DXVec3Cross(&right,&camUp,&look);
D3DXVec3Normalize(&right,&right);

D3DXVECTOR3 up;
D3DXVec3Cross(&up,&look,&right);
D3DXVec3Normalize(&up,&up);

//up = up * height;
//right = right * width;

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 0;
model_vertices[v_index].V = 0;
model_vertices[v_index].X = particlePos.x - right.x * 0.5f + up.x;
model_vertices[v_index].Y = particlePos.y - right.y * 0.5f + up.y;
model_vertices[v_index].Z = particlePos.z - right.z * 0.5f + up.z;
v_index++;

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 0;
model_vertices[v_index].V = 1;
model_vertices[v_index].X = particlePos.x + right.x * 0.5f + up.x;
model_vertices[v_index].Y = particlePos.y + right.y * 0.5f + up.y;
model_vertices[v_index].Z = particlePos.z + right.z * 0.5f + up.z;
v_index++;

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 1;
model_vertices[v_index].V = 0;
model_vertices[v_index].X = particlePos.x - right.x * 0.5f;
model_vertices[v_index].Y = particlePos.y - right.y * 0.5f;
model_vertices[v_index].Z = particlePos.z - right.z * 0.5f;
v_index++;

//Second Triangle

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 1;
model_vertices[v_index].V = 0;
model_vertices[v_index].X = particlePos.x - right.x * 0.5f;
model_vertices[v_index].Y = particlePos.y - right.y * 0.5f;
model_vertices[v_index].Z = particlePos.z - right.z * 0.5f;
v_index++;

model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 0;
model_vertices[v_index].V = 1;
model_vertices[v_index].X = particlePos.x + right.x * 0.5f + up.x;
model_vertices[v_index].Y = particlePos.y + right.y * 0.5f + up.y;
model_vertices[v_index].Z = particlePos.z + right.z * 0.5f + up.z;
v_index++;


model_vertices[v_index].Color = particleColor;
model_vertices[v_index].U = 1;
model_vertices[v_index].V = 1;
model_vertices[v_index].X = particlePos.x + right.x * 0.5f;
model_vertices[v_index].Y = particlePos.y + right.y * 0.5f;
model_vertices[v_index].Z = particlePos.z + right.z * 0.5f;
v_index++;

//update position with velocity
lstParticles[i].position+=lstParticles[i].velocity;
}

}

//Create the vertex buffer with (number of particles * 6) vertices, because we render two triangles for each quad.
//This is because I don't know how to draw a TRIANGLE_STRIP at different positions; something with a strip restart, but I think
//that only works with shaders.
void Init(ID3D11Device* dev) {
CurrentParticle = 0;

number_of_particles = lstParticles.size();
m_vertexCount = (number_of_particles * 6);
m_indexCount = (number_of_particles * 6);

model_vertices = new VERTEX[m_vertexCount];
model_indicies = new DWORD[m_indexCount];

//This might be a problem? The indices never reuse a vertex, so the index buffer is as big as the vertex buffer.
for (int i = 0; i<(number_of_particles * 6);i++) {
model_indicies[i] = i;
}

// create the vertex buffer
D3D11_BUFFER_DESC bd;
ZeroMemory(&bd, sizeof(bd));

bd.Usage = D3D11_USAGE_DYNAMIC;
bd.ByteWidth = sizeof(VERTEX) * m_vertexCount;
bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;

dev->CreateBuffer(&bd, NULL, &m_vertexBuffer);

// create the index buffer
bd.Usage = D3D11_USAGE_DYNAMIC;
bd.ByteWidth = sizeof(DWORD) * m_indexCount;
bd.BindFlags = D3D11_BIND_INDEX_BUFFER;
bd.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
bd.MiscFlags = 0;

dev->CreateBuffer(&bd, NULL, &m_indexBuffer);



}

int GetIndexCount() {
return m_indexCount;
}

//This method is run EVERY frame. It takes the updated vertex array and copies it into the GPU buffer.
void CopyAndSetBuffers(ID3D11DeviceContext* devcon) {


// select which vertex buffer to display
UINT stride = sizeof(VERTEX);
UINT offset = 0;

// copy the vertices into the buffer
//THIS uses the D3D11_MAP_WRITE_DISCARD so it should be ok for updating every frame, right?
devcon->Map(m_vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &ms); // map the buffer
memcpy(ms.pData, model_vertices, sizeof(VERTEX) * m_vertexCount); // copy the data
devcon->Unmap(m_vertexBuffer, 0);
//copy the index buffer
//THIS uses the D3D11_MAP_WRITE_DISCARD so it should be ok for updating every frame, right?
devcon->Map(m_indexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &ms); // map the buffer
memcpy(ms.pData, model_indicies, sizeof(DWORD) * m_indexCount); // copy the data
devcon->Unmap(m_indexBuffer, 0);

devcon->IASetVertexBuffers(0, 1, &m_vertexBuffer, &stride, &offset);
devcon->IASetIndexBuffer(m_indexBuffer, DXGI_FORMAT_R32_UINT, 0);
}

void Clean() {
m_indexBuffer->Release();
m_vertexBuffer->Release();
delete[] model_vertices;
delete[] model_indicies;
}


};[/source]

First, it seems like you have 1150 particles, not 150. Still, that shouldn't be too slow... how much is it lagging?

Make sure you compile in Release, not Debug, and move the things you commented on yourself outside the loop.
Then switch to creating only 4 vertices per quad instead of 6, but still use 6 indices. Indices can re-use vertices, so you only need 4 vertices and the indices [0, 1, 2] and [0, 2, 3], for example, to make 2 triangles. This saves you some bandwidth.
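As a rough sketch of that 4-vertex / 6-index layout (using a hypothetical BuildQuadIndices helper; the vertex format and buffer creation stay as they are):

```cpp
#include <cstdint>
#include <vector>

// For each quad, emit 4 vertices elsewhere and these 6 indices, which reuse
// vertices 0 and 2: triangles [0,1,2] and [0,2,3], offset by 4 per quad.
std::vector<uint32_t> BuildQuadIndices(std::size_t quadCount)
{
    std::vector<uint32_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t i = 0; i < quadCount; ++i) {
        const uint32_t base = static_cast<uint32_t>(i * 4);
        const uint32_t quad[6] = { base, base + 1, base + 2,
                                   base, base + 2, base + 3 };
        indices.insert(indices.end(), quad, quad + 6);
    }
    return indices;
}
```

Since this pattern never changes, the index buffer can be built once at init time and no longer needs to be dynamic or re-mapped every frame; only the now-smaller vertex buffer has to be updated.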

If it's still not good enough, look into using a geometry shader, which can save you a lot of CPU time.

HodgeMan:
Can you please define "lag"; do you mean that the time per frame increases?
Have you timed UpdateParticles to see how much CPU time it's consuming?

My lag is like this:
I move my camera with a velocity vector, let's say (0, 0, 0.001f*deltaTime).
Without particles it feels like I am moving "fast".
But with all particles I am moving "slow", even though the velocity vector is still the same.
I have not timed my particles; I don't know how.

Erik Rufelt:
1150 particles, correct, my mistake.
I also forgot to mention that I render to a texture and use that texture to map a cube.
So I render everything twice, which should cut my performance by 50%, but I still think it is too slow.
The only things I draw are a 1500-vertex model and my particles + cube.

I think the index performance upgrade is the next thing to look into, but I still think something is wrong.
My plan is to draw at least 10 more 1000-vertex models in my level.

Hm...
I will move the code as in my samples and try Release mode. Edited by KurtO

Try displaying deltaTime on the screen, and measure the difference in milliseconds. If you compare drawing 1000 particles to not drawing anything at all, then it should be much slower. Even something that is very fast is infinitely slower than something that takes zero time. Drawing nothing is close to zero.
If you aim for 60 frames per second, that gives you a max deltaTime of ~16.5 milliseconds, so compare the time taken to draw 1000 particles to that, and see what percentage of the target time is spent. Edited by Erik Rufelt
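A minimal way to get those millisecond numbers in code (a sketch using std::chrono; call Tick() once per frame):

```cpp
#include <chrono>

// Returns the time elapsed since the previous Tick() call, in milliseconds.
class FrameTimer {
    std::chrono::steady_clock::time_point m_last = std::chrono::steady_clock::now();
public:
    double Tick()
    {
        const auto now = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(now - m_last).count();
        m_last = now;
        return ms;
    }
};
```

Compare the value with the particle draw enabled against the value with it skipped; against a 60 FPS budget of ~16.7 ms, the difference tells you how much of the frame the particles cost.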

[quote name='KurtO' timestamp='1352035146' post='4997157']
Without particles it feels like i am moving "fast".
[/quote]
We need some numbers. Get the free version of FRAPS to display the FPS at least, or better, incorporate some kind of time measurement in your code.

Do you send the particles in a single batch to the GPU, or are you using a batch for each particle? The latter will most likely slow down your performance even for only 1150 particles. Another issue would be drawing 1150 large particles, which could result in a huge overdraw rate, another reason for a slowdown.

Best to provide some more data and a screenshot.

FRAPS was a very good idea!

When I have 1500 particles at the start, all at the same place (0,0,0), and the player really close to them, my FPS is down to 14.
But when I shoot them away and they are away from the player I get around 250~400 FPS.
When the particles are far, far away I get as high as 550 FPS.

It feels like I can't draw my particles close together at the same place... Edited by KurtO

That's normal enough - you're getting heavy overdraw and bottlenecking on fillrate here. You're probably covering a good percentage of the entire screen area 1500 times, which will bring any GPU to its knees.

Suddenly I have more respect for the game engines out there. It feels impossible to get the visuals they do from my hardware. =)

I will try implementing an indexed vertex buffer to reuse 2 of the 6 vertices of my two triangles, as Erik said.
Maybe that will lift the performance a little bit.

Also, how do you get transparency for the color black?

If I have alpha blending on, the FPS drops even more...

I would recommend that you also make use of the geometry shader stage; that way you only have to use one vertex for each sprite. Here's a good article on how to do it:
[url="http://takinginitiative.net/2011/01/12/directx10-tutorial-9-the-geometry-shader/"]http://takinginitiative.net/2011/01/12/directx10-tutorial-9-the-geometry-shader/[/url]

In your pixel-shader, try something like:
if(color.a == 0)
discard;

Whether it's faster or not is hard to say. As your problem is clearly fillrate, and your card is a few years old, there might not be all too much that you can do, other than making the particles smaller on the screen.

One technique you can try to reduce fillrate is to draw polygons that aren't squares or quads, so that you get as little area as possible on screen for your particles, like shown for example here: [url="http://www.humus.name/index.php?page=Comments&ID=266"]http://www.humus.name/index.php?page=Comments&ID=266[/url]
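The saving from tighter polygons can be estimated with a bit of geometry. Assuming a round particle texture inscribed in the polygon, the tightest regular n-gon that still contains a circle of radius r has area n·r²·tan(π/n); a small sketch:

```cpp
#include <cmath>

// Area of the smallest regular n-gon circumscribed around a circle of
// radius r, i.e. the tightest n-gon that still contains a round particle.
double CircumscribedPolygonArea(int n, double r)
{
    const double pi = 3.14159265358979323846;
    return n * r * r * std::tan(pi / n);
}
```

For r = 1, a quad (n = 4) covers 4.0 units of screen area while an octagon (n = 8) covers about 3.31, so an octagon spends roughly 17% less fillrate on the same particle.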

papulko, using a geometry shader is clearly my next step. When the game is finished I might "upgrade" that part. It seems really nice to render all particles on the GPU.

Erik, the color.a == 0 check looks like a good way to sort this out.

I will definitely try to use only a triangle with texture coords so that my texture is in the middle; because of my transparency I really don't need a quad if my texture fits inside my triangle! This is really smart!

Correct me if I am wrong, but if I render all my triangles at different positions, I won't gain any performance from an index buffer because all my vertices will be in separate places, right?

By the way:
Is it better to have a vertex buffer that contains ALL particles and only update positions,

OR

create a new vertex buffer with only the particles that are alive, and then swap that vertex buffer each frame?

Probably only update the alive ones...
However, in your case this is most likely irrelevant. As you get high FPS when particles are far away, your vertices are not limiting your performance. Because of this, both index buffers and geometry shaders will gain very little.

Using triangles instead of quads could be better or worse, and you probably want to use something like 8-corner polygons. Look again at the page I linked. The only thing that matters for you is how many pixels are covered on the screen. If you use 10 vertices to cover 80% as many pixels, then that is a win.

Your graphics card does two things for you:
1. Transform vertices
2. Fill pixels

As your performance is much worse when your particles are close, it means that step 1 is cheap for you and doesn't matter very much. Index-buffers and geometry shaders improve step 1 to be even better. If you get 500 FPS when particles are far away and 14 FPS when particles are close, that gives approximately:
Step 1: 2 milliseconds
Step 2: 70 milliseconds

That means if you make step 1 twice as fast, your FPS close will still be close to 14. So it does not matter much at all.
If you make Step 2 twice as fast, that makes a much larger difference, even if Step 1 gets slower by increasing the vertex count. So choose vertices so that you cover the least number of pixels, if you want many particles covering a large number of pixels on the screen.
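Those numbers fall straight out of frame time (ms) = 1000 / FPS; a small sketch of the arithmetic:

```cpp
#include <cassert>

// Convert a frame rate into a per-frame time budget in milliseconds.
double FrameTimeMs(double fps)
{
    return 1000.0 / fps;
}
```

So 500 FPS is 2 ms per frame and 14 FPS is about 71 ms; even if the ~2 ms vertex work were halved, the close-up frame would still take roughly 70 ms and the FPS would barely move.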

However, no matter what you do it is likely impossible to get 1000 particles covering a large part of the screen on your graphics card, it's simply too many pixels. You have to make your particles a bit smaller or draw fewer particles when they get close. If you have 1000 particles very close to the screen, most won't be visible, so you can maybe sort them and remove those behind others or similar.

Erik, thank you so much for your explanation and for the time you took to write your answer.
Now I finally understand that it is the screen pixel coverage that is my problem.

My optimization will be smaller particles and drawing fewer when close; that should do the trick!

Again, thank you very much. Edited by KurtO

Holy shit!

You know what you are talking about!
Making the particles 0.05f width/height instead of 1.0f makes them SUPER FAST!
The fillrate is down and the speed is UP!

5000 particles at the same position ~ 200 FPS,
and spread all around the place = 450 FPS, hardly any drop at all!

COOOOL!

As you said, Erik, I have not optimized indices or quads etc.; just the size of the particles made it super fast!

Thanks again.

Another fairly easy thing you can do is when particles get closer and take up large portions of the screen, you can automatically fade them out, until the point where you don't draw them anymore. Of course this decision has to be made in the vertex shader (or earlier) to avoid the pixel shading cost.
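A sketch of such a fade factor, written on the CPU side for clarity (in practice you would evaluate this in the vertex shader from the camera distance; fadeStart/fadeEnd are made-up tuning parameters):

```cpp
// Returns 0 when the particle is at or nearer than fadeStart, 1 at or
// beyond fadeEnd, and a linear ramp in between. Particles with factor 0
// can be skipped entirely, saving their pixel-shading cost.
float FadeFactor(float distance, float fadeStart, float fadeEnd)
{
    if (distance <= fadeStart) return 0.0f;
    if (distance >= fadeEnd) return 1.0f;
    return (distance - fadeStart) / (fadeEnd - fadeStart);
}
```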

Another much more complicated optimization is to render the particles to a lower resolution render target and apply them to the scene afterward: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch23.html
