Only 5.000 Sprites

Started by
11 comments, last by NicoG 13 years ago
Some thoughts:
1. Frames per second doesn't tell you much; use ms to compare stuff.
2. Try to test if you are fill-rate bound: render the particles without updating them every frame.
3. Try to shrink your vertex size.
4. Can you use point sprites instead of your quads? That's possible if you don't rotate the particles and they face the camera.
4.1. Experiment with geometry shaders.
5. Use instancing; for particles, a texture or (with OGL3) transform feedback is probably the best.
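To illustrate point 1: FPS is not linear in cost, so dropping from 100 to 50 FPS costs 10 ms per frame, while dropping from 20 to 10 FPS costs 50 ms. A minimal sketch of comparing in milliseconds follows; the helper names `fpsToMilliseconds` and `measureMilliseconds` are made up for this example and are not part of any library mentioned in the thread.

```cpp
#include <chrono>

// Convert a frames-per-second figure into milliseconds per frame.
inline double fpsToMilliseconds(double fps)
{
    return 1000.0 / fps;
}

// Time a single call of `work` in milliseconds using a monotonic clock.
// `work` is any callable, e.g. a lambda wrapping the update or draw step.
template <typename Fn>
double measureMilliseconds(Fn&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Timing the update and the draw call separately with something like this should show which of the two actually grows with the sprite count. Note that it only measures CPU time; GPU work that is still queued is not included.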
Your update function is doing a lot of memory allocation and releasing - are you calling this every frame? If so it's going to be a potential bottleneck as the numbers get higher.

I wouldn't use C++ new and delete for this kind of thing at all to be honest. On Windows I'd VirtualAlloc (with MEM_RESERVE) a large-ish pool (16-64 MB, but it depends on how many you want to draw really) and VirtualAlloc (with MEM_COMMIT) as needed, but never release memory so that I could reuse previous allocations from frame to frame. Otherwise I'd create an initial static pool of objects that's large enough to cover most common uses, and only allocate/release when the size needs to change (and even then only when it needs to go up). The key however is to always be able to reuse previously allocated memory instead of having to allocate fresh memory every frame. Memory is cheap and plentiful, performance is not.
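A portable sketch of the "static pool that only grows" idea, assuming a hypothetical `Vertex` layout and a made-up `GrowOnlyBuffer` class (neither is from the original code). It uses `std::vector` rather than VirtualAlloc, so it is only an approximation of the reserve/commit approach described above:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex layout, for illustration only.
struct Vertex
{
    float x, y, u, v;
};

// Scratch buffer that is reused from frame to frame and never shrinks.
class GrowOnlyBuffer
{
public:
    // Returns storage for at least `count` vertices.
    // Memory is only (re)allocated when `count` exceeds what we already hold.
    Vertex* acquire(std::size_t count)
    {
        if (count > m_storage.size())
        {
            m_storage.resize(count);
        }
        return m_storage.data();
    }

    std::size_t capacity() const { return m_storage.size(); }

private:
    std::vector<Vertex> m_storage;
};
```

Once the buffer has grown to the largest size a scene needs, every later frame reuses the same memory and `acquire` costs one comparison.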

I wouldn't use a class either (or even a struct with a constructor) for something that I need to dynamically allocate tens of thousands of every frame. Here you're allocating a buffer, you're memset-0'ing it, you're updating the data, you're memcpy'ing it, then you're glBufferSubData'ing it. This actually needs to touch the data for each vertex five times: constructor, memset-0, update for real, memcpy, and glBufferSubData. That's a lot of walking over the same data, and a lot of unnecessary value-storing (are the constructor and the memset-0 even necessary?), all for a chunk of data that you just end up throwing away. Just use a simple lightweight C-style struct and maintain the count of items for it separately.
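As a rough sketch of what that could look like: a plain struct with no constructor, and a function that writes a quad's six vertices straight into the destination buffer in one pass while the caller keeps the running count. `SpriteVertex` and `writeQuad` are hypothetical names for illustration, not the poster's actual types.

```cpp
#include <cstddef>
#include <type_traits>

// Plain C-style vertex: no constructor, so creating an array of these
// does not touch the memory at all.
struct SpriteVertex
{
    float x, y; // position
    float u, v; // texture coordinates
};

static_assert(std::is_trivial<SpriteVertex>::value,
              "SpriteVertex must stay a trivial type");

// Writes the 6 vertices (two triangles) of an axis-aligned quad directly
// into `out` and returns the number of vertices written.
inline std::size_t writeQuad(SpriteVertex* out, float x, float y, float w, float h)
{
    const SpriteVertex topLeft     = { x,     y,     0.0f, 0.0f };
    const SpriteVertex topRight    = { x + w, y,     1.0f, 0.0f };
    const SpriteVertex bottomLeft  = { x,     y + h, 0.0f, 1.0f };
    const SpriteVertex bottomRight = { x + w, y + h, 1.0f, 1.0f };

    out[0] = topLeft;  out[1] = bottomLeft; out[2] = topRight;
    out[3] = topRight; out[4] = bottomLeft; out[5] = bottomRight;
    return 6;
}
```

Each vertex is written once on the CPU side, and the only remaining copy is the upload to OpenGL.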

The point though is that too much OO at too fine a level can really hurt performance, and that if you want your rendering to really fly you may need to start doing things that you consider less "clean".

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Thanks for all the good suggestions.
I have eliminated the memset, but I still need the memcpy.
I also eliminated the redundant OpenGL calls.
I have also eliminated the allocation of the buffer for sending the data to OpenGL and replaced it with a lightweight, but "safe" solution:

NLVertexData* NLSpriteList::createData(u32 delta, u32 time)
{
    TSpriteList::iterator it = m_list.begin();
    u32 size = m_list.size()*6;   // 6 vertices (two triangles) per sprite
    u32 i = 0;                    // write offset into m_buffer, in vertices

    // Alloc more space only if there is a need to do so.
    if ( m_buffer_size < size )
    {
        delete [] m_buffer;
        m_buffer_size = size;
        m_buffer = new NLVertexData[m_buffer_size];
    }

    // Create
    for ( ; it != m_list.end(); ++it )
    {
        (*it).update(delta, time);
        // Copy this sprite's 6 vertices to its own slot in the buffer.
        memcpy(&m_buffer[i], &it->m_vertices, sizeof(NLVertexData)*6);
        i += 6;
    }
    return m_buffer;
}

This reduces the usage of the new operator greatly, since normally the size of a batcher rarely changes atm. This depends on how my library is used though, but every lib has its "good practice" section in the docs huh? :D

However, I will examine "placement new" to allocate memory ahead for the batcher and also the created sprites on client side. This way I could save some considerable performance. And it is not platform-dependent like VirtualAlloc().
I just don't know yet how to implement it in detail.
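For reference, a minimal sketch of one way placement new could be used here: allocate raw memory once, then construct sprites into it without any further heap allocation. The `Sprite` and `SpriteArena` types below are assumptions made up for this example, not NightLight2D classes.

```cpp
#include <cstddef>
#include <new>

// Hypothetical sprite type with a constructor, for illustration only.
struct Sprite
{
    float x, y;
    Sprite(float px, float py) : x(px), y(py) {}
};

// Fixed-capacity arena: one raw allocation up front, objects are
// constructed in place with placement new.
class SpriteArena
{
public:
    explicit SpriteArena(std::size_t capacity)
        : m_raw(static_cast<unsigned char*>(::operator new(capacity * sizeof(Sprite)))),
          m_capacity(capacity),
          m_count(0)
    {
    }

    ~SpriteArena()
    {
        clear();
        ::operator delete(m_raw);
    }

    SpriteArena(const SpriteArena&) = delete;
    SpriteArena& operator=(const SpriteArena&) = delete;

    // Constructs a Sprite in the next free slot; returns nullptr when full.
    Sprite* create(float x, float y)
    {
        if (m_count == m_capacity)
        {
            return nullptr;
        }
        void* slot = m_raw + m_count * sizeof(Sprite);
        ++m_count;
        return new (slot) Sprite(x, y); // placement new: no allocation
    }

    // Destroys all sprites but keeps the memory for reuse.
    void clear()
    {
        Sprite* sprites = reinterpret_cast<Sprite*>(m_raw);
        while (m_count > 0)
        {
            sprites[--m_count].~Sprite();
        }
    }

    std::size_t size() const { return m_count; }

private:
    unsigned char* m_raw;
    std::size_t m_capacity;
    std::size_t m_count;
};
```

Objects created this way must be destroyed with an explicit destructor call, never with delete, because the arena owns the memory.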
Btw I am up to 75.000 objects in release mode with logic update @ 20 FPS and good timings.
So thanks for all the good hints :). I really appreciate it.
If you say "pls", because it is shorter than "please", I will say "no", because it is shorter than "yes"
http://nightlight2d.de/
