Epic Optimization for Particles

IF used correctly, std::vector should show equivalent performance to raw arrays IF optimizations are enabled and IF you have a sane implementation. There are caveats here to be wary of, especially if you're cross platform. But knocking std::vector out of your code is not an optimization strategy.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
std::vector is my fav =) except when it breaks in places that leave you clueless =\.

It depends on the array you make. If you allocate new memory for it, it's going to be just a bit faster than the vector, not really anything you'd notice. But an array on the stack will be faster still; not by much, but enough to notice an FPS change. Either way, they're both faster.

Anyways.

I ran Intel Parallel Amplifier on it, and out of 22.7 seconds, 14 were spent rendering the particles. All of the math and other routines took less than a second each.

That's why I was asking about the ID3DXSprite class. There's not really much optimizing left for me to do in the math; I've hard-coded several things just to try to reach 100k particles at 30 fps.

I am using a std::vector, but I really don't think changing it to an array on the stack would help much either.
First of all, a stack-allocated array has to fit in the stack space; 100k particles is going to take a chunk of stack that you may well not want to be giving up.

Secondly, allocating an array on the heap and allocating on the stack will emphatically not make any speed difference to accessing the memory. What will make a difference is cache behavior. Your stack space may be in cache... and then again, if you're bloating your stack with particle data, it might not. Memory access latency on modern hardware is a complicated beast and you can't just assume that "oh hey stack is faster than freestore" because that is not necessarily true. In fact, it should be trivial to construct an artificial benchmark where prefetched "heap" pages are accessible faster than stack pages due to the behavior of virtual memory paging and cache locality.

Third, allocating memory with "new foo[]" versus allocating the same memory in a std::vector will make zero difference if your compiler's optimization passes are worth anything. Not a tiny bit, not depending on the array - zero, period. If you reserve() your vector correctly instead of letting it grow/copy up to full size (i.e. if you make a fair comparison) then basically all it does is a call to new foo[] under the covers. There is no performance reason whatsoever to eschew std::vector, and stating that it makes any practical difference whatsoever is liable to mislead people into thinking that they shouldn't use it because "OMFG mai codez must be teh fastar!"
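For anyone who wants to see what that fair comparison actually looks like, here is a minimal sketch; Particle and the count are made up for illustration:

```cpp
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz, life; }; // stand-in type

const std::size_t kCount = 100000;

void fair_comparison()
{
    // Raw array: exactly one heap allocation, elements uninitialized (POD).
    Particle* raw = new Particle[kCount];

    // std::vector used correctly: reserve() up front also means exactly
    // one heap allocation, with no grow/copy cycles while filling it.
    std::vector<Particle> vec;
    vec.reserve(kCount);
    for (std::size_t i = 0; i < kCount; ++i)
        vec.push_back(Particle());

    // With optimizations on, vec[i] and raw[i] compile down to the same
    // indexed load/store; there is nothing left for the raw array to win.
    vec[0].life = raw[0].life = 1.0f;

    delete[] raw;
}
```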

Lastly, "I noticed an FPS change" does not constitute admissible data for performance measurement. I can run the exact same code a dozen times on the exact same hardware and get a deviation of framerate that is noticeable. Just comparing a couple runs of one implementation to a couple runs of something else doesn't give you a valid performance picture.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


In my engine, 100,000 particles render decently at 110-120 fps (and I have a 3-year-old PC: a 2 GHz dual core, of which only one core is used for moving particles, and a Radeon HD4570). Of course, I'm just moving them around randomly; more complex behaviour will mean a major performance hit.
I suggest Building a Million Particle System by Lutz Latta. If memory serves, it managed 100k particles on a GeForce 6600, albeit at quite low performance, in the range of 20 fps I believe.


Third, allocating memory with "new foo[]" versus allocating the same memory in a std::vector will make zero difference if your compiler's optimization passes are worth anything. [...]
I think I am misunderstanding everything here.
Perhaps it's just me, but it was my understanding that std::vector will not take GPU memory. I think mhagain's main point was to use GPU memory directly through maps. Now, of course, we could write an allocator to deal with the reallocations ourselves (I'll sketch one at the end of this post), besides the fact that that's not how vertex buffers are supposed to be used, especially when dealing with particle systems IMHO. But now that we have our custom allocator, can we still talk about a std::vector?
I'm afraid not.
Or perhaps we're suggesting computing everything in system RAM and then copying it to GPU memory?
Please explain this clearly so I can understand.
Because perhaps it's just me, but I still have difficulty putting std::vector and GPU memory together.
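For the record, here is roughly what such a custom allocator would have to look like. Every name here is hypothetical, and the sketch exists only to show the dead end: Map() hands back one fixed block, so the moment the vector needs to reallocate, there is nowhere to go.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator serving requests out of one externally supplied
// block, e.g. the void* returned by Map(). Illustration only; it ignores
// rebinding and shared state on copies.
template <typename T>
struct MappedAllocator {
    typedef T value_type;

    T*          region;   // start of the mapped block
    std::size_t capacity; // number of T's that fit in it
    bool        taken;    // this toy version serves one allocation at a time

    MappedAllocator(void* p, std::size_t bytes)
        : region(static_cast<T*>(p)), capacity(bytes / sizeof(T)), taken(false) {}

    template <typename U>
    MappedAllocator(const MappedAllocator<U>& o)
        : region(reinterpret_cast<T*>(o.region)), capacity(o.capacity), taken(o.taken) {}

    T* allocate(std::size_t n) {
        // There is exactly one block and it cannot grow. The first request
        // hands it out; any further request (the vector trying to
        // reallocate) has nowhere to go, so all we can do is fail.
        if (taken || n > capacity) throw std::bad_alloc();
        taken = true;
        return region;
    }
    void deallocate(T*, std::size_t) { taken = false; } // never ours to free
};

template <typename T, typename U>
bool operator==(const MappedAllocator<T>& a, const MappedAllocator<U>& b)
{ return static_cast<const void*>(a.region) == static_cast<const void*>(b.region); }

template <typename T, typename U>
bool operator!=(const MappedAllocator<T>& a, const MappedAllocator<U>& b)
{ return !(a == b); }
```

You could construct a std::vector<Particle, MappedAllocator<Particle> > over the mapped pointer and reserve() the whole region, and it would "work" right up until the vector tried to grow. At that point it is a fixed buffer wearing a vector interface, which is why I say we can't really talk about a std::vector any more.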

Previously "Krohm"

Hm, seems I derailed the thread a little; sorry for that. I just tend to get defensive when someone claims C-style arrays beat std::vector for performance reasons. And of course Promit is right too: there are various implementations of the standard library, some pretty buggy/slow, and there are platforms where one simply cannot use it. For most tasks on a standard PC, however, I would say one should use the C++ standard library extensively.

@Krohm
I guess you are talking about FooBuffer->Map()'ing in D3D10/11? Of course, you get a void* pointer from that function, and you will have to copy data into/out of that memory region. I didn't mean to construct a vector from the returned pointer. If you mean something else, would you mind explaining?
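Concretely, something along these lines is what I had in mind, sketched against D3D11. It assumes a dynamic vertex buffer (created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE) and a made-up Particle vertex layout:

```cpp
#include <cstring>   // std::memcpy
#include <d3d11.h>
#include <vector>

struct Particle { float pos[3]; float color[4]; }; // stand-in vertex layout

// Simulate in system RAM with std::vector, then copy into the mapped
// vertex buffer each frame. 'context' and 'vb' come from device setup.
void UploadParticles(ID3D11DeviceContext* context,
                     ID3D11Buffer* vb,
                     const std::vector<Particle>& particles)
{
    D3D11_MAPPED_SUBRESOURCE mapped;
    // DISCARD hands us a fresh region, so we don't stall waiting for the
    // GPU to finish reading last frame's data.
    if (FAILED(context->Map(vb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
        return;

    // mapped.pData is just a void* into driver-owned memory: no vector
    // lives there, we only memcpy into it.
    std::memcpy(mapped.pData, particles.data(),
                particles.size() * sizeof(Particle));

    context->Unmap(vb, 0);
}
```

The std::vector lives entirely in system RAM; the mapped region is only ever a memcpy destination.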

First of all, a stack-allocated array has to fit in the stack space; 100k particles is going to take a chunk of stack that you may well not want to be giving up. [...] Lastly, "I noticed an FPS change" does not constitute admissible data for performance measurement.


I apologize; I haven't learned that in school yet. I was just taught that stack memory is faster than heap. I haven't gotten to the operating systems class yet, so I had no idea. You made me feel like an idiot lol.


Thanks for the link, Krohm; I hope it's as helpful as it looks. I'm not giving up until I hit that 100k mark lol. If I don't, then I'm pretty sure the game I'm working on is going to be pretty laggy.
I guess you are talking about FooBuffer->Map()'ing in D3D10/11? Of course, you get a void* pointer from that function, and you will have to copy data into/out of that memory region. I didn't mean to construct a vector from the returned pointer. If you mean something else, would you mind explaining?
My only goal was to cool down the std::vector performance debate. While I trust it in general, I see no way to mix mapped GPU memory with STL containers. It appears to me we either write the data there from scratch or we use std::vector and then copy to the VB. That's it.

Previously "Krohm"

OK, I managed to get it to 85,000 particles at 25 fps. I think that's about as good as I'm going to get without using the GPU. I wish the CPU were just like the GPU; what's wrong with making computers have a GPU rather than a CPU? Things would be much faster.
No they wouldn't.

A GPU is very good at doing a lot of tasks at once, when you can generate enough work to hide the latency of memory accesses and jumps. GPUs are high-throughput, high-latency devices.

However, not all workloads map to a GPU well, which is where the CPU comes in with its advanced caches, branch prediction, and the ability to execute work out of order. CPUs are low-latency, low-throughput devices.

Or to put it another way: if you tried to run Word on a GPU, it would run horribly compared to a CPU, because the workload isn't suited to the device.

This is why AMD, Intel and ARM are putting so much effort into devices which combine a CPU and a GPU on one die: so that workloads can be placed where they make sense.

Despite what nVidia would probably like you to believe, not every workload can be pushed massively parallel and run with a 10000x speed-up on a GPU. CPUs still very much have their place, and will do for some time yet.
Well, I can't wait for the CPU to be much faster than it is now and to work with the GPU much more than it does now. I want some performance :P
