Sign in to follow this  
The Communist Duck

Memory Issues For Particle Game

Recommended Posts

Hey. I am starting to make a falling sand game, Here's an example, just for some fun/experience. So far it's going to be data driven, and all the reading from file/parsing bits are fine. The issue I've come across is memory and storage. If we assume the game runs at 1024*768, there could be up to ~780,000 particles on the screen at any one time. (Assuming 1 particle = 1 pixel). Each particle has a position, colour, movement chance (8 floats for the chance of moving in one of the 8 compass directions), name, etc. I think this works out that each particle is about 88 bytes (Or this is what sizeof(Particle) returns). If I fill the screen with them, and look on the task manager, it appears to use about 1,200,000K (max) RAM. Or whatever the 000,000K is. This is compared to only around 800,000K for those big name games. Which is obviously a severe memory issue. Oh, and it drops from 5K-17.5K FPS with blank screen to about 1.5FPS. I'm now stuck as to how to organise this. I have a "master" std::map<string, Particle> which holds data about each different element/type of particle (only one). My first idea to solve it was just to have a std::list of pointers to the master copy, but this wouldn't work for the position. Would it be possible to have, for example, a separate class which held a pointer to the "master" in the map, and then a separate position variable? Thank you very much for your time and/or help. :D *I apologise in advance if any of these calculations are wrong.* *I would also test this, but do not have access to a compiler right now.* -Mark

Share this post


Link to post
Share on other sites
Captain P    1092
Quote:
Original post by The Communist Duck
(Assuming 1 particle = 1 pixel).
Each particle has a position, colour, movement chance (8 floats for the chance of moving in one of the 8 compass directions), name, etc. I think this works out that each particle is about 88 bytes (Or this is what sizeof(Particle) returns).
Why 8 floats for movement change? 2 floats should be sufficient: movement along the x axis and along the y axis. You should probably be able to turn that into 20 bytes per particle.

Quote:
If I fill the screen with them, and look on the task manager, it appears to use about 1,200,000K (max) RAM. Or whatever the 000,000K is. This is compared to only around 800,000K for those big name games. Which is obviously a severe memory issue. Oh, and it drops from 5K-17.5K FPS with blank screen to about 1.5FPS.
Task manager is a pretty inaccurate way to determine memory usage, but I think the problem is obvious in this case.

Quote:
Would it be possible to have, for example, a separate class which held a pointer to the "master" in the map, and then a separate position variable?
Of course. It's an example of the flyweight pattern.


However, if each particle is a pixel, why work with floating point values? You can't accurately display that, and you're going to have to do quite a bit of collision checking in order to handle that. You may want to use integers or shorts for position and velocity instead - or use a grid of material pointers (where null-pointers would indicate empty spots of air). It'll probably make something like a falling sand game much easier (and more efficient) to implement.

Share this post


Link to post
Share on other sites
EJH    315
Quote:
Original post by The Communist Duck
I'm now stuck as to how to organise this. I have a "master" std::map<string, Particle> which holds data about each different element/type of particle (only one). My first idea to solve it was just to have a std::list of pointers to the master copy, but this wouldn't work for the position.

Would it be possible to have, for example, a separate class which held a pointer to the "master" in the map, and then a separate position variable?


That would work. Or you could make a few particle classes and have the constant variables, aside from position, be statics.

Also, if you are doing manual memory management, you will want to pre-allocate the max number of particles and then use some type of active/inactive flag. Creating/destroying hundreds of particles per frame will hit your FPS hard.

Share this post


Link to post
Share on other sites
Antheus    2409
Quote:
Original post by The Communist Duck

If we assume the game runs at 1024*768, there could be up to ~780,000 particles on the screen at any one time. (Assuming 1 particle = 1 pixel).


780,000 * 88bytes = ~65 megabytes.

Quote:
If I fill the screen with them, and look on the task manager, it appears to use about 1,200,000K (max) RAM.

Just how exactly do you store the particles?

Is it anything else than vector<Particle> or Particle particles[MAX]? If so, then you have a huge problem.

Quote:
My first idea to solve it was just to have a std::list of pointers to the master copy, but this wouldn't work for the position.

Remember linked lists? Now forget them permanently. Their cache behavior is atrocious, and cache will be killing you here.

There are two options. One is AoS, other os SoA. In first you have vector<Particle>, in other you have multiple vectors, something like vector<Position>, vector<Color>, ... depending on how you most commonly access them.

Quote:
Would it be possible to have, for example, a separate class which held a pointer to the "master" in the map, and then a separate position variable?


Having a shared template for immutable data *may* help, unless the access to that shared data is too arbitrary (dozens, hundreds of kilobytes), so that each access would trash the L1 cache.

With structures that big, cache will be biggest performance hog, so everything should be optimized for sequential and fully predictable access. What you don't want in any way, shape or form is random access, dynamic memory allocation or following pointers. You also do not want non-sequential data structures, so while map can be used in outer loop, it must not come even close to main update procedure.

[Edited by - Antheus on February 16, 2010 8:27:31 AM]

Share this post


Link to post
Share on other sites
szecs    2990
I took a look on some falling sand games, I think there isn't any real physics in it. If particles can fall, they will fall with one pixel.
If a particle can't fall, it tries to go right or left, you can work out how.
My point is that I'm sure that only the whole screen is stored in an width*height array.
It's size depends of how many kind of sands you want to have.

If you have one type of sand and obstacles plus background: that's only 3 bits of data.

The falling is simply done through manipulating the whole array. In fact, the game in the link you posted, it's quite slow, because you have to go through the whole array in every frames. So one pixel will advance only one pixel in a frame.
Maybe you need an other same sized temp array too, I'm not sure.

I did a similar thing in a tank wars game: when the scorched terrain collapses. It's just manipulating a 2 bit array.

Share this post


Link to post
Share on other sites
Hey everyone, thanks for the replies.

Quote:
Original post by Captain P
Why 8 floats for movement change? 2 floats should be sufficient: movement along the x axis and along the y axis. You should probably be able to turn that into 20 bytes per particle.


I used 8 because my plan was each particle had a chance to move in each of the 8 directions:

Each frame, the chances are calculated for each particle moving in a certain direction.

If we assume the down chance is 0.95;
The down left/down right chances are 0.15;
and the left/right chances are 0.05;

Then out of 1000 particles starting at (x,y):
950 would move to at least (x, y+1).
150 would move to at least (x+1/x-1, y+1).
50 would move to at least (x+1/x-1, y).
This isn't mutually exclusive, so about 150 would move to (x+1, y+2). Then about 7 of those could move left or right, ending up at a position (x+2, y+2).




Running the sizeof(short) giving me 2bytes rather than 4 for float/int, I guess it would be best to store the positions in shorts.

Quote:
Original post by Antheus
780,000 * 88bytes = ~65 megabytes.

That's what I worked out, but I wasn't sure how I would find out if that was the accurate memory used.

Quote:
Original post by Antheus
Is it anything else than vector<Particle> or Particle particles[MAX]? If so, then you have a huge problem.

Remember linked lists? Now forget them permanently. Their cache behavior is atrocious, and cache will be killing you here.

There are two options. One is AoS, other os SoA. In first you have vector<Particle>, in other you have multiple vectors, something like vector<Position>, vector<Color>, ... depending on how you most commonly access them.

I've never heard anything about cache before. I guess it's something I'd better look up.
I have been preferring list over vector because in a vector of 100 elements, doesn't deleting the, say 3rd element, mean 97 elements need to be moved?

Thanks everyone again, for your points. I'm now just thinking it over about the exact implementation. :D

-Mark.

EDIT: Looking at that container choices flow chart, it seemed to point me towards a deque. Aren't deques and lists identical in all but a slight implementation difference? At least it is AFAIK (so not far).

Share this post


Link to post
Share on other sites
Zipster    2365
Quote:
Original post by The Communist Duck
I've never heard anything about cache before. I guess it's something I'd better look up.

Yes indeed! The gist of it is, whenever the CPU needs to access memory, it first needs to be in the L1 cache. If it isn't there, it triggers a "cache miss" which means that the CPU has to wait while that data is retrieved. These misses are expensive, and in the worst case it could even trigger disk access (many orders of magnitude slower than anything else going on in hardware). The catch however is that data is moved into the cache in chunks, a.k.a "lines", which are roughly in the area of 64 bytes on modern hardware (varies by processor). So the secret is to access memory sequentially, so that when you trigger a cache miss on the first byte, in the best case you get the next 63 bytes in memory for "free" without any more cache misses. If you jump around memory ad hoc, then you're constantly triggering cache misses, and constantly stalling the CPU.

Quote:
I have been preferring list over vector because in a vector of 100 elements, doesn't deleting the, say 3rd element, mean 97 elements need to be moved?

Not necessarily. Think what happens when you remove a dead particle, and then spawn a new particle. You remove an element, and then immediately add another one. Doesn't it seem like you shouldn't have removed the first one since you're just going to reuse it again?

Instead of removing elements, split the vector into two partitions, a "live" partition and a "dead" partition, the total size being the total number of particles. Whenever you add a particle, you just move the partition over by one so that a particle moves from the dead partition to the live partition. Killing a particle is similar, you swap it to the end of the live partition, then shift it over by one so it's now in the dead partition. In the worst case, all you're doing is swapping a single particle. Plus the fact that all living particles are sequential in the beginning of the vector means you play nice with the cache as well.

Share this post


Link to post
Share on other sites
Antheus    2409
Here is how such a simulation is implemented.

You do not keep a list of particles, or create/destroy them.

Instead, the board is the state. If there is no particle in a certain pixel, it is still a particle of NONE type or similar.

To update, iterate over entire board, and have a switch statement to determine what to do based on that particle type.

The example above updates by iterating from bottom to top gets the falling behavior naturally from this. For more complex updates, one would keep two states. Iterate over old state, and based on value of particle, write into new state. Then flip the buffers.

And usually, there is no need for velocity or similar, so each particle can be expressed with a handful of bits, perhaps a byte at most. Its type defines its behavior, and the types of surrounding particles define interaction with environment. For example, if a sand particle would fall down, but hit wall, it tries to move down and left and down and right. If it would hit another particle or wall in doing so, the particle is stuck.

Other effects are implemented in a similar way. For example, fire checks if any of neighbors are plants. If yes, it selects a random one and changes it into fire. In addition, original fire particle moves up.

This type of simulation is a basic celular automata, which means that rules are trivial, but emergent behavior is complex result of these rules applied to large number of cells.

Share this post


Link to post
Share on other sites
szecs    2990
Quote:
Original post by Antheus
Here is how such a simulation is implemented.

You do not keep a list of particles, or create/destroy them.

Instead, the board is the state. If there is no particle in a certain pixel, it is still a particle of NONE type or similar.

To update, iterate over entire board, and have a switch statement to determine what to do based on that particle type.

The example above updates by iterating from bottom to top gets the falling behavior naturally from this. For more complex updates, one would keep two states. Iterate over old state, and based on value of particle, write into new state. Then flip the buffers.

And usually, there is no need for velocity or similar, so each particle can be expressed with a handful of bits, perhaps a byte at most. Its type defines its behavior, and the types of surrounding particles define interaction with environment. For example, if a sand particle would fall down, but hit wall, it tries to move down and left and down and right. If it would hit another particle or wall in doing so, the particle is stuck.

Other effects are implemented in a similar way. For example, fire checks if any of neighbors are plants. If yes, it selects a random one and changes it into fire. In addition, original fire particle moves up.

This type of simulation is a basic celular automata, which means that rules are trivial, but emergent behavior is complex result of these rules applied to large number of cells.
This is exactly I was trying to say, but my English is so poor, and I don't know the terms, thanks for explaining it.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this