Sign in to follow this  
Siltnamis

Could this really be that slow?

Recommended Posts

Siltnamis    119

Hello im not sure if this is the forum to ask this,. So i was interested in  Conway's Game of Life and decided to make it in c++ using sfml for graphics/input. The project is very simple only a couple hundred of lines: https://github.com/Siltnamis/gameoflife . It works fairly well with low number of cells like 100x100 grid. But i with bigger numbers like 1000x1000 the program runs very slow. I made some measurements and found out that the function called Game::getNewCellData() takes over a second to run. The function just does some checks and modifies a vector. Should it really be that slow with a million of elements or my code is really bad?

Any response is appreciated.

Share this post


Link to post
Share on other sites
Waterlimon    4398

If I calculated correctly, on 4 GHz core that would be like 4000 cycles for each cell. Given that for each cell you check a region of 9 cells (~500 cycles per checked cell), I guess thats feasible if theres some cache misses and branch mispredictions (googling tells me the former can be hundreds of cycles).

 

Are you running in release mode?

 

Is there a certain size of grid where performance suddenly falls? Maybe you can make a graph for fun smile.png

Edited by Waterlimon

Share this post


Link to post
Share on other sites
Siltnamis    119

Thank you for response, i tried compiling with -O3, -O2, -O1(i'm using g++) and performance did not improve that much. Did some more testing and found out that drawing part gets slower more rapidly(about 4 times) than updating cell data(makes sense because drawing calls probably are more expensive). Is there more efficient way to do these things? and how do you draw a million rectangles in reasonable time?(maybe use openGL directly) i get like 2 second draw times. I just want to figure out if i'm writing bad code for doing things or this is how it should be biggrin.png

 

By the way how do you calculate how many cycles roughly an operation would take?

Edited by Siltnamis

Share this post


Link to post
Share on other sites
Katie    2244

Your code is probably a little naive; brute-force approaches to Life aren't particularly machine optimisable -- the compiler's optimisation won't make much of a dent because there's not a huge set of places to go. It's a simple set of operations run a lot of times and the optimiser can't see far enough to improve them.

 

There are much faster solutions, but they are algorithmic optimisations. Eg; using a quad-tree to detect "idle" sections of the plane and skip them. Multiple chasing-pointers which can turn the 8 surrounding cell checks into just a couple. (When you come to look at pixel (x,y) you already know 5 out of the 8 cells. You used them for (x-1, y)...)

 

There are other approaches including run-length encoding the bitmap (reducing the amount of memory accesses needed), sparse grid techniques (Life planes are often sparsely occupied). Dirty rectangle systems, bit-fiddling approaches that can evaluate multiple cells at a time (encode cell neighbour counts in 3 bits and the current state in 1 and now in a 64 bit register you can operate on 16 cells at a time..

 

You can get away without having a grid at all -- Life plane changes have a speed limit; propagation is 1 cell per generation at most. So not only do you know the cells which might die (the ones which were alive last time), you also know all the cells that might possibly become alive (their neighbours). There are optimisations on these techniques which rely on identifying cells which haven't changed state recently (they're part of a static structure).. you can stop looking at them as well until change propagates out to them.

 

If you're going to use OpenGL, REALLY use it. You can count the neighbours by taking the current bitmap and adding it into a blank bitmap shifted a pixel in each of the 8 directions... suddenly you're operating on hundreds of cells at a time on the GPU. The life/death decisions is another pixel shader -- and then you can render the bitmap without needing the CPU to do any work at all.

 

Life is actually a really great way to try out a ton of approaches to optimising :-)

Share this post


Link to post
Share on other sites

Draw a sf::VertexArray. Not a million seperate sf::RectangleShapes. You should have one sf::Texture containing two sub-images: Live cell, Dead cell. 

You're also outputting to std::cout multiple times per frame - that may cause your program to slow down some, because you're interacting with OS calls. How about only output once every ten frames?

 

What MS frame times are you actually getting? Define "slow". Definitely make sure you aren't running in debug mode.

 

Do you realize the difference between microseconds and milliseconds? If perfClock.restart().asMicroseconds() returns 1000, that means it's 1 millisecond.

 

Millisecond = 1,000th of a second.

Microsecond = 1,000,000th of a second.

 

If it's returning exactly 1000 (not 997, or 1002), then that means it's actualling taking less than one millisecond, but that the timer lacks the precision to give accurate microsecond measurements, so it's rounding to the nearest millisecond's worth.

Share this post


Link to post
Share on other sites
Siltnamis    119

Draw a sf::VertexArray. Not a million seperate sf::RectangleShapes. You should have one sf::Texture containing two sub-images: Live cell, Dead cell. 

You're also outputting to std::cout multiple times per frame - that may cause your program to slow down some, because you're interacting with OS calls. How about only output once every ten frames?

 

What MS frame times are you actually getting? Define "slow". Definitely make sure you aren't running in debug mode.

 

Do you realize the difference between microseconds and milliseconds? If perfClock.restart().asMicroseconds() returns 1000, that means it's 1 millisecond.

 

Millisecond = 1,000th of a second.

Microsecond = 1,000,000th of a second.

 

If it's returning exactly 1000 (not 997, or 1002), then that means it's actualling taking less than one millisecond, but that the timer lacks the precision to give accurate microsecond measurements, so it's rounding to the nearest millisecond's worth.

 

 Thank you for response.Those std::cout calls i added only after i saw how slow my code ran. Yes i know microseconds are 10^-6 s. I get over 1100000us draw time, and over 30000us cell data update(so i guess i misread that one a bit :D). so the main bottleneck is drawing. I will take a look at sf::VertexArray.

Share this post


Link to post
Share on other sites
Siltnamis    119

Your code is probably a little naive; brute-force approaches to Life aren't particularly machine optimisable -- the compiler's optimisation won't make much of a dent because there's not a huge set of places to go. It's a simple set of operations run a lot of times and the optimiser can't see far enough to improve them.

 

There are much faster solutions, but they are algorithmic optimisations. Eg; using a quad-tree to detect "idle" sections of the plane and skip them. Multiple chasing-pointers which can turn the 8 surrounding cell checks into just a couple. (When you come to look at pixel (x,y) you already know 5 out of the 8 cells. You used them for (x-1, y)...)

 

There are other approaches including run-length encoding the bitmap (reducing the amount of memory accesses needed), sparse grid techniques (Life planes are often sparsely occupied). Dirty rectangle systems, bit-fiddling approaches that can evaluate multiple cells at a time (encode cell neighbour counts in 3 bits and the current state in 1 and now in a 64 bit register you can operate on 16 cells at a time..

 

You can get away without having a grid at all -- Life plane changes have a speed limit; propagation is 1 cell per generation at most. So not only do you know the cells which might die (the ones which were alive last time), you also know all the cells that might possibly become alive (their neighbours). There are optimisations on these techniques which rely on identifying cells which haven't changed state recently (they're part of a static structure).. you can stop looking at them as well until change propagates out to them.

 

If you're going to use OpenGL, REALLY use it. You can count the neighbours by taking the current bitmap and adding it into a blank bitmap shifted a pixel in each of the 8 directions... suddenly you're operating on hundreds of cells at a time on the GPU. The life/death decisions is another pixel shader -- and then you can render the bitmap without needing the CPU to do any work at all.

 

Life is actually a really great way to try out a ton of approaches to optimising :-)

Wow.. I did not even think that  there are so many interesting ways to do these things.  I thought "no way people cant run Life smoothly with million cells. How others do it?! :D" I had some fiddling with openGL so maybe i'll even try doing something with it. I found shaders really interesting. Many thanks for informative answer, really made me more excited about this "simple" project :D

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this