c++ Performance boost needed

Started by
34 comments, last by King Mir 9 years, 6 months ago

std::vector<std::vector<bool>> result;
for (int card = 0; card < kNumCards; card++) {
result.push_back(std::vector<bool>());
for (int hand = 0; hand < kNumHands; hand++) {

Reserve. Reserve. Reserve. You know the sizes of your results, so reserve.


result.reserve(kNumCards);
for (int card = 0; card < kNumCards; card++) {
  result.push_back(std::vector<bool>());
  result.back().reserve(kNumHands);

This will avoid a lot of allocation/copy that is happening behind the scenes, and will help improve your write speed.

So, thi is a static method and only called once, at the start of the program. I dont care too much how slow that is, I'll fix it though obviously. But does it matter for later performanece of my program? Like, does the data lie in the memory more organized or something like that, so that the acccess is quicker?

Oh and what do you think about using two thread working parallel? My code looks like this:


/*** FICTICIOUS PLAY ***/
void StrategyPair::equilibrize(bool finish) {
        /* Couldnt I split this spart into a seperate thread and then work both strats simultaniously? */
        /* Part A */
	setMaxExploitEVs(SB, BB, finish);
	std::map<int, Range> sbMaxEVStrat = getMaxEVStrat(SB);
	updateRanges(SB, sbMaxEVStrat, totalIterationsDone);
        /* Part B */
	setMaxExploitEVs(BB, SB, finish);
	std::map<int, Range> bbMaxEVStrat = getMaxEVStrat(BB);
	updateRanges(BB, bbMaxEVStrat, totalIterationsDone);

	totalIterationsDone++;
}

Could I expect a double speedboost with that?

Advertisement

kNumHands is 1326. I believe it does have to be that large, because for every hand a player CAN have, I need the frequency (float) with which he holds that hand. While we're talking about floats. I could give up a little precision and use bytes to represent a percentage from 0to100 instead, but that has nothing to do with my performance, does it?

Using bytes instead of floats may help performance mainly because it makes the table 4 times smaller. Integer operations are also faster than floating point, but when you're using a lot of memory, that's not your bottleneck.

kNumHands is 1326. I believe it does have to be that large, because for every hand a player CAN have, I need the frequency (float) with which he holds that hand. While we're talking about floats. I could give up a little precision and use bytes to represent a percentage from 0to100 instead, but that has nothing to do with my performance, does it?

Using bytes instead of floats may help performance mainly because it makes the table 4 times smaller. Integer operations are also faster than floating point, but when you're using a lot of memory, that's not your bottleneck.

So, just to let me understand how this works. The problem of using alot of memory, is that it's split up over more caches and when I'm trying to access that memory, my program first has to search for where the data lies? So when I'm using 4times smaller data, the program should be able to find it ~4timmes as fast and thus speeding up all the array accesses by a factor of ~4?

Edit: would this still be an improvement, even tho I would have to convert the bytes to floats every time i access them? When I need the equity, saved in a byte, I would return it as float(value/255). Is that too slow?

That's not really how it works... the details are numerous and it's late so I won't try typing up a full explanation here, but try some web searches on "how memory caching works" to get a basic overview.

If you want the full rundown, this doc is a little dated but still fundamentally accurate.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Ok thanks, I read a little bit about it and think there are a few ways to improve. This question still remains tho:

would this still be an improvement, even tho I would have to convert the bytes to floats every time i access them? When I need the equity, saved in a byte, I would return it as float(value/255). Is that too slow?

Now I encountered another weird phenomenon. There are situations where I initalize both players strategies with a range containing only the top 25% of alls pokerhands f.e.

So instead of 1326 hands, the ranges only contain 1/4 of that. What I did now, was that when I initialize my strategies, I create two vectors<startingHand> only containing those hands which are played with a nonzero frequency (the top25% of hands in this example). Now instead of iterating over ALL hands when finding my best strat, I did something like this:


std::map<PlayerType, std::vector<startingHand>> rangeHands;

for(int i=0; i < rangeHands[playerA].size(); i++) {
      // now instead of "i" representing the hand, I use "rangeHands[playerA][i]"
      //e.g.
     EV[playerA][DecisionPt1][rangeHands[playerA][i]] = equity(rangeHands[playerA][i]) * pot;
}

I expected this to be alot quicker than iterating over all hands for the given 25% ranges. Indeed it was.

Previously:

1ca99719c30fdf28af264dc2cc17f6f1.png

After improvement:

d5fb3d8bb01b533f4c5b40a19caf541e.png

What I also expected was, that when running the full 100% ranges again, this would most likely be slower, or at least the same speed, since I always had to look up the startingHand in the vector first. But I checked that again, and even with full ranges, this was quicker:

Without improvement:

0d503e2abcb04dfd4836e5ac9b17f710.png

With improvement (meaning iterating over the vector<startingHand>):

f4db4dbb7fe193ec2bc47e845bc9eb0b.png

How is that possible?

I don't have enough time to answer this properly, but my guess is that your memory layout is such that iterating one way is discontinuous (i.e. jumping around a lot) versus more contiguous, where the latter is better for cache coherency and therefore speed. Your newfound learning on caches should illuminate why this is :-)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

kNumHands is 1326. I believe it does have to be that large, because for every hand a player CAN have, I need the frequency (float) with which he holds that hand. While we're talking about floats. I could give up a little precision and use bytes to represent a percentage from 0to100 instead, but that has nothing to do with my performance, does it?

Using bytes instead of floats may help performance mainly because it makes the table 4 times smaller. Integer operations are also faster than floating point, but when you're using a lot of memory, that's not your bottleneck.

So, just to let me understand how this works. The problem of using alot of memory, is that it's split up over more caches and when I'm trying to access that memory, my program first has to search for where the data lies? So when I'm using 4times smaller data, the program should be able to find it ~4timmes as fast and thus speeding up all the array accesses by a factor of ~4?

Edit: would this still be an improvement, even tho I would have to convert the bytes to floats every time i access them? When I need the equity, saved in a byte, I would return it as float(value/255). Is that too slow?

There's no searching involved. A cache (CPU or otherwise) is a local copy of data that would otherwise be slow to fetch each time. For example, your browser cache keeps a copy of the websites you visit, so that whenever you return to the same page, it loads faster. A cpu cache is fundamentally the same. It's high speed memory that's closer to the cpu core. But just like you can't download the whole internet (unless you're google), cpu caches have limmited space. So it will forget what's in memory that you haven't accesses in a while. Using less memory means you can fit more of your data in the cache.

That's a very rough overview of caching. The details of the cpu cache are more complicated than other caches, and there can be 3 levels of CPU cache. It's not a simple fifo queue for what gets removed, because the whole thing is optimized to be really fast to access. But that's the idea, and it explains why using less memory is better.

So, just to let me understand how this works. The problem of using alot of memory, is that it's split up over more caches and when I'm trying to access that memory, my program first has to search for where the data lies? So when I'm using 4times smaller data, the program should be able to find it ~4timmes as fast

this would have happened only if you had perfect cache no-escaping from within your cycles.


for(int i=0; i < rangeHands[playerA].size(); i++)

Compiler optimization will sure not optimize the condition expression and you will always read from a dynamic value samwhere far away, escaping a cache rapidly per cycle makes the fact that you reduced the size of array 4 times will do nearly nothing.

Ok, sorry for not responding in a while, I've been a little bit busy.

I just wanted to let you know that - thanks to your help - I was able to significantly improve the performance, but more importantly, this thread really taught me alot ;)

Thanks to all of you!

While this was mentioned, I didn't see it expounded upon. I think you need to go away from vector of vector's, and use 2 dimensional array's considering you know the size ahead of time. That alone should help improve cache coherency and there are no extra calls made to access the location. I highly doubt you'll have memory size issues on today's machines.

Considering an TSomeType array[X][Y] is a single piece of memory with the size of X*Y*sizeof(TSomeType ) bytes, while looping through it, you'll be going through memory consecutively. Thus, on cache misses, it will be loading the next 64 bytes (typically) into cache, and you'll always use the full 64 bytes (except the last bytes possibly). When using a vector<vector<TSomeType > >, each vector might be store in a different part of memory.

I'd suggest doing that for any container types that you know the size of in advance, including the vector<vector<bool> > container.

My Gamedev Journal: 2D Game Making, the Easy Way

---(Old Blog, still has good info): 2dGameMaking
-----
"No one ever posts on that message board; it's too crowded." - Yoga Berra (sorta)

This topic is closed to new replies.

Advertisement