c++ Performance boost needed

Started by
34 comments, last by King Mir 9 years, 6 months ago

Hello again. I'm still working on a tool for solving game theory solutions. I'm not yet happy with the performance though and since I'm kinda inexperienced, I believe there's still alot I could improve.

So basically what my program does, is

a) computing expected payouts for strategy A vs B

b) setting As strategy to maximally exploit Bs stategy

c) do the same thing for b vs a

d) repeat

I have a vector<vector<floats>> of the size 1326*(1326/2), which save the expected win% of any hand vs any hand. The code for getting that % is:


/* equities is the vector<vector<float>> */
inline equity getEquity(startingHand hero, startingHand villain) const {
	if (hero > villain)
		return 1.0f - equities[villain][hero-villain];
	else
		return equities[hero][villain-hero];
}

My question is: is the vector slowing my program down here, compared to using a simple array? I read somewhere, that the [] operator of the vector is defined as inline, and when it's called too often (fe. inside a loop), the compiler doesn't inline it and thus the access is slowed down. Is that right? I'm calling that function from inside a 0..1326 loop.

While talking about inlines. Does it make sense to use DEFINE macros somewhere to get a speedboost?

Ok, next question. I had the following piece of code:


// kNumHands = 1326
else { // Showdown
		EV pot = currDecPt->getPlayerCIP(hero) + currDecPt->getPlayerCIP(villain);
		EV remStack = tree->getEffStack() - currDecPt->getPlayerCIP(hero);
		for (int hand = 0; hand < kNumHands; hand++) {
                        // getMostRecentRangeOf would traverse the tree upwards and find the most recent Range of villain.
                        // it returned a copy of that range though (thus copying and array of 1326floats)
			EVs[hero][iDecPt][hand] = pot*(1.0f - getMostRecentRangeOf(villain, iDecPt).getEquityVsHand(hand, currDecPt->eArray)) + remStack;
			if (finish)
				CFs[hero][iDecPt][hand] = (EVs[hero][iDecPt][hand] - remStack) / pot;
		}
	}

I changed the code to:


else { // Showdown
		const Range *r = getMostRecentRangeOf(villain, iDecPt);
		EV pot = currDecPt->getPlayerCIP(hero) + currDecPt->getPlayerCIP(villain);
		EV remStack = tree->getEffStack() - currDecPt->getPlayerCIP(hero);
		for (int hand = 0; hand < kNumHands; hand++) {
			EVs[hero][iDecPt][hand] = pot*(1.0f - r->getEquityVsHand(hand, currDecPt->eArray)) + remStack;
			if (finish)
				CFs[hero][iDecPt][hand] = (EVs[hero][iDecPt][hand] - remStack) / pot;
		}
	}

Cause I took the getMostRecentRangeOf-function out of the loop and didn't return a copy but a ptr, I expected a huge speed boost. But when running with printing the std::clock() difference, it showed no speed boost at all. Shouldn't pointers be alot quicker?

Next stop, "getEquityVsHand":


/**
* Returns the Equity of heroHand vs. villainRange on board - using an EquityArray.
*/
float Range::getEquityVsHand(startingHand villain, const EquityArray *ea) const {
	equity eq = 0.0f; frequency count = 0.0f;
	/* copy Villains Range */
        // handRange is an Array of 1326floats
	handRange temp_freqs;
	std::copy(std::begin(frequencies), std::end(frequencies), std::begin(temp_freqs));
        
        // getCardFromHand - converts the starting hand, represented as an int to two ints, representing two single cards
	card deadcards[7] = { getCardFromHand(villain, 0), getCardFromHand(villain, 1), ea->board[0], ea->board[1], ea->board[2], ea->board[3], ea->board[4] };
        // setHandsWithConflicts iterates over the handRange and deadcards.
        // If they are the same, the frequency of the hand is set to 0.
	setHandsWithConflicts(temp_freqs, deadcards, 7, 0.0f);

	for (int i = 0; i < kNumHands; i++) {
		if (temp_freqs[i] == 0.0f)
			continue;
		count += temp_freqs[i];
		eq += ea->getEquity(i, villain) * temp_freqs[i];
	}
	return eq / count;
}

So this is a little complicated. I thought copying the handRange-array would take alot of performance. So I got rid of all the copying stuff and got the following:


float Range::getEquityVsHand(startingHand villainHand, const EquityArray *ea) const {
	equity eq = 0.0f; frequency count = 0.0f;

	for (int i = 0; i < kNumHands; i++) {
		if (frequencies[i] == 0.0f 
                    || handConflictsBoard(ea->board, hand) || handConflictsHand(hand, villainHand))
			continue;
		count += frequencies[i];
		eq += ea->getEquity(i, villain) * frequencies[i];
	}
	return eq / count;
}

So, what handConflictsBoard and handConflictsHand do, is iterate over the given hands and look up if they conflict in a precomputed std::vector<std::vector<bool>>. So basically the same thing that happened in "setHandsWithConflicts", I just don't have to copy the frequencies first. Now again, I expected a nice boost of performance, but instead, the performance dropped drastically and slowed down by 300%.

Since I don't wanne annoy you guys with anymore code, that's it for now. Enlighten me please, guys ;)

Thanks in advance.

Edit: Oh, I almost forgot. As I said above, I'm computing the best strategy for A, then the best response for B, again for A and so on and so forth. Shouldn't it be possible two have two threads running parallel, one starts to compute A vs B, the other one B vs A. Each one obv has to wait for the other one to finish, before continuing. Ofcourse one player would then be "one step behind", but since the solution is converging against an equilibrium, it should work anyway, right?

Shouldn't that give an immense speed boost?

Advertisement

I am not sure I am answering every question, but I'll answer a few.

vector<vector<something>> will require two indirections, while something[1326][1326] would only require a bit of arithmetic and one memory access. That may indeed be faster, although using twice as much memory might negate any improvement.

Macros should not provide any boost these days. Compilers are pretty good at inlining what can be beneficially inlined.

Calling `getMostRecentRangeOf(villain, iDecPt)' outside the loop is the type of optimization that the compiler could be doing, if it can determine that the answer won't change. That would explain why you don't see an improvement.

In your last code transformation, you don't make a copy of the frequencies, but you also added this: "|| handConflictsBoard(ea->board, hand) || handConflictsHand(hand, villainHand)" to the loop. So you are saving some initialization time, but you are making the loop more expensive. If `kNumHands' is large enough, the new code will end up being slower.

  1. Yes, operator[] is a little slower, especially if you have exceptions/etc turned on; it checks whether range is valid, etc. However your loop isn't big enough to make a any noticeable difference.
  2. std::clock() has horrible accuracy. Even if there is performance difference you won't notice until it's huge or it'll round wrong way giving invalid data, ex. execution time can increase by 1% by rounding will change from 16ms to 32ms. Pointers aren't "faster" or "slower", however when misused they definitely can be slower because of cache misses.
  3. I have suspicions that your handConflictsBoard() function passes values instead of references, thus copying a lot of data when it's not necessary which is likely source of your slowdown.

?If you want to make your code faster first you need to figure which part is the slowest and then see how you can improve it. Optimizing random code parts isn't likely to give significant difference.

I haven't seen the important question asked yet, so I'll be the first:

Have you profiled your program to see what the slow spots actually are?



Everything else is ritualism and voodoo until you've got hard data.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


Yes, operator[] is a little slower, especially if you have exceptions/etc turned on; it checks whether range is valid, etc. However your loop isn't big enough to make a any noticeable difference.

Well the thing is, I'm iterating over a large tree (few hundred items) and then for each node, I loop over 1326hands. When I'm reaching a so called "Showdown"-node, I'm iterating over these hands and looking up their equity in a 1326*(1326/2)vector of floats for EVERY single of those hands. So might this add up, and using arrays is fast?


vector> will require two indirections, while something[1326][1326] would only require a bit of arithmetic and one memory access. That may indeed be faster, although using twice as much memory might negate any improvement.

I'd like to do that, but since I have to create alot of these arrays, I'll get bad_allocs too quickly. Thus, sacrificing memory and doubling the amount of data is not an option. I'd have to somehow create those arrays dynamically.


std::clock() has horrible accuracy. Even if there is performance difference you won't notice until it's huge or it'll round wrong way giving invalid data, ex. execution time can increase by 1% by rounding will change from 16ms to 32ms. Pointers aren't "faster" or "slower", however when misused they definitely can be slower because of cache misses.

What can I use to get a more precise info about my performance?


I have suspicions that your handConflictsBoard() function passes values instead of references, thus copying a lot of data when it's not necessary which is likely source of your slowdown.

I'll check that out, but all I'm passing are small arrays of integers, so copying them shouldn't be a big deal?

Have you profiled your program to see what the slow spots actually are?

As mentioned, I used std::clock() so far, and tried to find slow spots.

Thanks so far.

I'd highly suggest leaving profiling to a real tool instead of just guessing with some rough timer values. A good profiler will tell you things like the exact callstack that consumes the bulk of your time, for instance, or break down every function by its run time.

Manual code instrumentation - like you've described - is a pretty advanced technique that needs a lot of understanding of the hardware and software details before it can be employed effectively. By contrast, a good profiling tool basically highlights code and says "fix this."

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


a good profiling tool

Ok, can you recommend anything? Thank you.

  1. Yes, operator[] is a little slower, especially if you have exceptions/etc turned on; it checks whether range is valid, etc. However your loop isn't big enough to make a any noticeable difference

operator[] does not does not do bounds checking on a std::vector, the at(size_t) member function does do bounds checking and will throw an out_of_range exception. As for it not being inlined. As for inlining as asked by OP, this would vary from compiler to compiler. operator[] is a trivial function and I doubt the fact it is in a loop would prevent it being inlined. You would have to check the assembly code.

Also it's worth repeating what has already been said, when you are really aiming for performance gains a good profiling tool is an absolute must!


My question is: is the vector slowing my program down here, compared to using a simple array? I read somewhere, that the [] operator of the vector is defined as inline, and when it's called too often (fe. inside a loop), the compiler doesn't inline it and thus the access is slowed down. Is that right? I'm calling that function from inside a 0..1326 loop.

Vector is *almost* the same performance as a pure array - in release/optimized mode (debug mode at least in visual studio is horrible - vector can slow you down from 1000 to 8 FPS if you are not careful), since there happens almost no range checks, and most calls are inlined/optimized out.

I'm suprised operator[] is mentioned as being slow - actually, [] is faster than at() because [] does not perform range checks, at least for map-classes - at map, .at() will check if the index is valid, whereas [] will return a valid element eigther way, if the key didn't exist before, it will insert it. I assume it will behave similarily for vector too.

Anyways, for static arrays (int variables[256]), vector is quaranteed to be slower, since it requires at least an additional redirection. Don't use a vector here, eigther stick with pure arrays or use std::array, which is a new std container class for static arrays - vector is dynamic and therefore has certain overhead that doesn't exists for static arrays.


Ok, can you recommend anything? Thank you

http://www.codersnotes.com/sleepy

I don't know if it's up to "industry standards" but it is dead simple to use and gives meaningful data.

This topic is closed to new replies.

Advertisement