Phew ok, alot to answer, I'll try my best.
The objective of good optimization usually isn't fixing why lines are slow but fixing why lines are called so often. Try to pick a better algorithm or storage container for your use case and you can usually improve performance.
I will think about that, but I'm not really sure there is another way than bruteforce. Remember, I'm trying to solve theoretical model games, and once the rules become more complex, it's almost impossible to solve the game by solving equations. Instead I'm just looking at every Decision Point with every hand (this is why the loops get so large) and try to find the best response.
the lookup table is probably too big to fit in the cache
This sounds interesting. My tables are computed like this:
/* 1326 * 1326. True if card conflicts with hand. */
static std::vector<std::vector<bool>> fillCardConflictsHand() {
std::vector<std::vector<bool>> result;
for (int card = 0; card < kNumCards; card++) {
result.push_back(std::vector<bool>());
for (int hand = 0; hand < kNumHands; hand++) {
if (handsToCardArray[hand].first == card || handsToCardArray[hand].second == card)
result[card].push_back(true);
else
result[card].push_back(false);
}
}
return result;
}
static std::vector<std::vector<bool>> cardConflictsHand = fillCardConflictsHand();
Then all I'm doing ist:
inline bool Poker::handConflictsCard(startingHand h, card c) {
if (c < kNumCards)
return cardConflictsHand[c][h];
else
return false;
}
First, is that the release build or the debug build?
It's the release build.
How big is kNumHands?
kNumHands is 1326. I believe it does have to be that large, because for every hand a player CAN have, I need the frequency (float) with which he holds that hand. While we're talking about floats. I could give up a little precision and use bytes to represent a percentage from 0to100 instead, but that has nothing to do with my performance, does it?
what is the code behind handRange::operator[](int) ?
handRange is a simple typedef.
typedef int card; /* 0..51 */
typedef int rank; /* 0..12 */
typedef int suit; /* 0..3 */
typedef int startingHand; /* 0..1325 */
typedef card holdemBoard[kBoardLength]; // numerical representation
typedef float equity;
typedef float EV;
typedef float frequency;
typedef float chips;
typedef frequency handRange[kNumHands];
Once I started the project, I felt this would give me a better idea of what I'm doing. I know regret that a little bit, but I don't think it has to do with performance?
unless perhaps your function is not returning a float and the result needs to be converted.
Thats not the case here, but do type conversion take a lot of time?
In the second image I note that the offending 25% line has got (1) a dereference or arrow operator on a class, (2) a function call with parameters, (3) an array index or operator[] on an unspecified class, (4) a multiply of unknown types possibly resulting in a class function, and (5) a += operator on a class.
The function is defined like the following:
class EquityArray {
public:
EquityArray(holdemBoard board_);
/**
* Returns the equity of hero vs. villain on the equity array's board.
* It's much faster than the the getEquityVsHand function, and should such be used whenever possible.
*/
inline equity getEquity(startingHand hero, startingHand villain) const {
if (hero > villain)
return 1.0f - equities[villain][hero-villain];
else
return equities[hero][villain-hero];
}
private:
std::vector< std::vector< equity >> equities;
The vector<vector<equity(float)>> here is precomputed. It gets saved locally and loaded once the Object is created (this is before all the computations are done, I'm then just passing const pointers of it). The parameters are copied, but both of them are ints, so I don't think thats a big deal?
For the third image I see (1) an operator[](int) that is being assigned a value, (2) a function with two paramters, and (3) an assignment operator, inside (4) a nested loop.
I showed you my handRange typedef. Get cardFromHands is again looking up an integer between 0..51 in a table with 1326 entries.
Thanks alot, already guys. Really appreciate you taking the time.
Edit: Just to give you a better idea. The tree I'm currently working on has ~750nodes. 10Iterations, each doing..
1) setting A's best strategy
2) setting B's best strategy
..take around 55seconds to complete.