
c++ Performance boost needed

.chicken · 2014-10-04T15:07:50

Hello again. I'm still working on a tool for computing game theory solutions. I'm not yet happy with the performance though, and since I'm kind of inexperienced, I believe there's still a lot I could improve.

So basically what my program does is:

a) compute expected payouts for strategy A vs. B
b) set A's strategy to maximally exploit B's strategy
c) do the same thing for B vs. A
d) repeat

I have a vector<vector<float>> of size 1326*(1326/2), which stores the expected win% of any hand vs. any hand. The code for getting that % is:

    /* equities is the vector<vector<float>> */
    inline equity getEquity(startingHand hero, startingHand villain) const
    {
        if (hero > villain)
            return 1.0f - equities[villain][hero - villain];
        else
            return equities[hero][villain - hero];
    }

My question is: is the vector slowing my program down here, compared to using a simple array? I read somewhere that the [] operator of the vector is defined as inline, and when it's called too often (e.g. inside a loop), the compiler doesn't inline it and thus the access is slowed down. Is that right? I'm calling that function from inside a 0..1326 loop.

While we're talking about inlines: does it make sense to use #define macros somewhere to get a speed boost?

OK, next question. I had the following piece of code:

    // kNumHands = 1326
    else { // Showdown
        EV pot = currDecPt->getPlayerCIP(hero) + currDecPt->getPlayerCIP(villain);
        EV remStack = tree->getEffStack() - currDecPt->getPlayerCIP(hero);
        for (int hand = 0; hand < kNumHands; hand++) {
            // getMostRecentRangeOf would traverse the tree upwards and find the most recent Range of villain.
            // It returned a copy of that range though (thus copying an array of 1326 floats).
            EVs[hero][iDecPt][hand] = pot * (1.0f - getMostRecentRangeOf(villain, iDecPt).getEquityVsHand(hand, currDecPt->eArray)) + remStack;
            if (finish)
                CFs[hero][iDecPt][hand] = (EVs[hero][iDecPt][hand] - remStack) / pot;
        }
    }

I changed the code to:

    else { // Showdown
        const Range *r = getMostRecentRangeOf(villain, iDecPt);
        EV pot = currDecPt->getPlayerCIP(hero) + currDecPt->getPlayerCIP(villain);
        EV remStack = tree->getEffStack() - currDecPt->getPlayerCIP(hero);
        for (int hand = 0; hand < kNumHands; hand++) {
            EVs[hero][iDecPt][hand] = pot * (1.0f - r->getEquityVsHand(hand, currDecPt->eArray)) + remStack;
            if (finish)
                CFs[hero][iDecPt][hand] = (EVs[hero][iDecPt][hand] - remStack) / pot;
        }
    }

Because I moved the getMostRecentRangeOf call out of the loop and return a pointer instead of a copy, I expected a huge speed boost. But when I printed the std::clock() difference, it showed no speed boost at all. Shouldn't pointers be a lot quicker?

Next stop, "getEquityVsHand":

    /**
     * Returns the Equity of heroHand vs. villainRange on board - using an EquityArray.
     */
    float Range::getEquityVsHand(startingHand villain, const EquityArray *ea) const
    {
        equity eq = 0.0f;
        frequency count = 0.0f;

        /* copy Villains Range */
        // handRange is an array of 1326 floats
        handRange temp_freqs;
        std::copy(std::begin(frequencies), std::end(frequencies), std::begin(temp_freqs));

        // getCardFromHand converts the starting hand, represented as an int,
        // to two ints representing two single cards
        card deadcards[7] = { getCardFromHand(villain, 0), getCardFromHand(villain, 1),
                              ea->board[0], ea->board[1], ea->board[2], ea->board[3], ea->board[4] };

        // setHandsWithConflicts iterates over the handRange and deadcards.
        // If they are the same, the frequency of the hand is set to 0.
        setHandsWithConflicts(temp_freqs, deadcards, 7, 0.0f);

        for (int i = 0; i < kNumHands; i++) {
            if (temp_freqs[i] == 0.0f)
                continue;
            count += temp_freqs[i];
            eq += ea->getEquity(i, villain) * temp_freqs[i];
        }
        return eq / count;
    }

So this is a little complicated. I thought copying the handRange array would cost a lot of performance. So I got rid of all the copying stuff and got the following:

    float Range::getEquityVsHand(startingHand villainHand, const EquityArray *ea) const
    {
        equity eq = 0.0f;
        frequency count = 0.0f;

        for (int i = 0; i < kNumHands; i++) {
            if (frequencies[i] == 0.0f || handConflictsBoard(ea->board, i) || handConflictsHand(i, villainHand))
                continue;
            count += frequencies[i];
            eq += ea->getEquity(i, villainHand) * frequencies[i];
        }
        return eq / count;
    }

What handConflictsBoard and handConflictsHand do is iterate over the given hands and look up whether they conflict in a precomputed std::vector<std::vector<bool>>. So it's basically the same thing that happened in "setHandsWithConflicts", I just don't have to copy the frequencies first. Now again, I expected a nice performance boost, but instead the performance dropped drastically and slowed down by 300%.

Since I don't want to annoy you guys with any more code, that's it for now. Enlighten me please, guys ;) Thanks in advance.

Edit: Oh, I almost forgot. As I said above, I'm computing the best strategy for A, then the best response for B, again for A, and so on and so forth. Shouldn't it be possible to have two threads running in parallel, one computing A vs. B, the other one B vs. A? Each one obviously has to wait for the other one to finish before continuing. Of course one player would then be "one step behind", but since the solution is converging towards an equilibrium, it should work anyway, right? Shouldn't that give an immense speed boost?
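For reference, the mirrored lookup that getEquity performs can also be written against a single contiguous buffer instead of a vector of vectors. The following is only a sketch under assumptions: EquityTable, set, get and index are hypothetical names that do not appear in the code above, and the table stores every pair with hero <= villain (diagonal included) row by row. Whether this is actually faster than the nested vector would have to be measured in an optimized build.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the equities table: one flat buffer holding only
// the pairs with hero <= villain (upper triangle, stored row by row).
class EquityTable {
public:
    explicit EquityTable(std::size_t hands)
        : hands_(hands), data_(hands * (hands + 1) / 2, 0.5f) {}

    // Store the equity of `hero` against `villain`; requires hero <= villain.
    void set(std::size_t hero, std::size_t villain, float eq) {
        assert(hero <= villain && villain < hands_);
        data_[index(hero, villain)] = eq;
    }

    // Same mirroring rule as getEquity: the lower triangle is derived as
    // 1 - (equity of the swapped pair).
    float get(std::size_t hero, std::size_t villain) const {
        assert(hero < hands_ && villain < hands_);
        if (hero > villain) return 1.0f - data_[index(villain, hero)];
        return data_[index(hero, villain)];
    }

private:
    // Row r starts after rows 0..r-1, which hold hands_, hands_-1, ... entries,
    // i.e. at offset r * (2 * hands_ - r + 1) / 2.
    std::size_t index(std::size_t row, std::size_t col) const {
        return row * (2 * hands_ - row + 1) / 2 + (col - row);
    }

    std::size_t hands_;
    std::vector<float> data_;
};
```

With a four-hand table, storing 0.75 for the pair (1, 3) makes get(1, 3) return 0.75 and get(3, 1) return 0.25, which is the same behaviour as the two branches of getEquity.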


Started by .chicken September 17, 2014 07:52 PM

34 comments, last by King Mir 9 years, 7 months ago

ApochPiQ

23,138

September 25, 2014 06:54 PM

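Another extension of that approach is to use std::vector and just allocate x*y entries up front for your two-dimensional structure (with .resize() or the sizing constructor; .reserve() alone only sets capacity and creates no elements). Access them with [x + y*row_width].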

This of course only works if all of your "rows" are the same length.
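A minimal sketch of that flat layout follows. Grid, at, row_width and rows are hypothetical names chosen for illustration. The vector is constructed with its final size so that every element exists before it is indexed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width 2D grid stored in one contiguous std::vector.
struct Grid {
    std::size_t row_width;     // number of cells per row (x dimension)
    std::size_t rows;          // number of rows (y dimension)
    std::vector<float> cells;  // row_width * rows elements, row after row

    Grid(std::size_t width, std::size_t height)
        : row_width(width), rows(height), cells(width * height, 0.0f) {}

    // Row-major access: cell (x, y) lives at x + y * row_width.
    float& at(std::size_t x, std::size_t y) {
        assert(x < row_width && y < rows);
        return cells[x + y * row_width];
    }
};
```

For a grid that is 3 wide and 2 high, at(2, 1) refers to element 5 of the underlying vector, the last cell of the second row.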

Wielder of the Sacred Wands

King Mir

2,506

September 25, 2014 09:34 PM

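> Another extension of that approach is to use std::vector and just allocate x*y entries up front for your two-dimensional structure (with .resize() or the sizing constructor; .reserve() alone only sets capacity and creates no elements). Access them with [x + y*row_width].
>
> This of course only works if all of your "rows" are the same length.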

I think a better way to achieve this memory layout would be a unique_ptr<array<array<int,X>,Y>>. That allows you to heap-allocate the memory no earlier than you need it, uses normal array indexing syntax, and disallows resizing of the array. In theory it could also allow compile-time bounds checking (as of C++14) for constant indexes.

std::array is also much easier to use than C-style arrays, so I'd recommend always preferring it.
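As a rough sketch of this suggestion (requires C++14 for std::make_unique): the sizes X and Y, the alias Table, and the helpers makeTable and lastCell are hypothetical and only serve the illustration. std::get on a std::array rejects an out-of-range constant index at compile time, which is one way to get the compile-time check mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

constexpr std::size_t X = 4;  // row length (hypothetical size)
constexpr std::size_t Y = 3;  // number of rows (hypothetical size)

using Table = std::array<std::array<int, X>, Y>;

// Allocates the whole table on the heap. make_unique value-initializes the
// object, so every cell starts at zero.
std::unique_ptr<Table> makeTable() {
    return std::make_unique<Table>();
}

// Constant indexes via std::get are checked at compile time:
// std::get<X>(row) would fail to compile because X is out of range.
int lastCell(const Table& table) {
    return std::get<X - 1>(std::get<Y - 1>(table));
}
```

Runtime indexes keep the usual syntax, for example (*table)[y][x] on the pointer returned by makeTable().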

Ravyne

14,306

September 26, 2014 12:31 AM

> I think a better way to achieve this memory layout would be a unique_ptr<array<array<int, X>,Y>>.

Unless I'm missing something, I don't see a need for the unique pointer. It sounds like OP builds the table once and then only uses it for lookup. If that's the case, the lifetime of the table can almost certainly be determined by its location on the stack. Also, std::array<int, X*Y> is still valid. You'll have to calculate the index like ApochPiQ showed, but you'll still get whatever static or runtime checks std::array offers. You won't get them on each dimension individually, but you'll get them on the whole of the std::array, and that ought to at least help prevent most bugs. (Most indexing bugs are going to go off the reservation entirely, especially with a non-square array; it's pretty rare to see an indexing bug that oversteps one dimension yet never steps outside the total bounds.)

Do be aware of axis swapping, though: array<array<int, X>, Y> is the same memory layout as int[Y][X], and both are what you want for row-major traversal patterns, but notice how the relative placement of X and Y is swapped depending on whether nested containers or native arrays are used.
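The swap is easier to see side by side. In this sketch X, Y, Nested, Native, fillRowMajor and sameBytes are hypothetical names. The byte comparison assumes std::array adds no padding, which holds on mainstream implementations but is not something the sketch can promise everywhere.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t X = 4;  // columns (hypothetical size)
constexpr std::size_t Y = 3;  // rows (hypothetical size)

using Nested = std::array<std::array<int, X>, Y>;  // X is written first
using Native = int[Y][X];                          // Y is written first

// Writes each cell's row-major position (y * X + x) into both tables.
// Note that the indexing order [y][x] is identical for both types.
void fillRowMajor(Nested& nested, Native& native) {
    for (std::size_t y = 0; y < Y; ++y) {
        for (std::size_t x = 0; x < X; ++x) {
            const int position = static_cast<int>(y * X + x);
            nested[y][x] = position;
            native[y][x] = position;
        }
    }
}

// Compares the raw bytes of both tables; true when they have the same size
// and the same contents in the same order.
bool sameBytes(const Nested& nested, const Native& native) {
    return sizeof(Nested) == sizeof(Native) &&
           std::memcmp(&nested, &native, sizeof(Native)) == 0;
}
```

After fillRowMajor, cell [2][1] holds 9 in both tables, since 2 * 4 + 1 = 9.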

throw table_exception("(ノ ゜Д゜)ノ ︵ ┻━┻");

King Mir

2,506

September 28, 2014 04:12 AM

>> I think a better way to achieve this memory layout would be a unique_ptr<array<array<int, X>,Y>>.
>
> Unless I'm missing something, I don't see a need for the unique pointer. It sounds like OP builds the table once and then only uses it for lookup. If that's the case, the lifetime of the table can almost certainly be determined by its location on the stack. Also, std::array<int, X*Y> is still valid. You'll have to calculate the index like ApochPiQ showed, but you'll still get whatever static or runtime checks std::array offers. You won't get them on each dimension individually, but you'll get them on the whole of the std::array, and that ought to at least help prevent most bugs. (Most indexing bugs are going to go off the reservation entirely, especially with a non-square array; it's pretty rare to see an indexing bug that oversteps one dimension yet never steps outside the total bounds.)

The point of the unique pointer in my example is to move the large array from the stack to the heap. You generally don't want large arrays eating up your stack space. I also wanted to show the same layout that Apoch suggested, but with more safety.

Yes you could use array<int,X*Y>, but why would you?

Ravyne

14,306

October 03, 2014 10:39 PM

> The point of the unique pointer in my example is to move the large array from the stack to the heap.

I see, that makes sense if the size of the array is a problem. But the size might not be a problem, and if it isn't, it's quicker to allocate the array on the stack. If a size or range of sizes is known statically, OP can simply tune the function. If the size is determined at runtime and spans a range from stack-appropriate to not, then there might be a clever way to switch at runtime without duplicating too much code.

> Yes you could use array<int,X*Y>, but why would you?

My bad, I was under the impression that array<array<int,X>, Y> would not create a single block of memory contiguous in both directions [e.g. &(cell[0][width - 1]) + 1 != &(cell[1][0])], but I see that's mistaken.

throw table_exception("(ノ ゜Д゜)ノ ︵ ┻━┻");

King Mir

2,506

October 04, 2014 03:07 PM

>> The point of the unique pointer in my example is to move the large array from the stack to the heap.
>
> I see, that makes sense if the size of the array is a problem. But the size might not be a problem, and if it isn't, it's quicker to allocate the array on the stack. If a size or range of sizes is known statically, OP can simply tune the function. If the size is determined at runtime and spans a range from stack-appropriate to not, then there might be a clever way to switch at runtime without duplicating too much code.

If the size is determined dynamically, you need to use a vector. What you're talking about, using an in-place small array instead when there are only a few elements, is called small vector optimization. Many vector implementations and libraries with vector-like containers will do this, usually for one or zero elements. There's no need to roll your own implementation. This is even more common for strings, because you can fit a lot of characters in the space of a few pointers.

> My bad, I was under the impression that array<array<int,X>, Y> would not create a single block of memory contiguous in both directions [e.g. &(cell[0][width - 1]) + 1 != &(cell[1][0])], but I see that's mistaken.

Yep. std::array is just a simple wrapper around old C/C++98 arrays with all the capabilities thereof.
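The contiguity being discussed can be checked directly. This is a sketch with hypothetical names (W, H, Cells, rowsAreAdjacent, hasNoPadding); it reports what the current implementation does rather than what the standard guarantees, since the standard does not strictly rule out padding at the end of a std::array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t W = 4;  // width of a row (hypothetical size)
constexpr std::size_t H = 3;  // number of rows (hypothetical size)

using Cells = std::array<std::array<int, W>, H>;

// True when the address one past the last cell of row 0 equals the address
// of the first cell of row 1, i.e. the rows sit back to back in memory.
bool rowsAreAdjacent(const Cells& cell) {
    const auto endOfRow0 = reinterpret_cast<std::uintptr_t>(&cell[0][W - 1] + 1);
    const auto startOfRow1 = reinterpret_cast<std::uintptr_t>(&cell[1][0]);
    return endOfRow0 == startOfRow1;
}

// True when the nested array occupies exactly W * H ints with no padding.
constexpr bool hasNoPadding() {
    return sizeof(Cells) == sizeof(int) * W * H;
}
```

On the common standard library implementations both functions are expected to return true, which matches the behaviour of a plain int[H][W].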
