Game Optimization

12 comments, last by Catherine Lee 8 years ago

Hey!

So for the past month and a half I have been working on optimizing my game because the FPS was quite low even on my high-spec PC. I've used a few techniques like occlusion culling, but I've had an idea which I couldn't really google because I didn't know how to define it in short words.

Here is what I thought of doing:

Say you have an if statement that checks multiple things, for example whether value 1 is equal to 10 and value 2 is equal to 13. It's checking 2 things and thus doing more work; not much more, but imagine it across hundreds or even thousands of if statements that each check 2 or more things. What I've thought of is checking only whether value 1 is equal to 10, and inside that another if statement, a nested one, checks whether value 2 is equal to 13, so it won't look at the nested if statement until value 1 is equal to 10, thus ever so slightly increasing performance. Again, this doesn't eat up performance that much, but I have a lot of if statements that check more than 5 things.

I'm not sure if anyone does this. I've only been working with computers for 3 years, didn't know much about them until about a year ago, and I wasn't that good at coding, so I don't know how much this will increase performance or if it will actually do something with the performance, so please tell me your opinion on this.

Thank you for reading :)

PS: I'm sorry for the title being so undescriptive but I didn't want to put "Game Optimization Theory" or something like that because maybe people actually do this and I didn't know about it.


so it won't look at the nested if statement until value 1 is equal to 10 thus ever so slightly increasing performance

The compiler is already doing this.
In a compound if statement, there is no reason to continue checking things if the statement itself is already false.
if ( val1 == 7 && val2 == 13 )
If val1 is not 7, then “val2 == 13” is never executed.
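To see this for yourself, here is a minimal sketch that compares the compound form with a hand-nested version. The helper IsThirteen and the counter g_rightHandCalls are made up for illustration; they only exist so the right-hand check leaves a visible trace when it runs.

```cpp
#include <cassert>

// Counts how often the right-hand check runs (hypothetical, for illustration).
static int g_rightHandCalls = 0;

bool IsThirteen(int val2) {
    ++g_rightHandCalls;
    return val2 == 13;
}

// Compound form: && stops as soon as the left-hand side is false.
bool CompoundCheck(int val1, int val2) {
    return val1 == 7 && IsThirteen(val2);
}

// Hand-nested form: the same logic written as two if statements.
bool NestedCheck(int val1, int val2) {
    if (val1 == 7) {
        if (IsThirteen(val2)) {
            return true;
        }
    }
    return false;
}

// Both forms skip the right-hand check when val1 is not 7.
void CheckShortCircuit() {
    g_rightHandCalls = 0;
    assert(!CompoundCheck(1, 13));
    assert(!NestedCheck(1, 13));
    assert(g_rightHandCalls == 0);
}
```

In both versions IsThirteen is never called when val1 is not 7, so nesting by hand buys nothing over the compound form.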

I couldn't really google because I didn't know how to define it in short words.

Short-circuit evaluation



I don't know how much this will increase performance or if it will actually do something with the performance so please tell me your opinion on this.

It will never make it faster and might make it slower.


If you want to micro-optimize branches, look into branch predication.

And frankly you shouldn’t be doing anything without profiling to see where bottlenecks actually are.

L. Spiro


It's not even about performance: short-circuit evaluation is required to make code like this work properly, e.g.

if(pCamera!=nullptr && pCamera->IsActive()==true)...

You don't want the second condition to always be evaluated, because calling IsActive() through a null pCamera would lead to a memory access violation.
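A self-contained sketch of that guard, assuming a stand-in Camera struct (the real class is not shown in this thread, so the struct and the ShouldRender name are hypothetical):

```cpp
#include <cassert>

// Hypothetical stand-in for the engine's camera class.
struct Camera {
    bool active = false;
    bool IsActive() const { return active; }
};

// Safe for a null pCamera: the right-hand side of && is evaluated only when
// the left-hand side is true, so a null pointer is never dereferenced.
bool ShouldRender(const Camera* pCamera) {
    return pCamera != nullptr && pCamera->IsActive();
}

// Passing nullptr is fine and simply yields false.
void CheckNullIsSafe() {
    assert(!ShouldRender(nullptr));
}
```

Swapping the two operands would break this, since IsActive() would then be called before the null check.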

Most compilers tend to be fairly smart about things like this. Low-level optimizations like this one are generally easy for the compiler to recognize, and it can be more efficient at them because it has all the information it needs. What you should be concentrating on is high-level optimizations where you have information that the compiler doesn't.

As Krypt0n mentioned, it's not about the "optimizer being smart" in this case. It's about the language specification. Short-circuit evaluation is a specifically defined behavior (and I don't know any language that doesn't do it).

I've seen many cases where a set of branches could be optimized to a single branch and a jump table by replacing if(){}elseif(){}else{} with a switch statement, and the conditions are complex enough for the compiler to not be able to know that. Although there's no guarantee of how the compiler will implement a switch statement, it will generally be faster than a bunch of branches. This is something to look into.
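As a rough illustration of the rewrite, here is the same dispatch written both ways. AiState and the speed values are invented for the example, and for a case this simple the compiler may well generate identical code for both; whether a jump table is emitted at all is up to the compiler.

```cpp
#include <cassert>

// Hypothetical AI states, used only for this example.
enum class AiState { Idle, Patrol, Chase, Flee };

// Chain of branches: each condition is tested in turn.
int SpeedIfChain(AiState s) {
    if (s == AiState::Idle) {
        return 0;
    } else if (s == AiState::Patrol) {
        return 2;
    } else if (s == AiState::Chase) {
        return 6;
    } else {
        return 8;
    }
}

// Switch on the same value: the compiler is free to use a jump table.
int SpeedSwitch(AiState s) {
    switch (s) {
        case AiState::Idle:   return 0;
        case AiState::Patrol: return 2;
        case AiState::Chase:  return 6;
        case AiState::Flee:   return 8;
    }
    return 8;  // Unreachable for valid states; mirrors the final else above.
}

// The two versions must agree for every state.
void CheckSameResults() {
    const AiState all[] = {AiState::Idle, AiState::Patrol,
                           AiState::Chase, AiState::Flee};
    for (AiState s : all) {
        assert(SpeedIfChain(s) == SpeedSwitch(s));
    }
}
```

The gain, if any, shows up when the original conditions are too complex for the compiler to recognize as a dispatch on one value, so measure before and after.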

Another common case is branches being ordered poorly. I generally try to order branches in order of their likelihood, for the sake of optimizing branch prediction. For example, if you know that one condition will be true in 95% of cases, it's a good idea to put the branch for this condition first, so that the processor doesn't have to branch through less predictable branches in order to get to the predictable one.

Sometimes it's hard to predict which branch will be taken (maybe it depends on user input, or on the contents of some file that can be replaced or edited). For example, suppose one branch is a "fast path" (i.e. it returns immediately), another is a "slow path" (i.e. it has to do some complex calculations, process a bunch of memory, etc.), and a third pretty much shouldn't happen (maybe it just throws an exception or logs an error). In that case you want to evaluate the fast path first.
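A small sketch of that ordering, using a made-up Falloff function: the out-of-range case returns immediately, the in-range case does the math, and invalid input is handled last. The function, its formula, and the assumption that most queries are out of range are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical effect strength as a function of distance.
double Falloff(double distance, double maxRange) {
    assert(maxRange > 0.0);
    if (distance >= maxRange) {
        // Fast path (assumed to be the common case): nothing to compute.
        return 0.0;
    } else if (distance >= 0.0) {
        // Slow path: the actual calculation.
        const double t = distance / maxRange;
        return std::exp(-4.0 * t * t) * (1.0 - t);
    } else {
        // Should not happen: negative or NaN distance.
        throw std::invalid_argument("distance must be non-negative");
    }
}
```

If profiling shows the in-range case is actually the common one, the first two branches would be swapped.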


But chances are your game doesn't have enough branch mispredictions to reduce FPS significantly. If your FPS is 30, it's drawing a frame every 33.3 ms. You want to draw at 60 FPS or better, a frame every 16.7 ms. The average branch misprediction cost is 20 ns (0.00002 ms), so to make up the 16.6 ms difference you'd need to shave off about 830,000 branch mispredictions per game loop. Do you even have that many branches? The problem probably lies somewhere else.
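The arithmetic can be written out as a tiny helper. The function names are made up, and the 20 ns cost is just the figure quoted above, not a measurement:

```cpp
#include <cassert>

// Milliseconds available per frame at a given frame rate.
inline double FrameMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many mispredictions would have to disappear each frame to get from
// currentFps to targetFps if mispredictions were the only cost.
inline double MispredictionsToRemove(double currentFps, double targetFps,
                                     double costNs) {
    assert(costNs > 0.0);
    const double msToSave = FrameMs(currentFps) - FrameMs(targetFps);
    return msToSave * 1.0e6 / costNs;  // 1 ms = 1,000,000 ns
}
```

MispredictionsToRemove(30.0, 60.0, 20.0) comes out at roughly 833,000, in line with the "about 830,000" estimate.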

Echoing some posts from above: that technique may net you a 1-2 FPS increase after digging through all of your code. It may be worth doing, but it's likely that something else is causing the slowdown. It would be more effective to set some timers before and after your functions to see what's eating up your time. Once you've got a list of your slowest functions, you can prioritize which ones need to be more efficient. Then you can optimize those where possible.

Depending on your code, some functions may not need to be run every loop. Some, like AI for example, can be run every 5th or 10th loop. If that creates a bottleneck, you can put the objects that need to call it in a queue and then run it X times each loop.
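Both ideas fit in a few lines. This is only a sketch: ShouldRunThisFrame, UpdateQueue, and the use of plain int object IDs are assumptions made for the example, not part of any engine.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// True on the frames where a task with the given period should run,
// e.g. period 5 means every 5th loop.
inline bool ShouldRunThisFrame(unsigned long frameIndex, unsigned period) {
    assert(period > 0);
    return frameIndex % period == 0;
}

// Spreads expensive per-object updates over several frames by processing
// at most maxPerFrame queued objects per call.
class UpdateQueue {
public:
    explicit UpdateQueue(std::size_t maxPerFrame) : maxPerFrame_(maxPerFrame) {
        assert(maxPerFrame > 0);
    }

    void Push(int objectId) { pending_.push_back(objectId); }

    std::size_t Pending() const { return pending_.size(); }

    // Calls update(id) for up to maxPerFrame_ objects, oldest first.
    // Returns how many objects were processed this frame.
    template <typename Fn>
    std::size_t RunFrame(Fn&& update) {
        std::size_t done = 0;
        while (done < maxPerFrame_ && !pending_.empty()) {
            const int id = pending_.front();
            pending_.pop_front();
            update(id);
            ++done;
        }
        return done;
    }

private:
    std::size_t maxPerFrame_;
    std::deque<int> pending_;
};
```

With a budget of 2 and five queued objects, three calls to RunFrame process 2, 2, and 1 objects, so no single frame pays for all five updates.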

Yes, I recommend profiling your game loop and determining where it is spending the most time. I typically log such information in debug (read: development) builds so I can choose where to dedicate my optimizing hours.

A typical timer function with millisecond resolution will probably not be suitable for this: many functions worth optimizing will report 0, because they complete in less than 1 ms. You need a higher-resolution timer function. In C++11, there is std::chrono::high_resolution_clock::now().
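A minimal timing helper along those lines might look like this. TimeMicroseconds and the SumTo workload are names invented for the sketch; only the std::chrono calls come from the standard library.

```cpp
#include <cassert>
#include <chrono>

// Runs fn once and returns the elapsed wall time in microseconds.
template <typename Fn>
double TimeMicroseconds(Fn&& fn) {
    using Clock = std::chrono::high_resolution_clock;
    const Clock::time_point start = Clock::now();
    fn();
    const Clock::time_point end = Clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count();
}

// Example workload (hypothetical): sums the integers 1..n.
long long SumTo(int n) {
    assert(n >= 0);
    long long total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Usage would be something like `double us = TimeMicroseconds([&] { result = SumTo(100000); });`. A single run is noisy, so in practice you would time many iterations and look at the average or the minimum.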

The high_resolution_clock is not always very good. Some implementations are particularly bad, such as those in Visual C++ 2012 and 2013, which operate at multi-millisecond resolution.

On the PC you can probably use the __rdtsc() intrinsic, which is supported by most major compilers, to read the CPU's time stamp counter.

