Optimization philosophy and what to do when performance doesn't cut it?


So I often face a dilemma when coding: I have to make things run faster, but the project isn't finished. As programmers we are often told to code first and optimize later; this is related to the famous saying "premature optimization is the root of all evil". However, adhering to this philosophy is not always practical, since it can require a costly review of a huge codebase. A much more economical approach would be to optimize "as you go", basically making things as efficient as possible before moving on. I always have trouble figuring out which method is the most practical when faced with the fact that my project, while not finished, simply does not meet the performance requirements for the given hardware. What approach should I take to speeding it up?


I always thought that premature optimization doesn't mean "optimization before the project is finished"; to me it means "optimization before testing".

The idea is to avoid doing weird things "because it's faster this way" when it's not actually needed and you could have written readable, self-documenting code instead. The weird way may be faster, but if the code was already fast enough, you've turned good code that anyone can read into something you have to explain to everyone.

If you know something is a real bottleneck, do some optimization, even if two weeks later you end up throwing that code away. If you do weird things, explain them with comments, but make sure you actually need them and measure both versions.
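"Measure both versions" can be as simple as timing the two side by side. A rough sketch (TransformReadable and TransformWeird are just placeholders for your readable and "weird" implementations, not real functions):

    #include <chrono>
    #include <cstdio>

    // Placeholder stand-ins for the readable version and the "weird" fast version.
    int TransformReadable( int x ) { return x * 2 + 1; }
    int TransformWeird( int x )    { return ( x << 1 ) | 1; }

    template <typename F>
    double TimeMs( F func ) {
        auto start = std::chrono::steady_clock::now();
        volatile int sink = 0;    // keeps the compiler from discarding the work
        for ( int i = 0; i < 10000000; ++i ) { sink = sink + func( i ); }
        std::chrono::duration<double, std::milli> ms = std::chrono::steady_clock::now() - start;
        return ms.count();
    }

    int main() {
        std::printf( "readable: %.2f ms\n", TimeMs( TransformReadable ) );
        std::printf( "weird:    %.2f ms\n", TimeMs( TransformWeird ) );
        return 0;
    }

If the two numbers are in the same ballpark, keep the readable version.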

Personally, I tend to prototype a feature first until it's functional, then spend a bit of time cleaning up and optimizing the code straight after (hopefully without breaking it -.-). Then once a week or so I spend a day reviewing all the things I did that week and refactoring. There's no set schedule for this, I just do it when the mess starts bugging me (which is at pretty regular intervals). I don't go overboard with optimization though - just enough to keep things from getting too ridiculous.

There's not one correct answer here since software architecture isn't an exact science. If it were, there'd be one solution to a problem and we'd all just use that. :) As you gain more experience, you'll be able to identify when you should be optimizing sooner rather than later.

Regardless of when you choose to do performance tuning, the first step is ALWAYS PROFILE THE PROBLEM. So many developers say to themselves, "Self. I know that this code right here is the problem. Let's make it faster and our problems will be solved." They spend a week making that section 10x faster, but it turns out that section was only taking 1/1000th of a second to begin with. They might even try this failed approach a few times before FINALLY timing the various sections of code and tracking down the REAL problem.

Sometimes you get lucky and you guess correctly, but save yourself some headache and profile your performance problems.
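Even a crude scoped timer gets you most of the way there before you reach for a real profiler. This ScopedTimer is only a sketch, not from any particular library:

    #include <chrono>
    #include <cstdio>

    // RAII helper: prints how long the enclosing scope actually took.
    struct ScopedTimer {
        const char *                          name;
        std::chrono::steady_clock::time_point start;
        explicit ScopedTimer( const char * _name )
            : name( _name ), start( std::chrono::steady_clock::now() ) {}
        ~ScopedTimer() {
            std::chrono::duration<double, std::milli> ms =
                std::chrono::steady_clock::now() - start;
            std::printf( "%s: %.3f ms\n", name, ms.count() );
        }
    };

    void UpdateGame() {
        { ScopedTimer t( "physics" );   /* ... physics update ... */ }
        { ScopedTimer t( "rendering" ); /* ... draw calls ... */ }
        // The numbers tell you where the time really goes; it is often
        // not the section your gut pointed at.
    }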

- Eck

EckTech Games - Games and Unity Assets I'm working on
Still Flying - My GameDev journal
The Shilwulf Dynasty - Campaign notes for my Rogue Trader RPG

a much more economical approach would be to optimize "as you go" or basically making things as efficient as possible before moving on.


This is the thing covered in the full quote:

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.


Trying to make everything as fast as possible as you write it is, and will remain, a waste as you will slow overall progress for little to no real gain.

The speed of the critical path and the hotspots on that path are the key points, and often rewriting or rethinking the problem in those areas has very little impact on the overall application (assuming you have designed it properly, which is a side issue).

Experience will guide you in time, but even experienced programmers check themselves before making changes and profile to make sure their gut isn't throwing them off course.

Code is read more than it is written, so programming for readability at the cost of some speed (although being sensible still applies) is preferable - you can always come back and make it faster later but making it correct and maintainable is a much harder job.

Typically, at the start of a project you establish a performance target you need to hit on some particular configuration of hardware (often the minimum-spec and typical-spec machines), and then at regular intervals throughout development you perform benchmark tests on this hardware using consistent procedures and schedule optimizations based on the results. Ideally this process is automated, so that you can perform a ton of benchmark tests across all the various game levels and content without requiring dedicated developer time. Automated testing also helps ensure that the testing procedures remain as consistent as possible and eliminates human error.

If you blow your performance target, or there's some change made that drastically alters performance, you know it's time to take a look at what happened. The more often the benchmarking is performed, the less code you need to review and profile.
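A sketch of what the automated check could boil down to (RunBenchmarkFrame and the 16.6 ms budget are placeholders for your real benchmark scene and target; the nonzero exit code is what lets a build script flag the regression):

    #include <chrono>
    #include <cstdio>

    // Placeholder: simulate and render one frame of a fixed benchmark scene.
    void RunBenchmarkFrame() { /* ... */ }

    int main() {
        const int    kFrames   = 1000;
        const double kBudgetMs = 16.6;    // assumed target: 60 FPS on minimum-spec hardware

        auto start = std::chrono::steady_clock::now();
        for ( int i = 0; i < kFrames; ++i ) { RunBenchmarkFrame(); }
        std::chrono::duration<double, std::milli> total = std::chrono::steady_clock::now() - start;

        double avgMs = total.count() / kFrames;
        std::printf( "average frame time: %.2f ms (budget %.2f ms)\n", avgMs, kBudgetMs );
        return avgMs <= kBudgetMs ? 0 : 1;    // nonzero exit fails the automated run
    }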

Edit: To expand on what I alluded to above, overall performance is about more than just code itself. Sometimes perfectly reasonable code will interact with certain content in ways you didn't expect.

2 phrases I hate are, “Make games, not engines,” and, “Premature optimization is the root of all evil.”
While they may stem from sound advice, these over-simplified statements cause more strife than good as they lend themselves to more and more people misunderstanding what they mean.
In several cases, people were writing a game and putting reusable code into its own section when they realized, “Wait a minute, am I writing an engine? Oh no!!”. They are literally asking, “Please help! I keep making an engine while I make my game!! How can I make the game without making an engine?”.


In the same way, thanks to reading an over-simplified mantra regarding optimizations, people often end up going out of their way not to optimize until, in theory, the very end (in practice, never).


The fact is that many people wouldn’t be doing anything wrong at all if they had never read that. Most of it is common sense.

#1: The “premature optimization” mantra holds truest if you aren’t sure if an algorithm will even work. Prototypes definitely do not need to be concerned with performance. They often have to be entirely rewritten, but if you have a deadline, slow is better than nothing. This is where you would push off the rewrite until later when you have more time.

#2: There is nothing wrong with taking a moment to simplify a mathematical equation, is there? Especially if things jump out to you quickly. Likewise, there are tons of simple things that should constantly be jumping out at you as you write code.
If the order of the loop doesn’t matter, for ( size_t i = vec.size(); i--; ) is never worse than for ( size_t i = 0; i < vec.size(); ++i ).
++i is never worse than i++.
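Spelled out as a sketch (Scale is a made-up example): the reverse form reads vec.size() once and then compares against zero, while the forward form re-evaluates size() every pass unless the compiler can prove it is loop-invariant.

    #include <cstddef>
    #include <vector>

    void Scale( std::vector<float> &vec, float f ) {
        // Forward: size() is re-evaluated every iteration (unless hoisted).
        for ( size_t i = 0; i < vec.size(); ++i ) { vec[i] *= f; }
    }

    void ScaleReversed( std::vector<float> &vec, float f ) {
        // Reverse: size() is read once; the loop test is a cheap compare against 0.
        for ( size_t i = vec.size(); i--; ) { vec[i] *= f; }
    }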

If you are writing a matrix routine:
		CMatrix4x4 & MatrixRotationZLH( float _fA ) {
			_11 = ::cos( _fA );
			_12 = ::sin( _fA );
			_13 = 0.0f;
			_14 = 0.0f;
			
			_21 = -::sin( _fA );
			_22 = ::cos( _fA );
			_23 = 0.0f;
			_24 = 0.0f;
			
			…
			return (*this);
		}
…it’s obvious that you shouldn’t be calling ::sin() and ::cos() on the same value multiple times.

		CMatrix4x4 & MatrixRotationZLH( float _fA ) {
			float fS = ::sin( _fA );
			float fC = ::cos( _fA );
			_11 = fC;
			_12 = fS;
			_13 = 0.0f;
			_14 = 0.0f;
			
			_21 = -fS;
			_22 = fC;
			_23 = 0.0f;
			_24 = 0.0f;
			
			…
			return (*this);
		}
It’s not premature to save redundant calculations to temporaries (and in this specific case ::sincos() would be better still).

There is no excuse for not handling obvious and simple cases on the first pass through the code you are writing.


#3: If you already understand exactly what you are supposed to be implementing, there is no reason not to spend a few minutes thinking ahead towards what obvious bottlenecks there might be and designing around them.
Just this morning I was implementing the first stages of a triangle cache optimizer, which begins by creating a vertex-triangle adjacency list.
Since each vertex can have any number of adjacent triangles, if I was an idiot I would have just started coding right away and given each vertex a variable-length array (std::vector or similar) to hold its list of connections.
That’s not how we do things. I took 5 minutes to think about how to avoid making so many allocations, because allocations are always something you want to avoid. It’s obvious.
I structured my list to take advantage of a pool pre-allocated once.
In that time I also realized that I didn’t need to copy the 3 indices to each triangle over to the list, I could simply store a pointer to the first index in the triangle and know that the next 2 indices are part of the same triangle. Now my pool is 3 times smaller on 32-bit machines.
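In sketch form, with simplified names of my own (the real code isn't shown here): one counting pass sizes a single pool, a second pass fills it, and each entry is just a pointer to the triangle's first index.

    #include <cstdint>
    #include <vector>

    // Vertex-to-triangle adjacency built from an index buffer, using one
    // pre-allocated pool instead of a std::vector per vertex.
    // Assumes indices.size() is a multiple of 3 (triangle list).
    struct AdjacencyList {
        std::vector<uint32_t>         offsets;    // offsets[v]..offsets[v+1] span vertex v's entries
        std::vector<const uint32_t *> pool;       // pointer to each adjacent triangle's first index

        void Build( const std::vector<uint32_t> &indices, uint32_t vertCount ) {
            std::vector<uint32_t> counts( vertCount, 0 );
            for ( uint32_t idx : indices ) { ++counts[idx]; }          // pass 1: count adjacency

            offsets.assign( vertCount + 1, 0 );
            for ( uint32_t v = 0; v < vertCount; ++v ) {
                offsets[v + 1] = offsets[v] + counts[v];               // prefix sums
            }
            pool.resize( indices.size() );                             // the single allocation

            std::vector<uint32_t> cursor( offsets.begin(), offsets.end() - 1 );
            for ( size_t i = 0; i < indices.size(); i += 3 ) {         // pass 2: fill
                const uint32_t * tri = &indices[i];                    // first index of the triangle
                pool[cursor[tri[0]]++] = tri;
                pool[cursor[tri[1]]++] = tri;
                pool[cursor[tri[2]]++] = tri;
            }
        }
    };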

A little thought saved me a lot of memory and performance issues.
Will I find little things I could improve later? Probably, but I won't have to rewrite the whole thing, because I didn't use a braindead implementation in the first place.


#4: You don’t need to focus on non-obvious or time-consuming things on the first pass. Making a pool was obvious. Making my loops go from X to 0 was obvious.
If there’s anything else I can do, it’s not obvious, and I am not going to spend time looking for it until profiling later reveals where I should be looking.
This is the point of the mantra.


I get the feeling that people read that mantra and suddenly become idiots where they otherwise would not.
Reading it doesn't mean you should suddenly repeat ::cos( A ) in five places in a routine, or skip spending the time to think about how to reduce the number of allocations, etc., any more than you should go out of your way to avoid writing an engine even though one is the natural byproduct of writing a game.
It means, “Do everything you normally would, just don’t spend unreasonable time on non-obvious optimizations until a profiler has told you where to look.”


Do as much as you can as you go without specifically taking time off to look for deep optimizations.
Go back any time it is convenient and profile for bottlenecks. Not at the end of the project, but during its development.

I just added our optimized Oren-Nayar shading model to my own engine this morning and then went on to other tasks. I had a few extra minutes afterwards and did a test to see if an if in the shader could improve performance. It turned out to be slower or exactly the same in my case, but if you have a hunch and 5 minutes to test, there is also nothing wrong with doing that sporadically throughout the project either.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

I frequently find myself having to choose between a more readable line and a faster line. I then leave myself a big comment like this:


// Performance: use Hard2ReadFunc1() + Hard2ReadFunc2()
x = EasyToReadFunction();

That way I can save the brain power, but I will only optimize it if profiling leads me to that spot.

So we have two ways of looking at this, and there's truth in both.

It's worth noting that some design-time choices will have a major impact on speed. Choosing to build a system around data access patterns instead of some sort of object hierarchy is a pretty big decision, and with just a little experience it can easily be made beforehand, even though it could be considered a type of optimization. I would rather make that call before than after.
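A rough sketch of what that decision looks like (the Entity fields are made up for illustration): the array-of-structs layout drags unused fields through the cache during a position update, while the struct-of-arrays layout keeps the hot data contiguous.

    #include <vector>

    // Array-of-structs: every position update also pulls health/AI state into cache.
    struct Entity {
        float posX, posY, posZ;
        float health;
        int   aiState;
    };
    void UpdateAoS( std::vector<Entity> &ents, float dt, float vx ) {
        for ( Entity &e : ents ) { e.posX += vx * dt; }
    }

    // Struct-of-arrays: the update touches only the data it actually needs.
    struct Entities {
        std::vector<float> posX, posY, posZ;
        std::vector<float> health;
        std::vector<int>   aiState;
    };
    void UpdateSoA( Entities &ents, float dt, float vx ) {
        for ( float &x : ents.posX ) { x += vx * dt; }
    }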

However, spending two days optimizing my collision detection system when I only ever have 10 objects on screen at a given time is almost guaranteed to be a waste of time, but profiling later on will help us determine this for sure.

Spiro, for your #2, and that sin/cos, do you think the compiler wouldn't optimize the initial version?

(Not disagreeing with your point, there are lots of instances of small optimizations that, as long as they don't make the code less clear, and don't take a lot of time, should just be done.)

