Simple & Robust OOP vs Performance



#1 rockseller   Members   -  Reputation: 121


Posted 06 September 2012 - 11:46 AM

Good day peeps,

I'm going to talk about a project I started: a 2D game aimed at Android 2.2 and above, using OpenGL ES 2.0, and coded with the standard (up-to-date) Android Java SDK rather than the NDK.

Ever since I started developing this platform RPG, I have taken special care over micro code optimization, because I felt it would have a big impact in the long term, especially since it's a serious project involving help from other designers and creative people.
In terms of performance, the experts say that you should profile your game, and keep measuring it on real devices in order to see where your bottleneck really is.

In my experience, and please flame me if you feel I'm wrong, for a time-consuming, serious project this is not 100% true: you spend time developing tools for other people to work within your engine framework and laying out how bitmaps should be drawn so they can be used as textures (standardizing them), and most of the time you need to save time by optimizing parts of the code (assuming you have a good code design) from the very beginning, in places where you know you will get a performance boost
(e.g. reducing the chances for the garbage collector to run).

As a project leader and coder, I have realized that in the end it is better to have a robust, well-designed class rather than an "optimized" all-in-one class.

For instance, I had a class called:

Monster
That had 100 methods



Sure, it had a lot of micro optimization inside those methods (goodbye readability and good practices, I know!).

But after some time, I figured out that it was better to have a class:
Monster
With 5 methods, and 3 objects inside
MonsterAttackModule
MonsterAIModule
MonsterBodyModule
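
Roughly something like this (a minimal sketch; the module classes and their members here are placeholders for illustration, not the actual game code):

class MonsterAttackModule { void perform(Monster self, Monster target) { /* attack logic */ } }
class MonsterAIModule     { void think(float dt) { /* AI logic */ } }
class MonsterBodyModule   {
    int hp = 100;
    void move(float dt) { /* movement and physics */ }
    void applyDamage(int amount) { hp -= amount; }
    boolean isAlive() { return hp > 0; }
}

// Composition instead of one giant class: Monster keeps a small public surface
// and delegates the details to three focused modules.
class Monster {
    private final MonsterAttackModule attackModule = new MonsterAttackModule();
    private final MonsterAIModule     aiModule     = new MonsterAIModule();
    private final MonsterBodyModule   bodyModule   = new MonsterBodyModule();

    void update(float dt)       { aiModule.think(dt); bodyModule.move(dt); }
    void attack(Monster target) { attackModule.perform(this, target); }
    void takeDamage(int amount) { bodyModule.applyDamage(amount); }
    boolean isAlive()           { return bodyModule.isAlive(); }
}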

In the second case I still have some micro code optimization and more readability, but more objects.
That's more memory, isn't it?

So my point is that in the end it is better to strike a balance: readable, robust OOP code, micro-optimized here and there, and profiled as the experts say, which makes the original advice not completely true, right? :)


#2 Madhed   Crossbones+   -  Reputation: 2984


Posted 06 September 2012 - 12:06 PM

1 class with 100 methods VS multiple classes with a few methods is not really about performance but more about sensible design.

Why do you think one "mega class" would perform better?

#3 Radikalizm   Crossbones+   -  Reputation: 2885


Posted 06 September 2012 - 12:12 PM

How can you optimize code without profiling? How can an optimization be effective if you don't know whether you're dealing with a bottleneck or not?

I've worked on some larger projects myself, and I can tell you for sure that premature optimization is something you definitely want to avoid.

Write your code according to the proper standards. If you're working with OOP then make sure you properly follow the rules of an object oriented design. A class with 100 methods is definitely a serious violator of the single responsibility principle, which is an enormously important concept within OOP.

If you find that you're having performance issues after following the proper standards you should not resort to micro-optimizations, but you should look at optimizations at a larger scale. Rewrite algorithms which pose a bottleneck, go over some larger systems and see which components could be altered to run faster, etc.
If this still shows performance problems, you should probably consider whether it's the design itself that just isn't optimal for the problem you're trying to solve. Micro-optimizations should come at the very end of the optimization process, when every other step has failed to provide a proper solution, IMO.

I gets all your texture budgets!


#4 rockseller   Members   -  Reputation: 121


Posted 06 September 2012 - 04:59 PM

1 class with 100 methods VS multiple classes with a few methods is not really about performance but more about sensible design.

Why do you think one "mega class" would perform better?


From developer.android.com best practices:
"Object creation is never free. A generational GC with per-thread allocation pools for temporary objects can make allocation cheaper, but allocating memory is always more expensive than not allocating memory."

Thus, you should avoid creating object instances you don't need to.

Unless I'm wrong, having 1 class with 100 methods, rather than 1 class plus 10 member objects each with 10 methods, will result in
1 object vs 11 objects.

What do you mean by sensible design?




How can you optimize code without profiling? How can an optimization be effective if you don't know whether you're dealing with a bottleneck or not?

I've worked on some larger projects myself, and I can tell you for sure that premature optimization is something you definitely want to avoid.

Write your code according to the proper standards. If you're working with OOP then make sure you properly follow the rules of an object oriented design. A class with 100 methods is definitely a serious violator of the single responsibility principle, which is an enormously important concept within OOP.

If you find that you're having performance issues after following the proper standards you should not resort to micro-optimizations, but you should look at optimizations at a larger scale. Rewrite algorithms which pose a bottleneck, go over some larger systems and see which components could be altered to run faster, etc.
If this still shows performance problems, you should probably consider whether it's the design itself that just isn't optimal for the problem you're trying to solve. Micro-optimizations should come at the very end of the optimization process, when every other step has failed to provide a proper solution, IMO.


My point was to explain my realization that you can't follow a guideline or best practice all the way, every time;
you need to strike a balance: you should profile your application, but not be naive about optimizing it to some extent along the way.

An example of micro-code optimization, done without profiling first:


code not micro-optimized (java):


...


//A random ArrayList of objects
ArrayList<AnotherObject> arrayList = new ArrayList<AnotherObject>(100);

//A vector we will be using
Vector3D utilityVector3D = new Vector3D(0, 0, 0);


for (int i = 0; i < arrayList.size(); i++)
{
    AnotherObject obj = arrayList.get(i);

    utilityVector3D.x = obj.x;
    utilityVector3D.y = obj.y;
    utilityVector3D.z = obj.z;
}


...

code micro-optimized:


...

//utilityVector3D is now a member of the enclosing class
//--- Commented out, instantiated in the constructor: Vector3D utilityVector3D = new Vector3D(0, 0, 0);


//A random ArrayList of objects
//--- Commented out and instantiated in the constructor too: ArrayList<AnotherObject> arrayList = new ArrayList<AnotherObject>(100);
arrayList.clear();

final int sizeList = arrayList.size();

for (int i = 0; i < sizeList; i++)
{
    final AnotherObject obj = arrayList.get(i);
    utilityVector3D.x = obj.x;
    utilityVector3D.y = obj.y;
    utilityVector3D.z = obj.z;
}


...



Here you choose to prevent garbage collection from happening, at the expense of some memory.

My point was that it is good to do some micro-code optimization now and then, but at the same time to balance it with simple and robust objects, as in my example of 1 object vs 11 objects above.


Am I wrong?
Thanks!

Edited by rockseller, 06 September 2012 - 05:01 PM.


#5 jbadams   Senior Staff   -  Reputation: 18620


Posted 06 September 2012 - 06:01 PM

Moving you to General Programming -- Game Design is for discussion of game mechanics and balance rather than code design.


A few thoughts:

If you know a certain technique will yield better performance on your target platform and that implementing that technique will not impose undue difficulty when compared to a simpler approach then use of that technique is not really optimisation (micro or otherwise), it's just being sensible. However, this only applies if you know -- from extensive prior experience, because it is well established best practice, or because you have profiling data -- the technique to be better, not just because you think it will be better, a few unsubstantiated online sources claim it might be better, or some line in the documentation can be interpreted as suggesting -- as opposed to clearly stating that -- it is better.


It's true that garbage collection running too often or at undesirable times can negatively impact performance -- particularly on hardware limited devices such as mobile platforms -- and is an area you should pay careful attention to, but using excessive amounts of memory can also be problematic, and sometimes the garbage collector will run without negatively impacting your performance. Again -- unless you know that garbage collection is going to cause a problem, unless it's an idiomatic usage to avoid it in a specific situation on the target platform, or unless you have prior experience of it causing problems on the platform -- you're probably better going with whichever method is simpler and more natural to implement until you have profiling data that shows it's actually causing a problem.
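
As a concrete illustration of that trade-off, here is a minimal sketch of a simple object pool in Java (the Bullet class and its fields are hypothetical, not from this thread): it spends a fixed chunk of memory up front in exchange for fewer allocations during play, and therefore less pressure on the garbage collector.

// Hypothetical pooled type; any mutable object with a reset() works the same way.
class Bullet {
    float x, y, vx, vy;
    void reset() { x = y = vx = vy = 0f; }
}

// A minimal fixed-size pool: acquire() reuses old instances instead of allocating,
// and release() hands them back so the garbage collector never has to reclaim them.
class BulletPool {
    private final Bullet[] free;
    private int count;

    BulletPool(int capacity) {
        free = new Bullet[capacity];
        for (int i = 0; i < capacity; i++) {
            free[i] = new Bullet();           // pay the allocation cost once, up front
        }
        count = capacity;
    }

    Bullet acquire() {
        if (count == 0) return new Bullet();  // fall back to allocating if the pool runs dry
        return free[--count];
    }

    void release(Bullet b) {
        b.reset();
        if (count < free.length) free[count++] = b;
    }
}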

Remember that compilers and other tools are generally designed to provide the most benefit to the most common usage patterns; doing something differently than the "normal" way without a clear benefit may incur other unexpected optimization costs.


"Simple and robust OOP" -- if properly implemented -- should not be something you necessarily need to trade off against performance in the majority of cases. As with everything, don't use OOP where it isn't appropriate, but when you do use it make an effort to do so properly and you'll find that in most cases it doesn't introduce any undue performance overhead.

#6 jbadams   Senior Staff   -  Reputation: 18620


Posted 06 September 2012 - 06:31 PM

Oh, and one more:

When developing in a garbage collected language -- especially on a platform with limited capabilities -- be aware of how calls to standard library or third party code may be allocating memory. You can optimize your own code as much as you like, but if the problem is coming from how you're using -- or in some cases even the fact that you've decided to use -- some standard or third party code then it won't do you any good. Don't take this to the extreme of not trusting any other code or following silly blanket-rules of not using anything that allocates memory however, or you'll kill your productivity and risk introducing additional bugs into your less-well-tested alternative code.
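
To illustrate the kind of hidden allocation meant here, consider this small Java sketch (the Enemy class and update() call are hypothetical): the enhanced for loop quietly asks the list for a new Iterator object every time it runs, while an indexed loop over the same ArrayList allocates nothing.

class Enemy { void update(float dt) { /* ... */ } }

class EnemyUpdater {
    // Looks harmless, but each call to this method allocates an Iterator
    // behind the scenes via enemies.iterator().
    void updateEnemies(java.util.ArrayList<Enemy> enemies, float dt) {
        for (Enemy e : enemies) {
            e.update(dt);
        }
    }

    // Allocation-free alternative for an ArrayList: a plain indexed loop.
    void updateEnemiesNoAlloc(java.util.ArrayList<Enemy> enemies, float dt) {
        for (int i = 0, n = enemies.size(); i < n; i++) {
            enemies.get(i).update(dt);
        }
    }
}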

This is also yet another example where profiling to find the real source of problems can avoid unnecessary work and allow you to focus your efforts on genuine problems. Use standard libraries and useful third-party code where appropriate, and replace things or change your usage patterns if profiling data shows them to be the cause of problems.

#7 krippy2k8   Members   -  Reputation: 646


Posted 06 September 2012 - 08:41 PM

Unless I'm wrong, having 1 class with 100 methods, rather than 1 class plus 10 member objects each with 10 methods, will result in
1 object vs 11 objects.

What do you mean by sensible design?


1) Unless you're spawning many monsters every second, the overhead of spawning 11 objects per monster instead of just 1 is negligible and typically not worth compromising your design. This is in fact a perfect example of why you shouldn't engage in these types of "optimizations" without analyzing your bottlenecks first. If it has no impact on the overall performance of your application, it is a useless optimization.

2) When you have something like MonsterAttackModule, if it is primarily just a collection of methods and doesn't manage state, you can usually get away with having only a single MonsterAttackModule instance per monster type, thus incurring no additional allocation overhead per monster. Or perhaps just a collection of static methods, thus incurring no allocation overhead at all (see the sketch after this list).

3) Paying more attention to your higher level design can produce performance benefits many orders of magnitude greater than your micro-optimizations. Compromising such a design in any way in favor of micro-optimizations without profiling will almost always hurt you in the long run.
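
A rough Java sketch of point 2 (the class and method names here are hypothetical, not the original poster's code): one stateless attack module shared by every monster of a given type, so spawning a monster allocates only the monster itself.

// Stateless: all per-monster data comes in as parameters, so one instance is safe to share.
class MonsterAttackModule {
    private final int baseDamage;
    MonsterAttackModule(int baseDamage) { this.baseDamage = baseDamage; }
    int computeDamage(int attackerLevel) { return baseDamage * attackerLevel; }
}

class Goblin {
    // Shared per monster *type*, not per instance; new Goblin() allocates only the Goblin.
    private static final MonsterAttackModule ATTACK = new MonsterAttackModule(3);

    int level = 1;

    int attack() { return ATTACK.computeDamage(level); }
}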


This doesn't mean that you should never think about low-level performance issues as you write your code; you should, but only where you know it will make a significant and noticeable difference, or where it won't compromise your design in any meaningful way.

E.g. block-copying memory instead of copying it a byte at a time is always a good idea, because it has a significant performance impact and doesn't usually result in any significant design compromises. Doing things like string1.append(string2) instead of string1 += string2 is also usually a good idea, as it eliminates the creation of temporary objects and has no impact on your design.
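
In Java specifically, the equivalent is reusing a single StringBuilder instead of += concatenation in a loop, which creates fresh temporary objects on every pass (a minimal sketch, not from the thread):

class StringExample {
    public static void main(String[] args) {
        // Allocates a new temporary StringBuilder and String on every iteration: lots of garbage.
        String slow = "";
        for (int i = 0; i < 1000; i++) {
            slow += i + ",";
        }

        // Reuses one buffer; only the final toString() allocates a new String.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append(i).append(',');
        }
        String fast = sb.toString();

        System.out.println(slow.length() + " " + fast.length());
    }
}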

Creating a monolithic class with 100 methods because you think it's faster to allocate is definitely not a good choice for a premature optimization.

Edited by krippy2k8, 07 September 2012 - 12:14 AM.


#8 Bacterius   Crossbones+   -  Reputation: 8890


Posted 07 September 2012 - 02:02 AM

You're using Java, which immediately destroys your goal of maximum performance. Java simply does not give you enough low-level control to achieve some important optimizations (such as cache coherency, branch prediction hints, etc...), so your micro-optimization efforts are futile at best. Besides, I'll be surprised if you manage to measure any difference in runtime between your two classes with a half-modern system, so stop worrying about performance.

Good design naturally yields good performance, but the reverse is not true. Therefore, build your project with design in mind, and add any optimizations at the end if they are readable (for instance, inserting three hundred lines of assembly into a twenty-line method to optimize some algorithm is probably not the way to go - if you really need the speed, write the optimized code in a different routine and call it from your method) and meaningful.

Also, interpreters and compilers are very good at noticing common programming patterns (such as copying memory byte per byte) and optimizing them in an optimal manner using the available hardware (such as SIMD instructions). If you try and optimize trivial things yourself, the compiler/interpreter might not understand the pattern and not optimize anything, resulting in slower performance overall.

Now there *are* things worth optimizing to death - for instance, a general-purpose pointer-based binary search is a well-established algorithm, which can be (provably) correctly implemented in the most efficient way possible, and stored in a code library for easy reuse in any language and project - nobody will ever need to change, or even look at this code, and it can be considered an optimally efficient black-box binary search. However, this is not the case for 99% of code, which needs to constantly be checked for bugs and upgraded in various ways: if this is the case, maintainability beats performance by a large margin.
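
For illustration, here is a classic iterative binary search in Java (a sketch of the kind of write-once, reuse-forever routine being described):

// Returns the index of key in a sorted array, or -1 if it is absent.
// Uses the overflow-safe midpoint low + (high - low) / 2.
static int binarySearch(int[] sorted, int key) {
    int low = 0, high = sorted.length - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (sorted[mid] < key) {
            low = mid + 1;
        } else if (sorted[mid] > key) {
            high = mid - 1;
        } else {
            return mid;
        }
    }
    return -1;
}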

Edited by Bacterius, 07 September 2012 - 02:03 AM.

The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#9 L. Spiro   Crossbones+   -  Reputation: 13600


Posted 07 September 2012 - 06:30 AM

1 class with 100 methods VS multiple classes with a few methods is not really about performance but more about sensible design.

Why do you think one "mega class" would perform better?

Because he is using Java on a mobile device.

Back when dinosaurs roamed the earth my job was mobile development with Java.
In-house there was a running gag related to the fact that mobile devices supported only Java, yet Java was the worst choice due to how much overhead there was in classes etc.

We were basically limited to a maximum of 3 classes per game, but urged to use only one.


That being said, we no longer live in a world of Nokia. Android devices are much more powerful and have much more memory than those of old times, and we really don’t need to care so much about this type of overhead. On any modern device, you are wasting your time if you are worrying about this kind of thing.


But there are still optimizations that you should do whenever possible.
For one, count down to 0 in for-loops whenever possible.
In Java, it looks like this:
for ( int i = iTotal; --i >= 0; ) {
}

This is also faster in C/C++ (many ways to compare against 0, and those instructions are smaller and faster), but in Java it is a major help. It reduces the size of the code itself significantly while also providing a faster iteration.
If the order of the loop does not matter, you should always always always do it this way.


There are a lot of little things such as this that can help, but I can’t pull them off the top of my head.


L. Spiro

Edited by L. Spiro, 07 September 2012 - 06:31 AM.

It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#10 Aardvajk   Crossbones+   -  Reputation: 5982


Posted 07 September 2012 - 06:32 AM

The false conclusion - fewer objects equals better performance - is in itself an excellent example of the need for profiling to identify areas to optimize.

In most languages, methods do not form any part of the memory footprint of an instance of a class. There is some overhead with an allocation, and there can be some minimal overhead with a class instance, but these are so unlikely to be significant in anything other than highly specialised corner cases that your effort is wasted.

Even if you do prove via profiling that these issues are relevant, there are ways to address them without sacrificing good design, using alternative allocation strategies and so on, provided the language you use is sufficiently low level to support these options.

But you haven't proven it, so assume it isn't a problem until you do.

#11 Telastyn   Crossbones+   -  Reputation: 3726


Posted 07 September 2012 - 07:39 AM

That's more memory, isn't it?


Yes, it's likely 12 measly bytes.

A small price to pay for the productivity and quality gains that come from not having a 100-method class.

#12 l0calh05t   Members   -  Reputation: 743


Posted 07 September 2012 - 08:16 AM

But there are still optimizations that you should do whenever possible.
For one, count down to 0 in for-loops whenever possible.
In Java, it looks like this:
for ( int i = iTotal; --i >= 0; ) {}

This is also faster in C/C++ (many ways to compare against 0, and those instructions are smaller and faster), but in Java it is a major help. It reduces the size of the code itself significantly while also providing a faster iteration.
If the order of the loop does not matter, you should always always always do it this way.


This is very much a premature optimization as well (in C++ at least). Reverse iteration might yield a slightly faster comparison, but it is just as likely to cause an increased number of cache misses if the index is used in any way at all (even if the order does not matter semantically).

#13 Mussi   Crossbones+   -  Reputation: 1970


Posted 07 September 2012 - 09:30 AM

This is also faster in C/C++ (many ways to compare against 0, and those instructions are smaller and faster), but in Java it is a major help. It reduces the size of the code itself significantly while also providing a faster iteration.
If the order of the loop does not matter, you should always always always do it this way.

Not on my machine using C++.

#include <iostream>
#include <Windows.h>

using namespace std;

int main()
{
const int arraySize = 10000;
int a[arraySize];
int b[arraySize];

for(int i = 0; i < arraySize; i++)
{
  a[i] = 0;
  b[i] = 0;
}

LARGE_INTEGER li, start;
if( !QueryPerformanceFrequency(&li) )
  return 0;
double freq = double(li.QuadPart)/1000.0;

QueryPerformanceCounter(&start);
for(int i = 0; i < arraySize; i++)
  a[i] += i;
QueryPerformanceCounter(&li);
cout << double(li.QuadPart - start.QuadPart)/freq << endl;

QueryPerformanceCounter(&start);
for(int i = arraySize; i >= 0 ; --i)
  b[i] += i;
QueryPerformanceCounter(&li);
cout << double(li.QuadPart - start.QuadPart)/freq << endl;
return 0;
}

Multiple tests indicate that the second iteration is slower, sometimes as much as 2 times slower.

Edit: code not showing up
Edit2: that wasn't a fair comparison; strangely enough, switching the order of the iterations makes a difference

Edited by Mussi, 07 September 2012 - 03:35 PM.


#14 Bacterius   Crossbones+   -  Reputation: 8890


Posted 07 September 2012 - 09:42 AM

@Mussi: try pastebin, forum software seems to have issues with cout << currently. Also for your test, make sure you reset data every time, preferably using pseudorandom numbers to prevent result precomputation and/or caching, time multiple times and with different array sizes.

Also, if looping backwards is faster on a given platform, and the compiler for this platform can prove that the direction of the loop doesn't matter (or can compensate for it), the compiler probably already does it at some optimization level. This isn't exactly an arcane trick and is just as well-known as, say, xoring a register with itself to zero it.

Edited by Bacterius, 07 September 2012 - 10:22 AM.



#15 Madhed   Crossbones+   -  Reputation: 2984


Posted 07 September 2012 - 11:18 AM

What do you mean by sensible design?


What I meant is using an idiomatic and clean design first. Most programmers would argue that a class with 100 methods smells pretty bad and that the potential performance gains would not justify keeping such a beast in your codebase.

However, I must admit that I know virtually nothing about Java regarding its allocator and memory layout strategy. But what about this:

Having big objects with lots of independent members usually means you are pulling unused data into the cache, butchering performance.
By splitting up the object into smaller components we can achieve better cache utilisation by grouping the data that gets processed together.
If I understand correctly, that's what DOD (data-oriented design) is all about.
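
In Java that might look roughly like this (a sketch with hypothetical particle fields, not code from the thread): instead of an array of Particle objects scattered across the heap, the fields that get processed together live in parallel primitive arrays, so the hot loop walks memory linearly.

// Object-per-particle: every Particle is its own heap allocation, and updating
// positions drags the unrelated fields through the cache as well.
class Particle {
    float x, y, vx, vy, lifetime;
    int color;
}

// Structure-of-arrays: the data the update loop touches is stored contiguously.
class ParticleSystem {
    final float[] x, y, vx, vy;

    ParticleSystem(int count) {
        x  = new float[count];
        y  = new float[count];
        vx = new float[count];
        vy = new float[count];
    }

    void update(float dt) {
        for (int i = 0; i < x.length; i++) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
}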

So having many classes can in fact be faster than having big "all in one" classes. Especially if you consider that these days RAM is usually the biggest bottleneck. (Apart from disk access of course)

OTOH this is about mobile development and java, so I have no idea if the bottlenecks are different in that case.

Edited by Madhed, 07 September 2012 - 11:23 AM.


#16 phantom   Moderators   -  Reputation: 7279


Posted 07 September 2012 - 05:46 PM

Also, if looping backwards is faster on a given platform, and the compiler for this platform can prove that the direction of the loop doesn't matter (or can compensate for it), the compiler probably already does it at some optimization level. This isn't exactly an arcane trick and is just as well-known as, say, xoring a register with itself to zero it.


It also depends on the hardware under the hood; CPUs can prefetch data into cache in a speculative manner so that when you come to use it it has already been fetched from main memory. If a CPU can't prefetch cache lines backwards very well then a reverse loop could end up being much slower.

(Also register XOR can be slower than simply setting it to zero with an immediate, again depending on CPU, as it can cause pipeline stalls due to register dependencies in the pipeline. I'm pretty sure x86/x64 compilers don't use the XOR trick any more and haven't in some time now for that reason.)

#17 L. Spiro   Crossbones+   -  Reputation: 13600


Posted 07 September 2012 - 09:41 PM

This is very much a premature optimization as well (in C++ at least). Reverse iteration might yield a slightly faster comparison, but it is just as likely to cause an increased number of cache misses if the index is used in any way at all (even if the order does not matter semantically).

Not on my machine using C++.

My extensive benchmarks across a wide variety of platforms and compilers reveal that it is overall the fastest choice.
For iOS, for example, profiling revealed 2 functions to be high on the time-consuming list (Xcode comes with “Instruments” that show you how long every function takes) and I was able to reduce the time spent inside those functions by half by reversing the order of the loop to count down to 0.

My coworkers saw my commit to SVN and we had a long discussion about it, since it was the first time they had seen this kind of loop.
We drew out how the cache was being used etc.
Turns out that cache misses are exactly equal to those when counting up in an array. You are going to have the same number of cache misses no matter which way you go.
One of them pointed out that some machines have predict-forward cache mechanisms, which is true, but while you don’t get to exploit this using a reverse array, in practice, the small instruction set and faster comparisons make up for this, especially in inlined performance-critical functions.
When you think about it, predict-forward caching is rarely exploitable in regards to overall memory access. It is mainly for the benefit of the instruction cache, not the memory cache.

For arrays spanning fewer than 16 bytes, you don’t get to take advantage of this regardless of the array traversal direction, which means reverse traversal is always a win.

Hence it is not a premature optimization in practice.
The code is smaller, the iterations are faster, and the time spent coding is shortened (typing “for ( u32 I = uTotal; I--; )” is at least twice as fast as typing “for ( u32 I = 0; I < uTotal; ++I )”).


When it comes to Java, on the other hand, there is no other side of the coin. We considered this absolutely necessary whenever possible, for both performance and code size.


Edit2: that wasn't a fair comparison; strangely enough, switching the order of the iterations makes a difference

I take that as meaning that your results are non-indicative of any performance penalties from reverse traversal.
Besides the fact that your code is flawed and you have invalid access in your reverse loop, plus the fact that this causes it to iterate more times than the forward loop.


L. Spiro

Edited by L. Spiro, 07 September 2012 - 09:45 PM.


#18 krippy2k8   Members   -  Reputation: 646


Posted 07 September 2012 - 11:50 PM

For iOS, for example, profiling revealed 2 functions to be high on the time-consuming list (Xcode comes with “Instruments” that show you how long every function takes) and I was able to reduce the time spent inside those functions by half by reversing the order of the loop to count down to 0.


Yeah, I would pay to see that happen. If you actually had that result, then either there was something in your loop that early-outed sooner by counting down, or you were doing something in your comparison that resulted in a calculation on each loop iteration (i.e. i < something.size()).

Either that or you were using some kind of interpreted or intermediate language that did strange things. Otherwise it is physically impossible to get that result with native machine code. I can't say for Java, but I would really have to see it to believe that it would have anything more than an extremely negligible effect.

The difference between iterating up and comparing against a memory address, and iterating down and comparing to zero, is exactly 1 test instruction on all ARM and x86 platforms with any reasonable compiler, plus an extremely small memory latency issue which is definitely going to be a cache hit on every single iteration except possibly the first, unless your loop is doing so much that the difference in iteration method couldn't even be in the equation.

I just did a quick benchmark on 4 different systems with different CPUs (Windows and OSX, AMD and Intel), counting up and counting down by 2 billion iterations and ran the test 10,000 times, and the worst-case difference was 0.3%. This is with the internals of the loop doing nothing more than a simple addition accumulation and rollover.

It is true that it is usually going to be slightly faster to iterate to 0, but the difference is only going to matter in the most extreme cases.

Edited by krippy2k8, 08 September 2012 - 12:28 AM.


#19 L. Spiro   Crossbones+   -  Reputation: 13600


Posted 08 September 2012 - 10:39 PM

The code that I changed was inside the vector math library and had been inlined into probably thousands of locations in the code, and included a nested loop from 0 to 2 (3 iterations) that had been unwound in all of those locations.
My little change cascaded into something your simple test cases would never reveal: Smaller unwindings over thousands of locations in code led to fewer instruction-cache misses, not to mention multiplying that 0.3% from your results by thousands.

You cited a cache miss on every iteration which implies you do not know how the cache works, which is backed by the fact that you yourself got a 0.3% increase in performance. You didn’t say that 0.3% was an increase, but you called it a “worst-case”, and since you are against reverse looping, the worst case for you would be an increase in performance. Followed by your last sentence.


So here is the deal.
Looping down to 0 allows the compiler to emit smaller instructions that take fewer cycles. There are many ways to compare against 0 so the compiler has many more options than it would otherwise and can make the best choices in all situations.
There is never any doubt as to whether a function such as XXX.size() will be called only once and cached or if it will be called on every iteration based on compiler competency.
Smaller instructions can in practice cascade into much smaller code—which simple test cases can never reveal—and lead to fewer cache misses.
It takes half as long to type.
It is never slower than, but can often be faster than forward looping.
But you don’t like it because…?

You yourself admit it is usually going to be faster. Is that not enough in itself? Why do you have to consider that it matters only in extreme cases?
Yes, it probably does only matter in extreme cases such as the one that showed up in my profiler. But as long as the algorithmic result is the same, how does it hurt you to reverse the loop?
This is the kind of "I am against premature optimizations at all costs" mentality that I can't stand.
I agree that profiling should be used to locate bottlenecks, but at the end of the day I would rather not end up with a list of bottlenecks that wouldn't have existed in the first place had I taken the time to do simple things such as these. I prefer my profiling to show me a list of functions that baffle me as to why they are taking so long, instead of a bunch of functions whose solutions are as mundane as the order of the looping.


L. Spiro

Edited by L. Spiro, 08 September 2012 - 10:41 PM.


#20 slayemin   Members   -  Reputation: 2613


Posted 10 September 2012 - 11:31 AM

On code design:
Rule #1: Keep it simple (you can always add complexity later).

There are two versions of a programmer: the one who writes the code and the one who has to maintain/update the code. Sometimes it's the same programmer, sometimes it's different programmers. Sometimes you'll come back to your code six months later and barely remember what you were doing and why you did it. This is where the power of simplicity and good comments will shine.

Simple algorithms are easier to understand and easier to troubleshoot. When you're inventing the solution to your problem, ask yourself "is this the simplest solution to the problem?" (sort of like Occam's Razor).

Rule #2: Spend more time and effort trying to choose the best algorithms rather than trying to tweak little bits of code for performance gains.
Example: If you have an O(n^2) algorithm and you try to speed up its performance by tightening the inner execution code, your gains will be marginal compared to finding an O(log n) or O(1) algorithm.

Rule #3: (you should follow rules 1 & 2 first) Spend time optimizing your code when you're seeing actual performance problems.
I've spent lots of time trying to solve problems that don't exist. It's generally a waste of time and can lead you down quite a rabbit hole. Don't do it. At the same time, don't take this as permission to start writing sloppy code.

Rule #4: Use empirical evidence to justify your optimization.
That means, of course, that you have to gather the empirical evidence through a profiler: counting CPU instructions, measuring CPU load, network bandwidth, etc.
Remember debugging 101: Identify, Replicate, Isolate, Solve, Review.

Rule #5: (or maybe rule #0) If you can avoid reinventing the wheel, then do so.
Has someone else done what you're trying to do? Did they use a better technique? Is there available code you can use? Is there a library you could use? A game engine which satisfies your requirements? A software app which already does what you're trying to do? (of course, this is completely ignoring the counter-productive legal BS)

Rule #6: Don't lose sight of your end users and what they care about.
All through university, I pushed myself to write the most technically complicated code. If an assignment asked for a 2D game, I'd go above and beyond and create a 3D game. If we had to find and write up a ray tracer feature, I'd find one that experts said was impossible and then try to do it. Then we'd demo our projects to our fellow students and claim bragging rights. Well, my games/apps were impressive in their own right, but because of their technical complexity they were buggy and unpolished. Other students spent more time refining and polishing their simpler apps/games. Guess who won? Not me. It turns out that nobody gives a shit about the backend technical implementation and its difficulty. It's ALL about functionality, usability, and presentation. It was a good, albeit hard, lesson learned and worth remembering :)
(so, your end users won't care about how your OOP is ultimately designed, as long as your app is functional, usable and well presented.)

Eric Nevala

Indie Developer | Dev blog




