View more

View more

View more

### Image of the Day Submit

IOTD | Top Screenshots

### The latest, straight to your Inbox.

Subscribe to GameDev.net Direct to receive the latest updates and exclusive content.

# How to write fast code without optimizing

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

29 replies to this topic

### #1warnexus  Prime Members

Posted 19 December 2013 - 01:13 PM

Is fast code good or bad?

How to write fast code without optimizing?

I know I am not suppose to optimizing so I want to see if I can still write fast code.

Edit: Seriously, why the anonymous downvote...?

Edited by warnexus, 19 December 2013 - 10:13 PM.

### #2SimonForsman  Members

Posted 19 December 2013 - 01:15 PM

fast code is good and there is nothing wrong with optimizing your code, just don't waste too much time optimizing parts of your code that don't really matter.

The fastest code is code that you don't run.

I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

### #3Vortez  Members

Posted 19 December 2013 - 01:23 PM

Just try to access your data in a linear fashion, which is cache friendly, like when reading or writing an array for example.

Also, most optimization i've done where just rethinking some algorithm an rewrite them in a more efficient manner.

Edited by Vortez, 19 December 2013 - 01:25 PM.

### #4ApochPiQ  Moderators

Posted 19 December 2013 - 01:24 PM

POPULAR

Who says you're not supposed to be optimizing? Just don't optimize things that don't matter. To know what matters, measure.

Wielder of the Sacred Wands

### #5eduardo_costa  Members

Posted 19 December 2013 - 01:33 PM

Fast code first.

Optimization comes with repeating.

After a good deal repeating the solution of the same problem I could finish with both a fast and optimized code.

You can't reach this overnight.

Some techniques I needed to re-write from scratch around 5 or 10 times before grasping the best way to do it.

There were problems only after re-visiting it after a year I could get the idea behind some optimization technique.

So unless you have a good deal of time, you shouldn't try to optimize your code and only finish it.

### #6Álvaro  Members

Posted 19 December 2013 - 01:44 PM

POPULAR

* clear

* correct

* fast

... in that order. Some people would place correct' first, but I would rather have code that I understand even if it doesn't cover every conceivable case, as long as it is clear enough that I know immediately what the not-covered cases are.

Don't write stupid code that is needlessly slow, but don't obsess about performance until it's clear that the program is not fast enough. In that case, use your profiler to focus your efforts where you should.

### #7warnexus  Prime Members

Posted 19 December 2013 - 01:48 PM

Who says you're not supposed to be optimizing? Just don't optimize things that don't matter. To know what matters, measure.

Oh I see.

Edited by warnexus, 19 December 2013 - 01:50 PM.

### #8warnexus  Prime Members

Posted 19 December 2013 - 01:56 PM

* clear

* correct

* fast

... in that order. Some people would place correct' first, but I would rather have code that I understand even if it doesn't cover every conceivable case, as long as it is clear enough that I know immediately what the not-covered cases are.

Don't write stupid code that is needlessly slow, but don't obsess about performance until it's clear that the program is not fast enough. In that case, use your profiler to focus your efforts where you should.

What makes a code stupid? Unreadable code you mean?

Edited by warnexus, 19 December 2013 - 01:56 PM.

### #9mark ds  Members

Posted 19 December 2013 - 02:39 PM

What makes a code stupid? Unreadable code you mean?

stupid pseudo code:

For x = 0 to image_width

For y = 0 to image_height

Pixel(x, y) = 0

better version:

For y = 0 to image_height

For x = 0 to image_width

Pixel(x, y) = 0

Do you understand the difference between the two?

### #10Bregma  Members

Posted 19 December 2013 - 02:44 PM

POPULAR

Do you understand the difference between the two?

Oh!!!  I know, I know!!!!  Pick me!!!1!eleven!

One is very very fast in Fortran.  The other is very fast in C.

Stephen M. Webb
Professional Free Software Developer

### #11warnexus  Prime Members

Posted 19 December 2013 - 02:59 PM

What makes a code stupid? Unreadable code you mean?

stupid pseudo code:

For x = 0 to image_width

For y = 0 to image_height

Pixel(x, y) = 0

better version:

For y = 0 to image_height

For x = 0 to image_width

Pixel(x, y) = 0

Do you understand the difference between the two?

I don't know but Bregma above me seems to know.

Edited by warnexus, 19 December 2013 - 03:01 PM.

### #12mark ds  Members

Posted 19 December 2013 - 03:01 PM

One is very very fast in Fortran. The other is very fast in C.

I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol

I don't know but Bregma above me seems to know.

The second version accesses pixels that are stored consecutively in memory (across the image) , which means it is cache friendly.

The first version, however, accesses pixels that are above each other in the image, but a long way apart in memory, making it very unfriendly to the cache.

I just did a really quick test, and a C debug version processed an 8192*8192 image 6 times faster using the second version! Which makes the first version 'stupid code'.

### #13Hodgman  Moderators

Posted 19 December 2013 - 03:18 PM

I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol

His point was that you're assuming that your pixels are stored in row-major order in memory (and checking if warnexus will make the same assumption and then think about the cache coherency implications).
C uses the row-major convention for arrays and Fortran uses the column-major convention for arrays. If Fortran were still more popular than C, then maybe our image formats would store their pixels in column-major order?  I thought it was a good joke anyway

Is fast code good or bad?

How to write fast code without optimizing?

I know I am not suppose to optimizing so I want to see if I can still write fast code.

Fast code is good. If code has to become unreadable/unmaintainable/overly-complex in order to become fast, then that's a trade-off, because those things are bad.

To write fast code, you need to know as many details about the computer architecture as possible (e.g. how the CPU works), and know exactly what your code is doing (e.g. what low-level instructions your high-level code will translate to). Then, you are able to think about performance implications while writing it... although this is basically optimizing the code mentally as it comes out of your fingers...

You're not supposed to be optimizing your code if: you don't need to (it's already fast enough for your purposes), if you have better things to be spending your time on, or if the optimizations that you're going to use have more cons than pros (e.g. if they will leave the code so unreadable that no-one will ever be able to understand it again).

### #14jms bc  Members

Posted 19 December 2013 - 04:22 PM

Regarding mark ds's post, I am getting a bit of dissonance here, like I did with the 'squaring terms' vs 'squaring sides' bit a few weeks ago...

const int elems = 3000;
int grid[elems][elems];

int _tmain(int argc, _TCHAR* argv[])
{
LARGE_INTEGER startqpc, stopqpc;

::QueryPerformanceFrequency(&startqpc);

::QueryPerformanceCounter(&startqpc);
for(int i=0;i<elems;++i)
for(int j=0;j<elems;++j){
grid[i][j] = 0;
}
::QueryPerformanceCounter(&stopqpc);

::QueryPerformanceCounter(&startqpc);
for(int i=0;i<elems;++i)
for(int j=0;j<elems;++j){
grid[j][i] = 0;
}
::QueryPerformanceCounter(&stopqpc);

std::cin.ignore();

return 0;
}

Output:

Time taken: 0.00935299

Time taken: 0.0843954

which would seem to contradict mark ds.

But, I have a smallish hat:collar ratio so I'm hedging. What is correct here?

The Four Horsemen of Happiness have left.

### #15Álvaro  Members

Posted 19 December 2013 - 04:29 PM

No, that doesn't contradict anything. Pixel(x,y) probably returns a reference to screen[Width * y + x].

### #16Godmil  Members

Posted 19 December 2013 - 04:48 PM

Yeah those loops make sense, to go through a 2D array of [i][j], you want to loop the right most one first. Similarly pictures tend to be saved [height][width].

### #17DvDmanDT  GDNet+

Posted 19 December 2013 - 05:21 PM

Many people have already given you some valuable advice. Cache friendlyness is one thing that is reasonably simple yet can make a large difference. Another thing that can make a huge difference is when and how you allocate your memory. That is especially true if you use a garbage collected language such as C# or Java. In C#, know when to use structs vs classes, because it can really make a difference when used right (or wrong).

### #18BGB  Members

Posted 19 December 2013 - 05:25 PM

I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol

His point was that you're assuming that your pixels are stored in row-major order in memory (and checking if warnexus will make the same assumption and then think about the cache coherency implications).
C uses the row-major convention for arrays and Fortran uses the column-major convention for arrays. If Fortran were still more popular than C, then maybe our image formats would store their pixels in column-major order?  I thought it was a good joke anyway

Is fast code good or bad?

How to write fast code without optimizing?

I know I am not suppose to optimizing so I want to see if I can still write fast code.

Fast code is good. If code has to become unreadable/unmaintainable/overly-complex in order to become fast, then that's a trade-off, because those things are bad.

To write fast code, you need to know as many details about the computer architecture as possible (e.g. how the CPU works), and know exactly what your code is doing (e.g. what low-level instructions your high-level code will translate to). Then, you are able to think about performance implications while writing it... although this is basically optimizing the code mentally as it comes out of your fingers...

You're not supposed to be optimizing your code if: you don't need to (it's already fast enough for your purposes), if you have better things to be spending your time on, or if the optimizations that you're going to use have more cons than pros (e.g. if they will leave the code so unreadable that no-one will ever be able to understand it again).

this has been a recent issue, especially in my fiddling with all the video stuff.

it needs to be fast to avoid interfering with the framerate, but naively written image-processing and manipulation code can easily become fairly slow.

more so, one may run into other more subtle issues, like you don't want to make multiple passes over an image or largish buffer if it can be avoided, ...

the result then is a bunch of hairy and scary-complicated code to do things like decode video frames all in a single pass and with multiple decode routes depending on which output format is being targeted, ... (ex: RGBA, BGRA, UYVY, DXTC, ...).

so, yeah, it is a tradeoff.

if an entire codebase were written in performance-centric code though, this would just be scary.

the other side of this though is people writing dead-slow code with a "computers keep getting faster so why care?".

the code is often simple, but can waste huge amounts of clock cycles doing almost nothing.

so, there are several levels of optimization:

1, dead slow, such as invoking an RDBMS or XSLT or similar every time the user clicks something (trivial operations can potentially take seconds, ...);

2, moderately slow, such as casually using dynamic memory allocations and run-time type-checks, ...;

3, moderately fast, namely going more directly "from point A to point B", ...

4, extra fast, where the code starts "growing hair" in an attempt to get it faster (micro-optimizations start popping up, ...);

5, faster still, thing starts getting outgrowths of ASM and/or dynamically generated machine-code.

most of my code tends to be 2 or 3, with most of my renderer and similar in group 3, a lot of my video stuff in 4, and a lot of my script VM stuff in 4 and 5.

usually, it depends on what is going on and how much it can impact performance.

code in groups 4 and 5 generally evoke a lot of cries of "optimization is evil", but it is sometimes necessary, and probably shouldn't be done as a general development practice.

the main obvious difference:

usually group 3 code is fast but small, whereas group 4 code tends to become massively larger in an attempt to squeeze more speed out of the problem.

group 5's most obvious feature is that it usually comes with a mountain of #ifdef's and inline ASM and similar.

Edited by BGB, 19 December 2013 - 05:28 PM.

### #19ferrous  Members

Posted 19 December 2013 - 06:11 PM

Yeah, it's not that optimization is evil, it's premature optimization is evil.  But even then, it's the amount of time, and the amount of clarity lost when prematurely optimizing that comes as a factor of whether or not it's worthwhile.

If you optimize code and in the process the whole thing is cleaner and clearer?  That's not premature optimization, unless perhaps you took an extra week to get something done that was supposed to only take three days, and are on a schedule, and the code isn't some vital pathway that needs to be fast.

An example of what could be considered a preoptimization that I always end up doing, is when comparing length, I do the length squared if I can get away with it.  Sure the code may not be hit all that often, but it's a rather simple optimization, and it's one I've done so often that it's very clear when I see something like "radius*radius < distSqr" in code somewhere.

Just because you don't want to prematurely over optimize doesn't mean you should turn your brain off when doing coding.

### #20warnexus  Prime Members

Posted 19 December 2013 - 06:18 PM

One is very very fast in Fortran. The other is very fast in C.

I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol

I don't know but Bregma above me seems to know.

The second version accesses pixels that are stored consecutively in memory (across the image) , which means it is cache friendly.

The first version, however, accesses pixels that are above each other in the image, but a long way apart in memory, making it very unfriendly to the cache.

I just did a really quick test, and a C debug version processed an 8192*8192 image 6 times faster using the second version! Which makes the first version 'stupid code'.

So the second version would be moving from left to right row by row and the first would be going from bottom to top row by row? I seen the second code somewhere in an open source project. It is nice to know the second version is cache friendly.

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.