Jump to content

  • Log In with Google      Sign In   
  • Create Account


How to write fast code without optimizing


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
29 replies to this topic

#1 warnexus   Prime Members   -  Reputation: 1408

Like
0Likes
Like

Posted 19 December 2013 - 01:13 PM

Is fast code good or bad?

 

How to write fast code without optimizing?

 

I know I am not suppose to optimizing so I want to see if I can still write fast code.

 

Edit: Seriously, why the anonymous downvote...?


Edited by warnexus, 19 December 2013 - 10:13 PM.


Sponsor:

#2 SimonForsman   Crossbones+   -  Reputation: 6035

Like
1Likes
Like

Posted 19 December 2013 - 01:15 PM

fast code is good and there is nothing wrong with optimizing your code, just don't waste too much time optimizing parts of your code that don't really matter.

 

The fastest code is code that you don't run.


I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

#3 Vortez   Crossbones+   -  Reputation: 2697

Like
0Likes
Like

Posted 19 December 2013 - 01:23 PM

Just try to access your data in a linear fashion, which is cache friendly, like when reading or writing an array for example.

Also, most optimization i've done where just rethinking some algorithm an rewrite them in a more efficient manner.


Edited by Vortez, 19 December 2013 - 01:25 PM.


#4 ApochPiQ   Moderators   -  Reputation: 14891

Like
10Likes
Like

Posted 19 December 2013 - 01:24 PM

Who says you're not supposed to be optimizing? Just don't optimize things that don't matter. To know what matters, measure.



#5 eduardo_costa   Members   -  Reputation: 327

Like
0Likes
Like

Posted 19 December 2013 - 01:33 PM

Fast code first.

Optimization comes with repeating.

After a good deal repeating the solution of the same problem I could finish with both a fast and optimized code.

You can't reach this overnight.

Some techniques I needed to re-write from scratch around 5 or 10 times before grasping the best way to do it.

There were problems only after re-visiting it after a year I could get the idea behind some optimization technique.

So unless you have a good deal of time, you shouldn't try to optimize your code and only finish it.



#6 Álvaro   Crossbones+   -  Reputation: 12866

Like
7Likes
Like

Posted 19 December 2013 - 01:44 PM

Your code should be

 * clear

 * correct

 * fast

... in that order. Some people would place `correct' first, but I would rather have code that I understand even if it doesn't cover every conceivable case, as long as it is clear enough that I know immediately what the not-covered cases are.

 

Don't write stupid code that is needlessly slow, but don't obsess about performance until it's clear that the program is not fast enough. In that case, use your profiler to focus your efforts where you should.



#7 warnexus   Prime Members   -  Reputation: 1408

Like
0Likes
Like

Posted 19 December 2013 - 01:48 PM

Who says you're not supposed to be optimizing? Just don't optimize things that don't matter. To know what matters, measure.

Oh I see.


Edited by warnexus, 19 December 2013 - 01:50 PM.


#8 warnexus   Prime Members   -  Reputation: 1408

Like
0Likes
Like

Posted 19 December 2013 - 01:56 PM

Your code should be

 * clear

 * correct

 * fast

... in that order. Some people would place `correct' first, but I would rather have code that I understand even if it doesn't cover every conceivable case, as long as it is clear enough that I know immediately what the not-covered cases are.

 

Don't write stupid code that is needlessly slow, but don't obsess about performance until it's clear that the program is not fast enough. In that case, use your profiler to focus your efforts where you should.

What makes a code stupid? Unreadable code you mean?


Edited by warnexus, 19 December 2013 - 01:56 PM.


#9 mark ds   Members   -  Reputation: 1176

Like
3Likes
Like

Posted 19 December 2013 - 02:39 PM

 

What makes a code stupid? Unreadable code you mean?

 

 

 

stupid pseudo code:

 

For x = 0 to image_width

  For y = 0 to image_height

    Pixel(x, y) = 0

 

better version:

 

For y = 0 to image_height

  For x = 0 to image_width

    Pixel(x, y) = 0

 

Do you understand the difference between the two?



#10 Bregma   Crossbones+   -  Reputation: 4971

Like
7Likes
Like

Posted 19 December 2013 - 02:44 PM


Do you understand the difference between the two?

Oh!!!  I know, I know!!!!  Pick me!!!1!eleven!

 

One is very very fast in Fortran.  The other is very fast in C.


Stephen M. Webb
Professional Free Software Developer

#11 warnexus   Prime Members   -  Reputation: 1408

Like
0Likes
Like

Posted 19 December 2013 - 02:59 PM

 

 

What makes a code stupid? Unreadable code you mean?

 

 

 

stupid pseudo code:

 

For x = 0 to image_width

  For y = 0 to image_height

    Pixel(x, y) = 0

 

better version:

 

For y = 0 to image_height

  For x = 0 to image_width

    Pixel(x, y) = 0

 

Do you understand the difference between the two?

 

I don't know but Bregma above me seems to know.


Edited by warnexus, 19 December 2013 - 03:01 PM.


#12 mark ds   Members   -  Reputation: 1176

Like
0Likes
Like

Posted 19 December 2013 - 03:01 PM

One is very very fast in Fortran. The other is very fast in C.

I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol

I don't know but Bregma above me seems to know.

The second version accesses pixels that are stored consecutively in memory (across the image) , which means it is cache friendly.

The first version, however, accesses pixels that are above each other in the image, but a long way apart in memory, making it very unfriendly to the cache.

I just did a really quick test, and a C debug version processed an 8192*8192 image 6 times faster using the second version! Which makes the first version 'stupid code'.

#13 Hodgman   Moderators   -  Reputation: 29308

Like
4Likes
Like

Posted 19 December 2013 - 03:18 PM

I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol

His point was that you're assuming that your pixels are stored in row-major order in memory (and checking if warnexus will make the same assumption and then think about the cache coherency implications).
C uses the row-major convention for arrays and Fortran uses the column-major convention for arrays. If Fortran were still more popular than C, then maybe our image formats would store their pixels in column-major order? wink.png I thought it was a good joke anyway laugh.png

Is fast code good or bad?
 
How to write fast code without optimizing?
 
I know I am not suppose to optimizing so I want to see if I can still write fast code.

Fast code is good. If code has to become unreadable/unmaintainable/overly-complex in order to become fast, then that's a trade-off, because those things are bad.
 
To write fast code, you need to know as many details about the computer architecture as possible (e.g. how the CPU works), and know exactly what your code is doing (e.g. what low-level instructions your high-level code will translate to). Then, you are able to think about performance implications while writing it... although this is basically optimizing the code mentally as it comes out of your fingers...
 
You're not supposed to be optimizing your code if: you don't need to (it's already fast enough for your purposes), if you have better things to be spending your time on, or if the optimizations that you're going to use have more cons than pros (e.g. if they will leave the code so unreadable that no-one will ever be able to understand it again).



#14 jms bc   Members   -  Reputation: 421

Like
0Likes
Like

Posted 19 December 2013 - 04:22 PM

Regarding mark ds's post, I am getting a bit of dissonance here, like I did with the 'squaring terms' vs 'squaring sides' bit a few weeks ago...

 

 

 

const int elems = 3000;
int grid[elems][elems];

int _tmain(int argc, _TCHAR* argv[])
{
LARGE_INTEGER startqpc, stopqpc;


::QueryPerformanceFrequency(&startqpc);
double cpufreq = double(startqpc.QuadPart);


::QueryPerformanceCounter(&startqpc);
for(int i=0;i<elems;++i)
for(int j=0;j<elems;++j){
grid[i][j] = 0;
}
::QueryPerformanceCounter(&stopqpc);


std::cout << "Time taken: " << (stopqpc.QuadPart - startqpc.QuadPart)/cpufreq << std::endl;


::QueryPerformanceCounter(&startqpc);
for(int i=0;i<elems;++i)
for(int j=0;j<elems;++j){
grid[j][i] = 0;
}
::QueryPerformanceCounter(&stopqpc);


std::cout << "Time taken: " << (stopqpc.QuadPart - startqpc.QuadPart)/cpufreq << std::endl;


std::cin.ignore();


return 0;
}

Output:

 

Time taken: 0.00935299

Time taken: 0.0843954

 

which would seem to contradict mark ds. 

 

But, I have a smallish hat:collar ratio so I'm hedging. What is correct here?

 

 


The Four Horsemen of Happiness have left.


#15 Álvaro   Crossbones+   -  Reputation: 12866

Like
0Likes
Like

Posted 19 December 2013 - 04:29 PM

No, that doesn't contradict anything. Pixel(x,y) probably returns a reference to screen[Width * y + x].

#16 Godmil   Members   -  Reputation: 744

Like
0Likes
Like

Posted 19 December 2013 - 04:48 PM

Yeah those loops make sense, to go through a 2D array of [i][j], you want to loop the right most one first. Similarly pictures tend to be saved [height][width].



#17 DvDmanDT   Members   -  Reputation: 847

Like
0Likes
Like

Posted 19 December 2013 - 05:21 PM

Many people have already given you some valuable advice. Cache friendlyness is one thing that is reasonably simple yet can make a large difference. Another thing that can make a huge difference is when and how you allocate your memory. That is especially true if you use a garbage collected language such as C# or Java. In C#, know when to use structs vs classes, because it can really make a difference when used right (or wrong).



#18 BGB   Crossbones+   -  Reputation: 1554

Like
1Likes
Like

Posted 19 December 2013 - 05:25 PM

 

I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol

His point was that you're assuming that your pixels are stored in row-major order in memory (and checking if warnexus will make the same assumption and then think about the cache coherency implications).
C uses the row-major convention for arrays and Fortran uses the column-major convention for arrays. If Fortran were still more popular than C, then maybe our image formats would store their pixels in column-major order? wink.png I thought it was a good joke anyway laugh.png

Is fast code good or bad?
 
How to write fast code without optimizing?
 
I know I am not suppose to optimizing so I want to see if I can still write fast code.

Fast code is good. If code has to become unreadable/unmaintainable/overly-complex in order to become fast, then that's a trade-off, because those things are bad.
 
To write fast code, you need to know as many details about the computer architecture as possible (e.g. how the CPU works), and know exactly what your code is doing (e.g. what low-level instructions your high-level code will translate to). Then, you are able to think about performance implications while writing it... although this is basically optimizing the code mentally as it comes out of your fingers...
 
You're not supposed to be optimizing your code if: you don't need to (it's already fast enough for your purposes), if you have better things to be spending your time on, or if the optimizations that you're going to use have more cons than pros (e.g. if they will leave the code so unreadable that no-one will ever be able to understand it again).

 

 

 

this has been a recent issue, especially in my fiddling with all the video stuff.

 

it needs to be fast to avoid interfering with the framerate, but naively written image-processing and manipulation code can easily become fairly slow.

 

more so, one may run into other more subtle issues, like you don't want to make multiple passes over an image or largish buffer if it can be avoided, ...

the result then is a bunch of hairy and scary-complicated code to do things like decode video frames all in a single pass and with multiple decode routes depending on which output format is being targeted, ... (ex: RGBA, BGRA, UYVY, DXTC, ...).

 

so, yeah, it is a tradeoff.

if an entire codebase were written in performance-centric code though, this would just be scary.

 

 

the other side of this though is people writing dead-slow code with a "computers keep getting faster so why care?".

the code is often simple, but can waste huge amounts of clock cycles doing almost nothing.

 

so, there are several levels of optimization:

1, dead slow, such as invoking an RDBMS or XSLT or similar every time the user clicks something (trivial operations can potentially take seconds, ...);

2, moderately slow, such as casually using dynamic memory allocations and run-time type-checks, ...;

3, moderately fast, namely going more directly "from point A to point B", ...

4, extra fast, where the code starts "growing hair" in an attempt to get it faster (micro-optimizations start popping up, ...);

5, faster still, thing starts getting outgrowths of ASM and/or dynamically generated machine-code.

 

most of my code tends to be 2 or 3, with most of my renderer and similar in group 3, a lot of my video stuff in 4, and a lot of my script VM stuff in 4 and 5.

 

usually, it depends on what is going on and how much it can impact performance.

 

 

code in groups 4 and 5 generally evoke a lot of cries of "optimization is evil", but it is sometimes necessary, and probably shouldn't be done as a general development practice.

 

the main obvious difference:

usually group 3 code is fast but small, whereas group 4 code tends to become massively larger in an attempt to squeeze more speed out of the problem.

group 5's most obvious feature is that it usually comes with a mountain of #ifdef's and inline ASM and similar.


Edited by BGB, 19 December 2013 - 05:28 PM.


#19 ferrous   Members   -  Reputation: 1915

Like
0Likes
Like

Posted 19 December 2013 - 06:11 PM

Yeah, it's not that optimization is evil, it's premature optimization is evil.  But even then, it's the amount of time, and the amount of clarity lost when prematurely optimizing that comes as a factor of whether or not it's worthwhile.

 

If you optimize code and in the process the whole thing is cleaner and clearer?  That's not premature optimization, unless perhaps you took an extra week to get something done that was supposed to only take three days, and are on a schedule, and the code isn't some vital pathway that needs to be fast.

 

An example of what could be considered a preoptimization that I always end up doing, is when comparing length, I do the length squared if I can get away with it.  Sure the code may not be hit all that often, but it's a rather simple optimization, and it's one I've done so often that it's very clear when I see something like "radius*radius < distSqr" in code somewhere. 

 

Just because you don't want to prematurely over optimize doesn't mean you should turn your brain off when doing coding.



#20 warnexus   Prime Members   -  Reputation: 1408

Like
0Likes
Like

Posted 19 December 2013 - 06:18 PM

 

One is very very fast in Fortran. The other is very fast in C.

I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol

I don't know but Bregma above me seems to know.

The second version accesses pixels that are stored consecutively in memory (across the image) , which means it is cache friendly.

The first version, however, accesses pixels that are above each other in the image, but a long way apart in memory, making it very unfriendly to the cache.

I just did a really quick test, and a C debug version processed an 8192*8192 image 6 times faster using the second version! Which makes the first version 'stupid code'.

 

So the second version would be moving from left to right row by row and the first would be going from bottom to top row by row? I seen the second code somewhere in an open source project. It is nice to know the second version is cache friendly.






Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS