I wasn't aware that Fortran swizzled image data in memory ;-) Perhaps you were thinking of two dimensional arrays? lol
His point was that you're assuming your pixels are stored in row-major order in memory (and he was checking whether warnexus would make the same assumption, and would then think about the cache locality implications).
C uses the row-major convention for arrays and Fortran uses the column-major convention for arrays. If Fortran were still more popular than C, then maybe our image formats would store their pixels in column-major order. I thought it was a good joke, anyway.
Is fast code good or bad?
How to write fast code without optimizing?
I know I am not supposed to be optimizing, so I want to see if I can still write fast code.
Fast code is good. If code has to become unreadable/unmaintainable/overly-complex in order to become fast, then that's a trade-off, because those things are bad.
To write fast code, you need to know as many details about the computer architecture as possible (e.g. how the CPU works), and know exactly what your code is doing (e.g. what low-level instructions your high-level code will translate to). Then, you are able to think about performance implications while writing it... although this is basically optimizing the code mentally as it comes out of your fingers...
You're not supposed to be optimizing your code if you don't need to (it's already fast enough for your purposes), if you have better things to spend your time on, or if the optimizations have more cons than pros (e.g. they would leave the code so unreadable that no one will ever be able to understand it again).
this has been a recent issue, especially in my fiddling with all the video stuff.
it needs to be fast to avoid interfering with the framerate, but naively written image-processing and manipulation code can easily become fairly slow.
more so, one may run into other more subtle issues, like you don't want to make multiple passes over an image or largish buffer if it can be avoided, ...
the result then is a bunch of hairy and scary-complicated code to do things like decode video frames all in a single pass and with multiple decode routes depending on which output format is being targeted, ... (ex: RGBA, BGRA, UYVY, DXTC, ...).
so, yeah, it is a tradeoff.
if an entire codebase were written in performance-centric code though, this would just be scary.
the other side of this though is people writing dead-slow code with a "computers keep getting faster so why care?" attitude.
the code is often simple, but can waste huge amounts of clock cycles doing almost nothing.
so, there are several levels of optimization:
1, dead slow, such as invoking an RDBMS or XSLT or similar every time the user clicks something (trivial operations can potentially take seconds, ...);
2, moderately slow, such as casually using dynamic memory allocations and run-time type-checks, ...;
3, moderately fast, namely going more directly "from point A to point B", ...
4, extra fast, where the code starts "growing hair" in an attempt to get it faster (micro-optimizations start popping up, ...);
5, faster still, where things start getting outgrowths of ASM and/or dynamically generated machine-code.
most of my code tends to be 2 or 3, with most of my renderer and similar in group 3, a lot of my video stuff in 4, and a lot of my script VM stuff in 4 and 5.
usually, it depends on what is going on and how much it can impact performance.
code in groups 4 and 5 generally evokes a lot of cries of "optimization is evil", but it is sometimes necessary, and probably shouldn't be done as a general development practice.
the main obvious difference:
usually group 3 code is fast while staying small, whereas group 4 code tends to become massively larger in an attempt to squeeze more speed out of the problem.
group 5's most obvious feature is that it usually comes with a mountain of #ifdef's and inline ASM and similar.
Edited by BGB, 19 December 2013 - 05:28 PM.