color = (red << 16) + (green << 8) + blue;
This strikes me as both ugly and probably inefficient...
Yeah, me too...
You should really be using bitwise OR:
color = (red<<16) | (green<<8) | blue;
There we go -- much better.
Why much better? Isn't addition one cycle and well optimized?
No, there is no speed difference: addition and bitwise OR take the same time on most hardware (see the Intel docs on the latency and throughput of both instructions; they are identical and quite fast). On (very) old hardware bitwise OR could even be slightly faster, since no carries have to propagate, but good luck measuring that. There is also no difference in the result, as long as red, green and blue each fit in a byte. But OR is slightly more readable, because when packing bytes into a single word you are not really doing addition in the usual sense; you're just... packing bits. So in this sense bitwise OR is better than addition, not that it matters much (both will give wrong answers anyway if red, green and blue are wider than 8 bits).
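To make the "same result as long as each component fits in a byte" point concrete, here is a small C sketch (the function names are made up for illustration). It checks every combination of 8-bit red, green and blue and confirms that + and | produce the same packed value, then shows how the two diverge once a component no longer fits in 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Packing with addition, as in the original snippet. */
static uint32_t pack_add(uint32_t red, uint32_t green, uint32_t blue)
{
    return (red << 16) + (green << 8) + blue;
}

/* Packing with bitwise OR: same 0x00RRGGBB layout. */
static uint32_t pack_or(uint32_t red, uint32_t green, uint32_t blue)
{
    return (red << 16) | (green << 8) | blue;
}

/* Returns 1 if + and | agree for all 256^3 combinations of 8-bit inputs.
   The shifted fields never overlap, so no carries occur and each sum
   equals the corresponding OR. */
static int add_matches_or_for_bytes(void)
{
    for (uint32_t r = 0; r < 256; r++)
        for (uint32_t g = 0; g < 256; g++)
            for (uint32_t b = 0; b < 256; b++)
                if (pack_add(r, g, b) != pack_or(r, g, b))
                    return 0;
    return 1;
}

/* A component wider than 8 bits spills into its neighbour's field:
   both results are wrong, just differently wrong. */
static void demo_overflow(void)
{
    /* blue = 0x100 overlaps the green field (green = 1 -> 0x100). */
    assert(pack_add(0, 1, 0x100) == 0x200); /* 0x100 + 0x100 carries */
    assert(pack_or(0, 1, 0x100) == 0x100);  /* 0x100 | 0x100 does not */
}
```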
And who knows? If you write it with | instead of +, a compiler might actually recognize what you're trying to do and use a special CPU instruction that packs bytes very quickly (unlikely, but perhaps on DSPs, digital signal processors). When writing C or C++ code, you're talking to the compiler, not the CPU. Unless you resort to hand-written assembly, your code goes through the compiler, so if your goal is fast code, you had better make your code as clear and to-the-point as possible so that the compiler can better understand your intent (and, yes, it does try: compilers have many heuristics that recognize common code patterns). Writing convoluted code will just cause the compiler to give up and emit suboptimal code. As an added bonus, compiler-friendly code is often human-friendly code too. Yes, there are exceptions: in bottleneck situations you can sometimes produce faster code by writing it a certain way, and intrinsics are a nice middle ground between standard code and full-on assembly that can boost performance immensely if you use them just right. But to be blunt, judging by the snippets in your various threads, you are really not at this stage yet.
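As an example of writing the intent down clearly, here is a minimal sketch of a packing helper; the name pack_rgb and the 0x00RRGGBB layout are assumptions for illustration, not anything from the original code. Taking uint8_t parameters puts the 8-bit range in the signature itself, so a component wider than a byte simply cannot reach the packing expression (an out-of-range argument is converted to uint8_t at the call, wrapping modulo 256). This sketch makes no claim about which instructions any particular compiler emits for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs three 8-bit channels into 0x00RRGGBB.
   The uint8_t parameters document (and enforce) the 0..255 range.
   Each channel is widened to uint32_t before shifting so the shift
   happens on an unsigned 32-bit value rather than a promoted int. */
static inline uint32_t pack_rgb(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)red << 16) | ((uint32_t)green << 8) | (uint32_t)blue;
}

/* A few spot checks of the layout. */
static void pack_rgb_examples(void)
{
    assert(pack_rgb(0x12, 0x34, 0x56) == 0x123456);
    assert(pack_rgb(255, 255, 255) == 0xFFFFFF);
}
```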
How can you claim with a straight face that you've properly profiled your code "100x more" and identified likely bottlenecks when you are still asking in this very thread whether bitwise OR is less "optimized" than addition? You keep getting tons of very useful advice that you really should follow, but you keep brushing it off as "propaganda", as if you were too good for it. It's getting very repetitive. If you think you know better, why are you asking for advice? If you are not looking for help, why are you making threads?
My final advice to you is: get off your high horse and face the possibility that you might not actually know everything (or anything) about optimization. Then try modifying your code and looking at what changes in the resulting assembly, to learn what your compiler does and does not do. Read up a bit on how CPU hardware works, and get familiar with at least the basics of your own architecture (probably x86, Pentium 3 or Core 2). Find existing C/C++ code on GitHub or wherever. There must be dozens of software rasterizers online; you could study a few and see how they implemented various parts of their pipeline. Learn from other people's code and compare it to yours. It is hard work, yes. But asking vague questions on a forum unfortunately only gets you so far: to learn to write fast code, you must work at it. There's no secret. If you don't want to take this advice, your loss. I will have only wasted 15 minutes writing it.
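One way to start on the "see what changes in the resulting assembly" exercise is a tiny file like the sketch below (the file name pack.c and the function names are hypothetical). Compile it with gcc -O2 -S pack.c (or clang -O2 -S pack.c), which writes the assembly to pack.s, then compare the two functions, edit the C, and rebuild. What gets emitted depends on the compiler, its version and the flags, so treat your own output as the answer rather than any expectation.

```c
/* pack.c: two packers with identical results, for comparing assembly.
   Build with:  gcc -O2 -S pack.c   (writes pack.s)
   or:          clang -O2 -S pack.c
   The functions are deliberately not static, so the compiler emits
   them even though nothing in this file calls them. */
#include <assert.h>
#include <stdint.h>

uint32_t pack_with_add(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)red << 16) + ((uint32_t)green << 8) + blue;
}

uint32_t pack_with_or(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)red << 16) | ((uint32_t)green << 8) | blue;
}

/* Sanity check before reading the assembly: both must agree. */
void check_packers(void)
{
    assert(pack_with_add(0xAB, 0xCD, 0xEF) == 0xABCDEF);
    assert(pack_with_or(0xAB, 0xCD, 0xEF) == 0xABCDEF);
}
```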