decompositing and recompositing color (pixel)

17 comments, last by fir 9 years, 9 months ago

Not really an optimization, but I'd like to throw a little fuel on the fire...


union uColor {
  struct {unsigned char blue, green, red, alpha;};
  unsigned int uint;
  unsigned char channels[4];
};
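A minimal sketch of how this union could be used to decompose and recompose a pixel. The helper names (compose, red_of, max_channel) are made up for illustration, and fixed-width types are swapped in for clarity. The anonymous struct needs C11 or a compiler extension, and which byte of uint each named channel lands in depends on the machine's byte order, so treat the layout as an assumption rather than a guarantee. Reading a member other than the one last written is fine in C but not in C++.

```c
#include <stdint.h>

/* Same idea as the union above, using fixed-width types.
   The anonymous struct member requires C11 or a compiler extension. */
union uColor {
    struct { uint8_t blue, green, red, alpha; };
    uint32_t uint;
    uint8_t channels[4];
};

/* Recompose: write the named channels, read back the packed value. */
static uint32_t compose(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    union uColor c;
    c.red = r;
    c.green = g;
    c.blue = b;
    c.alpha = a;
    return c.uint;
}

/* Decompose: write the packed value, read back one named channel. */
static uint8_t red_of(uint32_t packed)
{
    union uColor c;
    c.uint = packed;
    return c.red;
}

/* The array view lets you loop over the channels, e.g. to find the largest. */
static uint8_t max_channel(uint32_t packed)
{
    union uColor c;
    c.uint = packed;
    uint8_t best = 0;
    for (int i = 0; i < 4; ++i)
        if (c.channels[i] > best)
            best = c.channels[i];
    return best;
}
```

The round trip through compose and red_of does not depend on byte order; only the numeric value of the packed word does.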

More importantly, take my advice and use a profiler on your code before you try to optimize it.

That's good. I had forgotten about this; I haven't used unions for 12 years.

It might also be good to do this for some other structures, for example a triangle (made of 3 vertices). Sometimes it is good to access it by named fields, but sometimes it would be nice to iterate over it in a loop.
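A rough sketch of the triangle idea, in the same spirit as the color union. Vec3, Triangle and centroid are hypothetical names, not something from this thread. It assumes the compiler puts no padding between the three vertices in the struct view, which holds on mainstream compilers but is not promised by the standard, so a static assertion checks the size.

```c
/* Hypothetical types for illustration only. */
typedef struct { float x, y, z; } Vec3;

typedef union {
    struct { Vec3 a, b, c; };  /* named access (C11 anonymous struct) */
    Vec3 v[3];                 /* indexed access for loops */
} Triangle;

/* Assumption: the struct view has no padding, so both views line up. */
_Static_assert(sizeof(Triangle) == 3 * sizeof(Vec3),
               "struct view and array view must have the same size");

/* Iterate over the vertices through the array view. */
static Vec3 centroid(const Triangle *t)
{
    Vec3 sum = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 3; ++i) {
        sum.x += t->v[i].x;
        sum.y += t->v[i].y;
        sum.z += t->v[i].z;
    }
    sum.x /= 3.0f;
    sum.y /= 3.0f;
    sum.z /= 3.0f;
    return sum;
}
```

Code that cares about one particular vertex can still write t->b.y, while loops use t->v[i].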


Ex: BYTE red = (BYTE)((color >> 16) & 0xff);

P.S. It is also nice that C works this way:

int x=0x10203040;

unsigned char y = x; //y gives 0x40 - handy thing

With int x=0x102030f0; char y = x; y gives -16 (where char is signed), which is also fine.
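A small sketch of the truncation trick next to the explicit mask. The function names are made up for illustration. Converting to unsigned char is well defined (the value is reduced modulo 256 when a byte has 8 bits); converting an out-of-range value to a signed char is implementation-defined in C before C23, so only the unsigned form is shown.

```c
/* Low byte via implicit narrowing: the value is reduced modulo 256
   (assuming 8-bit bytes). */
static unsigned char low_byte_by_conversion(int x)
{
    return (unsigned char)x;
}

/* Same result spelled out with a mask, which states the intent directly. */
static unsigned char low_byte_by_mask(int x)
{
    return (unsigned char)(x & 0xff);
}
```

Both forms typically compile to the same instruction; the mask version is just more explicit about what is being kept.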

color = (red <<16) + (green<<8) + blue;

This strikes me as both ugly and probably inefficient.


Yeah, me too...
You should really be using bitwise-or:
color = (red<<16) | (green<<8) | blue;
There we go -- much better.
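For reference, a minimal sketch of the full round trip using the shift-and-mask extraction and the bitwise-or packing shown above. The function names are hypothetical, and it assumes the 0x00RRGGBB layout implied by the shifts in this thread.

```c
#include <stdint.h>

/* Pack three 8-bit channels into 0x00RRGGBB. The casts make sure the
   shifts happen in a 32-bit unsigned type. */
static uint32_t pack_rgb(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)red << 16) | ((uint32_t)green << 8) | (uint32_t)blue;
}

/* Unpack each channel with a shift followed by a mask. */
static uint8_t red_of(uint32_t color)   { return (uint8_t)((color >> 16) & 0xff); }
static uint8_t green_of(uint32_t color) { return (uint8_t)((color >> 8) & 0xff); }
static uint8_t blue_of(uint32_t color)  { return (uint8_t)(color & 0xff); }
```

Taking the channels as uint8_t means out-of-range values cannot bleed into a neighbouring channel when packing.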

Why "much"? Isn't an add a single cycle and well optimized?

What you want is saturating arithmetic. SSE provides opcodes for this kind of task.
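One possible reading of this suggestion, as a sketch: adding two packed pixels channel by channel so that each channel clamps at 255 instead of wrapping. The function name add_saturate is made up. On x86 with SSE2 it uses the _mm_adds_epu8 intrinsic from <emmintrin.h>; elsewhere it falls back to a plain scalar loop. Using only the low 32 bits of a 128-bit register is wasteful and is done here just to keep the example short; real code would process 16 bytes per instruction.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Adds two packed 4x8-bit pixels channel by channel, clamping each
   channel at 255 instead of letting it wrap around. */
static uint32_t add_saturate(uint32_t a, uint32_t b)
{
#if defined(__SSE2__)
    /* Move each pixel into the low 32 bits of a vector register. */
    __m128i va = _mm_cvtsi32_si128((int)a);
    __m128i vb = _mm_cvtsi32_si128((int)b);
    /* Unsigned saturating add on every byte lane. */
    return (uint32_t)_mm_cvtsi128_si32(_mm_adds_epu8(va, vb));
#else
    /* Portable scalar fallback: handle one byte lane at a time. */
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        if (sum > 255)
            sum = 255;
        out |= sum << shift;
    }
    return out;
#endif
}
```

Whether this beats the scalar version for single pixels is something only a profiler can tell you.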

Why "much"? Isn't an add a single cycle and well optimized?

Depends.
Consult your profiler, your compiler, your optimizer, and your target processor's architecture.

Why "much"? Isn't an add a single cycle and well optimized?

No, there is no time difference: addition and bitwise OR take the same time on most hardware (see the Intel docs on throughput/latency of both instructions; they are identical and quite fast indeed). On (very) old hardware bitwise OR could even be slightly faster since you don't need to carry bits, but good luck measuring that. There is also no difference in the computed result as long as red, green and blue are no larger than a byte. But OR is slightly more readable, because when packing bytes into a single word you are not really doing any addition in the usual sense, you're just packing bits. So in this sense bitwise OR is better than addition, not that it matters much (both will give wrong answers if red, green and blue are wider than 8 bits anyway).
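To make the point about wide inputs concrete, here is a small sketch (function names are hypothetical). With channels that fit in 8 bits, + and | give the same packed value. With a 9-bit green value both spill into the red channel, just in different ways, and masking each channel first is what actually prevents it.

```c
#include <stdint.h>

/* Packing with addition: overflow in one channel carries into the next. */
static uint32_t pack_add(uint32_t r, uint32_t g, uint32_t b)
{
    return (r << 16) + (g << 8) + b;
}

/* Packing with OR: overflow bits get merged into the next channel. */
static uint32_t pack_or(uint32_t r, uint32_t g, uint32_t b)
{
    return (r << 16) | (g << 8) | b;
}

/* Masking first keeps every channel inside its own 8 bits. */
static uint32_t pack_masked(uint32_t r, uint32_t g, uint32_t b)
{
    return ((r & 0xff) << 16) | ((g & 0xff) << 8) | (b & 0xff);
}
```

For example, with r = 1 and g = 0x1FF, pack_add yields 0x02FF00 and pack_or yields 0x01FF00; with r = 0 both yield 0x01FF00, while pack_masked yields 0x00FF00.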

And who knows? If you write it with | instead of + a compiler might actually recognize what you're trying to do and use a special CPU instruction that can pack bytes very quickly (unlikely, but perhaps on DSPs - digital signal processors). When writing C or C++ code, you're talking to the compiler, not the CPU. Without resorting to manually written assembly, your code will be going through the compiler, so if your goal is to make your code fast, you had better make your code as clear and to-the-point as possible, so that the compiler can understand your intent better (and, yes, it does try - compilers have many heuristics that recognize common code patterns). Writing convoluted code will just cause the compiler to give up and emit suboptimal code. As an added bonus, compiler-friendly code is also often human-friendly code. Yes, there are exceptions: in some cases you can produce faster code by writing code a certain way in bottleneck situations, and intrinsics are a nice middle ground between standard code and full-on assembly which can boost performance immensely if you use them just right. But to be blunt, going over your snippets in your various threads, you are really not at this stage yet.

How can you claim with a straight face that you've properly profiled your code "100x more" and identified likely bottlenecks when you are still questioning in this very thread whether bitwise OR is less "optymized" than addition? You keep getting tons of very useful advice that you really should follow, but you keep brushing it off as "propaganda" as if you were too good for it. It's getting very repetitive. If you think you know better, why are you asking for advice? If you are not looking for help, why are you making threads?

My final advice to you is: get off your high horse and face the possibility that you actually might not know everything (or anything) about optimization. Then try and modify your code and see what changes in the resulting assembly to learn what your compiler does and does not do. Read up a bit on how CPU hardware works, and get familiar with at least the basics of your own architecture (probably x86 Pentium 3 or Core 2). Find existing C/C++ code on GitHub or wherever. There have to be dozens of software rasterizers online - you could study a few and see how they implemented various parts of their pipeline. Learn from other people's code, compare it to yours. It is hard work, yes. But asking vague questions on a forum unfortunately only gets you so far - to learn to write fast code, you must work at it. There's no secret. If you don't want to take this advice, your loss. I will have only wasted 15 minutes writing it.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Well, I'm not questioning this. fastcall suggested that this is better, so I'm asking whether it really is (I have some, say, 'medium/moderate' knowledge of assembly, and I suspected that there is no big difference).

The trouble is that at the C level you are just not able to fully express your intention here, whether you use chars or ints. The compiler is forced to generate code that conforms to the many other rules governing such types, not to your intentions, where you need only some of them.

At the assembly level there may not be a big difference from such an optimization, but I'm interested in this kind of thing just for the science of it. So the constant 'propaganda' against it (that I should not be interested in what I'm interested in) is not very appropriate and is a waste of words here.

I'm doing that (I mean studying rasterization, SSE assembly and so on, but it goes slowly). A forum is for talking, so I am both asking here (and also on other sites) and studying it separately (a forum could be quicker and better). That is what this kind of forum is for.

I know that assembly is not such a popular topic these days, so it may be a bit of trouble discussing it here. [If I found a better place for this kind of question I would move there.]

P.S. My soft "engine" after the previous optimizations:

https://www.dropbox.com/s/b1ae8l2u7tybb2o/tie57.zip

[attachment=22367:tie57.jpg]

Now for 1200x1000 I get 40-50-60 ms. It would be very nice to bring it down to 30-40-50, but I feel that to do this I would need to dabble a bit with these intrinsics optimizations.

So I would welcome it if someone would talk about this.

This topic is closed to new replies.
