

Decomposing and recomposing color (pixel)


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

18 replies to this topic

#1 fir   Members   -  Reputation: -452


Posted 27 June 2014 - 09:50 AM

In my kind of app I quite often need to do this: the color of a pixel is usually an unsigned value in ARGB format, so to do something with it (dimming, color mixing, adding, etc.) I need to decompose it, something like this:

int red   = (color >> 16) & 0xff;
int green = (color >>  8) & 0xff;
int blue  = (color      ) & 0xff;

then do something with the values, then recompose them like this:

if(red<0) red = 0;
if(green<0) green = 0;
if(blue<0) blue = 0;

if(red>255) red = 255;
if(green>255) green = 255;
if(blue>255) blue = 255;

color = (red <<16) + (green<<8) + blue;

This strikes me as both ugly and probably inefficient. Is there maybe some way to make this decomposition and recomposition better?

Also, if I do it this way, what type should I use for the intermediate values (red, green, and blue above)? Should it be int or maybe unsigned char?
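For reference, the unpack / clamp / repack steps described above can be gathered into small helpers. This is only a sketch: the names clamp_u8 and argb_brighten are hypothetical, the 0xAARRGGBB layout is assumed from the shifts in the post, and, unlike the recomposition line above, alpha is carried through rather than dropped.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of one channel. */
static int clamp_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

/* Unpack 0xAARRGGBB, add `delta` to each color channel with saturation,
   and repack. Alpha is carried through unchanged (a choice made for
   this sketch, not something the original code does). */
static uint32_t argb_brighten(uint32_t color, int delta)
{
    int red   = (int)((color >> 16) & 0xff);
    int green = (int)((color >>  8) & 0xff);
    int blue  = (int)( color        & 0xff);

    red   = clamp_u8(red   + delta);
    green = clamp_u8(green + delta);
    blue  = clamp_u8(blue  + delta);

    return (color & 0xff000000u)
         | ((uint32_t)red   << 16)
         | ((uint32_t)green <<  8)
         |  (uint32_t)blue;
}
```

With int intermediates, a call such as argb_brighten(0xFF10F030u, 0x20) saturates green at 0xFF instead of wrapping around.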

 




#2 Vortez   Crossbones+   -  Reputation: 2697


Posted 27 June 2014 - 10:17 AM

int red = (color >> 16) & 0xff;
int green = (color >> 8) & 0xff;
int blue = (color ) & 0xff;

 

Why store them in an int? Use a BYTE instead (or even better, a struct of 3 bytes):

Ex: BYTE red = (BYTE)((color >> 16) & 0xff);

which makes this

if(red<0) red = 0;
if(green<0) green = 0;
if(blue<0) blue = 0;

if(red>255) red = 255;
if(green>255) green = 255;
if(blue>255) blue = 255;

totally unnecessary.

When "recomposing", use a DWORD or UINT instead of an int; there is no need for a signed variable in this code.


Edited by Vortez, 27 June 2014 - 10:25 AM.


#3 fir   Members   -  Reputation: -452


Posted 27 June 2014 - 11:36 AM

 

Why store them in a int? 

 

Some operations on such r, g, b values overflow (for example, adding one RGB to another), and often I need saturation there, so I need to use int and still clip it with ifs. All this unpacking and repacking seems like overhead to me, but I don't know what to do about it.

Also, for cases where unsigned char would suffice, I'm not sure whether declaring unsigned char r, g, b would paradoxically make things slower, since the compiler would need to keep such arithmetic constrained to unsigned char where it may be easier for it to operate on processor words. Hard to say.

In general, passing a color as one unsigned int is handier for me, but it seems that passing it as three separate values, and thus avoiding some of this unpacking/packing, could (theoretically) be a bit quicker, though I'm not sure it has any "> 0" effect on real efficiency. As I said, I'm not quite sure.
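To make the overflow point concrete, here is a small sketch (the function names add_wrap and add_saturate are hypothetical). In C the operands of + are promoted to int before the addition, so the arithmetic itself is not done "in unsigned char"; the difference only shows up when the result is narrowed back to 8 bits.

```c
/* The sum is computed as int, then truncated back to 8 bits on
   conversion, so 200 + 100 = 300 wraps around to 44. */
static unsigned char add_wrap(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}

/* Keep the sum in an int and clamp before narrowing:
   200 + 100 saturates to 255. */
static unsigned char add_saturate(unsigned char a, unsigned char b)
{
    int sum = a + b;
    return (unsigned char)(sum > 255 ? 255 : sum);
}
```

Whether either form is measurably faster depends on the compiler and target, so this sketch makes no performance claim.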



#4 Vortez   Crossbones+   -  Reputation: 2697


Posted 27 June 2014 - 03:44 PM

As others have pointed out to you many times, just stop worrying about micro-optimizations like that. Today it's almost impossible to beat the compiler's optimizations in release builds. I tried it (something similar to this, color packing/unpacking) with MMX and SSE: I could beat a debug build, but not a release build (it was a tie), because, you know what, the compiler (Visual Studio) uses SSE optimizations in release builds when it can. So I basically did that test for nothing, except learning that such asm optimizations are worthless now, in most cases.

With that said, it's always useful to do some profiling to find the real bottlenecks and optimize the algorithms where it counts.


Edited by Vortez, 27 June 2014 - 03:56 PM.


#5 HappyCoder   Members   -  Reputation: 2556


Posted 27 June 2014 - 10:25 PM

I agree with Vortez. Optimize it when it becomes a problem. Unless you are doing some serious image processing it shouldn't be a big problem.

 

I would also recommend packing the color in a struct:

struct Color
{
    unsigned char r, g, b, a;

    // constructors, operators, etc.
};

Behind the scenes, the compiler will be doing the bit mask and bit shifts for you but with much cleaner code.

 

EDIT: I assumed you are using c++. Is that correct?
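A minimal sketch of how such a struct could be filled from, and turned back into, a packed value, written in plain C. The helper names color_from_argb and color_to_argb are hypothetical, and the 0xAARRGGBB layout is assumed from the first post. Because the conversion uses shifts rather than a cast, it does not depend on the byte order of the machine.

```c
#include <stdint.h>

struct Color {
    unsigned char r, g, b, a;
};

/* Split a packed 0xAARRGGBB value into named channels. */
static struct Color color_from_argb(uint32_t argb)
{
    struct Color c;
    c.a = (unsigned char)(argb >> 24);
    c.r = (unsigned char)(argb >> 16);
    c.g = (unsigned char)(argb >>  8);
    c.b = (unsigned char)(argb);
    return c;
}

/* Rebuild the packed 0xAARRGGBB value from the named channels. */
static uint32_t color_to_argb(struct Color c)
{
    return ((uint32_t)c.a << 24) | ((uint32_t)c.r << 16)
         | ((uint32_t)c.g <<  8) |  (uint32_t)c.b;
}
```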


Edited by HappyCoder, 27 June 2014 - 10:28 PM.


#6 Samith   Members   -  Reputation: 2140


Posted 27 June 2014 - 11:51 PM

EDIT: It's late...

 

Like everyone else in this thread, I think these kinds of micro-optimizations are usually wasted effort. If I were you, I would try to determine if I was performing more decomposition/recompositions than I needed to and try to minimize that, first. I usually find way bigger performance gains by making my code do less stuff than I do by trying to make my code do more stuff quickly.


Edited by Samith, 27 June 2014 - 11:54 PM.


#7 fir   Members   -  Reputation: -452


Posted 28 June 2014 - 02:34 AM

I agree with Vortez. Optimize it when it becomes a problem. Unless you are doing some serious image processing it shouldn't be a big problem.

 

I would also recommend packing the color in a struct:

struct Color
{
    unsigned char r, g, b, a;

    // constructors, operators, etc.
};

Behind the scenes, the compiler will be doing the bit mask and bit shifts for you but with much cleaner code.

 

EDIT: I assumed you are using c++. Is that correct?

 

I'm using C [but compile in C++ mode].

The struct idea is maybe a good hint, thanks, I forgot about this option.

1) If my color format is ARGB, I mean blue is in the lowest bits (0-7), shouldn't it be

struct Color { unsigned char b, g, r, a; };

I'm not sure whether such structs are laid out in the endianness of the machine or are endian-independent.

Then I could probably use it by casting.

Though I'm not sure, if I used it, how it would be passed and held in memory and in code (in one register or in four?). If I just pass it by value, foo(Color color), will it be passed just like a 32-bit unsigned int or in some other way?
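On the layout question: struct members are stored in declaration order regardless of endianness, while the bytes of a 32-bit integer are stored in the machine's byte order. So on a little-endian machine (such as x86) the packed value 0xAARRGGBB sits in memory as b, g, r, a, which matches the struct above; on a big-endian machine it would be a, r, g, b. A small sketch to check this (both function names are hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a packed 0xAARRGGBB value into out[0..3],
   lowest address first. */
static void argb_memory_bytes(uint32_t argb, unsigned char out[4])
{
    memcpy(out, &argb, 4);
}

/* Returns 1 on a little-endian machine, 0 otherwise. */
static int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

How a 4-byte struct is passed by value is decided by the platform's calling convention, not by the C language. On the common x86-64 conventions a struct this small is, as far as I know, passed much like an integer, but inspecting the generated assembly is the only way to be sure for a given compiler and target.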

 

 

As for the "don't do that" advice: I have written about this before. This is not an answer but a thing I call "propaganda" (it trashes this forum with valueless propaganda repeated without change more than it is proper technical discussion). "Profile your code to find out if this is a bottleneck" is also propaganda. I have heard it for the 20th time here (literally, or close to it), so there is no need to repeat it a 60th or 70th time

- especially as I'm probably doing 100x more profiling than those propaganda givers.

(1) Incidentally, this is in my bottleneck code, which shades/colors 100k triangles per frame (and even if it were not, I just like to understand code, so propaganda does not suit this attitude). (2) As for such optimizations, I often profile and optimize, and I find that, taken together, all these kinds of micro-optimizations speed up my code, contrary to the propaganda people repeat here

(in a recent case I started with a frame time of nearly 35 ms and, by searching hard for every micro-optimization I could think of, dropped it to 16.5 ms).


Edited by fir, 28 June 2014 - 03:25 AM.


#8 Khatharr   Crossbones+   -  Reputation: 2957


Posted 28 June 2014 - 02:49 AM

Not really an optimization, but I'd like to throw a little fuel on the fire...

union uColor {
  struct {unsigned char blue, green, red, alpha;};
  unsigned int uint;
  unsigned char channels[4];
};

More importantly, take my advice and use a profiler on your code before you try to optimize it.
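A sketch of how the union above might be used from C, with the 32-bit member renamed to packed and given the type uint32_t (both are assumptions of this sketch, as is the function name color_invert). The anonymous struct needs C11 or a compiler extension. Reading a member other than the one last written is allowed in C; in C++ it is formally undefined behavior, although the major compilers support it in practice.

```c
#include <stdint.h>

union uColor {
    struct { unsigned char blue, green, red, alpha; };  /* anonymous struct: C11 */
    uint32_t      packed;
    unsigned char channels[4];
};

/* Invert every channel by looping over the byte view, then read the
   result back through the 32-bit view. */
static uint32_t color_invert(uint32_t argb)
{
    union uColor c;
    int i;

    c.packed = argb;
    for (i = 0; i < 4; ++i)
        c.channels[i] = (unsigned char)(255 - c.channels[i]);
    return c.packed;
}
```

The loop treats all four bytes alike, so it gives the same answer on any byte order. The named fields (blue, green, red, alpha) only line up with a 0xAARRGGBB integer on a little-endian machine.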


void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

#9 Ohforf sake   Members   -  Reputation: 1788


Posted 28 June 2014 - 03:05 AM

As for the "don't do that" advice: I have written about this before. This is not an answer but a thing I call "propaganda" (it trashes this forum with valueless propaganda repeated without change more than it is proper technical discussion). "Profile your code to find out if this is a bottleneck" is also propaganda. I have heard it for the 20th time here (literally, or close to it), so there is no need to repeat it a 60th or 70th time


You know, I was about to give you a code snippet that shows how this can be done in SSE using pack/unpack instructions, but man, you really have a way of discouraging people from helping you.
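The snippet mentioned here was never posted. As an assumption about what unpack/pack code of this kind might look like, the following sketch dims one pixel with SSE2 intrinsics: the bytes are widened to 16-bit lanes, multiplied, shifted, and packed back to bytes with unsigned saturation. The name argb_scale and the scale/256 convention are hypothetical, all four channels (alpha included) are scaled, and a scalar fallback is provided for targets without SSE2.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scale all four channels of one 0xAARRGGBB pixel by scale/256,
   with scale in the range 0..256. */
static uint32_t argb_scale(uint32_t argb, int scale)
{
    __m128i zero = _mm_setzero_si128();
    __m128i px   = _mm_cvtsi32_si128((int)argb);   /* pixel in the low 32 bits  */
    __m128i wide = _mm_unpacklo_epi8(px, zero);    /* bytes -> 16-bit words     */
    __m128i mul  = _mm_mullo_epi16(wide, _mm_set1_epi16((short)scale));
    __m128i shr  = _mm_srli_epi16(mul, 8);         /* divide each word by 256   */
    __m128i pack = _mm_packus_epi16(shr, zero);    /* words -> bytes, saturating */
    return (uint32_t)_mm_cvtsi128_si32(pack);
}
#else
/* Scalar fallback with the same results. */
static uint32_t argb_scale(uint32_t argb, int scale)
{
    uint32_t out = 0;
    int shift;
    for (shift = 0; shift < 32; shift += 8) {
        uint32_t ch = (argb >> shift) & 0xff;
        out |= ((ch * (uint32_t)scale) >> 8) << shift;
    }
    return out;
}
#endif
```

A 128-bit register has room for more than one pixel, so real code would normally process several pixels per iteration; a single pixel is used here only to keep the sketch short.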

#10 fir   Members   -  Reputation: -452


Posted 28 June 2014 - 03:12 AM

 

As for the "don't do that" advice: I have written about this before. This is not an answer but a thing I call "propaganda" (it trashes this forum with valueless propaganda repeated without change more than it is proper technical discussion). "Profile your code to find out if this is a bottleneck" is also propaganda. I have heard it for the 20th time here (literally, or close to it), so there is no need to repeat it a 60th or 70th time


You know, I was about to give you a code snippet that shows how this can be done in SSE using pack/unpack instructions, but man, you really have a way of discouraging people from helping you.

 

Heh, if you are interested in such optimizations, I think you should better understand what I'm saying about this anti-optimizing (and substantively valueless propaganda that is so often repeated here), but you seem not to, and in my opinion you should.

What is so hard to understand here? That propaganda is really valueless for someone who wants to do this anyway ;\

 

SSE intrinsics? Yeah, I forgot I had to learn them ;k


Edited by fir, 28 June 2014 - 03:26 AM.


#11 fir   Members   -  Reputation: -452


Posted 28 June 2014 - 03:17 AM

Not really an optimization, but I'd like to throw a little fuel on the fire...

union uColor {
  struct {unsigned char blue, green, red, alpha;};
  unsigned int uint;
  unsigned char channels[4];
};

More importantly, take my advice and use a profiler on your code before you try to optimize it.

 

That's good, I forgot about this; I haven't used unions for 12 years.

It might also be good to do this for some other structures, for example a triangle (of 3 vertices): sometimes it is good to access it by named fields, but sometimes it would be nice to iterate over it in a loop.



#12 fir   Members   -  Reputation: -452


Posted 28 June 2014 - 03:39 AM

 

Ex: BYTE red = (BYTE)((color >> 16) & 0xff);

 

 

PS: it is also nice of C that it works this way:

int x=0x10203040;

unsigned char y = x;   //y gives 0x40 - handy thing

with int x=0x102030f0;  char y = x; -> y gives (-16) where char is signed, also fine
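A short sketch of the same idea with the conversion made explicit (low_byte and byte_at are hypothetical names). Conversion to unsigned char is defined by the C standard as reduction modulo 256, so it always keeps the low 8 bits. Whether plain char is signed is implementation-defined, and so is the result of converting an out-of-range value to a signed type, which means the -16 above is typical but not guaranteed.

```c
/* Converting to unsigned char keeps the low 8 bits (value modulo 256). */
static unsigned char low_byte(unsigned int x)
{
    return (unsigned char)x;
}

/* Pull out any byte explicitly; index 0 is the least significant byte. */
static unsigned char byte_at(unsigned int x, int index)
{
    return (unsigned char)(x >> (8 * index));
}
```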



#13 fastcall22   Crossbones+   -  Reputation: 4221

Like
Posted 28 June 2014 - 07:17 AM

color = (red <<16) + (green<<8) + blue;

this strikes me both as an ugly and probably inefficient..


Yeah, me too...
You should really be using bitwise-or:
color = (red<<16) | (green<<8) | blue;
There we go -- much better.

Edited by fastcall22, 28 June 2014 - 07:18 AM.

c3RhdGljIGNoYXIgeW91cl9tb21bMVVMTCA8PCA2NF07CnNwcmludGYoeW91cl9tb20sICJpcyBmYXQiKTs=

#14 fir   Members   -  Reputation: -452


Posted 28 June 2014 - 07:32 AM

 

color = (red <<16) + (green<<8) + blue;

this strikes me both as an ugly and probably inefficient..


Yeah, me too...
You should really be using bitwise-or:
color = (red<<16) | (green<<8) | blue;
There we go -- much better.

 

why much? iznt add one cycle and well optymized?



#15 Madhed   Crossbones+   -  Reputation: 2775


Posted 28 June 2014 - 08:00 AM

What you want is saturating arithmetic. SSE provides opcodes for this kind of task.
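As an illustration (not code from this thread), SSE2 exposes a per-byte unsigned saturating add through the _mm_adds_epu8 intrinsic. The sketch below adds two packed 0xAARRGGBB pixels channel by channel; the name argb_add_saturate is hypothetical, and a scalar fallback is included for targets without SSE2.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Add two packed pixels; each byte saturates at 255 instead of wrapping. */
static uint32_t argb_add_saturate(uint32_t a, uint32_t b)
{
    __m128i va  = _mm_cvtsi32_si128((int)a);
    __m128i vb  = _mm_cvtsi32_si128((int)b);
    __m128i sum = _mm_adds_epu8(va, vb);   /* per-byte unsigned saturating add */
    return (uint32_t)_mm_cvtsi128_si32(sum);
}
#else
/* Scalar fallback with the same results. */
static uint32_t argb_add_saturate(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    int shift;
    for (shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        out |= (s > 255 ? 255u : s) << shift;
    }
    return out;
}
#endif
```

No unpacking, clamping ifs, or repacking are needed here, because the saturation happens inside the instruction. One 128-bit register holds four such pixels, so a loop over an image would normally add four at a time.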

#16 fastcall22   Crossbones+   -  Reputation: 4221


Posted 28 June 2014 - 08:00 AM

why much? iznt add one cycle and well optymized?

Depends.
Consult your profiler, your compiler, your optimizer, and your target processor's architecture.
c3RhdGljIGNoYXIgeW91cl9tb21bMVVMTCA8PCA2NF07CnNwcmludGYoeW91cl9tb20sICJpcyBmYXQiKTs=

#17 Bacterius   Crossbones+   -  Reputation: 8478


Posted 28 June 2014 - 08:15 AM

 

 

color = (red <<16) + (green<<8) + blue;

this strikes me both as an ugly and probably inefficient..


Yeah, me too...
You should really be using bitwise-or:
color = (red<<16) | (green<<8) | blue;
There we go -- much better.

 

why much? iznt add one cycle and well optymized?

 

 

No, there is no time difference, addition and bitwise OR take the same time in most hardware (see the Intel docs on throughput/latency of both instructions, they are identical and quite fast indeed). On (very) old hardware bitwise OR could even be slightly faster since you don't need to carry bits, but good luck measuring that. There is also no difference in the result as long as red, green and blue are no larger than a byte. But it's slightly more readable, because when packing bytes into a single word you are not really doing any addition in the usual sense, you're just... packing bits. So in this sense bitwise OR is better than addition, not that it matters much (both will give wrong answers if red, green and blue are wider than 8 bits anyway).
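To illustrate the point about results (a sketch with hypothetical function names pack_add and pack_or): the two forms agree whenever every channel fits in 8 bits, and both produce a wrong color, in different ways, when a channel is wider.

```c
#include <stdint.h>

/* Pack with addition: an oversized channel carries into its neighbour. */
static uint32_t pack_add(uint32_t r, uint32_t g, uint32_t b)
{
    return (r << 16) + (g << 8) + b;
}

/* Pack with bitwise OR: an oversized channel overlaps its neighbour. */
static uint32_t pack_or(uint32_t r, uint32_t g, uint32_t b)
{
    return (r << 16) | (g << 8) | b;
}
```

For example, with r = 0, g = 1 and an unclamped b = 0x1FF, pack_add yields 0x2FF and pack_or yields 0x1FF; neither is the intended color, which is why the clamp has to happen before packing.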

 

And who knows? If you write it with | instead of + a compiler might actually recognize what you're trying to do and use a special CPU instruction that can pack bytes very quickly (unlikely, but perhaps on DSP's - digital signal processors). When writing C or C++ code, you're talking to the compiler, not the CPU. Without resorting to manually written assembly, your code will be going through the compiler, so if your goal is to make your code fast, you had better make your code as clear and to-the-point as possible, so that the compiler can understand your intent better (and, yes, it does try - compilers have many heuristics that recognize common code patterns). Writing convoluted code will just cause the compiler to give up and emit suboptimal code. As an added bonus, compiler-friendly code is also often human-friendly code. Yes, there are exceptions, in some cases you can produce faster code by writing code a certain way in bottleneck situations, and intrinsics are a nice middle ground between standard code and full-on assembly which can boost performance immensely if you use them just right, but to be blunt, by going over your snippets in your various threads, you are really not at this stage yet.

 

How can you claim with a straight face that you've properly profiled your code "100x more" and identified likely bottlenecks when you are still questioning in this very thread whether bitwise OR is less "optymized" than addition? You keep getting tons of very useful advice that you really should follow, but you keep brushing it off as "propaganda" as if you were too good for it. It's getting very repetitive. If you think you know better, why are you asking for advice? If you are not looking for help, why are you making threads?

 

My final advice to you is: get off your high horse and face the possibility that you actually might not know everything (or anything) about optimization. Then try and modify your code and see what changes in the resulting assembly to learn what your compiler does and does not do. Read up a bit on how CPU hardware works, and get familiar with at least the basics of your own architecture (probably x86 Pentium 3 or Core 2). Find existing C/C++ code on github or whatever. There have to be dozens of software rasterizers online - you could study a few and see how they implemented various parts of their pipeline. Learn from other people's code, compare it to yours. It is hard work, yes. But asking vague questions on a forum unfortunately only gets you so far - to learn to write fast code, you must work at it. There's no secret. If you don't want to take this advice, your loss. I will have only wasted 15 minutes writing it.


The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#18 fir   Members   -  Reputation: -452


Posted 28 June 2014 - 08:57 AM

 

How can you claim with a straight face that you've properly profiled your code "100x more" and identified likely bottlenecks when you are still questioning in this very thread whether bitwise OR is less "optymized" than addition? You keep getting tons of very useful advice that you really should follow, but you keep brushing it off as "propaganda" as if you were too good for it. It's getting very repetitive. If you think you know better, why are you asking for advice? If you are not looking for help, why are you making threads?

 

 

Well, I'm not questioning this. fastcall suggested that this is better, so I'm asking whether it really is (I have some, say, 'medium/moderate' knowledge of assembly, and I suspected that it makes no big difference).
 
 

 

 

And who knows? If you write it with | instead of + a compiler might actually recognize what you're trying to do and use a special CPU instruction that can pack bytes very quickly (unlikely, but perhaps on DSP's - digital signal processors). When writing C or C++ code, you're talking to the compiler, not the CPU. Without resorting to manually written assembly, your code will be going through the compiler, so if your goal is to make your code fast, you had better make your code as clear and to-the-point as possible, so that the compiler can understand your intent better (and, yes, it does try - compilers have many heuristics that recognize common code patterns). Writing convoluted code will just cause the compiler to give up and emit suboptimal code. As an added bonus, compiler-friendly code is also often human-friendly code. Yes, there are exceptions, in some cases you can produce faster code by writing code a certain way in bottleneck situations, and intrinsics are a nice middle ground between standard code and full-on assembly which can boost performance immensely if you use them just right, but to be blunt, by going over your snippets in your various threads, you are really not at this stage yet.

 

 

The trouble is that at the C level you are just not able to fully express your intention, whether using chars or ints: the compiler is forced to generate code that conforms to all the other rules of how such types work, not to your intentions, where you need only some of them.

At the assembly level the optimization might not make a big difference, but I'm interested in this kind of thing just for the science of it, so the constant 'propaganda' against it (that I should not be interested in what I'm interested in) is not very appropriate and is a waste of words here.



#19 fir   Members   -  Reputation: -452


Posted 28 June 2014 - 09:06 AM

 

My final advice to you is: get off your high horse and face the possibility that you actually might not know everything (or anything) about optimization. Then try and modify your code and see what changes in the resulting assembly to learn what your compiler does and does not do. Read up a bit on how CPU hardware works, and get familiar with at least the basics of your own architecture (probably x86 Pentium 3 or Core 2). Find existing C/C++ code on github or whatever. There have to be dozens of software rasterizers online - you could study a few and see how they implemented various parts of their pipeline. Learn from other people's code, compare it to yours. It is hard work, yes. But asking vague questions on a forum unfortunately only gets you so far - to learn to write fast code, you must work at it. There's no secret. If you don't want to take this advice, your loss. I will have only wasted 15 minutes writing it.

 

 

I'm doing that (I mean studying rasterization, SSE assembly and so on, but it goes slowly). A forum is for talking, so I am both asking here (and on other sites) and studying it separately (a forum could be quicker and better). That is what such forums are for.

I know that assembly is not such a popular topic these days, so it may be a bit troublesome to discuss it here [if I found a better place for this kind of question I would like to move there].

PS: my software "engine" after the previous optimizations:

 

https://www.dropbox.com/s/b1ae8l2u7tybb2o/tie57.zip

 

tie57.jpg

 

Now for 1200x1000 I get 40-50-60 ms. It would be very nice to bring it down to 30-40-50, but I feel that to do this I would need to dabble a bit in these intrinsics optimizations, so I welcome anyone who would talk about this.

 





