# Fast additive blending

Hello all, I'm wondering what the fastest way is to additively blend two RGB colors in C (16-bit or 32-bit, empty alpha channels). Thanks

##### Share on other sites
Use the graphics hardware. Or if it has to be in C then it depends. When you say 16 or 32 bit do you mean 16/32 bit float per channel or 16/32 bits for all channels?
Also when you say no alpha channel does that mean you just want to average the two colors together ((a+b)*0.5)?

##### Share on other sites
Hi, it's gotta be on the CPU.

I mean 16/32 bits for all channels (4-4-4-4 or 8-8-8-8).

By no alpha I mean I'm additively blending the colors w/o taking alpha into account.

Also, not looking for an average, but a sum.

ex. additive_blend( rgb(7,7,7), rgb(10,10,10)) = rgb(17,17,17)

Also I was wondering if there's a way to clamp at the max value w/o using 3 if statements like if (color.r > 15) color.r = 15.
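One way to clamp without an `if` per channel is to turn the overflow bit into a mask. The following is only a minimal sketch for a single channel; the helper names `add_sat8` and `add_sat4` are made up for illustration and do not come from this thread.

```c
#include <stdint.h>

/* Saturating add of two 8-bit channel values (each 0..255), no branch.
   Hypothetical helper, shown for illustration only. */
static inline uint32_t add_sat8(uint32_t x, uint32_t y)
{
    uint32_t sum  = x + y;            /* 0..510, so bit 8 is the overflow flag */
    uint32_t mask = 0u - (sum >> 8);  /* 0 if no overflow, all ones otherwise  */
    return (sum | mask) & 0xffu;      /* overflow forces the result to 255     */
}

/* Same idea for 4-bit channels (each 0..15), as in a 4-4-4-4 format. */
static inline uint32_t add_sat4(uint32_t x, uint32_t y)
{
    uint32_t sum  = x + y;            /* 0..30, so bit 4 is the overflow flag  */
    uint32_t mask = 0u - (sum >> 4);
    return (sum | mask) & 0xfu;       /* overflow forces the result to 15      */
}
```

With 4-bit channels, 7 + 10 overflows and clamps to 15; with 8-bit channels the same inputs give 17, as in the example above. This still processes one channel at a time, so it is a baseline rather than the fastest option.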

##### Share on other sites
Well, it depends on a few things. If it's an operation that is only going to be performed on individual colors every so often then
a) performance probably isn't an issue
b) any obvious method will probably be as fast as any other

But if you are doing the operation a lot then some sort of SIMD operation would probably be the best route. I can't really help there as SIMD scares me, but there are plenty of references.

Try this.

##### Share on other sites
:) mmm okay... Sorry, I've realized that I can be pretty vague when posting.

- Performance is going to be an issue cause this operation can potentially cover large areas of the screen and is subject to overdraw.

- This needs to be portable, so I can't rely on any hardware specific features.

##### Share on other sites
Quote:
Original post by Unfadable
- Performance is going to be an issue cause this operation can potentially cover large areas of the screen and is subject to overdraw.
- This needs to be portable, so I can't rely on any hardware specific features.

Then why not spend the time on writing specialized versions for platforms with SIMD instruction sets?
If you have to write it in pure C code then I suggest beginning with a prototype in MMX and then translating it back to C while massaging the compiler into generating the original code.
It's painful but it works and I know that at least VectorC is capable of generating saturating instructions.

##### Share on other sites
If you need to do it fast, you're going to have to use some SIMD stuff. Almost all x86 computers nowadays have MMX (going back to the Pentium MMX), so it's pretty safe to assume support. If you're storing colour as integers, then that's really all that you need. You *can* use SSE2 to work with more values at a time, but in my experience the gain for trivial operations like this is minimal, as it becomes memory bandwidth bound.
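For reference, here is a rough sketch of what the SSE2 route could look like using the `_mm_adds_epu8` intrinsic (unsigned saturating byte add), with a plain C fallback so the same function still builds on targets without SSE2. The function name `add_blend_span` and the `HAVE_SSE2` macro are made up for this example. Note that it saturates every byte, including the unused alpha byte.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1   /* hypothetical macro for this sketch */
#else
#define HAVE_SSE2 0
#endif

/* dst[i] = min(dst[i] + src[i], 255) for each of the n bytes.
   For 8-8-8-8 pixels, n is 4 * pixel count. */
static inline void add_blend_span(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
#if HAVE_SSE2
    /* 16 bytes (four 32-bit pixels) per iteration, unaligned loads/stores */
    for (; i + 16 <= n; i += 16) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(d, s));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is not available */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Whether this beats a pure integer version in practice depends on memory bandwidth, as noted above, so it is worth measuring on the actual target.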

##### Share on other sites
Hello

    #include <stdint.h>

    typedef uint32_t u32;

    u32 AddBlend(u32 a, u32 b)
    {
        u32 a2, b2, ab, carry;

        // Halve every channel so the per-byte sums cannot spill into the next byte
        a2 = (a >> 1) & 0x7f7f7f7f;
        b2 = (b >> 1) & 0x7f7f7f7f;
        ab = a2 + b2;

        // Bit 7 of each byte of ab is set when that channel overflowed.
        // *(~0) will compile to NEG (at least with visual studio)
        carry  = (((ab >>  7) & 1) * (~0)) & 0x000000ff;
        carry |= (((ab >> 15) & 1) * (~0)) & 0x0000ff00;
        carry |= (((ab >> 23) & 1) * (~0)) & 0x00ff0000;
        carry |= (((ab >> 31) & 1) * (~0)) & 0xff000000;

        // Scale back up and force overflowed channels to 0xff
        return ((ab << 1) & 0xfefefefe) | carry;
    }

I know, the saturation part looks quite "interesting", so to speak. Whether it's faster than the ifs is hard to say; I think it depends on the target platform.
You'll lose 1 bit of precision with this method, but clamped values are guaranteed to be at FF, and not FE.
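To make the precision loss concrete, here is a small single-channel model of what the function above does to each byte. This is only an illustrative sketch; `halved_add_channel` is a made-up name, not part of the posted code.

```c
#include <stdint.h>

/* One 8-bit channel of the halve-then-add method: the low bit of each
   input is dropped before the add, so the result is always even or 0xff. */
static inline uint32_t halved_add_channel(uint32_t x, uint32_t y)
{
    uint32_t h = (x >> 1) + (y >> 1);      /* 0..254, fits in 8 bits        */
    return (h & 0x80u) ? 0xffu : (h << 1); /* bit 7 set: clamp, else rescale */
}
```

Under this model 1 + 1 gives 0 and 3 + 1 gives 2. Sums that overflow only because of the dropped low bits are not detected, so 255 + 1 comes out as FE; every overflow that is detected does clamp to FF.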

##### Share on other sites
rept, Thank You!

That's the kind of stuff I'm looking for. I'll check that out and see how that performs.

##### Share on other sites
Here's another method, which doesn't lose any precision. Mostly copied from a page on http://www.df.lth.se/%7Ejohn_e/fr_contrib.html (awesome site)

    #include <stdint.h>

    typedef uint32_t u32;

    u32 AddBlend(u32 a, u32 b)
    {
        u32 a0, b0, a1, b1, carry;

        // Start by separating the bytes, to do it in 2 batches
        a0 = a & 0x00ff00ff;
        a1 = (a >> 8) & 0x00ff00ff;
        b0 = b & 0x00ff00ff;
        b1 = (b >> 8) & 0x00ff00ff;

        // First batch (bytes 0 and 2)
        a0 += b0;
        carry = a0 & 0x01000100;    // Carry bits
        a0 -= carry;                // Clear carry bits (could also use & 0x00ff00ff)
        a0 |= carry - (carry >> 8); // If a carry is set, that byte becomes 0xff

        // Second batch (bytes 1 and 3)
        a1 += b1;
        carry = a1 & 0x01000100;
        a1 -= carry;
        a1 |= carry - (carry >> 8);

        return a0 | (a1 << 8);
    }

EDIT: Fixed it up so it's interchangeable with rept's function
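Since the original question also mentioned 16-bit 4-4-4-4 colors, the same split-into-two-batches idea should carry over by working on nibbles instead of bytes. The following is an untested-in-production sketch of that adaptation; the name `add_blend_4444` and the masks are my own, not from the linked page.

```c
#include <stdint.h>

/* Saturating add of two 4-4-4-4 colors, each 4-bit channel clamped at 15. */
static inline uint16_t add_blend_4444(uint16_t a, uint16_t b)
{
    /* Separate the nibbles into two batches with a spare bit above each */
    uint32_t a0 = a & 0x0f0fu;          /* nibbles 0 and 2 */
    uint32_t a1 = (a >> 4) & 0x0f0fu;   /* nibbles 1 and 3 */
    uint32_t b0 = b & 0x0f0fu;
    uint32_t b1 = (b >> 4) & 0x0f0fu;
    uint32_t carry;

    /* First batch */
    a0 += b0;
    carry = a0 & 0x1010u;               /* carry bits (bit 4 and bit 12)    */
    a0 -= carry;                        /* clear the carry bits             */
    a0 |= carry - (carry >> 4);         /* carry set: that nibble becomes f */

    /* Second batch */
    a1 += b1;
    carry = a1 & 0x1010u;
    a1 -= carry;
    a1 |= carry - (carry >> 4);

    return (uint16_t)(a0 | (a1 << 4));
}
```

For example, blending 0x0777 with 0x0aaa gives 0x0fff, since 7 + 10 overflows a 4-bit channel and clamps to 15. Two 16-bit pixels could presumably be packed into one 32-bit word and processed together with wider masks, but that is left out here.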
