Fast additive blending

11 comments, last by ZQJ 18 years, 4 months ago
Hello all, I'm wondering what the fastest way is to blend two RGB colors in C (16-bit or 32-bit, empty alpha channels). Thanks.
Use the graphics hardware. Or if it has to be in C then it depends. When you say 16 or 32 bit do you mean 16/32 bit float per channel or 16/32 bits for all channels?
Also when you say no alpha channel does that mean you just want to average the two colors together ((a+b)*0.5)?
Hi, it's gotta be on the CPU.

I mean 16/32 bits for all channels (4-4-4-4 or 8-8-8-8).

By no alpha I mean I'm additively blending the colors w/o taking alpha into account.

Also, not looking for an average, but a sum.

ex. additive_blend( rgb(7,7,7), rgb(10,10,10)) = rgb(17,17,17)

Also I was wondering if there's a way to clamp at the max value w/o using 3 if statements like if (color.r > 15) color.r = 15.

Well, it depends on a few things. If it's an operation that is only going to be performed on individual colors every so often, then
a) performance probably isn't an issue
b) any obvious method will probably be as fast as any other

But if you are doing the operation a lot, then some sort of SIMD operation would probably be the best route. I can't really help there as SIMD scares me, but there are plenty of references.
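For reference, the "obvious method" mentioned above might look something like the sketch below for the 8-8-8-8 case. The name add_blend_simple is made up for illustration, and this is only one straightforward way to write it, not a tuned implementation.

```c
#include <stdint.h>

/* Straightforward per-channel saturating add for packed 8-8-8-8 colors.
   add_blend_simple is a hypothetical name, not something from the thread. */
uint32_t add_blend_simple(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        /* Extract one 8-bit channel from each color and add them. */
        uint32_t sum = ((a >> shift) & 0xffu) + ((b >> shift) & 0xffu);
        if (sum > 0xffu)
            sum = 0xffu;            /* clamp the channel at its maximum */
        result |= sum << shift;     /* put the channel back in place */
    }
    return result;
}
```

It is handy as a baseline to check faster versions against.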

Try this.
:) mmm okay... Sorry, I've realized that I can be pretty vague when posting.

- Performance is going to be an issue cause this operation can potentially cover large areas of the screen and is subject to overdraw.

- This needs to be portable, so I can't rely on any hardware specific features.
Quote: Original post by Unfadable
- Performance is going to be an issue cause this operation can potentially cover large areas of the screen and is subject to overdraw.
- This needs to be portable, so I can't rely on any hardware specific features.
Then why not spend the time on writing specialized versions for platforms with SIMD instruction sets?
If you have to write it in pure C code then I suggest beginning with a prototype in MMX and then translating it back to C while massaging the compiler into generating the original code.
It's painful but it works and I know that at least VectorC is capable of generating saturating instructions.
If you need to do it fast, you're going to have to use some SIMD stuff. Almost all computers nowadays have MMX (going back to the Pentium MMX), so it's pretty safe to assume support. If you're storing colour as integers, then that's really all that you need. You *can* use SSE2 to work with more values at a time, but in my experience the gain for trivial operations like this is minimal, as it becomes memory bandwidth bound.
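As a rough sketch of the SIMD route, here is what a saturating add over a span of 8-8-8-8 pixels could look like. It uses the SSE2 intrinsic _mm_adds_epu8 rather than MMX, and falls back to plain C when the compiler does not define __SSE2__ (an assumption about the build setup; MSVC, for example, does not define that macro). The function name add_blend_span is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* dst[i] becomes the per-byte saturating sum of dst[i] and src[i].
   add_blend_span is a hypothetical name used for illustration. */
void add_blend_span(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Four pixels (16 bytes) per iteration; unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(d, s));
    }
#endif
    /* Scalar path: handles the tail, or everything when SSE2 is unavailable. */
    for (; i < n; ++i) {
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t sum = ((dst[i] >> shift) & 0xffu) + ((src[i] >> shift) & 0xffu);
            out |= (sum > 0xffu ? 0xffu : sum) << shift;
        }
        dst[i] = out;
    }
}
```

Keeping the scalar path in the same function means the code still builds on targets without SSE2, which matters given the portability requirement.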
Hello


u32 AddBlend(u32 a, u32 b)
{
    u32 a2, b2, ab, carry;

    a2 = (a >> 1) & 0x7f7f7f7f;
    b2 = (b >> 1) & 0x7f7f7f7f;
    ab = a2 + b2;

    // *(~0) will compile to NEG (at least with Visual Studio)
    carry  = (((ab >>  7) & 1) * (~0)) & 0x000000ff;
    carry |= (((ab >> 15) & 1) * (~0)) & 0x0000ff00;
    carry |= (((ab >> 23) & 1) * (~0)) & 0x00ff0000;
    carry |= (((ab >> 31) & 1) * (~0)) & 0xff000000;

    return ((ab << 1) & 0xfefefefe) | carry;
}


I know, the saturation part looks quite "interesting", so to speak. Whether it's faster than the ifs is hard to say; I think it depends on the target platform.
You'll lose 1 bit of precision with this method, but clamped values are guaranteed to be at FF, and not FE.
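To make the precision loss concrete, here is a single-channel restatement of the halve-then-add idea next to an exact clamped add. Both function names are made up for this sketch, and the branch in the first one stands in for the multiply-by-(~0) trick in the packed version.

```c
#include <stdint.h>

/* One 8-bit channel of the halved-add method: halve both inputs, add,
   treat bit 7 of the sum as the overflow flag, then double the result.
   halved_add_channel is a hypothetical name. */
uint8_t halved_add_channel(uint8_t a, uint8_t b)
{
    unsigned ab = (unsigned)(a >> 1) + (unsigned)(b >> 1); /* at most 0xfe */
    if (ab & 0x80u)
        return 0xffu;               /* overflow detected: clamp to FF */
    return (uint8_t)(ab << 1);      /* lowest bit is always 0 here */
}

/* Exact clamped add, for comparison (also a hypothetical name). */
uint8_t exact_add_channel(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 0xffu ? 0xffu : sum);
}
```

For example, 0x07 + 0x0a gives 0x10 with the halved version and 0x11 with the exact one, and 0x7f + 0x81 gives 0xfe instead of 0xff because the two dropped low bits would have produced the carry.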
rept, Thank You!

That's the kind of stuff I'm looking for. I'll check that out and see how that performs.
Here's another method, which doesn't lose any precision. Mostly copied from a page on http://www.df.lth.se/%7Ejohn_e/fr_contrib.html (awesome site)
u32 AddBlend(u32 a, u32 b)
{
    u32 a0, b0, a1, b1, carry;

    // Start by separating the bytes, to do it in 2 batches
    a0 = a & 0x00ff00ff;
    a1 = (a >> 8) & 0x00ff00ff;
    b0 = b & 0x00ff00ff;
    b1 = (b >> 8) & 0x00ff00ff;

    // First batch (bytes 0 and 2)
    a0 += b0;
    carry = a0 & 0x01000100;    // Carry bits
    a0 -= carry;                // Clear carry bits (could also use & 0x00ff00ff)
    a0 |= carry - (carry >> 8); // If a carry is set, that byte becomes 0xff

    // Second batch (bytes 1 and 3)
    a1 += b1;
    carry = a1 & 0x01000100;
    a1 -= carry;
    a1 |= carry - (carry >> 8);

    return a0 | (a1 << 8);
}

EDIT: Fixed it up so it's interchangeable with rept's function
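The same carry-mask trick should carry over to the 16-bit 4-4-4-4 format asked about at the start of the thread. The sketch below is an adaptation, not code from the linked page: the nibbles are split into two batches, each with a spare bit above it to catch the carry. The name add_blend_4444 is hypothetical.

```c
#include <stdint.h>

/* Saturating add of two packed 4-4-4-4 colors, two nibbles per batch.
   add_blend_4444 is a hypothetical name; the masks are the 8-8-8-8
   ones scaled down to 4-bit channels. */
uint16_t add_blend_4444(uint16_t a, uint16_t b)
{
    uint32_t a0 = a & 0x0f0fu;          /* nibbles 0 and 2 */
    uint32_t a1 = (a >> 4) & 0x0f0fu;   /* nibbles 1 and 3 */
    uint32_t b0 = b & 0x0f0fu;
    uint32_t b1 = (b >> 4) & 0x0f0fu;
    uint32_t carry;

    /* First batch */
    a0 += b0;
    carry = a0 & 0x1010u;        /* carry bits sit just above each nibble */
    a0 -= carry;                 /* clear the carry bits */
    a0 |= carry - (carry >> 4);  /* a set carry turns its nibble into 0xf */

    /* Second batch */
    a1 += b1;
    carry = a1 & 0x1010u;
    a1 -= carry;
    a1 |= carry - (carry >> 4);

    return (uint16_t)(a0 | (a1 << 4));
}
```

With the example from earlier in the thread, add_blend_4444(0x0777, 0x0aaa) yields 0x0fff: each channel sums to 17 and is clamped to 15, with no if statements involved.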
