Fast additive blending

Graphics and GPU Programming

Started by Unfadable December 01, 2005 09:15 PM

11 comments, last by ZQJ 18 years, 4 months ago

Unfadable

Author

December 01, 2005 09:15 PM

Hello all, I'm wondering what the fastest way is to additively blend two RGB colors in C (16-bit or 32-bit, with empty alpha channels). Thanks.

bluntman

December 02, 2005 08:20 AM

Use the graphics hardware. Or, if it has to be in C, then it depends. When you say 16 or 32 bit, do you mean a 16/32-bit float per channel or 16/32 bits for all channels?
Also, when you say no alpha channel, does that mean you just want to average the two colors together ((a+b)*0.5)?

Planet rendering.

Unfadable

Author

December 02, 2005 08:41 AM

Hi, it's gotta be on the CPU.

I mean 16/32 bits for all channels (4-4-4-4 or 8-8-8-8).

By "no alpha" I mean I'm additively blending the colors without taking alpha into account.

Also, I'm not looking for an average, but a sum.

For example: additive_blend(rgb(7,7,7), rgb(10,10,10)) = rgb(17,17,17)

Also, I was wondering if there's a way to clamp at the max value without using three if statements like if (color.r > 15) color.r = 15;
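One common branch-free way to do that clamp, sketched for a single channel. The helper name `sat_add_channel` and its `bits` parameter are hypothetical, and it assumes both inputs already fit in `bits` bits (4 for 4-4-4-4, 8 for 8-8-8-8), so the sum can overflow by at most one bit.

```c
/* Hypothetical helper: saturating add of one channel without an if.
   Assumes x and y are each at most (1 << bits) - 1. */
unsigned sat_add_channel(unsigned x, unsigned y, unsigned bits)
{
    unsigned max = (1u << bits) - 1u;  /* 15 for 4-bit channels, 255 for 8-bit */
    unsigned s = x + y;                /* fits in bits + 1 bits */
    unsigned overflow = s >> bits;     /* 1 if the sum exceeded max, else 0 */
    /* 0u - 1u is all ones, so an overflowing sum is forced up to max */
    return (s | (0u - overflow)) & max;
}
```

With 8-bit channels, 7 + 10 gives 17 as in the example above; with 4-bit channels the same sum saturates to 15. Whether this beats the if is platform-dependent, and a plain `s > max ? max : s` may compile branch-free as well, so it is worth checking the generated code.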

bluntman

December 02, 2005 09:48 AM

Well, it depends on a few things. If it's an operation that is only going to be performed on individual colors every so often, then
a) performance probably isn't an issue
b) any obvious method will probably be as fast as any other

But if you are doing the operation a lot, then some sort of SIMD operation would probably be the best route. I can't really help there as SIMD scares me, but there are plenty of references.

Try this.

Unfadable

Author

December 02, 2005 10:06 AM

:) mmm okay... Sorry, I've realized that I can be pretty vague when posting.

- Performance is going to be an issue, because this operation can potentially cover large areas of the screen and is subject to overdraw.

- This needs to be portable, so I can't rely on any hardware-specific features.

doynax

December 02, 2005 10:23 AM

Quote: Original post by Unfadable
- Performance is going to be an issue cause this operation can potentially cover large areas of the screen and is subject to overdraw.
- This needs to be portable, so I can't rely on any hardware specific features.

Then why not spend the time writing specialized versions for platforms with SIMD instruction sets?
If you have to write it in pure C, then I suggest beginning with a prototype in MMX and then translating it back to C while massaging the compiler into generating the original code.
It's painful, but it works, and I know that at least VectorC is capable of generating saturating instructions.
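As a sketch of the "massage the compiler" approach in plain C: a byte-at-a-time saturating loop over a whole span, written simply enough that an optimizing compiler may be able to auto-vectorize it. Whether it actually emits saturating SIMD adds depends on the compiler and flags, so check the assembly; `add_blend_bytes` is a hypothetical name, and `restrict` assumes a C99 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = min(dst[i] + src[i], 255) for every byte.
   restrict promises the buffers don't overlap, which helps vectorization. */
void add_blend_bytes(uint8_t *restrict dst, const uint8_t *restrict src, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; ++i) {
        unsigned s = (unsigned)dst[i] + src[i];   /* at most 510, no overflow */
        dst[i] = (uint8_t)(s > 255u ? 255u : s);  /* clamp to the channel max */
    }
}
```

For 8-8-8-8 pixels, pass the framebuffer as bytes with `nbytes = 4 * pixel_count`; the empty alpha bytes are 0 + 0 and stay 0.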

AndyTX

December 02, 2005 11:57 AM

If you need to do it fast, you're going to have to use some SIMD stuff. Almost all computers nowadays have MMX (going back to the Pentium MMX), so it's pretty safe to assume support. If you're storing colour as integers, then that's really all that you need. You *can* use SSE2 to work with more values at a time, but in my experience the gain for trivial operations like this is minimal, as it becomes memory-bandwidth bound.
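A minimal sketch of the SIMD route, assuming an x86 target with SSE2 and 8-8-8-8 pixels stored as `uint32_t`. It swaps MMX for the SSE2 intrinsic `_mm_adds_epu8` (unsigned saturating byte add, the 128-bit counterpart of MMX's `_mm_adds_pu8`, i.e. PADDUSB), so four pixels are blended per instruction. The names `add_blend_span`, `add_blend_pixel`, and `ADD_BLEND_HAVE_SSE2` are hypothetical; on targets without SSE2 only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define ADD_BLEND_HAVE_SSE2 1
#endif

/* Portable fallback: saturating add of each byte of one 32-bit pixel. */
static uint32_t add_blend_pixel(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((a >> shift) & 0xffu) + ((b >> shift) & 0xffu);
        r |= (s > 0xffu ? 0xffu : s) << shift;
    }
    return r;
}

/* dst[i] = per-byte saturate(dst[i] + src[i]) for n pixels. */
void add_blend_span(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
#ifdef ADD_BLEND_HAVE_SSE2
    /* 4 pixels (16 bytes) per iteration; unaligned loads/stores for simplicity */
    for (; i + 4 <= n; i += 4) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(d, s));
    }
#endif
    /* Leftover pixels, or every pixel when SSE2 is unavailable */
    for (; i < n; ++i)
        dst[i] = add_blend_pixel(dst[i], src[i]);
}
```

Using SSE2 instead of MMX also avoids the `_mm_empty()` (EMMS) call that MMX code needs before any later x87 floating-point work.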

rept

December 02, 2005 06:35 PM

Hello

#include <stdint.h>
typedef uint32_t u32;

u32 AddBlend(u32 a, u32 b)
{
    u32 a2, b2, ab, carry;
    // Halve every channel so the per-byte sums can never carry into the next byte
    a2 = (a >> 1) & 0x7f7f7f7f;
    b2 = (b >> 1) & 0x7f7f7f7f;
    ab = a2 + b2;
    // Bit 7 of each byte of ab is that channel's overflow flag; multiplying it by ~0
    // turns it into an all-ones byte mask (*(~0) will compile to NEG, at least with Visual Studio)
    carry  = (((ab >>  7) & 1) * (~0)) & 0x000000ff;
    carry |= (((ab >> 15) & 1) * (~0)) & 0x0000ff00;
    carry |= (((ab >> 23) & 1) * (~0)) & 0x00ff0000;
    carry |= (((ab >> 31) & 1) * (~0)) & 0xff000000;
    // Undo the halving; the mask drops each byte's overflow bit that shifted into its neighbour
    return ((ab << 1) & 0xfefefefe) | carry;
}

I know the saturation part looks quite "interesting", so to speak. Whether it's faster than the ifs is hard to say; I think it depends on the target platform.
You'll lose one bit of precision with this method, but clamped values are guaranteed to be 0xFF, not 0xFE.
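To see what the lost bit costs, a small brute-force check can compare rept's approach against an exact per-byte saturating add over every pair of 8-bit channel values. This is a test sketch: `AddBlendApprox` follows rept's logic, while `AddBlendExact` and `max_channel_error` are hypothetical helpers.

```c
#include <stdint.h>

typedef uint32_t u32;

/* rept's halve-add-double approach (one bit of precision lost per channel) */
static u32 AddBlendApprox(u32 a, u32 b)
{
    u32 ab = ((a >> 1) & 0x7f7f7f7fu) + ((b >> 1) & 0x7f7f7f7fu);
    u32 carry = (((ab >> 7) & 1u) * ~0u) & 0x000000ffu;
    carry |= (((ab >> 15) & 1u) * ~0u) & 0x0000ff00u;
    carry |= (((ab >> 23) & 1u) * ~0u) & 0x00ff0000u;
    carry |= (((ab >> 31) & 1u) * ~0u) & 0xff000000u;
    return ((ab << 1) & 0xfefefefeu) | carry;
}

/* Reference: exact saturating add, one byte at a time */
static u32 AddBlendExact(u32 a, u32 b)
{
    u32 r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        u32 s = ((a >> shift) & 0xffu) + ((b >> shift) & 0xffu);
        r |= (s > 0xffu ? 0xffu : s) << shift;
    }
    return r;
}

/* Largest per-channel difference between the two, over all 8-bit input pairs */
static int max_channel_error(void)
{
    int worst = 0;
    for (u32 x = 0; x < 256; ++x) {
        for (u32 y = 0; y < 256; ++y) {
            u32 a = x * 0x01010101u, b = y * 0x01010101u; /* same value in all 4 bytes */
            u32 ex = AddBlendExact(a, b), ap = AddBlendApprox(a, b);
            for (int shift = 0; shift < 32; shift += 8) {
                int e = (int)((ex >> shift) & 0xffu) - (int)((ap >> shift) & 0xffu);
                if (e < 0) e = -e;
                if (e > worst) worst = e;
            }
        }
    }
    return worst;
}
```

Worked by hand: when the sum fits, the approximation drops the low bit of each input, so 1 + 1 comes out as 0 instead of 2. The result is never above the exact one and never more than 2 below it, which is what `max_channel_error()` reports.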

Unfadable

Author

December 02, 2005 07:15 PM

rept, thank you!

That's the kind of stuff I'm looking for. I'll check it out and see how it performs.

DekuTree64

December 03, 2005 01:52 AM

Here's another method, which doesn't lose any precision. It's mostly copied from a page on http://www.df.lth.se/%7Ejohn_e/fr_contrib.html (awesome site).

#include <stdint.h>
typedef uint32_t u32;

u32 AddBlend(u32 a, u32 b)
{
    u32 a0, b0, a1, b1, carry;

    // Start by separating the bytes, to do it in 2 batches
    // (each byte gets a 16-bit lane, so its 9-bit sum can't spill into the next one)
    a0 = a & 0x00ff00ff;
    a1 = (a >> 8) & 0x00ff00ff;
    b0 = b & 0x00ff00ff;
    b1 = (b >> 8) & 0x00ff00ff;

    // First batch (bytes 0 and 2)
    a0 += b0;
    carry = a0 & 0x01000100;    // Carry bits
    a0 -= carry;                // Clear carry bits (could also use & 0x00ff00ff)
    a0 |= carry - (carry >> 8); // If a carry is set, that byte becomes 0xff

    // Second batch (bytes 1 and 3)
    a1 += b1;
    carry = a1 & 0x01000100;
    a1 -= carry;
    a1 |= carry - (carry >> 8);

    return a0 | (a1 << 8);
}

EDIT: Fixed it up so it's interchangeable with rept's function.
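The original question also covered 16-bit 4-4-4-4 colors. DekuTree64's two-batch carry trick appears to adapt to that layout by working on nibbles in 8-bit lanes instead of bytes in 16-bit lanes. This adaptation is not from the thread, and `AddBlend4444` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical 4-4-4-4 variant: per-nibble saturating add of two 16-bit colors. */
uint16_t AddBlend4444(uint16_t a, uint16_t b)
{
    uint32_t a0 = a & 0x0f0fu;         /* nibbles 0 and 2, one per 8-bit lane */
    uint32_t a1 = (a >> 4) & 0x0f0fu;  /* nibbles 1 and 3 */
    uint32_t b0 = b & 0x0f0fu;
    uint32_t b1 = (b >> 4) & 0x0f0fu;
    uint32_t carry;

    /* First batch: each lane holds at most 0xf + 0xf = 0x1e */
    a0 += b0;
    carry = a0 & 0x1010u;              /* bit 4 of each lane = overflow */
    a0 -= carry;                       /* clear the overflow bits */
    a0 |= carry - (carry >> 4);        /* 0x10 - 0x01 = 0x0f: saturate that nibble */

    /* Second batch */
    a1 += b1;
    carry = a1 & 0x1010u;
    a1 -= carry;
    a1 |= carry - (carry >> 4);

    return (uint16_t)(a0 | (a1 << 4));
}
```

With the numbers from the first post, the 4-bit channels saturate: `AddBlend4444(0x0777, 0x0AAA)` gives `0x0FFF`, while `AddBlend4444(0x0123, 0x0321)` gives `0x0444` with no clamping.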

This topic is closed to new replies.
