Unfadable

Fast additive blending

Use the graphics hardware. Or, if it has to be in C, then it depends. When you say 16 or 32 bit, do you mean a 16/32-bit float per channel, or 16/32 bits for all channels combined?
Also, when you say no alpha channel, does that mean you just want to average the two colors together ((a+b)*0.5)?

Hi, it's gotta be on the CPU.

I mean 16/32 bits for all channels (4-4-4-4 or 8-8-8-8).

By no alpha I mean I'm additively blending the colors w/o taking alpha into account.

Also, not looking for an average, but a sum.

ex. additive_blend( rgb(7,7,7), rgb(10,10,10)) = rgb(17,17,17)

Also I was wondering if there's a way to clamp at the max value w/o using 3 if statements like if (color.r > 15) color.r = 15.
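
For reference, the per-channel operation being asked about can be sketched as below for the 8-8-8-8 case. This is only a baseline under stated assumptions, not a fast path: `add_sat8` and `additive_blend_ref` are hypothetical names, and the clamp is done with a mask instead of an if statement.

```c
#include <stdint.h>

/* Hypothetical helper: saturating add of two 8-bit channel values,
   written without a branch. */
static uint32_t add_sat8(uint32_t x, uint32_t y)
{
    uint32_t sum  = x + y;        /* 0..510 for 8-bit inputs */
    uint32_t over = sum >> 8;     /* 1 if the sum exceeded 255, else 0 */
    /* 0u - over is all ones on overflow, so the OR forces the result to 0xff */
    return (sum | (0u - over)) & 0xffu;
}

/* Hypothetical reference blend: one saturating add per 8-bit channel. */
static uint32_t additive_blend_ref(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t ca = (a >> shift) & 0xffu;
        uint32_t cb = (b >> shift) & 0xffu;
        out |= add_sat8(ca, cb) << shift;
    }
    return out;
}
```

With this sketch, blending 0x00070707 with 0x000a0a0a gives 0x00111111 (17 per channel), matching the example above. In a 4-4-4-4 format the same sum would clamp at 15.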

Well, it depends on a few things. If it's an operation that is only going to be performed on individual colors every so often, then
a) performance probably isn't an issue
b) any obvious method will probably be as fast as any other

But if you are doing the operation a lot, then some sort of SIMD operation would probably be the best route. I can't really help there as SIMD scares me, but there are plenty of references.

Try this.

:) mmm okay... Sorry, I've realized that I can be pretty vague when posting.

- Performance is going to be an issue cause this operation can potentially cover large areas of the screen and is subject to overdraw.

- This needs to be portable, so I can't rely on any hardware specific features.

Quote:
Original post by Unfadable
- Performance is going to be an issue cause this operation can potentially cover large areas of the screen and is subject to overdraw.
- This needs to be portable, so I can't rely on any hardware specific features.

Then why not spend the time on writing specialized versions for platforms with SIMD instruction sets?
If you have to write it in pure C code then I suggest beginning with a prototype in MMX and then translating it back to C while massaging the compiler into generating the original code.
It's painful but it works and I know that at least VectorC is capable of generating saturating instructions.

If you need to do it fast, you're going to have to use some SIMD stuff. Almost all computers nowadays have MMX (it goes back to the Pentium MMX), so it's pretty safe to assume support. If you're storing colour as integers, then that's really all that you need. You *can* use SSE2 to work with more values at a time, but in my experience the gain for trivial operations like this is minimal, as it becomes memory-bandwidth bound.
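
As a rough illustration of the SIMD route, here is a minimal sketch using the SSE2 intrinsic `_mm_adds_epu8` (unsigned saturating byte add) over a run of 8-8-8-8 pixels. The function name `add_blend_span` and the `HAVE_SSE2_SKETCH` macro are made up for this example, and the scalar loop is an assumed fallback so the same source still builds on targets without SSE2.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1   /* hypothetical macro, local to this sketch */
#endif

/* Hypothetical helper: dst[i] = per-byte saturating sum of dst[i] and src[i]. */
void add_blend_span(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i = 0;

#ifdef HAVE_SSE2_SKETCH
    /* Four pixels (16 bytes) per iteration; unaligned loads keep it simple. */
    for (; i + 4 <= count; i += 4) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(d, s));
    }
#endif

    /* Scalar tail, which is also the whole loop when SSE2 is unavailable. */
    for (; i < count; ++i) {
        uint32_t a = dst[i], b = src[i], out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t sum = ((a >> shift) & 0xffu) + ((b >> shift) & 0xffu);
            if (sum > 0xffu)
                sum = 0xffu;
            out |= sum << shift;
        }
        dst[i] = out;
    }
}
```

Whether this beats a plain integer version in practice depends on the memory-bandwidth point made above, so it is worth measuring on the actual target.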

Hello



#include <stdint.h>
typedef uint32_t u32; // one 8-8-8-8 pixel

u32 AddBlend(u32 a, u32 b)
{
    u32 a2, b2, ab, carry;

    // Halve every channel so the per-byte sums cannot spill into the next byte
    a2 = (a >> 1) & 0x7f7f7f7f;
    b2 = (b >> 1) & 0x7f7f7f7f;
    ab = a2 + b2;

    // Bit 7 of each byte of ab is set when that channel overflowed.
    // *(~0) will compile to NEG (at least with Visual Studio)
    carry  = (((ab >>  7) & 1) * (~0)) & 0x000000ff;
    carry |= (((ab >> 15) & 1) * (~0)) & 0x0000ff00;
    carry |= (((ab >> 23) & 1) * (~0)) & 0x00ff0000;
    carry |= (((ab >> 31) & 1) * (~0)) & 0xff000000;

    // Undo the halving and force overflowed channels to 0xff
    return ((ab << 1) & 0xfefefefe) | carry;
}



I know, the saturation part looks quite "interesting", so to speak. Whether it's faster than the ifs is hard to say; I think it depends on the target platform.
You'll lose 1 bit of precision with this method (the low bit of each channel is dropped before the add), but whenever the saturation kicks in, the result is guaranteed to be FF, and not FE.
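
To make the precision trade-off concrete, here is a condensed restatement of the same halve-add-double idea, under the hypothetical name `add_blend_halved`. One detail is my own variation and not part of the function above: the four carry lines are folded into a single multiply by 0xff, which spreads each per-byte flag into a full byte mask.

```c
#include <stdint.h>

/* Hypothetical condensed version of the halve/add/double approach. */
uint32_t add_blend_halved(uint32_t a, uint32_t b)
{
    /* Drop the low bit of each channel, then add; bytes cannot overflow. */
    uint32_t ab = ((a >> 1) & 0x7f7f7f7fu) + ((b >> 1) & 0x7f7f7f7fu);

    /* Bit 7 of each byte flags an overflowed channel. Each flag is 0 or 1,
       so multiplying by 0xff turns it into 0x00 or 0xff in the same byte. */
    uint32_t carry = ((ab >> 7) & 0x01010101u) * 0xffu;

    /* Double back to full range and saturate the flagged channels. */
    return ((ab << 1) & 0xfefefefeu) | carry;
}
```

For example, 0x07 + 0x0a comes out as 0x10 rather than 0x11, and 0xff + 0x01 comes out as 0xfe because the halved sum never reaches the carry bit, while 0xff + 0x02 saturates to exactly 0xff.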

Here's another method, which doesn't lose any precision. Mostly copied from a page on http://www.df.lth.se/%7Ejohn_e/fr_contrib.html (awesome site)
#include <stdint.h>
typedef uint32_t u32; // one 8-8-8-8 pixel

u32 AddBlend(u32 a, u32 b)
{
    u32 a0, b0, a1, b1, carry;

    // Start by separating the bytes, to do it in 2 batches
    a0 = a & 0x00ff00ff;
    a1 = (a >> 8) & 0x00ff00ff;
    b0 = b & 0x00ff00ff;
    b1 = (b >> 8) & 0x00ff00ff;

    // First batch (bytes 0 and 2)
    a0 += b0;
    carry = a0 & 0x01000100;    // Carry bits
    a0 -= carry;                // Clear carry bits (could also use & 0x00ff00ff)
    a0 |= carry - (carry >> 8); // If a carry is set, that byte becomes 0xff

    // Second batch (bytes 1 and 3)
    a1 += b1;
    carry = a1 & 0x01000100;
    a1 -= carry;
    a1 |= carry - (carry >> 8);

    return a0 | (a1 << 8);
}

EDIT: Fixed it up so it's interchangeable with rept's function
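
Since the original question also mentioned 4-4-4-4, the same two-batch trick can probably be adapted to 16-bit pixels by working on nibbles instead of bytes. The following is only a sketch of that adaptation: `add_blend_4444` is a made-up name and the channel layout is assumed to be four packed 4-bit fields.

```c
#include <stdint.h>

/* Hypothetical 4-4-4-4 variant: saturating add of four packed 4-bit channels. */
uint16_t add_blend_4444(uint16_t a, uint16_t b)
{
    /* Split the nibbles into two batches so each one has a spare
       nibble above it to catch the carry. */
    uint32_t a0 = a & 0x0f0fu;
    uint32_t a1 = (a >> 4) & 0x0f0fu;
    uint32_t b0 = b & 0x0f0fu;
    uint32_t b1 = (b >> 4) & 0x0f0fu;
    uint32_t carry;

    /* First batch (nibbles 0 and 2) */
    a0 += b0;
    carry = a0 & 0x1010u;        /* carry bits */
    a0 -= carry;                 /* clear carry bits */
    a0 |= carry - (carry >> 4);  /* a set carry turns that nibble into 0xf */

    /* Second batch (nibbles 1 and 3) */
    a1 += b1;
    carry = a1 & 0x1010u;
    a1 -= carry;
    a1 |= carry - (carry >> 4);

    return (uint16_t)(a0 | (a1 << 4));
}
```

With this sketch, 0x0777 + 0x0aaa clamps to 0x0fff (7 + 10 = 17 is over the 4-bit maximum of 15), while 0x1234 + 0x1111 gives 0x2345 with no clamping.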
