Faster Alpha Blending?

8 comments, last by C0D1F1ED 19 years, 6 months ago
I'm making a 2D engine (for PPC) and I'm trying to add alpha blending to it, but it's slow! I get something like 3 FPS if I use alpha (50 if not). The alpha ranges from 0.0f (invisible) to 1.0f (fully visible). This is the code:

byte r= RFromColor(SourcePixel);
byte r2=RFromColor(DestPixel);

byte g= GFromColor(SourcePixel);
byte g2=GFromColor(DestPixel);

byte b= BFromColor(SourcePixel);
byte b2=BFromColor(DestPixel);

r=(byte)(r*m_fAlpha+0.5f);
g=(byte)(g*m_fAlpha+0.5f);
b=(byte)(b*m_fAlpha+0.5f);
				
r2=(byte)(r2*(1.0f-m_fAlpha)+0.5f);
g2=(byte)(g2*(1.0f-m_fAlpha)+0.5f);
b2=(byte)(b2*(1.0f-m_fAlpha)+0.5f);

*pDestPixel=RGB16(r+r2,g+g2,b+b2);


How appropriate, you fight like a cow!
You can try doing the multiply with fixed-point integer math. Say alpha goes from 0-255. r = (byte)((r * alpha) >> 8).

What's the reason for the + 0.5f on each line?
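
For illustration, here is a minimal sketch of the fixed-point idea applied to one 8-bit channel. The function name `blend_channel` and the 0..256 alpha range are assumptions made for this example, not code from the original engine. With a 0..255 range and a shift by 8, a fully opaque source comes out one step short (255 * 255 >> 8 is 254), which is why 256 is used as "fully visible" here.

```c
#include <stdint.h>

/* Hypothetical helper: blend one 8-bit channel using integer math only.
 * alpha is 0..256, where 0 keeps dst unchanged and 256 gives src exactly. */
static uint8_t blend_channel(uint8_t src, uint8_t dst, unsigned alpha)
{
    /* Both products are scaled by 256, so one shift by 8 rescales the sum. */
    return (uint8_t)((src * alpha + dst * (256u - alpha)) >> 8);
}
```

The float alpha would be converted once per sprite or frame, for example `unsigned alpha = (unsigned)(m_fAlpha * 256.0f);`, so that no floating-point work remains inside the per-pixel loop.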
Floats are a big no-no on PPC, especially on per-pixel operations. The PPC has no FPU, so it's emulating all those floating point calculations. It looks like there's about 6 multiplies, 6 adds, and 3 subtractions PER PIXEL. That's a lot of floating point arithmetic for a little ARM processor to emulate.

Depending on how (or if) they're being inlined, all those helper functions might be a performance concern as well.

The previous poster has the right idea.
Read the MMX alpha blending tutorial in the Articles section of GameDev.net. It will show you how to work with several pixels simultaneously, thus saving a lot of cycles per pixel. I don't know the exact specifications of PPC, but I'm sure you could at least find a data type large enough to do 2 pixels at once, i.e. 2 16-bit pixels in a 32-bit data block, or 2 8-bit pixels in a 16-bit block.

The articles will also show you how to turn the floating-point math into integer math, giving you 256 levels of alpha, from invisible to totally opaque. With these two methods combined, you should see a huge increase in performance.
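
As a rough sketch of handling two pixels per operation without MMX, the function below does a fixed 50% blend of two RGB565 pixels packed into one 32-bit word. The RGB565 layout, the constant `0xF7DEF7DE`, and the name `blend50_x2` are assumptions for this example; a variable alpha needs more work than this.

```c
#include <stdint.h>

/* Hypothetical helper: each argument holds two RGB565 pixels in one
 * 32-bit word. Returns the 50/50 mix of both pixel pairs. */
static uint32_t blend50_x2(uint32_t a, uint32_t b)
{
    /* Clears the lowest bit of every channel in both pixels. */
    const uint32_t mask = 0xF7DEF7DEu;

    /* Halve each side first so no channel sum can carry into its neighbour. */
    return ((a & mask) >> 1) + ((b & mask) >> 1);
}
```

The lowest bit of each channel is dropped before averaging, so the result can be one step darker than an exact average.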

Good Luck.
JRA GameDev Website//Bad Maniac
PPC as in IBM PowerPC? This processor has a good FPU, but still, floating-point operations are quite slow. Using integers will be much faster.

Also, are you writing to graphics memory directly, or using system memory and then copying that (blitting)? The former method really makes blending slow, since graphics memory is -very- slow for read operations. That's because the AGP bus has very little bandwidth for reading, and the drivers have to do some slow synchronization when reading. So it's best to work with a buffer in system memory, do blending and everything else there, and then write to graphics memory when done.
You may find this piece of code useful:

http://www.stereopsis.com/doubleblend.html
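
A related packed-multiply trick for 16-bit RGB565 pixels can be sketched as follows. This is not a copy of the code at that link; the function name `blend565`, the RGB565 layout, and the 0..32 alpha range are assumptions for this example. The pixel is spread across a 32-bit word so that every channel has spare bits above it, and all three channels then share the same multiplies.

```c
#include <stdint.h>

/* Hypothetical helper: blend two RGB565 pixels with integer math only.
 * alpha is 0..32, where 0 keeps dst unchanged and 32 gives src exactly. */
static uint16_t blend565(uint16_t src, uint16_t dst, unsigned alpha)
{
    /* Green moves to bits 21..26; red stays at 11..15, blue at 0..4. */
    const uint32_t mask = 0x07E0F81Fu;
    uint32_t s = (src | ((uint32_t)src << 16)) & mask;
    uint32_t d = (dst | ((uint32_t)dst << 16)) & mask;

    /* Each channel grows by at most 5 bits, which fits in the gaps. */
    uint32_t r = ((s * alpha + d * (32u - alpha)) >> 5) & mask;

    /* Fold green back down into bits 5..10. */
    return (uint16_t)(r | (r >> 16));
}
```

This costs two multiplies per pixel instead of six, at the price of only 33 alpha levels.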
Also look into AltiVec, the PPC's MMX equivalent.
Quote:Original post by uavfun
You can try doing the multiply with fixed-point integer math. Say alpha goes from 0-255. r = (byte)((r * alpha) >> 8).

What's the reason for the + 0.5f on each line?

I was just checking something, same speed without.

I'm a bit confused...
what is MMX?
And thanks for all the suggestions.
Quote:Original post by C0D1F1ED
PPC as in IBM PowerPC? This processor has a good FPU, but still, floating-point operations are quite slow. Using integers will be much faster.

Also, are you writing to graphics memory directly, or using system memory and then copying that (blitting)? The former method really makes blending slow, since graphics memory is -very- slow for read operations. That's because the AGP bus has very little bandwidth for reading, and the drivers have to do some slow synchronization when reading. So it's best to work with a buffer in system memory, do blending and everything else there, and then write to graphics memory when done.


I suspect he means "Pocket PC" rather than "Power PC", i.e. a PDA like an HP iPAQ. Those mostly use ARM CPUs without an FPU. Good illustration of the danger of abbreviating too much: to many people, PPC is an abbreviation of PowerPC.


Comments on alpha blending (mainly based on my experiences when doing stuff for unaccelerated PCs):

1) As mentioned, don't use floats; they'll be emulated, and that's slow.


2) The link memon provided is definitely a good way to go.


3) Along similar lines, for 50% alpha you can use the following method:
// pre-compute this once somewhere during program init
// redMask, greenMask and blueMask are the bit-masks for each
// colour element in your pixel
//
u32 mask = redMask & (redMask << 1);
mask |= greenMask & (greenMask << 1);
mask |= blueMask & (blueMask << 1);
...
// to blend:
//
result = ((pixelA & mask) + (pixelB & mask)) >> 1;


You can do similar with some other common blend percentages.


4) Pre-compute as much as possible at program initialisation time and have multiple versions of your blend function hard-coded for each major pixel format, i.e. one for k565, another for k555 etc.


5) You can use pre-multiplied alpha to simplify the blend some more.


6) Use RLE or similar to store the alpha in your sprites so that you aren't blending or even touching the memory for the pixels which don't have alpha.


7) There are a few look up table based methods you can use - though make sure you account for the cache implications for your particular device.


8) Don't use alpha where you don't need to - for example, instead of a full blend for, say, a hard-edged drop shadow, just darken or fade the pixels directly.
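
Points 3 and 8 above can be sketched together as follows, assuming 16-bit pixels and channel masks that fit in 16 bits (for RGB565 the computed mask is `0xF7DE`). The names `make_half_mask`, `blend50`, `blend25` and `darken_half` are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Build the mask that clears the lowest bit of each colour channel. */
static uint32_t make_half_mask(uint32_t redMask, uint32_t greenMask,
                               uint32_t blueMask)
{
    uint32_t mask = redMask & (redMask << 1);
    mask |= greenMask & (greenMask << 1);
    mask |= blueMask & (blueMask << 1);
    return mask;
}

/* 50% of a plus 50% of b. The sum needs 17 bits, so it is done in 32. */
static uint16_t blend50(uint16_t a, uint16_t b, uint32_t mask)
{
    return (uint16_t)((((uint32_t)a & mask) + ((uint32_t)b & mask)) >> 1);
}

/* Roughly 25% of a plus 75% of b, built from two 50% blends. */
static uint16_t blend25(uint16_t a, uint16_t b, uint32_t mask)
{
    return blend50(blend50(a, b, mask), b, mask);
}

/* Darken a pixel to half brightness without reading a second pixel. */
static uint16_t darken_half(uint16_t p, uint32_t mask)
{
    return (uint16_t)(((uint32_t)p & mask) >> 1);
}
```

For RGB565 the mask would be built once at init with `make_half_mask(0xF800, 0x07E0, 0x001F)`. Every 50% step drops the lowest bit of each channel, so chained blends lose a little precision.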

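To illustrate the pre-multiplied alpha idea from point 5, here is a minimal per-channel sketch. The function names and the 0..256 alpha range are assumptions for this example. The source colour is multiplied by its alpha once, ahead of time, so the per-pixel blend needs one multiply instead of two.

```c
#include <stdint.h>

/* Done once per sprite pixel, e.g. at load time. alpha is 0..256. */
static uint8_t premultiply(uint8_t c, unsigned alpha)
{
    return (uint8_t)((c * alpha) >> 8);
}

/* Per-pixel work at draw time. src_pm is an already pre-multiplied
 * channel and inv_alpha is 256 - alpha for that same pixel. */
static uint8_t blend_premul(uint8_t src_pm, uint8_t dst, unsigned inv_alpha)
{
    return (uint8_t)(src_pm + ((dst * inv_alpha) >> 8));
}
```

The sum cannot exceed 255 as long as `src_pm` and `inv_alpha` come from the same alpha value.
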
Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing | LinkedIn | Personal site

Quote:Original post by Ilankt
what is MMX?

It's an instruction set extension for x86 processors (Pentium/Athlon). It processes data in parallel. For example, it can do four 16-bit multiplications in the same time as one 32-bit integer multiplication. You have to write assembly code (or use compiler intrinsics) to use it.

PowerPC has a similar instruction set, called AltiVec.

This topic is closed to new replies.
