# Alphablending blues



In my alpha-blending (AB) algorithm, I advance my source and destination pointers on each iteration, so they always point at the pixel currently being modified. I noticed that with the two lines `src_color = *src_ptr;` and `dst_color = *dst_ptr;` in the loop, I get about 35 fps. When I remove those lines, it obviously doesn't alpha-blend anymore, but I get 75 fps.


    void Screen::alphaBlt(ddSurface& surf, RECT& src_rct, RECT& dst_rct, BYTE alpha)
    {
        int cx=0;
        int cy=0;
        int height = src_rct.bottom-src_rct.top;
        int width = src_rct.right-src_rct.left;

        DWORD dst_pitch, src_pitch;
        COLORREF* dst_ptr, *src_ptr;
        COLORREF src_color, dst_color;

        // lock the surfaces
        srfOffScreen.lock(&dst_rct, false, dst_pitch, dst_ptr);
        surf.lock(&src_rct, true, src_pitch, src_ptr);

        // adjust the pitch to not include the rect width
        src_pitch -= width;
        dst_pitch -= width;

        int src_r, src_g, src_b, dst_r, dst_g, dst_b;
        do
        {
            do
            {
                src_color = *src_ptr;
                dst_color = *dst_ptr;

                if (src_color == transcolor)
                {
                    dst_ptr++;
                    src_ptr++;
                    cx++;
                    continue;
                }

                src_r = ( (src_color & 0xFF0000) >> 16) >> 1;
                src_g = ( (src_color & 0x00FF00) >> 8 ) >> 1;
                src_b = ( (src_color & 0x0000FF)      ) >> 1;

                dst_r = ( (dst_color & 0xFF0000) >> 16) >> 1;
                dst_g = ( (dst_color & 0x00FF00) >> 8 ) >> 1;
                dst_b = ( (dst_color & 0x0000FF)      ) >> 1;

                *dst_ptr = ( (src_r << 16) + (dst_r << 16) ) | ((src_g << 8) + (dst_g << 8)) | (src_b + dst_b);

                dst_ptr++;
                src_ptr++;
                cx++;

            } while (cx < width);

            src_ptr += src_pitch;
            dst_ptr += dst_pitch;

            cx = 0;
            cy++;
        } while (cy < height);

        // unlock surfaces
        srfOffScreen.unlock(&dst_rct);
        surf.unlock(&src_rct);
    }


EDIT: My question is: why does this happen, and how do I fix it? By the way, ignore the alpha parameter; it is not in use at the moment.

##### Share on other sites
Those are memory accesses nested inside two loops that execute every frame, so they are going to take a big chunk of time compared to the arithmetic you're performing.

As for optimization, it's not really my area of expertise, but somebody around here probably has a bag of tricks for you.

##### Share on other sites
Besides the "you're doing lots of memory access inside a tight loop, and that's not going to be fast", there are a few things.

With those two lines removed, the compiler may notice that you never update src_color or dst_color and never use src_ptr, which effectively makes your loops do almost nothing. It may optimize the body down to a couple of simple operations, precalculated before the loop (since src_color never changes). The branch on transcolor may also be eliminated if you're not fetching new colors, and that's sure to speed things up a great deal.

You could try using MMX to speed things up.
You could use D3D or OpenGL hardware to speed up drawing.

You could replace large amounts of code with this:
*dst_ptr = ((src_color >> 1) & 0x7f7f7f7f) + ((dst_color >> 1) & 0x7f7f7f7f);
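
To see why the one-liner matches the per-channel version, here is a minimal sketch. It assumes 32-bit pixels laid out as 0x00RRGGBB, as the masks in the original loop suggest; the function names `blend_half` and `blend_half_per_channel` are made up for illustration.

```cpp
#include <cstdint>

// Average two packed 0x00RRGGBB pixels at a fixed 50/50 ratio.
// Shifting the whole word right by one halves every channel at once;
// the 0x7f7f7f7f mask then clears the bit that leaked in from the
// channel above, so each byte holds floor(channel / 2).
inline std::uint32_t blend_half(std::uint32_t src, std::uint32_t dst)
{
    return ((src >> 1) & 0x7f7f7f7fu) + ((dst >> 1) & 0x7f7f7f7fu);
}

// Reference: the same blend done one channel at a time,
// mirroring the arithmetic in the original loop.
inline std::uint32_t blend_half_per_channel(std::uint32_t src, std::uint32_t dst)
{
    std::uint32_t r = (((src & 0xFF0000u) >> 16) >> 1) + (((dst & 0xFF0000u) >> 16) >> 1);
    std::uint32_t g = (((src & 0x00FF00u) >> 8) >> 1) + (((dst & 0x00FF00u) >> 8) >> 1);
    std::uint32_t b = ((src & 0x0000FFu) >> 1) + ((dst & 0x0000FFu) >> 1);
    return (r << 16) | (g << 8) | b;
}
```

Each halved channel is at most 0x7F, so the sum of two never exceeds 0xFE and cannot carry into the neighbouring channel. That is what makes a single 32-bit add safe here.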

You could try removing branches. This can be done by always calculating the mixed color, then using an asm "cmov" instruction to conditionally pick between the mixed and unmixed colors. Alternatively, some math on src_color and transcolor could give a value of 0 or 1, which you can use to multiply the two results by. That would let the code remain plain C instead of sticking in an asm keyword. Selecting the Pentium Pro instruction set may also give the compiler access to CMOV.
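
One way the plain-C idea could look is sketched below. It uses a bit mask rather than a multiply, which is a swapped-in variant of the same 0-or-1 trick. The function name `blend_half_keyed` is hypothetical, and whether this actually beats the branch depends on the compiler and CPU.

```cpp
#include <cstdint>

// Branch-free color-keyed 50/50 blend (illustrative sketch).
// Returns dst unchanged when src equals the transparent color,
// otherwise the 50/50 mix of src and dst.
inline std::uint32_t blend_half_keyed(std::uint32_t src,
                                      std::uint32_t dst,
                                      std::uint32_t transcolor)
{
    // Always compute the mixed color.
    std::uint32_t mixed = ((src >> 1) & 0x7f7f7f7fu) + ((dst >> 1) & 0x7f7f7f7fu);

    // The comparison yields 0 or 1; 0 - 1 wraps to 0xFFFFFFFF in
    // unsigned arithmetic, so mask is all ones for a transparent pixel
    // and all zeros otherwise.
    std::uint32_t mask = 0u - static_cast<std::uint32_t>(src == transcolor);

    // Select dst where mask is set, mixed where it is clear.
    return (dst & mask) | (mixed & ~mask);
}
```

Inside the pixel loop this would replace both the `if (src_color == transcolor)` block and the blend expression with a single assignment to `*dst_ptr`.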

##### Share on other sites
Thanks for that 50/50 technique, namethatnobodyelsetook. That's way more efficient. However, even if I remove all the calculation and the "if (src == trans)" test, the speed stays the same. The problem is definitely the memory access.

But how else could you do it without memory accesses inside the loops?

edit: I mean, my picture is only 100x80 pixels. This just can't be right! Also, I tried the whole MMX thing using inline asm and decided that, for what it's worth, I am not going to use it: 1) it's ugly and nasty, and 2) I'm too bad at asm.

##### Share on other sites
You're reading from a surface in hardware?
This will always result in a major slowdown, even on a small part.

Things to make it faster:

Have a copy of that surface in system memory, do the blending there and copy the whole block once you're done.

Read the whole block from the surface into system memory, and follow the steps above.
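
A rough sketch of the second option follows. It deliberately avoids the DirectDraw API: `LockedRect` is a hypothetical stand-in for whatever a surface lock returns (a pointer plus a pitch in bytes), and all function names are made up for illustration. Pixels are assumed to be 32 bits wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a locked surface rectangle.
struct LockedRect {
    std::uint8_t* bits;   // first pixel of the rectangle
    std::size_t   pitch;  // bytes from the start of one row to the next
};

// Copy a width x height block out of the surface, one memcpy per row,
// into a tightly packed system-memory buffer.
inline std::vector<std::uint32_t> read_block(const LockedRect& s,
                                             std::size_t width, std::size_t height)
{
    std::vector<std::uint32_t> out(width * height);
    for (std::size_t y = 0; y < height; ++y)
        std::memcpy(out.data() + y * width, s.bits + y * s.pitch,
                    width * sizeof(std::uint32_t));
    return out;
}

// 50/50 blend of src into dst; both buffers live in system memory.
inline void blend_block_half(std::vector<std::uint32_t>& dst,
                             const std::vector<std::uint32_t>& src)
{
    for (std::size_t i = 0; i < dst.size() && i < src.size(); ++i)
        dst[i] = ((src[i] >> 1) & 0x7f7f7f7fu) + ((dst[i] >> 1) & 0x7f7f7f7fu);
}

// Copy the finished block back to the surface, again one memcpy per row.
inline void write_block(const LockedRect& s, const std::vector<std::uint32_t>& in,
                        std::size_t width, std::size_t height)
{
    for (std::size_t y = 0; y < height; ++y)
        std::memcpy(s.bits + y * s.pitch, in.data() + y * width,
                    width * sizeof(std::uint32_t));
}
```

The point of the design is that the surface is touched only by large sequential copies, while the per-pixel read-modify-write happens in ordinary RAM.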

##### Share on other sites
Awesome.. So how do I copy it into system mem? Blit? I certainly cannot use memcpy, as that would be just as horrible.

edit:
Okay, I tried the blitting method: copying the src and dst segments to system-memory surfaces, blending there, and then copying the result onto the backbuffer. It runs faster. However, the part that is dragging the code down now is the three BltFast calls. The time spent in the actual alpha-blending loop is pretty much irrelevant compared to these blits.

    //Blt Dst rect to tmp surface1
    helpersurf1->BltFast(0, 0, backbuf, &dst_rct, DDBLTFAST_WAIT | DDBLTFAST_NOCOLORKEY);

    //Blt Src rect to tmp surface2
    helpersurf2->BltFast(0, 0, srcsrf, &src_rct, DDBLTFAST_WAIT | DDBLTFAST_NOCOLORKEY);

    // lock the surfaces
    helpersurf1->lock(NULL, false, dst_pitch, dst_ptr);
    helpersurf2->lock(NULL, true, src_pitch, src_ptr);

    //...

    // unlock surfaces
    helpersurf1->unlock(NULL);
    helpersurf2->unlock(NULL);

    //reblit the outcome back to the dst
    backbuf->BltFast(dst_rct.left, dst_rct.top, helpersurf1, &src_rct, DDBLTFAST_WAIT | DDBLTFAST_NOCOLORKEY );

I created both helper surfaces at startup, in system memory.

[Edited by - squicklid on May 12, 2005 2:39:08 AM]

##### Share on other sites
BAH! I did some further testing and found that BltFast is not so fast with system-memory surfaces. So therefore, THERE IS NO WAY TO DO IT FAST.

Method 1:
Use video memory and lose speed to slow memory access.

Method 2:
Use system memory and lose speed to the slowness of the HEL (DirectDraw's software emulation layer).

Any other options?

##### Share on other sites
Keep the destination surface in system memory from the start. That means having a second target surface: do the alpha blend on it, and only blit the finished result into video memory.

Do as few reads from the backbuffer surface as possible. And memcpy would probably be faster than fetching the data pixel by pixel.
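
The arrangement described above might be sketched like this. `SoftFrame`, `blend_sprite_half` and `present` are hypothetical names, the target is modelled as a plain pointer and byte pitch rather than a real DirectDraw surface, and 32-bit pixels are assumed. The caller is expected to keep the sprite inside the frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical system-memory frame: tightly packed 32-bit pixels.
struct SoftFrame {
    std::size_t width, height;
    std::vector<std::uint32_t> pixels;
    SoftFrame(std::size_t w, std::size_t h) : width(w), height(h), pixels(w * h, 0) {}
};

// 50/50 blend of a sprite into the frame at (x0, y0).
// Every read and write here hits system memory only.
inline void blend_sprite_half(SoftFrame& frame, const std::uint32_t* sprite,
                              std::size_t sw, std::size_t sh,
                              std::size_t x0, std::size_t y0)
{
    for (std::size_t y = 0; y < sh; ++y) {
        for (std::size_t x = 0; x < sw; ++x) {
            std::uint32_t& d = frame.pixels[(y0 + y) * frame.width + (x0 + x)];
            d = ((sprite[y * sw + x] >> 1) & 0x7f7f7f7fu) + ((d >> 1) & 0x7f7f7f7fu);
        }
    }
}

// Upload the finished frame: one memcpy per row, write-only toward the
// target, so nothing is ever read back from it.
inline void present(const SoftFrame& frame, std::uint8_t* target_bits,
                    std::size_t target_pitch)
{
    for (std::size_t y = 0; y < frame.height; ++y)
        std::memcpy(target_bits + y * target_pitch,
                    frame.pixels.data() + y * frame.width,
                    frame.width * sizeof(std::uint32_t));
}
```

Per frame, all sprites would be blended into the `SoftFrame` first and `present` called once at the end, so the cost of crossing into video memory is paid a single time rather than once per blended sprite.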
