MMX AlphaBlending code inside...
Hi All,
At the bottom of this email is the AlphaBlending code I am trying to port to Delphi. I do not have any problems with the C syntax and porting that, but I have no experience with assembler, which is used extensively in this version. I have tried compiling the assembler in here with Delphi, but it does not understand certain instructions.
Does anyone know how to port Microsoft assembler over so it will compile under Borland compilers like Delphi or C++ Builder?
I also had the same problem when I tried to compile the Quake 2 source code with C++ Builder. When it got to the inline assembler routines it did a back flip.
Anyway here is the C MMX AlphaBlending code...
If you know how to port the assembler parts please let me know.
Dominique
----- CODE FOLLOWS ----->
/*
* Description: Performs a blit operation while allowing for a variable alpha value,
* making use of MMX technology. The function uses black as a color key
* for the blit operation.
*
* Parameters: lpDDSDest - The destination surface of the blit.
*
* lpDDSSource - The source surface of the blit.
*
* iDestX - The horizontal coordinate to blit to on
* the destination surface.
*
* iDestY - The vertical coordinate to blit to on the
* destination surface.
*
* lprcSource - The address of a RECT structure that defines
* the upper-left and lower-right corners of the
* rectangle to blit from on the source surface.
*
* iAlpha - A value in the range from 0 to 256 that
* determines the opacity of the source.
*
* dwMode - One of the following predefined values:
* RGBMODE_555 - 16 bit mode ( 555 )
* RGBMODE_565 - 16 bit mode ( 565 )
* RGBMODE_16 - 16 bit mode ( unknown )
* RGBMODE_24 - 24 bit mode
* RGBMODE_32 - 32 bit mode
*
* Return value: The function returns 0 to indicate success or -1 if the call fails.
*
*/
int BltAlphaMMX( LPDIRECTDRAWSURFACE7 lpDDSDest, LPDIRECTDRAWSURFACE7 lpDDSSource,
int iDestX, int iDestY, LPRECT lprcSource, int iAlpha, DWORD dwMode )
{
DDSURFACEDESC2 ddsdSource;
DDSURFACEDESC2 ddsdTarget;
RECT rcDest;
DWORD dwTargetPad;
DWORD dwSourcePad;
DWORD dwTargetTemp;
DWORD dwSourceTemp;
DWORD dwSrcRed, dwSrcGreen, dwSrcBlue;
DWORD dwTgtRed, dwTgtGreen, dwTgtBlue;
DWORD dwRed, dwGreen, dwBlue;
BYTE* lpbTarget;
BYTE* lpbSource;
__int64 i64MaskRed;
__int64 i64MaskGreen;
__int64 i64MaskBlue;
__int64 i64Alpha;
__int64 i64RdShift = 0;
__int64 i64GrShift = 0;
__int64 i64BlShift = 0;
__int64 i64Mask;
int iWidth;
int iHeight;
int iRemainder;
bool gOddWidth;
int iRet = 0;
int i;
//
// Enforce the lower limit for the alpha value.
//
if ( iAlpha < 0 )
iAlpha = 0;
//
// Enforce the upper limit for the alpha value.
//
if ( iAlpha > 256 )
iAlpha = 256;
//
// Determine the dimensions of the source surface.
//
if ( lprcSource )
{
//
// Get the width and height from the passed rectangle.
//
iWidth = lprcSource->right - lprcSource->left;
iHeight = lprcSource->bottom - lprcSource->top;
}
else
{
//
// Get the width and height from the surface description.
//
memset( &ddsdSource, 0, sizeof ddsdSource );
ddsdSource.dwSize = sizeof ddsdSource;
ddsdSource.dwFlags = DDSD_WIDTH | DDSD_HEIGHT;
lpDDSSource->GetSurfaceDesc( &ddsdSource );
//
// Remember the dimensions.
//
iWidth = ddsdSource.dwWidth;
iHeight = ddsdSource.dwHeight;
}
//
// Calculate the rectangle to be locked in the target.
//
rcDest.left = iDestX;
rcDest.top = iDestY;
rcDest.right = iDestX + iWidth;
rcDest.bottom = iDestY + iHeight;
//
// Lock down the destination surface.
//
memset( &ddsdTarget, 0, sizeof ddsdTarget );
ddsdTarget.dwSize = sizeof ddsdTarget;
lpDDSDest->Lock( &rcDest, &ddsdTarget, DDLOCK_WAIT, NULL );
//
// Lock down the source surface.
//
memset( &ddsdSource, 0, sizeof ddsdSource );
ddsdSource.dwSize = sizeof ddsdSource;
lpDDSSource->Lock( lprcSource, &ddsdSource, DDLOCK_WAIT, NULL );
switch ( dwMode )
{
/* 16 bit mode ( 555 ). This algorithm
can process four pixels at once. */
case RGBMODE_555:
//
// Determine the padding bytes for the target and the source.
//
dwTargetPad = ddsdTarget.lPitch - ( iWidth * 2 );
dwSourcePad = ddsdSource.lPitch - ( iWidth * 2 );
//
// We process four pixels at once, so the
// width must be a multiple of four.
//
iRemainder = ( iWidth & 0x03 );
iWidth = ( iWidth & ~0x03 ) / 4;
//
// Set the bit masks for red, green and blue.
//
i64MaskRed = 0x7c007c007c007c00;
i64MaskGreen =0x03e003e003e003e0;
i64MaskBlue = 0x001f001f001f001f;
//
// Compose the quadruple alpha value.
//
i64Alpha = iAlpha;
i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 ) | ( i64Alpha << 48 );
// Get the address of the target.
lpbTarget = ( BYTE* ) ddsdTarget.lpSurface;
// Get the address of the source.
lpbSource = ( BYTE* ) ddsdSource.lpSurface;
do
{
// Reset the width.
i = iWidth;
//
// Alpha-blend four pixels at once.
//
__asm
{
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip555
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend555:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black555
cmp dword ptr [esi + 4], 0
jnz not_black555
jmp next555
not_black555:
//
// Alpha blend four target and source pixels.
//
/* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. */
movq mm6, [edi] // Load the original target pixel.
nop
movq mm7, [esi] // Load the original source pixel.
movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
psrlw mm0, 10 // Shift down the red target channel.
movq mm2, mm6 // Load the register for the green target.
psrlw mm1, 10 // Shift down the red source channel.
movq mm3, mm7 // Load the register for the green source.
psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
nop
pand mm2, i64MaskGreen // Extract the green target channel.
nop
pand mm3, i64MaskGreen // Extract the green source channel.
psraw mm1, 8 // Divide the red result by 256.
psrlw mm2, 5 // Shift down the green target channel.
paddw mm1, mm0 // Add the red target to the red result.
psllw mm1, 10 // Shift up the red source again.
movq mm4, mm6 // Load the register for the blue target.
psrlw mm3, 5 // Shift down the green source channel.
movq mm5, mm7 // Load the register for the blue source.
pand mm4, i64MaskBlue // Extract the blue target channel.
psubw mm3, mm2 // Calculate green source minus green target.
pand mm5, i64MaskBlue // Extract the blue source channel.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
psubw mm5, mm4 // Calculate blue source minus blue target.
pxor mm0, mm0 // Create black as the color key.
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
psraw mm3, 8 // Divide the green result by 256.
paddw mm3, mm2 // Add the green target to the green result.
pcmpeqw mm0, mm7 // Create a color key mask.
psraw mm5, 8 // Divide the blue result by 256.
psllw mm3, 5 // Shift up the green source again.
paddw mm5, mm4 // Add the blue target to the blue result.
por mm1, mm3 // Combine the new red and green values.
pand mm6, mm0 // Keep old target where the color key applies.
por mm1, mm5 // Combine new blue value with the others.
pandn mm0, mm1 // Keep new target where no color key applies.
por mm6, mm0 // Assemble new target value.
movq [edi], mm6 // Write back new target value.
next555:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend555
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
emms
skip555:
pop ecx
}
//
// Alpha blend any remaining pixels.
//
for ( i = 0; i < iRemainder; i++ )
{
// Read in one source pixel.
dwSourceTemp = *( ( WORD* ) lpbSource );
// If this is not the color key ...
if ( dwSourceTemp != 0 )
{
//
// ... apply the alpha blend to it.
//
// Read in one target pixel.
dwTargetTemp = *( ( WORD* ) lpbTarget );
// Extract the red channels.
dwTgtRed = ( dwTargetTemp >> 10 ) & 0x1f;
dwSrcRed = ( dwSourceTemp >> 10 ) & 0x1f;
// Extract the green channels.
dwTgtGreen = ( dwTargetTemp >> 5 ) & 0x1f;
dwSrcGreen = ( dwSourceTemp >> 5 ) & 0x1f;
// Extract the blue channels.
dwTgtBlue = dwTargetTemp & 0x1f;
dwSrcBlue = dwSourceTemp & 0x1f;
// Write the destination pixel.
*( ( WORD* ) lpbTarget ) = ( WORD )
( ( ( iAlpha * ( dwSrcRed - dwTgtRed ) >> 8 ) + dwTgtRed ) << 10 |
( ( iAlpha * ( dwSrcGreen - dwTgtGreen ) >> 8 ) + dwTgtGreen ) << 5 |
( ( iAlpha * ( dwSrcBlue - dwTgtBlue ) >> 8 ) + dwTgtBlue ) );
}
//
// Proceed to next pixel.
//
lpbTarget += 2;
lpbSource += 2;
}
//
// Proceed to the next line.
//
lpbTarget += dwTargetPad;
lpbSource += dwSourcePad;
}
while ( --iHeight > 0 );
break;
/* 16 bit mode ( 565 ). This algorithm
can process four pixels at once. */
case RGBMODE_565:
//
// Determine the padding bytes for the target and the source.
//
dwTargetPad = ddsdTarget.lPitch - ( iWidth * 2 );
dwSourcePad = ddsdSource.lPitch - ( iWidth * 2 );
//
// We process four pixels at once, so the
// width must be a multiple of four.
//
iRemainder = ( iWidth & 0x03 );
iWidth = ( iWidth & ~0x03 ) / 4;
//
// Set the bit masks for red, green and blue.
//
i64MaskRed = 0xf800f800f800f800;
i64MaskGreen =0x07e007e007e007e0;
i64MaskBlue = 0x001f001f001f001f;
//
// Compose the quadruple alpha value.
//
i64Alpha = iAlpha;
i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 ) | ( i64Alpha << 48 );
// Get the address of the target.
lpbTarget = ( BYTE* ) ddsdTarget.lpSurface;
// Get the address of the source.
lpbSource = ( BYTE* ) ddsdSource.lpSurface;
do
{
// Reset the width.
i = iWidth;
//
// Alpha-blend four pixels at once.
//
__asm
{
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip565
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend565:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black565
cmp dword ptr [esi + 4], 0
jnz not_black565
jmp next565
not_black565:
//
// Alpha blend four target and source pixels.
//
/* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. */
movq mm6, [edi] // Load the original target pixel.
nop
movq mm7, [esi] // Load the original source pixel.
movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
psrlw mm0, 11 // Shift down the red target channel.
movq mm2, mm6 // Load the register for the green target.
psrlw mm1, 11 // Shift down the red source channel.
movq mm3, mm7 // Load the register for the green source.
psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
nop
pand mm2, i64MaskGreen // Extract the green target channel.
nop
pand mm3, i64MaskGreen // Extract the green source channel.
psraw mm1, 8 // Divide the red result by 256.
psrlw mm2, 5 // Shift down the green target channel.
paddw mm1, mm0 // Add the red target to the red result.
psllw mm1, 11 // Shift up the red source again.
movq mm4, mm6 // Load the register for the blue target.
psrlw mm3, 5 // Shift down the green source channel.
movq mm5, mm7 // Load the register for the blue source.
pand mm4, i64MaskBlue // Extract the blue target channel.
psubw mm3, mm2 // Calculate green source minus green target.
pand mm5, i64MaskBlue // Extract the blue source channel.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
psubw mm5, mm4 // Calculate blue source minus blue target.
pxor mm0, mm0 // Create black as the color key.
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
psraw mm3, 8 // Divide the green result by 256.
paddw mm3, mm2 // Add the green target to the green result.
pcmpeqw mm0, mm7 // Create a color key mask.
psraw mm5, 8 // Divide the blue result by 256.
psllw mm3, 5 // Shift up the green source again.
paddw mm5, mm4 // Add the blue target to the blue result.
por mm1, mm3 // Combine the new red and green values.
pand mm6, mm0 // Keep old target where the color key applies.
por mm1, mm5 // Combine new blue value with the others.
pandn mm0, mm1 // Keep new target where no color key applies.
por mm6, mm0 // Assemble new target value.
movq [edi], mm6 // Write back new target value.
next565:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend565
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
emms
skip565:
pop ecx
}
//
// Alpha blend any remaining pixels.
//
for ( i = 0; i < iRemainder; i++ )
{
// Read in one source pixel.
dwSourceTemp = *( ( WORD* ) lpbSource );
// If this is not the color key ...
if ( dwSourceTemp != 0 )
{
//
// ... apply the alpha blend to it.
//
// Read in one target pixel.
dwTargetTemp = *( ( WORD* ) lpbTarget );
// Extract the red channels.
dwTgtRed = ( dwTargetTemp >> 11 ) & 0x1f;
dwSrcRed = ( dwSourceTemp >> 11 ) & 0x1f;
// Extract the green channels.
dwTgtGreen = ( dwTargetTemp >> 5 ) & 0x3f;
dwSrcGreen = ( dwSourceTemp >> 5 ) & 0x3f;
// Extract the blue channels.
dwTgtBlue = dwTargetTemp & 0x1f;
dwSrcBlue = dwSourceTemp & 0x1f;
// Write the destination pixel.
*( ( WORD* ) lpbTarget ) = ( WORD )
( ( ( iAlpha * ( dwSrcRed - dwTgtRed ) >> 8 ) + dwTgtRed ) << 11 |
( ( iAlpha * ( dwSrcGreen - dwTgtGreen ) >> 8 ) + dwTgtGreen ) << 5 |
( ( iAlpha * ( dwSrcBlue - dwTgtBlue ) >> 8 ) + dwTgtBlue ) );
}
//
// Proceed to next pixel.
//
lpbTarget += 2;
lpbSource += 2;
}
//
// Proceed to the next line.
//
lpbTarget += dwTargetPad;
lpbSource += dwSourcePad;
}
while ( --iHeight > 0 );
break;
/* 16 bit mode ( unknown ). This algorithm
can process four pixels at once. */
case RGBMODE_16:
//
// Determine the padding bytes for the target and the source.
//
dwTargetPad = ddsdTarget.lPitch - ( iWidth * 2 );
dwSourcePad = ddsdSource.lPitch - ( iWidth * 2 );
//
// We process four pixels at once, so the
// width must be a multiple of four.
//
iRemainder = ( iWidth & 0x03 );
iWidth = ( iWidth & ~0x03 ) / 4;
//
// Determine the distance in bits of each bit mask from the right.
//
while ( ( ( ddsdTarget.ddpfPixelFormat.dwRBitMask >> i64RdShift ) & 0x01 ) == 0 )
i64RdShift++;
while ( ( ( ddsdTarget.ddpfPixelFormat.dwGBitMask >> i64GrShift ) & 0x01 ) == 0 )
i64GrShift++;
while ( ( ( ddsdTarget.ddpfPixelFormat.dwBBitMask >> i64BlShift ) & 0x01 ) == 0 )
i64BlShift++;
//
// Compose the bit masks for each color channel.
//
i64MaskRed = ddsdTarget.ddpfPixelFormat.dwRBitMask;
i64MaskRed |= ( i64MaskRed << 16 ) | ( i64MaskRed << 32 ) | ( i64MaskRed << 48 );
i64MaskGreen = ddsdTarget.ddpfPixelFormat.dwGBitMask;
i64MaskGreen |= ( i64MaskGreen << 16 ) | ( i64MaskGreen << 32 ) | ( i64MaskGreen << 48 );
i64MaskBlue = ddsdTarget.ddpfPixelFormat.dwBBitMask;
i64MaskBlue |= ( i64MaskBlue << 16 ) | ( i64MaskBlue << 32 ) | ( i64MaskBlue << 48 );
//
// Compose the quadruple alpha value.
//
i64Alpha = iAlpha;
i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 ) | ( i64Alpha << 48 );
// Get the address of the target.
lpbTarget = ( BYTE* ) ddsdTarget.lpSurface;
// Get the address of the source.
lpbSource = ( BYTE* ) ddsdSource.lpSurface;
do
{
// Reset the width.
i = iWidth;
//
// Alpha-blend four pixels at once.
//
__asm
{
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip16
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend16:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black16
cmp dword ptr [esi + 4], 0
jnz not_black16
jmp next16
not_black16:
//
// Alpha blend four target and source pixels.
//
/* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. */
movq mm6, [edi] // Load the original target pixel.
nop
movq mm7, [esi] // Load the original source pixel.
movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
nop
psrlw mm0, i64RdShift // Shift down the red target channel.
movq mm2, mm6 // Load the register for the green target.
psrlw mm1, i64RdShift // Shift down the red source channel.
movq mm3, mm7 // Load the register for the green source.
pand mm2, i64MaskGreen // Extract the green target channel.
psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
movq mm5, mm7 // Load the register for the blue source.
pand mm3, i64MaskGreen // Extract the green source channel.
movq mm4, mm6 // Load the register for the blue target.
psraw mm1, 8 // Divide the red result by 256.
nop
paddw mm1, mm0 // Add the red target to the red result.
nop
psllw mm1, i64RdShift // Shift up the red source again.
pxor mm0, mm0 // Create black as the color key.
psrlw mm2, i64GrShift // Shift down the green target channel.
pcmpeqw mm0, mm7 // Create a color key mask.
psrlw mm3, i64GrShift // Shift down the green source channel.
pand mm6, mm0 // Keep old target where the color key applies.
pand mm4, i64MaskBlue // Extract the blue target channel.
psubw mm3, mm2 // Calculate green source minus green target.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
nop
psrlw mm4, i64BlShift // Shift down the blue target channel.
nop
pand mm5, i64MaskBlue // Extract the blue source channel.
psraw mm3, 8 // Divide the green result by 256.
paddw mm3, mm2 // Add the green target to the green result.
nop
psllw mm3, i64GrShift // Shift up the green source again.
por mm1, mm3 // Combine the new red and green values.
psrlw mm5, i64BlShift // Shift down the blue source channel.
nop
psubw mm5, mm4 // Calculate blue source minus blue target.
nop
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
nop
nop
nop
psraw mm5, 8 // Divide the blue result by 256.
nop
paddw mm5, mm4 // Add the blue target to the blue result.
nop
psllw mm5, i64BlShift // Shift up the blue source again.
nop
por mm1, mm5 // Combine new blue value with the others.
nop
pandn mm0, mm1 // Keep new target where no color key applies.
nop
por mm6, mm0 // Assemble new target value.
nop
movq [edi], mm6 // Write back new target value.
next16:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend16
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
emms
skip16:
pop ecx
}
//
// Alpha blend any remaining pixels.
//
for ( i = 0; i < iRemainder; i++ )
{
// Read in the next source pixel.
dwSourceTemp = *( ( WORD* ) lpbSource );
// If the source pixel is not black ...
if ( dwSourceTemp != 0 )
{
// ... read in the next target pixel.
dwTargetTemp = *( ( WORD* ) lpbTarget );
// Extract the red channels.
dwTgtRed = dwTargetTemp & ddsdTarget.ddpfPixelFormat.dwRBitMask;
dwSrcRed = dwSourceTemp & ddsdSource.ddpfPixelFormat.dwRBitMask;
// Extract the green channels.
dwTgtGreen = dwTargetTemp & ddsdTarget.ddpfPixelFormat.dwGBitMask;
dwSrcGreen = dwSourceTemp & ddsdSource.ddpfPixelFormat.dwGBitMask;
// Extract the blue channel.
dwTgtBlue = dwTargetTemp & ddsdTarget.ddpfPixelFormat.dwBBitMask;
dwSrcBlue = dwSourceTemp & ddsdSource.ddpfPixelFormat.dwBBitMask;
// Calculate the alpha-blended red channel.
dwRed = ( ( ( iAlpha * ( dwSrcRed - dwTgtRed ) ) >> 8 ) +
dwTgtRed ) & ddsdTarget.ddpfPixelFormat.dwRBitMask;
// Calculate the alpha-blended green channel.
dwGreen = ( ( ( iAlpha * ( dwSrcGreen - dwTgtGreen ) ) >> 8 ) +
dwTgtGreen ) & ddsdTarget.ddpfPixelFormat.dwGBitMask;
// Calculate the alpha-blended blue channel.
dwBlue = ( ( ( iAlpha * ( dwSrcBlue - dwTgtBlue ) ) >> 8 ) +
dwTgtBlue ) & ddsdTarget.ddpfPixelFormat.dwBBitMask;
// Write the destination pixel.
*( ( WORD* ) lpbTarget ) = ( WORD ) ( dwRed | dwGreen | dwBlue );
}
//
// Proceed to next pixel.
//
lpbTarget += 2;
lpbSource += 2;
}
//
// Proceed to the next line.
//
lpbTarget += dwTargetPad;
lpbSource += dwSourcePad;
}
while ( --iHeight > 0 );
break;
/* 24 bit mode. */
case RGBMODE_24:
//
// Determine the padding bytes for the target and the source.
//
dwTargetPad = ddsdTarget.lPitch - ( iWidth * 3 );
dwSourcePad = ddsdSource.lPitch - ( iWidth * 3 );
//
// Compose the triple alpha value.
//
i64Alpha = iAlpha;
i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 );
// Create a general purpose mask.
i64Mask = 0x0000000000ffffff;
// Get the address of the target.
lpbTarget = ( BYTE* ) ddsdTarget.lpSurface;
// Get the address of the source.
lpbSource = ( BYTE* ) ddsdSource.lpSurface;
do
{
//
// Alpha blend the pixels in the current row.
//
__asm
{
// Reset the width counter.
push ecx
mov ecx, iWidth
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
// Load the mask into an mmx register.
movq mm3, i64Mask
// Load the alpha value into an mmx register.
movq mm5, i64Alpha
// Clear an mmx register to facilitate unpacking.
pxor mm6, mm6
push eax
do_blend24:
//
// Skip this pixel if it is black.
//
mov eax, [esi]
test eax, 00ffffffh // Do not 'and' so that the high order byte is kept.
jnz not_black24
jmp next24
not_black24:
//
// Get a target and a source pixel.
//
/* The mmx registers will basically be used in the following way:
mm0: target value
mm1: source value
mm2: working register
mm3: mask ( 0x00ffffff )
mm4: working register
mm5: alpha value
mm6: zero for unpacking
mm7: original target
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. */
movd mm0, [edi] // Load the target pixel.
movq mm4, mm3 // Reload the mask ( 0x00ffffff ).
movd mm1, eax // Load the source pixel.
movq mm7, mm0 // Save the target pixel.
punpcklbw mm0, mm6 // Unpack the target pixel.
punpcklbw mm1, mm6 // Unpack the source pixel.
movq mm2, mm0 // Save the unpacked target values.
nop
pmullw mm0, mm5 // Multiply the target with the alpha value.
nop
pmullw mm1, mm5 // Multiply the source with the alpha value.
nop
psrlw mm0, 8 // Divide the target by 256.
nop
psrlw mm1, 8 // Divide the source by 256.
nop
psubw mm1, mm0 // Calculate the source minus target.
nop
paddw mm2, mm1 // Add former target value to the result.
nop
packuswb mm2, mm2 // Pack the new target.
nop
pand mm2, mm4 // Mask of unwanted bytes.
nop
pandn mm4, mm7 // Get the high order byte we must keep.
nop
por mm2, mm4 // Assemble the value to write back.
nop
movd [edi], mm2 // Write back the new value.
next24:
//
// Advance to the next pixel.
//
add edi, 3
add esi, 3
//
// Loop again or break.
//
dec ecx
jnz do_blend24
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop eax
pop esi
pop edi
emms
pop ecx
}
//
// Proceed to the next line.
//
lpbTarget += dwTargetPad;
lpbSource += dwSourcePad;
}
while ( --iHeight > 0 );
break;
/* 32 bit mode. This algorithm can
process two pixels at once. */
case RGBMODE_32:
//
// Determine the padding bytes for the target and the source.
//
dwTargetPad = ddsdTarget.lPitch - ( iWidth * 4 );
dwSourcePad = ddsdSource.lPitch - ( iWidth * 4 );
// If the width is odd ...
if ( iWidth & 0x01 )
{
// ... set the flag ...
gOddWidth = true;
// ... and calculate the width.
iWidth = ( iWidth - 1 ) / 2;
}
// If the width is even ...
else
{
// ... clear the flag ...
gOddWidth = false;
// ... and calculate the width.
iWidth /= 2;
}
//
// Compose the triple alpha value.
//
i64Alpha = iAlpha;
i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 );
// Get the address of the target.
lpbTarget = ( BYTE* ) ddsdTarget.lpSurface;
// Get the address of the source.
lpbSource = ( BYTE* ) ddsdSource.lpSurface;
do
{
// Reset the width.
i = iWidth;
//
// Alpha blend two pixels at once.
//
__asm
{
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip32
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
// Load the alpha value into an mmx register.
movq mm5, i64Alpha
push eax
do_blend32:
//
// Skip these two pixels if they are both black.
//
mov eax, [esi]
test eax, 00ffffffh
jnz not_black32
mov eax, [esi + 4]
test eax, 00ffffffh
jnz not_black32
jmp next32
not_black32:
//
// Alpha blend two target and two source pixels.
//
/* The mmx registers will basically be used in the following way:
mm0: target pixel one
mm1: source pixel one
mm2: target pixel two
mm3: source pixel two
mm4: working register
mm5: alpha value
mm6: original target
mm7: original source
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. */
movq mm6, [edi] // Load the target pixels.
pxor mm4, mm4 // Clear mm4 so we can unpack easily.
movq mm7, [esi] // Load the source pixels.
movq mm0, mm6 // Create copy one of the target.
movq mm2, mm6 // Create copy two of the target.
punpcklbw mm0, mm4 // Unpack the first target copy.
movq mm1, mm7 // Create copy one of the source.
psrlq mm2, 32 // Move the high order dword of target two.
punpcklbw mm2, mm4 // Unpack the second target copy.
movq mm3, mm7 // Create copy two of the source.
psrlq mm3, 32 // Move the high order dword of source two.
punpcklbw mm1, mm4 // Unpack the first source copy.
punpcklbw mm3, mm4 // Unpack the second source copy.
pslld mm7, 8 // Shift away original source highest bytes.
movq mm4, mm0 // Save target one.
pmullw mm0, mm5 // Multiply target one with alpha.
pmullw mm1, mm5 // Multiply source one with alpha.
psrld mm7, 8 // Complete high order byte clearance.
psrlw mm1, 8 // Divide source one by 256.
nop
psrlw mm0, 8 // Divide target one by 256.
nop
psubw mm1, mm0 // Calculate source one minus target one.
nop
paddw mm1, mm4 // Add the former target one to the result.
nop
movq mm4, mm2 // Save target two.
pmullw mm2, mm5 // Multiply target two with alpha.
pmullw mm3, mm5 // Multiply source two with alpha.
nop
psrlw mm2, 8 // Divide target two by 256.
nop
psrlw mm3, 8 // Divide source two by 256.
nop
psubw mm3, mm2 // Calculate source two minus target two.
nop
paddw mm3, mm4 // Add the former target two to the result.
pxor mm4, mm4 // Clear mm4 so we can pack easily.
packuswb mm1, mm4 // Pack the new target one.
packuswb mm3, mm4 // Pack the new target two.
psllq mm3, 32 // Shift up the new target two.
pcmpeqd mm4, mm7 // Create a color key mask.
por mm1, mm3 // Combine the new targets.
pand mm6, mm4 // Keep old target where color key applies.
pandn mm4, mm1 // Clear new target where color key applies.
nop
por mm6, mm4 // Assemble the new target value.
nop
movq [edi], mm6 // Write back the new target value.
next32:
//
// Advance to the next two pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend32
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop eax
pop esi
pop edi
emms
skip32:
pop ecx
}
//
// Handle an odd width.
//
if ( gOddWidth )
{
// Read in the next source pixel.
dwSourceTemp = *( ( DWORD* ) lpbSource );
// If the source pixel is not black ...
if ( ( dwSourceTemp & 0xffffff ) != 0 )
{
// ... read in the next target pixel.
dwTargetTemp = *( ( DWORD* ) lpbTarget );
// Extract the red channels.
dwTgtRed = dwTargetTemp & 0xff0000;
dwSrcRed = dwSourceTemp & 0xff0000;
// Extract the green channels.
dwTgtGreen = dwTargetTemp & 0xff00;
dwSrcGreen = dwSourceTemp & 0xff00;
// Extract the blue channel.
dwTgtBlue = dwTargetTemp & 0xff;
dwSrcBlue = dwSourceTemp & 0xff;
// Calculate the destination pixel. Every channel is masked so that a
// wrapped unsigned difference cannot spill into the high order byte.
dwTargetTemp =
( ( ( ( iAlpha * ( dwSrcRed - dwTgtRed ) >> 8 ) + dwTgtRed ) & 0xff0000 ) |
( ( ( iAlpha * ( dwSrcGreen - dwTgtGreen ) >> 8 ) + dwTgtGreen ) & 0xff00 ) |
( ( ( iAlpha * ( dwSrcBlue - dwTgtBlue ) >> 8 ) + dwTgtBlue ) & 0xff ) );
// Write the destination pixel.
*( ( DWORD* ) lpbTarget ) = dwTargetTemp;
}
//
// Proceed to the next pixel.
//
lpbTarget += 4;
lpbSource += 4;
}
//
// Proceed to the next line.
//
lpbTarget += dwTargetPad;
lpbSource += dwSourcePad;
}
while ( --iHeight > 0 );
break;
/* Invalid mode. */
default:
iRet = -1;
}
// Unlock the target surface.
lpDDSDest->Unlock( &rcDest );
// Unlock the source surface.
lpDDSSource->Unlock( lprcSource );
// Return the result.
return iRet;
}
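The color-key step in the MMX loops above (pxor / pcmpeqw to build a mask, then pand / pandn / por to merge) can be hard to follow if you have not used MMX before. Here is a minimal plain-C sketch of the same idea on four 16-bit lanes packed into one 64-bit value. The helper names splat16, key_mask and select_keyed are made up for illustration and are not part of the original routine.

```c
#include <stdint.h>

/* Replicate a 16-bit value into all four lanes, the way the routine
   builds i64Alpha from iAlpha. */
uint64_t splat16(uint16_t v)
{
    uint64_t x = v;
    return x | (x << 16) | (x << 32) | (x << 48);
}

/* Emulates "pxor mm0, mm0 / pcmpeqw mm0, mm7": every lane of the result
   is 0xFFFF where the source lane is black (0), and 0 elsewhere. */
uint64_t key_mask(uint64_t src)
{
    uint64_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint64_t lane_bits = (uint64_t)0xFFFF << (16 * lane);
        if ((src & lane_bits) == 0)
            mask |= lane_bits;
    }
    return mask;
}

/* Emulates "pand mm6, mm0 / pandn mm0, mm1 / por mm6, mm0": keep the old
   target where the color key applies, take the blended value elsewhere. */
uint64_t select_keyed(uint64_t target, uint64_t blended, uint64_t src)
{
    uint64_t mask = key_mask(src);
    return (target & mask) | (blended & ~mask);
}
```

The point of the mask trick is that the loop needs no per-pixel branch: all four pixels are blended unconditionally and the mask decides afterwards which results are kept.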
PS. If anyone wants the non-MMX version of this routine, please let me know and I will send it to you. It works well with IDirectDraw7 surfaces.
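For anyone who just wants known-good numbers to check a port against, here is a minimal non-MMX sketch of the per-pixel blend for the 16-bit modes. It is not the non-MMX version mentioned above, and the names mask_shift, blend_channel and blend_pixel16 are hypothetical. Per channel it computes (alpha * src + (256 - alpha) * tgt) >> 8, which gives the same result as tgt + ((alpha * (src - tgt)) >> 8) with an arithmetic shift, but never has to shift a negative number.

```c
#include <stdint.h>

/* Distance in bits of a channel mask from bit 0 (mask must be non-zero). */
static unsigned mask_shift(uint32_t mask)
{
    unsigned shift = 0;
    while (((mask >> shift) & 1u) == 0)
        shift++;
    return shift;
}

/* Blend one channel; alpha must already be in the range 0..256. */
static uint32_t blend_channel(uint32_t src, uint32_t tgt, uint32_t mask, uint32_t alpha)
{
    unsigned shift = mask_shift(mask);
    uint32_t s = (src & mask) >> shift;
    uint32_t t = (tgt & mask) >> shift;
    uint32_t r = (alpha * s + (256u - alpha) * t) >> 8;
    return (r << shift) & mask;
}

/* Blend one 16-bit source pixel over one target pixel.
   Black (0) is treated as the color key, as in the routine above. */
uint16_t blend_pixel16(uint16_t src, uint16_t tgt, int alpha,
                       uint32_t rmask, uint32_t gmask, uint32_t bmask)
{
    uint32_t a;

    if (src == 0)
        return tgt;                     /* color key: leave target as is */

    if (alpha < 0)   alpha = 0;         /* same clamping as the routine */
    if (alpha > 256) alpha = 256;
    a = (uint32_t)alpha;

    return (uint16_t)(blend_channel(src, tgt, rmask, a) |
                      blend_channel(src, tgt, gmask, a) |
                      blend_channel(src, tgt, bmask, a));
}
```

For 565 pass the masks 0xF800, 0x07E0, 0x001F; for 555 pass 0x7C00, 0x03E0, 0x001F. Running a handful of pixels through this and through the ported MMX loop is a cheap way to spot a wrong shift or mask.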
http://www.DelphiGamer.com := for all your Object Pascal game development needs;
Edited by - savage on January 16, 2002 12:15:16 PM
If I remember right, the in-line assembler doesn't like MMX instructions (prior to Delphi 6, anyway). Try using this expert, which converts MMX instructions to their op-code equivalent. Maybe it'll fix the errors:
Hori's MMX Expert
Alimonster
There are no stupid questions, but there are a lot of inquisitive idiots.
Hi AliMonster,
Thanks for the suggestion. I tried it, but unfortunately it did not do anything to the code I highlighted when I pressed Ctrl+A.
Any other suggestions?
Thanks,
Dominique.
http://www.DelphiGamer.com := for all your Object Pascal game development needs;
First of all, Delphi 6 should be able to compile MMX instructions. For earlier versions you have to convert the MMX routines into machine code.
Finally, as an alternative, you could compile the routine into a DLL with your C/C++ compiler and call it from Delphi.
Hope this helps
- Lifepower
Hi LifePower,
As I am using Delphi 4 and 5, it looks like I will have to go for the "convert MMX routines into machine code" option, as I do not want to add another DLL into the mix.
Thanks for your suggestions. It looks like I will have to trawl through the Intel site this weekend.
Thanks
Dominique.
http://www.DelphiGamer.com := for all your Object Pascal game development needs;
Greetz... anyway, just wanted to add that Hori's MMX Expert actually converts MMX instructions into the corresponding machine code. For the conversion you'd have to either use the utility mentioned above or look for another converter... otherwise it's not very efficient (in terms of time) to convert your program into machine code by hand.
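To make the "convert MMX instructions into machine code" idea a bit more concrete, here is a small sketch of how the bytes for one instruction, movq, are put together. The function names are hypothetical and this is not how Hori's expert is implemented. It only relies on the standard encoding of the load form of movq (0F 6F followed by a ModRM byte) and only handles register-to-register and plain [reg] addressing without a displacement.

```c
#include <stdio.h>

/* General purpose register numbers as they appear in the ModRM byte. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3,
       REG_ESI = 6, REG_EDI = 7 };

/* ModRM byte for "movq mmDst, mmSrc" (mod = 11). Returns -1 on bad input. */
int modrm_mm_mm(unsigned dst, unsigned src)
{
    if (dst > 7 || src > 7)
        return -1;
    return 0xC0 | (int)(dst << 3) | (int)src;
}

/* ModRM byte for "movq mmDst, [base]" (mod = 00, no displacement).
   Base values 4 (ESP) and 5 (EBP) need a SIB byte or a displacement
   and are deliberately not handled in this sketch. */
int modrm_mm_mem(unsigned dst, unsigned base)
{
    if (dst > 7 || base > 7 || base == 4 || base == 5)
        return -1;
    return (int)(dst << 3) | (int)base;
}

/* Write a Delphi-style "db" line for the load form of movq (0F 6F /r).
   Returns the number of characters written, or -1 on bad input. */
int format_movq_db(char *buf, size_t size, int modrm)
{
    if (modrm < 0)
        return -1;
    return snprintf(buf, size, "db $0F,$6F,$%02X", (unsigned)modrm);
}
```

For example, modrm_mm_mem(6, REG_EDI) gives 0x37, so movq mm6, [edi] becomes db $0F,$6F,$37. The store form, movq [edi], mm6, uses opcode 0F 7F with the same ModRM byte.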
Hi LifePower,
How does Hori's Expert work? Because when I highlight the MMX code and press Ctrl+A it does not do anything for me in Delphi 4.
Thanks,
Dominique.
http://www.DelphiGamer.com := for all your Object Pascal game development needs;
Yep, I got Hori's expert to work just fine (Delphi 4, Standard edition). I started up Delphi 4, went to the "Install Packages" menu and added the .bpl for Delphi 4. The menu worked fine. That's not much use to you, but this will be... here's the translation for the asm blocks (I stripped out the C, so you'll have to use the comments to guide you. It's in the same order as your post).
---------START OF CODE
//
// Alpha-blend four pixels at once.
//
asm
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip555
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend555:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black555
cmp dword ptr [esi + 4], 0
jnz not_black555
jmp next555
not_black555:
//
// Alpha blend four target and source pixels.
//
(* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)
db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop
db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
db $0F,$71,$D0,$0A /// psrlw mm0, 10 // Shift down the red target channel.
db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.
db $0F,$71,$D1,$0A /// psrlw mm1, 10 // Shift down the red source channel.
db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
nop
pand mm2, i64MaskGreen // Extract the green target channel.
nop
pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.
db $0F,$71,$D2,$05 /// psrlw mm2, 5 // Shift down the green target channel.
db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.
db $0F,$71,$F1,$0A /// psllw mm1, 10 // Shift up the red source again.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.
db $0F,$71,$D3,$05 /// psrlw mm3, 5 // Shift down the green source channel.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.
pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.
pand mm5, i64MaskBlue // Extract the blue source channel.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
db $0F,$F9,$EC /// psubw mm5, mm4 // Calculate blue source minus blue target.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.
db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.
db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
db $0F,$71,$F3,$05 /// psllw mm3, 5 // Shift up the green source again.
db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.
db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.
db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.
db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.
db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.
next555:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend555
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms
skip555:
pop ecx
end;
//
// Alpha-blend four pixels at once.
//
asm
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip565
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend565:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black565
cmp dword ptr [esi + 4], 0
jnz not_black565
jmp next565
not_black565:
//
// Alpha blend four target and source pixels.
//
(* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)
db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop
db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
db $0F,$71,$D0,$0B /// psrlw mm0, 11 // Shift down the red target channel.
db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.
db $0F,$71,$D1,$0B /// psrlw mm1, 11 // Shift down the red source channel.
db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
nop
pand mm2, i64MaskGreen // Extract the green target channel.
nop
pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.
db $0F,$71,$D2,$05 /// psrlw mm2, 5 // Shift down the green target channel.
db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.
db $0F,$71,$F1,$0B /// psllw mm1, 11 // Shift up the red source again.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.
db $0F,$71,$D3,$05 /// psrlw mm3, 5 // Shift down the green source channel.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.
pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.
pand mm5, i64MaskBlue // Extract the blue source channel.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
db $0F,$F9,$EC /// psubw mm5, mm4 // Calculate blue source minus blue target.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.
db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.
db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
db $0F,$71,$F3,$05 /// psllw mm3, 5 // Shift up the green source again.
db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.
db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.
db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.
db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.
db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.
next565:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend565
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms
skip565:
pop ecx
end;
//
// Alpha-blend four pixels at once.
//
asm
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip16
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend16:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black16
cmp dword ptr [esi + 4], 0
jnz not_black16
jmp next16
not_black16:
//
// Alpha blend four target and source pixels.
//
(* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)
db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop
db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
nop
psrlw mm0, i64RdShift // Shift down the red target channel.
db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.
psrlw mm1, i64RdShift // Shift down the red source channel.
db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.
pand mm2, i64MaskGreen // Extract the green target channel.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.
pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.
db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.
nop
db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.
nop
psllw mm1, i64RdShift // Shift up the red source again.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.
psrlw mm2, i64GrShift // Shift down the green target channel.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.
psrlw mm3, i64GrShift // Shift down the green source channel.
db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.
pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
nop
psrlw mm4, i64BlShift // Shift down the blue target channel.
nop
pand mm5, i64MaskBlue // Extract the blue source channel.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.
db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
nop
psllw mm3, i64GrShift // Shift up the green source again.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.
psrlw mm5, i64BlShift // Shift down the blue source channel.
nop
db $0F,$F9,$EC /// psubw mm5, mm4 // Calculate blue source minus blue target.
nop
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
nop
nop
nop
db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
nop
db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
nop
psllw mm5, i64BlShift // Shift up the blue source again.
nop
db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.
nop
db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
nop
db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.
nop
db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.
next16:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend16
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms
skip16:
pop ecx
end;
//
// Alpha blend the pixels in the current row.
//
asm
// Reset the width counter.
push ecx
mov ecx, iWidth
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
// Load the mask into an mmx register.
movq mm3, i64Mask
// Load the alpha value into an mmx register.
movq mm5, i64Alpha
// Clear an mmx register to facilitate unpacking.
db $0F,$EF,$F6 /// pxor mm6, mm6
push eax
do_blend24:
//
// Skip this pixel if it is black.
//
mov eax, [esi]
test eax, 00ffffffh // Do not 'and' so that the high order byte is kept.
jnz not_black24
jmp next24
not_black24:
//
// Get a target and a source pixel.
//
(* The mmx registers will basically be used in the following way:
mm0: target value
mm1: source value
mm2: working register
mm3: mask ( 0x00ffffff )
mm4: working register
mm5: alpha value
mm6: zero for unpacking
mm7: original target
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)
db $0F,$6E,$07 /// movd mm0, [edi] // Load the target pixel.
db $0F,$6F,$E3 /// movq mm4, mm3 // Reload the mask ( 0x00ffffff ).
db $0F,$6E,$C8 /// movd mm1, eax // Load the source pixel.
db $0F,$6F,$F8 /// movq mm7, mm0 // Save the target pixel.
db $0F,$60,$C6 /// punpcklbw mm0, mm6 // Unpack the target pixel.
db $0F,$60,$CE /// punpcklbw mm1, mm6 // Unpack the source pixel.
db $0F,$6F,$D0 /// movq mm2, mm0 // Save the unpacked target values.
nop
db $0F,$D5,$C5 /// pmullw mm0, mm5 // Multiply the target with the alpha value.
nop
db $0F,$D5,$CD /// pmullw mm1, mm5 // Multiply the source with the alpha value.
nop
db $0F,$71,$D0,$08 /// psrlw mm0, 8 // Divide the target by 256.
nop
db $0F,$71,$D1,$08 /// psrlw mm1, 8 // Divide the source by 256.
nop
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate the source minus target.
nop
db $0F,$FD,$D1 /// paddw mm2, mm1 // Add former target value to the result.
nop
db $0F,$67,$D2 /// packuswb mm2, mm2 // Pack the new target.
nop
db $0F,$DB,$D4 /// pand mm2, mm4 // Mask off unwanted bytes.
nop
db $0F,$DF,$E7 /// pandn mm4, mm7 // Get the high order byte we must keep.
nop
db $0F,$EB,$D4 /// por mm2, mm4 // Assemble the value to write back.
nop
db $0F,$7E,$17 /// movd [edi], mm2 // Write back the new value.
next24:
//
// Advance to the next pixel.
//
add edi, 3
add esi, 3
//
// Loop again or break.
//
dec ecx
jnz do_blend24
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop eax
pop esi
pop edi
db $0F,$77 /// emms
pop ecx
end;
//
// Alpha blend two pixels at once.
//
asm
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip32
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
// Load the alpha value into an mmx register.
movq mm5, i64Alpha
push eax
do_blend32:
//
// Skip these two pixels if they are both black.
//
mov eax, [esi]
test eax, 00ffffffh
jnz not_black32
mov eax, [esi + 4]
test eax, 00ffffffh
jnz not_black32
jmp next32
not_black32:
//
// Alpha blend two target and two source pixels.
//
(* The mmx registers will basically be used in the following way:
mm0: target pixel one
mm1: source pixel one
mm2: target pixel two
mm3: source pixel two
mm4: working register
mm5: alpha value
mm6: original target
mm7: original source
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)
db $0F,$6F,$37 /// movq mm6, [edi] // Load the target pixels.
db $0F,$EF,$E4 /// pxor mm4, mm4 // Clear mm4 so we can unpack easily.
db $0F,$6F,$3E /// movq mm7, [esi] // Load the source pixels.
db $0F,$6F,$C6 /// movq mm0, mm6 // Create copy one of the target.
db $0F,$6F,$D6 /// movq mm2, mm6 // Create copy two of the target.
db $0F,$60,$C4 /// punpcklbw mm0, mm4 // Unpack the first target copy.
db $0F,$6F,$CF /// movq mm1, mm7 // Create copy one of the source.
db $0F,$73,$D2,$20 /// psrlq mm2, 32 // Move the high order dword of target two.
db $0F,$60,$D4 /// punpcklbw mm2, mm4 // Unpack the second target copy.
db $0F,$6F,$DF /// movq mm3, mm7 // Create copy two of the source.
db $0F,$73,$D3,$20 /// psrlq mm3, 32 // Move the high order dword of source two.
db $0F,$60,$CC /// punpcklbw mm1, mm4 // Unpack the first source copy.
db $0F,$60,$DC /// punpcklbw mm3, mm4 // Unpack the second source copy.
db $0F,$72,$F7,$08 /// pslld mm7, 8 // Shift away original source highest bytes.
db $0F,$6F,$E0 /// movq mm4, mm0 // Save target one.
db $0F,$D5,$C5 /// pmullw mm0, mm5 // Multiply target one with alpha.
db $0F,$D5,$CD /// pmullw mm1, mm5 // Multiply source one with alpha.
db $0F,$72,$D7,$08 /// psrld mm7, 8 // Complete high order byte clearance.
db $0F,$71,$D1,$08 /// psrlw mm1, 8 // Divide source one by 256.
nop
db $0F,$71,$D0,$08 /// psrlw mm0, 8 // Divide target one by 256.
nop
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate source one minus target one.
nop
db $0F,$FD,$CC /// paddw mm1, mm4 // Add the former target one to the result.
nop
db $0F,$6F,$E2 /// movq mm4, mm2 // Save target two.
db $0F,$D5,$D5 /// pmullw mm2, mm5 // Multiply target two with alpha.
db $0F,$D5,$DD /// pmullw mm3, mm5 // Multiply source two with alpha.
nop
db $0F,$71,$D2,$08 /// psrlw mm2, 8 // Divide target two by 256.
nop
db $0F,$71,$D3,$08 /// psrlw mm3, 8 // Divide source two by 256.
nop
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate source two minus target two.
nop
db $0F,$FD,$DC /// paddw mm3, mm4 // Add the former target two to the result.
db $0F,$EF,$E4 /// pxor mm4, mm4 // Clear mm4 so we can pack easily.
db $0F,$67,$CC /// packuswb mm1, mm4 // Pack the new target one.
db $0F,$67,$DC /// packuswb mm3, mm4 // Pack the new target two.
db $0F,$73,$F3,$20 /// psllq mm3, 32 // Shift up the new target two.
db $0F,$76,$E7 /// pcmpeqd mm4, mm7 // Create a color key mask.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new targets.
db $0F,$DB,$F4 /// pand mm6, mm4 // Keep old target where color key applies.
db $0F,$DF,$E1 /// pandn mm4, mm1 // Clear new target where color key applies.
nop
db $0F,$EB,$F4 /// por mm6, mm4 // Assemble the new target value.
nop
db $0F,$7F,$37 /// movq [edi], mm6 // Write back the new target value.
next32:
//
// Advance to the next two pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend32
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop eax
pop esi
pop edi
db $0F,$77 /// emms
skip32:
pop ecx
end;
---------END OF CODE
Word out,
Alimonster
There are no stupid questions, but there are a lot of inquisitive idiots.
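When checking a port like the one above, it can help to compare the MMX output against a plain scalar version of the same arithmetic. The following is a minimal sketch in C of what the 555 and 32-bit blocks appear to compute per pixel, going by their comments: for 555, each channel becomes target + ((source - target) * alpha >> 8) with an arithmetic shift; for 32-bit, each byte becomes target + (source * alpha >> 8) - (target * alpha >> 8). Black source pixels are treated as the color key and leave the target untouched. The function names are made up for this illustration, and the 555 masks (0x7C00, 0x03E0, 0x001F) are an assumption, since the values of i64MaskRed, i64MaskGreen and i64MaskBlue are not shown in the listing.

```c
#include <stdint.h>

/* Portable floor(v / 256); mirrors what psraw by 8 does on a signed word. */
static int asr8(int v)
{
    return v >= 0 ? v >> 8 : -((-v + 255) >> 8);
}

/* One channel of the 16-bit blend: tgt + ((src - tgt) * alpha >> 8). */
static unsigned blend_channel(unsigned tgt, unsigned src, int alpha)
{
    return (unsigned)((int)tgt + asr8(((int)src - (int)tgt) * alpha));
}

/* Scalar reference for one 555 pixel; alpha is in the range 0..256.
   Assumes 5 bits per channel at bit positions 10, 5 and 0. */
uint16_t blend555(uint16_t tgt, uint16_t src, int alpha)
{
    if (src == 0)
        return tgt;                       /* black is the color key */
    unsigned r = blend_channel((tgt >> 10) & 31u, (src >> 10) & 31u, alpha);
    unsigned g = blend_channel((tgt >> 5) & 31u, (src >> 5) & 31u, alpha);
    unsigned b = blend_channel(tgt & 31u, src & 31u, alpha);
    return (uint16_t)((r << 10) | (g << 5) | b);
}

/* Scalar reference for one 32-bit pixel; alpha is in the range 0..256.
   All four bytes are blended, as in the MMX block; the color key test
   ignores the high-order byte of the source. */
uint32_t blend32(uint32_t tgt, uint32_t src, int alpha)
{
    if ((src & 0x00FFFFFFu) == 0)
        return tgt;                       /* black is the color key */
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        unsigned t = (tgt >> shift) & 0xFFu;
        unsigned s = (src >> shift) & 0xFFu;
        unsigned v = t + ((s * (unsigned)alpha) >> 8)
                       - ((t * (unsigned)alpha) >> 8);
        out |= (uint32_t)(v & 0xFFu) << shift;
    }
    return out;
}
```

Running both versions over the same test surface and comparing the results pixel by pixel is one way to catch a mistyped opcode byte in the db lines.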
---------START OF CODE
//
// Alpha-blend four pixels at once.
//
asm
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip555
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend555:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black555
cmp dword ptr [esi + 4], 0
jnz not_black555
jmp next555
not_black555:
//
// Alpha blend four target and source pixels.
//
(* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor´s U- and V-pipes. *)
db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop
db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
db $0F,$71,$D0,$0A /// psrlw mm0, 10 // Shift down the red target channel.
db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.
db $0F,$71,$D1,$0A /// psrlw mm1, 10 // Shift down the red source channel.
db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
nop
pand mm2, i64MaskGreen // Extract the green target channel.
nop
pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.
db $0F,$71,$D2,$05 /// psrlw mm2, 5 // Shift down the green target channel.
db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.
db $0F,$71,$F1,$0A /// psllw mm1, 10 // Shift up the red source again.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.
db $0F,$71,$D3,$05 /// psrlw mm3, 5 // Shift down the green source channel.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.
pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.
pand mm5, i64MaskBlue // Extract the blue source channel.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
db $0F,$F9,$EC /// psubw mm5, mm4 // Calculate blue source minus blue target.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.
db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.
db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
db $0F,$71,$F3,$05 /// psllw mm3, 5 // Shift up the green source again.
db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.
db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.
db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.
db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.
db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.
next555:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend555
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms
skip555:
pop ecx
end;
//
// Alpha-blend four pixels at once.
//
asm
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip565
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend565:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black565
cmp dword ptr [esi + 4], 0
jnz not_black565
jmp next565
not_black565:
//
// Alpha blend four target and source pixels.
//
(* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor´s U- and V-pipes. *)
db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop
db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
db $0F,$71,$D0,$0B /// psrlw mm0, 11 // Shift down the red target channel.
db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.
db $0F,$71,$D1,$0B /// psrlw mm1, 11 // Shift down the red source channel.
db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
nop
pand mm2, i64MaskGreen // Extract the green target channel.
nop
pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.
db $0F,$71,$D2,$05 /// psrlw mm2, 5 // Shift down the green target channel.
db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.
db $0F,$71,$F1,$0B /// psllw mm1, 11 // Shift up the red source again.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.
db $0F,$71,$D3,$05 /// psrlw mm3, 5 // Shift down the green source channel.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.
pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.
pand mm5, i64MaskBlue // Extract the blue source channel.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
db $0F,$F9,$EC /// psubw mm5, mm4 // Calculate blue source minus blue target.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.
db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.
db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
db $0F,$71,$F3,$05 /// psllw mm3, 5 // Shift up the green source again.
db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.
db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.
db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.
db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.
db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.
next565:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend565
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms
skip565:
pop ecx
end;
//
// Alpha-blend four pixels at once.
//
asm
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip16
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
do_blend16:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black16
cmp dword ptr [esi + 4], 0
jnz not_black16
jmp next16
not_black16:
//
// Alpha blend four target and source pixels.
//
(* The mmx registers will basically be used in the following way:
mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel
Note: Two lines together are assumed to pair
in the processor´s U- and V-pipes. *)
db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop
db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.
pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.
pand mm1, i64MaskRed // Extract the red source channel.
nop
psrlw mm0, i64RdShift // Shift down the red target channel.
db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.
psrlw mm1, i64RdShift // Shift down the red source channel.
db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.
pand mm2, i64MaskGreen // Extract the green target channel.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.
pmullw mm1, i64Alpha // Multiply the red result with alpha.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.
pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.
db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.
nop
db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.
nop
psllw mm1, i64RdShift // Shift up the red source again.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.
psrlw mm2, i64GrShift // Shift down the green target channel.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.
psrlw mm3, i64GrShift // Shift down the green source channel.
db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.
pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.
pmullw mm3, i64Alpha // Multiply the green result with alpha.
nop
psrlw mm4, i64BlShift // Shift down the blue source channel.
nop
pand mm5, i64MaskBlue // Extract the blue source channel.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.
db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
nop
psllw mm3, i64GrShift // Shift up the green source again.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.
psrlw mm5, i64BlShift // Divide the blue result by 256.
nop
db $0F,$F9,$EC /// psubw mm5, mm4 // Add the blue target to the blue result.
nop
pmullw mm5, i64Alpha // Multiply the blue result with alpha.
nop
nop
nop
db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
nop
db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
nop
psllw mm5, i64BlShift // Shift up the blue source again.
nop
db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.
nop
db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
nop
db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.
nop
db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.
next16:
//
// Advance to the next two pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend16
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms
skip16:
pop ecx
end;
//
// Alpha blend the pixels in the current row.
//
asm
// Reset the width counter.
push ecx
mov ecx, iWidth
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
// Load the mask into an mmx register.
movq mm3, i64Mask
// Load the alpha value into an mmx register.
movq mm5, i64Alpha
// Clear an mmx register to facilitate unpacking.
db $0F,$EF,$F6 /// pxor mm6, mm6
push eax
do_blend24:
//
// Skip this pixel if it is black.
//
mov eax, [esi]
test eax, 00ffffffh // Do not 'and' so that the high order byte is kept.
jnz not_black24
jmp next24
not_black24:
//
// Get a target and a source pixel.
//
(* The mmx registers will basically be used in the following way:
mm0: target value
mm1: source value
mm2: working register
mm3: mask ( 0x00ffffff )
mm4: working register
mm5: alpha value
mm6: zero for unpacking
mm7: original target
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)
db $0F,$6E,$07 /// movd mm0, [edi] // Load the target pixel.
db $0F,$6F,$E3 /// movq mm4, mm3 // Reload the mask ( 0x00ffffff ).
db $0F,$6E,$C8 /// movd mm1, eax // Load the source pixel.
db $0F,$6F,$F8 /// movq mm7, mm0 // Save the target pixel.
db $0F,$60,$C6 /// punpcklbw mm0, mm6 // Unpack the target pixel.
db $0F,$60,$CE /// punpcklbw mm1, mm6 // Unpack the source pixel.
db $0F,$6F,$D0 /// movq mm2, mm0 // Save the unpacked target values.
nop
db $0F,$D5,$C5 /// pmullw mm0, mm5 // Multiply the target with the alpha value.
nop
db $0F,$D5,$CD /// pmullw mm1, mm5 // Multiply the source with the alpha value.
nop
db $0F,$71,$D0,$08 /// psrlw mm0, 8 // Divide the target by 256.
nop
db $0F,$71,$D1,$08 /// psrlw mm1, 8 // Divide the source by 256.
nop
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate the source minus target.
nop
db $0F,$FD,$D1 /// paddw mm2, mm1 // Add former target value to the result.
nop
db $0F,$67,$D2 /// packuswb mm2, mm2 // Pack the new target.
nop
db $0F,$DB,$D4 /// pand mm2, mm4 // Mask off unwanted bytes.
nop
db $0F,$DF,$E7 /// pandn mm4, mm7 // Get the high order byte we must keep.
nop
db $0F,$EB,$D4 /// por mm2, mm4 // Assemble the value to write back.
nop
db $0F,$7E,$17 /// movd [edi], mm2 // Write back the new value.
next24:
//
// Advance to the next pixel.
//
add edi, 3
add esi, 3
//
// Loop again or break.
//
dec ecx
jnz do_blend24
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop eax
pop esi
pop edi
db $0F,$77 /// emms
pop ecx
end;
//
// Alpha blend two pixels at once.
//
asm
//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip32
//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource
// Load the alpha value into an mmx register.
movq mm5, i64Alpha
push eax
do_blend32:
//
// Skip these two pixels if they are both black.
//
mov eax, [esi]
test eax, 00ffffffh
jnz not_black32
mov eax, [esi + 4]
test eax, 00ffffffh
jnz not_black32
jmp next32
not_black32:
//
// Alpha blend two target and two source pixels.
//
(* The mmx registers will basically be used in the following way:
mm0: target pixel one
mm1: source pixel one
mm2: target pixel two
mm3: source pixel two
mm4: working register
mm5: alpha value
mm6: original target
mm7: original source
Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)
db $0F,$6F,$37 /// movq mm6, [edi] // Load the target pixels.
db $0F,$EF,$E4 /// pxor mm4, mm4 // Clear mm4 so we can unpack easily.
db $0F,$6F,$3E /// movq mm7, [esi] // Load the source pixels.
db $0F,$6F,$C6 /// movq mm0, mm6 // Create copy one of the target.
db $0F,$6F,$D6 /// movq mm2, mm6 // Create copy two of the target.
db $0F,$60,$C4 /// punpcklbw mm0, mm4 // Unpack the first target copy.
db $0F,$6F,$CF /// movq mm1, mm7 // Create copy one of the source.
db $0F,$73,$D2,$20 /// psrlq mm2, 32 // Move the high order dword of target two.
db $0F,$60,$D4 /// punpcklbw mm2, mm4 // Unpack the second target copy.
db $0F,$6F,$DF /// movq mm3, mm7 // Create copy two of the source.
db $0F,$73,$D3,$20 /// psrlq mm3, 32 // Move the high order dword of source two.
db $0F,$60,$CC /// punpcklbw mm1, mm4 // Unpack the first source copy.
db $0F,$60,$DC /// punpcklbw mm3, mm4 // Unpack the second source copy.
db $0F,$72,$F7,$08 /// pslld mm7, 8 // Shift away original source highest bytes.
db $0F,$6F,$E0 /// movq mm4, mm0 // Save target one.
db $0F,$D5,$C5 /// pmullw mm0, mm5 // Multiply target one with alpha.
db $0F,$D5,$CD /// pmullw mm1, mm5 // Multiply source one with alpha.
db $0F,$72,$D7,$08 /// psrld mm7, 8 // Complete high order byte clearance.
db $0F,$71,$D1,$08 /// psrlw mm1, 8 // Divide source one by 256.
nop
db $0F,$71,$D0,$08 /// psrlw mm0, 8 // Divide target one by 256.
nop
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate source one minus target one.
nop
db $0F,$FD,$CC /// paddw mm1, mm4 // Add the former target one to the result.
nop
db $0F,$6F,$E2 /// movq mm4, mm2 // Save target two.
db $0F,$D5,$D5 /// pmullw mm2, mm5 // Multiply target two with alpha.
db $0F,$D5,$DD /// pmullw mm3, mm5 // Multiply source two with alpha.
nop
db $0F,$71,$D2,$08 /// psrlw mm2, 8 // Divide target two by 256.
nop
db $0F,$71,$D3,$08 /// psrlw mm3, 8 // Divide source two by 256.
nop
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate source two minus target two.
nop
db $0F,$FD,$DC /// paddw mm3, mm4 // Add the former target two to the result.
db $0F,$EF,$E4 /// pxor mm4, mm4 // Clear mm4 so we can pack easily.
db $0F,$67,$CC /// packuswb mm1, mm4 // Pack the new target one.
db $0F,$67,$DC /// packuswb mm3, mm4 // Pack the new target two.
db $0F,$73,$F3,$20 /// psllq mm3, 32 // Shift up the new target two.
db $0F,$76,$E7 /// pcmpeqd mm4, mm7 // Create a color key mask.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new targets.
db $0F,$DB,$F4 /// pand mm6, mm4 // Keep old target where color key applies.
db $0F,$DF,$E1 /// pandn mm4, mm1 // Clear new target where color key applies.
nop
db $0F,$EB,$F4 /// por mm6, mm4 // Assemble the new target value.
nop
db $0F,$7F,$37 /// movq [edi], mm6 // Write back the new target value.
next32:
//
// Advance to the next two pixels.
//
add edi, 8
add esi, 8
//
// Loop again or break.
//
dec ecx
jnz do_blend32
//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop eax
pop esi
pop edi
db $0F,$77 /// emms
skip32:
pop ecx
end;
---------END OF CODE
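For checking a port against known values, here is a scalar sketch in C of what the 32-bit loop above appears to compute for each pixel. It is not taken from the original source: the names blend_channel and blend_pixel32 are made up for illustration, and it assumes iAlpha is in the range 0 to 256 as the function header describes. Each byte is scaled by alpha / 256 on both sides before the difference is added to the target, and a source pixel whose low 24 bits are zero (black) leaves the target untouched.

```c
#include <stdint.h>

/* Hypothetical helper: blend one 8-bit channel in the same order as the
   MMX code (pmullw, psrlw 8, psubw, paddw). alpha is assumed to be 0..256. */
static uint8_t blend_channel(uint8_t target, uint8_t source, int alpha)
{
    int t = (target * alpha) >> 8;   /* target scaled by alpha / 256 */
    int s = (source * alpha) >> 8;   /* source scaled by alpha / 256 */
    return (uint8_t)(target + s - t);
}

/* Hypothetical helper: blend one 32-bit pixel, using black as the color key.
   All four bytes are blended, as in the MMX loop. */
static uint32_t blend_pixel32(uint32_t target, uint32_t source, int alpha)
{
    uint32_t out = 0;
    int shift;

    /* Color key: a black source pixel keeps the old target value. */
    if ((source & 0x00ffffffu) == 0)
        return target;

    for (shift = 0; shift < 32; shift += 8) {
        uint8_t t = (uint8_t)(target >> shift);
        uint8_t s = (uint8_t)(source >> shift);
        out |= (uint32_t)blend_channel(t, s, alpha) << shift;
    }
    return out;
}
```

With alpha = 0 the target comes back unchanged, and with alpha = 256 the result equals the source, which makes both ends easy to compare against the assembler output.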
Word out,
Alimonster
There are no stupid questions, but there are a lot of inquisitive idiots.
Looks like some of the MMX instructions weren't converted by the expert... oh well, at least you've got about half the work now. I think anything beginning with 'p' (e.g. 'pand') is an MMX instruction, and mm* are the MMX registers.
EDIT - except push and pop, of course! Also, I've noticed that some of the instructions *have* been converted in one place, but not elsewhere. It seems the expert can only deal with some instructions when their data are the MMX registers - for example, search the code above for "// pand" and you'll see it got converted sometimes.
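The db lines that did get converted can be checked by hand, since each one is 0F, the opcode byte, and a ModRM byte. Below is a small sketch in C of that encoding. The function names are made up for illustration, and it only covers the two simplest operand forms: register to register, and a memory operand addressed through a single base register with no displacement.

```c
#include <stdint.h>

/* Register numbers as used in the ModRM byte. */
enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESI = 6, EDI = 7 };

/* ModRM for "op mmDst, mmSrc": mod = 11, reg = destination, rm = source. */
static uint8_t modrm_reg(unsigned dst, unsigned src)
{
    return (uint8_t)(0xC0 | ((dst & 7) << 3) | (src & 7));
}

/* ModRM for "op mmN, [base]" with no displacement: mod = 00.
   Not valid for ESP (4, needs a SIB byte) or EBP (5, means disp32). */
static uint8_t modrm_mem(unsigned mmreg, unsigned base)
{
    return (uint8_t)(((mmreg & 7) << 3) | (base & 7));
}

/* Pack the three bytes 0F, opcode, ModRM into one value for easy comparison,
   e.g. 0x0FDBD4 corresponds to "db $0F,$DB,$D4". */
static uint32_t mmx_rr(uint8_t opcode, unsigned dst, unsigned src)
{
    return 0x0F0000u | ((uint32_t)opcode << 8) | modrm_reg(dst, src);
}

static uint32_t mmx_rm(uint8_t opcode, unsigned mmreg, unsigned base)
{
    return 0x0F0000u | ((uint32_t)opcode << 8) | modrm_mem(mmreg, base);
}
```

For example, mmx_rr(0xDB, 2, 4) gives 0x0FDBD4, which matches the converted "pand mm2, mm4" line above, and mmx_rm(0x6F, 6, EDI) gives 0x0F6F37, matching "movq mm6, [edi]". One possible way to handle a line the expert left alone, such as "pmullw mm3, i64Alpha", would be to load the variable's address first (lea eax, i64Alpha) and then emit db $0F,$D5,$18 for "pmullw mm3, [eax]". That assumes a general-purpose register is free at that point, which is not always the case in the loops above.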
Alimonster
There are no stupid questions, but there are a lot of inquisitive idiots.
Edited by - Alimonster on January 18, 2002 10:52:56 AM