MMX AlphaBlending code inside...

8 comments, last by savage 22 years, 3 months ago
Hi All,

At the bottom of this message is the alpha-blending code I am trying to port to Delphi. I have no problems with the C syntax and porting that, but I have no experience with assembler, which is used extensively in this version. I have tried compiling the assembler with Delphi, but it does not understand certain instructions.

Does anyone know how to port Microsoft assembler so that it will compile under Borland compilers like Delphi or C++ Builder? I had the same problem when I tried to compile the Quake 2 source code with C++ Builder: when it got to the inline assembler routines it did a back flip.

Anyway, here is the C MMX alpha-blending code. If you know how to port the assembler parts, please let me know.

Dominique

----- CODE FOLLOWS ----->

/*
 * Description:   Performs a blit operation while allowing for a variable alpha value,
 *                making use of MMX technology. The function uses black as a color key
 *                for the blit operation.
 *
 * Parameters:    lpDDSDest   - The destination surface of the blit.
 *
 *                lpDDSSource - The source surface of the blit.
 *
 *                iDestX      - The horizontal coordinate to blit to on
 *                              the destination surface.
 *
 *                iDestY      - The vertical coordinate to blit to on the
 *                              destination surface.
 *
 *                lprcSource  - The address of a RECT structure that defines
 *                              the upper-left and lower-right corners of the
 *                              rectangle to blit from on the source surface.
 *
 *                iAlpha      - A value in the range from 0 to 256 that
 *                              determines the opacity of the source.
 *
 *                dwMode      - One of the following predefined values:
 *                              RGBMODE_555 - 16 bit mode ( 555 )
 *                              RGBMODE_565 - 16 bit mode ( 565 )
 *                              RGBMODE_16  - 16 bit mode ( unknown )
 *                              RGBMODE_24  - 24 bit mode
 *                              RGBMODE_32  - 32 bit mode
 *
 * Return value:  The function returns 0 to indicate success or -1 if the call fails.
 *
 */
int BltAlphaMMX( LPDIRECTDRAWSURFACE7 lpDDSDest, LPDIRECTDRAWSURFACE7 lpDDSSource,
                 int iDestX, int iDestY, LPRECT lprcSource, int iAlpha, DWORD dwMode )
{
    DDSURFACEDESC2 ddsdSource;
    DDSURFACEDESC2 ddsdTarget;
    RECT rcDest;
    DWORD dwTargetPad;
    DWORD dwSourcePad;
    DWORD dwTargetTemp;
    DWORD dwSourceTemp;
    DWORD dwSrcRed, dwSrcGreen, dwSrcBlue;
    DWORD dwTgtRed, dwTgtGreen, dwTgtBlue;
    DWORD dwRed, dwGreen, dwBlue;
    BYTE* lpbTarget;
    BYTE* lpbSource;
    __int64 i64MaskRed;
    __int64 i64MaskGreen;
    __int64 i64MaskBlue;
    __int64 i64Alpha;
    __int64 i64RdShift = 0;
    __int64 i64GrShift = 0;
    __int64 i64BlShift = 0;
    __int64 i64Mask;
    int iWidth;
    int iHeight;
    int iRemainder;
    bool gOddWidth;
    int iRet = 0;
    int i;

    //
    // Enforce the lower limit for the alpha value.
    //
    if ( iAlpha < 0 )
        iAlpha = 0;

    //
    // Enforce the upper limit for the alpha value.
    //
    if ( iAlpha > 256 )
        iAlpha = 256;

    //
    // Determine the dimensions of the source surface.
    //
    if ( lprcSource )
    {
        //
        // Get the width and height from the passed rectangle.
        //
        iWidth = lprcSource->right - lprcSource->left;
        iHeight = lprcSource->bottom - lprcSource->top;
    }
    else
    {
        //
        // Get the width and height from the surface description.
        //
        memset( &ddsdSource, 0, sizeof ddsdSource );
        ddsdSource.dwSize = sizeof ddsdSource;
        ddsdSource.dwFlags = DDSD_WIDTH | DDSD_HEIGHT;
        lpDDSSource->GetSurfaceDesc( &ddsdSource );

        //
        // Remember the dimensions.
        //
        iWidth = ddsdSource.dwWidth;
        iHeight = ddsdSource.dwHeight;
    }

    //
    // Calculate the rectangle to be locked in the target.
    //
    rcDest.left = iDestX;
    rcDest.top = iDestY;
    rcDest.right = iDestX + iWidth;
    rcDest.bottom = iDestY + iHeight;

    //
    // Lock down the destination surface.
    //
    memset( &ddsdTarget, 0, sizeof ddsdTarget );
    ddsdTarget.dwSize = sizeof ddsdTarget;
    lpDDSDest->Lock( &rcDest, &ddsdTarget, DDLOCK_WAIT, NULL );

    //
    // Lock down the source surface.
// memset( &ddsdSource, 0, sizeof ddsdSource ); ddsdSource.dwSize = sizeof ddsdSource; lpDDSSource->Lock( lprcSource, &ddsdSource, DDLOCK_WAIT, NULL ); switch ( dwMode ) { /* 16 bit mode ( 555 ). This algorithm can process four pixels at once. */ case RGBMODE_555: // // Determine the padding bytes for the target and the source. // dwTargetPad = ddsdTarget.lPitch - ( iWidth * 2 ); dwSourcePad = ddsdSource.lPitch - ( iWidth * 2 ); // // We process four pixels at once, so the // width must be a multiple of four. // iRemainder = ( iWidth & 0x03 ); iWidth = ( iWidth & ~0x03 ) / 4; // // Set the bit masks for red, green and blue. // i64MaskRed = 0x7c007c007c007c00; i64MaskGreen =0x03e003e003e003e0; i64MaskBlue = 0x001f001f001f001f; // // Compose the quadruple alpha value. // i64Alpha = iAlpha; i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 ) | ( i64Alpha << 48 ); // Get the address of the target. lpbTarget = ( BYTE* ) ddsdTarget.lpSurface; // Get the address of the source. lpbSource = ( BYTE* ) ddsdSource.lpSurface; do { // Reset the width. i = iWidth; // // Alpha-blend four pixels at once. // __asm { // // Initialize the counter and skip // if the latter is equal to zero. // push ecx mov ecx, i cmp ecx, 0 jz skip555 // // Load the frame buffer pointers into the registers. // push edi push esi mov edi, lpbTarget mov esi, lpbSource do_blend555: // // Skip these four pixels if they are all black. // cmp dword ptr [esi], 0 jnz not_black555 cmp dword ptr [esi + 4], 0 jnz not_black555 jmp next555 not_black555: // // Alpha blend four target and source pixels. // /* The mmx registers will basically be used in the following way: mm0: red target value mm1: red source value mm2: green target value mm3: green source value mm4: blue target value mm5: blue source value mm6: original target pixel mm7: original source pixel /* Note: Two lines together are assumed to pair in the processor´s U- and V-pipes. */ movq mm6, [edi] // Load the original target pixel. 
nop movq mm7, [esi] // Load the original source pixel. movq mm0, mm6 // Load the register for the red target. pand mm0, i64MaskRed // Extract the red target channel. movq mm1, mm7 // Load the register for the red source. pand mm1, i64MaskRed // Extract the red source channel. psrlw mm0, 10 // Shift down the red target channel. movq mm2, mm6 // Load the register for the green target. psrlw mm1, 10 // Shift down the red source channel. movq mm3, mm7 // Load the register for the green source. psubw mm1, mm0 // Calculate red source minus red target. pmullw mm1, i64Alpha // Multiply the red result with alpha. nop pand mm2, i64MaskGreen // Extract the green target channel. nop pand mm3, i64MaskGreen // Extract the green source channel. psraw mm1, 8 // Divide the red result by 256. psrlw mm2, 5 // Shift down the green target channel. paddw mm1, mm0 // Add the red target to the red result. psllw mm1, 10 // Shift up the red source again. movq mm4, mm6 // Load the register for the blue target. psrlw mm3, 5 // Shift down the green source channel. movq mm5, mm7 // Load the register for the blue source. pand mm4, i64MaskBlue // Extract the blue target channel. psubw mm3, mm2 // Calculate green source minus green target. pand mm5, i64MaskBlue // Extract the blue source channel. pmullw mm3, i64Alpha // Multiply the green result with alpha. psubw mm5, mm4 // Calculate blue source minus blue target. pxor mm0, mm0 // Create black as the color key. pmullw mm5, i64Alpha // Multiply the blue result with alpha. psraw mm3, 8 // Divide the green result by 256. paddw mm3, mm2 // Add the green target to the green result. pcmpeqw mm0, mm7 // Create a color key mask. psraw mm5, 8 // Divide the blue result by 256. psllw mm3, 5 // Shift up the green source again. paddw mm5, mm4 // Add the blue target to the blue result. por mm1, mm3 // Combine the new red and green values. pand mm6, mm0 // Keep old target where the color key applies. por mm1, mm5 // Combine new blue value with the others. 
pandn mm0, mm1 // Keep new target where no color key applies. por mm6, mm0 // Assemble new target value. movq [edi], mm6 // Write back new target value. next555: // // Advance to the next four pixels. // add edi, 8 add esi, 8 // // Loop again or break. // dec ecx jnz do_blend555 // // Write back the frame buffer pointers and clean up. // mov lpbTarget, edi mov lpbSource, esi pop esi pop edi emms skip555: pop ecx } // // Alpha blend any remaining pixels. // for ( i = 0; i < iRemainder; i++ ) { // Read in one source pixel. dwSourceTemp = *( ( WORD* ) lpbSource ); // If this is not the color key ... if ( dwSourceTemp != 0 ) { // // ... apply the alpha blend to it. // // Read in one target pixel. dwTargetTemp = *( ( WORD* ) lpbTarget ); // Extract the red channels. dwTgtRed = ( dwTargetTemp >> 10 ) & 0x1f; dwSrcRed = ( dwSourceTemp >> 10 ) & 0x1f; // Extract the green channels. dwTgtGreen = ( dwTargetTemp >> 5 ) & 0x1f; dwSrcGreen = ( dwSourceTemp >> 5 ) & 0x1f; // Extract the blue channels. dwTgtBlue = dwTargetTemp & 0x1f; dwSrcBlue = dwSourceTemp & 0x1f; // Write the destination pixel. *( ( WORD* ) lpbTarget ) = ( WORD ) ( ( ( iAlpha * ( dwSrcRed - dwTgtRed ) >> 8 ) + dwTgtRed ) << 10 | ( ( iAlpha * ( dwSrcGreen - dwTgtGreen ) >> 8 ) + dwTgtGreen ) << 5 | ( ( iAlpha * ( dwSrcBlue - dwTgtBlue ) >> 8 ) + dwTgtBlue ) ); } // // Proceed to next pixel. // lpbTarget += 2; lpbSource += 2; } // // Proceed to the next line. // lpbTarget += dwTargetPad; lpbSource += dwSourcePad; } while ( --iHeight > 0 ); break; /* 16 bit mode ( 565 ). This algorithm can process four pixels at once. */ case RGBMODE_565: // // Determine the padding bytes for the target and the source. // dwTargetPad = ddsdTarget.lPitch - ( iWidth * 2 ); dwSourcePad = ddsdSource.lPitch - ( iWidth * 2 ); // // We process four pixels at once, so the // width must be a multiple of four. // iRemainder = ( iWidth & 0x03 ); iWidth = ( iWidth & ~0x03 ) / 4; // // Set the bit masks for red, green and blue. 
// i64MaskRed = 0xf800f800f800f800; i64MaskGreen =0x07e007e007e007e0; i64MaskBlue = 0x001f001f001f001f; // // Compose the quadruple alpha value. // i64Alpha = iAlpha; i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 ) | ( i64Alpha << 48 ); // Get the address of the target. lpbTarget = ( BYTE* ) ddsdTarget.lpSurface; // Get the address of the source. lpbSource = ( BYTE* ) ddsdSource.lpSurface; do { // Reset the width. i = iWidth; // // Alpha-blend four pixels at once. // __asm { // // Initialize the counter and skip // if the latter is equal to zero. // push ecx mov ecx, i cmp ecx, 0 jz skip565 // // Load the frame buffer pointers into the registers. // push edi push esi mov edi, lpbTarget mov esi, lpbSource do_blend565: // // Skip these four pixels if they are all black. // cmp dword ptr [esi], 0 jnz not_black565 cmp dword ptr [esi + 4], 0 jnz not_black565 jmp next565 not_black565: // // Alpha blend four target and source pixels. // /* The mmx registers will basically be used in the following way: mm0: red target value mm1: red source value mm2: green target value mm3: green source value mm4: blue target value mm5: blue source value mm6: original target pixel mm7: original source pixel /* Note: Two lines together are assumed to pair in the processor´s U- and V-pipes. */ movq mm6, [edi] // Load the original target pixel. nop movq mm7, [esi] // Load the original source pixel. movq mm0, mm6 // Load the register for the red target. pand mm0, i64MaskRed // Extract the red target channel. movq mm1, mm7 // Load the register for the red source. pand mm1, i64MaskRed // Extract the red source channel. psrlw mm0, 11 // Shift down the red target channel. movq mm2, mm6 // Load the register for the green target. psrlw mm1, 11 // Shift down the red source channel. movq mm3, mm7 // Load the register for the green source. psubw mm1, mm0 // Calculate red source minus red target. pmullw mm1, i64Alpha // Multiply the red result with alpha. 
nop pand mm2, i64MaskGreen // Extract the green target channel. nop pand mm3, i64MaskGreen // Extract the green source channel. psraw mm1, 8 // Divide the red result by 256. psrlw mm2, 5 // Shift down the green target channel. paddw mm1, mm0 // Add the red target to the red result. psllw mm1, 11 // Shift up the red source again. movq mm4, mm6 // Load the register for the blue target. psrlw mm3, 5 // Shift down the green source channel. movq mm5, mm7 // Load the register for the blue source. pand mm4, i64MaskBlue // Extract the blue target channel. psubw mm3, mm2 // Calculate green source minus green target. pand mm5, i64MaskBlue // Extract the blue source channel. pmullw mm3, i64Alpha // Multiply the green result with alpha. psubw mm5, mm4 // Calculate blue source minus blue target. pxor mm0, mm0 // Create black as the color key. pmullw mm5, i64Alpha // Multiply the blue result with alpha. psraw mm3, 8 // Divide the green result by 256. paddw mm3, mm2 // Add the green target to the green result. pcmpeqw mm0, mm7 // Create a color key mask. psraw mm5, 8 // Divide the blue result by 256. psllw mm3, 5 // Shift up the green source again. paddw mm5, mm4 // Add the blue target to the blue result. por mm1, mm3 // Combine the new red and green values. pand mm6, mm0 // Keep old target where the color key applies. por mm1, mm5 // Combine new blue value with the others. pandn mm0, mm1 // Keep new target where no color key applies. por mm6, mm0 // Assemble new target value. movq [edi], mm6 // Write back new target value. next565: // // Advance to the next four pixels. // add edi, 8 add esi, 8 // // Loop again or break. // dec ecx jnz do_blend565 // // Write back the frame buffer pointers and clean up. // mov lpbTarget, edi mov lpbSource, esi pop esi pop edi emms skip565: pop ecx } // // Alpha blend any remaining pixels. // for ( i = 0; i < iRemainder; i++ ) { // Read in one source pixel. dwSourceTemp = *( ( WORD* ) lpbSource ); // If this is not the color key ... 
if ( dwSourceTemp != 0 ) { // // ... apply the alpha blend to it. // // Read in one target pixel. dwTargetTemp = *( ( WORD* ) lpbTarget ); // Extract the red channels. dwTgtRed = ( dwTargetTemp >> 11 ) & 0x1f; dwSrcRed = ( dwSourceTemp >> 11 ) & 0x1f; // Extract the green channels. dwTgtGreen = ( dwTargetTemp >> 5 ) & 0x3f; dwSrcGreen = ( dwSourceTemp >> 5 ) & 0x3f; // Extract the blue channels. dwTgtBlue = dwTargetTemp & 0x1f; dwSrcBlue = dwSourceTemp & 0x1f; // Write the destination pixel. *( ( WORD* ) lpbTarget ) = ( WORD ) ( ( ( iAlpha * ( dwSrcRed - dwTgtRed ) >> 8 ) + dwTgtRed ) << 11 | ( ( iAlpha * ( dwSrcGreen - dwTgtGreen ) >> 8 ) + dwTgtGreen ) << 5 | ( ( iAlpha * ( dwSrcBlue - dwTgtBlue ) >> 8 ) + dwTgtBlue ) ); } // // Proceed to next pixel. // lpbTarget += 2; lpbSource += 2; } // // Proceed to the next line. // lpbTarget += dwTargetPad; lpbSource += dwSourcePad; } while ( --iHeight > 0 ); break; /* 16 bit mode ( unknown ). This algorithm can process four pixels at once. */ case RGBMODE_16: // // Determine the padding bytes for the target and the source. // dwTargetPad = ddsdTarget.lPitch - ( iWidth * 2 ); dwSourcePad = ddsdSource.lPitch - ( iWidth * 2 ); // // We process four pixels at once, so the // width must be a multiple of four. // iRemainder = ( iWidth & 0x03 ); iWidth = ( iWidth & ~0x03 ) / 4; // // Determine the distance in bits of each bit mask from the right. // while ( ( ( ddsdTarget.ddpfPixelFormat.dwRBitMask >> i64RdShift ) & 0x01 ) == 0 ) i64RdShift++; while ( ( ( ddsdTarget.ddpfPixelFormat.dwGBitMask >> i64GrShift ) & 0x01 ) == 0 ) i64GrShift++; while ( ( ( ddsdTarget.ddpfPixelFormat.dwBBitMask >> i64BlShift ) & 0x01 ) == 0 ) i64BlShift++; // // Compose the bit masks for each color channel. 
// i64MaskRed = ddsdTarget.ddpfPixelFormat.dwRBitMask; i64MaskRed |= ( i64MaskRed << 16 ) | ( i64MaskRed << 32 ) | ( i64MaskRed << 48 ); i64MaskGreen = ddsdTarget.ddpfPixelFormat.dwGBitMask; i64MaskGreen |= ( i64MaskGreen << 16 ) | ( i64MaskGreen << 32 ) | ( i64MaskGreen << 48 ); i64MaskBlue = ddsdTarget.ddpfPixelFormat.dwBBitMask; i64MaskBlue |= ( i64MaskBlue << 16 ) | ( i64MaskBlue << 32 ) | ( i64MaskBlue << 48 ); // // Compose the quadruple alpha value. // i64Alpha = iAlpha; i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 ) | ( i64Alpha << 48 ); // Get the address of the target. lpbTarget = ( BYTE* ) ddsdTarget.lpSurface; // Get the address of the source. lpbSource = ( BYTE* ) ddsdSource.lpSurface; do { // Reset the width. i = iWidth; // // Alpha-blend four pixels at once. // __asm { // // Initialize the counter and skip // if the latter is equal to zero. // push ecx mov ecx, i cmp ecx, 0 jz skip16 // // Load the frame buffer pointers into the registers. // push edi push esi mov edi, lpbTarget mov esi, lpbSource do_blend16: // // Skip these four pixels if they are all black. // cmp dword ptr [esi], 0 jnz not_black16 cmp dword ptr [esi + 4], 0 jnz not_black16 jmp next16 not_black16: // // Alpha blend four target and source pixels. // /* The mmx registers will basically be used in the following way: mm0: red target value mm1: red source value mm2: green target value mm3: green source value mm4: blue target value mm5: blue source value mm6: original target pixel mm7: original source pixel /* Note: Two lines together are assumed to pair in the processor´s U- and V-pipes. */ movq mm6, [edi] // Load the original target pixel. nop movq mm7, [esi] // Load the original source pixel. movq mm0, mm6 // Load the register for the red target. pand mm0, i64MaskRed // Extract the red target channel. movq mm1, mm7 // Load the register for the red source. pand mm1, i64MaskRed // Extract the red source channel. nop psrlw mm0, i64RdShift // Shift down the red target channel. 
                    movq mm2, mm6           // Load the register for the green target.

                    psrlw mm1, i64RdShift   // Shift down the red source channel.
                    movq mm3, mm7           // Load the register for the green source.

                    pand mm2, i64MaskGreen  // Extract the green target channel.
                    psubw mm1, mm0          // Calculate red source minus red target.

                    pmullw mm1, i64Alpha    // Multiply the red result with alpha.
                    movq mm5, mm7           // Load the register for the blue source.

                    pand mm3, i64MaskGreen  // Extract the green source channel.
                    movq mm4, mm6           // Load the register for the blue target.

                    psraw mm1, 8            // Divide the red result by 256.
                    nop

                    paddw mm1, mm0          // Add the red target to the red result.
                    nop

                    psllw mm1, i64RdShift   // Shift up the red source again.
                    pxor mm0, mm0           // Create black as the color key.

                    psrlw mm2, i64GrShift   // Shift down the green target channel.
                    pcmpeqw mm0, mm7        // Create a color key mask.

                    psrlw mm3, i64GrShift   // Shift down the green source channel.
                    pand mm6, mm0           // Keep old target where the color key applies.

                    pand mm4, i64MaskBlue   // Extract the blue target channel.
                    psubw mm3, mm2          // Calculate green source minus green target.

                    pmullw mm3, i64Alpha    // Multiply the green result with alpha.
                    nop

                    psrlw mm4, i64BlShift   // Shift down the blue target channel.
                    nop

                    pand mm5, i64MaskBlue   // Extract the blue source channel.
                    psraw mm3, 8            // Divide the green result by 256.

                    paddw mm3, mm2          // Add the green target to the green result.
                    nop

                    psllw mm3, i64GrShift   // Shift up the green source again.
                    por mm1, mm3            // Combine the new red and green values.

                    psrlw mm5, i64BlShift   // Shift down the blue source channel.
                    nop

                    psubw mm5, mm4          // Calculate blue source minus blue target.
                    nop

                    pmullw mm5, i64Alpha    // Multiply the blue result with alpha.
                    nop

                    nop
                    nop

                    psraw mm5, 8            // Divide the blue result by 256.
                    nop

                    paddw mm5, mm4          // Add the blue target to the blue result.
                    nop

                    psllw mm5, i64BlShift   // Shift up the blue source again.
                    nop

                    por mm1, mm5            // Combine new blue value with the others.
                    nop

                    pandn mm0, mm1          // Keep new target where no color key applies.
                    nop

                    por mm6, mm0            // Assemble new target value.
                    nop

                    movq [edi], mm6         // Write back new target value.

                next16:
                    //
                    // Advance to the next four pixels.
                    //
                    add edi, 8
                    add esi, 8

                    //
                    // Loop again or break.
                    //
                    dec ecx
                    jnz do_blend16

                    //
                    // Write back the frame buffer pointers and clean up.
                    //
                    mov lpbTarget, edi
                    mov lpbSource, esi
                    pop esi
                    pop edi
                    emms

                skip16:
                    pop ecx
                }

                //
                // Alpha blend any remaining pixels.
                //
                for ( i = 0; i < iRemainder; i++ )
                {
                    // Read in the next source pixel.
                    dwSourceTemp = *( ( WORD* ) lpbSource );

                    // If the source pixel is not black ...
                    if ( dwSourceTemp != 0 )
                    {
                        // ... read in the next target pixel.
                        dwTargetTemp = *( ( WORD* ) lpbTarget );

                        // Extract the red channels.
                        dwTgtRed = dwTargetTemp & ddsdTarget.ddpfPixelFormat.dwRBitMask;
                        dwSrcRed = dwSourceTemp & ddsdSource.ddpfPixelFormat.dwRBitMask;

                        // Extract the green channels.
                        dwTgtGreen = dwTargetTemp & ddsdTarget.ddpfPixelFormat.dwGBitMask;
                        dwSrcGreen = dwSourceTemp & ddsdSource.ddpfPixelFormat.dwGBitMask;

                        // Extract the blue channels.
                        dwTgtBlue = dwTargetTemp & ddsdTarget.ddpfPixelFormat.dwBBitMask;
                        dwSrcBlue = dwSourceTemp & ddsdSource.ddpfPixelFormat.dwBBitMask;

                        // Calculate the alpha-blended red channel.
                        dwRed = ( ( ( iAlpha * ( dwSrcRed - dwTgtRed ) ) >> 8 ) + dwTgtRed )
                                & ddsdTarget.ddpfPixelFormat.dwRBitMask;

                        // Calculate the alpha-blended green channel.
                        dwGreen = ( ( ( iAlpha * ( dwSrcGreen - dwTgtGreen ) ) >> 8 ) + dwTgtGreen )
                                  & ddsdTarget.ddpfPixelFormat.dwGBitMask;

                        // Calculate the alpha-blended blue channel.
                        dwBlue = ( ( ( iAlpha * ( dwSrcBlue - dwTgtBlue ) ) >> 8 ) + dwTgtBlue )
                                 & ddsdTarget.ddpfPixelFormat.dwBBitMask;

                        // Write the destination pixel.
                        *( ( WORD* ) lpbTarget ) = ( WORD ) ( dwRed | dwGreen | dwBlue );
                    }

                    //
                    // Proceed to next pixel.
                    //
                    lpbTarget += 2;
                    lpbSource += 2;
                }

                //
                // Proceed to the next line.
                //
                lpbTarget += dwTargetPad;
                lpbSource += dwSourcePad;

            } while ( --iHeight > 0 );

            break;

        /* 24 bit mode.
*/ case RGBMODE_24: // // Determine the padding bytes for the target and the source. // dwTargetPad = ddsdTarget.lPitch - ( iWidth * 3 ); dwSourcePad = ddsdSource.lPitch - ( iWidth * 3 ); // // Compose the triple alpha value. // i64Alpha = iAlpha; i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 ); // Create a general purpose mask. i64Mask = 0x0000000000ffffff; // Get the address of the target. lpbTarget = ( BYTE* ) ddsdTarget.lpSurface; // Get the address of the source. lpbSource = ( BYTE* ) ddsdSource.lpSurface; do { // // Alpha blend the pixels in the current row. // __asm { // Reset the width counter. push ecx mov ecx, iWidth // // Load the frame buffer pointers into the registers. // push edi push esi mov edi, lpbTarget mov esi, lpbSource // Load the mask into an mmx register. movq mm3, i64Mask // Load the alpha value into an mmx register. movq mm5, i64Alpha // Clear an mmx register to facilitate unpacking. pxor mm6, mm6 push eax do_blend24: // // Skip this pixel if it is black. // mov eax, [esi] test eax, 00ffffffh // Do not 'and' so that the high order byte is kept. jnz not_black24 jmp next24 not_black24: // // Get a target and a source pixel. // /* The mmx registers will basically be used in the following way: mm0: target value mm1: source value mm2: working register mm3: mask ( 0x00ffffff ) mm4: working register mm5: alpha value mm6: zero for unpacking mm7: original target /* Note: Two lines together are assumed to pair in the processor´s U- and V-pipes. */ movd mm0, [edi] // Load the target pixel. movq mm4, mm3 // Reload the mask ( 0x00ffffff ). movd mm1, eax // Load the source pixel. movq mm7, mm0 // Save the target pixel. punpcklbw mm0, mm6 // Unpack the target pixel. punpcklbw mm1, mm6 // Unpack the source pixel. movq mm2, mm0 // Save the unpacked target values. nop pmullw mm0, mm5 // Multiply the target with the alpha value. nop pmullw mm1, mm5 // Multiply the source with the alpha value. nop psrlw mm0, 8 // Divide the target by 256. 
nop psrlw mm1, 8 // Divide the source by 256. nop psubw mm1, mm0 // Calculate the source minus target. nop paddw mm2, mm1 // Add former target value to the result. nop packuswb mm2, mm2 // Pack the new target. nop pand mm2, mm4 // Mask of unwanted bytes. nop pandn mm4, mm7 // Get the high order byte we must keep. nop por mm2, mm4 // Assemble the value to write back. nop movd [edi], mm2 // Write back the new value. next24: // // Advance to the next pixel. // add edi, 3 add esi, 3 // // Loop again or break. // dec ecx jnz do_blend24 // // Write back the frame buffer pointers and clean up. // mov lpbTarget, edi mov lpbSource, esi pop eax pop esi pop edi emms pop ecx } // // Proceed to the next line. // lpbTarget += dwTargetPad; lpbSource += dwSourcePad; } while ( --iHeight > 0 ); break; /* 32 bit mode. This algorithm can process two pixels at once. */ case RGBMODE_32: // // Determine the padding bytes for the target and the source. // dwTargetPad = ddsdTarget.lPitch - ( iWidth * 4 ); dwSourcePad = ddsdSource.lPitch - ( iWidth * 4 ); // If the width is odd ... if ( iWidth & 0x01 ) { // ... set the flag ... gOddWidth = true; // ... and calculate the width. iWidth = ( iWidth - 1 ) / 2; } // If the width is even ... else { // ... clear the flag ... gOddWidth = false; // ... and calculate the width. iWidth /= 2; } // // Compose the triple alpha value. // i64Alpha = iAlpha; i64Alpha |= ( i64Alpha << 16 ) | ( i64Alpha << 32 ); // Get the address of the target. lpbTarget = ( BYTE* ) ddsdTarget.lpSurface; // Get the address of the source. lpbSource = ( BYTE* ) ddsdSource.lpSurface; do { // Reset the width. i = iWidth; // // Alpha blend two pixels at once. // __asm { // // Initialize the counter and skip // if the latter is equal to zero. // push ecx mov ecx, i cmp ecx, 0 jz skip32 // // Load the frame buffer pointers into the registers. // push edi push esi mov edi, lpbTarget mov esi, lpbSource // Load the alpha value into an mmx register. 
movq mm5, i64Alpha push eax do_blend32: // // Skip these two pixels if they are both black. // mov eax, [esi] test eax, 00ffffffh jnz not_black32 mov eax, [esi + 4] test eax, 00ffffffh jnz not_black32 jmp next32 not_black32: // // Alpha blend two target and two source pixels. // /* The mmx registers will basically be used in the following way: mm0: target pixel one mm1: source pixel one mm2: target pixel two mm3: source pixel two mm4: working register mm5: alpha value mm6: original target mm7: original source /* Note: Two lines together are assumed to pair in the processor´s U- and V-pipes. */ movq mm6, [edi] // Load the target pixels. pxor mm4, mm4 // Clear mm4 so we can unpack easily. movq mm7, [esi] // Load the source pixels. movq mm0, mm6 // Create copy one of the target. movq mm2, mm6 // Create copy two of the target. punpcklbw mm0, mm4 // Unpack the first target copy. movq mm1, mm7 // Create copy one of the source. psrlq mm2, 32 // Move the high order dword of target two. punpcklbw mm2, mm4 // Unpack the second target copy. movq mm3, mm7 // Create copy two of the source. psrlq mm3, 32 // Move the high order dword of source two. punpcklbw mm1, mm4 // Unpack the first source copy. punpcklbw mm3, mm4 // Unpack the second source copy. pslld mm7, 8 // Shift away original source highest bytes. movq mm4, mm0 // Save target one. pmullw mm0, mm5 // Multiply target one with alpha. pmullw mm1, mm5 // Multiply source one with alpha. psrld mm7, 8 // Complete high order byte clearance. psrlw mm1, 8 // Divide source one by 256. nop psrlw mm0, 8 // Divide target one by 256. nop psubw mm1, mm0 // Calculate source one minus target one. nop paddw mm1, mm4 // Add the former target one to the result. nop movq mm4, mm2 // Save target two. pmullw mm2, mm5 // Multiply target two with alpha. pmullw mm3, mm5 // Multiply source two with alpha. nop psrlw mm2, 8 // Divide target two by 256. nop psrlw mm3, 8 // Divide source two by 256. 
                    nop

                    psubw mm3, mm2          // Calculate source two minus target two.
                    nop

                    paddw mm3, mm4          // Add the former target two to the result.
                    pxor mm4, mm4           // Clear mm4 so we can pack easily.

                    packuswb mm1, mm4       // Pack the new target one.
                    packuswb mm3, mm4       // Pack the new target two.

                    psllq mm3, 32           // Shift up the new target two.
                    pcmpeqd mm4, mm7        // Create a color key mask.

                    por mm1, mm3            // Combine the new targets.
                    pand mm6, mm4           // Keep old target where color key applies.

                    pandn mm4, mm1          // Clear new target where color key applies.
                    nop

                    por mm6, mm4            // Assemble the new target value.
                    nop

                    movq [edi], mm6         // Write back the new target value.

                next32:
                    //
                    // Advance to the next two pixels.
                    //
                    add edi, 8
                    add esi, 8

                    //
                    // Loop again or break.
                    //
                    dec ecx
                    jnz do_blend32

                    //
                    // Write back the frame buffer pointers and clean up.
                    //
                    mov lpbTarget, edi
                    mov lpbSource, esi
                    pop eax
                    pop esi
                    pop edi
                    emms

                skip32:
                    pop ecx
                }

                //
                // Handle an odd width.
                //
                if ( gOddWidth )
                {
                    // Read in the next source pixel.
                    dwSourceTemp = *( ( DWORD* ) lpbSource );

                    // If the source pixel is not black ...
                    if ( ( dwSourceTemp & 0xffffff ) != 0 )
                    {
                        // ... read in the next target pixel.
                        dwTargetTemp = *( ( DWORD* ) lpbTarget );

                        // Extract the red channels.
                        dwTgtRed = dwTargetTemp & 0xff0000;
                        dwSrcRed = dwSourceTemp & 0xff0000;

                        // Extract the green channels.
                        dwTgtGreen = dwTargetTemp & 0xff00;
                        dwSrcGreen = dwSourceTemp & 0xff00;

                        // Extract the blue channels.
                        dwTgtBlue = dwTargetTemp & 0xff;
                        dwSrcBlue = dwSourceTemp & 0xff;

                        // Calculate the destination pixel. Every channel is masked,
                        // blue included, so that the unsigned wrap-around of a
                        // negative difference cannot leak into the high byte.
                        dwTargetTemp =
                            ( ( ( ( iAlpha * ( dwSrcRed - dwTgtRed ) >> 8 ) + dwTgtRed ) & 0xff0000 ) |
                              ( ( ( iAlpha * ( dwSrcGreen - dwTgtGreen ) >> 8 ) + dwTgtGreen ) & 0xff00 ) |
                              ( ( ( iAlpha * ( dwSrcBlue - dwTgtBlue ) >> 8 ) + dwTgtBlue ) & 0xff ) );

                        // Write the destination pixel.
                        *( ( DWORD* ) lpbTarget ) = dwTargetTemp;
                    }

                    //
                    // Proceed to the next pixel.
                    //
                    lpbTarget += 4;
                    lpbSource += 4;
                }

                //
                // Proceed to the next line.
                //
                lpbTarget += dwTargetPad;
                lpbSource += dwSourcePad;

            } while ( --iHeight > 0 );

            break;

        /* Invalid mode. */
        default:
            iRet = -1;
    }

    // Unlock the target surface.
    lpDDSDest->Unlock( &rcDest );

    // Unlock the source surface.
    lpDDSSource->Unlock( lprcSource );

    // Return the result.
    return iRet;
}

PS. If anyone wants the non-MMX versions of this routine, please let me know and I will send them to you. It works well with IDirectDraw7 surfaces.

http://www.DelphiGamer.com := for all your Object Pascal game development needs;

Edited by - savage on January 16, 2002 12:15:16 PM
http://www.PascalGameDevelopment.com := go on, write a game instead;
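
For anyone porting the routine, the per-pixel arithmetic of the 555 remainder loop can be written without any assembler. What follows is only a minimal sketch in portable C, assuming the same formula as the posted code (target plus alpha times the source/target difference, divided by 256 and rounded down) and black as the color key. The names blend_channel and blend_pixel_555 are made up for illustration and do not appear in the original source.

```c
#include <stdint.h>

/* Blend one channel: tgt + floor(alpha * (src - tgt) / 256).
   The division is done by hand because right-shifting a negative
   int is implementation-defined in C. */
int blend_channel(int src, int tgt, int alpha)
{
    int d = alpha * (src - tgt);
    int q = (d >= 0) ? (d >> 8) : -((-d + 255) >> 8);
    return tgt + q;
}

/* Blend one RGB555 source pixel onto a target pixel.
   alpha is clamped to 0..256; a black source pixel (0) is the
   color key and leaves the target unchanged. */
uint16_t blend_pixel_555(uint16_t src, uint16_t tgt, int alpha)
{
    int r, g, b;

    if (alpha < 0)   alpha = 0;
    if (alpha > 256) alpha = 256;
    if (src == 0)
        return tgt;

    r = blend_channel((src >> 10) & 0x1f, (tgt >> 10) & 0x1f, alpha);
    g = blend_channel((src >> 5) & 0x1f, (tgt >> 5) & 0x1f, alpha);
    b = blend_channel(src & 0x1f, tgt & 0x1f, alpha);

    return (uint16_t)((r << 10) | (g << 5) | b);
}
```

For example, blend_pixel_555(0x7c00, 0x001f, 128) puts pure red over pure blue at half opacity and yields 0x3c0f. The MMX path computes the same thing for four pixels at a time, so a scalar version like this can serve as a reference when checking a port.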
If I remember right, the in-line assembler doesn't like MMX instructions (prior to Delphi 6, anyway). Try using this expert, which converts MMX instructions to their opcode equivalents. Maybe it'll fix the errors:

Hori's MMX Expert

Alimonster

There are no stupid questions, but there are a lot of inquisitive idiots.
Hi Alimonster,
Thanks for the suggestion. I tried it, but unfortunately it did not do anything to the code when I highlighted it and pressed Ctrl+A.

Any other suggestions?


Thanks,


Dominique.

First of all, Delphi 6 should be able to compile MMX instructions. For earlier versions you do have to convert the MMX routines into machine code.
Finally, as an alternative, you could compile the routine into a DLL with your C/C++ compiler and call it from Delphi.

Hope this helps
- Lifepower
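
The DLL route could look roughly like the sketch below. It is an assumption-laden illustration, not the original routine: BlendRow32, BLEND_API and BLEND_CALL are hypothetical names, the function blends a single row of 32-bit pixels in plain C, and it uses a weighted average whose rounding may differ by one from the MMX path. On Windows the MMX code could sit behind the same kind of exported wrapper.

```c
#include <stdint.h>

/* Export and calling-convention macros; they expand to nothing off
   Windows so the file can still be compiled and tested elsewhere. */
#if defined(_WIN32)
#  define BLEND_API  __declspec(dllexport)
#  define BLEND_CALL __stdcall
#else
#  define BLEND_API
#  define BLEND_CALL
#endif

/* Hypothetical export (not part of the original code): alpha-blends one
   row of 32-bit pixels from source onto target, with black as the color
   key. Returns 0 on success and -1 on invalid arguments. */
BLEND_API int32_t BLEND_CALL BlendRow32(uint32_t *target, const uint32_t *source,
                                        int32_t count, int32_t alpha)
{
    int32_t i;

    if (target == 0 || source == 0 || count < 0)
        return -1;
    if (alpha < 0)   alpha = 0;
    if (alpha > 256) alpha = 256;

    for (i = 0; i < count; i++) {
        uint32_t s = source[i];
        uint32_t t = target[i];
        uint32_t out = t & 0xff000000u;   /* keep the target's high byte */
        int shift;

        if ((s & 0x00ffffffu) == 0)       /* black source pixel: skip it */
            continue;

        for (shift = 0; shift < 24; shift += 8) {
            uint32_t sc = (s >> shift) & 0xffu;
            uint32_t tc = (t >> shift) & 0xffu;
            /* Weighted average; avoids negative intermediates. */
            uint32_t c = (sc * (uint32_t)alpha
                          + tc * (uint32_t)(256 - alpha)) >> 8;
            out |= c << shift;
        }
        target[i] = out;
    }
    return 0;
}
```

On the Delphi side such a function would be declared with the stdcall directive and an external clause naming the DLL. Depending on the C compiler, a __stdcall export may get a decorated name, and a module definition (.def) file is the usual way to export the plain one.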
Hi LifePower,
As I am using Delphi 4 and 5, it looks like I will have to go for the "convert MMX routines into machine code" option, as I do not want to add another DLL into the mix.


Thanks for your suggestions. It looks like I will have to trawl through the Intel site this weekend.


Thanks



Dominique.

Greetz... anyway, just wanted to add that Hori's MMX Expert actually converts MMX instructions into the corresponding machine code. For the conversion you'd have to use either the utility mentioned above or another converter; otherwise, converting your program into machine code by hand takes a lot of time.
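
To give an idea of what such a conversion involves, here is a minimal sketch in C of how a few register-only MMX instructions map to bytes. It illustrates the standard x86 encoding (a 0F escape byte, an opcode byte and a ModRM byte) and is not a description of how Hori's MMX Expert is implemented. The function names are made up for illustration, and memory operands such as [edi] or a variable name are not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a ModRM byte in register-direct form (mod = 11b). */
uint8_t modrm_direct(unsigned reg, unsigned rm)
{
    return (uint8_t)(0xC0u | ((reg & 7u) << 3) | (rm & 7u));
}

/* movq mm<dst>, mm<src>  ->  0F 6F /r */
size_t encode_movq_rr(uint8_t *out, unsigned dst, unsigned src)
{
    out[0] = 0x0F;
    out[1] = 0x6F;
    out[2] = modrm_direct(dst, src);
    return 3;
}

/* psubw mm<dst>, mm<src>  ->  0F F9 /r */
size_t encode_psubw_rr(uint8_t *out, unsigned dst, unsigned src)
{
    out[0] = 0x0F;
    out[1] = 0xF9;
    out[2] = modrm_direct(dst, src);
    return 3;
}

/* Word shifts by an immediate share opcode 0F 71; the ModRM reg field
   selects the operation: /2 = psrlw, /4 = psraw, /6 = psllw. */
enum { SHIFT_PSRLW = 2, SHIFT_PSRAW = 4, SHIFT_PSLLW = 6 };

size_t encode_shiftw_imm(uint8_t *out, unsigned op, unsigned mm, uint8_t count)
{
    out[0] = 0x0F;
    out[1] = 0x71;
    out[2] = modrm_direct(op, mm);
    out[3] = count;
    return 4;
}
```

With these helpers, movq mm0, mm6 comes out as 0F 6F C6 and psrlw mm0, 10 as 0F 71 D0 0A. In a Delphi asm block such bytes are emitted with db, for example db $0F,$6F,$C6.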
Hi LifePower,
How does Hori's Expert work? When I highlight the MMX code and press Ctrl+A, it does not do anything for me in Delphi 4.


Thanks,



Dominique.

Yep, I got Hori's expert to work just fine (Delphi 4, Standard edition). I started up Delphi 4, went to the "Install Packages" menu and added the .bpl for Delphi 4. The menu worked fine. That's not much use to you, but this will be... here's the translation for the asm blocks. I stripped out the C, so you'll have to use the comments to guide you. It's in the same order as your post.

---------START OF CODE

//
// Alpha-blend four pixels at once.
//
asm

//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip555

//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource

do_blend555:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black555
cmp dword ptr [esi + 4], 0
jnz not_black555
jmp next555

not_black555:
//
// Alpha blend four target and source pixels.
//

(* The mmx registers will basically be used in the following way:

mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel

Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)

db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop

db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.

pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.

pand mm1, i64MaskRed // Extract the red source channel.
db $0F,$71,$D0,$0A /// psrlw mm0, 10 // Shift down the red target channel.

db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.
db $0F,$71,$D1,$0A /// psrlw mm1, 10 // Shift down the red source channel.

db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.

pmullw mm1, i64Alpha // Multiply the red result with alpha.
nop

pand mm2, i64MaskGreen // Extract the green target channel.
nop

pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.

db $0F,$71,$D2,$05 /// psrlw mm2, 5 // Shift down the green target channel.
db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.

db $0F,$71,$F1,$0A /// psllw mm1, 10 // Shift up the red source again.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.

db $0F,$71,$D3,$05 /// psrlw mm3, 5 // Shift down the green source channel.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.

pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.

pand mm5, i64MaskBlue // Extract the blue source channel.
pmullw mm3, i64Alpha // Multiply the green result with alpha.

db $0F,$F9,$EC /// psubw mm5, mm4 // Calculate blue source minus blue target.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.

pmullw mm5, i64Alpha // Multiply the blue result with alpha.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.

db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.

db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
db $0F,$71,$F3,$05 /// psllw mm3, 5 // Shift up the green source again.

db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.

db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.
db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.

db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.

db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.

next555:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8

//
// Loop again or break.
//
dec ecx
jnz do_blend555

//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms

skip555:
pop ecx

end;

//
// Alpha-blend four pixels at once.
//
asm

//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip565

//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource

do_blend565:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black565
cmp dword ptr [esi + 4], 0
jnz not_black565
jmp next565

not_black565:
//
// Alpha blend four target and source pixels.
//

(* The mmx registers will basically be used in the following way:

mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel

Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)

db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop

db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.

pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.

pand mm1, i64MaskRed // Extract the red source channel.
db $0F,$71,$D0,$0B /// psrlw mm0, 11 // Shift down the red target channel.

db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.
db $0F,$71,$D1,$0B /// psrlw mm1, 11 // Shift down the red source channel.

db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.

pmullw mm1, i64Alpha // Multiply the red result with alpha.
nop

pand mm2, i64MaskGreen // Extract the green target channel.
nop

pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.

db $0F,$71,$D2,$05 /// psrlw mm2, 5 // Shift down the green target channel.
db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.

db $0F,$71,$F1,$0B /// psllw mm1, 11 // Shift up the red source again.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.

db $0F,$71,$D3,$05 /// psrlw mm3, 5 // Shift down the green source channel.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.

pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.

pand mm5, i64MaskBlue // Extract the blue source channel.
pmullw mm3, i64Alpha // Multiply the green result with alpha.

db $0F,$F9,$EC /// psubw mm5, mm4 // Calculate blue source minus blue target.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.

pmullw mm5, i64Alpha // Multiply the blue result with alpha.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.

db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.

db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
db $0F,$71,$F3,$05 /// psllw mm3, 5 // Shift up the green source again.

db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.

db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.
db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.

db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.

db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.

next565:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8

//
// Loop again or break.
//
dec ecx
jnz do_blend565

//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms

skip565:
pop ecx
end;

//
// Alpha-blend four pixels at once.
//
asm

//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip16

//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource

do_blend16:
//
// Skip these four pixels if they are all black.
//
cmp dword ptr [esi], 0
jnz not_black16
cmp dword ptr [esi + 4], 0
jnz not_black16
jmp next16

not_black16:
//
// Alpha blend four target and source pixels.
//

(* The mmx registers will basically be used in the following way:

mm0: red target value
mm1: red source value
mm2: green target value
mm3: green source value
mm4: blue target value
mm5: blue source value
mm6: original target pixel
mm7: original source pixel

Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)

db $0F,$6F,$37 /// movq mm6, [edi] // Load the original target pixel.
nop

db $0F,$6F,$3E /// movq mm7, [esi] // Load the original source pixel.
db $0F,$6F,$C6 /// movq mm0, mm6 // Load the register for the red target.

pand mm0, i64MaskRed // Extract the red target channel.
db $0F,$6F,$CF /// movq mm1, mm7 // Load the register for the red source.

pand mm1, i64MaskRed // Extract the red source channel.
nop

psrlw mm0, i64RdShift // Shift down the red target channel.
db $0F,$6F,$D6 /// movq mm2, mm6 // Load the register for the green target.

psrlw mm1, i64RdShift // Shift down the red source channel.
db $0F,$6F,$DF /// movq mm3, mm7 // Load the register for the green source.

pand mm2, i64MaskGreen // Extract the green target channel.
db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate red source minus red target.

pmullw mm1, i64Alpha // Multiply the red result with alpha.
db $0F,$6F,$EF /// movq mm5, mm7 // Load the register for the blue source.

pand mm3, i64MaskGreen // Extract the green source channel.
db $0F,$6F,$E6 /// movq mm4, mm6 // Load the register for the blue target.

db $0F,$71,$E1,$08 /// psraw mm1, 8 // Divide the red result by 256.
nop

db $0F,$FD,$C8 /// paddw mm1, mm0 // Add the red target to the red result.
nop

psllw mm1, i64RdShift // Shift up the red source again.
db $0F,$EF,$C0 /// pxor mm0, mm0 // Create black as the color key.

psrlw mm2, i64GrShift // Shift down the green target channel.
db $0F,$75,$C7 /// pcmpeqw mm0, mm7 // Create a color key mask.

psrlw mm3, i64GrShift // Shift down the green source channel.
db $0F,$DB,$F0 /// pand mm6, mm0 // Keep old target where the color key applies.

pand mm4, i64MaskBlue // Extract the blue target channel.
db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate green source minus green target.

pmullw mm3, i64Alpha // Multiply the green result with alpha.
nop

psrlw mm4, i64BlShift // Shift down the blue target channel.
nop

pand mm5, i64MaskBlue // Extract the blue source channel.
db $0F,$71,$E3,$08 /// psraw mm3, 8 // Divide the green result by 256.

db $0F,$FD,$DA /// paddw mm3, mm2 // Add the green target to the green result.
nop

psllw mm3, i64GrShift // Shift up the green source again.
db $0F,$EB,$CB /// por mm1, mm3 // Combine the new red and green values.

psrlw mm5, i64BlShift // Shift down the blue source channel.
nop

db $0F,$F9,$EC /// psubw mm5, mm4 // Calculate blue source minus blue target.
nop

pmullw mm5, i64Alpha // Multiply the blue result with alpha.
nop

nop
nop

db $0F,$71,$E5,$08 /// psraw mm5, 8 // Divide the blue result by 256.
nop

db $0F,$FD,$EC /// paddw mm5, mm4 // Add the blue target to the blue result.
nop

psllw mm5, i64BlShift // Shift up the blue source again.
nop

db $0F,$EB,$CD /// por mm1, mm5 // Combine new blue value with the others.
nop

db $0F,$DF,$C1 /// pandn mm0, mm1 // Keep new target where no color key applies.
nop

db $0F,$EB,$F0 /// por mm6, mm0 // Assemble new target value.
nop

db $0F,$7F,$37 /// movq [edi], mm6 // Write back new target value.

next16:
//
// Advance to the next four pixels.
//
add edi, 8
add esi, 8

//
// Loop again or break.
//
dec ecx
jnz do_blend16

//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop esi
pop edi
db $0F,$77 /// emms

skip16:
pop ecx
end;
//
// Alpha blend the pixels in the current row.
//
asm

// Reset the width counter.
push ecx
mov ecx, iWidth

//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource

// Load the mask into an mmx register.
movq mm3, i64Mask

// Load the alpha value into an mmx register.
movq mm5, i64Alpha

// Clear an mmx register to facilitate unpacking.
db $0F,$EF,$F6 /// pxor mm6, mm6

push eax

do_blend24:
//
// Skip this pixel if it is black.
//
mov eax, [esi]
test eax, 00ffffffh // Do not 'and' so that the high order byte is kept.
jnz not_black24
jmp next24

not_black24:
//
// Get a target and a source pixel.
//

(* The mmx registers will basically be used in the following way:

mm0: target value
mm1: source value
mm2: working register
mm3: mask ( 0x00ffffff )
mm4: working register
mm5: alpha value
mm6: zero for unpacking
mm7: original target

Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)

db $0F,$6E,$07 /// movd mm0, [edi] // Load the target pixel.
db $0F,$6F,$E3 /// movq mm4, mm3 // Reload the mask ( 0x00ffffff ).

db $0F,$6E,$C8 /// movd mm1, eax // Load the source pixel.
db $0F,$6F,$F8 /// movq mm7, mm0 // Save the target pixel.

db $0F,$60,$C6 /// punpcklbw mm0, mm6 // Unpack the target pixel.
db $0F,$60,$CE /// punpcklbw mm1, mm6 // Unpack the source pixel.

db $0F,$6F,$D0 /// movq mm2, mm0 // Save the unpacked target values.
nop

db $0F,$D5,$C5 /// pmullw mm0, mm5 // Multiply the target with the alpha value.
nop

db $0F,$D5,$CD /// pmullw mm1, mm5 // Multiply the source with the alpha value.
nop

db $0F,$71,$D0,$08 /// psrlw mm0, 8 // Divide the target by 256.
nop

db $0F,$71,$D1,$08 /// psrlw mm1, 8 // Divide the source by 256.
nop

db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate the source minus target.
nop

db $0F,$FD,$D1 /// paddw mm2, mm1 // Add former target value to the result.
nop

db $0F,$67,$D2 /// packuswb mm2, mm2 // Pack the new target.
nop

db $0F,$DB,$D4 /// pand mm2, mm4 // Mask off unwanted bytes.
nop

db $0F,$DF,$E7 /// pandn mm4, mm7 // Get the high order byte we must keep.
nop

db $0F,$EB,$D4 /// por mm2, mm4 // Assemble the value to write back.
nop

db $0F,$7E,$17 /// movd [edi], mm2 // Write back the new value.

next24:
//
// Advance to the next pixel.
//
add edi, 3
add esi, 3

//
// Loop again or break.
//
dec ecx
jnz do_blend24

//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop eax
pop esi
pop edi
db $0F,$77 /// emms

pop ecx

end;

//
// Alpha blend two pixels at once.
//
asm

//
// Initialize the counter and skip
// if the latter is equal to zero.
//
push ecx
mov ecx, i
cmp ecx, 0
jz skip32

//
// Load the frame buffer pointers into the registers.
//
push edi
push esi
mov edi, lpbTarget
mov esi, lpbSource

// Load the alpha value into an mmx register.
movq mm5, i64Alpha

push eax

do_blend32:
//
// Skip these two pixels if they are both black.
//
mov eax, [esi]
test eax, 00ffffffh
jnz not_black32
mov eax, [esi + 4]
test eax, 00ffffffh
jnz not_black32
jmp next32

not_black32:
//
// Alpha blend two target and two source pixels.
//

(* The mmx registers will basically be used in the following way:

mm0: target pixel one
mm1: source pixel one
mm2: target pixel two
mm3: source pixel two
mm4: working register
mm5: alpha value
mm6: original target
mm7: original source

Note: Two lines together are assumed to pair
in the processor's U- and V-pipes. *)

db $0F,$6F,$37 /// movq mm6, [edi] // Load the target pixels.
db $0F,$EF,$E4 /// pxor mm4, mm4 // Clear mm4 so we can unpack easily.

db $0F,$6F,$3E /// movq mm7, [esi] // Load the source pixels.
db $0F,$6F,$C6 /// movq mm0, mm6 // Create copy one of the target.

db $0F,$6F,$D6 /// movq mm2, mm6 // Create copy two of the target.
db $0F,$60,$C4 /// punpcklbw mm0, mm4 // Unpack the first target copy.

db $0F,$6F,$CF /// movq mm1, mm7 // Create copy one of the source.
db $0F,$73,$D2,$20 /// psrlq mm2, 32 // Move the high order dword of target two.

db $0F,$60,$D4 /// punpcklbw mm2, mm4 // Unpack the second target copy.
db $0F,$6F,$DF /// movq mm3, mm7 // Create copy two of the source.

db $0F,$73,$D3,$20 /// psrlq mm3, 32 // Move the high order dword of source two.
db $0F,$60,$CC /// punpcklbw mm1, mm4 // Unpack the first source copy.

db $0F,$60,$DC /// punpcklbw mm3, mm4 // Unpack the second source copy.
db $0F,$72,$F7,$08 /// pslld mm7, 8 // Shift away original source highest bytes.

db $0F,$6F,$E0 /// movq mm4, mm0 // Save target one.
db $0F,$D5,$C5 /// pmullw mm0, mm5 // Multiply target one with alpha.

db $0F,$D5,$CD /// pmullw mm1, mm5 // Multiply source one with alpha.
db $0F,$72,$D7,$08 /// psrld mm7, 8 // Complete high order byte clearance.

db $0F,$71,$D1,$08 /// psrlw mm1, 8 // Divide source one by 256.
nop

db $0F,$71,$D0,$08 /// psrlw mm0, 8 // Divide target one by 256.
nop

db $0F,$F9,$C8 /// psubw mm1, mm0 // Calculate source one minus target one.
nop

db $0F,$FD,$CC /// paddw mm1, mm4 // Add the former target one to the result.
nop

db $0F,$6F,$E2 /// movq mm4, mm2 // Save target two.
db $0F,$D5,$D5 /// pmullw mm2, mm5 // Multiply target two with alpha.

db $0F,$D5,$DD /// pmullw mm3, mm5 // Multiply source two with alpha.
nop

db $0F,$71,$D2,$08 /// psrlw mm2, 8 // Divide target two by 256.
nop

db $0F,$71,$D3,$08 /// psrlw mm3, 8 // Divide source two by 256.
nop

db $0F,$F9,$DA /// psubw mm3, mm2 // Calculate source two minus target two.
nop

db $0F,$FD,$DC /// paddw mm3, mm4 // Add the former target two to the result.
db $0F,$EF,$E4 /// pxor mm4, mm4 // Clear mm4 so we can pack easily.

db $0F,$67,$CC /// packuswb mm1, mm4 // Pack the new target one.
db $0F,$67,$DC /// packuswb mm3, mm4 // Pack the new target two.

db $0F,$73,$F3,$20 /// psllq mm3, 32 // Shift up the new target two.
db $0F,$76,$E7 /// pcmpeqd mm4, mm7 // Create a color key mask.

db $0F,$EB,$CB /// por mm1, mm3 // Combine the new targets.
db $0F,$DB,$F4 /// pand mm6, mm4 // Keep old target where color key applies.

db $0F,$DF,$E1 /// pandn mm4, mm1 // Clear new target where color key applies.
nop

db $0F,$EB,$F4 /// por mm6, mm4 // Assemble the new target value.
nop

db $0F,$7F,$37 /// movq [edi], mm6 // Write back the new target value.

next32:
//
// Advance to the next two pixels.
//
add edi, 8
add esi, 8

//
// Loop again or break.
//
dec ecx
jnz do_blend32

//
// Write back the frame buffer pointers and clean up.
//
mov lpbTarget, edi
mov lpbSource, esi
pop eax
pop esi
pop edi
db $0F,$77 /// emms

skip32:
pop ecx
end;
---------END OF CODE
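For the 32-bit loop, the arithmetic in the listing is slightly different from the 16-bit loops: source and target are each multiplied by alpha and divided by 256 before the subtraction, so the result can be off by one compared with multiplying the difference. Here is a scalar sketch of that per-pixel computation. The name `blend32` is made up, and the sketch assumes `i64Alpha` holds the same alpha in all four 16-bit words, which this listing does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for one 32-bit pixel, mirroring the listing:
   each byte becomes t + (s * alpha >> 8) - (t * alpha >> 8).
   The color key test ignores the high order byte of the source. */
uint32_t blend32(uint32_t tgt, uint32_t src, int alpha)
{
    uint32_t out = 0;
    int shift;

    assert(alpha >= 0 && alpha <= 256);

    /* Black in the low 24 bits means "keep the old target". */
    if ((src & 0x00FFFFFFu) == 0)
        return tgt;

    /* All four bytes are blended, including the high order byte,
       because the asm unpacks the whole dword. */
    for (shift = 0; shift < 32; shift += 8) {
        int t = (int)((tgt >> shift) & 0xFFu);
        int s = (int)((src >> shift) & 0xFFu);
        int v = t + ((s * alpha) >> 8) - ((t * alpha) >> 8);
        assert(v >= 0 && v <= 255);
        out |= (uint32_t)v << shift;
    }
    return out;
}
```

The 24-bit loop does the same per byte but handles one pixel at a time and writes the target's own fourth byte back unchanged, since that byte belongs to the next pixel.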

Word out,
Alimonster


There are no stupid questions, but there are a lot of inquisitive idiots.
Looks like some of the MMX instructions weren't converted by the expert... oh well, at least you've got about half the work done now. I think anything beginning with 'p' (e.g. 'pand') is an MMX instruction (movq, movd and emms are MMX instructions too), and mm* are the MMX registers.

EDIT - except push and pop, of course! Also, I've noticed that some of the instructions *have* been converted in one place, but not elsewhere. It seems the expert can only deal with some instructions when their data are the MMX registers - for example, search the code above for "// pand" and you'll see it got converted sometimes.
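The lines left unconverted all take a local variable as their second operand, for example `pand mm0, i64MaskRed`, and the bytes for that form depend on where the variable lives. One possible workaround, which is an assumption on my part and not something tested in this thread, is to load the variable's address into a free general register first (e.g. with `lea`) and hand-encode the register-indirect form, whose ModRM byte has mod = 00. The helper name `encode_mmx_rm` below is made up for the illustration.

```c
#include <assert.h>

/* 32-bit register numbers as used in the ModRM byte. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3,
       REG_ESP = 4, REG_EBP = 5, REG_ESI = 6, REG_EDI = 7 };

/* Opcode bytes (after 0F) for the "mm, mm/m64" forms seen in the listing. */
enum { OP_MOVQ_LOAD = 0x6F, OP_PAND = 0xDB, OP_PMULLW = 0xD5,
       OP_PSRLW = 0xD1, OP_PSLLW = 0xF1 };

/* Encode "op mmN, [base]" as 0F <opcode> <ModRM> with ModRM = 00 nnn bbb.
   Returns 0 on success, -1 if the base register needs a longer encoding. */
int encode_mmx_rm(unsigned char opcode, int mm, int base,
                  unsigned char out[3])
{
    assert(mm >= 0 && mm <= 7);

    /* With mod = 00, base 4 (esp) selects a SIB byte and base 5 (ebp)
       selects a 32-bit displacement, so neither fits in three bytes. */
    if (base < 0 || base > 7 || base == REG_ESP || base == REG_EBP)
        return -1;

    out[0] = 0x0F;
    out[1] = opcode;
    out[2] = (unsigned char)((mm << 3) | base);
    return 0;
}
```

With `OP_MOVQ_LOAD`, mm6 and `REG_EDI` this gives 0F 6F 37, matching the `movq mm6, [edi]` line in the listing. `pand mm0, [eax]` would come out as 0F DB 00.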

Alimonster


Edited by - Alimonster on January 18, 2002 10:52:56 AM
Hi Alimonster,
Thanks for the conversion, I will see if I can get it to work.


Thanks again.


Dominique.


Edited by - savage on January 18, 2002 11:51:52 AM
