


Jack C.

ASM blitting with color key question


Hey all, I've been thinking about implementing my own blit routines in assembly. Since I'm using 8-bit color, I figure the fastest way to perform the blits would be to pack every four bytes into a DWORD and then write that to the surface. But what should I do about source color keying? Just skipping the transparent bytes when I pack would throw off the alignment, and the extra code needed to set things straight (i.e., handling dangling bytes, or packing together bytes that aren't next to each other in the bitmap and then splitting them up later to put them back in their proper places) might outweigh the benefits.

Sorry for bringing this question here, but I've yet to find any asm blit source code that takes this into account (most of what I've seen is basically a 32-bit version of memcpy). Am I stuck using two for loops that check for transparency and go pixel by pixel (yuck)? Thanks
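For reference, the pixel-by-pixel approach the question hopes to avoid looks roughly like this in C. It is a minimal sketch: the function name, the parameter order and the byte-based pitch are assumptions for illustration, not taken from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a w x h rectangle of 8-bit pixels from src to
   dst, skipping every source pixel equal to the color key.
   Pitches are in bytes (one byte per pixel in 8-bit color). */
void blit_colorkey_8bpp(uint8_t *dst, size_t dst_pitch,
                        const uint8_t *src, size_t src_pitch,
                        size_t w, size_t h, uint8_t key)
{
    for (size_t y = 0; y < h; y++) {
        const uint8_t *s = src + y * src_pitch;
        uint8_t *d = dst + y * dst_pitch;
        for (size_t x = 0; x < w; x++) {
            if (s[x] != key)      /* one compare and branch per pixel */
                d[x] = s[x];
        }
    }
}
```

The per-pixel branch is what makes this slow on older CPUs: a sprite edge with alternating opaque and transparent pixels causes frequent mispredictions.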

Try using the MMX instructions; then you can copy 8 bytes at a time. When you first create your image, create a buffer of equal size for your mask. Then compare every byte in your image with the color key (use PCMPEQB). If it is the same, set that byte of your mask to 0xFF, else to 0x00. Also, if the byte is equal, set it to 0x00 in your image.
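The mask-building step can be sketched in C as below. This is a scalar illustration with a hypothetical function name; an actual MMX version would do the comparison 8 bytes at a time with PCMPEQB (exposed in C as the `_mm_cmpeq_pi8` intrinsic).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical load-time step for an 8-bit image of n pixels.
   mask[i] = 0xFF where the pixel equals the color key (transparent),
   0x00 elsewhere. Transparent pixels in the image are zeroed so that a
   later OR with the destination leaves the destination unchanged there. */
void build_colorkey_mask(uint8_t *img, uint8_t *mask, size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++) {
        if (img[i] == key) {
            mask[i] = 0xFF;
            img[i] = 0x00;
        } else {
            mask[i] = 0x00;
        }
    }
}
```

Since this runs once when the image is created, its speed does not matter much; all the work saved is in the blit.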

Then, when you are blitting, use PAND to AND your destination with the mask, use POR to OR that with the source, and then store the result back in the destination.

Even though there are 4 memory accesses per 8 pixels (reading the destination, mask and source, then writing the destination), it should be very fast since it works on 8 pixels at a time without any if statements.
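A portable sketch of the blit step, with a `uint64_t` standing in for an MMX register so that one AND/OR pair handles 8 pixels. The function name is hypothetical, and it assumes the source and mask were prepared as described above (transparent source bytes zeroed, mask 0xFF there). MMX code would use PAND/POR (`_mm_and_si64`/`_mm_or_si64`) and end with EMMS (`_mm_empty`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical blit for one row of n 8-bit pixels:
   dst = (dst AND mask) OR src, 8 pixels per step.
   memcpy avoids unaligned pointer casts; compilers turn it into
   plain 8-byte loads and stores. */
void blit_masked_row(uint8_t *dst, const uint8_t *src,
                     const uint8_t *mask, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t d, s, m;
        memcpy(&d, dst + i, 8);
        memcpy(&s, src + i, 8);
        memcpy(&m, mask + i, 8);
        d = (d & m) | s;               /* PAND, then POR */
        memcpy(dst + i, &d, 8);
    }
    for (; i < n; i++)                 /* leftover 0..7 pixels at the row end */
        dst[i] = (uint8_t)((dst[i] & mask[i]) | src[i]);
}
```

Where the mask byte is 0xFF the destination survives and the zeroed source adds nothing; where it is 0x00 the destination is cleared and the source byte replaces it, so no branch is needed.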

OK, here are some examples (all of them work in 16-bit color mode):

x86 code:

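mov edi,dest        ; destination surface
mov esi,source1     ; sprite, color key = 0
mov ecx,amount      ; number of 16-bit pixels (must be > 0)
mov ebx,source2     ; background
lea esi,[esi+ecx*2] ; point all three buffers one pixel past the end
lea edi,[edi+ecx*2]
lea ebx,[ebx+ecx*2]
xor ecx,-1          ; together with inc ecx: ecx = -amount, counts up to 0
xor eax,eax         ; upper halves of eax/edx must start out zero
inc ecx
xor edx,edx
push ebp            ; ebp is borrowed as a scratch register
@looper:
; little unoptimized, but faster than compare/jump
mov ax,[esi+ecx*2]  ; ax = sprite pixel
mov dx,[ebx+ecx*2]  ; dx = background pixel
mov ebp,eax         ; keep a copy of the sprite pixel
cmp eax,1           ; CF = 1 only if the sprite pixel is 0 (transparent)
sbb eax,eax         ; eax = 0FFFFFFFFh if transparent, else 0
xor edx,ebp         ; edx = background xor sprite
and eax,edx         ; keep that difference only where transparent
xor eax,ebp         ; eax = transparent ? background : sprite
mov [edi+ecx*2],ax
inc ecx
jnz @looper

pop ebp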
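The same branchless select written in C may make the cmp/sbb trick easier to follow. This is an illustrative sketch: the function name is hypothetical, and like the assembly it assumes the color key is 0 and that `amount` counts 16-bit pixels.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of the loop above:
   dest = (sprite == 0) ? back : sprite, computed without a branch. */
void blit3_key0_16bpp(uint16_t *dest, const uint16_t *sprite,
                      const uint16_t *back, size_t amount)
{
    for (size_t i = 0; i < amount; i++) {
        uint32_t s = sprite[i];
        uint32_t b = back[i];
        /* all ones if transparent, else 0 (like cmp eax,1 / sbb eax,eax) */
        uint32_t mask = 0u - (uint32_t)(s == 0);
        /* (b ^ s) ^ s == b when masked in, plain s otherwise */
        dest[i] = (uint16_t)(((b ^ s) & mask) ^ s);
    }
}
```

Modern compilers often produce a similar branch-free sequence (or a conditional move) from the plain `s == 0 ? b : s` form as well.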

MMX code (can be enhanced by processing more data in the free registers at the same time):

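; esi = source (color key = 0), edi = destination,
; ecx = number of 16-bit pixels / 2 (two pixels per iteration)
@looper:
movd mm0, [esi]     ; two source pixels
pxor mm7, mm7

movd mm1, [edi]     ; two destination pixels
pcmpeqw mm7, mm0    ; 0FFFFh in each lane whose source pixel is 0

add esi,4
pand mm7, mm1       ; keep the destination where the source is transparent

dec ecx
por mm7, mm0        ; add the opaque source pixels (transparent ones are 0)

movd [edi], mm7

lea edi,[edi+4]     ; lea leaves the flags from dec untouched
jnz @looper

emms                ; leave MMX state before any FPU code runs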
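Without an MMX toolchain, the same two-pixels-per-iteration loop can be approximated in portable C. Each 16-bit pixel needs its own lane mask (what PCMPEQW gives); a 32-bit compare would only flag a pair of pixels when both are transparent. The function names are hypothetical, and the lane mask uses a standard SWAR zero-detect, swapped in here because plain C has no packed compare.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 0xFFFF in every 16-bit lane of x that is zero, 0x0000 elsewhere. */
static uint32_t zero_lanes_mask16(uint32_t x)
{
    /* Bit 15 of each lane ends up set iff that lane is nonzero; the add
       cannot carry across lanes because 0x7FFF + 0x7FFF < 0x10000. */
    uint32_t nz = (((x & 0x7FFF7FFFu) + 0x7FFF7FFFu) | x) & 0x80008000u;
    /* Spread each flag bit over its whole lane, then invert. */
    return ~((nz >> 15) * 0xFFFFu);
}

/* Hypothetical blit of n 16-bit pixels, color key 0, two pixels per step. */
void blit_key0_16bpp(uint16_t *dst, const uint16_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        uint32_t s, d;
        memcpy(&s, src + i, 4);               /* movd mm0, [esi] */
        memcpy(&d, dst + i, 4);               /* movd mm1, [edi] */
        d = (d & zero_lanes_mask16(s)) | s;   /* pcmpeqw, pand, por */
        memcpy(dst + i, &d, 4);               /* movd [edi], mm7 */
    }
    if (i < n && src[i] != 0)                 /* odd pixel left over */
        dst[i] = src[i];
}
```

Because every step works lane by lane, the result does not depend on the byte order in which the two pixels land in the 32-bit word.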


