

DDraw blitter; suggestions?


I've written my own blitter that uses DirectDraw to create surfaces and the like. It stores each sprite in a dynamic array holding the sprite's size followed by the actual sprite data (32-bit RGB pixel data). The blitter itself works fine, though my benchmark program rates it as slower than BltFast. Since I'm not including all the bells and whistles BltFast has (such as RECTs for the parameters), is it possible to make my blitter faster than BltFast? Here's my code...
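void DrawSprite(int *pSurface, long Pitch, int *pSprite, int XPos, int YPos)
{
	int  MyPos = 3;                  // pixel data starts at element 3

	Pitch     = Pitch >> 2;          // pitch in bytes -> pitch in 32-bit pixels
	int *MySurface = pSurface + (Pitch * YPos) + XPos;

	for(register int j = 0; j < pSprite[2]; j++)
	{
		for(register int i = 0; i < pSprite[1]; i++)
		{
			// skip pixels that match the 0xFF00FF colour key
			if((pSprite[MyPos] & 0xFFFFFF) != 0xFF00FF) memcpy(MySurface, &pSprite[MyPos], 4);
			MyPos    ++;
			MySurface ++;
		}
		MySurface += Pitch - pSprite[1];   // move to the start of the next row
	}
}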
It takes a pointer to the surface it's drawing to and the pitch (so I can retrieve those once during my game's screen redraw event instead of every time I write a sprite to the backbuffer), a pointer to the array in which the sprite is stored (elements 1 and 2 are the X and Y size, respectively, and the rest are 32-bit integer RGB values), and the X and Y position to blit to.

Also, is there any way to reduce the number of parameters without sacrificing speed? I know I could easily pass an LPDIRECTDRAWSURFACE7 and lock the surface in the function, but that would be horribly slow for multiple sprites.

Comments would be appreciated. I'm pretty sure there are a bunch of ways to speed this up. For instance, the test for transparency munches speed a LOT; if I remove it I get speeds up to 40-45% faster than BltFast. Simply memcpy'ing row by row makes it blazing fast, but I can't do that since it won't test for transparency. For the record, 0xFF00FF is the value all my sprites use for transparency. Thanks!

Edit: Could I do this in inline assembly (well, duh, the compiler converts this to asm anyway when it compiles)? If so, how would I go about doing it? I don't have much experience with assembly language, though I'm willing to learn. Will it make any difference in terms of speed?

[edited by - RuneLancer on November 1, 2002 11:56:27 PM]

Guest Anonymous Poster
What do you mean by "bells and whistles"? BltFast is hardly different from Blt in its use, and in my opinion easier, as you don't need to set up a clipper.
You will never get Blt to be faster than BltFast, unless you have a hardware-accelerated video card, which will give them equal performance (both functions will use HW/A when present).

Hmm, while input is appreciated, you missed the point, oh anonymous one. ^^;

See, I'm writing my own blitter, not trying to make Blt faster than BltFast. I've already gotten my own blitter to perform considerably faster than BltFast if I don't check for transparency, but if I throw in the test for it the way I do, the only way I can perform this is by checking pixel by pixel, which is about 40-60% slower than BltFast.

The reason I'm trying to create my own blitter is to be able to add transparency (i.e., the 25%, 50%, etc. transparencies, not the color-keying type) and tinting (say, making a character glow green when poisoned in an RPG), and maybe some other effects too, such as rotations and skewing. While BltFast is a wonderful way to use 2D in a game, it falls short of some of the things I'd like to do.

So, come to think of it, what I'm looking for right now is a way to optimize that transparency check. With it out of the way I can get some pretty nifty speeds, but then the blitter would be of no use for anything other than drawing tiles.

You might try replacing the memcpy with just a straight "set" of the appropriate *MySurface int. (which I would make unsigned btw, along with pSprite.) E.g.,

if((pSprite...) *MySurface = ...;

I've done GDI stuff like this. All of my code always used while loops instead of fors. I don't know if this would be faster or not in terms of what the compiler would do with it.
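To make the suggestion above concrete, here is a minimal sketch of the same colour-keyed loop with a plain store in place of memcpy and unsigned 32-bit pointers. The function name DrawSpriteKeyed is hypothetical; the sprite layout (width in element 1, height in element 2, pixels from element 3) and the 0xFF00FF key are taken from the first post. It does no clipping, and whether it actually beats the memcpy version depends on the compiler, so treat it as something to benchmark rather than a guaranteed win.

```cpp
#include <cstdint>

// Colour-keyed blit using a direct 32-bit store (no memcpy call per pixel).
// pitchBytes is the surface pitch in bytes, as obtained when locking the surface.
// Assumes the sprite lies entirely inside the surface (no clipping).
void DrawSpriteKeyed(std::uint32_t *surface, long pitchBytes,
                     const std::uint32_t *sprite, int xPos, int yPos)
{
	const std::uint32_t kColorKey = 0xFF00FF;          // magenta, per the original post
	const int  width  = static_cast<int>(sprite[1]);
	const int  height = static_cast<int>(sprite[2]);
	const long pitch  = pitchBytes >> 2;               // pitch in 32-bit pixels

	const std::uint32_t *src = sprite + 3;             // first pixel of the sprite
	std::uint32_t *dstRow = surface + pitch * yPos + xPos;

	for (int y = 0; y < height; ++y) {
		std::uint32_t *dst = dstRow;
		for (int x = 0; x < width; ++x, ++src, ++dst) {
			const std::uint32_t pixel = *src;
			if ((pixel & 0xFFFFFF) != kColorKey)
				*dst = pixel;                          // plain store of one pixel
		}
		dstRow += pitch;                               // next surface row
	}
}
```

The call site stays the same as for the original DrawSprite: lock the back buffer once per frame, then pass the locked pointer and pitch for every sprite.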

Re: inline asm, yes that "could" speed things up. I did a DIB paste function that way and it did speed things up. But for the project I was working on it wasn't sufficiently faster to warrant doing the rest of the functions in asm, so I just did the one. But, maybe it could serve as an example? Here it is:

void DibRectPaste( HDIB pDstDib, HDIB pSrcDib, int x, int y )
{
	// Clip Rect
	int ipx = (x >= 0) ? x : 0;
	int ipy = (y >= 0) ? y : 0;
	int idx = ((x + pSrcDib-> < pDstDib-> ? pSrcDib-> : pDstDib-> - x;
	int idy = ((y + pSrcDib-> < pDstDib-> ? pSrcDib-> : pDstDib-> - y;

	idx = (x >= 0) ? idx : idx + x;
	idy = (y >= 0) ? idy : idy + y;

	// Return if nothing to do
	if( (idx <= 0) || (idy <= 0) ) return;

	// Prepare buffer addresses
	COLORREF *src = pSrcDib->m_pBits + ((ipy - y)*pSrcDib-> + ipx - x;
	COLORREF *dst = pDstDib->m_pBits + (ipy*pDstDib-> + ipx;

	int iws = pSrcDib->;
	int iwd = pDstDib->;

#ifndef _WINDOWS
	// Portable version: copy one scan line at a time
	while( idy-- ) {
		for( int i = 0; i < idx; i++ ) {
			dst[i] = src[i];
		}
		src += iws;
		dst += iwd;
	}
#else
	__asm {
		cld                 ; upward direction
		mov ecx, idy        ; pre-load # scan lines
		mov ebx, iws        ; pre-load source scan line width
		shl ebx, 2          ; source scan line width in bytes
		mov edx, iwd        ; pre-load destination scan line width
		shl edx, 2          ; destination scan line width in bytes
		mov esi, src        ; pre-load source address
		mov edi, dst        ; pre-load destination address
	yloop:	push ecx            ; save # scan lines left to process
		push esi            ; save current source address
		push edi            ; save current destination address
		mov ecx, idx        ; # dwords / scan line
		rep movsd           ; move 'em from source to destination
		pop edi             ; restore current destination address
		pop esi             ; restore current source address
		pop ecx             ; restore # scan lines left to process
		add esi, ebx        ; point to next source scan line
		add edi, edx        ; point to next destination scan line
		loop yloop          ; loop until all scan lines processed
	}
#endif
}

Errmm.... You're kind of missing the point.

BltFast() is performed in hardware on the graphics card, as AP said. So the card can be blitting while the CPU is doing something completely different, so in effect the cost of the Blt() to the CPU is practically nothing.

The other point is that reading stuff from the graphics card and placing it in a register is painfully slow; the graphics card doesn't need to do this. This would explain the dramatic slow-down when you introduce transparency.

From what you say, I think the best thing for you to do would be to write it using the 3D libraries; you can then perform partial transparency in hardware.


Just being a*al retentive here, but there is no such thing as "partial transparency".
That would be translucency, and there are plenty of articles here on Gamedev on alpha blending in software, so read up on them.
I've just today finished my alpha blend blitter, and I can blit a translucent bitmap over the background (both 640*480 16-bit) and still maintain 200+ fps, all in software. So as long as you figure out how to do it, it is fully possible to do it in 2D software. But if you want the fast, easy way, then do it all in D3D as previously stated.
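For readers who want a starting point, here is a rough sketch of constant-alpha blending in software. It is not this poster's blitter (which works on 16-bit surfaces and was not shown); it assumes 32-bit XRGB pixels to match the sprite format from the first post, and the names BlendPixel and BlendRow are made up for illustration. Red and blue are blended together in one multiply, green in a second one.

```cpp
#include <cstdint>

// Blend one XRGB8888 pixel over another.
// alpha ranges from 0 (keep dst) to 256 (fully src); 128 is a 50% blend.
// The top byte of the result is always zero.
inline std::uint32_t BlendPixel(std::uint32_t src, std::uint32_t dst, unsigned alpha)
{
	// Red and blue sit 16 bits apart, so both fit in one 32-bit multiply
	// without overflowing into each other; green is handled separately.
	const std::uint32_t srcRB = src & 0xFF00FF, dstRB = dst & 0xFF00FF;
	const std::uint32_t srcG  = src & 0x00FF00, dstG  = dst & 0x00FF00;

	const std::uint32_t rb = (srcRB * alpha + dstRB * (256 - alpha)) >> 8;
	const std::uint32_t g  = (srcG  * alpha + dstG  * (256 - alpha)) >> 8;

	return (rb & 0xFF00FF) | (g & 0x00FF00);
}

// Blend a row of source pixels over a row of destination pixels in place.
void BlendRow(const std::uint32_t *src, std::uint32_t *dst, int count, unsigned alpha)
{
	for (int i = 0; i < count; ++i)
		dst[i] = BlendPixel(src[i], dst[i], alpha);
}
```

Because every blended pixel needs a read of the destination, this is best done on a back buffer in system memory; reading from video memory is the slow path mentioned earlier in the thread.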

Ooh, Yay. ATI made an ad that made the best and most informative forum on the planet totally horrible to browse, and it even crashes IE every three minutes!

So should I buy a Radeon? naaah!

All right, I'd like to clarify something about my question.

I was asking for a solution and not a lecture. I'm aware that reading from graphics memory and the like is slow, since I'm not using graphics acceleration with my blitter, unlike Blt() and BltFast(), but that doesn't matter. That's beside the point.

I'm writing my own blitter as a personal challenge. Do quines (programs that produce their own source code as output; generally very hard to read and even harder, if not next to impossible, to incorporate in a real program that does something) produce any useful result? Not quite; as the developer, you could simply go peek at your own source, give yourself a pat on the back for being a clever lil' bastard, and go on with your life.

Thanks to those who've provided some assistance. It turns out I found a better way to rid myself of the transparency check: pre-calculating the mask and only writing the necessary pixels to the screen instead of skipping the unnecessary ones. I've actually hit speeds faster than BltFast() in most cases (and in one instance, while working in graphics memory, there was a slight, barely measurable increase in speed).
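The post does not show how the mask was pre-calculated, so the following is only one plausible reading of the idea, not RuneLancer's actual code: scan each sprite once at load time, record the runs of opaque pixels, and at draw time memcpy only those runs. The Span struct and the BuildSpans and DrawSpans functions are hypothetical names; pixels is assumed to point at tightly packed 32-bit sprite pixels, and no clipping is done.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// One horizontal run of opaque pixels inside a sprite row.
struct Span { int x, y, length, srcOffset; };

// Done once per sprite: find every run of pixels that is not the colour key.
std::vector<Span> BuildSpans(const std::uint32_t *pixels, int width, int height,
                             std::uint32_t colorKey = 0xFF00FF)
{
	std::vector<Span> spans;
	for (int y = 0; y < height; ++y) {
		int x = 0;
		while (x < width) {
			// Skip transparent pixels.
			while (x < width && (pixels[y * width + x] & 0xFFFFFF) == colorKey) ++x;
			const int start = x;
			// Measure the opaque run that follows.
			while (x < width && (pixels[y * width + x] & 0xFFFFFF) != colorKey) ++x;
			if (x > start)
				spans.push_back({start, y, x - start, y * width + start});
		}
	}
	return spans;
}

// Done every frame: no per-pixel test, just one memcpy per opaque run.
void DrawSpans(std::uint32_t *surface, long pitchBytes, const std::uint32_t *pixels,
               const std::vector<Span> &spans, int xPos, int yPos)
{
	const long pitch = pitchBytes >> 2;                // pitch in 32-bit pixels
	for (const Span &s : spans) {
		std::uint32_t *dst = surface + pitch * (yPos + s.y) + xPos + s.x;
		std::memcpy(dst, pixels + s.srcOffset, s.length * sizeof(std::uint32_t));
	}
}
```

The trade-off is extra memory and a one-time scan per sprite in exchange for an inner loop with no branches on pixel values; sprites with many tiny runs benefit less than sprites with large solid areas.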

This is all meant as a personal challenge. Blt() and BltFast() don't even exist to me right now, other than as a means of comparison. My intention is to better myself, not to produce something that would revolutionize the world.

Come to think of it, the DirectX forum probably wasn't the best one to post this in, but since it was related to DDraw and some of its functions... Anyhow, I got what I wanted to work, so thanks. ^^

Edit: typos

[edited by - RuneLancer on November 3, 2002 4:12:38 PM]
