Optimized blitting and put_pixel routines

Thanks! (Why didn't I think of that?)

Any more ideas? And how would I write a good blit?
Just by using put_pixel, or by copying blocks of memory? (And how would I do that in DJGPP?)

I have also programmed in DJGPP, so I know what you're talking about. Using memcpy is faster, but you'll have to write a clipper for it; I recommend it if your images never extend past the screen edges. Too bad writing assembler in DJGPP stinks. If you aren't familiar with assembler, the weird syntax may take some getting used to. Tell me if you don't know about clipping.
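A minimal sketch of the memcpy approach, assuming an 8-bit 320x200 destination (an off-screen buffer or mapped video memory) and an image stored row by row without padding. `blit_memcpy` and its parameters are hypothetical names. As suggested above, it does no clipping, so the caller must keep the whole image on screen.

```c
#include <stddef.h>
#include <string.h>

#define SCREEN_W 320  /* bytes per line in mode 13h */

/* Hypothetical helper: copy a w x h image to (x, y), one row per memcpy.
   No clipping: the caller must keep the whole image on screen. */
static void blit_memcpy(unsigned char *dest, const unsigned char *img,
                        int x, int y, int w, int h)
{
    unsigned char *row = dest + (size_t)y * SCREEN_W + x;
    for (int i = 0; i < h; i++) {
        memcpy(row, img, (size_t)w);
        row += SCREEN_W;  /* next screen line */
        img += w;         /* next image line (no padding assumed) */
    }
}
```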

I don't know about clipping, and this seems like a good time to learn. If you could give me an explanation, I would much appreciate it.

Thanks,

-Mezz

A couple suggestions for your optimization of putpix:

1) Make it inline. As it is now, most of the function's processing time is spent calling and returning from it.
2) Shouldn't color be an unsigned char?
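Both suggestions applied to the putpix from the original question, as a sketch. The buffer is passed in as `vga_buf` (a hypothetical parameter standing in for the global `vgamemory`) so the function is self-contained; the bounds check and the shifts are unchanged.

```c
/* Sketch: inline putpix with an 8-bit color.
   vga_buf is a hypothetical parameter replacing the global vgamemory. */
static inline void putpix(unsigned char *vga_buf, int x, int y,
                          unsigned char color)
{
    if (x >= 0 && x < 320 && y >= 0 && y < 200)
        vga_buf[(y << 8) + (y << 6) + x] = color;  /* y*256 + y*64 == y*320 */
}
```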

Michael Abrash wrote a good book on optimizing functions like this. IIRC it's called "Zen of Code Optimization"; his "Graphics Programming Black Book" covers similar ground.

Mason McCuskey
Spin Studios
www.spin-studios.com

Thanks for all the great suggestions!

How would I write inline asm? I've written some .asm files before, but I don't know how to do it inline. And how would I write the same putpix routine that way?

And about blitting: how would I do a good, efficient blit?


/Jonatan Hedborg

In 256-color mode you should use an unsigned char. There are 3 methods for sprite drawing (maybe more):
-normal sprite drawing (w/ putpix)
-RLE sprites
-compiled sprites

RLE sprites (RLE = run-length encoded): when a black (transparent) pixel is found, the byte (or word) after it gives the number of consecutive black pixels, so the whole run can be skipped at once.
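The RLE scheme described above can be sketched per row like this. Assumptions: color 0 is the transparent color, counts are stored in a single byte (so runs longer than 255 are split), and `rle_encode_row`/`rle_draw_row` are hypothetical names. Drawing skips each transparent run with one addition instead of testing every pixel.

```c
#include <stddef.h>

/* Hypothetical encoder: opaque bytes are copied as-is; a run of
   transparent (0) pixels becomes the pair 0, count (count <= 255).
   out needs room for at most 2*w bytes. Returns bytes written. */
static size_t rle_encode_row(const unsigned char *row, int w, unsigned char *out)
{
    size_t n = 0;
    int i = 0;
    while (i < w) {
        if (row[i] != 0) {
            out[n++] = row[i++];
        } else {
            int run = 0;
            while (i < w && row[i] == 0 && run < 255) {
                i++;
                run++;
            }
            out[n++] = 0;
            out[n++] = (unsigned char)run;
        }
    }
    return n;
}

/* Hypothetical drawer: w is the decoded row width.
   Returns the number of encoded bytes consumed. */
static size_t rle_draw_row(unsigned char *dest, const unsigned char *rle, int w)
{
    size_t n = 0;
    int x = 0;
    while (x < w) {
        unsigned char b = rle[n++];
        if (b == 0)
            x += rle[n++];   /* skip a whole run of transparent pixels */
        else
            dest[x++] = b;   /* opaque pixel */
    }
    return n;
}
```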

A compiled sprite is a piece of asm code, generated for one particular sprite, that draws it directly; you can save it and reuse it. You can do some really neat stuff with these, but I haven't tried this method yet.

There are lots of tutorials on the web (but from experience I know that searching using a search engine will not point you in the right direction).

Do a search for NASM. That's the Netwide Assembler, and its syntax is almost identical to Intel's. Plus it can create object files that can be linked with DJGPP (use its COFF output format, -f coff).

------------------
-Justin

Guest Anonymous Poster
Which blitting method is faster?

1. Using RLE sprites and software blitting
2. Using hardware blitting (in DirectX)

There's an old book from back in the days of DOS, "Black Art of 3D Game Programming". It covers mode 13h graphics pretty thoroughly, including optimizing blits with assembly.

Here are the basics of a fast blit in mode 13h.

Basic blit: if you want to blit your image to (x, y), copy each line of the image from memory to column x of the appropriate screen line (y + i). So, naively:

for (i = 0; i < height; i++)
    for (j = 0; j < width; j++)
        vidmemory[x + j + (y + i) * 320] = image[j + i * width];

Intel processors in 32-bit mode write data to memory most efficiently when writing 32-bit values (long ints) to properly aligned addresses. 16-bit writes are next most efficient, then 8-bit writes.

So copy the data from image to vidmemory in long-int-sized chunks. If the beginning of a line isn't properly aligned, copy smaller pieces until it is. Since 320 is a multiple of 4, the alignment of every line depends only on x: if x % 4 == 0 just copy long ints; if x % 4 == 1 copy a char, then a short int; if x % 4 == 2 copy a short int; if x % 4 == 3 copy a char. The number of bytes per line may not be divisible by four either, so you need a similar check at the end of the line.
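A sketch of the head/body/tail copy for one line, assuming 8-bit pixels. `copy_row_aligned` is a hypothetical name. It uses fixed-size memcpy calls rather than casting to `long *`, which keeps it within C's aliasing rules; with optimization on, GCC typically compiles each one into a single move, so they shouldn't cost a function call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes so that the main loop writes 32-bit
   words to 4-byte-aligned destination addresses. */
static void copy_row_aligned(unsigned char *dst, const unsigned char *src,
                             size_t n)
{
    /* Head: a char if dst is odd, then a short if dst is at 2 mod 4. */
    if (n >= 1 && ((uintptr_t)dst & 1)) {
        *dst++ = *src++;
        n--;
    }
    if (n >= 2 && ((uintptr_t)dst & 2)) {
        memcpy(dst, src, 2);
        dst += 2; src += 2; n -= 2;
    }
    /* Body: dst is now aligned whenever 4 or more bytes remain. */
    while (n >= 4) {
        memcpy(dst, src, 4);
        dst += 4; src += 4; n -= 4;
    }
    /* Tail: at most 3 bytes left, a short then a char. */
    if (n >= 2) {
        memcpy(dst, src, 2);
        dst += 2; src += 2; n -= 2;
    }
    if (n == 1)
        *dst = *src;
}
```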

Transfers from the image also work best in properly aligned chunks. So, in general, pad each line of your image in memory to a multiple of 4 bytes. That is exactly what the bitmap (BMP) file format specifies, so if you store your images as bitmaps you can read the data right in.
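The padding rule in code, as a sketch: `padded_stride` rounds a row length up to a multiple of 4, and `image_pixel` finds a pixel in such an image (both are hypothetical helpers). One caveat worth knowing: BMP files normally store rows bottom-up, while this sketch assumes top-down rows.

```c
#include <stddef.h>

/* Hypothetical helper: row length rounded up to a multiple of 4 bytes. */
static size_t padded_stride(size_t width)
{
    return (width + 3) & ~(size_t)3;
}

/* Hypothetical helper: address of pixel (x, y) in a padded 8-bit image.
   Assumes top-down rows (BMP files are usually bottom-up). */
static const unsigned char *image_pixel(const unsigned char *img, size_t width,
                                        size_t x, size_t y)
{
    return img + y * padded_stride(width) + x;
}
```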

I believe the memcpy implementation does something similar, but you shouldn't use it because then you have the overhead of function calls and jumps to non-local memory in a function that you want to be really fast.

Now that it's copying the image fast, we've got to handle the case where it needs to be clipped against the screen. Clipping in the y-direction is pretty easy: just don't draw the lines that fall off-screen. Clipping in the x-direction has the advantage that you always clip against a properly aligned memory boundary (at least for vidmemory).
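A sketch of the clipping step, assuming a 320x200 8-bit screen and an unpadded image. `clip_rect` (hypothetical) trims the destination rectangle to the screen and records how far into the image the first visible pixel is; `blit_clipped` (also hypothetical) then copies only the visible part of each visible line. Integer overflow for absurd coordinates is not handled.

```c
#include <string.h>

/* Hypothetical result of clipping a w x h image placed at (x, y). */
struct clip {
    int src_x, src_y;  /* first visible pixel inside the image */
    int dst_x, dst_y;  /* where that pixel lands on screen */
    int w, h;          /* visible size; both 0 if nothing is visible */
};

/* Returns 0 if the image is completely off-screen. */
static int clip_rect(int x, int y, int w, int h, int screen_w, int screen_h,
                     struct clip *c)
{
    int x0 = x < 0 ? 0 : x;
    int y0 = y < 0 ? 0 : y;
    int x1 = x + w > screen_w ? screen_w : x + w;  /* exclusive right edge */
    int y1 = y + h > screen_h ? screen_h : y + h;  /* exclusive bottom edge */

    if (x0 >= x1 || y0 >= y1) {
        c->src_x = c->src_y = c->dst_x = c->dst_y = c->w = c->h = 0;
        return 0;
    }
    c->src_x = x0 - x;
    c->src_y = y0 - y;
    c->dst_x = x0;
    c->dst_y = y0;
    c->w = x1 - x0;
    c->h = y1 - y0;
    return 1;
}

/* Hypothetical clipped blit: copy only the visible part of each line. */
static void blit_clipped(unsigned char *screen, const unsigned char *img,
                         int x, int y, int w, int h)
{
    struct clip c;
    if (!clip_rect(x, y, w, h, 320, 200, &c))
        return;
    const unsigned char *s = img + c.src_y * w + c.src_x;
    unsigned char *d = screen + c.dst_y * 320 + c.dst_x;
    for (int i = 0; i < c.h; i++) {
        memcpy(d, s, (size_t)c.w);
        d += 320;  /* next screen line */
        s += w;    /* next image line */
    }
}
```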

Also, you shouldn't recalculate the array indices every time through the loop. At the beginning of every line, calculate a pointer to the start of the video memory you're writing (vidmemory + x + (y + i) * 320) and increment it in your inner loop, i.e. ptr += 4. Do the same for your image pointer; you shouldn't have to recalculate it for every new line.

This can all be done in straight C/C++ without resorting to inline assembler, provided your compiler is at least halfway intelligent, and DJGPP's is.
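The pointer-walking idea as a sketch, byte at a time for clarity (the aligned 32-bit copy described earlier would replace the inner loop). `blit_ptr` is a hypothetical name, and the image is assumed to be unpadded so the source pointer simply runs on into the next line. Stepping `dst` by 320 per line gives the same address as recomputing vidmemory + x + (y + i) * 320.

```c
/* Hypothetical pointer-walking blit: one multiply for the start address,
   then only pointer increments. */
static void blit_ptr(unsigned char *vidmemory, const unsigned char *image,
                     int x, int y, int width, int height)
{
    unsigned char *dst = vidmemory + x + y * 320;  /* start of first line */
    const unsigned char *src = image;
    for (int i = 0; i < height; i++) {
        unsigned char *d = dst;
        for (int j = 0; j < width; j++)
            *d++ = *src++;  /* src runs straight into the next image line */
        dst += 320;         /* step down one screen line */
    }
}
```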

Still for peace of mind, I implement this stuff as inline assembly.
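For the DJGPP inline-asm question earlier in the thread: DJGPP uses GCC, so inline assembly is written with GCC's extended `__asm__` syntax, in AT&T operand order (source first). A minimal sketch that does only the pixel store in asm; `putpix_asm` is a hypothetical name, and the `#else` branch keeps it compiling on non-x86 targets.

```c
/* Hypothetical putpix using GCC extended inline asm (AT&T syntax).
   Address arithmetic stays in C; only the byte store is asm. */
static inline void putpix_asm(unsigned char *buf, int x, int y,
                              unsigned char color)
{
    unsigned char *p = buf + y * 320 + x;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %0 = register holding the address, %1 = byte register with color */
    __asm__ volatile ("movb %1, (%0)"
                      :                    /* no outputs */
                      : "r"(p), "q"(color)
                      : "memory");         /* memory is written */
#else
    *p = color;  /* portable fallback */
#endif
}
```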

Good luck, and remember: if when you're finished the code doesn't look as ugly as heck, then you didn't do it right.


To Stan:

Hardware blitting wins pretty much hands down.

I'm currently using a slightly modified RLE scheme with software drawing, and it's decently fast, which is why I haven't bothered with hardware acceleration yet. But remember, the beauty of RLE is that it saves the overhead of blitting non-visible pixels. It doesn't change the raw bandwidth with which you can move pixels to the video card, though.

------------------
- Remnant
- (Steve Schmitt)

Well, you can optimize that even more. With a good optimizing compiler (DJGPP _MIGHT_ do this, not sure), you can just write buffer[y * 320 + x] = c, and it will optimize the multiplication very nicely. If not, here is some good assembly (Intel syntax):

mov edi,[y]          ; edi = y
shl edi,6            ; edi = y * 64
lea edi,[edi*4+edi]  ; edi = y * 64 * 5 = y * 320
add edi,[buffer]     ; + base address of the buffer
add edi,[x]          ; + x
mov al,[c]           ; al = color
mov [edi],al         ; store the pixel
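For comparison, here is the same shift-and-lea arithmetic written in C, as a sketch. `pixel_offset` is a hypothetical helper; it assumes on-screen, non-negative coordinates and mirrors the asm step by step.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the asm above. */
static inline size_t pixel_offset(int x, int y)
{
    unsigned int t = (unsigned int)y << 6;  /* shl edi,6           -> y*64  */
    t = t * 4 + t;                          /* lea edi,[edi*4+edi] -> y*320 */
    return (size_t)t + (size_t)x;           /* add edi,[x]                  */
}
```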

As for blitting, hardware acceleration is the way to go. If you can't use that, however, try compiled sprites. This is where you actually generate a function that draws one particular sprite, pixel by pixel. This avoids any jumps, colorkey tests, etc. The only limitations are memory and the fact that it can be a pain in the ass. If that's a problem too, use RLE.
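The post describes generating code per sprite. One portable variant, sketched here instead of emitting machine code directly, is an offline tool that writes C source: each opaque pixel becomes one store, and transparent pixels produce no code at all. `text_buf`, `tb_printf`, and `gen_compiled_sprite` are hypothetical names; color 0 is assumed transparent and the screen pitch is assumed to be 320. The generated function expects `d` to point at the sprite's top-left corner on screen.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical output buffer; len counts everything that would be written. */
struct text_buf {
    char *data;
    size_t cap;
    size_t len;
};

/* Append formatted text, never writing past cap bytes. */
static void tb_printf(struct text_buf *tb, const char *fmt, ...)
{
    va_list ap;
    size_t room = tb->len < tb->cap ? tb->cap - tb->len : 0;
    va_start(ap, fmt);
    int n = vsnprintf(room ? tb->data + tb->len : NULL, room, fmt, ap);
    va_end(ap);
    if (n > 0)
        tb->len += (size_t)n;
}

/* Emit a C function that draws this one sprite as straight-line stores:
   no loops, no colorkey tests. Color 0 is assumed transparent. */
static void gen_compiled_sprite(struct text_buf *tb, const char *name,
                                const unsigned char *img, int w, int h)
{
    tb_printf(tb, "void %s(unsigned char *d)\n{\n", name);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            unsigned c = img[y * w + x];
            if (c != 0)  /* transparent pixels generate no code */
                tb_printf(tb, "    d[%d] = %u;\n", y * 320 + x, c);
        }
    tb_printf(tb, "}\n");
}
```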

How would I do a really fast mode 13h blitting routine?
And is this put_pixel optimized? Can it be done faster (C/C++ or inline asm is fine)?

void putpix(int x, int y, int color)
{
    if (x < 320 && x > -1 && y < 200 && y > -1)
    {
        vgamemory[(y << 8) + (y << 6) + x] = color;  /* y*256 + y*64 = y*320 */
    }
}

vgamemory is a pointer to VGA memory (duh).
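One common speedup for the bounds check above, as a sketch: casting to unsigned turns negative coordinates into huge values, so a single comparison per axis replaces two. Here `vgamemory` is a plain array so the example is self-contained (in the original it points to VGA memory), and `color` is an unsigned char as suggested earlier in the thread.

```c
/* Stand-in for the real VGA pointer so the sketch is self-contained. */
static unsigned char vgamemory[320 * 200];

/* Sketch: one unsigned comparison per axis rejects both x < 0 and x >= 320. */
static inline void putpix(int x, int y, unsigned char color)
{
    if ((unsigned)x < 320u && (unsigned)y < 200u)
        vgamemory[(y << 8) + (y << 6) + x] = color;  /* y*320 + x */
}
```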

Compiled sprites don't really work well except on 386s. They wreak havoc with the cache, plus they can't be clipped. I think RLE sprites are much better. Better still would be an MMX sprite blitter, though.

Rock

There's actually one more blitting scheme to consider: it has predictable performance, is easy to implement, and can easily be unrolled.

The drawback is that it makes all your sprites twice as big :/

The idea is this: store every pixel as an (alpha, color) pair, and when blitting, AND the destination with alpha and then OR it with color. So you set alpha to zero for opaque pixels, and to all ones (0xFF for 8bpp, 0xFFFF for 16bpp, etc.) for transparent pixels; the color of a transparent pixel must be zero. Easy, huh?
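The AND/OR scheme for 8bpp as a sketch. `masked_px`, `blit_masked_row`, and `make_masked` are hypothetical names, and using color 0 as the key when converting a normal sprite is an assumption. The inner loop has no branches, which is where the predictable performance comes from.

```c
#include <stddef.h>

/* Hypothetical 8bpp sprite pixel stored as a mask/color pair. */
struct masked_px {
    unsigned char alpha;  /* 0x00 for opaque, 0xFF for transparent */
    unsigned char color;  /* must be 0 where alpha is 0xFF */
};

/* dest = (dest & alpha) | color, with no branches in the loop. */
static void blit_masked_row(unsigned char *dest, const struct masked_px *src,
                            size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = (unsigned char)((dest[i] & src[i].alpha) | src[i].color);
}

/* Hypothetical converter: color 0 is assumed to be the transparent key. */
static struct masked_px make_masked(unsigned char c)
{
    struct masked_px p;
    p.alpha = c == 0 ? 0xFF : 0x00;
    p.color = c;
    return p;
}
```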

quote:
Original post by HIND-D

How would I do a really fast mode 13h blitting routine?

How about a table lookup for the y-coordinate offsets?
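A sketch of the lookup-table idea: precompute y * 320 once for all 200 lines, then each pixel address costs one table read and one add. `ytab`, `init_ytab`, and `putpix_lut` are hypothetical names; `init_ytab` must run before the first call, and `putpix_lut` assumes the caller has already clipped the coordinates.

```c
/* Hypothetical table: ytab[y] == y * 320. */
static unsigned int ytab[200];

static void init_ytab(void)
{
    for (unsigned int y = 0; y < 200; y++)
        ytab[y] = y * 320;
}

/* Assumes 0 <= x < 320 and 0 <= y < 200 (no clipping here). */
static inline void putpix_lut(unsigned char *buf, int x, int y,
                              unsigned char color)
{
    buf[ytab[y] + x] = color;
}
```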

---
"Lifting shadows off a dream once broken
She can turn a drop of water into an ocean"

Many years ago I wrote a mode 13h library that uses a combination of C and assembler. It's hardly something I'm proud of, but it may have some useful tidbits. It's fully commented and totally free to use (no strings attached). It includes assembly language blitting routines, plus a few things like a GIF loader, a Bresenham line routine, an FLI player, etc. Nothing special, but somebody did write an article about it for C Users Journal at one point.

ftp://ftp.islandnet.com/Mark_Morley/vgl20.zip

and

ftp://ftp.islandnet.com/Mark_Morley/vgl20fli.zip for the sample FLI animation.

Toom
