Optimized blitting and put_pixel routines
Any more ideas? And how would I write a good blit?
Just by using put_pixel, or by copying parts of memory? (And how would I do that in DJGPP?)
Thanks,
-Mezz
1) Make it inline. As it is now, most of the function's processing time is spent calling and returning from it.
2) Shouldn't color be an unsigned char?
Michael Abrash wrote a good book on optimizing functions like this. IIRC it's called "Zen of Code Optimization"; much of the same material is also in his "Graphics Programming Black Book".
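To make both points concrete, here is a minimal sketch of an inlined put_pixel. It assumes a 320x200, 8-bit mode and draws into an ordinary off-screen buffer (the name `backbuf` and the `dst` parameter are made up for this example); getting that buffer into real video memory under DJGPP is a separate step that is not shown here.

```c
#include <stddef.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Hypothetical off-screen buffer standing in for the mode 13h screen. */
static unsigned char backbuf[SCREEN_W * SCREEN_H];

/* "static inline" lets the compiler paste the body into the caller,
   so there is no call/return overhead per pixel.
   No bounds check: the caller is assumed to pass on-screen coordinates. */
static inline void put_pixel(unsigned char *dst, int x, int y,
                             unsigned char color)
{
    dst[y * SCREEN_W + x] = color;
}
```

With optimization turned on, GCC should reduce a call like `put_pixel(backbuf, 10, 5, 42)` to a single byte store.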
Mason McCuskey
Spin Studios
www.spin-studios.com
How would I write inline asm? I've written some .asm before, but I don't know how to make it inline. And how would I write the same putpix routine that way?
And about blitting, how would I do a good, effective blit?
/Jonatan Hedborg
-normal sprite drawing (w/ putpix)
-RLE sprites
-Compiled sprites
RLE sprites: when the black (transparent) color is detected, the byte (or word) after it gives the number of transparent pixels in the run (RLE = run-length encoded), so the drawing code can skip the whole run at once.
Compiled sprites are pieces of asm code that draw the sprite directly, and that you can save. You can do some really neat stuff with these, but I haven't tried this method yet.
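A rough sketch of how one line of such a sprite could be drawn, assuming color index 0 is the transparent one and the run length is stored in a single byte. The function name and the exact encoding are assumptions for illustration, not a standard format.

```c
#include <stddef.h>

#define TRANSPARENT 0  /* assumption: color index 0 is the transparent "black" */

/* Draws one RLE-encoded sprite line into dst and returns how many encoded
   bytes were consumed. "width" is the decoded width in pixels.
   Assumes well-formed data: every run count is at least 1 and no run
   extends past the end of the line. */
static size_t draw_rle_line(unsigned char *dst, const unsigned char *src,
                            int width)
{
    const unsigned char *start = src;
    int x = 0;

    while (x < width) {
        unsigned char c = *src++;
        if (c == TRANSPARENT)
            x += *src++;      /* skip the run, background stays as it is */
        else
            dst[x++] = c;     /* opaque pixel: copy it */
    }
    return (size_t)(src - start);
}
```

The return value lets the caller step to the next encoded line, since encoded lines are no longer all the same length.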
There are lots of tutorials on the web (but from experience I know that searching using a search engine will not point you in the right direction).
------------------
-Justin
1. Using RLE sprites and software blitting
2. Using hardware blitting (in DirectX)
Here are the basics for a fast blit in mode 13h.
Basic blit: to blit your image to (x, y), you copy each line of the image from memory to the x position on the appropriate screen line. So naively:
    for (i = 0; i < height; i++)
        for (j = 0; j < width; j++)
            vidmemory[x + j + (y + i) * 320] = image[j + i * width];
Now, Intel processors in 32-bit mode write data to memory most efficiently when writing 32-bit values (i.e. long ints) to properly aligned memory locations. Next most efficient is 16-bit data, then 8-bit data.
So copy the data from image to vidmemory in long-int-sized chunks. If the beginning of your line isn't aligned, copy smaller pieces until it is. (If x % 4 == 0 just copy long ints; if x % 4 == 1 copy a char, then a short int; if x % 4 == 2 copy a short int; if x % 4 == 3 copy a char.) The number of bytes you are copying may not be divisible by four either, so you need a similar check for the end of the line.
Transfers from the image also work best when they happen in properly aligned chunks. So in general you need to pad your image format at the end of every line to bring each line up to a number of bytes divisible by 4, which is exactly what the bitmap file format specifies. So if you store your images as bitmaps, you can read the pixel data right in.
I believe the memcpy implementation does something similar, but I wouldn't use it here, because then you pay for a function call and a jump to non-local code in a routine that you want to be really fast.
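Here is a sketch of that head/body/tail split for a single line. Two simplifications that are choices of this example and not part of the advice above: the unaligned head is copied one byte at a time rather than as char-plus-short, and each 32-bit move is written as a fixed-size memcpy, which is the portable way to say "load and store four bytes" in C and which GCC normally turns into plain moves when optimizing. The function name is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies n bytes from src to dst, using 32-bit moves wherever dst is
   aligned on a 4-byte boundary. */
static void copy_row_aligned(unsigned char *dst, const unsigned char *src,
                             size_t n)
{
    /* Head: single bytes until dst sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Body: four bytes per iteration, dst is aligned here. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, src, 4);   /* 32-bit load  */
        memcpy(dst, &w, 4);   /* 32-bit store */
        dst += 4;
        src += 4;
        n -= 4;
    }

    /* Tail: whatever is left over (0 to 3 bytes). */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Note that only the destination is aligned here; the source is aligned as well only if the image lines are padded and x happens to line up.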
Now that it's copying the image fast, we've got to handle the case where it needs to be clipped against the screen. Clipping in the y-direction is pretty easy: just don't draw the lines that fall off-screen. And clipping in the x-direction has the advantage that you will always clip against a properly aligned memory boundary (at least for vidmemory), since both screen edges fall on multiples of 4.
Also, you shouldn't recalculate the array indices on every pass through the loop. At the beginning of every line, calculate a pointer to the start of the video memory you're writing to (vidmemory + x + (y + i) * 320), and increment that in your inner loop, e.g. ptr += 4 for a byte pointer when copying four bytes at a time. Do the same for your image pointer: increment it through the loop. You shouldn't have to recalculate it for every new line.
This can all be done in straight C/C++ without resorting to inline assembler, provided that your compiler is at least halfway intelligent, and DJGPP is.
Still, for peace of mind, I implement this stuff as inline assembly.
Good luck, and remember: if the code doesn't look ugly as heck when you're finished, then you didn't do it right.
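One way to do the clipping up front, so the copy loops never need a per-pixel test, is sketched below. The struct and function names are invented for this example, and a 320x200 screen is assumed.

```c
#define SCREEN_W 320
#define SCREEN_H 200

/* Hypothetical result type: where to start reading in the image (sx, sy),
   where to start writing on screen (dx, dy), and how much to copy (w, h). */
typedef struct {
    int sx, sy;
    int dx, dy;
    int w, h;
} ClipResult;

/* Clips a width x height image placed at (x, y) against the screen.
   Returns 1 and fills *out if anything is visible, 0 otherwise. */
static int clip_blit(int x, int y, int width, int height, ClipResult *out)
{
    int sx = 0, sy = 0;

    if (x < 0) { sx = -x; width  += x; x = 0; }   /* cut off left part  */
    if (y < 0) { sy = -y; height += y; y = 0; }   /* cut off top lines  */
    if (x + width  > SCREEN_W) width  = SCREEN_W - x;  /* right edge    */
    if (y + height > SCREEN_H) height = SCREEN_H - y;  /* bottom edge   */

    if (width <= 0 || height <= 0)
        return 0;                                 /* fully off-screen   */

    out->sx = sx; out->sy = sy;
    out->dx = x;  out->dy = y;
    out->w = width; out->h = height;
    return 1;
}
```

The blit then starts reading at line sy, column sx of the image and copies w bytes on each of h lines.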
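For comparison with the naive loop, here is a sketch of the same blit using running pointers. To keep it short it copies one byte at a time, does no clipping, and assumes the image lines are stored back to back without padding; the function signature is an assumption of this example.

```c
#include <stddef.h>

#define SCREEN_W 320

/* Blits a width x height image to (x, y) without clipping.
   The only address arithmetic per line is one addition. */
static void blit(unsigned char *vidmemory, const unsigned char *image,
                 int x, int y, int width, int height)
{
    unsigned char *line = vidmemory + x + y * SCREEN_W; /* computed once */
    const unsigned char *src = image;  /* runs straight through the image */
    int i, j;

    for (i = 0; i < height; i++) {
        unsigned char *dst = line;
        for (j = 0; j < width; j++)
            *dst++ = *src++;
        line += SCREEN_W;              /* step down to the next screen line */
    }
}
```

The inner loop is the part you would then swap for aligned 32-bit copies.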