Jack_1313.
Put everything in system RAM, absolutely everything. I went down the same road a while back.
I thought that if I used HW blits in VRAM and then copied the back buffer to system RAM, I could do faster alpha blending. This is not the case. It is so much faster to do everything in software, in system RAM, and then copy everything to a VRAM surface.
Either skip alpha blending, or skip Video Ram. There is nothing in between.
A whole 1/2 frames per second!
Well summarized. Anyway, this seems to be one of those things that everyone just figures out for themselves... I wonder why documentation is so sparse on this particular one, even in the SDK and books?
I wrote a VERY optimised alpha blending function in MMX asm, using the on-site tutorial(s) as a guide. And by making some safe assumptions, it got pretty fast.
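For anyone who wants to see what the software path computes before it gets hand-tuned, here is a minimal portable sketch of constant-alpha blending in system memory. It is not the MMX routine described above; the names blend_channel, blend_pixel and blend_rect are made up for illustration, and it assumes 32-bit 0x00RRGGBB pixels with pitches measured in pixels.

```cpp
#include <cstddef>
#include <cstdint>

// Blend one 8-bit channel: alpha 0 keeps d, alpha 255 yields s.
// The +127 rounds to nearest instead of truncating.
inline std::uint32_t blend_channel(std::uint32_t s, std::uint32_t d,
                                   std::uint32_t alpha) {
    return (s * alpha + d * (255u - alpha) + 127u) / 255u;
}

// Blend one 0x00RRGGBB source pixel over a destination pixel (assumed format).
inline std::uint32_t blend_pixel(std::uint32_t src, std::uint32_t dst,
                                 std::uint32_t alpha) {
    std::uint32_t r = blend_channel((src >> 16) & 0xFFu, (dst >> 16) & 0xFFu, alpha);
    std::uint32_t g = blend_channel((src >> 8) & 0xFFu, (dst >> 8) & 0xFFu, alpha);
    std::uint32_t b = blend_channel(src & 0xFFu, dst & 0xFFu, alpha);
    return (r << 16) | (g << 8) | b;
}

// Blend a width x height rectangle. Both buffers live in system memory,
// so reading the destination back is cheap. Pitches are in pixels here.
void blend_rect(const std::uint32_t* src, std::size_t src_pitch,
                std::uint32_t* dst, std::size_t dst_pitch,
                int width, int height, std::uint32_t alpha) {
    for (int y = 0; y < height; ++y) {
        const std::uint32_t* s = src + static_cast<std::size_t>(y) * src_pitch;
        std::uint32_t* d = dst + static_cast<std::size_t>(y) * dst_pitch;
        for (int x = 0; x < width; ++x) {
            d[x] = blend_pixel(s[x], d[x], alpha);
        }
    }
}
```

The important part is that every blended pixel reads the destination first. That read is what makes blending against a video memory surface so slow, and it is the reason the destination buffer is kept in system memory in this sketch.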
But I thought that I'd still do all the non-translucent sprites, and all tiles, in VRAM with surface->blt so it would be blazing fast with hardware acceleration. And then I wrote a 64-bit memory copy function that had the screen size hardcoded, and loops unrolled to match, just to copy the VRAM back buffer to system RAM. And I dare say you'd be hard pressed to write a faster memcopy for that specific purpose.
But it still wasn't as fast as I had expected, so I tried moving everything to system RAM, still using DirectDraw surface->blt, and it was way faster.
Best way is to do everything in system memory, and then copy everything TO a video ram surface and flip.
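A rough sketch of that last copy step, assuming 32-bit pixels and a tightly packed system-memory frame. The function name copy_to_surface is hypothetical. In a real DirectDraw program the destination pointer and pitch would come from locking the video memory surface (the lpSurface and lPitch fields of the surface description); here they are plain parameters so the code runs anywhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a finished frame from a tightly packed system-memory buffer to a
// destination whose rows are dst_pitch_bytes apart. The pitch can be larger
// than width * 4, so the copy goes row by row and never touches the padding.
void copy_to_surface(const std::uint32_t* frame, int width, int height,
                     std::uint8_t* dst, std::size_t dst_pitch_bytes) {
    const std::size_t row_bytes =
        static_cast<std::size_t>(width) * sizeof(std::uint32_t);
    for (int y = 0; y < height; ++y) {
        std::memcpy(dst + static_cast<std::size_t>(y) * dst_pitch_bytes,
                    frame + static_cast<std::size_t>(y) * width,
                    row_bytes);
    }
}
```

This direction only ever writes to the destination, which is the cheap way to touch video memory. Copying the other way, from VRAM back to system RAM, is the read that hurt in the approach described above.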
I thought I'd post this just to back up my rather short previous post.
[/edit]
Just thought I'd add something.
This solves the slow lock problem too: if you write your own blitters, you lock the surfaces once at the start of rendering, keep them locked through all the blitting, and then unlock them before you copy to video RAM.
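The lock-once idea could look roughly like this. SoftSurface is a hypothetical stand-in for a lockable system-memory surface, not a DirectDraw type, and blit_opaque and render_frame are illustrative names; the lock counter is only there to show that a whole frame costs a single lock.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a lockable system-memory surface.
struct SoftSurface {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;
    int lock_calls = 0;   // counts how often lock() was called
    bool locked = false;

    SoftSurface(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0u) {}

    std::uint32_t* lock() { ++lock_calls; locked = true; return pixels.data(); }
    void unlock() { locked = false; }
};

// Custom opaque blitter that works on already locked memory.
// The caller must make sure the sprite fits inside the destination.
void blit_opaque(std::uint32_t* dst, int dst_width,
                 const std::uint32_t* sprite, int sw, int sh, int x, int y) {
    for (int row = 0; row < sh; ++row) {
        for (int col = 0; col < sw; ++col) {
            dst[static_cast<std::size_t>(y + row) * dst_width + (x + col)] =
                sprite[static_cast<std::size_t>(row) * sw + col];
        }
    }
}

// Lock once, draw every sprite, unlock once. Returns the number of locks used.
int render_frame(SoftSurface& back, const std::uint32_t* sprite,
                 int sw, int sh, int count) {
    std::uint32_t* mem = back.lock();
    for (int i = 0; i < count; ++i) {
        blit_opaque(mem, back.width, sprite, sw, sh, i * sw, 0);
    }
    back.unlock();  // unlock before the frame is copied to video memory
    return back.lock_calls;
}
```

With a per-sprite blt call the lock and unlock cost is paid for every sprite. Here it is paid once per frame, no matter how many sprites are drawn.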
[/edit]
[edited by - Bad Maniac on July 4, 2003 11:28:09 PM]