BltFast() vs Moving memory yourself

I've created two little test programs. Test1 uses DX7 BltFast() and Test2 just does a memcpy-type move. Each moves an 800x600x16 bitmap from a sysmem surface to the backbuffer, then Flip()s without waiting for vsync so the refresh rate won't disguise the blt speeds. I get a 90 fps speed increase in Test2 with a PIII/500 & GeForce II. I wonder what would happen on other cards. If anyone wants to give it a try and post results along with the hardware tested, I've zipped up the programs and a test bitmap, which can be downloaded here: http://gameznet.com/golgotha/downloads/finaltest.zip
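
For readers wondering what a "memcpy-type move" looks like in practice, here is a minimal sketch. It is not the code from test2.exe, which isn't posted in this thread. It assumes both surfaces have already been locked and expose a base pointer plus a pitch in bytes; the function name copy_rows_16bpp and its parameters are made up for illustration, and the buffers are plain memory so the sketch compiles without DirectDraw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a width x height image of 16-bit pixels between two buffers whose rows
// may be padded. Pitches are in bytes, as a locked surface would report them.
// (Hypothetical helper; names are illustrative only.)
void copy_rows_16bpp(std::uint8_t* dst, std::size_t dst_pitch,
                     const std::uint8_t* src, std::size_t src_pitch,
                     std::size_t width, std::size_t height)
{
    const std::size_t row_bytes = width * sizeof(std::uint16_t);
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);

    if (dst_pitch == row_bytes && src_pitch == row_bytes) {
        // No padding on either side: one contiguous move covers the frame.
        std::memcpy(dst, src, row_bytes * height);
        return;
    }
    for (std::size_t y = 0; y < height; ++y) {
        // Copy only the visible pixels; padding bytes in dst stay untouched.
        std::memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
    }
}
```

With real surfaces the pointers and pitches would come from locking the source surface and the backbuffer. The pitch often differs from width * 2, which is why the row loop is needed.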

Well, I tried it on my PIII 533b w/ a Voodoo3 3000 (16MB AGP) and got 123 fps with test1.exe and 159 fps with test2.exe. And this is with AGP support turned off (I've got dual-monitor debugging going, and it doesn't seem to like AGP w/ my second card. Odd, eh?), so I'm not sure what difference AGP would make.

But heh, you're using the C++ memcpy routines? 'Cause I've got MMX going, should I try a test with that too? Actually I think I will, just to see if there's a substantial increase in speed or not. E-mail me if you want a copy of this test program...
- Ben

With my PII-450, TNT2 (AGP 2x enabled) running Windows 2000 Professional, I get:

Test1 - 189 fps
Test2 - 190 fps

Both frame rates would drop to 170ish if I moved the mouse (even though there wasn't a visible pointer), making me wonder what was going on in the background.

I don't know, since any mouse movement messages would just fall through to the default window procedure (i.e. DefWindowProc). I guess you'd have to ask Microsoft about that. What kind of TNT2 card do you have?

Edited by - Hootie on July 13, 2000 4:07:36 AM

Edited by - WitchLord on July 14, 2000 9:08:31 AM

Not surprising, really, that the numbers are coming out similar. Try putting the bitmap on a video memory surface, and test again. Test1 will be extremely quick, and test2 will be awfully slow.

The graphics hardware makes very little difference, since BltFast is doing pretty much the same as memcpy.

TheTwistedOne
http://www.angrycake.com

I'm well aware of hardware accel in vidmem. This test is to see the difference between BltFast() and plain old memory moves when dealing with sysmem-based source surfaces. In many cases the mem move is quite a bit faster (25 - 90%), especially with high-end video cards on faster machines.

Many would believe BltFast() (supposedly the most highly optimized blit function in DDraw) would be equal to or faster than a straight-out memory move. The reality, in many cases, is that it's not.


>The graphics hardware makes very little difference, since BltFast is doing pretty much the same as memcpy.

On the contrary, the hardware makes a big difference. Here are the results of a test with a very fast PIII & GeForce 256:

Test1: 157 FPS
Test2: 285 FPS
PIII 800
128MB RAM
ASUS AGP V6800 Deluxe GeForce 256 DDR 32MB
Win 98
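
To put those frame rates in terms of memory bandwidth, here is a small worked calculation. It only uses figures from the thread (an 800x600x16 frame, 157 and 285 FPS); the function names are made up for illustration. Since each frame also includes a Flip(), the result is best read as a rough lower bound on the raw copy rate.

```cpp
#include <cassert>
#include <cstdint>

// Size of one frame in bytes for a given resolution and colour depth.
// (Hypothetical helper; assumes a whole number of bytes per pixel.)
std::uint64_t bytes_per_frame(std::uint64_t width, std::uint64_t height,
                              std::uint64_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}

// Bytes moved per second when one full frame is copied per displayed frame.
std::uint64_t bytes_per_second(std::uint64_t frame_bytes, std::uint64_t fps)
{
    return frame_bytes * fps;
}
```

An 800x600x16 frame is 960,000 bytes, so 157 FPS works out to about 151 MB/s and 285 FPS to about 274 MB/s (decimal megabytes).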



Edited by - Hootie on July 13, 2000 6:37:02 AM

You also must take into account the fact that BltFast uses hardware acceleration when supported (but what card doesn't support it nowadays?)

It won't make a difference in your test program, I'm sure, because it sounds like all you're doing is blitting an image, so you can use the CPU. But in a real game you might want that extra CPU power for AI and other shite, so you would want the gfx card to take care of the blitting.

One more point to make: BltFast also does transparency, whereas memcpy does not. I've actually tried all these tests myself. I don't remember the results, but I know I ended up using DX's blitting routines.
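
To illustrate the transparency point: a plain memcpy moves every pixel, so a software blit that honours a colour key has to test each pixel individually. The sketch below is only an assumption of how such a loop might look, not anything from the test programs; copy_with_color_key and its parameters are hypothetical, and strides are counted in 16-bit pixels rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy 16-bit pixels from src to dst, skipping any source pixel that equals
// color_key so the destination shows through there.
// Strides are in pixels, not bytes. (Hypothetical helper for illustration.)
void copy_with_color_key(std::uint16_t* dst, std::size_t dst_stride,
                         const std::uint16_t* src, std::size_t src_stride,
                         std::size_t width, std::size_t height,
                         std::uint16_t color_key)
{
    assert(width <= dst_stride && width <= src_stride);
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint16_t* src_row = src + y * src_stride;
        std::uint16_t* dst_row = dst + y * dst_stride;
        for (std::size_t x = 0; x < width; ++x) {
            const std::uint16_t pixel = src_row[x];
            if (pixel != color_key) {
                dst_row[x] = pixel;  // opaque pixel: overwrite destination
            }
        }
    }
}
```

The per-pixel compare is what makes this slower than a straight memory move, which is one reason the comparison in this thread only applies to non-transparent blits.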


------------------------------
fclose(fp)
------------------------------

My TNT2 is a Leadtek Winfast S320 II or something equally unmemorable. It has 32 MB of RAM (hence I tend to assume that everything will be in video memory, in which case BltFast is really hard to beat). I upped the clock speed on it a bit, but it's still nowhere near the speed of a TNT2 Ultra. It's fine for most of what I do.

You know, you do have to use sysmem for a bunch of stuff sometimes (I know I do; I have a 16MB card, and it goes fast at 1024x768x16bpp), and sometimes it's life or death whether you do your full-screen blits really, really fast or not. And on my measly P-233 MMX it was over a 10% difference. PS: Could I see the code for that? I'd like to compare it to my routine...

Well, on my P3-600 w/ 128MB RAM and a GeForce 256 DDR I get 130 fps on test1 and 218 fps on test2.

I NEED TO SAVE THE CHICKENS!!!!!!!!!!!!!!!!!!!!!!!!

Bracket: Cool! I asked because you're getting some pretty kickin TNT2 scores outta that little card.


ByteMe95: Yep, memcpy() doesn't do any transparency but I've tried using DDBLTFAST_NOCOLORKEY and it doesn't boost the BltFast() speed at all on my machines. It should but it doesn't.




Edited by - WitchLord on July 14, 2000 8:59:39 AM

test1.exe --> FPS = 40
test2.exe --> FPS = 100

It seems like copying memory in software is faster than blitting in hardware on my machine. That's because my video card sucks big time. I have an NVIDIA card, which supports DirectDraw 1.0.

Edited by - Gladiator on July 13, 2000 8:07:46 PM

Yikes! You know, I ran some tests on my system (for my own personal benefit, not to compare to yours) just to test out the different blitting methods, and a fullscreen 640x480 hardware blit from vidmem to vidmem takes 42/1193 of a millisecond! That's right, of a MILLISECOND! I'm using the QueryPerformanceCounter API for timing, but wow, I didn't think hardware was *THAT* much better! That sure motivates you to use vidmem more...

But I get what you're wanting to do... I want the same: to have a sysmem backbuffer so you can access all the pixels individually, like for alpha blending, or to store many MB of graphics in system RAM for an RTS or something.
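
For scale, "42/1193 of a millisecond" can be read as 42 ticks of a counter that runs at about 1193 ticks per millisecond. That reading is an assumption, since the post doesn't state the counter frequency. The sketch below shows the conversion; ticks_to_microseconds is a made-up helper, and on Windows the two inputs would come from QueryPerformanceCounter and QueryPerformanceFrequency, which the sketch deliberately doesn't call so that it stays portable.

```cpp
#include <cassert>
#include <cstdint>

// Convert an elapsed tick count into microseconds, given the counter's
// frequency in ticks per second. (Hypothetical helper for illustration.)
double ticks_to_microseconds(std::int64_t ticks, std::int64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    return static_cast<double>(ticks) * 1e6 /
           static_cast<double>(ticks_per_second);
}
```

Under that assumption, 42 ticks at 1,193,000 ticks per second is a little over 35 microseconds.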

Just an interesting fact that relates to sys to vid blitting...
- Ben

Yah, I'd also be interested in taking a peek at the source code. (Someone else was also interested above.) You can e-mail me or post the relevant code (don't post the DD init and shutdown, of course! ;-)
Thanks,
Ben

Test 1 scored 25 fps and test 2 scored about 100 fps on a PII-333 with a Riva 128 card.

"Paranoia is the belief in a hidden order behind the visible." - Anonymous

Gladiator: Yes, I only program in assembler, but any good programmer can make sense of other programming languages. I could never write a C or C++ function worth showing you, but I could read 90% of the ones I see and create an assembler version from the theory behind it. Can you not understand the gist of languages other than C? Usually they're pretty straightforward.
- Ben

P.S. Hootie, I'm still interested in the source!

TEST 1 = 193 fps
TEST 2 = 221 fps

Athlon 650MHz
VisionTek GeForce 256 (Not DDR) AGP 4x
128MB of 133MHz ECC-SDRAM
Win98b

(I was running WinAmp, Yahoo Messenger, and SB Live! AudioHQ)

*Note: 133MHz memory helps a bunch


Edited by - WitchLord on July 14, 2000 9:01:52 AM

For those of you interested in the code, my plan was to write an article for GameDev on programming high performance DDraw once I got a variety of speed figures to present with it.

I'm here to help people and share my findings.

Thanks.

Edited by - WitchLord on July 14, 2000 9:05:15 AM

Well, this is news to me. I was under the impression that BltFast from system->video memory would be quicker than memcpy-type affairs.

I shall have to bear this in mind for non-transparent blts (mental note to self; write expanded test cases).

Thanks very much.

TheTwistedOne
http://www.angrycake.com

Edited by - TheTwistedOne on July 14, 2000 8:43:29 AM

At the request of Hootie I have cleaned up this thread quite a bit. Try not to mess it up again.

For those of you who don't know what I'm talking about, just ignore this message, as it doesn't have anything to do with you.

- WitchLord

WitchLord: Thank you!


cyberben: Yes, vidmem to vidmem is a zillion times faster and you should always use that if the videocard has the memory and you have static (unchanging) graphics.

What led me to test all this stuff out is that my engine creates each frame dynamically in system memory, then just blts the whole thing to the backbuffer and does a flip. In my case, every pixel in the frame could change from one frame to the next. It's impossible to use video memory because the graphics are not composed of a series of static images or sub-images.

zedzeek: memcpy() does 32-bit moves on win32 platforms. Look at the asm listing when compiling it to see for yourself. Never assume anything.
