okay to assume lPitch?
I know what the docs say: you are supposed to get lPitch every time you lock a surface and never assume you know what it is.
But might it be okay to assume lPitch in certain circumstances? I am planning to use a back buffer in system RAM (not part of the flipping chain). In this case, I see no reason why lPitch wouldn't always equal the width I specify * bytes per pixel.
Also, because I know the surface is in system RAM, I don't have to worry about old banked cards and vflatd.386.
I should be able to write some screaming ASM routines that take advantage of a few assumptions.
Does anyone see any reason why this wouldn't work?
Nathan.
nathany.com
Edited by - nathany on 5/7/00 5:24:23 PM
No, I don't see why not...
But take care: the docs don't speak in vain.
I tested it, and sometimes lPitch really is different from the expected value (but I don't remember whether that was in video or system memory).
You can test it (a lot) using the DDRAWTest program in the SDK (ddtest.exe).
But even better, why use surfaces at all? Just use a dx by dy by bytes_per_pixel array in system RAM for each surface and do your work there. Then do a single lock on the back buffer and move the final result to video RAM. Anyway, you can do it faster in ASM, especially if you assume fixed sizes, replace muls with shls and adds, and just increment the address in THE INNER LOOP...
Hope you will succeed... because I am going to do the same when I optimize my game.
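Editor's note: the "just increment the address in the inner loop" idea does not actually require a fixed pitch. A minimal sketch in plain C++ (not ASM), assuming 16-bit pixels; `fillRect16`, `base` and `pitchBytes` are hypothetical names standing in for a routine of your own and for the lpSurface/lPitch values returned by a lock:

```cpp
#include <cstddef>
#include <cstdint>

// Fill a w-by-h rectangle of 16-bit pixels whose top-left corner is (x, y).
// base       : address of the first byte of the buffer (assumed 2-byte aligned)
// pitchBytes : distance in bytes between the starts of two consecutive rows
void fillRect16(std::uint8_t* base, std::ptrdiff_t pitchBytes,
                int x, int y, int w, int h, std::uint16_t color)
{
    // The only multiply happens once per call, outside both loops.
    std::uint8_t* row = base + y * pitchBytes + x * 2;
    for (int j = 0; j < h; ++j) {
        std::uint16_t* p = reinterpret_cast<std::uint16_t*>(row);
        for (int i = 0; i < w; ++i)
            *p++ = color;        // inner loop: store and increment only
        row += pitchBytes;       // next scanline: one add using the real pitch
    }
}
```

With this shape the pitch costs one add per scanline, so reading it from the lock is unlikely to be what limits the inner loop.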
hi bogdanontanu,
quote:Original post by bogdanontanu
I tested it, and sometimes lPitch really is different from the expected value (but I don't remember whether that was in video or system memory).
You can test it (a lot) using the DDRAWTest program in the SDK (ddtest.exe).
Probably in video RAM; in that case I can see why it could differ. I think what I'll do is test the lPitch after creating the surface and abort if it isn't what I expect... but that should never happen.
quote:But even better, why use surfaces at all? Just use a dx by dy by bytes_per_pixel array in system RAM for each surface and do your work there. Then do a single lock on the back buffer and move the final result to video RAM.
Well, I was thinking that all sprites and stuff would be like you say... not surfaces at all. But then I am planning to have a surface in system RAM that is my back buffer (not in the flipping chain, though). Every frame I'll lock it, do a bunch of ASM blitting, and unlock it. Then Blt from this surface to the primary surface on vsync (I still have to look into Blt on vsync to make sure this will work).
The Blt from system to video memory might get a little boost from blitting hardware. And if running in a window, I could do a scaling Blt if the hardware is there (resizable window).
quote:Anyway, you can do it faster in ASM, especially if you assume fixed sizes, replace muls with shls and adds, and just increment the address in THE INNER LOOP...
Yup. The only pain in the butt is that then I need to write different functions for each resolution (or rather, each width of the back buffer... in windowed mode the back buffer is the same width regardless of the actual screen resolution). But that's okay.
quote:Hope you will succeed... because I am going to do the same when I optimize my game.
Good luck with your game,
Nathan.
nathany.com
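Editor's note: the "test the lPitch and abort if it isn't what I expect" plan can be softened into "test it and fall back". A rough sketch, assuming a 640-wide, 16-bit back buffer; every name here (`kExpectedPitch`, `canUseFixedPitch`, `pixelAt` and so on) is hypothetical, and `reportedPitch` stands in for the lPitch returned by each lock:

```cpp
#include <cstddef>
#include <cstdint>

constexpr int kWidth = 640;            // assumed back-buffer width in pixels
constexpr int kBytesPerPixel = 2;      // assumed 16-bit colour
constexpr std::ptrdiff_t kExpectedPitch = kWidth * kBytesPerPixel;  // 1280

// True when the pitch reported by the lock matches the hard-coded one.
inline bool canUseFixedPitch(std::ptrdiff_t reportedPitch)
{
    return reportedPitch == kExpectedPitch;
}

// Fast path: the pitch is a compile-time constant.
inline std::uint16_t* pixelAtFixed(std::uint8_t* base, int x, int y)
{
    return reinterpret_cast<std::uint16_t*>(base + y * kExpectedPitch) + x;
}

// General path: works for whatever pitch the lock reported.
inline std::uint16_t* pixelAtGeneric(std::uint8_t* base,
                                     std::ptrdiff_t pitch, int x, int y)
{
    return reinterpret_cast<std::uint16_t*>(base + y * pitch) + x;
}

// Choose the routine from the reported pitch instead of aborting.
inline std::uint16_t* pixelAt(std::uint8_t* base,
                              std::ptrdiff_t reportedPitch, int x, int y)
{
    return canUseFixedPitch(reportedPitch)
               ? pixelAtFixed(base, x, y)
               : pixelAtGeneric(base, reportedPitch, x, y);
}
```

In a real renderer the check would be made once per lock to select a whole blitting routine, not once per pixel as in this toy.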
Take care, nathany:
I have now tested my game on several PCs, and I found one (I don't yet know why), an AMD K6-2/350 with an S3 Savage with 8 MB of VRAM, on which my game was running at 50 fps with lots of surfaces in video RAM...
but when I changed all surfaces (except for the flip chain) to system RAM, the fps dropped incredibly, to <10 fps,
almost the same as a P150 (no MMX) with 4 MB of VRAM and no 3D (my lowest target ever...).
On the other hand, it only dropped to approx. 30 fps on a Pentium II/400 system (my real target).
So I was able to speed it up by doing only system-to-system blts and ONLY ONE LAST big system-to-video blt.
Take care: many system-to-video blts do run SLOWER than system-to-system ones... and I still wonder WHY.
My friend's AMD K6/Savage system runs other games OK... and he is still wondering what the hell my game is doing... if only I knew.
Testing with performance counters shows 80,000 ticks for a simple 800x600 system-to-video blt... oh my God... this makes me think of replacing even this blt with my ASM
(and his CAPS show an accelerated system-to-video blt!!!),
and of course Starcraft runs perfectly on his system.
Test it on many systems, because you can get big surprises...
Best of luck
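Editor's note: the "compose in system memory, then ONE last big copy" approach can be sketched as a row-by-row copy that respects both pitches. This is only an illustration under stated assumptions: `copyRows` is a hypothetical helper, `src` is your own tightly packed buffer, and `dst`/`dstPitch` stand in for the lpSurface/lPitch of the locked destination.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `rows` scanlines of `rowBytes` bytes each from src to dst.
// The two buffers may have different pitches; bytes past rowBytes in a
// destination row (the padding) are left untouched.
void copyRows(std::uint8_t* dst, std::ptrdiff_t dstPitch,
              const std::uint8_t* src, std::ptrdiff_t srcPitch,
              std::size_t rowBytes, int rows)
{
    if (rows <= 0)
        return;

    // Both buffers tightly packed: the whole image is one contiguous block.
    if (dstPitch == srcPitch &&
        static_cast<std::size_t>(srcPitch) == rowBytes) {
        std::memcpy(dst, src, rowBytes * static_cast<std::size_t>(rows));
        return;
    }

    // Otherwise copy one scanline at a time, stepping each side by its pitch.
    for (int y = 0; y < rows; ++y) {
        std::memcpy(dst, src, rowBytes);
        dst += dstPitch;
        src += srcPitch;
    }
}
```

Whether this beats the driver's own Blt on a given card is exactly the kind of thing the post above says to measure on several machines.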
quote:Original post by bogdanontanu
Take care: many system-to-video blts do run SLOWER than system-to-system ones... and I still wonder WHY.
Assuming that it's a straight pixel-for-pixel copy:
system->system is fast because it's a simple, highly optimised assembly memcpy.
video->video should be fast because it can be done by the display hardware (asynchronously).
system->video is slow because you're copying data across the PCI/AGP bus (synchronously).
video->system is slower again, because display hardware is optimised to receive data, not to send it back.
TheTwistedOne
http://www.angrycake.com
TheTwistedOne: thanks. I copied your message over to the thread "2D harder to do than 3D today?" where similar things are being talked about.
nathany.com
About the lPitch thing: no, you can NEVER, EVER, EVER assume that you know what it is. It depends on how the system allocates the memory. Rows can be aligned to 32-bit boundaries, in which case, if the row length isn't already a multiple of four bytes, each row is padded with up to three bytes as necessary.
It'll look fine under most circumstances, until you run into one allocation where it goes wrong, and then you'll spend hours finding the problem, only to kick yourself for being lazy.
#pragma DWIM // Do What I Mean!
~ Mad Keith ~
**I use Software Mode**
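Editor's note: the "padded with up to three bytes" arithmetic can be written out as below. This only illustrates rounding a row up to a 32-bit boundary as described in the post above; `paddedPitch` is a hypothetical helper, and the padding a real driver chooses may be larger or follow a different rule, which is the reason for reading lPitch from the lock.

```cpp
#include <cstddef>

// Round the byte length of one row up to the next multiple of `alignment`.
// alignment = 4 models rows that start on 32-bit boundaries.
constexpr std::size_t paddedPitch(std::size_t widthPixels,
                                  std::size_t bytesPerPixel,
                                  std::size_t alignment = 4)
{
    return (widthPixels * bytesPerPixel + alignment - 1) / alignment * alignment;
}

// A 640-wide, 16-bit row is already a multiple of four: no padding.
static_assert(paddedPitch(640, 2) == 1280, "640 * 2 needs no padding");

// A 101-wide, 24-bit row is 303 bytes and gets one byte of padding.
static_assert(paddedPitch(101, 3) == 304, "303 rounds up to 304");

// A 101-wide, 8-bit row is 101 bytes and gets three bytes of padding.
static_assert(paddedPitch(101, 1) == 104, "101 rounds up to 104");
```

Under this particular rule a 640-wide, 16-bit buffer would come out at 1280, but nothing guarantees that a driver pads to only four bytes.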
Yes, NEVER assume you know the pitch of a surface. Practically all the surfaces I create on my computer don't have a pitch equal to the width.
Here are the situations where I'm sure DX can change the lPitch without telling you:
Lost surfaces (if the user multi-tasks)
and
texture management.
Other than that, I'm pretty sure that DX doesn't mess with the lPitch of surfaces at all.
===============================================
If there is a witness to my little life,
To my tiny throes and struggles,
He sees a fool;
And it is not fine for gods to menace fools.
quote:Original post by MadKeithV
About the lPitch thing: no, you can NEVER, EVER, EVER assume that you know what it is. It depends on how the system allocates the memory. Rows can be aligned to 32-bit boundaries, in which case, if the row length isn't already a multiple of four bytes, each row is padded with up to three bytes as necessary.
Hm. Sounds like it could be off in system RAM too, then? But why pad each line by up to 3 bytes? Wouldn't it just allocate one big block of system RAM, not individual lines?
quote:It'll look fine under most circumstances, until you run into one allocation where it goes wrong, and then you'll spend hours finding the problem, only to kick yourself for being lazy.
It's not a matter of laziness, it's a matter of optimization. If I knew the pitch for a 640-wide, 16-bit surface was always 1280 bytes, I could optimize my ASM routines to use shifts and such instead of multiplying by the lPitch variable.
- n8
nathany.com
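Editor's note: the shift trick for a pitch of exactly 1280 bytes can be spelled out as follows. The sketch assumes the 640-wide, 16-bit case from the post; `rowOffset1280` and `rowOffset` are hypothetical names, and the shift form is only valid when the pitch really is 1280.

```cpp
#include <cstdint>

// Byte offset of row y when the pitch is known to be exactly 1280.
// 1280 = 1024 + 256, so y * 1280 == (y << 10) + (y << 8).
constexpr std::uint32_t rowOffset1280(std::uint32_t y)
{
    return (y << 10) + (y << 8);
}

// General form: one multiply by whatever pitch the lock reported.
constexpr std::uint32_t rowOffset(std::uint32_t y, std::uint32_t pitchBytes)
{
    return y * pitchBytes;
}

// The two forms agree for the last row of a 640x480 buffer.
static_assert(rowOffset1280(479) == rowOffset(479, 1280),
              "shift form must match the multiply");
```

Note that a compiler given a constant pitch will usually perform this strength reduction on its own, and a routine that walks the rows by adding the pitch to a row pointer needs neither the shifts nor the multiply.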
This topic is closed to new replies.