This topic is now archived and is closed to further replies.

Blackstream

Fast Pixels

I feel like such a newbie for asking this (and that is probably because I am), but I am having problems drawing pixels fast. First off, I have a Pentium 166 MHz with 32 MB RAM and a video card that doesn't want to accelerate anything, even though it is a Riva Viper v550 16 MB. That may have something to do with it, but I think there is more to it than that.

Right now, I have an inline pixel drawing function basically filling up an offscreen surface. The resolution of the surface is 800x600x8bpp, and I get about 18 frames/sec just filling up the surface. I heard something about register variables and using DDLOCK_WRITEONLY, and I'll try out some more stuff while I wait for you guys' answers.

What's that? I didn't ask the question. Doh. My question is: can you store the pitch of a surface and count on it staying the same for that surface as long as the surface and the resolution don't change? If I can, I could probably make a table of pitch*y values for that surface... Yes, I know a table of pitch*y values would be way taxing on memory if I had a lot of surfaces and a table for each one. Your thoughts would be nice. I have a sneaky feeling that my comp is a big cause of my pain, though.

-Blackstream
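For reference, the pitch*y table being asked about could look roughly like the sketch below. It assumes the pitch really does stay fixed for the lifetime of the table, which is exactly the open question, and it works on a plain byte buffer rather than a real DirectDraw surface. The names buildPitchTable and plotPixel8 are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: byte offset of the start of each row, computed once.
// Only valid while the surface keeps the same pitch and height.
std::vector<std::size_t> buildPitchTable(std::size_t pitch, std::size_t height)
{
    std::vector<std::size_t> rowOffset(height);
    for (std::size_t y = 0; y < height; ++y)
        rowOffset[y] = y * pitch;
    return rowOffset;
}

// Plot one 8bpp pixel using the table instead of multiplying y * pitch.
inline void plotPixel8(std::uint8_t* surface,
                       const std::vector<std::size_t>& rowOffset,
                       std::size_t x, std::size_t y, std::uint8_t color)
{
    assert(y < rowOffset.size());
    surface[rowOffset[y] + x] = color;
}
```

At one 4-byte offset per row (the size on a 32-bit build), an 800x600 surface would need 600 entries, roughly 2.4 KB per table.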

same problem here

Any pixel manipulation MUST be done in system memory, NOT in video memory... this will boost your speed.

IMHO a pitch table will not help much nowadays, because it will thrash your CPU's L1/L2 cache...

Best is to put all DirectX surfaces into system memory. Just use a plain matrix (array), and then, just once at the end, use a surface to blit from system memory to the video back buffer, then Flip...

I do it like that and I get more fps, doing just what you do, I mean.
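A minimal sketch of that approach, assuming an 8bpp frame and using a plain byte pointer plus pitch in place of a locked DirectDraw surface. SoftBuffer and present are hypothetical names, not part of any API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 8bpp frame buffer kept in ordinary system memory.
// Rows are tightly packed, so the "pitch" here is simply the width.
struct SoftBuffer {
    std::size_t width, height;
    std::vector<std::uint8_t> pixels;

    SoftBuffer(std::size_t w, std::size_t h) : width(w), height(h), pixels(w * h, 0) {}

    void plot(std::size_t x, std::size_t y, std::uint8_t color)
    {
        assert(x < width && y < height);
        pixels[y * width + x] = color;
    }
};

// Copy the finished frame into a destination whose rows are destPitch bytes
// apart (for example, the pointer and pitch obtained by locking a surface).
void present(const SoftBuffer& src, std::uint8_t* dest, std::size_t destPitch)
{
    assert(destPitch >= src.width);
    for (std::size_t y = 0; y < src.height; ++y)
        std::memcpy(dest + y * destPitch, &src.pixels[y * src.width], src.width);
}
```

All per-pixel work touches only the system-memory buffer; the destination is written once per frame, one row at a time.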

Bogdan

just a hint here,

you probably do something like

    for(y=0;y<600;y++)
        for(x=0;x<800;x++)
            surface[y*pitch+x]=color;

instead of making a table of y*pitch values, you could simply do something like

    for(y=0;y<600;y++)
    {
        y_pitch=y*pitch;   // computed once per row
        for(x=0;x<800;x++)
            surface[y_pitch+x]=color;
    }

this will already eliminate the multiplication in the inner loop
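Taking the same idea one step further, the row offset does not need a multiplication at all if a row pointer is advanced by the pitch after each line. The sketch below assumes an 8bpp surface represented as a plain byte pointer; fillSurface8 is a made-up name, and using memset for each row is a swapped-in shortcut that only applies to solid fills.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill a width x height region of an 8bpp surface whose rows are
// pitch bytes apart. The row pointer advances by pitch after each
// line, so no y * pitch multiplication is needed.
void fillSurface8(std::uint8_t* surface, std::size_t pitch,
                  std::size_t width, std::size_t height, std::uint8_t color)
{
    assert(pitch >= width);
    std::uint8_t* row = surface;
    for (std::size_t y = 0; y < height; ++y) {
        std::memset(row, color, width);  // one call per row instead of one store per pixel
        row += pitch;                    // step to the next row, skipping any padding
    }
}
```

Bytes between width and pitch on each row are left untouched, which matters because that padding does not belong to the visible image.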

cyberg

To answer a few things:

First, about register variables: I don't think Visual C++ will honor them. Even if you specify the register keyword, the compiler makes its own register choices.

I'm not sure about any benefits of DDLOCK_WRITEONLY as I've never tried it. But if you're only going to be writing, chances are that you don't need to manually manipulate the pixels anyway. For example, you can fill up the surface using the blitter. The only time I use manual pixel manipulation is for operations that depend on what colors are already present on the surface... which is why I've never tried out DDLOCK_WRITEONLY.

If you don't know how to do color fills with the blitter, the code looks something like this:

    
DDBLTFX fx;

ZeroMemory(&fx, sizeof(fx));
fx.dwSize = sizeof(fx);
fx.dwFillColor = 0; // black


lpdds->Blt(NULL, NULL, NULL, DDBLT_WAIT | DDBLT_COLORFILL, &fx);


-Ironblayde
 Aeon Software

The following sentence is true.
The preceding sentence is false.

Yeah, IronBlayde, I don't need to draw pixels. In fact, I've never, ever needed to draw pixels in anything I've ever written. Although I can foresee making a particle engine in the future, and that may require the drawing of pixels, just not 800x600 pixels. However, I'm not going to be able to make a software renderer for my exploration into 3D (yes, I know about Direct3D and OpenGL, but I would rather learn to do it the hard way first, so I understand better what I am doing) if I can't draw that many pixels speedily. Actually though, here is some interesting stuff I found out.

800x600x8bpp using regular pixel plotter: 11 frames per second

800x600x8bpp using inline pixel plotter: 16 frames per second

800x600x8bpp with the plotter code put directly into my drawing loop (no function overhead): 25 frames per second

800x600x8bpp using system memory, no functions, and a pitch table: 25 frames per second (yes, no speed increase)

640x400x8bpp using inline functions: 35 fps

640x400x8bpp using sysmem, no functions, and a pitch table: 45 fps

320x200x8bpp using inline functions: 160 fps!!!!!!!!!!

I don't know what happened, but there was a sudden speed increase at the end.

Basically, it looks like the only things that increased the speed of my plotting were lowering the resolution and getting rid of pixel plotting functions.
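For anyone wanting to reproduce this kind of measurement, here is a rough sketch of a fill benchmark. It is not the code used for the numbers above: it fills a plain system-memory buffer instead of a DirectDraw surface, the name measureFillFps is made up, and results on a modern optimizing compiler will not be comparable to the figures quoted.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical benchmark: fill a width x height 8bpp buffer `frames` times
// with a plain nested loop and return the measured frames per second.
double measureFillFps(std::size_t width, std::size_t height, int frames)
{
    assert(width > 0 && height > 0 && frames > 0);
    std::vector<std::uint8_t> buffer(width * height);
    volatile std::uint8_t sink = 0;  // keeps the compiler from discarding the fills

    const auto start = std::chrono::steady_clock::now();
    for (int f = 0; f < frames; ++f) {
        const std::uint8_t color = static_cast<std::uint8_t>(f);
        for (std::size_t y = 0; y < height; ++y) {
            std::uint8_t* row = &buffer[y * width];  // plotter code written directly in the loop
            for (std::size_t x = 0; x < width; ++x)
                row[x] = color;
        }
        sink = buffer[buffer.size() - 1];
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;

    (void)sink;
    // Guard against a timer too coarse to register any elapsed time.
    return elapsed.count() > 0.0 ? frames / elapsed.count() : 0.0;
}
```

Calling it as measureFillFps(800, 600, 100) and then measureFillFps(320, 200, 100) gives a quick way to compare resolutions on the same machine.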

-Blackstream

Your frame time will most likely be proportional to the area (the number of pixels you write), so the frame rate goes up as the resolution goes down; this is why there is a sudden speed increase. Is this ModeX 320x200 or Mode 13h?
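One way to check the area argument is to convert the quoted frame rates into pixels written per second. The sketch below uses only the three inline-plotter figures from the earlier post (800x600 at 16 fps, 640x400 at 35 fps, 320x200 at 160 fps); pixelsPerSecond is a made-up helper name.

```cpp
// Pixels written per second for a given mode and measured frame rate.
constexpr long long pixelsPerSecond(long long width, long long height, long long fps)
{
    return width * height * fps;
}

// Inline-plotter figures quoted earlier in the thread:
//   800x600 at  16 fps ->  7,680,000 pixels/sec
//   640x400 at  35 fps ->  8,960,000 pixels/sec
//   320x200 at 160 fps -> 10,240,000 pixels/sec
```

The three results are of the same order (roughly 8 to 10 million pixels per second), which is what you would expect if the cost per pixel is about constant and a fixed per-frame overhead weighs less at higher frame counts.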

I too am doing a software 3D engine for learning purposes. I have got to triangle rasterising. I can do about 300 tris per second in 400x300x16. If it was 8-bit it would be double that though. And this is in Debug build with no asm optimisations. I'll probably use MMX in the final version.



Please state the nature of the debugging emergency.


sharewaregames.20m.com

Edited by - furby100 on November 3, 2000 3:52:33 AM

Hmm, it isn't ModeX, I just set the resolution to 320x200, no STANDARDVGA stuff set. 300 tris per second, is that good?

Let's see, 1 cube = 2 tris per side * 6 sides = 12 tris.

300 tris per second / 12 tris = 25 fps. Dang. How in the heck did the people of old (people who made Doom and stuff) manage to make their 3D games go so fast on worse computers? I've really got to admire those dudes. My guess is a heavy dose of dirty tricks, optimizations, and pure assembly.

-Blackstream

They used a technique called back-face culling. That means that the polys facing away from you don't get rendered. So if you have a cube, you're definitely not going to be able to see all 6 sides; at most you'll see 3. Thus, the most polys you'll need to draw for a cube is 6 (3 * 2) and not 12.
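As an illustration of the cube case, here is a small sketch of the usual facing test: a face is kept when the eye lies on the side its outward normal points to. It assumes an axis-aligned cube centred at the origin with half-size 1; Vec3, isFrontFacing, and countVisibleCubeFaces are names invented for this example.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// A face is front-facing when the eye lies on the side its outward normal points to.
inline bool isFrontFacing(Vec3 pointOnFace, Vec3 outwardNormal, Vec3 eye)
{
    return dot(outwardNormal, sub(eye, pointOnFace)) > 0.0;
}

// Count the faces of an axis-aligned cube (centre at the origin, half-size 1)
// that survive back-face culling for a given eye position.
int countVisibleCubeFaces(Vec3 eye)
{
    const Vec3 normals[6] = {
        { 1, 0, 0}, {-1, 0, 0},
        { 0, 1, 0}, { 0,-1, 0},
        { 0, 0, 1}, { 0, 0,-1},
    };
    int visible = 0;
    for (const Vec3& n : normals) {
        // With half-size 1, the centre of each face has the same coordinates as its normal.
        if (isFrontFacing(n, n, eye))
            ++visible;
    }
    return visible;
}
```

With the eye at (3, 4, 5) three faces pass the test, so 6 triangles are drawn; looking straight down an axis from (5, 0, 0) only one face passes.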

Just by doing this, you've cut your polygon count in half. There are other techniques, but I think I'll let the pros answer that.

They used BSP in Doom to eliminate non-visible polys quickly. Also, Doom was 256 color and I am using 65536 colors, so you can double the tris per second. It also ran in Mode 13h 320x200, so the tris/second can be multiplied by the inverse area scale factor, making 940 tris/second. I can also render quads 3/4 as fast as tris, making 704 quads/sec. (A quad does not take two tris because of an algorithmic optimisation to do with slope computation.) This makes the FPS 704/3 (3 visible quads per cube after backface culling) = 235 fps.

Also, bear in mind this is without any sort of optimisation. If I optimise a bit more then I will end up with a higher fps. If I use MMX/SSE for fills and stuff I will get an even better frame rate. I am also rendering to VRAM, which older games did not do; if I render to sysram instead I will speed up a bit, although updating the screen will take longer.
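The chain of estimates can be written out step by step. One assumption is needed to reproduce the quoted 940: the scale factor is taken as (400/320) squared = 1.5625. The ratio of pixel counts, 120000/64000 = 1.875, would give about 1125 tris/second instead. The struct and function names below are made up for the example.

```cpp
#include <cassert>

// All inputs are figures quoted in the post; the scale factor is an assumption.
struct Estimate {
    double trisPerSec8bpp;  // after doubling for 8-bit colour
    double trisPerSec320;   // after scaling down to 320x200
    double quadsPerSec;     // quads at 3/4 of the triangle rate
    double cubesPerSec;     // 3 visible quads per cube after culling
};

Estimate estimateCubeRate()
{
    const double measured = 300.0;                           // tris/sec at 400x300x16
    const double scale = (400.0 / 320.0) * (400.0 / 320.0);  // assumed: 1.5625
    Estimate e;
    e.trisPerSec8bpp = measured * 2.0;            // 600
    e.trisPerSec320  = e.trisPerSec8bpp * scale;  // 937.5, quoted as 940
    e.quadsPerSec    = e.trisPerSec320 * 0.75;    // 703.125, quoted as 704
    e.cubesPerSec    = e.quadsPerSec / 3.0;       // 234.375, quoted as 235
    return e;
}
```

The final figure is cubes per second, which equals frames per second only for a scene consisting of a single cube.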

PS. You say you had no STANDARDVGAMODE stuff; that means you _are_ using ModeX. If you use DDSDM_STANDARDVGAMODE, it means you want Mode 13h. Because Mode 13h only allows you to use 64000 bytes of VRAM, it is more efficient to backbuffer in system RAM than to page flip.



Please state the nature of the debugging emergency.


sharewaregames.20m.com

In the ddsCaps of the DDSURFACEDESC2 passed to CreateSurface, DDSCAPS_SYSTEMMEMORY needs to be set to get good speed. I think this has to be done for both the front and back surfaces or it won't work. I'm not sure, though.

    DDSURFACEDESC2 ddsd = {0};        // zero every field first
    ddsd.dwSize = sizeof(ddsd);
    ddsd.dwFlags |= DDSD_CAPS;        // marks ddsCaps as valid
    ddsd.ddsCaps.dwCaps |= DDSCAPS_SYSTEMMEMORY;

That should speed it up a bit.

Here's some code that may speed up your application (this assumes you have MMX on your CPU):

    
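// Copies a rectangle from a tightly packed source buffer to a locked surface,
// 8 bytes at a time. Note: bpp appears to be BYTES per pixel here (the example
// below passes 3), and any row bytes beyond a multiple of 8 are skipped
// rather than copied.
void DrawMMX( DDSURFACEDESC *dest, BYTE *src, RECT &area, int bpp )
{
    int w=(area.right-area.left)*bpp/8;   // 8-byte quadwords per row
    int h=area.bottom-area.top;           // rows to copy
    if ( w<=0 || h<=0 ) return;           // the loops below must run at least once
    char *sptr=(char*)src;
    char *dptr=(char*)dest->lpSurface;
    DWORD adddpos=dest->lPitch-w*8;                 // from end of copied bytes to next dest row
    DWORD addspos=bpp*(area.right-area.left)-w*8;   // leftover source bytes per row
    __asm
    {
        mov edx,h
        mov esi,sptr
        mov edi,dptr
    NextLine:
        mov ecx,w
    NextPixels:
        movq mm0,[esi]      // load 8 bytes from the source
        movq [edi],mm0      // store 8 bytes to the surface
        add esi,8
        add edi,8
        dec ecx
        jnz NextPixels
        add edi,adddpos
        add esi,addspos
        dec edx
        jnz NextLine
        emms                // leave MMX state so floating point works again
    }
}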
// EXAMPLE of use:

if ( g_bMMX_Enabled )
{
    DDSURFACEDESC ddsd;
    ZeroMemory(&ddsd, sizeof(ddsd));
    ddsd.dwSize = sizeof(ddsd);
    if ( DD_OK != g_pDDSPrimary->Lock(NULL, &ddsd, DDLOCK_WAIT, NULL))
    {
        return FALSE;
    }

    RECT rc={0,0,CSDScreen::GetWidth(), CSDScreen::GetHeight()};
    DrawMMX( &ddsd, CSDScreen::GetDib()->GetBits(), rc, 3 );   // 3 bytes per pixel

    g_pDDSPrimary->Unlock( NULL );
}
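For readers without an MMX-era compiler, the same copy structure can be sketched in portable C++. This is not a drop-in replacement for the routine above: it takes a raw destination pointer and pitch instead of a DDSURFACEDESC, copyRows is a made-up name, and unlike the assembly it also copies the leftover bytes when a row is not a multiple of 8 bytes long. In practice a single memcpy per row would usually do the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `rows` rows of `rowBytes` bytes each from a tightly packed source
// into a destination whose rows are destPitch bytes apart.
void copyRows(std::uint8_t* dest, std::size_t destPitch,
              const std::uint8_t* src, std::size_t rowBytes, std::size_t rows)
{
    assert(destPitch >= rowBytes);
    const std::size_t quads = rowBytes / 8;  // whole 8-byte chunks per row
    const std::size_t tail  = rowBytes % 8;  // leftover bytes per row

    for (std::size_t y = 0; y < rows; ++y) {
        std::uint8_t* d = dest + y * destPitch;
        const std::uint8_t* s = src + y * rowBytes;

        for (std::size_t i = 0; i < quads; ++i) {
            std::uint64_t chunk;
            std::memcpy(&chunk, s, 8);  // alignment-safe 8-byte load
            std::memcpy(d, &chunk, 8);  // alignment-safe 8-byte store
            s += 8;
            d += 8;
        }
        std::memcpy(d, s, tail);        // bytes the 8-byte loop cannot cover
    }
}
```

The inner loop mirrors the movq load/store pair, and the per-row pointer arithmetic replaces the adddpos/addspos adjustments.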


For your information, Doom and Doom2 used ModeX.

Also, the speed increase from 640x400x8 to 320x200x8 is due to the fact that you are filling a quarter as many pixels (45x4 = 180 fps, minus overhead for copying = 160 fps).

Nice trick Neocron - got any more of that?


Stay Lucky, Graham "Mournblade" Reeds.
http://homepage.dtn.ntl.com/grahamr

I thought that Doom and Doom II were not really true 3D. Didn't they use a type of ray casting engine? Maybe I am just confusing myself!



"If at first you DO succeed...try not to look astonished!!"

BASSOFeeSH@aol.com
><>

They were not true 3D as such, no, in that the walls were in fact lines. Doom used BSP based on lines, rather than based on planes like Quake, which is true 3D. However, the display was 3D, even though the levels and monsters weren't.
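To give a feel for what "BSP based on lines" means, here is a generic sketch of a 2D BSP tree walked in front-to-back order from a viewer position. It is not a reconstruction of Doom's code; the BspNode layout, the wallId field, and the frontToBack function are all invented for illustration.

```cpp
#include <cassert>
#include <vector>

// A wall is treated as an infinite 2D line a*x + b*y + c = 0 for partitioning.
struct BspNode {
    double a, b, c;        // partition line through this node's wall
    int wallId;            // hypothetical identifier of the wall stored here
    const BspNode* front;  // subtree on the side where a*x + b*y + c > 0
    const BspNode* back;   // subtree on the other side
};

// Emit walls in front-to-back order for a viewer at (vx, vy): a wall is
// emitted before any wall on the far side of its partition line.
void frontToBack(const BspNode* node, double vx, double vy, std::vector<int>& order)
{
    if (node == nullptr)
        return;
    const bool viewerInFront = node->a * vx + node->b * vy + node->c > 0.0;
    const BspNode* nearSide = viewerInFront ? node->front : node->back;
    const BspNode* farSide  = viewerInFront ? node->back  : node->front;

    frontToBack(nearSide, vx, vy, order);  // everything on the viewer's side first
    order.push_back(node->wallId);         // then the wall on the partition itself
    frontToBack(farSide, vx, vy, order);   // then everything behind it
}
```

For three parallel walls at x = 5, x = 0, and x = -5 (ids 1, 0, and 2, with the middle one as the root), a viewer at x = 7 gets the order 1, 0, 2 and a viewer at x = -7 gets 2, 0, 1, without any sorting by distance.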



Please state the nature of the debugging emergency.


sharewaregames.20m.com
