Craazer

Does anybody know how to do this...

I have seen some blitting techniques where you draw two or even more pixels at the same time to the back buffer. I think it was done by something like masking two values at the same time from the source surface, but I'm not sure. So does anyone know how to make your own blitter which draws more than one pixel at a time? I mean optimizing this code:
  
for(int y=0;y<height;++y)
    for(int x=0;x<width;++x)
        back_buffer[y*width+x] = pixel;
// just an example of a basic blit

// if you could draw two pixels at once, the inner loop would only need half as many iterations....


  

You can do this in C/C++ by using a pointer to a larger data type to move the pixels with.

16-bit pixels using a DWORD: each move copies 2 pixels at a time,
because a DWORD moves 4 bytes at a time and there are 2 bytes in each pixel.

1. Set up the source and destination pointers as the larger data type and put the correct addresses in them

DWORD *pdw_source=NULL;
DWORD *pdw_destination=NULL;

pdw_source = (DWORD *)&sourceBuffer[offset];
pdw_destination = (DWORD *)&destinationBuffer[offset];

2. Divide the number of pixels to be plotted by the number of pixels per larger data type
** NOTE: it will simplify your code if you only plot widths divisible by the number of pixels per larger type. Otherwise you will have to test for the odd pixels so they get plotted. **

// Number of 2-pixel groups (2 bytes per pixel) plotted with one DWORD move
NumPixelGroups = image.width / 2;

// Number of DWORD elements to step to the next scanline (pitch is in bytes
// and is assumed to be a multiple of 4)
s_NextLine = source.pitch / 4;
d_NextLine = destination.pitch / 4;

// Set the Odd_Pixel flag for 16-bit pixels plotted with a DWORD pointer
Odd_Pixel = image.width & 0x00000001;

3. What the rendering process might look like

            
while(NumScanLines--) // NumScanLines is equal to the height that will be plotted
{
    for(i=0;i<NumPixelGroups;i++)
    {
        pdw_destination[i] = pdw_source[i];
    }

    // Test for odd pixel
    if(Odd_Pixel)
    {
        *(WORD *)&pdw_destination[NumPixelGroups] = *(WORD *)&pdw_source[NumPixelGroups];
    }

    // Move to the next scanline
    // (Stepping through a WORD * cast would also work, but the step would
    // then have to be pitch / 2 instead of pitch / 4.)
    pdw_destination += d_NextLine;
    pdw_source += s_NextLine;
}

This is how I would try to do it, but that doesn't mean it is the best or easiest way.

The odd pixel plotting is very ugly. I hope someone has a better solution.
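A slightly tidier take on the same loop, as a rough sketch only. It assumes 16-bit pixels and pitches given in bytes, and the name `BlitRows16` and its parameters are made up for illustration. Instead of casting to `DWORD *`, it goes through `std::memcpy` with a fixed size, which compilers typically turn into a single 32-bit load and store while avoiding alignment and aliasing trouble:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a width x height block of 16-bit pixels, two pixels per 32-bit move.
// Pitches are in bytes. All names here are hypothetical.
void BlitRows16(std::uint8_t* dst, std::size_t dstPitch,
                const std::uint8_t* src, std::size_t srcPitch,
                std::size_t width, std::size_t height)
{
    assert(dstPitch >= width * 2 && srcPitch >= width * 2);
    const std::size_t pairs = width / 2;      // full 2-pixel groups per row
    const bool oddPixel = (width & 1) != 0;   // is one trailing pixel left over?

    for (std::size_t y = 0; y < height; ++y) {
        std::uint8_t* d = dst + y * dstPitch;
        const std::uint8_t* s = src + y * srcPitch;
        for (std::size_t i = 0; i < pairs; ++i) {
            std::uint32_t two;                 // holds two 16-bit pixels
            std::memcpy(&two, s + i * 4, 4);   // read two pixels
            std::memcpy(d + i * 4, &two, 4);   // write two pixels
        }
        if (oddPixel) {
            std::memcpy(d + pairs * 4, s + pairs * 4, 2);  // the last single pixel
        }
    }
}
```

The odd-pixel case is still there, but it is one line per row and needs no pointer casts.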

Anyway, hope this helps you.

[edit] damn source tags!!
[edit] sorry, the code was incorrect
[edit] okay, I have tested this code in MS VC++6 and it worked!!
[edit] improved next-line indexing by dividing the Next Line variables by 4 to match the pointer data type
[edited by - CodeJunkie on November 2, 2002 9:47:53 PM]


quote:
Original post by SenseiDragon
Why would you want to blit only a couple pixels at a time instead of just blitting rects with Blt or BltFast?


That's not what I meant. I want to make my own blitter because it has some special abilities like alpha, and going pixel by pixel was way too slow.

Okay, as far as I know CPUs are 32-bit and most transfer 32 bits at a time and no more (but I could be wrong). Anyway, here are some things you could try for the fun of it.

1. Well, I think the double type uses 8 bytes (there might be a type that holds 10 bytes). You could use that type to plot with. (I haven't tried it yet.)

2. Unroll the loop to cut down the loop overhead (might help a little). However, this requires different address calculation methods, but it can be used with whatever type.

Special Note: Handling the Odd-Pixel Cases
16-bit color (2 bytes per pixel)
1 DWORD - single-pixel test
2 DWORDs - single-pixel and double-pixel tests
3 DWORDs - single-pixel, double-pixel, and quad-pixel tests
or
1 double (guessing 8 bytes) - single-pixel and double-pixel tests
etc....

The easiest solution is to limit the blitter to widths divisible by the number of pixels plotted at a time, so you don't have to handle the odd-pixel cases.

However, a general bitmap blitter will have to plot any width.
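Here is roughly what the "any width" version could look like for one scanline when moving 8 bytes (4 pixels) at a time. Treat it as a sketch under assumptions: it swaps the `double` idea for `std::uint64_t`, since an integer type cannot disturb the bit pattern on the way through, and `CopyRow16x4` is a made-up name. On a 32-bit build the compiler may still split each 8-byte move into two 4-byte ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies one scanline of 16-bit pixels, four pixels (8 bytes) per move, then
// finishes the remainder with a double-pixel test and a single-pixel test.
void CopyRow16x4(std::uint16_t* dst, const std::uint16_t* src, std::size_t width)
{
    assert(width == 0 || (dst != nullptr && src != nullptr));

    std::size_t x = 0;
    for (; x + 4 <= width; x += 4) {          // main loop: 4 pixels per step
        std::uint64_t four;
        std::memcpy(&four, src + x, sizeof four);
        std::memcpy(dst + x, &four, sizeof four);
    }
    if (width - x >= 2) {                     // double-pixel test
        std::uint32_t two;
        std::memcpy(&two, src + x, sizeof two);
        std::memcpy(dst + x, &two, sizeof two);
        x += 2;
    }
    if (x < width) {                          // single-pixel test
        dst[x] = src[x];
    }
}
```

With 4 pixels per move the remainder is at most 3 pixels, so one double-pixel test and one single-pixel test cover every width.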

Good Luck!!

[edited by - CodeJunkie on November 6, 2002 1:02:12 PM]


Hmm, so it isn't possible then.. well, after all we're using 32-bit Windows.

It's just that when you're alpha blending something you have to read the back buffer to get its color, and that is soooo slow.

For example, if I have a loop which runs 200*200 times and reads the back buffer every time, my frame rate drops from 60 to 16!!!

So that's why __int64 seems to be the only good option. :/


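But since most CPUs are only 32 bits, __int64 only gets you 32 bits at a time: the compiler splits every 64-bit integer operation into two 32-bit ones. You can't push 64 bits at once through integer registers that work at 32 bits max.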



[Cyberdrek | the last true sorcerer | Spirit Mage - mutedfaith.com]

The best solution for you would have to involve not reading from VRAM (if that is what is happening). Sorry, this is just a guess, because I've only read in tutorials not to read from VRAM because it's slow.

Try rendering on a system memory surface; that might improve the performance. But it has side effects:
1. An extra blit to move the system surface to the video surface (this might make the solution not worth doing, because it is extra work compared with just reading VRAM)
2. All surfaces might have to be in system memory to avoid reading them from VRAM (taking up more system RAM)

Note: This solution is only worth doing if reading from VRAM is a lot slower than reading from system memory
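To make the idea concrete, here is a small sketch of the "compose in system memory, then upload once" approach. It is not tied to any particular API: `SystemBuffer16` and `Upload` are made-up names, and `surface` and `pitchBytes` stand in for whatever pointer and pitch a surface lock hands back. All blending reads would happen in `frame.pixels`, and the video surface is only ever written to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A 16-bit off-screen frame kept in ordinary system memory (hypothetical type).
struct SystemBuffer16 {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint16_t> pixels;  // tightly packed, width * height entries

    SystemBuffer16(std::size_t w, std::size_t h)
        : width(w), height(h), pixels(w * h) {}
};

// Uploads the finished frame to a locked destination surface, one row at a
// time, because the surface pitch may be larger than width * 2 bytes.
void Upload(const SystemBuffer16& frame, void* surface, std::size_t pitchBytes)
{
    const std::size_t rowBytes = frame.width * sizeof(std::uint16_t);
    assert(pitchBytes >= rowBytes);

    auto* row = static_cast<std::uint8_t*>(surface);
    for (std::size_t y = 0; y < frame.height; ++y) {
        std::memcpy(row, frame.pixels.data() + y * frame.width, rowBytes);
        row += pitchBytes;  // step to the next scanline of the surface
    }
}
```

Whether the extra copy pays for itself depends on how slow the VRAM reads really are on the machine in question, so it is worth timing both ways.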

I haven't gotten far enough with rendering to have tried this yet.

Hope the idea helps a little.

VRAM slow??

I had a 60% speed increase when I switched to VRAM, though I did this for offscreen surfaces, so my back buffer has always been in system memory.
And I'm not sure how I can put it (the back buffer) in VRAM, because I used the VIDEO memory caps but it didn't seem to have any effect.


You can set up a Back Buffer like this: (in DX 7)


  
// Create the surface(s) (at least the primary)
ZeroMemory(&SurfaceDesc, sizeof(DDSURFACEDESC2));
SurfaceDesc.dwSize = sizeof(DDSURFACEDESC2);
SurfaceDesc.dwFlags = DDSD_CAPS | DDSD_BACKBUFFERCOUNT;
SurfaceDesc.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE | DDSCAPS_FLIP | DDSCAPS_COMPLEX;
SurfaceDesc.dwBackBufferCount = 1;
hresult = RenDirectDraw7Ptr->CreateSurface(&SurfaceDesc, &RenPrimary, NULL);
if(hresult != DD_OK)
{
    // Failed to create the surface
    return(FALSE);
}

// Get the back buffer's pointer
ZeroMemory(&ddscaps, sizeof(DDSCAPS2));
ddscaps.dwCaps = DDSCAPS_BACKBUFFER;
hresult = RenPrimary->GetAttachedSurface(&ddscaps, &RenBackBuffer);
if(hresult != DD_OK)
{
    // Failed to get the back buffer
    return(FALSE);
}


First, render to the back buffer then flip the surfaces:


  
void Render(void)
{
    HRESULT hresult;

    // Update frame
    UpdateFrame();

    // Flip surfaces
    // Loop until the flip is done (the surface may have been lost or still be in use)
    while (TRUE)
    {
        hresult = RenPrimary->Flip(NULL, DDFLIP_WAIT); // wait for vsync

        if(hresult == DD_OK) // Flip was done
        {break;}

        // Check if the surface was lost
        if(hresult == DDERR_SURFACELOST)
        {
            hresult = RenPrimary->Restore();
            if(hresult != DD_OK) // Surface could not be restored
            {break;}
            continue; // Surface restored, so try the flip again
        }

        // If the surface was not still being drawn on, there was a bad error
        if(hresult != DDERR_WASSTILLDRAWING)
        {break;}
    }

    // End of Render
    return;
}


Uh.. well, thanks CodeJunkie, but I know how to do that.

quote:
And I'm not sure how I can put it (the back buffer) in


When I said that, I meant: how can you put the back buffer in VRAM? I used the caps to do it, but it didn't seem to have any effect.
