Improve this blending function?
I'm working on a blending function that would be used to brighten or darken a surface.
iw is the width and ih is the height of the surface.
ALPHA ranges from 0 to 256 to darken (256 leaves the surface unchanged), or greater than 256 to brighten the surface.
The problem is that it is not very well-optimized. What options (other than maybe lookup tables) do I have?
void BlendPrimary(LPDIRECTDRAWSURFACE7 lpdds, int iw, int ih, int ALPHA)
{
int sb, sg, sr;
int red, green, blue;
int i=0, j=0;
DDSURFACEDESC2 ddsd;
memset(&ddsd,0,sizeof(DDSURFACEDESC2));
ddsd.dwSize = sizeof(DDSURFACEDESC2);
lpdds->Lock(NULL, &ddsd, DDLOCK_WAIT, NULL);
WORD *dTemp=(WORD *)ddsd.lpSurface;
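for (i=0; i<iw; i++)
{
for (j=0; j<ih; j++)
{
int spot=i+(j*ddsd.lPitch/2);
sb = dTemp[spot] & 0x1f;
sg = (dTemp[spot] >> 5) & 0x3f;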
sr = (dTemp[spot] >> 11) & 0x1f;
blue = (ALPHA * (sb)) >> 8;
green = (ALPHA * (sg)) >> 8;
red = (ALPHA * (sr)) >> 8;
dTemp[spot]=(blue | (green << 5) | (red << 11));
}
}
lpdds->Unlock(NULL);
}
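One thing to watch when ALPHA goes above 256: a scaled channel can exceed its field width (31 for red and blue, 63 for green) and spill into the neighbouring channel when the pixel is repacked. Here is a minimal sketch of a per-pixel helper that clamps each channel, assuming the RGB565 layout used by the function above. `ScalePixel565` is a hypothetical name for illustration, not part of DirectDraw.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: scale one RGB565 pixel by alpha/256.
// alpha == 256 leaves the pixel unchanged, alpha > 256 brightens.
inline std::uint16_t ScalePixel565(std::uint16_t pixel, int alpha)
{
    assert(alpha >= 0); // assumption: negative alpha is not meaningful here

    int blue  =  pixel        & 0x1f;
    int green = (pixel >> 5)  & 0x3f;
    int red   = (pixel >> 11) & 0x1f;

    // Clamp so a brightened channel saturates instead of overflowing
    // into the bits of the next channel.
    blue  = std::min((alpha * blue)  >> 8, 0x1f);
    green = std::min((alpha * green) >> 8, 0x3f);
    red   = std::min((alpha * red)   >> 8, 0x1f);

    return static_cast<std::uint16_t>(blue | (green << 5) | (red << 11));
}
```

Inside the loops this would replace the six lines that unpack, scale, and repack the pixel. The clamps cost a few extra instructions, so they are only worth keeping if brightening is actually used.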
--Mike
MikeDoty.com
Well, I don't know too much about optimizing, but for a start, you might want to make all your local variables static. This will save them being pushed onto the stack each time the function is called. It may provide a small performance gain - I'm not sure.
---------------
I finally got it all together...
...and then forgot where I put it.
Well..
int spot=i+(j*ddsd.lPitch/2);
Calculating ddsd.lPitch/2 once, outside the loops, would give you a little speed increase (or at least use >>1 instead of /2).
Also, instead of using sb/sg/sr, do each calculation in one line. This should give your compiler a better idea of what you're up to and possibly let it generate better code.
If you change alpha to use values between 0 and 63 (note that in your 16bpp mode this won't change the visual quality!), you can save one multiply by scaling the R and B values together. This only works for darkening: anything above 64 makes blue overflow into the bits that belong to red. It goes like this:
1. We start here:
RRRRRGGGGGGBBBBB
2. We AND it to get rid of the G bits:
AND 1111100000011111
and now have:
RRRRR000000BBBBB
3. We multiply it by alpha (alpha is between 0 and 63) and end up with a 32-bit number (some unused bits at the beginning):
RRRRRXXXXXXBBBBBXXXXXX
where X is the part of the product that doesn't interest us.
4. We shift it right 6 bits:
RRRRRXXXXXXBBBBB
5. We AND it to keep only the R and B values:
AND 1111100000011111
to get:
RRRRR000000BBBBB
Now we only add the normally calculated green value and we're finished.
But still, for green, instead of shifting it right, ANDing, multiplying, shifting right and then shifting left again, we can do this:
1. Start:
RRRRRGGGGGGBBBBB
2. AND
0000011111100000
to get:
00000GGGGGG00000
3. Multiply (alpha assumed 0..63), 32-bit result (some bits unused at the front):
GGGGGGXXXXXX00000
4. Shift it right 6 bits (some bits unused):
GGGGGGXXXXX
5. AND
0000011111100000
to get:
00000GGGGGG00000
NOTE:
Instead of shifting both results right 6 bits, ANDing each one and then adding them together (the RB calculation and the G calculation), you can do this:
1. AND the RB value (with its mask moved left 6 bits)
2. AND the G value (with its mask moved left 6 bits)
3. ADD them
4. SHIFT the sum right 6 places
instead of, as described before:
1. SHIFT the RB value
2. AND the RB value
3. SHIFT the G value
4. AND the G value
5. ADD them
- one less instruction to do.
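The steps above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: RGB565 pixels, alpha limited to 0..64 (64 meaning "unchanged"), and a hypothetical helper name `ScalePixel565Packed` that does not come from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scale one RGB565 pixel by alpha/64 using two
// multiplies (red+blue together, green on its own) and a single shift.
// Assumes 0 <= alpha <= 64; larger values overflow into adjacent fields.
inline std::uint16_t ScalePixel565Packed(std::uint16_t pixel, std::uint32_t alpha)
{
    assert(alpha <= 64);

    // Blue sits in bits 0..4 and red in bits 11..15, so the 6-bit gap
    // between them absorbs the growth of the blue product (31 * 64 < 2048).
    std::uint32_t rb = (pixel & 0xF81Fu) * alpha;
    // Green stays in place at bits 5..10; its product lands in bits 5..16.
    std::uint32_t g  = (pixel & 0x07E0u) * alpha;

    // Mask while the results are still shifted up by 6, combine, shift once.
    std::uint32_t combined = (rb & (0xF81Fu << 6)) | (g & (0x07E0u << 6));
    return static_cast<std::uint16_t>(combined >> 6);
}
```

For example, `ScalePixel565Packed(0x8410, 32)` halves a mid-grey pixel to `0x4208`. Whether this beats three separate multiplies depends on the compiler and CPU, so it is worth timing both.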
Another idea (especially when you're using it on large surfaces):
1. Use DWORD memory access instead of WORD - greater memory transfer speed.
2. Calculate it using 3 muls instead of 4: for each pair of pixels, two RB muls plus one GG mul that handles both greens at once.
But to make it really fast you should go with asm and MMX instructions...
Hope it helps. In case of trouble, feel free to email me.
With best regards,
Mirek Czerwiński
http://kris.top.pl/~kherin/
P.S. About using DWORD access: I hope it's clear that we're going to calculate two pixels at a time to make full use of our 32-bit processor.
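A rough sketch of the two-pixels-per-DWORD idea, for illustration only. It assumes RGB565, alpha in 0..64, a little-endian layout (the pixel at the lower address sits in the low 16 bits), a 4-byte-aligned row pointer, and an even width. `ScaleTwoPixels565` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scale two RGB565 pixels packed in one 32-bit word
// by alpha/64, using three multiplies. Assumes 0 <= alpha <= 64.
inline std::uint32_t ScaleTwoPixels565(std::uint32_t twoPixels, std::uint32_t alpha)
{
    assert(alpha <= 64);

    // Red+blue of each pixel: one multiply per pixel (2 muls).
    std::uint32_t rbLow  = (twoPixels & 0x0000F81Fu) * alpha;
    std::uint32_t rbHigh = ((twoPixels >> 16) & 0xF81Fu) * alpha;

    // Both greens in one multiply. They are first moved down to bits 0..5
    // and 16..21; each product fits in 12 bits (63 * 64 < 4096), so the
    // high pixel's product cannot run off the top of the word.
    std::uint32_t gg = ((twoPixels & 0x07E007E0u) >> 5) * alpha;

    std::uint32_t low  = (rbLow  >> 6) & 0xF81Fu;
    std::uint32_t high = (rbHigh >> 6) & 0xF81Fu;
    // Net shift for green is >>6 then <<5, i.e. >>1.
    std::uint32_t green = (gg >> 1) & 0x07E007E0u;

    return low | (high << 16) | green;
}
```

In the blend loop this would be applied to the row reinterpreted as 32-bit words, stepping width/2 times per row. A surface with an odd width would need the last pixel handled separately.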
With best regards,
Mirek Czerwiński
http://kris.top.pl/~kherin/
Oh yeah... so I finally decided to register here. The DC is still my main place, because that's where I'm helping design a virtual reality OS... and hi Tim, how's the new engine going?
Hehe, antilogichyper, are you sure you're posting this in the right place?
With best regards,
Mirek Czerwiński
http://kris.top.pl/~kherin/
Are you crazy or something? NEVER use the pitch DirectX gives you like that. A video card can add ANY amount of padding to a row, including odd byte amounts (though that is unlikely), so dividing the pitch by 2 is not safe.
Instead:
unsigned short *dxPtrlocal = dxPtr; // points to surface data
for (y = 0; y < height; y++)
{
    for (x = 0; x < width; x++)
    {
        dxPtrlocal[x] = value;
    }
    // advance to the next row by the pitch in bytes
    dxPtrlocal = (unsigned short *)((char *)dxPtrlocal + dxpitch);
}
Also, it looks like you are reading and writing to the same surface. There is nothing wrong with that in itself, but NEVER read from and write to a surface in video memory, or you can kiss your framerate goodbye.
Would this be faster:
void BlendPrimary(LPDIRECTDRAWSURFACE7 lpdds, int iw, int ih, int ALPHA)
{
    int red, green, blue;
    int redMask=0xf800; //#define these to go faster
    int greenMask=0x7e0; //#define these to go faster
    int blueMask=0x1f; //#define these to go faster
    int i=0, j=0;
    DDSURFACEDESC2 ddsd;
    memset(&ddsd,0,sizeof(DDSURFACEDESC2));
    ddsd.dwSize = sizeof(DDSURFACEDESC2);
    lpdds->Lock(NULL, &ddsd, DDLOCK_WAIT, NULL);
    WORD *dTemp=(WORD *)ddsd.lpSurface;
    for (i=0; i<iw; i++)
    {
        for (j=0; j<ih; j++)
        {
            int spot=i+(j*(ddsd.lPitch>>1));
            // each masked channel is already in its final bit position,
            // so ALPHA is used unshifted: mask, multiply, shift, mask again
            blue = ((ALPHA * (dTemp[spot]&blueMask)) >> 8)&blueMask;
            green = ((ALPHA * (dTemp[spot]&greenMask)) >> 8)&greenMask;
            red = ((ALPHA * (dTemp[spot]&redMask)) >> 8)&redMask;
            dTemp[spot]=(blue | green | red);
        }
    }
    lpdds->Unlock(NULL);
}
I don't know if this is valid, I just sort of threw it together on the spot. I haven't done any alpha blending yet, so this is just a guess.
ooohhh, 3 edits (3 strikes, you're out)
---
Make it work.
Make it right.
Make it fast.
Edited by - CaptainJester on December 7, 2001 2:46:22 PM
Edited by - CaptainJester on December 7, 2001 2:47:51 PM
Edited by - CaptainJester on December 10, 2001 2:16:22 PM