Archived

This topic is now archived and is closed to further replies.

Sailorstick

Optimising...

Recommended Posts

Heres a nifty little function I made that draws a sphere like circle on the screen. It works as intended but is quite slow. Are there any obvious optimisations I could be making here?
void Screen::FancyEllipse(HBRUSH* hbr, int x, int y, int w, int h){
int scanw;
int position;
int BLUE=0;
int GREEN=1;
int RED=2;
int Red, Green, Blue;
int halfRed, halfGreen, halfBlue;
LOGBRUSH brushinfo;
COLORREF colour;
int relX;
int relY;
int horzradius = w/2;
int vertradius = h/2;
int horzR2 = horzradius*horzradius;
int vertR2 = vertradius*vertradius;
double x2OnR2, y2OnR2;

	GetObject(*hbr, sizeof(LOGBRUSH), &brushinfo);
	colour = brushinfo.lbColor;
	Red=GetRValue(colour);
	Green=GetGValue(colour);
	Blue=GetBValue(colour);
	halfRed=Red/2;
	halfGreen=Green/2;
	halfBlue=Blue/2;
	
	//move thru bounding rectangle:

	//access the bits of the BackBuffer and modify

	scanw=this->width*3;  //note: screen width should always be multiple of 4

	if(this->backBmpBits){
		for(int bmpY=y; bmpY<y+h; bmpY++){
			for(int bmpX=x; bmpX<x+w; bmpX++){
				relX = bmpX-x-horzradius;
				relY = bmpY-y-vertradius;
				x2OnR2 = (double)(relX*relX)/horzR2;
				y2OnR2 = (double)(relY*relY)/vertR2;
				if(x2OnR2 + y2OnR2 <= 1){
					position = bmpY*scanw+bmpX*3;
					backBmpBits[position+RED] = FitToByte(Red- sin(x2OnR2+y2OnR2) * halfRed);
					backBmpBits[position+GREEN] = FitToByte(Green- sin(x2OnR2+y2OnR2) * halfGreen);
					backBmpBits[position+BLUE] = FitToByte(Blue- sin(x2OnR2+y2OnR2) * halfBlue);
				}
			}
		}
	}
}

Share this post


Link to post
Share on other sites
The obvious candidates are integer division by 2.

Replace w/2 with w >> 1, and likewise for all your integer divisions by a power of 2.

[edited by - sbennett on July 4, 2003 5:01:08 PM]

Share this post


Link to post
Share on other sites
quote:
Original post by sbennett
The obvious candidates are integer division by 2.

Replace w/2 with w >> 1, and likewise for all your integer divisions by a power of 2.




That wouldn''t help a bit.. the compiler should optimize it. And even if it doesn''t, it''s outside the loop, so the gain is almost none.

Share this post


Link to post
Share on other sites
quote:
Original post by Sailorstick
for(int bmpY=y; bmpY< y+h; bmpY++)
for(int bmpX=x; bmpX< x+w; bmpX++)


y+h and x+w can be put in two variables. You are adding them repetitively for the entire loops.

int someVar = y+h;
int someVar2 = x+w;
for(int bmpY=y; bmpY< someVar ; bmpY++)
for(int bmpX=x; bmpX< someVar2 ; bmpX++)

Oh, and I don't know if declaring bmpY and bmpX once outside the loop might help, but try it anyway. Theoretically, bmpX is allocated and deallocated for every iteration of bmpY.
quote:
relX = bmpX-x-horzradius;
relY = bmpY-y-vertradius;
This one also. x-horzradius and y-vertradius.
quote:

backBmpBits[position+RED] = FitToByte(Red- sin(x2OnR2+y2OnR2) * halfRed);
backBmpBits[position+GREEN] = FitToByte(Green- sin(x2OnR2+y2OnR2) * halfGreen);
backBmpBits[position+BLUE] = FitToByte(Blue- sin(x2OnR2+y2OnR2) * halfBlue);
sin(x2OnR2+y2OnR2) is another repetitive task. Calculate it once, put it in a variable, and use it.

[edited by - alnite on July 4, 2003 5:20:09 PM]

Share this post


Link to post
Share on other sites
Yes, I would have bet that your code is slow ;-)

First, normally w/2 by w>>1 should have no effect, since if optimizations are turned on, any compiler I know of will do that itself.

The problem here is not in the small optimizations, like /2 becoming >>1 or caching the sine (but that surely may help), it''s merely the algorithm itself!

For EACH point in the bounding box, you are computing two squares and performing two division (here again you could cache the Y values, saving time). Moreover, you are computing a sine for EACH point INSIDE or ON the ellipse...

Well that''s a lot of time consuming operations, all on floating points. No wonder it''s slow (even if our computers always tick faster).

Then how shall you do it ? You guessed it, there are better algorithms: we were able to do that on some 286 @ 10MHz (a friend of mine had one).

For one example of such algorithms, search for "Bresenham", you will find the answer to your question... Maybe restrict your search to "bresenham" and "ellipse" because the guy also gave his name to line and circle drawing algorithms.
This algorithm only uses integers, and only compute the points that are drawn on the screen (i.e. those points that are on the ellipse, and not outside or inside).

If your lazy, here is a page with the algorithm fully written out in C, but without explanations:
http://www.robots.ox.ac.uk/~awf/graphics/bres-ellipse.html

jods

Share this post


Link to post
Share on other sites
quote:
Original post by jods
This algorithm only uses integers, and only compute the points that are drawn on the screen (i.e. those points that are on the ellipse, and not outside or inside).
Well that isn''t what I want. I watn to fill the ellipse and to do so I will have to set every pixel inside it. Now the area of the ellipse is almost 3/4 the area of the bounding box. Not much of a time-saver there...
I''ll have a look at the link anyhow, see if I can adapt it to what I want.
Also, the caching of sin. How did I miss that one?!
I was also thinking of creating a high memory sin that precomuptes sin from 0 to 2PI in 0.01 blocks the first time you use it. It''s probably more trouble than it''s worth.
BTW the sin itself isn''t required for the ellipse, it''s just for the spherical effect fill. Otherwise, I would have just used the Ellipse API function.

Share this post


Link to post
Share on other sites
Still check out bresenham''s algorithm. You can easily fill the ellipse using a standard scan line fill algorithm. Work out the coordinates for each side of the ellipse using B''s ellipse algorithm and then draw a line between them. You can run B''s alg. in two directions at the same time (i.e. draw two half ellipses/spheres) so you can get the two end points for each line while the sphere itself is being calculated.

It is ahelluva lot faster than using the ellipse equation. The fact it uses integers should really make much difference nowadays though because floats are faster on a pentium than ints anyway. Check out this link for a comparison in CPU cycles on int vs float and sin caching speed advantage

http://aulos.calarts.edu/pipermail/music-dsp/1998-July/020377.html

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
i''m no optimization guru, but i think using a pure-integer algorithm probably would be better, because if you do scanline fills like stew mentioned, then you can take advantage of mmx to draw them without turning mmx on/off (=slow) inside your ellipse() function (you have to turn it off if you want to use the FPU)

Share this post


Link to post
Share on other sites
quote:
Original post by stew
Still check out bresenham''s algorithm. You can easily fill the ellipse using a standard scan line fill algorithm. Work out the coordinates for each side of the ellipse using B''s ellipse algorithm and then draw a line between them. You can run B''s alg. in two directions at the same time (i.e. draw two half ellipses/spheres) so you can get the two end points for each line while the sphere itself is being calculated.

It is ahelluva lot faster than using the ellipse equation. The fact it uses integers should really make much difference nowadays though because floats are faster on a pentium than ints anyway. Check out this link for a comparison in CPU cycles on int vs float and sin caching speed advantage

http://aulos.calarts.edu/pipermail/music-dsp/1998-July/020377.html



How can Bresenham''s algorithm be faster than using the ellipse equation, when they are the same? Bresenham''s algorithm is derived from the ellipse equation, and you would probably come up with a very similar thing if you tried to make an algorithm from that.

And yes, you only need to change the original Bresenham''s algorithm a little bit to fill the ellipse

Share this post


Link to post
Share on other sites