Optimising...

Started by
10 comments, last by Sailorstick 20 years, 9 months ago
Heres a nifty little function I made that draws a sphere like circle on the screen. It works as intended but is quite slow. Are there any obvious optimisations I could be making here?
void Screen::FancyEllipse(HBRUSH* hbr, int x, int y, int w, int h){
int scanw;
int position;
int BLUE=0;
int GREEN=1;
int RED=2;
int Red, Green, Blue;
int halfRed, halfGreen, halfBlue;
LOGBRUSH brushinfo;
COLORREF colour;
int relX;
int relY;
int horzradius = w/2;
int vertradius = h/2;
int horzR2 = horzradius*horzradius;
int vertR2 = vertradius*vertradius;
double x2OnR2, y2OnR2;

	GetObject(*hbr, sizeof(LOGBRUSH), &brushinfo);
	colour = brushinfo.lbColor;
	Red=GetRValue(colour);
	Green=GetGValue(colour);
	Blue=GetBValue(colour);
	halfRed=Red/2;
	halfGreen=Green/2;
	halfBlue=Blue/2;
	
	//move thru bounding rectangle:

	//access the bits of the BackBuffer and modify

	scanw=this->width*3;  //note: screen width should always be multiple of 4

	if(this->backBmpBits){
		for(int bmpY=y; bmpY<y+h; bmpY++){
			for(int bmpX=x; bmpX<x+w; bmpX++){
				relX = bmpX-x-horzradius;
				relY = bmpY-y-vertradius;
				x2OnR2 = (double)(relX*relX)/horzR2;
				y2OnR2 = (double)(relY*relY)/vertR2;
				if(x2OnR2 + y2OnR2 <= 1){
					position = bmpY*scanw+bmpX*3;
					backBmpBits[position+RED] = FitToByte(Red- sin(x2OnR2+y2OnR2) * halfRed);
					backBmpBits[position+GREEN] = FitToByte(Green- sin(x2OnR2+y2OnR2) * halfGreen);
					backBmpBits[position+BLUE] = FitToByte(Blue- sin(x2OnR2+y2OnR2) * halfBlue);
				}
			}
		}
	}
}
Advertisement
The obvious candidates are integer division by 2.

Replace w/2 with w >> 1, and likewise for all your integer divisions by a power of 2.

[edited by - sbennett on July 4, 2003 5:01:08 PM]
You could also try caching sin(x2OnR2+y2OnR2), that ought to help.
quote:Original post by sbennett
The obvious candidates are integer division by 2.

Replace w/2 with w >> 1, and likewise for all your integer divisions by a power of 2.



That wouldn''t help a bit.. the compiler should optimize it. And even if it doesn''t, it''s outside the loop, so the gain is almost none.

quote:Original post by Sailorstick
for(int bmpY=y; bmpY< y+h; bmpY++)
for(int bmpX=x; bmpX< x+w; bmpX++)

y+h and x+w can be put in two variables. You are adding them repetitively for the entire loops.

int someVar = y+h;
int someVar2 = x+w;
for(int bmpY=y; bmpY< someVar ; bmpY++)
for(int bmpX=x; bmpX< someVar2 ; bmpX++)

Oh, and I don't know if declaring bmpY and bmpX once outside the loop might help, but try it anyway. Theoretically, bmpX is allocated and deallocated for every iteration of bmpY.
quote:relX = bmpX-x-horzradius;
relY = bmpY-y-vertradius;
This one also. x-horzradius and y-vertradius.
quote:
backBmpBits[position+RED] = FitToByte(Red- sin(x2OnR2+y2OnR2) * halfRed);
backBmpBits[position+GREEN] = FitToByte(Green- sin(x2OnR2+y2OnR2) * halfGreen);
backBmpBits[position+BLUE] = FitToByte(Blue- sin(x2OnR2+y2OnR2) * halfBlue);
sin(x2OnR2+y2OnR2) is another repetitive task. Calculate it once, put it in a variable, and use it.

[edited by - alnite on July 4, 2003 5:20:09 PM]
Yes, I would have bet that your code is slow ;-)

First, normally w/2 by w>>1 should have no effect, since if optimizations are turned on, any compiler I know of will do that itself.

The problem here is not in the small optimizations, like /2 becoming >>1 or caching the sine (but that surely may help), it''s merely the algorithm itself!

For EACH point in the bounding box, you are computing two squares and performing two division (here again you could cache the Y values, saving time). Moreover, you are computing a sine for EACH point INSIDE or ON the ellipse...

Well that''s a lot of time consuming operations, all on floating points. No wonder it''s slow (even if our computers always tick faster).

Then how shall you do it ? You guessed it, there are better algorithms: we were able to do that on some 286 @ 10MHz (a friend of mine had one).

For one example of such algorithms, search for "Bresenham", you will find the answer to your question... Maybe restrict your search to "bresenham" and "ellipse" because the guy also gave his name to line and circle drawing algorithms.
This algorithm only uses integers, and only compute the points that are drawn on the screen (i.e. those points that are on the ellipse, and not outside or inside).

If your lazy, here is a page with the algorithm fully written out in C, but without explanations:
http://www.robots.ox.ac.uk/~awf/graphics/bres-ellipse.html

jods
The obvious optimization I see here is to use a better algorithm. There''s no need to scan all the pixels on the screen to draw an ellipse.

Use the ellipse equation.
quote:Original post by jods
This algorithm only uses integers, and only compute the points that are drawn on the screen (i.e. those points that are on the ellipse, and not outside or inside).
Well that isn''t what I want. I watn to fill the ellipse and to do so I will have to set every pixel inside it. Now the area of the ellipse is almost 3/4 the area of the bounding box. Not much of a time-saver there...
I''ll have a look at the link anyhow, see if I can adapt it to what I want.
Also, the caching of sin. How did I miss that one?!
I was also thinking of creating a high memory sin that precomuptes sin from 0 to 2PI in 0.01 blocks the first time you use it. It''s probably more trouble than it''s worth.
BTW the sin itself isn''t required for the ellipse, it''s just for the spherical effect fill. Otherwise, I would have just used the Ellipse API function.

Still check out bresenham''s algorithm. You can easily fill the ellipse using a standard scan line fill algorithm. Work out the coordinates for each side of the ellipse using B''s ellipse algorithm and then draw a line between them. You can run B''s alg. in two directions at the same time (i.e. draw two half ellipses/spheres) so you can get the two end points for each line while the sphere itself is being calculated.

It is ahelluva lot faster than using the ellipse equation. The fact it uses integers should really make much difference nowadays though because floats are faster on a pentium than ints anyway. Check out this link for a comparison in CPU cycles on int vs float and sin caching speed advantage

http://aulos.calarts.edu/pipermail/music-dsp/1998-July/020377.html
i''m no optimization guru, but i think using a pure-integer algorithm probably would be better, because if you do scanline fills like stew mentioned, then you can take advantage of mmx to draw them without turning mmx on/off (=slow) inside your ellipse() function (you have to turn it off if you want to use the FPU)

This topic is closed to new replies.

Advertisement