Jump to content
  • Advertisement

Archived

This topic is now archived and is closed to further replies.

dnagcat

help optimizing code

This topic is 5251 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I have written this function and have no more ideas as to how to optimize it... Structurally speaking, does anyone have any suggestions for optimizing this function:
//
// Apply the current pen tool to the given location.
//
void PatternEditor::applyPen(int vx, int vy, bool shift, QRect* prect)
{
	int pw = pen.width;
	int ph = pen.height;
	int px = vx - (pw / 2);
	int py = vy - (ph / 2);
	ViewObject::Coord center = viewArea()->objectCoord(vx, vy);

	if (prect) {
		if (prect->isValid()) {   //prect already exists but changed in size
			// increase size of rectangle
			if (prect->left() > px)
				prect->setLeft(px);
			if (prect->right() < px + pw)
				prect->setRight(px + pw);
			if (prect->top() > py)
				prect->setTop(py);
			if (prect->bottom() < py + ph)
				prect->setBottom(py + ph);
		}
		else {					  //prect is not valid...set base size
			// set size of rectangle
			prect->setLeft(px);
			prect->setRight(px + pw);
			prect->setTop(py);
			prect->setBottom(py + ph);
		}
	}

	
	float scale = clrmpViewWin->scale();

	int patW = pattern->width();
	int patH = pattern->height();
	int area = patW * patH;
	int pt, dist, x, y, d, dx, dy; 
	
	float sqh = 0.0, sqw = 0.0, radius = 0.0, radSq = 0.0, cw = 32.0 / scale, ch = 32.0 / scale;

	if (largeBrushAction->isOn()) {
		radius = 12.0 / scale;
		radSq = radius * radius;
	}
	else if (medBrushAction->isOn()) {
		radius = 9.0 / scale;
		radSq = radius * radius;
	}
	else if (smallBrushAction->isOn()) {
		radius = 6.0 / scale;
		radSq = radius * radius;
	}

	

	if (largeSquareAction->isOn()) 
		sqw = sqh = (24.0 / 2) / scale;
	else if (medSquareAction->isOn()) 
		sqw = sqh = (18.0 / 2) / scale;
	else if (smallSquareAction->isOn()) 
		sqw = sqh = (12.0 / 2) / scale;
	else if (horizontalSlashAction->isOn()) {
		sqw = (18.0 / 2) / scale;
		sqh = (3.0 / 2) / scale;
	}
	else if (verticalSlashAction->isOn()) {
		sqw = (3.0 / 2) / scale;
		sqh = (18.0 / 2) / scale;
	}
	else if (backSlashAction->isOn() || forwardSlashAction->isOn()) 
		sqw = sqh = (12.0 / 2) / scale;
		
	
	ViewObject::Coord btl, bbr;
	btl.set(center.x() - cw, center.y() - ch);
	bbr.set(center.x() + cw, center.y() + ch);
	int btlx = btl.x(), btly = btl.y();
	int boxH = bbr.y() - btl.y(), boxW = bbr.x() - btl.x();
	
	

	float maxd = 1.5 * SQRT_2 / scale;												//1.5 is the line width
	float maxdx = sqw,  maxdy = sqh;												//1/2 square width and height
	for (y = 0; y < boxH; y++, btly++) {
		btlx = btl.x();																//reset x
		dy = (btly > center.y()) ? btly - center.y() : center.y() - btly;			//y distance
		for (x = 0; x < boxW; x++, btlx++) {
			dx = (btlx > center.x()) ? btlx - center.x() : center.x() - btlx;		//x distance
			switch (activePen()){
				case BackSlash: 
					d = (x > y) ? x - y : y - x;							//|x - y|
					if ((float)d <= maxd && (float)dx <= maxdx && (float)dy <= maxdy 
						&& btlx >= 0 && btlx < patW && btly >= 0 && btly < patH)
						pt = (((btly) * patW) + (btlx));
					else
						continue;
					break;

				case ForwardSlash:	
					d = x + y - (boxW - 1);
                    if (d < 0) 
						d = -d;
					if ((float)d <= maxd && (float)dx <= maxdx && (float)dy <= maxdy 
						&& btlx >= 0 && btlx < patW && btly >= 0 && btly < patH)
						pt = (((btly) * patW) + (btlx));
					else
						continue;
					break;

				case LargeSquare:
				case MedSquare:
				case SmallSquare:
				case VerticalSlash:
				case HorizontalSlash:
					if((float)dx <= maxdx && (float)dy <= maxdy 
						&& btlx >= 0 && btlx < patW && btly >= 0 && btly < patH)
						pt = ((btly * patW) + btlx);
					else
						continue;
					break;

				case LargeRound:
				case MedRound:
				case SmallRound:
					dist = (dx * dx) + (dy * dy);
					if((float)dist <= radSq && btlx >= 0 && btlx < patW && btly >= 0 && btly < patH) 
						pt = ((btly * patW) + btlx);
					else
						continue;
					break;

				default:
					continue;
					break;
			}

			switch (_activeTool) {
			case EraserTool:
						pattern->data->data[pt] = 0;
					break;

				case TerrainTool: {
					int z = pattern->data->data[pt];
					if (shift) {
						if (z > 1)
								pattern->data->data[pt] = z-1;
					}
					else {
						if (z < 255)
								pattern->data->data[pt] = z+1;
					}
					break;
				}

				case BlurTool:
						fltrPix.push_back(pt);							//add pixel to be filtered to container
					break;

				case PaintTool:
					if (_selpix)
							pattern->data->data[pt] = _selpix;
					break;

				default:
					break;
			}
		}
	}
	
	
	if(fltrPix.size() > 0) {
		int newArea = pattern->height() * pattern->width();
		if (_area != newArea) {
			_area = newArea;
			fltr->updateFilterVars(pattern->width(), pattern->height());
		}
		fltr->blurPixels(fltrPix, pattern->data->data, pattern->width() * pattern->height(), 
			pattern->width(), pattern->height()); 
		fltrPix.clear();												//clear container
	}
}
[\code]    
[edited by - dnagcat on September 23, 2003 2:16:35 AM]

Share this post


Link to post
Share on other sites
Advertisement
i dont really know about pens and stuff but just after quick look at the code , i see you do alot of divisions by scale

multiplication is faster so just do

oneoverscale=1/scale;

then just multiply that number instead of dividing by scale

also in this code

if (largeBrushAction->isOn()) { radius = 12.0 / scale; radSq = radius * radius; } else if (medBrushAction->isOn()) { radius = 9.0 / scale; radSq = radius * radius; } else if (smallBrushAction->isOn()) { radius = 6.0 / scale; radSq = radius * radius; }

[\code]

i can see that you will allways do radius*radius , so just take it out of the if statement , just will look cleaner.

but the big problems is optimizing blindly , the stuff i just told you may not improve the speed much or may improve alot, you have to check what slows down this function first before trying to optimize it , use something like intel vtune, or just write you own timer function and run it on different parts of this function to see in what lines the cpu spends most of the time and then try to optimize those lines

Share this post


Link to post
Share on other sites
Since I don't know how big those boxH and boxW typically are, I can't estimate how much time is spent in the loop and how much in the initialization. So I can't say if you should do more precalculations to make the loop faster or optimize precalculations somehow. But I'll assume the loop takes most of the time.

(1)
((float)d <= maxd && (float)dx <= maxdx && (float)dy <= maxdy

Make maxd, maxdx etc. integers so you don't need to convert stuff into floats all the time.

(2)

for (...) {
switch (activePen()){
case X:
break;
}
}

Worth trying:

switch (activePen()){
case X:
for (...) {
}
break;
}

Probably not..

(3)
center.y(), activePen()

Are these function calls costly? If so, make a temporary variable to hold the value.

(4)
pt = (((btly) * patW) + (btlx));

Get rid of the multiplication by using an offset that is increased by patW as btly increases by one.

(5) (important)
It seems like these MUST hold for every loop:
(0 <= btlx < patW) and (0 <= btly < patH)

Clip the loop area so that these hold always, and get rid of the tests from inside the loop. This will make your code shorter AND faster.

In fact, the first part of the loop is just testing if we're inside an ok shape. You should clip your loop area properly and get rid of the first switch (switch(activePen())) altogether.

It seems like LargeRound, etc. "Round" aren't rectangular shapes, so you could make some extra code to handle circular shapes efficiently. Now you need two multiplications for every dot, whereas with this formula:
sqrt(r*r-x*x)
You'd need to calculate that only once per row. Actually half times per row, as the bottom half is the same. And use look up tables to gain more.

Interestingly, if you clip the loop area correctly, you won't even need to calculate dx and dy inside the loop anymore.

(6)
pattern->data->data[pt]

Triple indirection indicates use for a temporary variable. So that would be

tmpdata[pt]


(7)
Yes. I was bored..

[edited by - civguy on September 23, 2003 2:51:06 AM]

Share this post


Link to post
Share on other sites
the function is fairly long. you might be taking a cache hit. try breaking it up into 2 or 3 smaller functions and profile. might end up being a hell of a lot quicker if it is in fact not all fitting into cache.

gonen's advice about inverting the denominator and multiplying by that will help. i see a lot of divide by "scale"'s in there.

oh and if you're going to post code then use the source tags.

[edited by - MelvinElvin on September 23, 2003 3:07:50 AM]

Share this post


Link to post
Share on other sites
quote:
Original post by MelvinElvin
the function is fairly long. you might be taking a cache hit. try breaking it up into 2 or 3 smaller functions and profile. might end up being a hell of a lot quicker if it is in fact not all fitting into cache
Dividing a long sequentically executed function into smaller ones won''t gain you anything. The same instructions have to be fed to the processor anyway.

Share this post


Link to post
Share on other sites
quote:
Original post by MelvinElvin
gonen''s advice about inverting the denominator and multiplying by that will help. i see a lot of divide by "scale"''s in there.
Yet, division by scale is evaluated only at most 6 times in the function, so I doubt changing those to multiplications would be noticeable.

Share this post


Link to post
Share on other sites
quote:
Original post by sjelkjd
PROFILE!!!! Otherwise, you''re only guessing at where the slow parts of your code are.
But isn''t guessing all the more fun?

Share this post


Link to post
Share on other sites
Thanks alot for all the advice... especially civguy!

reducing all the divisions to only 1: scl = 1/scale, then multiplying by scl, and converting all the "/ 2" to "* 0.5" had a big improvement on performance...

[edited by - dnagcat on September 23, 2003 11:12:31 AM]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
instead of divide/multiple by 2 you can allways use >>/<< which is <more> faster.

Share this post


Link to post
Share on other sites

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!