Optimization ideas ?

Started by
19 comments, last by tcs 23 years, 5 months ago
I want to optimize this piece of code, it takes far to long to execute... Just wanted to hear your ideas
    
	// Fill the texture with a combination of all landscape textures. Make a height based

	// per-texel choice of the source texture

	for (i=0; i<iTexSize; i++)
	{
		for (j=0; j<iTexSize; j++)
		{
			// Update the progress window

			CProgressWindow::SetProgress((unsigned int) ((i * (float) iTexSize + j) / 
				((float) iTexSize * (float) iTexSize) * 100.0f));

			// Calculate the "average" texture index

			fTexIndex = pHeight->m_Array<i>[j] / 255.0f * iMaxTexture;

			// Calculate the two textures that are blended together

			iHighTex = (int) ceil(fTexIndex);
			iLowTex = (int) floor(fTexIndex);

			// Don''t allow that we exceed the maximum texture count

			if (iHighTex > iMaxTexture - 1)
				iHighTex = iMaxTexture - 1;
			if (iLowTex > iMaxTexture - 1)
				iLowTex = iMaxTexture - 1;

			// Calculate the weights of each texture

			fHighTexWeight = fTexIndex - (float) floor(fTexIndex);
			fLowTexWeight = (float) ceil(fTexIndex) - fTexIndex;

			// Neccessary to avoid black textures when we directly hit a

			// texture index

			if (fHighTexWeight == 0.0f && fLowTexWeight == 0.0f)
			{
				fHighTexWeight = 0.5f;
				fLowTexWeight = 0.5f;
			}
		
			// Calculate the texel offset in the lower texture array

			iIndex = (int) ((j % cTGATextures[iLowTex].GetImageWidth()) * 
				cTGATextures[iLowTex].GetImageWidth() +
				(i % cTGATextures[iLowTex].GetImageHeight())) * 3;

			// Add the lower texture

			cTexel[0] = (unsigned char) 
				(cTGATextures[iLowTex].GetImageData()[iIndex + 0] * fLowTexWeight);
			cTexel[1] = (unsigned char) 
				(cTGATextures[iLowTex].GetImageData()[iIndex + 1] * fLowTexWeight);
			cTexel[2] = (unsigned char) 
				(cTGATextures[iLowTex].GetImageData()[iIndex + 2] * fLowTexWeight);

			// Calculate the texel offset in the higher texture array

			iIndex = (int) ((j % cTGATextures[iHighTex].GetImageWidth()) * 
				cTGATextures[iHighTex].GetImageWidth() +
				(i % cTGATextures[iHighTex].GetImageHeight())) * 3;

			// Add the higher texture

			cTexel[0] += (unsigned char) 
				(cTGATextures[iHighTex].GetImageData()[iIndex + 0] * fHighTexWeight);
			cTexel[1] += (unsigned char) 
				(cTGATextures[iHighTex].GetImageData()[iIndex + 1] * fHighTexWeight);
			cTexel[2] += (unsigned char) 
				(cTGATextures[iHighTex].GetImageData()[iIndex + 2] * fHighTexWeight);

			// Copy the texel to its destination

			memcpy(&pTextureData[(j * iTexSize + i) * 3], cTexel, 3);
		}
	}
    
Tim -------------------------- glvelocity.gamedev.net www.gamedev.net/hosted/glvelocity
Tim--------------------------glvelocity.gamedev.netwww.gamedev.net/hosted/glvelocity
Advertisement
quote:
fTexIndex = pHeight->m_Array[j] / 255.0f * iMaxTexture;


Instead of dividing by 255 * iMax... couldnt you define a constant or something? Like change it to:

const float idiv255 = 1/255;
fTexIndex = pHeight->m_Array[j] * idiv255 * iMaxTexture;<br><br><BLOCKQUOTE><font size=1>quote:<hr height=1 noshade><br>(j % cTGATextures[iHighTex].GetImageHeight())) <br>(i % cTGATextures[iHighTex].GetImageHeight())) <br><br>and<br><br>(j % cTGATextures[iLowTex].GetImageHeight())) <br>(i % cTGATextures[iLowTex].GetImageHeight())) <br> <hr height=1 noshade></BLOCKQUOTE></font> <br><br>Here I notice that you are calculating the same base mod twice. I realize that both instances have further arithmatic applied but couldn't it be calc'd once before either further math is applied?<br><br>Regards,<br>Jumpster<br><br>P.S. Optimization is not really my forte' but these were what I seen in the brief look that I took. I am not sure if either suggestion would benefit your speed anyway. So, take this as you see fit. Just my suggestions.<br><br><br><br><br>Edited by - Jumpster on October 29, 2000 10:56:29 PM </i>
Regards,JumpsterSemper Fi
Since you are calling floor() and ceil() several times on the same variable, I think you can save some calls by storing the "floored" and "ceiled" value just once and cast it on the fly (not needed though because of coercion).

Hope it helps.


Karmalaa

---[home page] [[email=karmalaa@inwind.it]e-mail[/email]]
After taking a closer look at it this morning, here is a better suggestion that I have come up with.

        	// Fill the texture with a combination of all landscape textures. Make a height based	// per-texel choice of the source texture  int   iMaxTexSub1 = iMaxTexture - 1;           // Reduce this calculation to just one time...  float iDiv255    = (1 / 255.0f) * iMaxTexture;   float iTexSizeSq = 1 / (iTexSize * iTexSize);  // Change division to multiplication....	for (i=0; i<iTexSize; i++)	{		for (j=0; j<iTexSize; j++)		{			// Calculate the "average" texture index			fTexIndex = pHeight->m_Array<i>[j] * iDiv255;      // Reduce the number of times this is called to just once each pass...      float Ceiling = ceil(fTexIndex);      float Flooring = floor(fTexIndex);			// Calculate the two textures that are blended together			iHighTex = (int) Ceiling;			iLowTex = (int) Flooring;			// Don't allow that we exceed the maximum texture count			if (iHighTex > iMaxTexture - 1) iHighTex = iMaxTexSub1;			if (iLowTex > iMaxTexture - 1) iLowTex = iMaxTexSub1;			// Calculate the weights of each texture			fHighTexWeight = fTexIndex - Flooring;			fLowTexWeight = Ceiling - fTexIndex;			// Neccessary to avoid black textures when we directly hit a			// texture index			if (fHighTexWeight == 0.0f && fLowTexWeight == 0.0f)			{				fHighTexWeight = 0.5f;				fLowTexWeight = 0.5f;			}					// Calculate the texel offset in the lower texture array      int iLowWidth = cTGATextures[iLowTex].GetImageWidth();			iIndex = (int) ((j % iLowWidth) * iLowWidth + (i % cTGATextures[iLowTex].GetImageHeight())) * 3;			// Add the lower texture      // Replace three calls with just one call and save the value...      <imageDataType> imageData = cTGATextures[iLowTex].GetImageData();			cTexel[0] = (unsigned char) (imageData[iIndex + 0] * fLowTexWeight);			cTexel[1] = (unsigned char) (imageData[iIndex + 1] * fLowTexWeight);			cTexel[2] = (unsigned char) (imageData[iIndex + 2] * fLowTexWeight);			// Calculate the texel offset in the higher texture array      int iHighWidth = cTGATextures[iHighTex].GetImageWidth();			iIndex = (int) ((j % iHighWidth) * iHighWidth + (i % cTGATextures[iHighTex].GetImageHeight())) * 3;			// Add the higher texture      // Replace three calls with just one call and save the value...      <imageDataType> imageData = cTGATextures[iHighTex].GetImageData();			cTexel[0] += (unsigned char) (imageData[iIndex + 0] * fHighTexWeight);			cTexel[1] += (unsigned char) (imageData[iIndex + 1] * fHighTexWeight);			cTexel[2] += (unsigned char) (imageData[iIndex + 2] * fHighTexWeight);			// Copy the texel to its destination			memcpy(&pTextureData[(j * iTexSize + i) * 3], cTexel, 3);		}		// Update the progress window only every iTexSize intervals - in DOS this would have been VERY slow for every pass.     // I imagine it would be here too so I moved the Update to outside the 2nd loop...		CProgressWindow::SetProgress((unsigned int) ((((i * iTexSize) + j) * iTexSizeSq) * 100.0f) ); // Replace div with mul}        



First off, assuming a map with of 255 * 255 (0's inclusive) we have an original count of 65,536 calls to:

1) SetProgress() - reduced to 255 or rougly every .4%

2)(65,536 * 6 = 393,216) cTGAGetTextures[xxxxxxx].GetImageData() - reduced to 131,072

3)(65,536 * 2 = 131,072) calls to each ceil()/Floor - reduced to 65,536 each.

4)131,072 calcs of: iMaxTexture - 1. Reduced to 1-time.
5)65,536 divisions potentially reduced to one division and the remaining as muls.


Items 4 and 5 may be more of a liability than an asset due to readability but given the other improvements, you may want to replace those area affected by numbers 4 and 5 with the original version of the code.

Obviously I have not been able to test it - and if you notice I have a declaration in there twice that needs replaced with real type definition - but I am sure that with the numbers presented, you would notice a significant increase in speed. Some quirks may need to be worked out since I have not been able to test it, I can't garuantee it will work the first time. It was meant to give you an idea of what I see could use assistance.

Of course, if these methods are defined as inline (whether explicitely or through compiler optimization) then only the SetProgress(), ceil() and floor() changes will affect your speed. All others would be inconsiquential.


Regards,
Jumpster


Edited by - Jumpster on October 30, 2000 8:16:28 AM
Regards,JumpsterSemper Fi
Thany for your time and effort ! I already looked into this thing with the progress bar, how stupid ;-) But your other suggestions also look cool, I''ll try them !

Thanx


Tim

--------------------------
glvelocity.gamedev.net
www.gamedev.net/hosted/glvelocity
Tim--------------------------glvelocity.gamedev.netwww.gamedev.net/hosted/glvelocity
Personally, I would update the progress via a variable, and have the progress bar updated by a different thread.
Sure ! And let this thread be running in an DCOM out-process server that runs on another machine ! And don''t let us use a variable, let''s store it into a SQL database on a dedicated UNIX system...

What advantage do you have from your approach, except that your code is slower, more complex and the status bar doesn''t get any updates because the main thread is busy. Oh sure, we could use syncronisation functions to keep the status bar updated.... but why would anyone write 10 pages of code, when a single line does the same more efficient ? This thread was about optimisation, not about how to use multithreading where it just reduces responsibility and slows down...


Tim

--------------------------
glvelocity.gamedev.net
www.gamedev.net/hosted/glvelocity
Tim--------------------------glvelocity.gamedev.netwww.gamedev.net/hosted/glvelocity
        // Old version...cTexel[0] = (unsigned char) (imageData[iIndex + 0] * fLowTexWeight);cTexel[1] = (unsigned char) (imageData[iIndex + 1] * fLowTexWeight);cTexel[2] = (unsigned char) (imageData[iIndex + 2] * fLowTexWeight);// New version...cTexel[0] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);cTexel[1] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);cTexel[2] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);        


I am not sure, but I believe this will also help you out a little-bit. If I am not mistaking, the asm code for
iIndex+0 equates to something like:

  mov eax, [iIndex]    add eax, 0  ...  mov eax, [iIndex]    add eax, 1  ...  mov eax, [iIndex]    add eax, 2  ...  


It seems to me, although I have not checked this out yet, that the iIndex++ would translate to:

  inc [iIndex]  ...  inc [iIndex]  ...  inc [iIndex]  


and provide the same results. Again, I am not sure, this is what I think seems like would happen.

At any rate, it doesn't hurt to try it, right?

Regards,
Jumpster



Edited by - Jumpster on November 1, 2000 12:23:54 PM
Regards,JumpsterSemper Fi
tcs:
Oh man, do you not know what you're talking about.

An inefficiency in his code was calls to
CProgressWindow::setProgress. I was suggesting a different
design to eliminate that call. Therefore my discussion WAS
about optimization.

10 pages of code? Pass me some of that crack, man.

Do you want to process windows messages while processor-
intensive threads run? If you're a good programmer, you do.
Therefore, this should already be multi-threaded.

MyApp::OnBuildTextures{  DWORD threadId; // I never use this, but it's needed in win95  m_pTexThread = CreateThread (NULL, 0, buildTexProc, this, 0,    &threadId);}   


Add a single function to CProgressWindow, which you can access
through MyApp (and therefore the app pointer in the thread
process:
void CProgressWindow::setVar (float* pVar, float min = 0.0,                              float max = 0.0){  // note, you do need a critical section when changing setVar  // since your OnTimer command may access it.  // OnTimer, not shown here, also needs to grab this CS)  EnterCriticalSection (m_csProgress);  if (pVar == NULL)  {    // setting pVar to NULL tells the progress window to stop    // reading the variable    KillTimer (m_hwnd, m_timerId);  }  else  {    // CProgressWindow reads *m_pVar in its OnTimer, calculates    // the distance *m_pVar lies between min and max, and updates    // the progress bar accordingly    m_pVar = pVar;    m_min = min;    m_max = max;  }  LeaveCriticalSection (m_csProgress);}   


Finally, place the algorithm in buildTexProc, add a couple of
lines to make this work:
DWORD WINAPI buildTexProc (PVOID pParam);{  MyApp& app = *((MyApp*) pParam); // cast parameter  float progress = 0.0;  app.getProgressWindow ()->setVar (&progress, 0.0,     iTexSize * iTexSize);  // start of algorithm  for (int i=0; i  {    for (int j=0; j    {      progress += 1.0; // wow, less calculations here, too      //...rest of algorithm    }  } // end of algorithm    // progress will go out of scope soon, so let CProgressWindow  // know about it  app->getProgressWindow ()->setVar (NULL);  // let message loop know this thread is done  PostMessage (app.m_hwnd, WM_BUILD_TEX_DONE, 0, 0);}   


HOLY CARP THAT WAS COMPLETELY DIFFICULT AND TOOK 10+ PAGES AND
NOW EVERYTHINGS COMPLETELY INEFFICIENT AND....and....and,.. oh
wait, I guess threads aren't that bad, eh? I guarantee this
will give you absolute 0 performance hit as well (might save
something since you're only doing the calculation on progress
once every time you update, not once every time through the
loop). And now your code is much more modular. And if you have
multiprocessors, the texture-building thread might run on a
completely different processor in parallel with the message-
processing loop.

Learn threads. They can help.

--edit--reformatted since line-breaks are screwed up in this
discussion thread

Edited by - Stoffel on November 1, 2000 1:56:03 PM
Christ, I''m glad I read the thread before trying to get a better grasp on the code and give suggestions. I wouldn''t DREAM of helping TCS after seeing him blindly attack Stoffel like that.

I hate nothing more than someone who asks for help and then bitches when other people don''t do all of the work for you. Stoffel''s suggestion DOES have merrit, but TCS just bit his head off because he didn''t understand it.

Cya.

This topic is closed to new replies.

Advertisement