Archived

This topic is now archived and is closed to further replies.

tcs

Optimization ideas ?

tcs    122
I want to optimize this piece of code; it takes far too long to execute. I just wanted to hear your ideas.
    
	// Fill the texture with a combination of all landscape textures. Make a height based

	// per-texel choice of the source texture

	for (i=0; i<iTexSize; i++)
	{
		for (j=0; j<iTexSize; j++)
		{
			// Update the progress window

			CProgressWindow::SetProgress((unsigned int) ((i * (float) iTexSize + j) / 
				((float) iTexSize * (float) iTexSize) * 100.0f));

			// Calculate the "average" texture index

			fTexIndex = pHeight->m_Array[i][j] / 255.0f * iMaxTexture;

			// Calculate the two textures that are blended together

			iHighTex = (int) ceil(fTexIndex);
			iLowTex = (int) floor(fTexIndex);

			// Don't allow that we exceed the maximum texture count

			if (iHighTex > iMaxTexture - 1)
				iHighTex = iMaxTexture - 1;
			if (iLowTex > iMaxTexture - 1)
				iLowTex = iMaxTexture - 1;

			// Calculate the weights of each texture

			fHighTexWeight = fTexIndex - (float) floor(fTexIndex);
			fLowTexWeight = (float) ceil(fTexIndex) - fTexIndex;

			// Necessary to avoid black textures when we directly hit a

			// texture index

			if (fHighTexWeight == 0.0f && fLowTexWeight == 0.0f)
			{
				fHighTexWeight = 0.5f;
				fLowTexWeight = 0.5f;
			}
		
			// Calculate the texel offset in the lower texture array

			iIndex = (int) ((j % cTGATextures[iLowTex].GetImageWidth()) * 
				cTGATextures[iLowTex].GetImageWidth() +
				(i % cTGATextures[iLowTex].GetImageHeight())) * 3;

			// Add the lower texture

			cTexel[0] = (unsigned char) 
				(cTGATextures[iLowTex].GetImageData()[iIndex + 0] * fLowTexWeight);
			cTexel[1] = (unsigned char) 
				(cTGATextures[iLowTex].GetImageData()[iIndex + 1] * fLowTexWeight);
			cTexel[2] = (unsigned char) 
				(cTGATextures[iLowTex].GetImageData()[iIndex + 2] * fLowTexWeight);

			// Calculate the texel offset in the higher texture array

			iIndex = (int) ((j % cTGATextures[iHighTex].GetImageWidth()) * 
				cTGATextures[iHighTex].GetImageWidth() +
				(i % cTGATextures[iHighTex].GetImageHeight())) * 3;

			// Add the higher texture

			cTexel[0] += (unsigned char) 
				(cTGATextures[iHighTex].GetImageData()[iIndex + 0] * fHighTexWeight);
			cTexel[1] += (unsigned char) 
				(cTGATextures[iHighTex].GetImageData()[iIndex + 1] * fHighTexWeight);
			cTexel[2] += (unsigned char) 
				(cTGATextures[iHighTex].GetImageData()[iIndex + 2] * fHighTexWeight);

			// Copy the texel to its destination

			memcpy(&pTextureData[(j * iTexSize + i) * 3], cTexel, 3);
		}
	}
    
Tim -------------------------- glvelocity.gamedev.net www.gamedev.net/hosted/glvelocity

Jumpster    122
quote:

fTexIndex = pHeight->m_Array[i][j] / 255.0f * iMaxTexture;


Instead of dividing by 255 on every pass, couldn't you define a constant or something? Like change it to:

const float idiv255 = 1.0f / 255.0f; // note: 1/255 would be integer division and give 0
fTexIndex = pHeight->m_Array[i][j] * idiv255 * iMaxTexture;
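
A minimal sketch of the precomputed-scale idea, not taken from the original code. The helper names `texIndexScale` and `texIndex` are made up for illustration. The one thing to watch is that `1/255` with integer operands is an integer division and evaluates to 0, so at least one operand has to be a float.

```cpp
// Hypothetical helpers for illustration only.

// Computed once, outside the loops: folds the divide by 255 and the
// multiply by the texture count into a single float factor.
inline float texIndexScale(int maxTexture)
{
    return maxTexture / 255.0f;   // float division; maxTexture / 255 would truncate
}

// Per texel: one multiply instead of a divide plus a multiply.
inline float texIndex(unsigned char height, float scale)
{
    return height * scale;
}
```

Whether this is measurably faster depends on the compiler and CPU, so it is worth timing before and after.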

quote:

(j % cTGATextures[iHighTex].GetImageHeight()))
(i % cTGATextures[iHighTex].GetImageHeight()))

and

(j % cTGATextures[iLowTex].GetImageHeight()))
(i % cTGATextures[iLowTex].GetImageHeight()))



Here I notice that you are calculating the same base mod twice. I realize that both instances have further arithmetic applied, but couldn't it be calculated once before the further math is applied?

Regards,
Jumpster

P.S. Optimization is not really my forte, but these were what I saw in the brief look that I took. I am not sure if either suggestion would benefit your speed anyway, so take this as you see fit. Just my suggestions.




Edited by - Jumpster on October 29, 2000 10:56:29 PM

karmalaa    122
Since you are calling floor() and ceil() several times on the same variable, I think you can save some calls by storing the "floored" and "ceiled" value just once and cast it on the fly (not needed though because of coercion).
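
A rough sketch of that idea, under the assumption that the texture index is never negative. The struct `Blend` and the function `computeBlend` are hypothetical names, not part of the original code. It calls floor() once and derives everything else from it; as a side effect the two weights always sum to 1, so the special case for hitting a texture index exactly is no longer needed.

```cpp
#include <cmath>

// Hypothetical result type for this sketch.
struct Blend {
    int   lowTex;
    int   highTex;
    float lowWeight;
    float highWeight;
};

// Assumes texIndex >= 0 and maxTexture >= 1.
inline Blend computeBlend(float texIndex, int maxTexture)
{
    const float flooring = std::floor(texIndex);   // the only rounding call

    Blend b;
    b.lowTex     = static_cast<int>(flooring);
    b.highTex    = b.lowTex + 1;
    b.highWeight = texIndex - flooring;            // fractional part, in [0, 1)
    b.lowWeight  = 1.0f - b.highWeight;            // weights always sum to 1

    // Clamp both indices to the last valid texture.
    if (b.lowTex  > maxTexture - 1) b.lowTex  = maxTexture - 1;
    if (b.highTex > maxTexture - 1) b.highTex = maxTexture - 1;
    return b;
}
```

Note that this is not bit-for-bit the same as the original: on an exact index the original blends the texture with itself at 0.5 and 0.5, while this version uses weights 1 and 0.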

Hope it helps.


Karmalaa

Jumpster    122
After taking a closer look at it this morning, here is a better suggestion that I have come up with.

        
// Fill the texture with a combination of all landscape textures. Make a height based

// per-texel choice of the source texture

int iMaxTexSub1 = iMaxTexture - 1; // Reduce this calculation to just one time...

float iDiv255 = (1 / 255.0f) * iMaxTexture;
float iTexSizeSq = 1.0f / ((float) iTexSize * (float) iTexSize); // Change division to multiplication (1.0f avoids integer division, which would give 0)


for (i=0; i<iTexSize; i++)
{

for (j=0; j<iTexSize; j++)
{
// Calculate the "average" texture index

fTexIndex = pHeight->m_Array[i][j] * iDiv255;

// Reduce the number of times this is called to just once each pass...

float Ceiling = ceil(fTexIndex);
float Flooring = floor(fTexIndex);

// Calculate the two textures that are blended together

iHighTex = (int) Ceiling;
iLowTex = (int) Flooring;

// Don't allow that we exceed the maximum texture count

if (iHighTex > iMaxTexSub1) iHighTex = iMaxTexSub1;
if (iLowTex > iMaxTexSub1) iLowTex = iMaxTexSub1;

// Calculate the weights of each texture

fHighTexWeight = fTexIndex - Flooring;
fLowTexWeight = Ceiling - fTexIndex;

// Necessary to avoid black textures when we directly hit a

// texture index

if (fHighTexWeight == 0.0f && fLowTexWeight == 0.0f)
{
fHighTexWeight = 0.5f;
fLowTexWeight = 0.5f;
}

// Calculate the texel offset in the lower texture array

int iLowWidth = cTGATextures[iLowTex].GetImageWidth();
iIndex = (int) ((j % iLowWidth) * iLowWidth + (i % cTGATextures[iLowTex].GetImageHeight())) * 3;

// Add the lower texture

// Replace three calls with just one call and save the value...

<imageDataType> imageData = cTGATextures[iLowTex].GetImageData();
cTexel[0] = (unsigned char) (imageData[iIndex + 0] * fLowTexWeight);
cTexel[1] = (unsigned char) (imageData[iIndex + 1] * fLowTexWeight);
cTexel[2] = (unsigned char) (imageData[iIndex + 2] * fLowTexWeight);

// Calculate the texel offset in the higher texture array

int iHighWidth = cTGATextures[iHighTex].GetImageWidth();
iIndex = (int) ((j % iHighWidth) * iHighWidth + (i % cTGATextures[iHighTex].GetImageHeight())) * 3;

// Add the higher texture

// Replace three calls with just one call and save the value...

<imageDataType> imageData = cTGATextures[iHighTex].GetImageData();
cTexel[0] += (unsigned char) (imageData[iIndex + 0] * fHighTexWeight);
cTexel[1] += (unsigned char) (imageData[iIndex + 1] * fHighTexWeight);
cTexel[2] += (unsigned char) (imageData[iIndex + 2] * fHighTexWeight);

// Copy the texel to its destination

memcpy(&pTextureData[(j * iTexSize + i) * 3], cTexel, 3);
}

// Update the progress window only every iTexSize intervals - in DOS this would have been VERY slow for every pass.

// I imagine it would be here too so I moved the Update to outside the 2nd loop...

CProgressWindow::SetProgress((unsigned int) ((((i * iTexSize) + j) * iTexSizeSq) * 100.0f) ); // Replace div with mul

}



First off, assuming a map size of 256 * 256 (indices 0 through 255), the inner loop body runs 65,536 times. Per call or calculation, that gives:

1) 65,536 calls to SetProgress() - reduced to 256, or roughly one every 0.4%

2) (65,536 * 6 = 393,216) calls to cTGATextures[xxxxxxx].GetImageData() - reduced to 131,072

3) (65,536 * 2 = 131,072) calls to each of ceil()/floor() - reduced to 65,536 each.

4) 131,072 calcs of: iMaxTexture - 1. Reduced to one.
5) 65,536 divisions potentially reduced to one division, with the remainder as muls.


Items 4 and 5 may be more of a liability than an asset because they hurt readability. Given the other improvements, you may want to replace the areas affected by numbers 4 and 5 with the original version of the code.

Obviously I have not been able to test it (and you will notice I have a placeholder declaration in there twice that needs to be replaced with the real type), but I am sure that with the numbers presented you would notice a significant increase in speed. Since I have not been able to test it, some quirks may need to be worked out, and I can't guarantee it will work the first time. It was meant to give you an idea of where I see room for improvement.

Of course, if these methods are defined as inline (whether explicitly or through compiler optimization), then only the SetProgress(), ceil() and floor() changes will affect your speed. All others would be inconsequential.
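
To make the SetProgress() count concrete, here is a small sketch with a stand-in for the progress call that only counts how often it is invoked. `setProgress`, `g_progressCalls`, `g_lastPercent` and `buildTexture` are hypothetical names; the real CProgressWindow is not shown in this thread.

```cpp
// Hypothetical stand-in for CProgressWindow::SetProgress: it only records
// how many times it was called and the last value it received.
static int          g_progressCalls = 0;
static unsigned int g_lastPercent   = 0;

inline void setProgress(unsigned int percent)
{
    ++g_progressCalls;
    g_lastPercent = percent;
}

inline void buildTexture(int texSize)
{
    for (int i = 0; i < texSize; ++i) {
        for (int j = 0; j < texSize; ++j) {
            // ... per-texel blending work would go here ...
        }
        // One update per finished row: texSize calls in total
        // instead of texSize * texSize.
        setProgress(static_cast<unsigned int>((i + 1) * 100 / texSize));
    }
}
```

For a 256 * 256 texture this calls the progress function 256 times and ends at 100 percent.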


Regards,
Jumpster


Edited by - Jumpster on October 30, 2000 8:16:28 AM

tcs    122
Thanks for your time and effort! I already looked into this thing with the progress bar, how stupid ;-) But your other suggestions also look cool, I'll try them!

Thanx


Tim

--------------------------
glvelocity.gamedev.net
www.gamedev.net/hosted/glvelocity

tcs    122
Sure! And let this thread be running in a DCOM out-of-process server that runs on another machine! And don't let us use a variable, let's store it in a SQL database on a dedicated UNIX system...

What advantage do you have from your approach, except that your code is slower, more complex and the status bar doesn't get any updates because the main thread is busy? Oh sure, we could use synchronisation functions to keep the status bar updated... but why would anyone write 10 pages of code when a single line does the same more efficiently? This thread was about optimisation, not about how to use multithreading where it just reduces responsiveness and slows things down...


Tim

--------------------------
glvelocity.gamedev.net
www.gamedev.net/hosted/glvelocity

Jumpster    122
        

// Old version...

cTexel[0] = (unsigned char) (imageData[iIndex + 0] * fLowTexWeight);
cTexel[1] = (unsigned char) (imageData[iIndex + 1] * fLowTexWeight);
cTexel[2] = (unsigned char) (imageData[iIndex + 2] * fLowTexWeight);

// New version...

cTexel[0] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);
cTexel[1] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);
cTexel[2] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);


I am not sure, but I believe this will also help you out a little bit. If I am not mistaken, the asm code for
iIndex+0 equates to something like:


mov eax, [iIndex]
add eax, 0
...
mov eax, [iIndex]
add eax, 1
...
mov eax, [iIndex]
add eax, 2
...


It seems to me, although I have not checked this out yet, that the iIndex++ would translate to:


inc [iIndex]
...
inc [iIndex]
...
inc [iIndex]


and provide the same results. Again, I am not sure; this is just what I think would happen.

At any rate, it doesn't hurt to try it, right?

Regards,
Jumpster



Edited by - Jumpster on November 1, 2000 12:23:54 PM

Stoffel    250
tcs:
Oh man, do you not know what you're talking about.

An inefficiency in his code was calls to
CProgressWindow::setProgress. I was suggesting a different
design to eliminate that call. Therefore my discussion WAS
about optimization.

10 pages of code? Pass me some of that crack, man.

Do you want to process windows messages while processor-
intensive threads run? If you're a good programmer, you do.
Therefore, this should already be multi-threaded.


void MyApp::OnBuildTextures ()
{
DWORD threadId; // I never use this, but it's needed in win95
m_pTexThread = CreateThread (NULL, 0, buildTexProc, this, 0,
&threadId);
}


Add a single function to CProgressWindow, which you can access
through MyApp (and therefore the app pointer in the thread
process):

void CProgressWindow::setVar (float* pVar, float min = 0.0,
float max = 0.0)
{
// note, you do need a critical section when changing setVar
// since your OnTimer command may access it.
// (OnTimer, not shown here, also needs to grab this CS)
EnterCriticalSection (&m_csProgress);
if (pVar == NULL)
{
// setting pVar to NULL tells the progress window to stop
// reading the variable
KillTimer (m_hwnd, m_timerId);
}
else
{
// CProgressWindow reads *m_pVar in its OnTimer, calculates
// the distance *m_pVar lies between min and max, and updates
// the progress bar accordingly
m_pVar = pVar;
m_min = min;
m_max = max;
}
LeaveCriticalSection (&m_csProgress);
}


Finally, place the algorithm in buildTexProc, add a couple of
lines to make this work:

DWORD WINAPI buildTexProc (PVOID pParam)
{
MyApp& app = *((MyApp*) pParam); // cast parameter
float progress = 0.0;
app.getProgressWindow ()->setVar (&progress, 0.0,
iTexSize * iTexSize);

// start of algorithm
for (int i=0; i<iTexSize; i++) {
for (int j=0; j<iTexSize; j++) {
progress += 1.0; // wow, less calculations here, too
//...rest of algorithm
}
} // end of algorithm

// progress will go out of scope soon, so let CProgressWindow
// know about it
app.getProgressWindow ()->setVar (NULL);

// let message loop know this thread is done
PostMessage (app.m_hwnd, WM_BUILD_TEX_DONE, 0, 0);
return 0;
}


HOLY CARP THAT WAS COMPLETELY DIFFICULT AND TOOK 10+ PAGES AND
NOW EVERYTHINGS COMPLETELY INEFFICIENT AND....and....and,.. oh
wait, I guess threads aren't that bad, eh? I guarantee this
will give you absolute 0 performance hit as well (might save
something since you're only doing the calculation on progress
once every time you update, not once every time through the
loop). And now your code is much more modular. And if you have
multiprocessors, the texture-building thread might run on a
completely different processor in parallel with the message-
processing loop.

Learn threads. They can help.

--edit--reformatted since line-breaks are screwed up in this
discussion thread

Edited by - Stoffel on November 1, 2000 1:56:03 PM

JonStelly    127
Christ, I'm glad I read the thread before trying to get a better grasp on the code and give suggestions. I wouldn't DREAM of helping TCS after seeing him blindly attack Stoffel like that.

I hate nothing more than someone who asks for help and then bitches when other people don't do all of the work for them. Stoffel's suggestion DOES have merit, but TCS just bit his head off because he didn't understand it.

Cya.

DigitalDelusion
Ok, sorry to say this, but you are completely wrong here.

quote:
Original post by Jumpster

            

// Old version...

cTexel[0] = (unsigned char) (imageData[iIndex + 0] * fLowTexWeight);
cTexel[1] = (unsigned char) (imageData[iIndex + 1] * fLowTexWeight);
cTexel[2] = (unsigned char) (imageData[iIndex + 2] * fLowTexWeight);

// New version...

cTexel[0] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);
cTexel[1] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);
cTexel[2] = (unsigned char) (imageData[iIndex++] * fLowTexWeight);


I am not sure, but I believe this will also help you out a little bit. If I am not mistaken, the asm code for
iIndex+0 equates to something like:


mov eax, [iIndex]
add eax, 0
...
mov eax, [iIndex]
add eax, 1
...
mov eax, [iIndex]
add eax, 2
...


It seems to me, although I have not checked this out yet, that the iIndex++ would translate to:


inc [iIndex]
...
inc [iIndex]
...
inc [iIndex]


and provide the same results. Again, I am not sure; this is just what I think would happen.

At any rate, it doesn't hurt to try it, right?



The first one will translate into statements of the type:

mov esi,imageData
mov edi,cTexel
mov eax,[esi]
mov ebx,[esi+1]
...
mov [edi],eax
mov [edi+1],ebx
...

That's fairly fast, and a good compiler will get it to fill both Pentium pipes. (Note that string instructions can be used to do this, but they are slower. Also, when moving bytes it will probably move them in 32-bit chunks if possible.)

whereas the second variant would go something like:
mov esi,imageData
mov edi,cTexel
mov eax,[esi]
inc esi
mov [edi],eax
inc edi
...repeat...
Uh oh! This isn't good: the added complexity makes it harder to fill both pipes. Holy macaroni, this backfired.
Note, though, that the offset or index can sometimes add one cycle to the addressing cost, which is the same cost as an inc reg. But when you know the number of items being moved, the first version is actually better.

cheers!
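
For anyone who wants to check rather than guess, here is a small sketch that puts both variants side by side so they can be compiled, disassembled and timed. The function names are made up for this example. It only demonstrates that the two forms produce the same bytes; which one is faster is an assumption to verify on your own compiler, since an optimizer may well emit identical code for both.

```cpp
// Hypothetical helpers: each scales one RGB texel by a weight.

// Variant 1: constant offsets from a fixed index.
inline void scaleWithOffsets(const unsigned char* data, int index,
                             float weight, unsigned char* out)
{
    out[0] = static_cast<unsigned char>(data[index + 0] * weight);
    out[1] = static_cast<unsigned char>(data[index + 1] * weight);
    out[2] = static_cast<unsigned char>(data[index + 2] * weight);
}

// Variant 2: post-increment of the index. Note that this changes the
// local copy of index, which matters if the value is reused afterwards.
inline void scaleWithIncrement(const unsigned char* data, int index,
                               float weight, unsigned char* out)
{
    out[0] = static_cast<unsigned char>(data[index++] * weight);
    out[1] = static_cast<unsigned char>(data[index++] * weight);
    out[2] = static_cast<unsigned char>(data[index++] * weight);
}
```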

Jumpster    122
quote:
Original post by DigitalDelusion

Ok, sorry to say this but you are completley wrong here.




Uh... Ok. So I am wrong. It's a good thing I didn't *insist* that it would be faster. I just thought that is what would happen. Thanks for the clarification.

Regards,
Jumpster

LilBudyWizer    491
I won't guarantee it is right, but it is at least close.

    
for (i = 0; i < iTexSize; i++)
{
fTexIndex = pHeight->m_Array[i][j] / 255.0f * iMaxTexture;

iHighTex = (int) ceil(fTexIndex);
iLowTex = (int) floor(fTexIndex);

fLowTexWeight = iHighTex - fTexIndex;
fHighTexWeight = fTexIndex - iLowTex;

if (iHighTex > iMaxTexture - 1)
iHighTex = iMaxTexture - 1;

if (iLowTex > iMaxTexture - 1)
iLowTex = iMaxTexture - 1;

if (fHighTexWeight == 0.0f && fLowTexWeight == 0.0f)
{
fHighTexWeight = 0.5f;
fLowTexWeight = 0.5f;
}

<TextureDataType> *ptTextureData = pTextureData + 3;

for (j = 0; j < iTexSize; j++)
{
cTGATexture *ptLowTex = cTGATexture + iLowTex,
*ptHighTex = cTGATexture + iHighTex;

iWidth = ptLowTex->GetImageWidth();
iHeight = ptLowTex->GetImageHeight();

iLowIndex = (int) ((j % iWidth) * iWidth + (i % iHeight)) << 1;
iLowIndex += iLowIndex;

iWidth = ptHighTex->GetImageWidth();
iHeight = ptHighTex->GetImageHeight();

iHighIndex = (int) ((j % iWidth) * iWidth + (i % iHeight)) << 1;
iHighIndex += iHighIndex;

<ImageDataType> *ptLowData = ptLowTex->GetImageData() + iLowIndex,
*ptHighData = ptHighTex->GetImageData() + iHighIndex;

*(ptTextureData++) = (unsigned char)((*(ptLowData++) * fLowTexWeight) +
(*(ptHighData++) * fHighTexWeight));
*(ptTextureData++) = (unsigned char)((*(ptLowData++) * fLowTexWeight) +
(*(ptHighData++) * fHighTexWeight));
*(ptTextureData++) = (unsigned char)((*(ptLowData++) * fLowTexWeight) +
(*(ptHighData++) * fHighTexWeight));

ptTextureData += iTexSize * 3;
}
}

LilBudyWizer    491
Ok, so it wasn't that close. Aside from some potential differences in rounding and truncation, there is the small problem of fTexIndex depending on j, not just i, so moving it out of the inner loop isn't valid. Considering the number of calculations based on j, I would be tempted to move j to the outer loop and loop on i in the inner loop. I believe the only error then would be how ptTextureData is stepped. Well, aside from any other errors I missed, but hey, when you take advice off the internet...
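
A sketch of what swapping the loops buys for the texel offset alone. It assumes every source texture has the same width and height, which the original code does not guarantee since the dimensions are looked up per texel; `offsetsNaive` and `offsetsHoisted` are hypothetical names. With j in the outer loop, the `(j % width) * width` term is computed once per row instead of once per texel.

```cpp
#include <cstddef>
#include <vector>

// Reference version: recompute the full offset for every texel,
// using the same formula as the original loop.
inline std::vector<int> offsetsNaive(int texSize, int width, int height)
{
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(texSize) * texSize);
    for (int j = 0; j < texSize; ++j)
        for (int i = 0; i < texSize; ++i)
            out.push_back(((j % width) * width + (i % height)) * 3);
    return out;
}

// j in the outer loop: the j-dependent term is hoisted out of the inner loop.
inline std::vector<int> offsetsHoisted(int texSize, int width, int height)
{
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(texSize) * texSize);
    for (int j = 0; j < texSize; ++j) {
        const int rowOffset = (j % width) * width;   // once per row
        for (int i = 0; i < texSize; ++i)
            out.push_back((rowOffset + (i % height)) * 3);
    }
    return out;
}
```

Both functions return the same sequence of offsets, so the hoisted form can replace the naive one whenever the equal-size assumption holds.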

Share this post


Link to post
Share on other sites
Ok, I've made an honest try, but not having the full source kind of gave me a headache, because it's impossible to compile it and see whether I broke it or not. So this is highly speculative.
But something like this would work, I guess...

    
// Fill the texture with a combination of all landscape textures. Make a height based

// per-texel choice of the source texture

//ok,pass one move everything thats not j dependant out of the inner loop.
//pass two move everything non i dependant out of the second loop.


float iTex2 = 100.0f / ((float) iTexSize * (float) iTexSize); // parenthesized so the whole product is the divisor
int cTGAiLowTexTexWidth = cTGATextures[iLowTex].GetImageWidth();
int iHighTexWidth = cTGATextures[iHighTex].GetImageWidth();
unsigned char *iLowTexData = cTGATextures[iLowTex].GetImageData();//i assume this return a pointer.

unsigned char *iHighTexData = cTGATextures[iHighTex].GetImageData();

for (i=0; i<iTexSize; i++)
{
int cTGATexThings = cTGAiLowTexTexWidth + (i % cTGATextures[iLowTex].GetImageHeight()); //wtf is this?

int iHighTexThing = i % cTGATextures[iHighTex].GetImageHeight();


// Calculate the two textures that are blended together

iHighTex = (int) ceil(fTexIndex);
iLowTex = (int) floor(fTexIndex);

// Don't allow that we exceed the maximum texture count

if (iHighTex > iMaxTexture - 1)
iHighTex = iMaxTexture - 1;
if (iLowTex > iMaxTexture - 1)
iLowTex = iMaxTexture - 1;

// Calculate the weights of each texture

fHighTexWeight = fTexIndex - (float) floor(fTexIndex);
fLowTexWeight = (float) ceil(fTexIndex) - fTexIndex;

// Necessary to avoid black textures when we directly hit a

// texture index

if (fHighTexWeight == 0.0f && fLowTexWeight == 0.0f)
{
fHighTexWeight = 0.5f;
fLowTexWeight = 0.5f;
}

for (j=0; j<iTexSize; j++)
{
// Update the progress window

CProgressWindow::SetProgress((unsigned int) ((i * (float) iTexSize + j) * iTex2));


// Calculate the "average" texture index

fTexIndex = pHeight->m_Array[i][j] / 255.0f * iMaxTexture;

// Calculate the texel offset in the lower texture array

iIndex = (int) ((j % cTGAiLowTexTexWidth) * cTGATexThings) * 3;

// Add the lower texture

cTexel[0] = (unsigned char)(iLowTexData[iIndex + 0] * fLowTexWeight);
cTexel[1] = (unsigned char)(iLowTexData[iIndex + 1] * fLowTexWeight);
cTexel[2] = (unsigned char)(iLowTexData[iIndex + 2] * fLowTexWeight);

// Calculate the texel offset in the higher texture array

iIndex = (int) ((j % iHighTexWidth) * iHighTexWidth + iHighTexThing) * 3;

// Add the higher texture

cTexel[0] += (unsigned char)(iHighTexData[iIndex + 0] * fHighTexWeight);
cTexel[1] += (unsigned char)(iHighTexData[iIndex + 1] * fHighTexWeight);
cTexel[2] += (unsigned char)(iHighTexData[iIndex + 2] * fHighTexWeight);

// Copy the texel to its destination

memcpy(&pTextureData[(j * iTexSize + i) * 3], cTexel, 3);
}
}



Hope I got that at least half right.
As they say, two wrongs don't make a right; it usually takes three or more.

quote:
Original post by Jumpster

Uh... Ok. So I am wrong. It's a good thing I didn't *insist* that it would be faster. I just thought that is what would happen. Thanks for the clarification.

Regards,
Jumpster



Uhm, I'm not really sure if this is sarcasm in the air or what.
I'm truly sorry if you feel like I tried to stomp you; that really wasn't the intention.

cheers!

Serge K    154
How about this:

    

inline int __stdcall _round( float x )
{
int t;
__asm fld x
__asm fistp t
return t;
}
// float -> BYTE

inline BYTE __stdcall _round_u8( float x )
{
float t = x + (float)0xC00000;
return *(BYTE*)&t;
}
// floor for (x >= 0) && (x < 2^31)
inline int __stdcall _floor_u( float x )
{
DWORD e = (0x7F + 31) - ((*(DWORD*)&x & 0x7F800000) >> 23);
DWORD m = 0x80000000 | (*(DWORD*)&x << 8);
return (m >> e) & -(e<32);
}

//=================================================================


float fTexIndexScale = (iMaxTexture-1) / 255.0f;
BYTE *pDst = pTextureData; // pTextureData[(j * iTexSize + i) * 3]

for(int j=0; j<iTexSize; j++)
{
for(int i=0; i<iTexSize; i++)
{
// Calculate the "average" texture index

float fTexIndex = pHeight->m_Array[ i ][ j ] * fTexIndexScale;

// Calculate the two textures that are blended together

int iLowTex = _floor_u(fTexIndex);
// or this one (choose the fastest):
// int iLowTex = _round( fTexIndex - 0.5f );

float fHighTexWeight = fTexIndex - iLowTex;

// Don't allow that we exceed the maximum texture count

int iOverflowMask = -(DWORD(iLowTex) < DWORD(iMaxTexture));
// VC6.0 can optimize it into (in pseudo-asm)
// cmp iLowTex, iMaxTexture
// sbb eax, eax // iOverflowMask = eax = (DWORD(iLowTex) < DWORD(iMaxTexture)) ? -1 : 0;

*(int*)&fHighTexWeight &= iOverflowMask; // if(!iOverflowMask) fHighTexWeight = 0.0f;
iLowTex -= 1 + iOverflowMask; // if(!iOverflowMask) iLowTex--;

int iHighTex = iLowTex + 1;

int iLowTexWidth = cTGATextures[iLowTex ].GetImageWidth ();
int iLowTexHeight = cTGATextures[iLowTex ].GetImageHeight();
int iHighTexWidth = cTGATextures[iHighTex].GetImageWidth ();
int iHighTexHeight = cTGATextures[iHighTex].GetImageHeight();

// Calculate the texel offset in the lower texture array

// Calculate the texel offset in the higher texture array


// for positive integer numbers:

// if b==2^n -> a%b == a&(b-1)
int iIndexL = (j & (iLowTexWidth - 1)) * iLowTexWidth + (i & (iLowTexHeight - 1));
int iIndexH = (j & (iHighTexWidth - 1)) * iHighTexWidth + (i & (iHighTexHeight - 1));
iIndexL += 2*iIndexL;
iIndexH += 2*iIndexH;

BYTE *pLowTexData = cTGATextures[iLowTex ].GetImageData() + iIndexL;
BYTE *pHighTexData = cTGATextures[iHighTex].GetImageData() + iIndexH;

// Copy the texel to its destination


int iLow0 = pLowTexData[0];
int iLow1 = pLowTexData[1];
int iLow2 = pLowTexData[2];

// _round() or _round_u8() (choose the fastest)
pDst[0] = iLow0 + _round(fHighTexWeight * (pHighTexData[0] - iLow0));
pDst[1] = iLow1 + _round(fHighTexWeight * (pHighTexData[1] - iLow1));
pDst[2] = iLow2 + _round(fHighTexWeight * (pHighTexData[2] - iLow2));
pDst += 3;
}
}
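
The `a % b == a & (b-1)` replacement used above only holds when b is a power of two and a is not negative. A minimal sketch that guards the shortcut, with hypothetical helper names `isPowerOfTwo` and `wrap`:

```cpp
// True when b is a power of two (1, 2, 4, 8, ...).
inline bool isPowerOfTwo(unsigned int b)
{
    return b != 0 && (b & (b - 1)) == 0;
}

// Wrap a non-negative coordinate into [0, size). Uses the bit mask when
// size is a power of two and falls back to % otherwise. size must not be 0.
inline unsigned int wrap(unsigned int a, unsigned int size)
{
    return isPowerOfTwo(size) ? (a & (size - 1)) : (a % size);
}
```

In a real inner loop the power-of-two check would be done once per texture, outside the loops, so that only the mask remains per texel.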

Orpheum    122
I'm not going to try and come up with an optimized algorithm; enough people have thrown in their $0.02 on that. What really jumped out at me from looking at the code was that there was a lot of casting going on. I don't know what kind of perf hit you're taking by doing this, but it seems to me that casting would introduce a lot of extra processing. If you're going through this loop 65025 times, that adds up! My suggestion would be to overload your functions to give you the data type you need rather than casting each iteration.

tcs    122
Thanks for all your help, guys! Casting was a BIG issue in all my routines! I got this fast float/int casting asm code from nvidia.com, and it gave me a 33% speed boost. I never knew it was that expensive ;-)
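
The nvidia.com routine itself is not shown in this thread, so the following is only a portable sketch of the difference it exploits, with made-up function names. A plain cast must truncate toward zero, which on x87-era compilers usually meant switching the FPU rounding mode back and forth around the conversion. Converting with the current rounding mode instead, as fistp does, avoids that switch but rounds to nearest rather than truncating.

```cpp
#include <cmath>

// Truncates toward zero, as the C and C++ conversion rules require.
inline int truncateToInt(float x)
{
    return static_cast<int>(x);
}

// Converts using the current rounding mode, which is round-to-nearest
// unless the program has changed it.
inline int roundToInt(float x)
{
    return static_cast<int>(std::lrint(x));
}
```

Because the two give different results for values such as 2.7f, code that relied on truncation (for example as a floor for positive numbers) needs an adjustment like subtracting 0.5f first, which is the trick the commented-out _round( fTexIndex - 0.5f ) line in Serge K's post relies on.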

Oh, and I'm pretty familiar with threads. But if I see that I have to handle critical sections and such stuff just to update my progress bar... Better to just move that crap out of the inner loop and you're fine; 2048 calls to the bar are not very expensive. I'm really not that "oh it's MFC, let's better say it's crappy before I try to understand it" guy. I've written lots of multithreaded winsock servers on such stuff, but I just think a thread won't help much. Threads slow down a program; they can only make it more responsive for the user and easier to manage for the programmer. Since we're talking about a loading screen of a game, there's no user input to process, except a cancel button, but that could be handled through a return value of setprogress. I think I could update my progress bar in the loop.

Oh, and Stoffel: I want to apologize for this mindless rant. I just thought it would be somehow funny or whatever. It obviously wasn't; I hope you aren't upset anymore!

Ok, I will now step through all your code and try your suggestions; some are very cool!


Tim

--------------------------
glvelocity.gamedev.net
www.gamedev.net/hosted/glvelocity

tcs    122
Thanx Serge K !
Your modifications gave a HUGE speed boost ! I learned much from your code, I was even able to speed up most of my other routines


Tim

--------------------------
glvelocity.gamedev.net
www.gamedev.net/hosted/glvelocity
