• ### Popular Now

• 10
• 10
• 12
• 12
• 14

#### Archived

This topic is now archived and is closed to further replies.

# Optimize this

This topic is 6078 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Im working on a split-only ROAM implementation. Im having some problems with framerates though and i pretty much narrowed the trouble down to this function. Its WAY too slow and is taking up most of my running time. The function traverses a tree which is built up each frame. The tree is also different each frame which means i cant just put the whole thing in a displaylist or vertex array. I think maybe the recursion is taking up alot of time but i really dont know enough about the mechanisms behind to tell.
  void Patch::RenderR(TriNode *Tri,int LeftX, int RightX, int TopX, int LeftY, int RightY, int TopY) { if (Tri->LeftChild) { int CenterX = (int)((LeftX + RightX) / 2); int CenterY = (int)((LeftY + RightY) / 2); RenderR(Tri->LeftChild, TopX, LeftX, CenterX, TopY, LeftY, CenterY); RenderR(Tri->RightChild, RightX, TopX, CenterX, RightY, TopY, CenterY); } else { TrisRenderedTotal++; int LeftZ = HeightMap[(LeftY * MAP_SIZE)+LeftX]; int RightZ = HeightMap[(RightY * MAP_SIZE)+RightX]; int TopZ = HeightMap[(TopY * MAP_SIZE)+TopX ]; float p = (float)MAP_SIZE; float l = (float)PATCH_SIZE/4.0; int lX = WorldX+LeftX; int lY = WorldY+LeftY; int rX = WorldX+RightX; int rY = WorldY+RightY; int tX = WorldX+TopX; int tY = WorldY+TopY; glMultiTexCoord2fARB(GL_TEXTURE0_ARB , lX/p, lY/p); glMultiTexCoord2fARB(GL_TEXTURE1_ARB , lX/l, (float)lY/l); glVertex3f((GLfloat)LeftX, (GLfloat) LeftZ, (GLfloat) LeftY); glMultiTexCoord2fARB(GL_TEXTURE0_ARB ,rX/p, rY/p); glMultiTexCoord2fARB(GL_TEXTURE1_ARB ,rX/l, rY/l); glVertex3f((GLfloat) RightX,(GLfloat) RightZ,(GLfloat) RightY); glMultiTexCoord2fARB(GL_TEXTURE0_ARB ,tX/p, (float)tY/p); glMultiTexCoord2fARB(GL_TEXTURE1_ARB ,tX/l, (float)tY/l); glVertex3f((GLfloat) TopX, (GLfloat) TopZ, (GLfloat) TopY); } } 
If you got any ideas, any at all, please let me know. I dont mind major changes so if im missing some general optimization techniques in ROAM let me know as well.

##### Share on other sites
you''re doing pretty many float/int conversions, i dont know anything about ROAM though ...

##### Share on other sites
const your parameters and calculated numbers possible

##### Share on other sites
Okay, this is a very rough first look based on some general optimisation principles. I would check the algorithm if it is taking too long, perhaps that isn''t quite right. A B-Tree is always going to be O(n) to traverse it completely and process every node.

1. Check your MAP_SIZE constant. If possible, make your mapsize a power of 2 and then you can bit shift instead. that will speed up those calcs to start with.

2. Do you need to work in floats? You seem to be working with Ints and then cast MAP_SIZE to a float and divide an int by what is effectively a floated int?? Do you need to do this?

3. l is PATCH_SIZE/4.0, does this change? If not, cache it at the start and also try and make it a power of 2. Does it need to be a float?

3. If you can keep them as ints, use bit shifting again to mimimise your divides through the last part. You don''t re-use anything in each operation as they are all different but you can minimise the number of ops to do.

4. The casting to GLFloat probably isn''t helping either particularly as this is a full re-shuffle of the bits.

Not sure if this has helped at all.

-- That''s my $0.02 worth -- Hang on, where I come from,$0.02 is rounded down. Does that mean my opinion is worthless or priceless?

CHROM

##### Share on other sites
Off the top of my head;

1) Remove the glMultiTexCoord2fARB() and glVertex() functions but keep the work they''re doing (the float division). You want to know how much time is spent doing algorithmic vs gl work.

2) Make the function a loop with a stack preallocated to the depth you''ll need. This will eliminate the function overhead. The STL has a stack you can use.

3) Make the data structure an implicit binary tree. This will allow you eliminate the pointer overhead (memory and CPU) and give you quick access time to children. Check out http://www.longbowdigitalarts.com/ubb/Forum3/HTML/000016.html for details.

4) Try to work with either ints or floats. You''re doing a bunch of conversions in the glVertex() and glMultiTexCoord2fARB() functions.

5) Choose PATCH_SIZE such that PATCH_SIZE/4.0 is a power of two to allow faster division. Same with MAP_SIZE.

6) Const you function. For example, make LeftZ, RightZ, and TopZ const. Any additional information that you give the compiler helps it make better optmization decisions.

7) Eliminate the temporaries for variables p and l.

Good luck.

##### Share on other sites
Thanks for all your replies. At least im learning a lot . Ive changed some of the things you suggested.

  void Patch::RenderR(TriNode *Tri, const int LeftX, const int RightX, const int TopX, const int LeftY, const int RightY, const int TopY){ if (Tri->LeftChild) { int CenterX = (LeftX + RightX) >> 1; int CenterY = (LeftY + RightY) >> 1; RenderR(Tri->LeftChild, TopX, LeftX, CenterX, TopY, LeftY, CenterY); RenderR(Tri->RightChild, RightX, TopX, CenterX, RightY, TopY, CenterY); } else { TrisRenderedTotal++; int LeftZ = HeightMap[(LeftY * MAP_SIZE)+LeftX]; int RightZ = HeightMap[(RightY * MAP_SIZE)+RightX]; int TopZ = HeightMap[(TopY * MAP_SIZE)+TopX ]; float lX = (float)WorldX+LeftX; float lY = (float)WorldY+LeftY; float rX = (float)WorldX+RightX; float rY = (float)WorldY+RightY; float tX = (float)WorldX+TopX; float tY = (float)WorldY+TopY; glMultiTexCoord2fARB(GL_TEXTURE0_ARB , lX / BASE_SIZE, lY / BASE_SIZE); glMultiTexCoord2fARB(GL_TEXTURE1_ARB , lX / DETAIL_SIZE, lY / DETAIL_SIZE); glVertex3i(LeftX, LeftZ, LeftY); glMultiTexCoord2fARB(GL_TEXTURE0_ARB ,rX / BASE_SIZE, rY / BASE_SIZE); glMultiTexCoord2fARB(GL_TEXTURE1_ARB ,rX / DETAIL_SIZE, rY / DETAIL_SIZE); glVertex3i(RightX, RightZ, RightY); glMultiTexCoord2fARB(GL_TEXTURE0_ARB ,tX / BASE_SIZE, tY / BASE_SIZE); glMultiTexCoord2fARB(GL_TEXTURE1_ARB ,tX / DETAIL_SIZE, tY / DETAIL_SIZE); glVertex3i(TopX, TopZ, TopY); }}

Ive elimminated the variables l and p and replaced them with constants instead. The CenterX, CenterY no longer needs conversion thanks to bit-shifting. The list lx,ly,rx etc is converted to float at calculation only. I need to work in floats with these variables since they are used for texture mapping. The function parametres are now consts. And finally the glvertex() calls now use ints instead of floats.

Still got some questions i hope you will take the time to answer.

quote:

Make the function a loop with a stack preallocated to the depth you'll need. This will eliminate the function overhead. The STL has a stack you can use.

Could you point me to a place where this is described in more detail. sounds interesting.

quote:

Make the data structure an implicit binary tree. This will allow you eliminate the pointer overhead (memory and CPU) and give you quick access time to children.

Im already using implicit binary trees other places in the program. I will still have to send at least an int along to let the function know what tree-node i wish to access right ?. If thats the case then why is it faster than sending a pointer instead ?

And again, thanks for all your help, appreciate it .

Edited by - NewDeal on September 4, 2001 12:41:59 PM

##### Share on other sites
I don''t have an immediate reference for the stack. You basically want something like:

struct Params
{ TriNode *Tri,
int LeftX, RightX, TopX, LeftY, RightY, TopY
}

##### Share on other sites
Check out "Essentials of Programming Languages", Friedman, Wand and Haynes. The work is in Scheme, but there is information for turning recursion into a stack-loop. (The method has a name, but I can''t remember as its been too long.)

Of the top of my head, its something like:

struct Params
{
TriNode *Tri,
int LeftX, RightX, TopX, LeftY, RightY, TopY
};

stack myStack;
bool doPop = false;

while (!doPoP || !myStack.empty())
{
if (doPop)
{
// Reached leaf node.

// Extract saved params from stack
TriNode* Tri = myStack.top().Tri;
int LeftX = myStack.top().LeftX;
...

// Pop stack
myStack.pop();

// Do work.
...
}

// Assume next item is a leaf
doPop = true;

if (Tri->LeftChild)
{
// Node has children.

// Push params onto stack.
myStack.push(Params(Tri, LeftX, TopX, LeftY, RightY, TopY);

// Set stack indicating that the top isn''t a leaf node.
doPop = false;
}
}

You''ll probably want to use references and try to avoid
extra copying. However, if you use an stl stack, be sure
you''re done using the data before you pop if using references.
(There may be consequences based on the stl implementation.)

Implicit Binary Trees:
You are correct, you still have to pass along the location to reference the child. You probably already know this, but the savings are storage itself (no explicit pointers) and quicker access to the children (mathmatical instead of following pointers). Also, with your stucture taking less memory, you reduce your probabilty of cache hits.

Don''t forget to post back to tell us how you made out.

##### Share on other sites
Thanks AP, appreciate it

##### Share on other sites
what u have there is gonna be slow.
the biggest problem is u are calculating one tri at a time and then drawing it.

try this ( u will need to rewrite/replan your code ).

*calculate ALL the triangles in the scene.
*discard the ones that aint visable
*build a list of the visable triangles
*use vertex arrays to draw those.

using the above method will increase the speed from 3-30x (not percent but times)