
D-loop: New method for optimizing loops

d-loop · 2004-03-24T04:21:12

Hi, here is the paper: http://www.onversity.com/load/d-loop.pdf
If you have any questions, feel free to post them here.


Started by d-loop March 22, 2004 09:56 AM

24 comments, last by d-loop

d-loop


Author

March 23, 2004 12:27 PM

quote:Original post by Tramboi
quote:
That's right. This is why you have a complete mathematical analysis.

Agreed, but I'm not sure it is really relevant to what happens in a CPU with I$/D$ misses, out-of-order execution, pipeline stalls, pipeline flushes and so on.

quote:
Could you justify all this with VTune on your platform and hardware counters?

Already done. I must say I'm not sure I really understand your point of view.

Basically, I'm trying to see whether this optimisation is worth it (if so, I could of course use it) or whether it is just noise among all the other optimisations the compiler and programmer (masks, sentinels, precomputed tables) apply to the routines...
I tend to think you mixed too many things here, but maybe I'm wrong.

I understand.

Apart from the mask technique, which is not really a good optimisation in most cases, you should be satisfied using D-loop when you know the size of the table and don't want to use asm (for example, SSE).
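Tramboi's list of hand optimisations above includes sentinels. For readers unfamiliar with the term, here is the textbook sentinel search; it is not the D-loop method from the paper. A minimal sketch, assuming the caller's array has one spare slot at index n that may be temporarily overwritten; find_with_sentinel is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: classic sentinel search.
 * Assumption: `a` has room for n + 1 elements, so a[n] may be overwritten
 * temporarily. Returns the index of the first element equal to `key`,
 * or n if there is none. */
size_t find_with_sentinel(int *a, size_t n, int key)
{
    int saved;
    size_t i = 0;

    assert(a != NULL);
    saved = a[n];       /* preserve whatever was in the spare slot */
    a[n] = key;         /* sentinel: the loop is guaranteed to stop here */
    while (a[i] != key) {
        i++;            /* one comparison per iteration, no bounds test */
    }
    a[n] = saved;       /* restore the spare slot */
    return i;
}
```

Storing the key in the spare slot guarantees the loop terminates, so each iteration needs one comparison instead of a comparison plus an index bound check.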

Lord Bart


March 23, 2004 12:41 PM

I compiled this on a Sun 420R server with 4 processors and 4 GB of memory, running Solaris 8, using the Sun WorkShop 6.1 ANSI C++ compiler.

Lord Bart

d-loop


Author

March 23, 2004 03:30 PM

quote:Original post by Lord Bart
I compiled this on a Sun 420R server with 4 processors and 4 GB of memory, running Solaris 8, using the Sun WorkShop 6.1 ANSI C++ compiler.

Lord Bart

I just tried your version. It's 20% slower than the function I gave. In fact, I had to rework it for Intel's compiler. It doesn't like things like:

while(char_mask[*(++str1)]==0)

but rather this:

while(char_mask[*(++str1)]==0)
{
}

Well, not all compilers like D-loop code. For example, VC6 doesn't (I don't know about VC7.1, but I heard that MS did a good job on optimising ALU instructions).
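For context, this is roughly what a char_mask table like the one in these snippets looks like. A minimal sketch, assuming a 256-entry table holding 1 for bytes that should stop the scan; init_mask and skip_to_delim are hypothetical names, not taken from the paper. Marking the NUL byte in the table makes the string terminator act as a built-in sentinel, so the inner loop needs one table lookup and one test per byte instead of separate delimiter and end-of-string checks.

```c
#include <assert.h>
#include <stddef.h>

static unsigned char char_mask[256];   /* 1 = stop byte, 0 = keep scanning */

/* Hypothetical setup: clear the table, then mark every byte in `delims`
 * plus the NUL terminator. */
void init_mask(const char *delims)
{
    size_t k;

    for (k = 0; k < 256; k++)
        char_mask[k] = 0;
    char_mask[0] = 1;                   /* always stop at end of string */
    for (; *delims != '\0'; delims++)
        char_mask[(unsigned char)*delims] = 1;
}

/* Returns a pointer to the first stop byte in s (possibly its NUL). */
const char *skip_to_delim(const char *s)
{
    assert(s != NULL);
    while (char_mask[(unsigned char)*s] == 0) {
        ++s;                            /* one table load and one test per byte */
    }
    return s;
}
```

After init_mask(",;"), calling skip_to_delim("abc;def") returns a pointer to the ';'.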

Lord Bart


March 23, 2004 05:20 PM

quote:Original post by d-loop
I just tried your version. It's 20% slower than the function I gave. In fact, I had to rework it for Intel's compiler. It doesn't like things like:

while(char_mask[*(++str1)]==0)

but rather this:

while(char_mask[*(++str1)]==0)
{
}

Well, not all compilers like D-loop code. For example, VC6 doesn't (I don't know about VC7.1, but I heard that MS did a good job on optimising ALU instructions).

Yep, I forgot to put the {} braces on the while loop, sorry.

Strange that it slows down on Intel.

Incrementing the pointer and then dereferencing should be faster than dereferencing into a char variable and then incrementing the pointer.

Your first loop:
while(char_mask[car]==0)
{
car = (unsigned char)*str1++; // dereference, assign, increment
}

My first loop:
while(char_mask[*(++str1)]==0) // increment, dereference, no assignment
{
}
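To make the two loop shapes comparable, here they are as self-contained functions. This is a sketch under stated assumptions, not either poster's actual code: char_mask is a hypothetical stop table with the NUL byte marked, both functions examine the first byte and return a pointer to the stop byte, and bytes are read as unsigned char so values above 0x7F cannot produce a negative table index (with plain char, which is signed on common x86 compilers, `char_mask[*(++str1)]` could do exactly that). The explicit `{ }` body in the second function is what the snippet above was missing; without it, or at least a `;`, the statement after the while becomes the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stop table: 1 = stop byte. NUL is marked so scans end at the string end. */
static const unsigned char char_mask[256] = { [0] = 1, [';'] = 1, [','] = 1 };

/* Shape 1: dereference into a temporary, then advance (deref, assign, inc). */
const unsigned char *scan_assign(const unsigned char *p)
{
    unsigned char car;

    assert(p != NULL);
    car = *p++;                 /* examine the first byte */
    while (char_mask[car] == 0) {
        car = *p++;
    }
    return p - 1;               /* p has already moved one past the stop byte */
}

/* Shape 2: pre-increment inside the test (inc, deref, no temporary). */
const unsigned char *scan_preinc(const unsigned char *p)
{
    assert(p != NULL);
    if (char_mask[*p] != 0)
        return p;               /* the first byte already stops the scan */
    while (char_mask[*(++p)] == 0) {
        /* empty body: all the work is in the condition */
    }
    return p;                   /* p points at the stop byte */
}
```

Both compute the same result; which one runs faster is exactly what this thread shows to depend on the compiler and CPU, so time both on the target machine.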

And st3 should work out to *(st3+i); if you replace that with *(++st3) in your check, it should be faster, since you increment the pointer instead of doing a pointer addition.

Yours:
while((unsigned char)st3[i]==(unsigned char)str2[i])
{
i++; // extra increment of a variable, needed for the dereference above
}

Mine:
while(*(++st3)==*(++st2))
{
// no need for i: increment the pointers and dereference instead
}

My code keeps four things in use in the main part: str1, str2, st2 and st3, all of which are pointers and are most likely kept in registers. Your code has five in use in the main part: car, str1, str2, st3 and i. Those may also stay in registers, but I'm not sure on an Intel box. Still, I believe compiler and processor differences matter most here: the Sun processor has lots of registers, though I'm not sure how many pipelines.

Maybe I'll track down gcc on my box and compile it with that, but I won't get to it until Thursday. I also need to figure out a way to see the asm code from the Sun compiler.

Anyway, it sped things up on the Sun :) but slowed things down on Intel :(

Nice little paper, by the way :)

Lord Bart
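The indexed and pointer-bumping comparisons can likewise be written side by side. The quoted loops have no end-of-string test, so this sketch adds one (an assumption: both strings are NUL-terminated and the goal is the length of their common prefix); prefix_indexed and prefix_pointer are hypothetical names. The pointer form uses separate increments rather than `*(++st3)==*(++st2)` so that the first characters are compared too.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: one induction variable i; each access is a base plus an offset. */
size_t prefix_indexed(const char *a, const char *b)
{
    size_t i = 0;

    assert(a != NULL && b != NULL);
    while (a[i] != '\0' && a[i] == b[i]) {
        i++;
    }
    return i;
}

/* Pointer form: advance both pointers, no separate counter in the loop.
 * The length is recovered afterwards by pointer subtraction. */
size_t prefix_pointer(const char *a, const char *b)
{
    const char *start = a;

    assert(a != NULL && b != NULL);
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return (size_t)(a - start);
}
```

Optimising compilers often apply strength reduction, turning an indexed access with an incremented i into a bumped pointer on their own, which may be part of why the Sun and Intel results differ; inspecting the generated asm is the reliable way to tell.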

psamty10


March 23, 2004 06:58 PM

OMFG... dost thou not believe that these kinds of micro-optimizations are best left in the hands of the compiler writer?
The code is a little too unreadable for me, I'm afraid.

d-loop


Author

March 24, 2004 04:21 AM

quote:Original post by Tramboi
By the way,

0d :
cmp ebx,edx
jnz 14 :
add eax,01h
14 :
movzx ebx,BYTE PTR [ecx+01h]
add ecx,01h
test ecx,01h >> ebx?
jnz 0d :

seems wrong...

I've checked, and

test ecx,01h

should be replaced by

test ebx,ebx

Thx Tramboi
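Read as C, the corrected listing looks like the inner loop of a character count. This is a hypothetical reconstruction, not the paper's source: the entry code, the function name count_char and the register roles (ecx = p, ebx = cur, edx = c, eax = count) are guesses from the listing. It also shows why the fix matters: the loop must exit when the byte just loaded is NUL, which is what `test ebx,ebx` checks, whereas `test ecx,01h` tests the low bit of the pointer, so the loop would end after one or two iterations depending only on the parity of the address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction: count occurrences of c in a NUL-terminated string.
 * Comments map each step to the corrected listing (register roles are guesses). */
size_t count_char(const char *p, char c)
{
    size_t count = 0;
    unsigned char cur;

    assert(p != NULL);
    cur = (unsigned char)*p;            /* assumed first load before the loop */
    if (cur == 0)
        return 0;
    do {
        if (cur == (unsigned char)c)    /* cmp ebx,edx / jnz 14 */
            count++;                    /* add eax,01h */
        cur = (unsigned char)*++p;      /* movzx ebx,BYTE PTR [ecx+01h] / add ecx,01h */
    } while (cur != 0);                 /* test ebx,ebx / jnz 0d */
    return count;
}
```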

This topic is closed to new replies.
