speed,speed

Can you help me with for/next loops? Do you know of any loop that is faster than for/next? Thanks for your help...

Loops are pretty speedy in general; the loop construct is not what is going to make a difference in your app.

The content of the loop will.

So your question should be more about loop optimization than
about the speed of the loop command itself, I think.

Borland's for/next loop translates into pretty lean asm.

Hope this helps

Gunner.

Edited by - Gunner on August 29, 2000 3:05:56 PM

As Gunner said, the loop construct itself is not really an issue when it comes to speed, unless you are using a very complicated end-of-loop condition.
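
To illustrate the point about end-of-loop conditions, here is a minimal C sketch. In C the condition is evaluated again on every pass (a Pascal for loop evaluates its bounds once), so an expensive test is worth hoisting. The function names count_spaces_slow and count_spaces_fast are made up for this example, and an optimizing compiler may hoist the call on its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: the end-of-loop test calls strlen() on every pass. */
size_t count_spaces_slow(const char *s)
{
    size_t n = 0;
    for (size_t i = 0; i < strlen(s); i++)   /* condition re-evaluated each pass */
        if (s[i] == ' ')
            n++;
    return n;
}

/* Same result, but the length is computed once before the loop. */
size_t count_spaces_fast(const char *s)
{
    size_t n = 0;
    size_t len = strlen(s);                  /* hoisted out of the condition */
    for (size_t i = 0; i < len; i++)
        if (s[i] == ' ')
            n++;
    return n;
}
```

Both functions return the same count; only the cost of the loop test differs.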

The contents of the loop have the greatest effect on the loop execution speed.


  • Precalculate as many values as possible before the loop and store them in variables. Use the variables inside the loop.
  • Unroll loops. The font renderer in the original Quake source is a good example of this. It went something like this...


    for (y = 0; y < 7; y++)
    {
        /* copy one row of eight pixels with no inner loop */
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        buf += screenWidth - 8;   /* move to the start of the next row */
    }


    That may not be exactly right, but you should get the idea. Unrolling a loop eliminates a good percentage of the tests for the end-of-loop condition.
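
For comparison, here is a small C sketch of the same copy written both rolled and unrolled. It is not the Quake code: the 8x8 glyph size and the names draw_glyph_rolled and draw_glyph_unrolled are assumptions for this example only.

```c
#include <stddef.h>

enum { GLYPH_W = 8, GLYPH_H = 8 };   /* assumed glyph size for this sketch */

/* Rolled version: one loop test per pixel plus one per row. */
void draw_glyph_rolled(unsigned char *buf, const unsigned char *chardata,
                       size_t screen_width)
{
    for (int y = 0; y < GLYPH_H; y++) {
        for (int x = 0; x < GLYPH_W; x++)
            *buf++ = *chardata++;
        buf += screen_width - GLYPH_W;   /* jump to the start of the next row */
    }
}

/* Unrolled version: the inner loop is written out, so only the row test remains. */
void draw_glyph_unrolled(unsigned char *buf, const unsigned char *chardata,
                         size_t screen_width)
{
    for (int y = 0; y < GLYPH_H; y++) {
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        *buf++ = *chardata++;
        buf += screen_width - GLYPH_W;
    }
}
```

The rolled version performs 64 inner tests plus 8 outer tests per glyph; the unrolled one performs only the 8 outer tests. Whether that is measurable depends on the compiler, which may unroll the inner loop by itself.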





Steve 'Sly' Williams
Tools Developer
Krome Studios

It looked to me as if Holy mentioned "for next" loops. Those come from a BASIC compiler. If you want something optimized, giving up on that compiler is the first thing to do ;-)

Edited by - Lifepower on August 30, 2000 9:58:12 AM


hi,

I want to change the pixels of a DirectDraw surface, so
I use a for/next loop, but the frame rate is not as fast as I wanted.

My code:

surfacedesc.dwsize:=sizeof(tddsurfacedesc);
dxdraw1.surface.isurface4.lock(nil,surfacedesc,ddlock_wait,0);

// 640x480 at 16 bits: y is the row (0..479), x is the column (0..639)
for y:=0 to 479 do
  for x:=0 to 639 do
    pword(integer(surfacedesc.lpsurface)+x*2+y*surfacedesc.lpitch)^:=color;

dxdraw1.surface.isurface4.unlock(@surfacedesc);


The color parameter comes from a two-dimensional array.
When I use arrays for my color, the frame rate is too low...
I do not use any Round or Trunc calls with the arrays...
But it is too slow...


Thanks for your help,




hi

The standard DXDraw functions for pixel manipulation are MUCH too slow. Download PixelCore from turbo.gamedev.net and use this unit to put pixels on a DirectDraw surface. The unit is written in pure assembler and it's very fast.

cya,
tcz

Guest Anonymous Poster
From what I've seen, you actually put the >same< colour across the entire surface??

You can do that LOTS faster than with those double for loops; DirectX has a way to clear a surface to a particular color (Blt with the DDBLT_COLORFILL flag does this).

>If< for some reason you want to keep those loop structures, then the following could help:

-Use PixelCore. (Even if I advertise my own product... )

-You seem to use 16-bit colour? In that case you can do 2 pixels at once like so:

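color,col : integer;

col:=color shl 16 or color; // the same 16-bit pixel in both halves of a 32-bit value
for y:=0 to 479 do
  for x:=0 to 319 do
    // each store writes 32 bits (two pixels), so use a 32-bit pointer type
    pinteger(integer(surfacedesc.lpsurface)+x shl 2+y*surfacedesc.lpitch)^:=col;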

or with PixelCore:
col:=color shl 16 or color;
for y:=0 to 479 do for x:=0 to 319 do
PutPixel32(x shl 1, y, col);

Something like that should double the speed.
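
The two-pixels-at-once idea can be sketched in C as follows. This is an illustration only, not PixelCore or DirectDraw code: fill16_pairs is a made-up name, the surface is assumed to be plain memory with an even width, and pitch is assumed to be the row stride in bytes.

```c
#include <stdint.h>
#include <string.h>

/* Fill a 16-bit surface two pixels at a time.
   Assumptions: width is even and pitch is the row stride in bytes. */
void fill16_pairs(uint8_t *surface, int width, int height, int pitch,
                  uint16_t color)
{
    /* Pack the same pixel into both halves of a 32-bit word. */
    uint32_t pair = ((uint32_t)color << 16) | color;

    for (int y = 0; y < height; y++) {
        uint8_t *row = surface + (size_t)y * pitch;
        for (int x = 0; x < width / 2; x++)
            memcpy(row + x * 4, &pair, sizeof pair);  /* one 32-bit store */
    }
}
```

The memcpy of four bytes is used instead of a pointer cast so the sketch does not depend on the surface being 4-byte aligned; compilers normally turn it into a single store. The inner loop runs half as many times as a per-pixel loop.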

-Unrolled loops, these were mentioned before; you could implement them as follows:

for y:=0 to 479 do for x:=0 to 79 do begin
Pixel(x shl 3,y,color);
Pixel(x shl 3+1,y,color);
Pixel(x shl 3+2,y,color);
..
..
Pixel(x shl 3+7,y,color);
end;

This basically unrolls the loop 8 times, though in >this< case the benefits are not necessarily tremendous.

-Precalculation, this too was mentioned before, but you could do it like this:

yy : array[0..479] of integer;

for x:=0 to 479 do yy[x]:=x*surfacedesc.lpitch; //pre-initialized!
.. //later
for y:=0 to 479 do
  for x:=0 to 639 do
    pword(integer(surfacedesc.lpsurface)
      +x shl 1+yy[y])^:=color;


And then of course you can mix all those methods and optimizations into one seriously fast version. So go on. Stun us all.
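
As a rough C counterpart to the precalculated row table, the sketch below builds the offsets once and then writes pixels without any multiplication inside the loops. The names row_offset, init_row_offsets and fill16_table are invented for this example, MAX_ROWS is an assumed limit, and the surface is assumed to be 2-byte aligned with an even pitch.

```c
#include <stdint.h>
#include <stddef.h>

enum { MAX_ROWS = 480 };   /* assumed maximum surface height */

static size_t row_offset[MAX_ROWS];

/* Build the table once, e.g. after locking the surface and reading its pitch. */
void init_row_offsets(int height, int pitch)
{
    for (int y = 0; y < height && y < MAX_ROWS; y++)
        row_offset[y] = (size_t)y * (size_t)pitch;
}

/* Write pixels using the table: no multiplication inside the loops. */
void fill16_table(uint8_t *surface, int width, int height, uint16_t color)
{
    for (int y = 0; y < height && y < MAX_ROWS; y++) {
        /* The row start is looked up once per row, not once per pixel. */
        uint16_t *row = (uint16_t *)(surface + row_offset[y]);
        for (int x = 0; x < width; x++)
            row[x] = color;
    }
}
```

Fetching the row pointer once per row already removes most of the per-pixel address arithmetic, so the table mainly helps when rows are visited in no particular order.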

A-Lore

hi,

Yes, I tried the PixelCore library, but it does not seem to be any faster in my code. Using lots of putpixel16(x,y,color) calls does not change the speed either, because, you see, I must change all the pixels of the screen with colors that have been calculated and that come from arrays. I thought that I could use pointers for this, but the machine shut down. If I declare a pointer and assign my array to it, then assigning the dxdraw1.surface pointer to my pointer in just one line does not work, so I tried changing all pixels, pixel by pixel. This way is very stupid, I know. But how can I change a whole DirectDraw surface using arrays or pointers?


Thanks to all...

Hey Holy!

Without a more exact description of how your program works and, perhaps more importantly, what you are trying to achieve, we can't give you specific help.

If you look at the Plasma examples that come with PixelCore, you'll see that a specific color needs to be calculated for every pixel (in real time), involving lots of array references; all that seems to run quite fine.

What speed is your program running at anyway? Are 20fps enough? Waaaay too little? What have you got?

- On the pointer and arrays thing... it probably is best if you don't point any additional pointers to your arrays, as the array name already refers directly to the array's memory.

using something like:
for y:=0 to 479 do for x:=0 to 639 do
putpixel16(x, y, TheArray[x,y]);

"should" be fine for most purposes. If you absolutely have to improve on this, you can try the following:
Write a new putpixel procedure (preferably in assembler) that takes not (x,y) as the position, but just (ofs) [the offset, i.e. the absolute position of the pixel]; you can then use:

for x:=0 to 307199 do putpixel(x,TheArray[x])

which is basically as fast as you can write a non-assembler fullscreen transform.
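
One caveat with a single linear offset is that a surface row may be wider in memory than the visible pixels (the pitch). A minimal C sketch of copying a tightly packed array to such a surface is below; blit16 is a hypothetical helper, not a PixelCore or DirectDraw call, and the surface is assumed to be plain writable memory.

```c
#include <stdint.h>
#include <string.h>

/* Copy a tightly packed 16-bit image (width * height pixels) to a surface
   whose rows are `pitch` bytes apart. */
void blit16(uint8_t *surface, int pitch,
            const uint16_t *image, int width, int height)
{
    size_t row_bytes = (size_t)width * sizeof(uint16_t);

    if ((size_t)pitch == row_bytes) {
        /* Rows are contiguous: the whole frame is one linear block. */
        memcpy(surface, image, row_bytes * (size_t)height);
        return;
    }

    /* Otherwise copy row by row, skipping the padding at the end of each row. */
    for (int y = 0; y < height; y++)
        memcpy(surface + (size_t)y * pitch, image + (size_t)y * width, row_bytes);
}
```

When the pitch equals the row size, this is the "one offset from 0 to 307199" loop expressed as a single block copy; otherwise it falls back to one copy per row.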

GoodLuK
A-Lore

You can also do something radical, like writing a loop based on PixelCore's PutPixel that does not calculate X and Y for every single pixel.
But like Lore Keeper said, the problem may lie elsewhere. Check, for example, whether you are using doFlip on DXDraw.

hi,

My stupid code is below:

for y:=-240 to 240 do
begin
  for aci:=1 to 689 do
  begin
    if (-points_fast[aci,y].y+240>=80) and (-points_fast[aci,y].y+240<=400) then
      pword(integer(surfacedesc.lpsurface)
        +(points_fast[aci,y].x+320)*2
        +(-points_fast[aci,y].y+240)*surfacedesc.lpitch)^
        :=imagetable[639-enginetable[round(aci*0.92)],-y+240];
  end;
end;

I am working on an adventure game engine that has a
cylindrical environment. So I calculate all of the points, then
assign them to the DirectX surface pixels... and so on.




Guest Anonymous Poster
You should use Trunc instead of Round; it is a lot faster.
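
Note that truncating and rounding do not return the same values, so swapping one for the other changes which table entry is read, and whether it is actually faster is worth measuring on your own compiler. A minimal C sketch of the difference, using integer arithmetic in place of the floating-point aci*0.92 (the names scale_trunc and scale_round are made up, and the integer form is only an analogue of the Delphi calls):

```c
/* Scale an index by 92/100 and drop the fraction (truncation).
   Intended for non-negative aci. */
int scale_trunc(int aci)
{
    return aci * 92 / 100;
}

/* Same scale, rounded to the nearest integer by adding half the divisor
   before dividing. Intended for non-negative aci. */
int scale_round(int aci)
{
    return (aci * 92 + 50) / 100;
}
```

For aci = 689 the truncated index is 633 and the rounded index is 634, so the two versions would sample neighbouring columns of the table.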

Has anybody done any research into x++ vs. ++x in situations such as these? I know that conceptually ++x should be faster, since it doesn't have to store the value before it increments it. I wonder if the code generated by the following would be any leaner, or if today's compilers are smart enough to optimize the difference away (please forgive the logic/syntax errors, I haven't looked at it that carefully):


for (y = 0; y < 7; ++y)
{
    *buf = *chardata;          /* first pixel: nothing to increment yet */
    *++buf = *++chardata;
    *++buf = *++chardata;
    *++buf = *++chardata;
    *++buf = *++chardata;
    *++buf = *++chardata;
    *++buf = *++chardata;
    *++buf = *++chardata;
    ++buf; ++chardata;         /* step past the eighth pixel */
    buf += screenWidth - 8;
}


quote:
Original post by Sly

...


for (y = 0; y < 7; y++)
{
    *buf++ = *chardata++;
    *buf++ = *chardata++;
    *buf++ = *chardata++;
    *buf++ = *chardata++;
    *buf++ = *chardata++;
    *buf++ = *chardata++;
    *buf++ = *chardata++;
    *buf++ = *chardata++;
    buf += screenWidth - 8;
}
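
On the x++ vs. ++x question, a small C sketch of two equivalent copy routines may help when comparing the generated code. The names copy_post and copy_pre are invented for this example. For plain pointers and integers, optimizing compilers usually emit the same instructions for both forms; the difference tends to matter for class types such as C++ iterators, where the post-increment has to return a copy.

```c
#include <stddef.h>

/* Copy n bytes using post-increment. */
void copy_post(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* Copy n bytes using pre-increment. */
void copy_pre(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (n == 0)
        return;
    *dst = *src;                 /* first element is copied before any increment */
    while (--n)
        *++dst = *++src;
}
```

Compiling both with optimization enabled and comparing the assembly listings is the most direct way to answer the question for a given compiler.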


I did a "Panorama" viewer for DelphiX where I used VRAM (systemmemory=false) and StretchBlt to wrap a panoramic cylinder around the viewer. On a P2-450 with a Voodoo Banshee it would clock in at around 50 FPS. The technique should work fine on most hardware...

I also used integer math (when possible) and SIN256 and COS256 for warping calculations. Maybe I should post the source....?
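
The source for SIN256 and COS256 was not posted, so the following C sketch only guesses at the idea: a 256-entry sine table in fixed point, indexed by an angle that wraps every 256 steps. The table size, the scale factor of 256 and all names are assumptions made for this example.

```c
#include <stdint.h>

enum { TABLE_SIZE = 256, FIXED_ONE = 256 };  /* assumed: 256 steps per turn, values scaled by 256 */

static int16_t sin256[TABLE_SIZE];

/* Fill the table once at startup. The angle is advanced with a rotation
   recurrence so the sketch needs no math library. */
void init_sin256(void)
{
    const double step_cos = 0.99969881869620422;   /* cos(2*pi/256) */
    const double step_sin = 0.024541228522912288;  /* sin(2*pi/256) */
    double s = 0.0, c = 1.0;

    for (int i = 0; i < TABLE_SIZE; i++) {
        double v = s * FIXED_ONE;
        sin256[i] = (int16_t)(v >= 0 ? v + 0.5 : v - 0.5);  /* round to nearest */

        double next_s = s * step_cos + c * step_sin;        /* sin(a + step) */
        c = c * step_cos - s * step_sin;                    /* cos(a + step) */
        s = next_s;
    }
}

/* Integer-only lookups; the angle wraps every 256 steps. */
int isin256(int angle) { return sin256[angle & 255]; }
int icos256(int angle) { return sin256[(angle + 64) & 255]; }
```

After init_sin256() has run, a rotation such as x*cos - y*sin becomes (x * icos256(a) - y * isin256(a)) / FIXED_ONE, with no floating-point work inside the render loop.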

[ side note: Another way to do it for spherical "image bubbles" is to use 3D hardware to texture-map the inside of a sphere. Easy enough to do in retained mode... ]





[ turbo | turbo.gamedev.net ]

hi turbo,

How do you use the StretchBlt function? First I define a cylinder in space coordinates, then I calculate all of the points as deformed for my camera, and then I assign them to an array. But this way is not fast on my Celeron 366: the frame rate is about 20, and on a P166 it is 5-6 frames...

If you post the code, I would be very happy...

Also, when using retained mode, if you want to make a realistic world visualization you have to define a very big sphere, and that is not a fast way to create a spherical panorama. Do you know how Cryo made Atlantis? Maybe in the same way, but I think not.

Okay Holy

The for loop itself is still optimizable (referring to a prior reply of mine: the whole unrolling, precalcing, etc.).

To get the maximum benefit though (without assembler) you should, in my opinion, try the following.

-Use a Pixel(offset,col) type routine

-enginetable[round(aci*0.92)]
should become:
enginetable[aci]
In fact... the entire imagetable thing should also be done in a single precalc.

Simply pre-do the for loops once with the original
engine[round..]
and only use engine[aci] (or something) inside the rendering loop.

Effectively it could look like:
for y:=0 to 100 do begin
  lineY:=y*pitch;
  for x:=0 to 200 do begin
    PutPixel(lineY+x+20, engine[x]);
  end;
end;

Something like that... as far as I understood your code, you are attempting a screen transform that makes the data look the way it would if you looked at it through a glass cylinder...?
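
The precalculation of the rounded index could be sketched in C like this. It is an illustration under assumptions, not the engine's real code: column_index, init_column_index and remap_row are invented names, and the 0.92 factor and the 1..689 range are taken from the loop posted earlier in the thread.

```c
#include <stdint.h>

enum { COLUMNS = 690 };   /* aci runs from 1 to 689 in the posted loop */

static int column_index[COLUMNS];

/* Run once, outside the render loop: do the multiply and the rounding
   for every column. */
void init_column_index(void)
{
    for (int aci = 0; aci < COLUMNS; aci++)
        column_index[aci] = (int)(aci * 0.92 + 0.5);  /* round to nearest, aci >= 0 */
}

/* Per frame: a plain table read replaces the multiply and the rounding. */
void remap_row(uint16_t *dst, const uint16_t *src)
{
    for (int aci = 1; aci < COLUMNS; aci++)
        dst[aci] = src[column_index[aci]];
}
```

The same approach extends to the other index expressions in the loop: anything that depends only on aci or only on y can be computed once and stored.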

Well... have fun.
