Archived

This topic is now archived and is closed to further replies.

AndyTX

Wave Effect Optimization

Recommended Posts

Hello all. Recently I have been programming a wave simulation. I''ve spent many hours optimizing and testing various algorithms but I have pretty much reached the limit of what I can do with the code optimization-wise. The main routine that takes the time is the updating of the wave map. The rendering routine also takes some time but it is probably about as optimized as it is going to get. With that said, here is the (relitively short) routine that I need to optimize even more if possible.
  
int **CurMap = WaveMaps[CurWaveMap];  // Update it to write on the next map


    // Create some row pointers for every row in teh pattern to eek every last

    // ounce of performance out of the computer (avoid [][] referencing)

    int *Row1, *Row2, *Row3, *Row4, *Row5, *CurWrite;

    // Create a mask pointer to avoid referencing [] as much as possible

    bool *MaskPtr;

    // Go through all the values but leaving a 2 pixel border around the edge

    // because that is the size on each side of our area filter

    int CurrentValue;
    int x, y;
    for (y = 2; y < WaveMapHeight - 2; y ++)
    {
        // Set them up to be pointing to the correct column

        Row1 = CurMap[y-2] + 2;
        Row2 = CurMap[y-1] + 2;
        Row3 = CurMap[y] + 2;
        Row4 = CurMap[y+1] + 2;
        Row5 = CurMap[y+2] + 2;
        CurWrite = WaveMaps[!CurWaveMap][y] + 2;

        // Set up the mask ptr

        MaskPtr = Mask[y] + 2;

        for (x = 2; x < WaveMapWidth - 2; x ++)
        {
            // Only do this pixel if it is FALSE in the mask

            if (*MaskPtr)
            {
                // Add up the values from the pointers

                CurrentValue = (*Row1 + *Row2 + *(Row2 - 1) + *(Row2 + 1) +
                    *(Row3 - 2) + *(Row3 - 1) + *(Row3 + 1) + *(Row3 + 2) +
                    *Row4 + *(Row4 - 1) + *(Row4 + 1) + *Row5) / 6 - *CurWrite;

                // Now work out the final wave height

                CurrentValue -= (CurrentValue >> WaveDampening);

                // Finally, write it to the NEW wave map

                *CurWrite = CurrentValue;
            }

            // Update the row pointers

            Row1 ++;
            Row2 ++;
            Row3 ++;
            Row4 ++;
            Row5 ++;
            CurWrite ++;

            // And update the mask pointer

            MaskPtr ++;
        }
    }

    // Lastly, swap the wave maps so that next time we do it on the opposite ones

    CurWaveMap = !CurWaveMap;
  
Just for information, there are two wave maps, each having WaveMapHeight pointers pointing to arrays of ints, size WaveMapWidth (a normal multipointer 2D array). I tried using linear memory and still maintaing row pointers but using linear memory for the stepping through, etc. however it made no discernable difference. The Mask map is the same (2D row pointer array). So basically if anyone can help me with any optimization it would be excellent. A full 640x480x16bit screen covered with waves all updated and rendered runs at 35fps on my Pentium 4 1.4GHz (on my friend''s P4 1.9GHz at 65fps!) and at about 10-15fps on any sub-Pentium4 machine that I have seen (although no P4 optimizations are used...) If there is any way that I could speed this up by making global algorithm changes or minor variable changes or by using MMX or SSE(2) that would be excellent! 35fps is not too bad, but I would like to get it running at that on the average fast-P3 class machine. As my knowledge of MMX is limited I couldn''t find a way to use it to help but if anyone else could that would be great. Thank you very much for any and all suggestions. I have to admit I know of no way to make this any faster! Thanks again.

Share this post


Link to post
Share on other sites

Looks like you are doing a fair bit calculations in the inner loop. You might consider doing the offset calculations before the inner loop. But this depends on how often your if (*MaskPtr) is accessed. So I don''t know if that would speed things up or not. I just like optimizing by cleaning up inner loop stuff.

Guy

Share this post


Link to post
Share on other sites
Yeah one of the problems is how many calculations there are in the very inner loop, especially the one line adding all of the row pointers.

However, I can''t move any of it out of that loop since it has to be recalculated for all of the pixels (hence the y and x loops). Although it isn''t evident since I moved the original offsets into incremented Row pointers, each calculation is unique each time through the loop, since it operates on different pixels. I also once tried to take advantage of the fact that on rows 2, 3 and 4 most there are duplicated values in the addition, so I''d save the sum and minus off the lost ones and add teh gained ones (kinda hard to pixture, sorry). It didn''t help though and the speed increase, if any, was minimal.

Share this post


Link to post
Share on other sites
Ok, this is just an idea that first came to my mind. It may or may not work. It might be worth a try though.

You could do a space sub-division to your wave-array. Then, if there are no updates needed (==no waves created) on certain areas you can skip them fast. The obvius problem here is that you need to track effectively where the changes will happen. This can be done in a function which you use to make changes (initial wave-points) to your arrays. If you do enough sub-division you can track the areas where no updates will occur. Also, since you have a non-wavepoint mask there, you can use the sub-division to skip those pixels in larger batches as well.


Hope this helps.

--BerLan

"Anyhow, I''m just a stupid foreigner so i might be wrong (or my english is just so p00py you can''t understand me)."

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
This probably won''t make much difference as your compiler might do it anyway, but you could move the WaveMapWidth - 2 calculation outside of the comparison.

Share this post


Link to post
Share on other sites
Wow! Thanks for the quick replies guys. Thanks for the tip on moving the WaveMapWidth outside of the comparison, I guess that just slipped my mind

No problem on not seeing how the algorithm worked first time. I am the first to admit that all this optimization has a way of obfiscating code... it was so nice and understandable before!

Also the subdivisions are a very interesting idea. In this case, they would probably add more overhead than they are worth since most of the screen gets covered quickly. However, they would allow for a very large surface at almost no cost (unless it was all wavy!) It might just work, I will look into that. The only problem that I forsee is that the actual wave algorithm could move a wave from one "division" to the next (rippling outwards) so this would somehow have to be tracked at low CPU penalty... Very fascinating idea though, thanks.

Thanks to everyone who has replied so far. Does anyone out there see a good way I could do multiple pixels at a time using MMX or SIMD instructions or anything? I looked briefly but like I said, my MMX knowledge is severely limited (almost non-existant). Thanks again.

Edited by - AndyTX on February 5, 2002 6:24:18 PM

Share this post


Link to post
Share on other sites