Archived

This topic is now archived and is closed to further replies.

compumatrix

Drawing anti-aliased lines and why my function is so slow

Recommended Posts

compumatrix    122
I am using SDL to write a 2D game. I have a function to draw anti-aliased lines. I am using the Wu method for anti-aliased lines with the error adjust and error accumulator and such. When I profile my code using gprof this function takes over 80% of the execution time to draw 4 lines. However, when I comment out the accesses to the screen (the actual plotting of the pixels) this percentage drops to just 3% - 4% of execution time. This large difference between the two makes me wonder what I am doing wrong, because if the bottleneck for drawing a line is the actual plottin of the line on the screen then what is they point in having all these highly optimized methods to draw lines with. Since all these methods to draw lines exist I guess that I must be over-looking something. For the statistics I provided, I am using a Uint16 pointer to the pixel data and I am using an int to store the current index in the pixel data where a pixel is to be drawn. This index variable is calculated during the loop by adding either the minor axis increment or the major axis increment to it when it needs to move allong those axises. I have tried adding these major and minor axis increments to the actual pointer instead of using an index variable and this makes little difference in the results. I am accessing the pixel data when I write to it by doing p[index] = color; I get similar results on other computers I've tested this on but at least one requires me to use a Uint8 pointer instead of a Uint16 one and to access the pixel data in the following manner: *((Uint16*)(p+index)) = color; I found it very odd that I would need to access the pixel data in this manner for the code to work correctly on this particular PC. If anyone has any experience with this and has any ideas as to what I could be doing wrong, please share your thoughts, your help is much appreciated. The code for the method that works to draw the lines correctly on all the computers I have tested it on is below:
//Draw Loop for 16bpp
void CBeamGraphic::DrawLoop16()
{
    SDL_Surface *lp_screen = IGraphics::S_GetScreenSurface();
    Uint16 error_accumulator = 0;
    Uint16 error_accumulator_prev = 0;
    Uint8 *p=NULL;
    Uint16 color = SDL_MapRGB(mlpVideoInfo->vfmt, mRed&0xFF, mGreen&0xFF, mBlue&0xFF);
    int index=0;

    //lock the screen
    SDL_LockSurface(lp_screen);

    //set p so that it is at (mX1,mY1)
    p = (Uint8*)lp_screen->pixels;
    index = mY1 * mPitch + mX1 * mBytesPerPixel;

    //draw it if it is horizontal or vertical
    if (mDeltaY == 0 || mDeltaX == 0)
    {
        for (int cur_pixel = 0; cur_pixel <= mNumPixels; ++cur_pixel)
        {
            *((Uint16*)(p+index)) = color;

            index += mMajorInc;
        }
    }
    //draw it if it is not horizontal or vertical
    else
    {
        for (int cur_pixel = 0; cur_pixel <= mNumPixels; ++cur_pixel)
        {
            //draw pixels at the current position and weighted pixels at the adjacent pixel positions
            *((Uint16*)(p+index)) = SDL_MapRGB(mlpVideoInfo->vfmt, 
                                               mRed & ((error_accumulator>>8)^0xFFFF), 
                                               mGreen & ((error_accumulator>>8)^0xFFFF), 
                                               mBlue & ((error_accumulator>>8)^0xFFFF));

            *((Uint16*)(p+index+mMinorInc)) = color;

            *((Uint16*)(p+index+mMinorInc+mMinorInc)) = SDL_MapRGB(mlpVideoInfo->vfmt, 
                                                                   mRed & (error_accumulator>>8), 
                                                                   mGreen & (error_accumulator>>8), 
                                                                   mBlue & (error_accumulator>>8));

            error_accumulator_prev = error_accumulator;
            error_accumulator += mErrorAdjust;
            
            if (error_accumulator < error_accumulator_prev)
            {
                index +=mMinorInc;
            }
            index += mMajorInc;
        }
    }

    //unlock the screen
    SDL_UnlockSurface(lp_screen);
};
   
[edited by - compumatrix on August 12, 2003 10:29:22 PM] [edited by - compumatrix on August 12, 2003 10:43:19 PM] [edited by - compumatrix on August 12, 2003 10:45:51 PM]

Share this post


Link to post
Share on other sites
Wyrframe    2426

1) Why is "p" a UINT8*, but you repeatedly cast it to a UINT16*? Why not use a UINT16* to boot?

2) Why are you repeatedly dividing by 256 (>> 8), instead of precalculating?

3) An innocent but required question; Do these lines actually show up correctly, if slowly?

4) Why are you adding P to index every time, instead of initially assigning "index = P + mY1 * mPitch + mX1 * mBytesPerPixel;" before the loop start and just incrementing index like you already do?

5) I''m betting the REAL bottleneck is in SDL_MapRGB. Inline a dedicated R8G8B8 -> R5G5B5/R5G6B5 converter to get that extra speed!



// Draw Loop for 16bpp: revisions by Wyrframe

void CBeamGraphic::DrawLoop16() {
SDL_Surface *lp_screen = IGraphics::S_GetScreenSurface();
Uint16 error_accumulator = 0;
Uint16 error_accumulator_prev = 0;
Uint16 *index;
Uint16 color = SDL_MapRGB(mlpVideoInfo->vfmt, mRed & 0xFF, mGreen & 0xFF, mBlue & 0xFF);

// WYRFRAME''s REVISION: A temporary variable of the same type as "error_accumulator".

ERR_ACC_TYPE errAccTmp;

// WYRFRAME''s REVISION: If needed, update these adds which are apparently prepped for

// 8-bit alignment, and store the temporaries here. I''m guessing on the values I''ve

// assigned to them here...

INC_VAR_TYPE tMinorInc, tMajorInc;
tMinorInc = mMinorInc >> 1;
tMajorInc = mMajorInc >> 1;

//lock the screen

SDL_LockSurface(lp_screen);

//set p so that it is at (mX1,mY1)

// WYRFRAME''S REVISION: You already know it is 2 bytes per pixel, and thus 1/2 as many pixels wide as bytes wide.

index = ((Uint16*)lp_screen->pixels) + (mY1 * mPitch >> 1) + (mX1 * 2);

if (mDeltaY == 0 || mDeltaX == 0) {
//draw it if it is horizontal or vertical

for (int cur_pixel = 0; cur_pixel <= mNumPixels; ++cur_pixel) {
*index = color;
index += tMajorInc;
}

} else {
//draw it if it is not horizontal or vertical

for (int cur_pixel = 0; cur_pixel <= mNumPixels; ++cur_pixel) {
//draw pixels at the current position and weighted pixels at the adjacent pixel positions


// WYRFRAME''S REVISION: Use of temporary variable.

errAccTmp = ((error_accumulator >> 8) ^ 0xFFFF);
*(index) = SDL_MapRGB(mlpVideoInfo->vfmt, mRed & errAccTmp, mGreen & errAccTmp, mBlue & errAccTmp );

*(index+tMinorInc) = color;

// WYRFRAME''S REVISION: Use of temporary variable.

errAccTmp = (error_accumulator >> 8);
*(index+tMinorInc+tMinorInc) = SDL_MapRGB(mlpVideoInfo->vfmt, mRed & errAccTmp, mGreen & errAccTmp, mBlue & errAccTmp );

error_accumulator_prev = error_accumulator;
error_accumulator += mErrorAdjust;

if (error_accumulator < error_accumulator_prev) {
index +=tMinorInc;
}

index += tMajorInc;
}
}

//unlock the screen

SDL_UnlockSurface(lp_screen);
}

Share this post


Link to post
Share on other sites
compumatrix    122
Thanks ver much for your help.

>>1) Why is "p" a UINT8*, but you repeatedly cast it to a UINT16*? Why not use
>>a UINT16* to boot?

I had tried that but it wasn''t working on that other computer. I have figured out why though. I was setting mPitch to the width of the surface in the init function of the class if it was 16 bpp in order to test using the Uint16* pointer without modifying the minor and major increment values in the draw function. I was cheating to test it and I got caught it seems

>>2) Why are you repeatedly dividing by 256 (>> 8), instead of precalculating?

Good point.

>>3) An innocent but required question; Do these lines actually show up >>correctly, if slowly?

Yes.

>>4) Why are you adding P to index every time, instead of initially assigning
>>"index = P + mY1 * mPitch + mX1 * mBytesPerPixel;" before the loop start
>>and just incrementing index like you already do?

I had tried using a Uint16* pointer and incrementing it instead of using an index. Then I was messing around with different methods in an attempt to figure out why it wasn''t running right on that other computer.

>>5) I''m betting the REAL bottleneck is in SDL_MapRGB. Inline a dedicated
>>R8G8B8 -> R5G5B5/R5G6B5 converter to get that extra speed!

I had thought of this before also. If I comment out the use of SDL_MapRGB() and instead assign the precalculated color value to those pixels I get the same results (when profiling the program for the same amount of time) as I do when using SDL_MapRGB. I keep trying it, because it makes sense for that to be the bottleneck but I keep getting the same results.

Well at least now I know why it wasn''t running correctly on that other computer.

Currently I am using the code taht you posted with one modification, I have changed one line. From this:
index = ((Uint16*)lp_screen->pixels) + (mY1 * mPitch >> 1) + (mX1 * 2);
to this:
index = ((Uint16*)lp_screen->pixels) + (mY1 * mPitch >> 1) + mX1;

If you (or anyone else) have any more ideas, please share them.

Thanks again.

Share this post


Link to post
Share on other sites
TravisWells    276
I doubt SDL_MapRGB is going to make a big difference, it''s a very simple function (a branch, 4 shifts, and 4 ORs).

Probably the problem is you''re using hardware surfaces.
Have you tried testing the speed when you''re drawing to a software backbuffer?

Share this post


Link to post
Share on other sites