Jump to content
  • Advertisement
Sign in to follow this  
KevinLi

RGB-HSV use SSE

This topic is 4089 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

__inline void ConvertRGB2HSV(
unsigned char sR,unsigned char sG,unsigned char sB,int *dH,int *dS,int *dV)
{
      float fTemp,fDelta;
      int nMin,nMax;
      nMin = min( sR, min( sG, sB ));
      nMax = max( sR, max( sG, sB ));
      fDelta = nMax - nMin;
      *dV = nMax;
      if(nMax == nMin)
             *dH = *dS = 0;
      else
      {
             fTemp = fDelta / nMax;
             *dS= (int) (fTemp * 255);
             if(sR == nMax)
                   fTemp = (float)(sG - sB) / fDelta;
             else if(sG == nMax)
                   fTemp = 2.0 + ((float)(sB - sR) / fDelta);
             else
                   fTemp = 4.0 + ((float)(sR - sG) / fDelta);
             fTemp *= 60;
             if(fTemp < 0)
                   fTemp += 360;
             if(fTemp == 360)
                   fTemp = 0;
             *dH = (int)fTemp;
      }
}

unsinged char sR,sG,sB;
int dH,dS,dV;
int nMaxH,nMinH,nMaxS,nMinS,nMaxV,nMinV;
nMaxH = nMaxS = nMaxV = 0;
nMinH = nMinS = nMinV = 360;
RGBTRIPLE *pPixel = (RGBTRIPLE *)pSourceBuffer;
//for(int j = 0; j < 576; j++)
//for(int i = 0; i < 768; i++)
int i=0;
while(i++ < 768 * 576)
{
      sR = pPixel->rgbtRed;
      sG = pPixel->rgbtGreen;
      sB = pPixel->rgbtBlue;
      ConvertRGB2HSV( sR, sG, sB, &dH, &dS, &dV);
      if( dH > nMaxH)
            nMaxH = dH;
      if( dH < nMinH)
            nMinH = dH;
      if( dS > nMaxS)
            nMaxS = dS;
      if( dS < nMinS)
            nMinS = dS;
      if( dV > nMaxV)
            nMaxV = dV;
      if( dV < nMinV)
            nMinV = dV;
      pPixel++;
}
ConvertRGB2HSV(...) is too slow.I want to rewrite this function (or the all code)using SSE (C++ code with inline Assembly ,like as: _asm{ movaps xmm1,xmm2 ... }) The compiler is VC6(SP5),support inline assembly with SSE instructions. I want to use SSE,Who can help me?Please!

Share this post


Link to post
Share on other sites
Advertisement
Your algorithm appears to have too many branches to benefit from SIMD reliably. About the only place where you could apply some parallelization is in the final "determine minmax" six comparisons, but these only account for 4% of your execution time anyway. Where did you intend to apply SSE?

My first check would be to determine whether the call is actually inlined (on VC2005, for instance, it isn't).

Share this post


Link to post
Share on other sites
Quote:
Original post by ToohrVyk
Your algorithm appears to have too many branches to benefit from SIMD reliably. About the only place where you could apply some parallelization is in the final "determine minmax" six comparisons, but these only account for 4% of your execution time anyway. Where did you intend to apply SSE?

My first check would be to determine whether the call is actually inlined (on VC2005, for instance, it isn't).


If the compiler is smart, it's not going to be inlined any time soon - the function is far too heavy.

Now, this alorithm is quite difficult to vectorize. However, there are some possible areas of improvement:
* Avoid unnecessary data type changes (and casts). And remember, 2.0 is a double, and 2.0f is a float.
* fDelta shall contain 1,0f / static_cast<float>(nMax - nMin) - you can then get rid of many divisions.

I would rewrite the code like this:
void ConvertRGB2HSV(unsigned char sR,
unsigned char sG,
unsigned char sB,
int *dH,int *dS,int *dV)
{
float fTemp, fDelta;
int nMin, nMax;

nMin = std::min(sR, std::min(sG, SB));
nMax = std::max(sR, std::max(sG, SB));
*dV = nMax;
if (nMin == nMax) {
*dH = *dS = 0;
} else {
// no need to use any float to get dS
*dS = ((nMax - nMin) * 255) / nMax;
// we only compute fDelta here: we incorporate the
// the *60, and we store the inverse of the final value
fDelta = 60.0f / static_cast<float>(nMax - nMin);
// constants also include the *60, so we avoid the final
// multiplication
if(sR == nMax) {
fTemp = static_cast<float>(sG - sB) * fDelta;
} else if(sG == nMax) {
fTemp = 120.0f + static_cast<float>(sB - sR) * fDelta;
} else {
fTemp = 240.0f + static_cast<float>(sR - sG) * fDelta;
}
// cast now - integer comparisons and operations are
// going to be faster than the same operation on floats
*dH = static_cast<int>(fTemp);
if (*dH < 0) *dH += 360;
else if (*dH >= 360) *dH -= 360;
}
}

Now, I won't say you that this version performs incrediby faster than your version. One thing that can be improved is to use a fake float-to-int conversion - since you're dealing with a range that allow you to remove all checks AND you are not using your result anymore.
You can include the small function:
namespace {
// this is, of course, heavily MS oriented. If you are using GCC,
// the syntax will change.
inline int fake_ftol(float v)
{
int retval;
__asm fld a
__asm fistp retval
return retval;
}
}
*dH = static_cast<int>(fTemp);

becomes
*dH = fake_ftol(fTemp);

We are in the realm of micro-optimization, so don't expect a terrible boost.

You can also avoid passing 6 parameters for each function call - just pass a const reference to the RGBTRIPLE and a non-const reference to a HSVTRIPLE:
struct HSVTRIPLE
{
int h, s, v;
}

void ConvertRGB2HSV(const RGBTRIPLE& rgb, HSVTRIPLE& hsv)
{
// ...
}

RGBTRIPLE *pSrc = (RGBTRIPLE *)pSourceBuffer;
RGBTRIPLE *pSrcEnd = pPixel + (768 * 576)
// let's say we're storing the results in an image
HSVTRIPLE *pDestBuffer = new HSVTRIPLE[768*576];
HSVTRIPLE *pDest = pDestBuffer;

while(pSrc < pSrcEnd)
{
ConvertRGB2HSV(*pPixel, *pDest);
// update the max and min values
// ...
++pDestBegin;
++pSrc;
}

You'll avoid as much as 4 push per function call - and that's quite important for a function which is called that often.

Share this post


Link to post
Share on other sites
Quote:
Original post by Emmanuel Deloget
If the compiler is smart, it's not going to be inlined any time soon - the function is far too heavy.


If the compiler managed to determine that the ConvertRGB2HSV function was only called within this code, and was always called exactly once (per loop iteration) when that code was run, then it could inline it to save both time and space.

So, not inlining would be the situation of a compiler that is smart, but not smart enough [wink]

Share this post


Link to post
Share on other sites
Entire algorithm could also be rewritten like this:
void ConvertRGB2HSV(unsigned char sR,
unsigned char sG,
unsigned char sB,
int *dH,int *dS,int *dV)
{
float fTemp, fDelta;
int nMin, nMax;

nMin = std::min(sR, std::min(sG, SB));
nMax = std::max(sR, std::max(sG, SB));
*dV = nMax;
if (nMin == nMax) {
*dH = *dS = 0;
} else {
// no need to use any float to get dS
*dS = ((nMax - nMin) * 255) / nMax;
}
}




Hue does not have minimum or maximum, it's a cyclical value. It's also not independant of other two - the relations between H, S and V are not linear in the way R, G and B are.

Whatever the resulting value ranges will be, they will have no real meaning - increasing single H, S or V value is non-linear to changes in RGB - as such, HSV min/max values are meaningless.

See the definition of hue for why finding min/max for it doesn't make sense.

Given the (17, 55, 18) and (56, 7, 100) H, S and V values, the minimum would be: 17, 7, 18) - which is meaningless color in RGB space. The only thing that would matter would be the last one (18 or 100), but you cannot calculate the values for H and S from that value alone, due to non-linearity.

And for finding brightness, you should use the YUV (YCrCb) transform, which gives meaningful value (from human perspective).

Share this post


Link to post
Share on other sites
Quote:
Original post by ToohrVyk
Quote:
Original post by Emmanuel Deloget
If the compiler is smart, it's not going to be inlined any time soon - the function is far too heavy.


If the compiler managed to determine that the ConvertRGB2HSV function was only called within this code, and was always called exactly once (per loop iteration) when that code was run, then it could inline it to save both time and space.

So, not inlining would be the situation of a compiler that is smart, but not smart enough [wink]


Thou be ruining my expectations!

Thanks god, I'm drunk, so I don't care much [smile]

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!